Commit fd62724

Authored and committed by JorjMcKie and jamie-lemon
update comments
1 parent 7cca8cd commit fd62724

1 file changed

Lines changed: 9 additions & 2 deletions

docs/pymupdf4llm/ocr-plugins.rst

@@ -130,14 +130,21 @@ You can override this logic in the following ways:
 def my_ocr_function(page, pixmap=None, dpi=300, language="eng"):
     # analyze the page content and perform OCR only if necessary
     analysis = analyze_page(page)
+
+    # inspect the items of the analysis dictionary to make your own
+    # decision about whether to perform OCR or not, e.g.:
     if not analysis["needs_ocr"]:
+        # accept decision NOT to perform OCR:
         return None
 
-    # if OCR is recommended, you can decide differently based on your own insights, e.g.
+    # if OCR is recommended, you can decide differently based on
+    # your own insights, e.g. we might want to accept previous OCR
+    # results and skip OCR if there are already text spans created
+    # from previous OCR executions (render mode 3):
     if analysis["reason"] == "ocr_spans":
-        # we might want to accept previous OCR:
         return None
 
+    # execute desired OCR engine
     rapidocr_api.exec_ocr(page, pixmap=pixmap, dpi=dpi, language=language)
     return None
 
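The decision logic in the updated example can be pulled out into a pure function so it is easy to test in isolation. The sketch below is not part of the commit: only the `needs_ocr` and `reason` keys and the value `"ocr_spans"` come from the diff, while `should_run_ocr`, `make_ocr_function` and the `accept_previous_ocr` switch are hypothetical names. `analyze_page` and the OCR engine are passed in as callables, so the sketch assumes nothing about where they are imported from.

```python
from typing import Any, Callable, Mapping

# Hypothetical type for an OCR engine call such as rapidocr_api.exec_ocr in
# the diff: any callable accepting (page, pixmap=..., dpi=..., language=...).
OcrEngine = Callable[..., None]
Analyzer = Callable[[Any], Mapping[str, Any]]


def should_run_ocr(analysis: Mapping[str, Any], accept_previous_ocr: bool = True) -> bool:
    """Mirror the decision logic of my_ocr_function as a pure function.

    Only the "needs_ocr" and "reason" keys are taken from the docs example;
    accept_previous_ocr is an illustrative switch, not a library option.
    """
    if not analysis.get("needs_ocr", False):
        # the analysis says OCR is not needed: accept that decision
        return False
    if accept_previous_ocr and analysis.get("reason") == "ocr_spans":
        # text spans from an earlier OCR pass are already present: keep them
        return False
    return True


def make_ocr_function(analyze_page: Analyzer, engine: OcrEngine, accept_previous_ocr: bool = True):
    """Build a callback with the same signature as my_ocr_function."""

    def my_ocr_function(page, pixmap=None, dpi=300, language="eng"):
        analysis = analyze_page(page)
        if should_run_ocr(analysis, accept_previous_ocr):
            # delegate to whichever OCR engine was injected
            engine(page, pixmap=pixmap, dpi=dpi, language=language)
        # like the docs example, the callback itself returns None
        return None

    return my_ocr_function
```

Assuming `analyze_page` and `rapidocr_api` are imported as on the documentation page, `make_ocr_function(analyze_page, rapidocr_api.exec_ocr)` would produce a callable equivalent to `my_ocr_function`; passing stubs instead lets the decision branches be exercised without rendering any pages.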
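The updated comment links previous OCR output to render mode 3, the PDF text rendering mode for invisible text, which OCR tools commonly use for the text layer they place over a scanned image. A minimal sketch of such a check, assuming span dicts shaped like those returned by PyMuPDF's `Page.get_texttrace()`, whose `type` key is 3 for invisible text. `has_invisible_text` is a hypothetical helper, and this is not necessarily how `analyze_page` arrives at the `"ocr_spans"` reason.

```python
from typing import Any, Iterable, Mapping

# Assumption: spans look like the dicts from PyMuPDF's Page.get_texttrace(),
# where "type" is 0 (fill), 1 (stroke) or 3 (invisible, render mode 3).
INVISIBLE_TEXT = 3


def has_invisible_text(spans: Iterable[Mapping[str, Any]]) -> bool:
    """Return True if any span is invisible text, as OCR text layers usually are."""
    return any(span.get("type") == INVISIBLE_TEXT for span in spans)
```

On a real page this would be called as `has_invisible_text(page.get_texttrace())`. Invisible text can also come from sources other than OCR, so a positive result is a hint rather than proof.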