Optical character recognition in PDF

Optical character recognition (OCR) converts images containing text into PDF text, which supports searching, copying, editing and all other PDF text functionality. Text recognition can be performed only if the PDF document permissions do not prohibit it.

To use optical character recognition, choose the Document -> OCR menu item. Set the following parameters in the dialogue window:

OCR settings

  • Page Range  Set the pages on which optical character recognition is to be performed.
  • Languages  Set the language(s) of the text to be recognized. For the best recognition quality, choose as few languages as possible.

If text recognition is used for the first time, the list of languages will be empty. To add languages, press the Install languages button.

  • Install languages  Tick the check boxes of the required languages. The window below lists the languages that Master PDF Editor can recognize.

OCR language settings

  • Font Family  Choose the font family to be used in the document for the recognized text. With auto, the application picks the most appropriate font family for the current document.
  • Searchable Text  With this option, recognized text is available for search and copying only. It is inserted into the document as an invisible layer under the image.
  • Editable Text  With this option, recognized text is available for editing. The text is inserted in front of the image that contains it, and the image itself is covered with the background color.

The Advanced settings are located in the lower part of the OCR Engine window.

OCR advanced settings

  • Deskew  Automatically straighten all content on the page. The content of scanned documents can also be deskewed.
  • Minimal confidence level  A numerical threshold for the engine's confidence, that is, the degree to which it is certain that it has recognized a component correctly.
  • Force manual text editing if confidence level not achieved  If this option is chosen, a dialogue window for editing text will be opened during text recognition. The window successively shows each part of the PDF document image with the corresponding recognized text, so that the text can be edited before it is inserted into the document. It displays:
  • Original  A fragment of the image that contains text.
  • Text  The automatically recognized text corresponding to the image.

OCR Recognized text

  • Yes  The automatically recognized or edited text will be inserted into the document. The dialogue window will then display the next image and its corresponding text.
  • Yes to All  All images will be recognized automatically and written into the document. The dialogue window won't appear again during this recognition.
  • Not Text  The image does not contain text. Text insertion is cancelled for the current image.
  • Cancel  Cancel text recognition.
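
The interplay between the minimal confidence level and the buttons of the review dialogue can be pictured with a short Python sketch. It is only an illustration and not Master PDF Editor's actual implementation: the names Fragment, Answer and review_fragments are hypothetical, the confidence scale of 0 to 100 is an assumption, and so is the choice to keep already accepted text when Cancel is pressed.

```python
from dataclasses import dataclass
from enum import Enum


class Answer(Enum):
    """Buttons of the review dialogue described above."""
    YES = "Yes"
    YES_TO_ALL = "Yes to All"
    NOT_TEXT = "Not Text"
    CANCEL = "Cancel"


@dataclass
class Fragment:
    text: str          # automatically recognized text
    confidence: float  # assumed scale: 0 (uncertain) to 100 (certain)


def review_fragments(fragments, min_confidence, ask):
    """Return the texts that would be inserted into the document.

    `ask` stands in for the dialogue window: it receives a fragment and
    returns a pair (Answer, possibly edited text).
    """
    inserted = []
    accept_all = False
    for fragment in fragments:
        # Confident fragments, or everything after "Yes to All", skip the dialogue.
        if accept_all or fragment.confidence >= min_confidence:
            inserted.append(fragment.text)
            continue
        answer, edited_text = ask(fragment)
        if answer is Answer.CANCEL:
            break      # assumption: stop here, keep what was accepted so far
        if answer is Answer.NOT_TEXT:
            continue   # nothing is inserted for this image
        if answer is Answer.YES_TO_ALL:
            accept_all = True
        inserted.append(edited_text)
    return inserted


# Example: one confident fragment, one corrected by hand, one rejected.
def ask(fragment):
    if fragment.text == "T0tal":
        return Answer.YES, "Total"
    return Answer.NOT_TEXT, fragment.text


fragments = [Fragment("Invoice", 96.0), Fragment("T0tal", 41.0), Fragment("~~~", 12.0)]
print(review_fragments(fragments, 80.0, ask))  # ['Invoice', 'Total']
```

In this sketch a higher minimal confidence level sends more fragments to manual review, while a lower one lets more text through without a prompt.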

 
