OCR Workflow Configuration

OCR Settings – General Tab

The OCR Workflow configuration options specify the precise behavior of the OCR module as it runs within the workflow of the chosen document type. The OCR module can also convert image files from TIFF to PDF (Image Only) format without performing OCR, in which case it behaves simply as a PDF maker.

Engine Options

Enable image pre-processing (Deskew, Despeckle, etc.)

If image processing has not been performed in an earlier step, enable this option to enhance image quality and potentially improve OCR accuracy.

Enable auto rotation

Attempts to rotate images to the correct orientation, which may improve OCR accuracy. Enable this option if rotation has not already been performed in an earlier step.

User Dictionary

Enabling this option lets the user add custom words to a user dictionary. This can help when performing OCR on specialized material such as medical documents.

Click “Setup” to add words.

Output Options

Output Type

Adobe PDF (Image Only)

Converts TIFF images to PDF without performing OCR.

Adobe PDF (Image with Hidden Text)

Performs OCR, then stores the recognized text as hidden text within the PDF file.

Text

Performs OCR and outputs only the recognized text, as a text file.

OCR File Tag

Enter a tag to associate with the OCR output file.

Output OCR as single page

Selecting this option outputs each image as a single-page PDF; otherwise the output is a multipage file.

Include Folder Separators in Output

If the Folder Separator sheet carries data that is useful during Quality Assurance or Index but should not appear in the output viewed by the end user, de-select this option. The Folder Separator sheet is then removed in memory before the file is output.

Include Document Separators in Output

If the Document Separator sheet carries data that is useful during Quality Assurance or Index but should not appear in the output viewed by the end user, de-select this option. The Document Separator sheet is then removed in memory before the file is output.

Do not output items marked with Skip flag

When this option is selected, any page, document, or folder tagged with a Skip flag is excluded from the output.
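
The way these output options combine can be pictured with a small sketch. This is not the product's implementation or API. The `Page` and `OutputOptions` types, the function names, and the page names are all hypothetical, and the model is simplified to a single document in which the Skip flag is stored per page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    name: str
    kind: str = "page"   # hypothetical values: "page", "folder_separator", "document_separator"
    skip: bool = False   # True when the item carries a Skip flag


@dataclass
class OutputOptions:
    # Hypothetical names mirroring the check boxes described above.
    single_page: bool = False
    include_folder_separators: bool = True
    include_document_separators: bool = True
    omit_skipped: bool = True


def select_pages(pages: List[Page], opts: OutputOptions) -> List[Page]:
    """Return the pages that survive the separator and Skip flag options."""
    kept = []
    for page in pages:
        if page.kind == "folder_separator" and not opts.include_folder_separators:
            continue
        if page.kind == "document_separator" and not opts.include_document_separators:
            continue
        if page.skip and opts.omit_skipped:
            continue
        kept.append(page)
    return kept


def plan_output_files(pages: List[Page], opts: OutputOptions) -> List[List[str]]:
    """Group the surviving pages into output files (one list of page names per file)."""
    kept = select_pages(pages, opts)
    if opts.single_page:
        return [[page.name] for page in kept]          # one file per image
    return [[page.name for page in kept]] if kept else []  # one multipage file


pages = [
    Page("sep-folder", kind="folder_separator"),
    Page("sep-doc", kind="document_separator"),
    Page("p1"),
    Page("p2", skip=True),
    Page("p3"),
]
opts = OutputOptions(include_folder_separators=False,
                     include_document_separators=False)
print(plan_output_files(pages, opts))  # [['p1', 'p3']]
```

With `single_page=True` the same input would instead be planned as two single-page files, one for `p1` and one for `p3`.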


OCR Settings – PDF Tab

PDF File Options

PDF Conformance

  • PDF 1.4
  • PDF 1.5+
  • PDF/A-1b

Compress PDF

Create Linearized PDF (Fast Web View)

Optimize file size by reducing image resolution

PDF Document Field Options

The standard PDF Document Fields are: Title, Subject, Author and Keywords. The user can select any System, Batch, Folder or Document index field to populate the desired PDF Document Fields inside the created PDF file.

Available PDF Document Fields:

  • Title
  • Subject
  • Author
  • Keywords
  • Custom Fields

Enable PDF Custom fields

Selecting Setup opens the PDF Custom Fields setup screen.

Move To

This option selects the specified row within the PDF Custom Fields setup and is useful when working with a large number of data fields.

Select the check box in the Include column to place both the Field Name of that row and its corresponding data value in the Custom tab of the PDF Document Properties.
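
As a rough illustration of the PDF Document Field options, the sketch below shows how index field values might be mapped onto the standard fields and how only the rows marked Include end up as custom fields. It is an assumption-laden model, not the product's code: the function name, the data structures, and the sample index field names (`Batch Name`, `Operator`, `Patient ID`, `Notes`) are hypothetical.

```python
from typing import Dict, List, Tuple

# The four standard PDF Document Fields named in this section.
STANDARD_FIELDS = ("Title", "Subject", "Author", "Keywords")


def build_pdf_properties(
    index_values: Dict[str, str],
    standard_map: Dict[str, str],
    custom_rows: List[Tuple[str, bool]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (standard properties, custom properties) for one output PDF.

    index_values: index field name -> value for the current document
    standard_map: standard PDF field -> index field chosen to populate it
    custom_rows:  (field name, include) rows as in a custom fields setup grid
    """
    standard = {}
    for pdf_field in STANDARD_FIELDS:
        source = standard_map.get(pdf_field)
        if source is not None and source in index_values:
            standard[pdf_field] = index_values[source]

    # Only rows whose Include box is checked contribute a name/value pair.
    custom = {
        name: index_values[name]
        for name, include in custom_rows
        if include and name in index_values
    }
    return standard, custom


index_values = {
    "Batch Name": "Intake-A",
    "Operator": "jsmith",
    "Patient ID": "12345",
    "Notes": "rescan",
}
standard_map = {"Title": "Batch Name", "Author": "Operator"}
custom_rows = [("Patient ID", True), ("Notes", False)]

standard, custom = build_pdf_properties(index_values, standard_map, custom_rows)
print(standard)  # {'Title': 'Intake-A', 'Author': 'jsmith'}
print(custom)    # {'Patient ID': '12345'}
```

Standard fields with no index field selected (Subject and Keywords here) are simply left unset in this sketch.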

Custom fields included in a PDF document can be viewed from the Document Properties within Adobe Reader.


OCR Settings – Text Tab

This tab becomes active when the selected output type is set to Text.