Classification Workflow

Configuring the Classification Module takes a fair amount of planning. The three major planning focus areas are:

  • Page Validation – When examining forms, you need to decide the type of page validation you will require when processing forms. Page validation in the Classification engine defines separation and page merging functionality.
  • Forms Identification – Currently in PSI:Capture, you can define and classify forms based on OCR match criteria or barcode recognition. This is the most critical planning step, and will ultimately define how pages are classified and documents are created.
  • Data Extraction – The ultimate goal of classification is to identify the correct Form ID and then extract data based on the assigned Record Type.

Classification Form Definitions

The Form Definition in Classification allows you to define all the characteristics of a form, how to identify or classify, and provides key methods for how PSI:Capture will behave when a classification occurs. In the Classification Module, you can Add, Edit, Copy and Delete Form Definitions, as well as change their order. You can also import your Classification Form Definitions, also known as a taxonomy, through the Import option.

NOTE: In version 5.4.2 and above, a "View Usage" button lets you view the global usage of each form. To differentiate the two buttons, the existing "Check Usage" button was renamed "Doc Type Usage". Clicking "View Usage" opens a window in which you can run a usage query by timeframe.

Adding Form Definitions

Clicking the Add button will open the Form Definition dialog. As mentioned, this provides an interface for defining all the characteristics of a form. Within this configuration interface, you have the standard template toolbar which allows you to load or scan a template image, as well as a set of zooming tools.

Form Settings

  • Form ID – The Form ID is the name of the form these characteristics define. Note: This name will be available as a variable, and be placed in a linked index field.
  • Group – The Group allows you to create subsets of forms and currently is purely for organization within the configuration.
  • Record Type – This dropdown will link to the configured Record Types on step 3 of the configuration wizard, and allows the linking of the Form Definition to the chosen Record.
  • Description – allows a user defined description of the form.
  • Page Count – for forms of specified page lengths, this count will be utilized in page validation.
  • Usage Ranking Behavior* – this option lets you keep the form's current usage-ranked position, or override the usage ranking so that the selected form is processed at the beginning or end of the Form list:
    • Use Ranked position
    • Override Ranking and process Form at the beginning of the Form list
    • Override Ranking and process Form at the end of the Form list

*Usage Ranking Behavior available in versions 5.4.2 and above.

Rules

Classification Rules

The Classification Rules section of the module provides the ability to input one or more rules that will define your form. Below are the options:

Match – you can choose a positive or negative match for your rule, and combine them to build a series of rules that will define your form. For instance, you may have a form that has “Form OFS 2” on the top, but there are two versions, with different locations for the required data. One form has “Version 2” on the bottom, one does not. You can use a negative rule to make sure the form without Version 2 is properly identified.

Rule Type – currently there are two types of rules, OCR Text and Barcode.

Rule Value – the Rule Value provides an entry point for a regular expression to match either the barcode value or an OCR expression. This will trigger the classification and setting of Record Type.

Rule Match Behavior – If you have multiple rules, this drop down will provide a means to logically combine them to define the overall match. You can either choose to match on the first rule matched, or make the combination of all your rules required.

Note: The order of rules can be used to your advantage as rules are processed in the order of entry.
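
The rule options above can be sketched in a few lines of Python. This is only an illustration of the logic, not PSI:Capture's implementation: the Rule class, its field names, and the form_matches function are hypothetical, and treating "match on the first rule matched" as "any rule satisfied" is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    # All names here are illustrative, not PSI:Capture identifiers.
    pattern: str              # Rule Value: a regular expression
    rule_type: str = "ocr"    # Rule Type: "ocr" or "barcode"
    positive: bool = True     # Match: positive or negative


def rule_holds(rule, ocr_text, barcodes):
    """Return True when a single rule is satisfied for the page."""
    if rule.rule_type == "barcode":
        found = any(re.search(rule.pattern, value) for value in barcodes)
    else:
        found = re.search(rule.pattern, ocr_text) is not None
    # A negative rule is satisfied when the pattern is absent.
    return found if rule.positive else not found


def form_matches(rules, ocr_text, barcodes=(), require_all=True):
    """Combine rules: all required, or (assumed) any single rule suffices."""
    # The generator evaluates rules lazily, in their order of entry.
    results = (rule_holds(rule, ocr_text, barcodes) for rule in rules)
    return all(results) if require_all else any(results)


# The "Form OFS 2" example: identify the variant without "Version 2".
ofs2_original = [
    Rule(r"Form OFS 2"),
    Rule(r"Version 2", positive=False),
]
page_v1 = "Form OFS 2\nApplicant name"
page_v2 = "Form OFS 2\nApplicant name\nVersion 2"

print(form_matches(ofs2_original, page_v1))
print(form_matches(ofs2_original, page_v2))
```

With all rules required, the negative rule keeps the "Version 2" page from being identified as the original form.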

Last Page Classification Rules

If Last Page Rule processing is enabled and a Form Definition contains Last Page Rules, then when that Form is classified, all other Page Validation and classification is disabled and classification searches only for a matching last page for that Form. Once it is found, all pages up to that page are added to that Form, and classification switches back to normal processing, looking for matches for all defined Forms. The special case where the first page of a Form is also its last page is handled as well.

If a Form Definition does not contain Last Page Rules, then the selected option under Page Validation will be used (Loose, Strict, None). This allows users to mix both types of validation in case they aren't able to use Last Page Rules for all of their forms.
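
As a rough sketch of the Last Page Rule behavior described above, the following Python groups page texts into documents. The group_pages function, the "first" and "last" keys, and the sample form IDs are all hypothetical. Pages that match no form are simply recorded as unclassified here, whereas the product would apply the selected Page Validation option.

```python
import re


def group_pages(pages, forms):
    """Group page texts into (form_id, [page indexes]) documents.

    `forms` maps a hypothetical Form ID to a dict holding a "first" regex
    and an optional "last" regex standing in for the Last Page Rule.
    """
    documents = []
    open_form = None  # Form currently waiting for its last page
    for index, text in enumerate(pages):
        if open_form is not None:
            # Last-page mode: all other classification is suspended.
            documents[-1][1].append(index)
            if re.search(forms[open_form]["last"], text):
                open_form = None  # switch back to normal processing
            continue
        for form_id, rules in forms.items():
            if re.search(rules["first"], text):
                documents.append((form_id, [index]))
                last = rules.get("last")
                # Special case: the first page may also be the last page.
                if last and not re.search(last, text):
                    open_form = form_id
                break
        else:
            # No form matched; this sketch records the page as unclassified.
            documents.append((None, [index]))
    return documents


forms = {
    "Contract": {"first": r"CONTRACT", "last": r"Signature"},
    "Memo": {"first": r"MEMO"},
}
pages = [
    "CONTRACT page 1",
    "terms that mention a MEMO",
    "Signature",
    "MEMO hello",
    "CONTRACT Signature",
]
print(group_pages(pages, forms))
```

Note that the second page mentions "MEMO" but is still added to the Contract, because normal classification is suspended until the last page is found.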

Table Extraction-Line Items

Form Qualifiers

This allows classification based on the page orientation or the size of the form. It can serve as an additional criterion for defining a form, or it can be used by itself, with no rules, to define a form. For example, when scanning checks and check stubs, you can assign a Record Type of Check when certain page size criteria are met.
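
A minimal sketch of a qualifier-only classification, in Python. The function names and the size thresholds (9 by 4 inches) are made up for illustration and are not taken from PSI:Capture.

```python
def orientation(width, height):
    """Classify a page as landscape or portrait from its dimensions."""
    return "landscape" if width > height else "portrait"


def meets_qualifiers(width_in, height_in, max_width=None, max_height=None,
                     required_orientation=None):
    """Check page size (inches) and orientation against optional limits."""
    if max_width is not None and width_in > max_width:
        return False
    if max_height is not None and height_in > max_height:
        return False
    if required_orientation is not None:
        if orientation(width_in, height_in) != required_orientation:
            return False
    return True


def record_type_for(width_in, height_in):
    """Assign a Record Type from page size alone (no rules defined)."""
    # Hypothetical thresholds: small landscape pages are treated as checks.
    if meets_qualifiers(width_in, height_in, max_width=9.0, max_height=4.0,
                        required_orientation="landscape"):
        return "Check"
    return None


print(record_type_for(6.0, 2.75))   # a small landscape page
print(record_type_for(8.5, 11.0))   # a letter-size portrait page
```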

Import

Clicking the Import button in the Classification Module settings displays a dialog that lets you choose which type of import to perform:

Database Import Feature

The Database Import feature is available in version 5.4.1 and above.

Database Import

How to import via database:
  1. Set up the Database Connection. This uses standard dialogs used throughout the product.
  2. Import Definition
    1. Form ID is required
    2. Form ID, Description and Rules all use the standard Build Custom Value dialog to build those values from different database fields/constants.
    3. The other fields are all optional including Rules. Setting up Rules during this step applies them universally across all imported forms. By making Rules optional, it allows the user to come back later and add rules to individual forms.
    4. When defining Rules, users can either use the values from the table as is or run the values through the Regex Builder to generate the necessary regular expressions. This behavior is controlled for each rule separately using the “Convert to Regular Expression” option. The global Regex Options can be accessed using the Regular Expression Options button.
  3. Import Options
    1. Duplicate Form ID Behavior – User can either skip creation of a form if a duplicate is found or add the rules to an existing form.
    2. “Mark Imported Classification Form Definitions as Not Validated….” – if selected, this option will import the form as Not Validated. If the corresponding option on the Classification Definition settings is selected (see below), documents that match these Non Validated Forms will be treated as Exceptions to be processed on the Classification Validation dialog. To validate the Form, the user will open the Form in the ACE dialog. When they save out of ACE, the form will be validated for that document, any others in the batch of that type of Form and all future documents classified as that Form type.
    3. "Do not create Classification Form Definitions that have no rules" – If selected, any form that would be imported without rules is not created. The system warns the user and lists the form definitions that were not created.
Sample Database Import
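
The import options described above can be sketched as follows. This is a hedged illustration only: import_forms, its parameters, and the row layout are hypothetical, and re.escape is used merely as a stand-in for the product's Regex Builder, whose actual output is not documented here.

```python
import re


def import_forms(rows, existing, duplicate_behavior="skip",
                 convert_to_regex=False, skip_forms_without_rules=True,
                 mark_not_validated=True):
    """Merge imported rows into `existing` (Form ID -> definition dict).

    Each row is a dict with a required "form_id" and an optional "rule".
    Returns the Form IDs that were not created because they had no rules.
    """
    not_created = []
    for row in rows:
        form_id = row["form_id"]        # Form ID is required
        rule = row.get("rule")          # Rules are optional
        if rule and convert_to_regex:
            rule = re.escape(rule)      # stand-in for the Regex Builder
        rules = [rule] if rule else []
        if form_id in existing:
            # Duplicate Form ID Behavior: skip, or add rules to the form.
            if duplicate_behavior == "add_rules":
                existing[form_id]["rules"].extend(rules)
            continue
        if not rules and skip_forms_without_rules:
            not_created.append(form_id)  # reported back to the user
            continue
        existing[form_id] = {
            "rules": rules,
            "validated": not mark_not_validated,
        }
    return not_created


existing = {"A": {"rules": ["x"], "validated": True}}
rows = [
    {"form_id": "A", "rule": "y"},
    {"form_id": "B", "rule": "Invoice"},
    {"form_id": "C"},
]
print(import_forms(rows, existing, duplicate_behavior="add_rules"))
```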

Custom Text File Import

Click "Browse" to locate your text file, then click the "Import" button.

XML Import

This allows you to select an XML file that you have exported previously from the Form Definitions export option.

Classification Options

Page Validation

The type of forms that you are processing will determine the type of Page Validation you choose. Page validation in the Classification Module determines how pages will be combined and validated during the classification process. You have several choices in how pages are treated once a Form Type is identified/matched to a classification rule. In validation methods that require page count, counts are referenced from the Form ID Definition. Below are the types of validation and an explanation of the behavior:

  • Loose Page Count Validation – in this validation method, once a Form is identified, subsequent pages will be added to the form until the page count is reached, or until another Form is identified. This method can be used with fixed page count forms as well as with variable-length documents, such as invoices, in mixed batches.
  • Strict Page Count Validation – In Strict mode, the product will count form pages, and if they do not equal the page count defined in the Form Definition, an exception will occur.
  • No Page Count Validation – in this method, page counts are totally ignored, and the combining of pages can occur based on one of the chosen options:
    • Combine non-classified document with previously classified document
    • Combine classified document with previously classified document if same Form ID
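
To make the difference between Loose and Strict validation concrete, here is a small Python sketch. The function names and data shapes are hypothetical. The sketch assumes that every classified page starts a new document, and that a Form with no defined page count accepts pages until the next Form is identified.

```python
def group_loose(page_form_ids, page_counts):
    """Loose validation: add unmatched pages to the preceding Form.

    `page_form_ids` holds the matched Form ID for each page, or None.
    `page_counts` maps Form IDs to their defined page count (optional).
    """
    documents = []
    for index, form_id in enumerate(page_form_ids):
        current = documents[-1] if documents else None
        if form_id is not None:
            # Another Form is identified: start a new document.
            documents.append({"form_id": form_id, "pages": [index]})
            continue
        limit = float("inf")
        if current is not None and current["form_id"] is not None:
            limit = page_counts.get(current["form_id"], float("inf"))
        if current is not None and current["form_id"] is not None \
                and len(current["pages"]) < limit:
            current["pages"].append(index)   # page count not yet reached
        else:
            documents.append({"form_id": None, "pages": [index]})
    return documents


def strict_exceptions(documents, page_counts):
    """Strict validation: return documents whose page total is wrong."""
    return [
        doc for doc in documents
        if doc["form_id"] in page_counts
        and len(doc["pages"]) != page_counts[doc["form_id"]]
    ]


matched = ["W9", None, None, "Invoice", None, None, None]
docs = group_loose(matched, {"W9": 2})
print(docs)
print(strict_exceptions(docs, {"W9": 3}))
```

In this example the fixed-length W9 stops accepting pages at two, while the variable-length Invoice keeps collecting pages until the batch ends.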

When a Form is matched that has Last Page Rules defined...

This option enables/disables Last Page Rule processing. 

If Last Page Rule processing is enabled and a Form Definition contains Last Page Rules, then when that Form is classified, all other Page Validation and classification is disabled and classification searches only for a matching last page for that Form. Once it is found, all pages up to that page are added to that Form, and classification switches back to normal processing, looking for matches for all defined Forms. The special case where the first page of a Form is also its last page is handled as well.

If a Form Definition does not contain Last Page Rules, then the selected option under Page Validation will be used (described in the above section). This allows users to mix both types of validation in case they aren't able to use Last Page Rules for all of their forms.

Classification should fail on a Document if...

When checked, this option validates that the page count of each Document that has run through Classification restructuring matches the page count defined on the Form that the Document was classified as. If the page counts don't match, the Document fails classification and an alert with the appropriate error message is attached to that Document. Because it fails classification, the Classification Validation dialog is displayed with that Document marked as failed.

Form Processing Setup (this section only available in versions 5.2 and above)

This option lets you choose whether forms are processed in the order in which they are defined (the default) or by group. When "By Group" is selected, the forms are separated into their groups and each group is processed in the order in which the groups are defined. The "By Usage Ranking" option processes the forms that are matched most often first, allowing for faster processing.

Form Processing Order

  • Defined Order
  • By Group
  • By Usage Ranking*
  • By Usage Ranking and by Groups*

*These options available in 5.4.2 and above.

Calculate Usage Ranking using (only available when usage ranking is selected)

  • All Usage
  • Usage from the last 3 months
  • Usage from the last 6 months
  • Usage from the last 12 months
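
The ordering options can be illustrated with a short Python sketch. Everything named here (processing_order, usage_count, the "match_dates" and "override" keys) is hypothetical, and a month is approximated as 30 days. The combined "By Usage Ranking and by Groups" mode is left out because this document does not describe how the two orderings interact. The override handling mirrors the Usage Ranking Behavior setting on the Form Definition.

```python
from datetime import date, timedelta


def usage_count(form, today, months=None):
    """Count matches, optionally only those in the last `months` months."""
    if months is None:                              # "All Usage"
        return len(form["match_dates"])
    cutoff = today - timedelta(days=30 * months)    # approximate months
    return sum(1 for matched in form["match_dates"] if matched >= cutoff)


def processing_order(forms, mode="defined", group_order=(), today=None,
                     months=None):
    """Return Form IDs in the order they would be tried."""
    today = today or date.today()
    if mode == "defined":
        return [form["id"] for form in forms]
    if mode == "group":
        groups = list(group_order)
        ordered = sorted(forms, key=lambda f: groups.index(f["group"]))
        return [form["id"] for form in ordered]
    if mode == "usage":
        # sorted() is stable, so ties keep their defined order.
        ordered = sorted(forms, key=lambda f: -usage_count(f, today, months))
        first = [f for f in ordered if f.get("override") == "first"]
        last = [f for f in ordered if f.get("override") == "last"]
        ranked = [f for f in ordered
                  if f.get("override") not in ("first", "last")]
        return [form["id"] for form in first + ranked + last]
    raise ValueError(f"unknown mode: {mode}")


forms = [
    {"id": "A", "group": "G2", "match_dates": [date(2024, 1, 10)]},
    {"id": "B", "group": "G1",
     "match_dates": [date(2024, 5, 1), date(2024, 5, 2)]},
    {"id": "C", "group": "G1", "match_dates": [date(2023, 1, 1)] * 5},
]
today = date(2024, 6, 1)
print(processing_order(forms, "usage", today=today))
print(processing_order(forms, "usage", today=today, months=3))
```

Restricting the usage window changes the ranking: a form that was matched often long ago drops behind one that has been matched recently.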

Page Processing

Run Classification on:

  • All Pages of document (Default)
  • First Page of Document Only
  • Custom Page List of Document

Enter Pages to Classify

Allows you to enter the list of pages that classification should be run on. NOTE: Available only when "Custom Page List of Document" is selected.
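
The three Page Processing choices can be sketched in Python as below. The page list syntax is not specified in this document, so the comma-and-range format ("1,3,5-7") parsed here is purely an assumption, as are the function names and mode strings.

```python
def parse_page_list(spec):
    """Parse an assumed page list format such as "1, 3, 5-7"."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(number) for number in part.split("-", 1))
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


def pages_to_classify(page_total, mode="all", spec=""):
    """Return the 1-based page numbers that classification would run on."""
    if mode == "all":        # All Pages of document (default)
        return list(range(1, page_total + 1))
    if mode == "first":      # First Page of Document Only
        return [1] if page_total else []
    # Custom Page List of Document: ignore pages beyond the document.
    return [p for p in parse_page_list(spec) if 1 <= p <= page_total]


print(parse_page_list("1, 3, 5-7"))
print(pages_to_classify(4, "custom", "1,3,5-7"))
```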

Group Processing Options

This lets you filter which Classification forms are run on a document type by selecting from a list of form groups.

OCR Text Classification Settings

The Classification module works by extracting a specified amount of header and footer text from the processed page, and then searching for match terms. The module allows you to adjust these settings to take in more or less text, depending on the structure of the forms you are processing.

You can define the text area either as a number of lines or as an area, and then set how much of either you want to consume. NOTE: The more text you consume, the longer the engine takes to process it. This can become a performance issue if you are processing large areas or many lines.
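
As an illustration of the line-based setting, the Python sketch below keeps only the header and footer lines of a page's OCR text before any match terms are searched. The function name and default line counts are hypothetical, and the sketch assumes the OCR text is already available as a plain string.

```python
def classification_text(page_text, header_lines=5, footer_lines=5):
    """Keep only the first and last lines of a page's OCR text."""
    lines = [line for line in page_text.splitlines() if line.strip()]
    if len(lines) <= header_lines + footer_lines:
        return "\n".join(lines)   # short page: use everything
    # Guard against footer_lines == 0, since lines[-0:] is the whole list.
    footer = lines[-footer_lines:] if footer_lines else []
    return "\n".join(lines[:header_lines] + footer)


page = "\n".join(f"line {number}" for number in range(1, 11))
print(classification_text(page, header_lines=2, footer_lines=1))
```

Searching a smaller slice of text is what makes the header and footer limits a performance lever: fewer lines mean less text for each rule's regular expression to scan.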

Other Settings

Index Field to populate with Form ID

This option allows users to choose to assign the Form ID to an index field or populate the Page Information dialog with Classification OCR Text for troubleshooting.

Classification Exceptions Processing

If there are any exceptions during the validation process, you can either interactively fix them in the classification module, or auto-reject to an exceptions batch based on your selection in the Exceptions Processing section below:

Example: Exceptions that occurred during the classification process. This will be covered later in Exceptions Processing.

Page Processing Settings

Group Processing Settings

Available in version 5.4+

We have moved Classification Exceptions Processing to its own tab and added a new feature called Accelerated Classification Engine (ACE). Below is a description of the ACE and what options are available.

Accelerated Classification Engine Options

Enable Accelerated Classification Engine – to enable the Accelerated Classification Engine (ACE) select this check box. ACE can only be enabled if Exceptions Processing is enabled.

Default Display Mode – the ACE configuration dialog has 2 modes: Standard and Advanced. This option lets you specify the default mode that the ACE dialog should open in.

Display Mode Options – the Display Mode Options button opens a dialog to customize which settings are displayed in the Standard or Advanced modes.

Allow user to switch Display Modes – this option controls whether the user will be allowed to switch between Standard and Advanced modes. If this is unchecked, then the ACE dialog will be opened in the mode specified by the Default Display Mode.

Default Zone Definition Profile Option – this option controls what the Zone Definition Profile Option should be set to by default when opening the ACE dialog.

Zone Definition Profile to preload copy of – If a Zone Definition Profile is selected, it will be copied and loaded automatically as the starting point for the new Zone Definition Profile when opening the ACE dialog.

Before running Classification processing on a Batch, update its copy of the Document Type with the Classification Form Definitions, Zone Profiles and Record Types from the Document Type in Configuration – If this option is enabled, Classification Form Definitions, Zone Profiles and Record Types will be updated on the Batch based on the current settings on the Document Type in Configuration. This ensures that the Batch has the latest versions of the settings that can be defined by the Accelerated Classification Engine.

New Classification Form Save Locations – Controls where new Classification Forms defined in ACE will be saved. By default, the forms will be saved based on the forms currently defined on the Document Type using the following rules:

  1. If the Document Type contains only local Forms, the new Form will be saved to the Document Type
  2. If the Document Type contains only global Forms, the new Form will be saved to the Global Forms collection
  3. If the Document Type contains both local and global Forms, the user will be prompted on where to store the new Form

This behavior can be overridden by selecting one of the other three options. No matter which option is selected for this setting, if the "Process Classification using all global Classification Form Definitions that are defined when Classification runs" option is enabled, the new Form will always be saved to the Global Forms collection.
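
The save location rules above amount to a small decision function, sketched here in Python. The function name, parameter names, and the return strings ("document_type", "global", "prompt") are hypothetical labels. The case of a Document Type with no Forms at all is not covered by this document, so the fallback used for it is an assumption.

```python
def save_location(has_local, has_global, override=None,
                  use_all_global_forms=False):
    """Decide where a new Form defined in ACE would be saved."""
    if use_all_global_forms:
        return "global"          # always wins, whatever else is selected
    if override is not None:
        return override          # one of the non-default options
    if has_local and has_global:
        return "prompt"          # ask the user where to store the Form
    if has_global:
        return "global"
    # Only local Forms. Assumption: the same is used when no Forms exist.
    return "document_type"


print(save_location(has_local=True, has_global=False))
print(save_location(has_local=True, has_global=True))
```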

Treat documents that are classified as Non-Validated Forms as Exceptions to be processed through the Accelerated Classification dialog. (available in version 5.4.1+)

If this option is enabled, documents that match a Non-Validated Form during Classification will be marked as Exceptions and displayed on the Classification Validation dialog with an icon indicating that the Form is not validated.

Accelerated Classification Engine Display Mode Options

The Display Mode Options dialog lets the user customize which display modes each Accelerated Classification Engine setting appears in. One of the following display options can be set for each customizable setting in the list:

  1. Advanced Only – setting will only be displayed when in Advanced mode.
  2. Both – setting will show in both Standard and Advanced mode.
  3. None – setting will not show in either Standard or Advanced mode.
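
The three display options map directly onto the modes in which a setting is shown, as in this small Python sketch. The setting names used in the example are invented, and the lookup table is only an illustration of the rules listed above.

```python
# Which display modes each display option makes a setting visible in.
VISIBLE_IN = {
    "Advanced Only": {"Advanced"},
    "Both": {"Standard", "Advanced"},
    "None": set(),
}


def visible_settings(options, mode):
    """Return the settings shown when ACE opens in the given mode."""
    return [name for name, option in options.items()
            if mode in VISIBLE_IN[option]]


# Hypothetical setting names, for illustration only.
options = {
    "Form ID": "Both",
    "Rule Value": "Advanced Only",
    "Group": "None",
}
print(visible_settings(options, "Standard"))
print(visible_settings(options, "Advanced"))
```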

Data Extraction and Classification

Once a document is classified, and a Record Type assigned, custom data extraction rules can be applied for that particular type of document. Through the use of shared and unique fields tied to Record Types, all the different methods of data population are available. There are several key features that leverage Record Type focused extraction:

  • Dynamic Regular Expressions – Advanced Data Extraction (ADE) now allows specific regular expressions to be configured based on the Record Type.
  • Zone Profiles – allow zone OCR-based templates that are linked to specific Record Types.
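
To illustrate Record Type focused extraction, the Python sketch below selects regular expressions according to the Record Type assigned during classification. The Record Types, field names, and patterns are all made up for the example and do not reflect how ADE stores its configuration.

```python
import re

# Hypothetical patterns keyed by Record Type, then by field name.
PATTERNS = {
    "Invoice": {"invoice_number": r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w+)"},
    "Check": {"check_number": r"Check\s*(?:No\.?|#)\s*:?\s*(\d+)"},
}


def extract_fields(record_type, text):
    """Apply only the expressions configured for this Record Type."""
    fields = {}
    for name, pattern in PATTERNS.get(record_type, {}).items():
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            fields[name] = match.group(1)
    return fields


print(extract_fields("Invoice", "Invoice No: A1234\nTotal 10.00"))
print(extract_fields("Check", "Check # 00981"))
```

Because the expressions are chosen by Record Type, the same page text yields different fields, or none, depending on how the document was classified.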