Classification Workflow

Configuring the Classification Module takes considerable planning. The three major planning focus areas are:

  • Page Validation – When examining forms, users need to decide the type of page validation required when processing forms. Page validation in the Classification engine defines separation and page merging functionality.
  • Forms Identification – Currently in PSIcapture, users can define and classify forms based on OCR match criteria or barcode recognition. This is the most critical planning step, and will ultimately define how pages are classified and documents are created.
  • Data Extraction – The ultimate goal in classification is to identify the correct Form ID, and then extract data based on the assigned Record Type.

Classification Form Definitions

The Form Definition in Classification lets users define all the characteristics of a form and how to identify (classify) it, and provides key methods that control how PSIcapture behaves when a classification occurs. In the Classification Module, users can Add, Edit, Copy and Delete form definitions, as well as change their order. Users can also import a set of Classification Form Definitions, also known as a taxonomy, through the Import option.

Classification Table Display

  • Form ID - This is the name of the form. 
  • Record Type - A record type could be something like an invoice, quote, purchase order, etc. This is another way to separate your forms.
  • Group Type - A group type could be something like manufacturing, tax, HR, construction, etc. This allows the user to group forms together per industry for instance.
  • Validated - This shows whether a Form has been validated or not.
  • Zone Profile Triggered - This shows which Zone Profile is triggered when the application recognizes a form. The gear icon allows users to edit the Zone Profile from Classification Settings.

    Zone Profile Editing Options - When a user clicks the gear icon, three options are available:

    • Edit the Zone Profile associated with the Form. If multiple profiles are associated with the current Form, the user can choose which profile to edit.
    • Create a new Zone Profile for the Form.
    • Select a different Zone Profile not associated with the Form.

At the bottom of the forms list there is an area where the user can see a few statistics. These statistics tell the user how many total forms there are, how many validated forms there are and what percentage of them are actually validated, how many record types there are, and how many groups of forms there are. Users can also choose whether to show if a form has been validated via a checkbox in the far right column of the forms list.

NOTE: The View Usage button allows users to view global usage of each form. Clicking View Usage opens a window where users can run a query by timeframe.

Adding Form Definitions

Clicking the Add button will open the Form Definition dialog. As mentioned, this provides an interface for defining all the characteristics of a form. Within this configuration interface, users have the standard template toolbar which allows them to load or scan a template image, as well as a set of zooming tools.

Form Settings

  • Form ID – The Form ID is the name of the form these characteristics define. Note: This name will be available as a variable, and be placed in a linked index field.
  • Group – The Group allows users to create subsets of forms and currently is purely for organization within the configuration.
  • Record Type – This dropdown will link to the configured Record Types on step 3 of the configuration wizard, and allows the linking of the Form Definition to the chosen Record.
  • Description – Allows a user defined description of the form.
  • Page Count – For forms of specified page lengths, this count will be utilized in page validation.
  • Usage Ranking Behavior - This option allows users to keep the form's current usage-ranked position or to override the usage ranking settings so that the selected form is processed at the beginning or end of the queue.
    • Use Ranked position
    • Override Ranking and process Form at the beginning of the Form list
    • Override Ranking and process Form at the end of the Form list

Rules

Classification Rules

The Classification Rules section of the module provides the ability to input one or more rules that will define the form. Below are the options:

  • Match – Users can choose a positive or negative match for the rule, and combine them to build a series of rules that will define the form. For instance, users may have a form that has “Form OFS 2” on the top, but there are two versions, with different locations for the required data. One form has “Version 2” on the bottom, one does not. Users can use a negative rule to make sure the form without Version 2 is properly identified.
  • Rule Type – Currently there are two types of rules, OCR Text and Barcode.
  • Search Region - This allows the user to select where on the page the OCR text is searched for.
  • Index Value - This allows the user to select which index field to set the value of using the classification rule.
  • Rule Value – The Rule Value provides an entry point for a regular expression to match either the barcode value or an OCR expression. This will trigger the classification and setting of Record Type.
  • Rule Match Behavior – If users have multiple rules, this drop down will provide a means to logically combine them to define the overall match. Users can either choose to match on the first rule matched, or make the combination of all the rules required.

    Note: The order of rules can be used to the user's advantage as rules are processed in the order of entry.
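
The way positive and negative rules combine can be sketched in a few lines of Python. This is only an illustration of the logic described above, not PSIcapture code: the Rule class, the function names, and the require_all flag (standing in for Rule Match Behavior) are hypothetical, and the rule values are taken from the "Form OFS 2" example.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str           # regular expression entered as the Rule Value
    positive: bool = True  # True: text must match, False: text must not match


def rule_passes(rule: Rule, text: str) -> bool:
    found = re.search(rule.pattern, text) is not None
    return found if rule.positive else not found


def form_matches(rules: list[Rule], text: str, require_all: bool = True) -> bool:
    # require_all=True  -> the combination of all rules is required
    # require_all=False -> a single passing rule is enough
    results = (rule_passes(rule, text) for rule in rules)
    return all(results) if require_all else any(results)


# Two versions of the same form, told apart by a negative rule.
ofs2_original = [Rule(r"Form OFS 2"), Rule(r"Version 2", positive=False)]
ofs2_version2 = [Rule(r"Form OFS 2"), Rule(r"Version 2")]

page_text = "Form OFS 2\nApplicant details\nVersion 2"
print(form_matches(ofs2_original, page_text))  # False
print(form_matches(ofs2_version2, page_text))  # True
```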

Tool Icons

  • OCR Text - When clicked, a pop-up window lets the user choose the text that will be used to identify the Form.
  • Barcode - When clicked, the Barcode Recognition window opens, allowing the user to choose the barcode that will be used to identify the Form.
  - When clicked, the application verifies that it recognizes the defined text or barcode.
  - When clicked, the Edit Regex window opens, allowing the user to edit the regex for the rule.
  - Deletes the rule.

Last Page Classification Rules

If Last Page Rule processing is enabled and a Form Definition contains Last Page Rules, then when that Form is classified, all other Page Validation and classification is disabled and classification searches only for a matching last page for that form. Once it is found, all pages up to that page are added to that Form and classification switches back to normal processing, looking for matches for all defined forms. The special case where the first page of a Form is also a last page is handled as well.

If a Form Definition does not contain Last Page Rules, then the selected option under Page Validation will be used (Loose, Strict, None). This allows users to mix both types of validation in case they aren't able to use Last Page Rules for all of their forms.
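
A minimal sketch of this behavior, assuming each page is available as OCR text and the form has one first-page pattern and one last-page pattern. The function name and the sample patterns are hypothetical and only illustrate the control flow, not the product's implementation.

```python
import re


def group_with_last_page_rule(pages, first_pattern, last_pattern):
    """Return one list of page indexes per document found."""
    documents, current = [], None
    for index, text in enumerate(pages):
        if current is None:
            if not re.search(first_pattern, text):
                continue               # not this form; outside the sketch
            current = [index]          # form classified: start collecting
        else:
            current.append(index)      # no other classification runs here
        if re.search(last_pattern, text):
            documents.append(current)  # also covers a one-page form
            current = None             # back to normal classification
    if current is not None:
        documents.append(current)      # batch ended before a last page
    return documents


pages = ["Invoice 1", "line items", "Total due", "cover sheet",
         "Invoice 2 Total due"]
print(group_with_last_page_rule(pages, r"Invoice", r"Total due"))
# [[0, 1, 2], [4]]
```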

Table Extraction-Line Items

Form Qualifiers

This allows classification based on the page orientation or the size of the form. This can be useful as an additional criterion for defining a form, or can be used by itself, with no rules, to define a form. For example, when scanning checks and check stubs, users can assign a record type of Check when certain page size criteria are met.
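
The check example can be sketched as a simple size and orientation test. The function names, the parameters, and the dimensions in inches are illustrative assumptions, not product settings.

```python
def page_orientation(width, height):
    return "landscape" if width > height else "portrait"


def matches_qualifier(width, height, max_width=None, max_height=None,
                      orientation=None):
    """Return True when the page satisfies every qualifier that is set."""
    if max_width is not None and width > max_width:
        return False
    if max_height is not None and height > max_height:
        return False
    if orientation is not None and page_orientation(width, height) != orientation:
        return False
    return True


# Hypothetical "Check" qualifier: small landscape pages only.
check_qualifier = dict(max_width=9.0, max_height=4.0, orientation="landscape")

print(matches_qualifier(6.0, 2.75, **check_qualifier))   # True
print(matches_qualifier(8.5, 11.0, **check_qualifier))   # False
```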

Import

Clicking the Import button in the Classification Module settings displays a dialog that lets users choose which type of import to perform:

Database Import

How to import via database:
  1. Set up the Database Connection. This uses standard dialogs used throughout the product.
  2. Import Definition
    1. Form ID is required
    2. Form ID, Description and Rules all use the standard Build Custom Value dialog to build those values from different database fields/constants.
    3. The other fields are all optional including Rules. Setting up Rules during this step applies them universally across all imported forms. By making Rules optional, it allows the user to come back later and add rules to individual forms.
    4. When defining Rules, users can either use the values from the table as is or run the values through the Regex Builder to generate the necessary expressions. This behavior is controlled for each rule separately using the “Convert to Regular Expression” option. The global Regex Options can be accessed using the Regular Expression Options button.
  3. Import Options
    1. Duplicate Form ID Behavior – Users can either skip creation of a form if a duplicate is found or add the rules to an existing form.
    2. “Mark Imported Classification Form Definitions as Not Validated….” – If selected, this option will import the form as Not Validated. If the corresponding option on the Classification Definition settings is selected (see below), documents that match these Non Validated Forms will be treated as Exceptions to be processed on the Classification Validation dialog. To validate the Form, the user will open the Form in the ACE dialog. When they save out of ACE, the form will be validated for that document, any others in the batch of that type of Form and all future documents classified as that Form type.
    3. "Do not create Classification Form Definitions that have no rules" - If selected, a form for which no rules were generated will not be created. The system warns the user and lists the form definitions that were not created.
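
The Import Options above can be sketched as follows. This is a simplified model, assuming forms are held as a mapping from Form ID to a list of rule values; the function name, the parameters, and the sample data are hypothetical.

```python
def import_forms(existing, rows, on_duplicate="skip", skip_without_rules=True):
    """existing: dict of Form ID -> list of rules (updated in place).
    rows: iterable of (form_id, rules) built from the database.
    Returns the Form IDs that were not created because they had no rules."""
    not_created = []
    for form_id, rules in rows:
        if form_id in existing:
            if on_duplicate == "add_rules":
                for rule in rules:
                    if rule not in existing[form_id]:
                        existing[form_id].append(rule)
            continue                      # "skip": leave the form untouched
        if skip_without_rules and not rules:
            not_created.append(form_id)   # reported back to the user
            continue
        existing[form_id] = list(rules)
    return not_created


forms = {"INV-ACME": ["ACME Corp"]}
rows = [("INV-ACME", ["ACME Corporation"]), ("PO-GLOBEX", ["Globex"]), ("W9", [])]
print(import_forms(forms, rows, on_duplicate="add_rules"))  # ['W9']
```
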
Sample Database Import

Custom Text File Import

All users need to do is Browse to the location of the text file and click the Import button.

XML Import

This allows users to select an XML file that they have exported previously from the Form Definitions export option. NOTE: In versions 6.0.2.x and below this import option is only available in the Classification Configuration settings of the main configuration.

Export

This allows users to export an XML file from Classification Workflow Settings.

Classification Settings - General

Form Processing Setup

This option allows forms to be processed either in the order in which they are defined (default) or by group. When By Group is selected, the forms are separated into their groups and each group is processed in the order the groups are defined. The By Usage Ranking option processes the forms that are matched most often first, allowing for faster processing.

Form Processing Order

  • Defined Order
  • By Group
  • By Usage Ranking
  • By Usage Ranking and by Groups

Calculate Usage Ranking using (only available when usage ranking is selected)

  • All Usage
  • Usage from the last 3 months
  • Usage from the last 6 months
  • Usage from the last 12 months
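
As a rough illustration of usage ranking, the sketch below orders Form IDs by how often they were matched within a time window. The usage log structure and the function name are assumptions, and a month is approximated as 30 days.

```python
from collections import Counter
from datetime import date, timedelta


def rank_forms(usage_log, today, months=None):
    """usage_log: iterable of (form_id, date matched).
    months=None corresponds to All Usage."""
    cutoff = None if months is None else today - timedelta(days=30 * months)
    counts = Counter(
        form_id
        for form_id, used_on in usage_log
        if cutoff is None or used_on >= cutoff
    )
    return [form_id for form_id, _ in counts.most_common()]


log = [
    ("Invoice", date(2024, 6, 1)), ("Invoice", date(2024, 5, 1)),
    ("Quote", date(2024, 6, 2)), ("Quote", date(2022, 1, 1)),
    ("Quote", date(2022, 1, 2)), ("Quote", date(2022, 1, 3)),
]
today = date(2024, 6, 30)
print(rank_forms(log, today))            # ['Quote', 'Invoice']
print(rank_forms(log, today, months=3))  # ['Invoice', 'Quote']
```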

Page Processing

Run Classification on:

  • All Pages of document (Default)
  • First Page of Document Only
  • Custom Page List of Document

Enter Pages to Classify

Allows users to enter a list of pages that classification should be run on. NOTE: Available if Custom Page List of Document is selected.
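
The exact syntax accepted by this field is not described here. Assuming a comma-separated list with optional ranges (for example "1, 3, 5-7"), a page list could be interpreted along these lines; the function name is hypothetical.

```python
def parse_page_list(spec):
    """Turn a string such as "1, 3, 5-7" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(value) for value in part.split("-", 1))
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_list("1, 3, 5-7"))  # [1, 3, 5, 6, 7]
```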

Group Processing Options

This allows the ability to filter which Classification forms are run on a Capture Profile by selecting from a list of form groups.

Indexing Options

  • Index field to populate Form ID - This option allows users to choose to assign the Form ID to an index field or populate the Page Information dialog with Classification OCR Text for troubleshooting.
  • Index field to populate with Group Name - This option allows users to assign the Group Name to an index field.
  • Index field to populate with Record Type - This option allows users to assign the Record Type to an index field.
  • Index field to populate with Description - This option allows users to assign the Description to an index field.

OCR Text Classification Settings

The Classification module works by extracting a specified amount of header and footer text from the processed page, and then searching for match terms. The module allows users to adjust these settings to take in more or less text, depending on the structure of the forms they are processing.

Users can define the text area either in lines or as an area, and then set how much of either to consume. NOTE: The more text users consume, the more time the engine takes to process it. This can become a performance issue when processing large areas or a large number of lines.
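
The line-based setting can be pictured as keeping only the first and last few OCR lines of a page before searching for match terms. The sketch below assumes the page text is already split into lines; the function name and default values are hypothetical.

```python
def classification_text(page_lines, header_lines=10, footer_lines=5):
    """Return the header and footer lines that would be searched."""
    if header_lines + footer_lines >= len(page_lines):
        return list(page_lines)        # short page: everything is searched
    footer = page_lines[-footer_lines:] if footer_lines else []
    return page_lines[:header_lines] + footer


lines = [f"line {number}" for number in range(1, 21)]
print(classification_text(lines, header_lines=2, footer_lines=1))
# ['line 1', 'line 2', 'line 20']
```

Fewer lines mean less text to search, which is the performance trade-off noted above.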

OCR Text Viewing Options

These options allow the user to view the OCR text in either the Page Information dialog or the Classification Validation dialog.

Classification - Page Validation

Page Validation Options

The type of forms that users are processing will determine the type of Page Validation to choose. Page validation in the Classification Module determines how pages will be combined and validated during the classification process. Users have several choices in how pages are treated once a Form Type is identified/matched to a classification rule. In validation methods that require page count, counts are referenced from the Form ID Definition. Below are the types of validation and an explanation of the behavior:

  • Loose Page Count Validation – In this validation method, once a Form is identified, subsequent pages are added to the form until the page count is reached or until another Form is identified. This method can be used with fixed page count forms as well as variable-length documents, such as invoices, in mixed batches.
  • Strict Page Count Validation – In Strict mode, the product will count form pages, and if they do not equal the page count defined in the Form Definition, an exception will occur.
  • No Page Count Validation – In this method, page counts are totally ignored, and the combining of pages can occur based on one of the chosen options:
    • Combine non-classified document with previously classified document
    • Combine classified document with previously classified document if same Form ID
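
The Loose and Strict page count behaviors can be sketched as below. The input is a list holding, for each page, the Form ID matched on that page or None. How pages beyond the page count are handled when no new Form is identified is an assumption of this sketch (they are left unclassified), and all names are hypothetical.

```python
def group_loose(page_form_ids, page_counts):
    """Return a list of (form_id, [page indexes]) documents."""
    documents, current = [], None
    for index, form_id in enumerate(page_form_ids):
        if form_id is not None:
            current = (form_id, [index])   # an identified Form starts a document
            documents.append(current)
            continue
        if current is not None:
            limit = page_counts.get(current[0])
            if limit is None or len(current[1]) < limit:
                current[1].append(index)   # still within the page count
                continue
            current = None                 # page count reached
        documents.append((None, [index]))  # assumption: left unclassified
    return documents


def strict_exceptions(documents, page_counts):
    """Form IDs whose page total differs from the Form Definition."""
    return [form_id for form_id, pages in documents
            if form_id in page_counts and len(pages) != page_counts[form_id]]


batch = ["W9", None, None, "Invoice", None, None, None]
docs = group_loose(batch, {"W9": 2})
print(docs)
# [('W9', [0, 1]), (None, [2]), ('Invoice', [3, 4, 5, 6])]
```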

Last Page Rule validation options

  • When a Form is matched that has Last Page Rules defined...

This option enables/disables Last Page Rule processing. 

If Last Page Rule processing is enabled and a Form Definition contains Last Page Rules, then when that Form is classified, all other Page Validation and classification is disabled and classification searches only for a matching last page for that form. Once it is found, all pages up to that page are added to that Form and classification switches back to normal processing, looking for matches for all defined forms. The special case where the first page of a Form is also a last page is handled as well.

If a Form Definition does not contain Last Page Rules, then the selected option under Page Validation will be used (described in the above section). This allows users to mix both types of validation in case they aren't able to use Last Page Rules for all of their forms.

  • Classification should fail on a Document if...

When checked this will validate that the page count on Documents that have run through Classification restructuring matches the page count defined on the form that Document was classified as. If the page counts don't match, the Document will fail classification and an alert with the appropriate error message will be attached to that document. Since it fails classification, the Classification Validation dialog will be displayed with that document marked as failed.

  • Start new Document if classification of a new form...

When this box is checked a new document will be created if a new Form ID is triggered before the Last Page Rule is processed.

  • Only start a new Document if Form ID...

When checked, a new Document is started ONLY if the newly matched Form ID is different from the Form ID of the Document currently being processed.

Classification - Exceptions Processing

If there are any exceptions during the validation process, users can either interactively fix them in the classification module, or auto-reject to an exceptions batch based on the selection in the Exceptions Processing section below:

Example: Exceptions that occurred during the classification process. This will be covered later in Exceptions Processing.

We have moved Classification Exceptions Processing to its own tab and added a new feature called Accelerated Classification Engine (ACE). As of PSIcapture 7.4, we have added additional automation options, including: Form ID auto-naming, Automated Rule creation, and Automated Last Page Rule creation. Below is a description of the ACE and what options are available.

Accelerated Classification Engine Options

  • Enable Accelerated Classification Engine (Checkbox) – To enable the Accelerated Classification Engine (ACE) select this check box. ACE can only be enabled if Exceptions Processing is enabled.
  • General (Tab)
    • Display Mode Options (Section)
      • Default Display Mode (Selector) – The ACE configuration dialog has 2 modes: Standard and Advanced. This option lets you specify the default mode that the ACE dialog should open in.
        • Allow user to switch Display Modes (Checkbox) – This option controls whether the user will be allowed to switch between Standard and Advanced modes. If this is unchecked, then the ACE dialog will be opened in the mode specified by the Default Display Mode.
        • Advanced (Button)
          • Accelerated Classification Engine Display Mode Options - The Display Mode Options dialog allows the user to customize which settings of the Accelerated Classification Engine are shown in each display mode. One of the following display options can be set for each customizable setting in the list:
            • Advanced Only – setting will only be displayed when in Advanced mode.
            • Both – setting will show in both Standard and Advanced mode.
            • None – setting will not show in either Standard or Advanced mode.
    • Zone Definition Profile Options (Section)

      • Default Zone Definition Profile Selection (Selector) – This option controls what the Zone Definition Profile Option is set to by default when opening the ACE dialog. Choose between creating a new Zone Definition Profile and using an existing one.

      • Zone Definition Profile to preload copy of (Selector) – If a Zone Definition Profile is selected, it will be copied and loaded automatically as the starting point for the new Zone Definition Profile when opening the ACE dialog.

      • When editing an existing Classification Form... (Checkbox) - When enabled and a user opens ACE on the Classification Validation dialog for an existing form, the application will attempt to load the Zone Profile triggered by the current Classification Form. If one is not found, it then looks for a Zone Profile that triggers off Record Type with the Record Type of the Form being edited. If a Zone Profile is found, it will be loaded into ACE and the user can modify it.

    • New Classification Form save location (Section) – Controls where new Classification Forms defined in ACE will be saved. User options are: 

      • (Default) Automatically save the new Classification Form based on how the current Classification Forms are defined on the Capture Profile.
        • Note: This behavior can be overridden by selecting one of the other three options. No matter what option is selected for this setting, if the "Process Classification using all global Classification Form Definitions that are defined when Classification runs" option is enabled, the new Form will always be saved to the Global Forms collection.
      • Always prompt the user to select where to save the new Classification Form.
      • Always save the new Classification Form to the Global Forms collection.
      • Always save the new Classification Form to the Capture Profile.
    • Other Options (Section)
      • Before running Classification processing on a Batch, update its copy of the Capture Profile with the Classification Form Definitions, Zone Profiles and Record Types from the Capture Profile in Configuration (Checkbox) – If this option is enabled, Classification Form Definitions, Zone Profiles and Record Types will be updated on the Batch based on the current settings on the Capture Profile in Configuration. This will ensure that the Batch has the latest versions of the settings that can be defined by the Accelerated Classification Engine.
      • Treat documents that are classified as Non-Validated Forms as Exceptions to be processed through the Accelerated Classification dialog. ... (Checkbox) – If this option is enabled, Forms that match a Non-Validated form during Classification will be marked as an Exception and displayed on the Classification Validation dialog with an icon indicating that it is not validated.
  • Form ID Auto-Naming (Tab)
  • Form ID auto naming methods (Section)
    • Point and click entry (Selector) - This setting lets users control how a Form ID is populated using point and click. The Form ID is populated with the raw value that is clicked on, not with the generated Regex that the rule is populated with.
      • Do not populate Form ID using point and click.
      • Populate Form ID if it is blank.
      • Populate Form ID if it is blank and create a new Rule.
    • Automated Naming (Selector) - This selector provides an option to enforce formatting guidelines on point and click naming of forms when enabled.
      • Automated Form ID Configuration (Dialog)
        • Automated Form IDs (Section) - This section allows you to configure how your automated naming behaves using Regular Expression (RegEx) values. This section works in conjunction with the section below, as the parent in a parent-child relationship.
          • Functions: You may add or delete form ID behaviors, as well as move rules up and down in priority.
          • Description: A text field for you to describe your automated Form ID.
          • Preview: This field displays values built from the panel below.
        • Sections for the selected Automated Form ID (Section) - This allows you to build out your automated Form ID generator, described in sections. For instance, you may opt to generate a Form ID with a combination of date, a value matched on page, constant value, and an index field. This section works in conjunction with the section above, as a child in a parent-child relationship.
          • Selection Type (Selector) - choose between Matching OCR Value, Create Date, Constant, and Index Field.
            • Matching OCR Value - matches the OCR value specified.
              • Match Expression - enter the value to be matched using a Regular Expression (RegEx) value.
              • Process on (Selector) - choose whether to process on all pages of a document or first page of a document only.
              • Output format - select or enter the desired format for your output.
              • Required (checkbox) - choose whether this section is required or optional.
            • Create Date - allows you to specify the date as a valid part of the Form ID.
            • Constant Value - allows you to specify a constant value as part of the Form ID.
            • Index Field - allows you to specify an index field as part of the Form ID.
              • Index Field (Selector) - Select the index field to use as part of the Form ID.
              • Output Format - Determine the output format for the Form ID.
        • Options (Section)
          • Form ID Selection (Selector) - allows you to determine how the Form ID generation behaves for end-users.
            • Use first valid Form ID but allow user to modify
            • Use first valid Form ID and don't allow user to modify
            • Allow user to choose from all valid Form IDs
          • Exceptions Processing (Section) - determine how this function should operate when no valid expressions are found within your documents.
            • If no valid Form IDs are generated, warn the user but still allow Form to be configured.
            • If no valid Form IDs are generated, do not warn the user and allow Form to be configured.
            • If no valid Form IDs are generated, mark the Document for rejection to an Exception Batch.

  • Formatting Options (Section)
    • Auto Casing (Selector) - presents several options to apply automatic casing to captured data.
      • Do not apply auto casing
      • All characters to uppercase
      • All characters to lowercase
      • Title case - first letter of each word capitalized
      • Sentence case - first letter of each sentence capitalized, rest lowercase.
    • Character Filtering Options (Section)
      • Character Filter (Selector) - presents options to apply character filtering
        • All Characters
        • Alpha Only (a-z, A-Z)
        • Numeric Only (0-9)
        • Numeric Extended (0-9, $%#+-)
        • Date (0-9, /-)
        • Extended Characters Only
        • Standard Printable Characters
      • Enable Extended Characters (Checkbox) - available once an option other than All Characters is selected
        • Enter Characters: Choose additional characters to adjust
        • Invalid character action (Selector)
          • Do Not Correct
          • Remove
          • Auto Correct
          • Replace with Marker
    • Words to Remove (Section) - allows you to enter entire words to remove, expressed as Regular Expression (RegEx) values. Case may optionally be matched.
  • Automated Rule creation (Tab)
  • Automated Rules (Section) - allows you to automatically generate classification rules based on matching values within your documents.
    • Automated Rules configuration
      • Functions - Add, Edit, and Delete automated rules, as well as move rules up or down in priority.
      • Description - Name your Automated Rule
      • Match Expression - Provide a matching expression in the form of a Regular Expression (RegEx) value.
      • Process on (Selector) - allows you to determine whether to search for the expression on all pages, first page of document only, or last page of document only.
    • Options (Section)
      • Rule Creation (Selector)
        • Generate a rule only for the first expression successfully matched
        • Generate rules for all the expressions successfully matched
    • Exceptions Processing (Section) - determine how this function should operate when no valid expressions are found within your documents.
      • If no valid rules are generated, warn the user but still allow Form to be configured
      • If no valid rules are generated, do not warn the user and allow Form to be configured
      • If no valid rules are generated, mark the document for rejection to an exceptions batch
  • Automated Last Page Rule creation (Tab)
  • Automated Last Page Rules (Section) - allows you to automatically generate last page classification rules based on matching values within your documents.
    • Automated Last Page Rules configuration
      • Functions - Add, Edit, and Delete automated last page rules, as well as move last page rules up or down in priority.
      • Description - Name your Automated Last Page Rule
      • Match Expression - Provide a matching expression in the form of a Regular Expression (RegEx) value.
      • Process on (Selector) - allows you to determine whether to search for the expression on all pages, first page of document only, or last page of document only.
  • Options (Section)
    • Rule Creation (Selector)
      • Generate a last page rule only for the first expression successfully matched
      • Generate last page rules for all the expressions successfully matched
  • Exceptions Processing (Section) - determine how this function should operate when no valid expressions are found within your documents.
    • If no valid rules are generated, warn the user but still allow Form to be configured
    • If no valid rules are generated, do not warn the user and allow Form to be configured
    • If no valid rules are generated, mark the document for rejection to an exceptions batch
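
To illustrate how the Form ID Auto-Naming sections described above might combine into one Form ID, here is a small sketch. The tuple layout, the section kinds, and joining the parts without a separator are assumptions for illustration only; the output formats use Python's strftime notation rather than the product's own format strings.

```python
import re
from datetime import date


def build_form_id(sections, ocr_text, index_fields, created):
    """sections: list of (kind, value, required), evaluated in order.
    Returns the Form ID, or None when a required section yields no value."""
    parts = []
    for kind, value, required in sections:
        if kind == "ocr":                  # Matching OCR Value
            match = re.search(value, ocr_text)
            part = match.group(0) if match else None
        elif kind == "date":               # Create Date
            part = created.strftime(value)
        elif kind == "constant":           # Constant Value
            part = value
        elif kind == "index":              # Index Field
            part = index_fields.get(value)
        else:
            raise ValueError(f"unknown section kind: {kind}")
        if part:
            parts.append(part)
        elif required:
            return None                    # no valid Form ID generated
    return "".join(parts)


sections = [("constant", "INV-", True), ("ocr", r"ACME\w*", True),
            ("date", "-%Y", False)]
print(build_form_id(sections, "ACME Corp Invoice", {}, date(2024, 3, 1)))
# INV-ACME-2024
```

A result of None corresponds to the case handled by the Exceptions Processing options, where no valid Form ID is generated.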


Data Extraction and Classification

Once a document is classified, and a Record Type assigned, custom data extraction rules can be applied for that particular type of document. Through the use of shared and unique fields tied to Record Types, all the different methods of data population are available. There are several key features that leverage Record Type focused extraction:

  • Dynamic Regular Expressions – Advanced Data Extraction (ADE) now allows specific regular expressions to be configured based on the Record Type.
  • Zone Profiles – Allow zone OCR-based templates that are linked to specific Record Types.
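
As an illustration of Record Type focused extraction, the sketch below selects a set of regular expressions based on the Record Type assigned during classification. The record types, field names, and patterns are hypothetical examples and do not reflect the ADE configuration format.

```python
import re

# Hypothetical per-Record-Type patterns; group 1 holds the extracted value.
EXTRACTION_RULES = {
    "Invoice": {"invoice_number": r"Invoice\s*(?:No\.?|#)\s*(\w+)"},
    "Purchase Order": {"po_number": r"PO\s*#?\s*(\d+)"},
}


def extract_fields(record_type, text):
    """Apply only the expressions configured for the given Record Type."""
    fields = {}
    for name, pattern in EXTRACTION_RULES.get(record_type, {}).items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1)
    return fields


print(extract_fields("Invoice", "Invoice No. 10045 dated 03/01/2024"))
# {'invoice_number': '10045'}
```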