-
GUI for clean-up where the image and the content extracted from OCR are displayed side-by-side.
-
Character-level synchronization between the extracted data and the image is preserved throughout the process.
-
Provision for clean-up of low-confidence (suspect) characters that are flagged in the common format.
-
Spell-check option with customized dictionary based on both language and content.
-
Interactive identification and correction of special characters that are not defined in the project symbol list.
-
Ability to run project-based validation rules, such as punctuation checks, emphasis (bold, italic, underline) verification, and spacing rules, with character-level image synchronization.
-
Collect metrics on the number of suspect characters reported and corrected for productivity measurement.
-
Collect metrics on the number of error words reported and corrected for productivity measurement.
-
Collate the list of words not available in the dictionary for review and to build a subject-specific dictionary.
-
Track the changes made by the operator to monitor quality and operational efficiency. In addition, operator information is stored with every change, which helps in review and feedback.
-
Collect details on corrected suspect characters for analysis and to train the OCR engine to improve its efficiency.
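
Several of the features above (suspect-character flagging, character-level image synchronization, operator change tracking, and productivity metrics) can be pictured with one small data model. The following is a minimal sketch only, not the tool's actual implementation. The names `OcrChar`, `Correction`, `CleanupSession`, the 0.0 to 1.0 confidence scale, and the 0.80 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class OcrChar:
    """One recognized character plus its location in the page image."""
    text: str
    confidence: float                 # assumed scale: 0.0 (worst) to 1.0 (best)
    box: Tuple[int, int, int, int]    # assumed layout: (x, y, width, height) in pixels


@dataclass
class Correction:
    """Audit record of a single operator edit."""
    index: int
    old_text: str
    new_text: str
    operator: str
    was_suspect: bool


@dataclass
class CleanupSession:
    chars: List[OcrChar]
    suspect_threshold: float = 0.80   # hypothetical cut-off for "low confidence"
    corrections: List[Correction] = field(default_factory=list)
    suspects: Set[int] = field(init=False)

    def __post_init__(self) -> None:
        # Fix the suspect set at load time so later edits do not change
        # the "reported" count used in the metrics.
        self.suspects = {
            i for i, c in enumerate(self.chars)
            if c.confidence < self.suspect_threshold
        }

    def correct(self, index: int, new_text: str, operator: str) -> None:
        """Replace one character and log who changed it."""
        ch = self.chars[index]
        self.corrections.append(
            Correction(index, ch.text, new_text, operator, index in self.suspects)
        )
        # Only the text changes; the bounding box is left alone, so the
        # character stays linked to the same region of the image.
        ch.text = new_text

    def metrics(self) -> Dict[str, int]:
        """Counts of suspects reported and corrected, plus total edits."""
        corrected = {c.index for c in self.corrections if c.was_suspect}
        return {
            "suspects_reported": len(self.suspects),
            "suspects_corrected": len(corrected),
            "total_changes": len(self.corrections),
        }


if __name__ == "__main__":
    session = CleanupSession([
        OcrChar("c", 0.99, (0, 0, 8, 12)),
        OcrChar("0", 0.41, (8, 0, 8, 12)),   # low confidence: flagged as suspect
        OcrChar("t", 0.97, (16, 0, 8, 12)),
    ])
    session.correct(1, "a", operator="op1")
    print("".join(c.text for c in session.chars))
    print(session.metrics())
```

Because each correction is kept as a separate record holding the old text, the new text, and the operator, the same log could in principle serve both review and feedback and the collection of corrected suspect characters for OCR training.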