OCR Text Scanner Pro : Convert An Image To Text Pro V1.5.4 [Patched] [Latest]
January 24, 2022: The SentiSight.ai image labeling and recognition platform has been updated. The update improves the performance of the OCR text recognition model, and the cloud platform now lets users download the pre-trained Places Classification and OCR models. There are also multiple improvements to the labeling tool and project management, including training multiple models simultaneously.
February 8, 2021: The SentiSight.ai image labeling and recognition platform can now perform text recognition (also known as OCR) on uploaded images. Text recognition is available as a pre-trained model that supports 75 languages.
Symphony OCR is part of Symphony Suite, The Complete Imaging Solution. Symphony OCR is a back-end OCR engine. It will locate all image-only PDF and TIF files in your document management system and convert them to fully text searchable PDFs by adding an invisible layer of text over the image. Symphony OCR typically runs on a back-end PC or server (for Worldox sites, this is typically the Indexer PC).
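As a rough illustration of the first step described above (locating candidate files), here is a minimal Python sketch that walks a folder tree and collects PDF files, adding TIF files only when that option is switched on. The function name `find_candidates` and the `include_tif` flag are hypothetical. A real DMS integration would presumably query the DMS rather than walk the file system, and deciding whether a PDF is actually image-only requires page analysis that this sketch does not attempt.

```python
import os
from pathlib import Path


def find_candidates(root: str, include_tif: bool = False) -> list[Path]:
    """Return files under root that a background OCR pass would inspect.

    Hypothetical sketch: it filters by extension only. Whether a PDF is
    image-only still has to be decided later by analyzing its pages.
    """
    allowed = {".pdf"}
    if include_tif:
        # TIF handling is optional (assumed to be off unless enabled).
        allowed |= {".tif", ".tiff"}
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in allowed:
                found.append(path)
    return sorted(found)
```

Filtering by extension first keeps the expensive step (opening and analyzing each PDF) limited to files that could possibly need OCR.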
Grayscale and color images are larger (in bits) than black-and-white images, so a 5-page document scanned to PDF in color or grayscale contains more bits than the same 5-page document scanned to PDF in black and white. Symphony OCR, however, applies the same text layer to both documents, and that text adds the same number of bits to each of the two scans. As a result, the percentage by which Symphony OCR increases file size actually goes DOWN as scan quality goes up.
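The arithmetic behind this claim can be checked with a small Python sketch. The byte counts are made-up, illustrative values, not measurements of Symphony OCR output; the only assumption carried over from the text is that the text layer is the same size for both scans.

```python
def overhead_percent(scan_bytes: int, text_layer_bytes: int) -> float:
    """Size added by the text layer, as a percentage of the original scan."""
    return 100.0 * text_layer_bytes / scan_bytes


# Illustrative, made-up sizes for the same 5-page document.
TEXT_LAYER_BYTES = 50_000     # assumed identical for both scans
BW_SCAN_BYTES = 500_000       # black and white
COLOR_SCAN_BYTES = 5_000_000  # color or grayscale

bw = overhead_percent(BW_SCAN_BYTES, TEXT_LAYER_BYTES)
color = overhead_percent(COLOR_SCAN_BYTES, TEXT_LAYER_BYTES)
print(f"black and white: +{bw:.1f}%")    # black and white: +10.0%
print(f"color/grayscale: +{color:.1f}%")  # color/grayscale: +1.0%
```

The absolute growth is 50,000 bytes in both cases; only the denominator changes, which is why the relative overhead shrinks as the scan gets larger.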
Optical Character Recognition (OCR) converts images into characters for text searching. Because OCR is time- and resource-intensive, performing OCR during scanning significantly reduces your efficiency. Symphony OCR performs the OCR task in a background process, allowing you to turn OCR off during scanning. This procedure covers how to disable OCR when scanning with Adobe Acrobat.
For the most part, documents that contain neither an image nor text don't need to be processed. One example might be your company's PDF letterhead template. By default, these PDF documents are not processed and are placed in the "No Image or Text" Document List. How can a PDF have no image and no text?
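A minimal sketch of this routing decision follows, assuming a hypothetical per-page analysis that reports whether each page carries extractable text and whether it carries a raster image. The `PageSummary` type and every bucket name other than "No Image or Text" are illustrative, not Symphony OCR's data model. One way a PDF can end up with neither is when its pages contain only vector drawing (lines, shapes, filled paths), as a letterhead template might.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class PageSummary:
    """Hypothetical result of analyzing one PDF page."""
    has_text: bool   # page already carries extractable text
    has_image: bool  # page carries at least one raster image


class Bucket(Enum):
    NO_IMAGE_OR_TEXT = "No Image or Text"      # list name taken from the text
    NEEDS_OCR = "Needs OCR"                    # hypothetical
    ALREADY_SEARCHABLE = "Already searchable"  # hypothetical


def classify(pages: list[PageSummary]) -> Bucket:
    # Nothing to recognize and nothing to search: skipped by default.
    if not any(p.has_text or p.has_image for p in pages):
        return Bucket.NO_IMAGE_OR_TEXT
    # Any image page that lacks text is a candidate for OCR.
    if any(p.has_image and not p.has_text for p in pages):
        return Bucket.NEEDS_OCR
    return Bucket.ALREADY_SEARCHABLE


letterhead = [PageSummary(has_text=False, has_image=False)]
print(classify(letterhead).value)  # No Image or Text
```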
By default, Symphony OCR does not process .tif files. Because an invisible text layer cannot be added to a .tif file, Symphony OCR has to convert .tif files to PDF in order to process them, so this behavior is disabled by default and firms can choose to enable it.
The Processor handles a .tif file by converting it to a .pdf file. Why? The .tif format does not allow the invisible text layer to be added, so the file must be converted to PDF when it is processed.
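The decision described in these two paragraphs can be sketched as follows. The function name `plan_action`, the `process_tif` flag, the action labels, and the choice to write the converted file next to the original with a `.pdf` extension are all assumptions for illustration; the text does not say how Symphony OCR names converted files.

```python
from pathlib import Path

TIF_SUFFIXES = {".tif", ".tiff"}


def plan_action(path: str, process_tif: bool = False) -> tuple[str, Path]:
    """Return (action, output path) for one file. Hypothetical sketch."""
    source = Path(path)
    suffix = source.suffix.lower()
    if suffix == ".pdf":
        # A PDF can carry the invisible text layer directly.
        return ("add_text_layer", source)
    if suffix in TIF_SUFFIXES and process_tif:
        # A TIF cannot hold the text layer, so it must become a PDF first.
        # Assumed naming: same folder and base name, .pdf extension.
        return ("convert_to_pdf", source.with_suffix(".pdf"))
    # TIF processing is off by default; other formats are out of scope.
    return ("skip", source)
```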
Symphony OCR converts image-only PDF files into text-searchable PDF files. Once the PDF file contains text, users can immediately search *within* the PDF. However, the Worldox text indexer must update before the text contents of the PDF become available for text-in-file searching (i.e. searching *for* the document).
6.6.92
- Bug fix - NetDocuments made an API change on 4/20/2018 that caused our "Open" links to not take the user to the document in ND

6.6.90
- Bug fix - SharePoint sites that had spaces in their name were resulting in "Illegal character in path" Finder errors

6.6.89
- Added ability to reprocess documents in the Processing (TOPROCESS) list (in case they need to be re-analyzed manually)

6.6.88
- Statistics screen now displays the estimated time to process the backlog based on 4 cores (1.2 seconds/page) if there is no OCR performance data to baseline against (i.e. analysis-only licenses). In this case, the label on the estimate shows "(assuming 4 CPU cores)".

6.6.87
- Bug fix - Fixed OutOfMemoryError under high analysis or OCR load

6.6.85
- Bug fix (kinda) - Legacy files in Worldox with at the beginning were appearing in the corrupted list. Worldox indexed searches were sometimes returning files with at the beginning (these are generally temp files that shouldn't have been part of the indexes and certainly shouldn't be OCRed)

6.6.84
- Added accessibility message for Worldox integration informing the user that WD versions prior to 20180412 do not support paths longer than 255 characters
- Added accessibility message for Worldox integration for sites with WD versions later than 20180412 that WD does not support paths longer than 380 characters

6.6.82
- Added conditional logic to Worldox integration to allow spaces after the filename and before the file extension (WD code running after 20170601 allows spaces)

6.6.81
- Improved detection of pages that should be OCRed even though they have excessive text in margins
- Improved detection of pages that should not be OCRed even if they have a full image on the page and rendered text beneath the image
- Added additional columns to the Document Details screen (page analysis results)

6.6.76
- Bug fix - setting for limiting the maximum number of cores used during OCR was not being honored
- Changed Processor configuration UI for clarity around maximum allowed parallel processing settings

6.6.73
- Bug fix - SOCR doesn't shut down at NetDocuments sites that were actively searching for documents

6.6.71
- Bug fix - NetDocuments configuration screen not showing error/warning details for Analyzer-only licenses
- Bug fix - NetDocuments integration was having preserveModifiedInfo set to false after setup with Analyzer-only licenses

6.6.69
- Added separate NetDocuments connection buttons for US, EU and AU vaults

6.6.68
- Bug fix - Welcome Wizard didn't work properly if there were features in the license that required DMS configuration (# of ND or SP users, for example)
- Removed the 'Manage' button from the Issues list in the Welcome Wizard
- Improved error message for invalid tenant URLs

6.6.64
- Bug fix - blank pages without any content stream were marked as corrupted instead of blank

6.6.59
- Bug fix - the user interface prevented setting the SharePoint legacy search frequency to 0. This is now allowed.

6.6.58
- If SharePoint legacy search frequency is set to 0, the legacy search is skipped

6.6.57
- Bug fix - Fixed a bug with Rollback so it works with the first button click (from the document detail page)
- Added the bulk operation "Rollback" to the Processed documents page. When clicked, all documents in the current search are rolled back to their non-OCRed version
- Added the bulk operation "Reanalyze" to the Reprocessing documents page. When clicked, all documents in the current search are moved to the Analyzing bucket

6.6.55
- Reworked the processing metrics on the Analyzer, Processor and Summary pages to show more useful data in a more user-friendly fashion
- Removed all support for Bonus Page tracking

6.6.54
- Moved to jWDAPI 1.0.22 to have the WorldoxSession fast fail if the user is invalid
- Backed out the previous Worldox invalid user fast fail code

6.6.53
- Fixed bug where the wrong page tracker was being used by the Analyzer

6.6.52
- Modified processor core algorithm to not consider physical cores on the machine; the core count is now determined solely by the license

6.6.51
- Fixed bug in the ProcessorManager config that was causing maxThreads to be persisted and thus override the calculated maxThreads
- Updated the version check URL to use https to work with the new version update https redirection

6.6.50
- Tracked WorldoxConnection failure due to invalid user, and fail quickly on repeated calls to the connection until a valid user is provided

6.6.48
- Moved to jlicensing 1.0.10 to support new license expiration warning logic

6.6.46
- Created a ProcessorManager that creates the Processor tasks that do the work. These tasks are added to an executor service so multiple tasks can run in parallel. A ProcessorManager is created for each document processing type (analysis, OCR, rollback)
- Split the processor management (stop, start) config into a new section, and provided migration for it
- Renamed the processors to AnalyzerProcessor, OCRProcessor and RollbackProcessor. Supporting classes followed suit
- Added a WorkingFolderProvider to track and manage working folders for the managers
- Created ProcessorThreadPoolExecutor for use by the ProcessorManager. It has the ability to block task addition until a thread is available
- Refactored the OCRProvider to remove the generic parts, since we only support a single OCR engine
- Refactored the page count handling
- Implemented dripMode in the ProcessorManager
- Deleted OldPageCountFeature
- Made processor factory config classes immutable
- Added more support for Processor status
- Added a maxDripsBeforeHalting setting to ProcessorManager, to allow dripMode to halt after X docs are processed, rather than just 1
- Fixed bug in the Analyzer config screen that wasn't persisting the isAllowMsgAttachments setting
- Added custom message support to the ErrorTracker so we could get better error messages in the UI
- Updated Processor and Analyzer web pages to show a list of documents being processed
- Updated statistic verbiage per changes
- Modified page statistics to divide results by the number of running threads

6.6.45e
- Bug fix - SOCR was checking to make sure 8.3 filename information was available for all versions of Worldox integration. We now only check if the version is prior to the WDU10 release (which fixed 8.3-related issues in WDAPI)

6.6.45d
- Modified calls to NDAPI for creating new versions to ensure we don't modify lastModified info

6.6.45c
- Performance improvement when retrieving the number of active users in NetDocuments integration

6.6.45b
- Added support for unlimited user count licensing for NetDocuments integration

6.6.44
- Bug fix - some malformed PDFs (huge page catalogs, large number of pages) could result in out of memory exceptions
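Version 6.6.46 mentions a `ProcessorThreadPoolExecutor` that can block task addition until a thread is available. The product's own implementation is not shown here; the following is an illustrative Python analogue of that idea, in which a semaphore sized to the pool caps the number of outstanding tasks so `submit()` waits instead of queueing an unbounded backlog. The class name `BlockingExecutor` and the placeholder `ocr_document` function are hypothetical.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class BlockingExecutor:
    """Thread pool whose submit() waits until a worker slot is free.

    Hypothetical analogue of a pool that blocks task addition until a
    thread is available; not Symphony OCR's actual class.
    """

    def __init__(self, max_workers: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # One permit per worker: at most max_workers tasks are ever outstanding.
        self._slots = threading.BoundedSemaphore(max_workers)

    def submit(self, fn, *args, **kwargs) -> Future:
        self._slots.acquire()  # blocks while every worker is busy
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except BaseException:
            self._slots.release()  # give the permit back if submission failed
            raise
        # Return the permit once the task finishes (successfully or not).
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self, wait: bool = True) -> None:
        self._pool.shutdown(wait=wait)

    def __enter__(self) -> "BlockingExecutor":
        return self

    def __exit__(self, *exc_info) -> bool:
        self.shutdown(wait=True)
        return False


def ocr_document(name: str) -> str:
    # Hypothetical placeholder for the real per-document work.
    return f"processed {name}"


if __name__ == "__main__":
    with BlockingExecutor(max_workers=2) as executor:
        futures = [executor.submit(ocr_document, n) for n in ("a.pdf", "b.pdf", "c.pdf")]
        print([f.result() for f in futures])
```

Bounding submissions this way keeps memory use flat when a large backlog is fed in, at the cost of the submitting thread stalling while all workers are busy.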