OCR (Optical Character Recognition) is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data extraction and text mining.
As digital repositories and content management systems evolve, they offer full-text search capabilities that let users retrieve more accurate results than they could by relying on "traditional" metadata alone.
OCR also allows a document to be saved in various word processing formats or as plain text, and its content to be copied and pasted in part or in whole. Furthermore, for a scanned document to be properly archived as a searchable PDF or PDF/A file (ISO 19005-1:2005), an additional hidden layer of text must be placed under the page image.
Realiscape Typorama is one of the leading OCR companies, capable of producing raw (unedited) or professionally edited text output in many languages.
For the Greek language in particular, it has also developed production-quality recognition solutions for old-style (polytonic) Greek text, with accuracy levels that exceed 99.5% (fewer than 5 misrecognised characters per 1,000). This service can also include professional editing, with output in either monotonic or polytonic Greek.
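The accuracy figure quoted above is a character-level ratio, and the arithmetic behind it can be sketched in a few lines of Python. This is only an illustration: the function name `character_accuracy` and its parameters are hypothetical and are not part of any Realiscape Typorama tool, and the way errors are actually counted in production is not described here.

```python
def character_accuracy(total_chars: int, misrecognised: int) -> float:
    """Return character-level accuracy as a percentage.

    Hypothetical helper for illustration only: it assumes errors are
    counted as misrecognised characters out of all characters checked.
    """
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    if not 0 <= misrecognised <= total_chars:
        raise ValueError("misrecognised must be between 0 and total_chars")
    return 100.0 * (total_chars - misrecognised) / total_chars


# 5 misrecognised characters out of 1000 corresponds to 99.5% accuracy.
print(character_accuracy(1000, 5))
```

Under this definition, an accuracy above 99.5% means fewer than 5 errors in every 1,000 characters checked.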
Using distributed systems, output capacity exceeds several thousand pages per day for raw (unedited) text.