The article highlights the lack of OCR systems capable of processing historical Kurdish documents written in Arabic-Persian script. Studies of OCR for other languages, particularly Ottoman, report substantial challenges, with neural network models achieving varying degrees of success. Work on Ottoman documents suggests that systems relying solely on character recognition may be inadequate. The article emphasizes treating words as images for effective recognition and proposes adopting a technique from object classification, the bag-of-visual-terms approach, for word image matching.
To the best of our knowledge, no OCR system can currently extract text accurately from old Kurdish publications written in Arabic-Persian script.
In one study of Ottoman OCR, the authors developed an artificial neural network model using a dataset of 28 different Ottoman machine-printed documents, achieving a recognition accuracy of 95% on known fonts.
Because of the characteristics of Ottoman documents, character recognition-based systems may not produce satisfactory results, which suggests that it is important to store the documents as images.
The bag-of-visual-terms approach was demonstrated to be effective in classifying objects and scenes, which led to its adoption in matching word images.
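To make the word image matching idea concrete, the following is a minimal sketch of a bag-of-visual-terms comparison, not the method used in the article. It rests on several assumptions: the function names (`extract_patches`, `nearest_term`, `bovt_histogram`, `cosine_similarity`) are hypothetical, descriptors are raw 2x2 pixel patches rather than the local features a real system would extract, and the codebook is hand-picked instead of being learned by clustering descriptors from a training collection.

```python
from math import sqrt


def extract_patches(image, size=2):
    """Cut the image into non-overlapping size x size patches, each flattened to a list."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            patches.append(
                [image[r + dr][c + dc] for dr in range(size) for dc in range(size)]
            )
    return patches


def nearest_term(descriptor, codebook):
    """Return the index of the codebook entry closest in squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def bovt_histogram(image, codebook, size=2):
    """Count how often each visual term occurs in the image, normalized to sum to 1."""
    counts = [0] * len(codebook)
    for patch in extract_patches(image, size):
        counts[nearest_term(patch, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine_similarity(h1, h2):
    """Cosine of the angle between two histograms; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = sqrt(sum(a * a for a in h1))
    n2 = sqrt(sum(b * b for b in h2))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Toy codebook (assumption): blank, solid, vertical stroke, horizontal stroke.
codebook = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0]]

# Toy binary "word images" (assumption): 1 = ink, 0 = background.
word_a = [
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
word_a_copy = [row[:] for row in word_a]
word_blank = [[0] * 4 for _ in range(4)]

hist_a = bovt_histogram(word_a, codebook)
hist_a_copy = bovt_histogram(word_a_copy, codebook)
hist_blank = bovt_histogram(word_blank, codebook)

sim_same = cosine_similarity(hist_a, hist_a_copy)
sim_diff = cosine_similarity(hist_a, hist_blank)
print(sim_same, sim_diff)
```

In this sketch each word image is reduced to a histogram over visual terms, so matching becomes a comparison of histograms rather than of recognized characters. Identical word images produce identical histograms and therefore the highest similarity, while the blank image scores lower against `word_a`.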