Training Tesseract OCR on Kurdish Historical Documents | HackerNoon
Briefly

Historical publications were collected from the Zaytoon Public Library in Erbil, but their fragile condition made digitization difficult. The Zheen Center for Documentation and Research in Sulaymaniyah digitized the documents using specialized scanning equipment. Image processing included skew correction, cropping, and conversion to binary (black-and-white) format. The resulting dataset contained single-line images of historical text, each paired with a manually created transcription file so that the documents' content was captured accurately.
The Zaytoon Public Library in Erbil provided access to historical publications, although their fragile condition complicated digitization.
The Zheen Center for Documentation and Research in Sulaymaniyah digitized early Kurdish publications using specialized scanning technologies.
Image processing involved skew correction, automatic cropping, and conversion of the images to binary format.
The final dataset consisted of single-line images of historical documents alongside transcription files typed manually in text editors.
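To make the binarization and cropping steps concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the summary does not say which thresholding method was used, so Otsu's method is an assumption, the function names are hypothetical, and skew correction is left out entirely. Images are represented as lists of rows of 0-255 grey values, with dark ink on light paper.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for a flat list of 0-255 grey values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no pixels at or below t yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor).
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a greyscale image to 1 (ink) / 0 (paper) using Otsu's threshold."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


def crop_to_ink(binary):
    """Crop a binary image to the bounding box of its ink pixels."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    if not rows:
        return binary  # blank image: nothing to crop
    cols = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


# Tiny synthetic "scan": light paper (230) with three dark ink pixels (20).
page = [
    [230, 230, 230, 230, 230],
    [230,  20,  20, 230, 230],
    [230, 230,  20, 230, 230],
    [230, 230, 230, 230, 230],
]
line_image = crop_to_ink(binarize(page))
```

In practice an image library would handle these steps, and real scans of fragile documents usually need noise removal before thresholding; the sketch only shows the logic of turning a grey scan into a tightly cropped binary line image.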
Read at Hackernoon