Improving OCR Accuracy in Historical Archives with Deep Learning

"Vamvakas et al. (2008) presented a complete OCR methodology for recognizing historical documents, effective for both machine-printed and handwritten formats through adaptable processes."

"The OCR methodology involves three steps: first, a training database is created; second, a top-down segmentation approach detects text; third, recognition applies the segmentation to new documents."

"The methodology includes image binarization and enhancement for preprocessing, followed by a clustering scheme to group similar shaped characters, enabling user interaction to correct errors."

"Results showed the model achieved 83.66% accuracy, with plans for future optimization through enhanced segmentation methods and feature types."

The OCR methodology presented by Vamvakas et al. (2008) is designed for recognizing both machine-printed and handwritten historical documents. It involves three main steps: creating a training database from a set of documents, applying a top-down segmentation approach to detect text elements, and recognizing new documents using the established database. Preprocessing includes image enhancement, with a clustering scheme grouping characters by shape. Results indicated an accuracy of 83.66%, with optimizations planned for future segmentation processes and feature enhancements to improve recognition outcomes.

#ocr #historical-documents #machine-learning #text-recognition #kurdish-language

Read at Hackernoon

Unable to calculate read time

Collection

[

...

]

Improving OCR Accuracy in Historical Archives with Deep Learning | HackerNoonImproving OCR Accuracy in Historical Archives with Deep Learning | HackerNoon Briefly

Improving OCR Accuracy in Historical Archives with Deep Learning | HackerNoon
Improving OCR Accuracy in Historical Archives with Deep Learning | HackerNoon
Briefly