Improving OCR Accuracy in Historical Archives with Deep Learning | HackerNoon
Briefly

The OCR methodology presented by Vamvakas et al. (2008) is designed for recognizing both machine-printed and handwritten historical documents. It involves three main steps: creating a training database from a set of documents, applying a top-down segmentation approach to detect text elements, and recognizing new documents using the established database. Preprocessing includes image enhancement, with a clustering scheme grouping characters by shape. Results indicated an accuracy of 83.66%, with optimizations planned for future segmentation processes and feature enhancements to improve recognition outcomes.
Vamvakas et al. (2008) presented a complete OCR methodology for recognizing historical documents, effective for both machine-printed and handwritten formats through adaptable processes.
The OCR methodology involves three steps: first, a training database is created; second, a top-down segmentation approach detects text; third, recognition applies the segmentation to new documents.
The methodology includes image binarization and enhancement for preprocessing, followed by a clustering scheme to group similar shaped characters, enabling user interaction to correct errors.
Results showed the model achieved 83.66% accuracy, with plans for future optimization through enhanced segmentation methods and feature types.
Read at Hackernoon
[
|
]