What is Excerptor?
Excerptor is a specialized tool for extracting highlighted or handwritten text from physical books. It uses image processing and optical character recognition (OCR) to convert marked passages into digital text that you can edit and save. This lets you pull key information from many books quickly, which can make research and study more efficient.
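As an illustration of how image processing can pick out highlighted text, here is a minimal sketch that flags pixels whose color resembles yellow highlighter ink and then finds rows dense with such pixels. It assumes a page is given as rows of (R, G, B) tuples. The threshold values and function names are hypothetical, and this is plain color thresholding, not Excerptor's actual detection code.

```python
import colorsys

# Hypothetical thresholds for yellow highlighter ink (hue expressed in 0..1).
YELLOW_HUE_RANGE = (40 / 360, 70 / 360)
MIN_SATURATION = 0.35
MIN_VALUE = 0.5


def is_highlighter_pixel(rgb):
    """Return True if an (R, G, B) pixel with 0-255 channels looks like yellow highlighter."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    lo, hi = YELLOW_HUE_RANGE
    # White paper has low saturation and black ink has low value, so both are rejected.
    return lo <= h <= hi and s >= MIN_SATURATION and v >= MIN_VALUE


def highlight_mask(image):
    """Build a boolean mask (list of rows) marking highlighter-colored pixels."""
    return [[is_highlighter_pixel(px) for px in row] for row in image]


def highlighted_rows(mask, min_fraction=0.2):
    """Return indices of rows where at least min_fraction of pixels are highlighted.

    Rows like these are candidate text lines that could be cropped and passed to OCR.
    """
    return [i for i, row in enumerate(mask) if row and sum(row) / len(row) >= min_fraction]
```

A real page would also need morphological cleanup and grouping of rows into text lines; the sketch only shows the color-threshold idea.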
Excerptor is particularly useful for students, researchers, writers, and anyone who needs to extract information from books. Students and researchers can use it to quickly gather key passages for their work, writers can use it to collect and edit quotations, and general readers can digitize important parts of their personal book collections.
Here’s how you can use Excerptor:
1. Prepare the physical book containing the marked text you want to extract, and take clear images of the relevant pages.
2. Place the images in the designated input folder within Excerptor.
3. Run the Excerptor program and select whether you want to identify highlighted text or handwritten notes.
4. Excerptor automatically preprocesses the images, straightens curved pages, and converts the marked text into an editable format.
5. Review the results and manually correct any errors if necessary.
6. Save the extracted text to an output folder, or edit and process it further as needed.
7. Optionally, archive the original images in a specified folder.
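The folder-based workflow in the steps above can be sketched as a simple batch loop: read every page image from an input folder, write one text file per image to an output folder, and optionally move the originals to an archive folder. This is a hypothetical sketch, not Excerptor's code. The folder layout, the accepted file extensions, and the `extract_text` function are assumptions; `extract_text` is a placeholder standing in for the real preprocessing, dewarping, and OCR pipeline.

```python
import shutil
from pathlib import Path
from typing import List, Optional

# Hypothetical set of file extensions treated as page images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def extract_text(image_path: Path, mode: str) -> str:
    """Hypothetical placeholder for preprocessing, page straightening, and OCR.

    Returns a dummy string so the workflow can run end to end.
    """
    return f"[{mode}] text from {image_path.name}"


def run_batch(input_dir: Path, output_dir: Path, mode: str = "highlight",
              archive_dir: Optional[Path] = None) -> List[Path]:
    """Process every page image in input_dir and write one .txt file per image."""
    if mode not in ("highlight", "handwriting"):
        raise ValueError("mode must be 'highlight' or 'handwriting'")
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # sorted() lists the folder up front, so moving files during the loop is safe,
    # and pages are processed in filename order.
    for image_path in sorted(input_dir.iterdir()):
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text = extract_text(image_path, mode)
        out_path = output_dir / (image_path.stem + ".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
        # Optional archiving step: move the original image out of the input folder.
        if archive_dir is not None:
            archive_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(image_path), str(archive_dir / image_path.name))
    return written
```

Keeping one output file per input image makes it easy to trace any OCR error back to the page it came from during manual review.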
Key Features:
- Identifies highlighted text in physical books.
- Recognizes handwritten notes in books.
- Processes images by adjusting white balance and reducing noise.
- Corrects page curvature.
- Converts images to editable text using OCR.
- Supports model training with YOLO for text region segmentation.
- Provides an interface for correcting OCR errors.
- Handles multiple pages efficiently through batch processing.
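The white-balance and noise-reduction step can be illustrated with two standard techniques. The description above does not say which methods Excerptor uses, so this sketch assumes the gray-world white-balance method and a 3x3 median filter, written in pure Python on small images given as lists of rows. Function names are hypothetical.

```python
def gray_world_white_balance(image):
    """Scale each channel so its mean matches the overall gray mean (gray-world assumption).

    image: list of rows, each row a list of (R, G, B) tuples with 0-255 channels.
    """
    pixels = [px for row in image for px in row]
    n = len(pixels)
    means = [sum(px[c] for px in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    # Leave a channel unchanged if it is entirely black, to avoid dividing by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [
        [tuple(min(255, round(px[c] * gains[c])) for c in range(3)) for px in row]
        for row in image
    ]


def median_denoise(gray_image):
    """Apply a 3x3 median filter to a grayscale image (list of rows of ints).

    Edge pixels use only the neighbors that exist, so the output keeps the input size.
    For even-sized edge windows the upper of the two middle values is used, keeping ints.
    """
    h, w = len(gray_image), len(gray_image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                gray_image[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out
```

The median filter is a common choice for scanned or photographed pages because it removes isolated specks while preserving the sharp edges of printed characters better than simple averaging.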
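Training a YOLO model to find text regions requires a label file per page image. The description does not specify which YOLO version or label format Excerptor uses; this sketch assumes the widely used Ultralytics-style text labels, where a detection line is `class x_center y_center width height` and a segmentation line is the class followed by polygon `x y` pairs, all normalized to the range 0..1. The helper names and the class IDs in the usage example are hypothetical.

```python
def box_to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO detection label line.

    The line holds: class x_center y_center width height, each normalized by image size.
    """
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def polygon_to_yolo_seg_line(class_id, points, img_w, img_h):
    """Convert pixel polygon points [(x, y), ...] to a YOLO segmentation label line.

    The line holds the class followed by normalized x y pairs for each vertex.
    """
    coords = " ".join(f"{x / img_w:.6f} {y / img_h:.6f}" for x, y in points)
    return f"{class_id} {coords}"
```

For example, with a hypothetical class 0 for "highlighted text", a 200 x 60 pixel region at (100, 200) on a 1000 x 800 page gives `box_to_yolo_line(0, (100, 200, 300, 260), 1000, 800)`, which returns `"0 0.200000 0.287500 0.200000 0.075000"`. Normalizing coordinates keeps the labels valid if the page images are resized before training.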