Notes from learning OCR

Examples using Tesseract and OpenCV to convert pictures to text.

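Installing Tesseract gives you English only. To recognise other languages, install their language data separately, or install all available languages at once. On Debian/Ubuntu the packages are named tesseract-ocr-<code>, for example tesseract-ocr-nor for Norwegian, and tesseract-ocr-all installs every language.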
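A minimal sketch of choosing languages from Python, assuming the pytesseract wrapper (the repository's scripts may call Tesseract differently). Tesseract accepts several languages joined with +, for example nor+eng. The hypothetical helper build_lang_arg checks the requested codes against the installed ones, so a missing language pack fails early with a clear message instead of partway through an OCR run.

```python
from typing import Iterable, List

try:
    import pytesseract
except ImportError:  # pytesseract is optional for the pure helper below
    pytesseract = None


def installed_languages() -> List[str]:
    """Return the language codes Tesseract reports as installed."""
    if pytesseract is None:
        raise RuntimeError("pytesseract is not installed")
    # Same information as running `tesseract --list-langs` in a terminal.
    return pytesseract.get_languages(config="")


def build_lang_arg(requested: Iterable[str], installed: Iterable[str]) -> str:
    """Join language codes into Tesseract's 'a+b' form, failing early on missing ones."""
    available = set(installed)
    codes = list(requested)
    if not codes:
        raise ValueError("No languages requested")
    missing = [code for code in codes if code not in available]
    if missing:
        raise ValueError(f"Missing Tesseract language data: {', '.join(missing)}")
    return "+".join(codes)
```

For example, build_lang_arg(["nor", "eng"], installed_languages()) returns "nor+eng" once the Norwegian data is installed, and that string can be passed as lang= to pytesseract.image_to_string.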

introduction.py is an introductory script showing how to extract text from an image in an interactive Python session.
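A minimal sketch of the same idea wrapped in a function, assuming OpenCV (opencv-python) and pytesseract are the libraries listed in requirements.txt. The names image_to_text and non_empty_lines are hypothetical and not taken from introduction.py. OpenCV loads images in BGR channel order while pytesseract assumes RGB, hence the conversion before OCR.

```python
from typing import List

try:
    import cv2
    import pytesseract
except ImportError:  # allow the text helper below to be used without the OCR stack
    cv2 = pytesseract = None


def image_to_text(path: str, lang: str = "eng") -> str:
    """Read an image with OpenCV and return the text Tesseract finds in it."""
    if cv2 is None or pytesseract is None:
        raise RuntimeError("opencv-python and pytesseract are required")
    image = cv2.imread(path)
    if image is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(path)
    # OpenCV stores pixels as BGR; pytesseract expects RGB.
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return pytesseract.image_to_string(rgb, lang=lang)


def non_empty_lines(text: str) -> List[str]:
    """Strip trailing whitespace and drop blank lines from raw OCR output."""
    return [line.rstrip() for line in text.splitlines() if line.strip()]
```

In an interactive session this could be used as text = image_to_text("images/example.png") (the path is a placeholder), then non_empty_lines(text) to inspect the result line by line.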

oppskrifter.py converts photos of Norwegian recipes (oppskrifter) into text files and searchable PDF files.
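A sketch of a batch conversion that writes one .txt and one searchable .pdf per image, assuming pytesseract and the Norwegian language data (nor) are installed. pytesseract.image_to_pdf_or_hocr with extension="pdf" returns the PDF as bytes, with the recognised text laid invisibly over the image so it can be searched and copied. The folder layout and the function names output_paths and convert_folder are hypothetical; oppskrifter.py may be organised differently.

```python
from pathlib import Path
from typing import Tuple

try:
    import pytesseract
except ImportError:  # the path helper below works without pytesseract
    pytesseract = None

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def output_paths(image_path: Path, out_dir: Path) -> Tuple[Path, Path]:
    """Map e.g. images/kake.png to (out_dir/kake.txt, out_dir/kake.pdf)."""
    stem = Path(image_path).stem
    return Path(out_dir) / f"{stem}.txt", Path(out_dir) / f"{stem}.pdf"


def convert_folder(image_dir: Path, out_dir: Path, lang: str = "nor") -> None:
    """OCR every image in image_dir into a text file and a searchable PDF."""
    if pytesseract is None:
        raise RuntimeError("pytesseract is not installed")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files in the folder
        txt_path, pdf_path = output_paths(image_path, out_dir)
        text = pytesseract.image_to_string(str(image_path), lang=lang)
        # Write UTF-8 explicitly so Norwegian letters (æ, ø, å) survive on any platform.
        txt_path.write_text(text, encoding="utf-8")
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(str(image_path), lang=lang, extension="pdf")
        pdf_path.write_bytes(pdf_bytes)
```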

vildanden.py converts a picture of the opening act of Vildanden (The Wild Duck) by Henrik Ibsen into a text file and a searchable PDF. The image is taken from Vildanden on Project Gutenberg and is in the public domain in the USA.
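Scanned book pages often give better OCR results after some cleanup. One common OpenCV preprocessing step, shown here as an assumption rather than what vildanden.py actually does, is to convert the page to grayscale, remove speckle noise with a small median blur, and binarize it with Otsu's method, which picks the black/white cut-off from the image histogram. The function name binarize_page is hypothetical.

```python
try:
    import cv2
except ImportError:  # OpenCV is optional so the module can still be imported
    cv2 = None


def binarize_page(image_bgr):
    """Return a black-and-white (0/255) version of a BGR page image for OCR."""
    if cv2 is None:
        raise RuntimeError("opencv-python is required")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # A 3x3 median blur removes isolated specks while mostly preserving letter edges.
    denoised = cv2.medianBlur(gray, 3)
    # With THRESH_OTSU the threshold argument (0) is ignored and computed from the histogram.
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

The result is a single-channel image that can be passed directly to pytesseract.image_to_string, for example with lang="nor".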

Setup

On Linux (Debian/Ubuntu, using apt)

sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install libtesseract-dev

Then install the Python dependencies listed in requirements.txt with pip install -r requirements.txt
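A quick way to confirm the setup before running the scripts, sketched with the standard library only. The module names checked by default (pytesseract, and cv2 for OpenCV) are assumptions about what requirements.txt contains; adjust them to match the file.

```python
import importlib.util
import shutil
import subprocess
from typing import Iterable, List, Optional


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    # Some older Tesseract releases print the version to stderr instead of stdout.
    output = result.stdout.strip() or result.stderr.strip()
    return output.splitlines()[0] if output else None


def missing_python_modules(modules: Iterable[str] = ("pytesseract", "cv2")) -> List[str]:
    """Return the modules from the list that cannot be imported."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Tesseract:", tesseract_version() or "not found on PATH")
    print("Missing Python modules:", missing_python_modules() or "none")
```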

The .jpg images were converted to .png with ffmpeg to make them easier to process: ffmpeg -i input.jpg output.png
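To convert a whole folder instead of one file at a time, the same ffmpeg command can be run in a loop. A sketch, assuming the images sit in one folder with a lowercase .jpg extension; existing .png files are skipped because ffmpeg asks for confirmation before overwriting. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_command(src: Path, dst: Path) -> List[str]:
    """Build the command ffmpeg -i input.jpg output.png for one file."""
    return ["ffmpeg", "-i", str(src), str(dst)]


def convert_jpgs_to_png(folder: Path) -> List[Path]:
    """Convert every *.jpg in folder to a .png next to it and return the new files."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    converted = []
    for src in sorted(Path(folder).glob("*.jpg")):
        dst = src.with_suffix(".png")
        if dst.exists():
            continue  # ffmpeg would otherwise stop and ask before overwriting
        subprocess.run(ffmpeg_command(src, dst), check=True)
        converted.append(dst)
    return converted
```

Usage, with the folder name as a placeholder: convert_jpgs_to_png(Path("images")).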

tobiasmcvey/learn-ocr

Folders and files

Latest commit

History

Repository files navigation

Notes from learning OCR

Setup

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages