This script extracts all text (including duplicates) from a video by processing its frames and using Tesseract OCR for text recognition. The extracted text is saved in a file named Output.txt in the same directory as the script.
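The script itself is not reproduced here, so the following is only a minimal sketch of the approach described above, not the actual video_to_text.py. It assumes every 10th frame is sampled and that empty OCR results are skipped. The function names and the file name video.mp4 are hypothetical.

```python
def read_frames(video_path):
    """Yield frames from the video one at a time using OpenCV."""
    import cv2  # imported lazily so ocr_frames() can be used without OpenCV

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Could not open video: {video_path}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of video or read error
                break
            yield frame
    finally:
        cap.release()


def ocr_frames(frames, ocr, step=10):
    """Run `ocr` on every `step`-th frame and return the non-empty results."""
    results = []
    for frame_count, frame in enumerate(frames):
        if frame_count % step == 0:
            text = ocr(frame).strip()
            if text:  # assumption: blank results are not worth saving
                results.append(text)
    return results


def main(video_path="video.mp4", output_path="Output.txt"):
    """Extract text from the video and write it to the output file."""
    import pytesseract

    texts = ocr_frames(read_frames(video_path), pytesseract.image_to_string)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("\n".join(texts))
    print("Text extraction complete! Check 'Output.txt' for results.")
```

Calling main() runs the whole pipeline. Keeping the sampling logic in ocr_frames(), separate from OpenCV and Tesseract, makes it possible to try it with any iterable of frames and any OCR callable.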
Ensure you have the following installed on your system before running the script:
Make sure you have Python installed. If not, download and install it from:
After installation, verify it using:
python --version
Install the necessary dependencies using pip:
pip install opencv-python pytesseract
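One way to confirm that both packages installed correctly is to ask the interpreter whether it can find them. The snippet below is a small sketch that uses only the standard library; the helper name missing_modules is made up for illustration. Note that the opencv-python package is imported under the name cv2.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that the interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# opencv-python is imported as cv2; pytesseract keeps its pip name.
missing = missing_modules(["cv2", "pytesseract"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```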
Tesseract OCR is required for extracting text from images.
- Visit: Tesseract OCR Download
- Download the Windows installer (tesseract-ocr-setup.exe).
- Install it, and note the installation path (e.g., C:\Program Files\Tesseract-OCR).
- Open Windows Search and type Environment Variables.
- Click on Edit the system environment variables.
- Under System variables, find Path, then click Edit.
- Click New, and add:
C:\Program Files\Tesseract-OCR
- Click OK to save changes.
- Test if Tesseract is working by running this command in Command Prompt (cmd):
tesseract --version
If installed correctly, it will show version details.
Ensure your video file is placed in the same directory as your script or specify the correct path.
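A relative file name is resolved against the current working directory, which is not always the script's directory. The sketch below shows one possible way to look for the video next to the script and fail early with a clear message. The helper name resolve_video is hypothetical.

```python
from pathlib import Path


def resolve_video(name, base_dir):
    """Return the absolute path to `name`, looking in `base_dir` if it is relative."""
    path = Path(name)
    if not path.is_absolute():
        path = Path(base_dir) / path
    if not path.is_file():
        raise FileNotFoundError(f"Video file not found: {path}")
    return path.resolve()
```

Inside the script, base_dir could be Path(__file__).parent so that the lookup does not depend on where the command is run from.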
If Tesseract is not in your PATH, add this line at the beginning of your script:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
Make sure the path matches where you installed Tesseract.
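Instead of hard-coding the path, the script could first check whether Tesseract is already on PATH and only then fall back to the install location. This is a sketch under that assumption; the helper names find_tesseract and configure_pytesseract are made up for illustration, and the fallback is the example path used above.

```python
import os
import shutil

# Example install location from the steps above; adjust if you installed elsewhere.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=WINDOWS_DEFAULT):
    """Return the Tesseract executable path, or None if it cannot be found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    if os.path.isfile(fallback):
        return fallback
    return None


def configure_pytesseract():
    """Point pytesseract at the executable located by find_tesseract()."""
    import pytesseract  # imported lazily so find_tesseract() works on its own

    path = find_tesseract()
    if path is None:
        raise RuntimeError("Tesseract not found; check the installation path.")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_pytesseract() once at the start of the script would replace the hard-coded assignment.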
Execute the Python script by running:
python video_to_text.py
After execution, you will see:
Text extraction complete! Check 'Output.txt' for results.
- Extracted text (including duplicates) will be saved in a file named Output.txt inside the same directory as the script.
- Process every frame: Remove the condition if frame_count % 10 == 0: to extract text from all frames.
- Save unique text only: Store text in a Python set() before writing to the file.
- Improve OCR Accuracy: Preprocess frames (convert to grayscale, increase contrast, etc.).
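The last two customizations can be sketched as follows. Both helpers are illustrative and not part of the original script. A plain set() does not preserve order, so unique_in_order() uses a set only to remember what has been seen. The preprocess() function uses histogram equalization as one possible way to increase contrast.

```python
def unique_in_order(texts):
    """Drop repeated strings while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for text in texts:
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique


def preprocess(frame):
    """Convert a BGR frame to grayscale and equalize its histogram."""
    import cv2  # imported lazily so unique_in_order() works without OpenCV

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.equalizeHist(gray)
```

The result of preprocess(frame) can be passed to pytesseract.image_to_string in place of the raw frame, and unique_in_order() can be applied to the collected text just before it is written to Output.txt.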
If you encounter any issues, ensure:
- Python, OpenCV, and Tesseract are correctly installed.
- The correct Tesseract path is used.
- The video file is accessible and readable.
Happy coding! 🚀