Video Text Extraction Using Python & Tesseract OCR

📌 Overview

This script extracts all text, including duplicates, from a video. It samples the video's frames (every 10th frame by default), runs Tesseract OCR on each sampled frame, and saves the recognized text to a file named Output.txt in the same directory as the script.
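The script's source is not reproduced in this README, so the following is only a minimal sketch of how such a pipeline might look, assuming OpenCV reads the frames and pytesseract performs the OCR. The function names `sampled_frame_numbers` and `extract_text_from_video` and the `step` parameter are hypothetical and may not match the actual script.

```python
from pathlib import Path


def sampled_frame_numbers(total_frames, step=10):
    """Return the frame numbers that are OCR'd when every `step`-th frame is sampled."""
    return list(range(0, total_frames, step))


def extract_text_from_video(video_path, output_path="Output.txt", step=10):
    """OCR every `step`-th frame and write all recognized text to output_path."""
    # Imported lazily so the helper above works without OpenCV or Tesseract.
    import cv2
    import pytesseract

    capture = cv2.VideoCapture(str(video_path))
    if not capture.isOpened():
        raise FileNotFoundError(f"Could not open video: {video_path}")

    chunks = []
    frame_count = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of the video, or a read error
                break
            if frame_count % step == 0:
                text = pytesseract.image_to_string(frame).strip()
                if text:
                    chunks.append(text)  # duplicates are kept on purpose
            frame_count += 1
    finally:
        capture.release()

    Path(output_path).write_text("\n".join(chunks), encoding="utf-8")
    return chunks
```

With a step of 10, a 25-frame clip would have frames 0, 10, and 20 passed to Tesseract. Sampling keeps the run time manageable, since OCR on every frame of a long video can be slow.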

🔧 Prerequisites

Ensure you have the following installed on your system before running the script:

1️⃣ Python Installation

Make sure Python is installed. If it is not, download and install it from the official Python website (https://www.python.org/downloads/).

After installation, verify it using:

python --version

2️⃣ Install Required Python Packages

Install the necessary dependencies using pip:

pip install opencv-python pytesseract

3️⃣ Download & Install Tesseract OCR

Tesseract OCR is required for extracting text from images.

📥 Download Tesseract for Windows:

1. Visit: Tesseract OCR Download
2. Download the Windows installer (tesseract-ocr-setup.exe).
3. Install it, and note the installation path (e.g., C:\Program Files\Tesseract-OCR).

🔧 Add Tesseract to System PATH (Windows Only)

1. Open Windows Search and type Environment Variables.
2. Click Edit the system environment variables.
3. In the System Properties window that opens, click Environment Variables.
4. Under System variables, find Path, then click Edit.
5. Click New, and add: C:\Program Files\Tesseract-OCR
6. Click OK in each dialog to save the changes.
7. Open a new Command Prompt (cmd) so the updated PATH is picked up, then test that Tesseract works by running:
```
tesseract --version
```
If installed correctly, it will show version details.

🚀 Running the Script

1️⃣ Place the video file

Ensure your video file is placed in the same directory as your script or specify the correct path.

2️⃣ Modify the script to set Tesseract Path

If Tesseract is not in your PATH, add this line at the beginning of your script:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

Make sure the path matches where you installed Tesseract.
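Rather than hard-coding the path, the script could look for Tesseract on PATH first and fall back to the install location only when needed. The sketch below is one way to do that. The helper names `find_tesseract` and `configure_pytesseract` are hypothetical, and the fallback path is the example location used above, which may differ on your machine.

```python
import shutil
from pathlib import Path

# Assumed default install location on Windows; adjust it to your own install path.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=DEFAULT_WINDOWS_PATH, which=shutil.which):
    """Return a path to the tesseract executable, or None if it cannot be found."""
    on_path = which("tesseract")
    if on_path:
        return on_path
    if fallback and Path(fallback).is_file():
        return fallback
    return None


def configure_pytesseract():
    """Point pytesseract at the executable found by find_tesseract()."""
    import pytesseract  # imported lazily so find_tesseract works without it

    path = find_tesseract()
    if path is None:
        raise RuntimeError("Tesseract not found; install it or add it to PATH.")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at the top of the script, before any OCR call, should be enough. The `which` parameter exists only so the lookup can be replaced in tests.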

3️⃣ Run the script

Execute the Python script by running:

python video_to_text.py

After execution, you will see:

Text extraction complete! Check 'Output.txt' for results.

📂 Output

Extracted text (including duplicates) will be saved in a file named Output.txt inside the same directory as the script.

🛠 Customization & Enhancements

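- Process every frame: remove the `if frame_count % 10 == 0:` check (and dedent its body) so that text is extracted from every frame instead of every 10th one. Expect a much longer run time.
- Save unique text only: collect the text in a Python `set()` before writing it to the file. Note that a set does not preserve the order in which the text appeared.
- Improve OCR accuracy: preprocess each frame before OCR, for example by converting it to grayscale or increasing its contrast.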
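Two of these ideas can be sketched briefly. The helper names `unique_in_order` and `preprocess_frame` are hypothetical. The sketch uses `dict.fromkeys` instead of a `set()` so that the first-seen order of the text is kept, and Otsu thresholding is only one of several possible contrast-enhancement steps.

```python
def unique_in_order(texts):
    """Drop repeated strings while keeping first-seen order (a plain set would not)."""
    return list(dict.fromkeys(texts))


def preprocess_frame(frame):
    """Convert a BGR video frame to a black-and-white image for OCR."""
    import cv2  # imported lazily so unique_in_order works without OpenCV

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold automatically; the 0 passed here is ignored.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

The result of `preprocess_frame(frame)` can be passed to `pytesseract.image_to_string` in place of the raw frame. Whether it helps depends on the video, so it is worth comparing the output with and without preprocessing on a short clip.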

🤝 Support

If you encounter any issues, ensure:

- Python, OpenCV, and Tesseract are correctly installed.
- The correct Tesseract path is used.
- The video file is accessible and readable.
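The checklist above can be partly automated. The following is a rough sketch using only the standard library. The function name `check_environment` is hypothetical, and the check looks for Tesseract on PATH only, so it will report a problem even if you set `tesseract_cmd` manually.

```python
import importlib.util
import os
import shutil


def check_environment(video_path=None):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # cv2 is the import name of opencv-python; pytesseract imports under its own name.
    for module in ("cv2", "pytesseract"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"Python module '{module}' is not installed")
    if shutil.which("tesseract") is None:
        problems.append("tesseract executable not found on PATH")
    if video_path is not None and not os.access(video_path, os.R_OK):
        problems.append(f"video file is missing or unreadable: {video_path}")
    return problems
```

Printing each entry of `check_environment("your_video.mp4")` before running the main script gives a quick view of what still needs to be fixed. The file name here is only a placeholder.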

Happy coding! 🚀
