A simple computer vision project that scans documents from images and converts them into clean, readable PDF files.
Built using OpenCV and Streamlit.
- Detects document edges automatically
- Corrects perspective (flattens the page)
- Removes shadows and background noise
- Enhances readability (black & white scan)
- Supports multiple images
- Exports scanned pages as a single PDF
- Python
- OpenCV → image processing
- NumPy → numerical operations
- Streamlit → web app interface
- Pillow → PDF generation
The app follows this pipeline:
- Image Upload
  - User uploads one or more images.
- Document Detection
  - Uses contour detection to find the document edges.
  - Filters shapes to detect a 4-sided boundary.
- Perspective Transform
  - Converts tilted document into a flat top-down view.
- Preprocessing
  - Converts to grayscale
  - Removes shadows using background normalization
  - Applies denoising
- Thresholding
  - Converts image into clean black & white for readability
- PDF Generation
  - All processed images are combined into a single PDF
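The last step of the pipeline, combining the processed pages into one PDF, could look roughly like this with Pillow. This is a sketch under assumptions, not the project's actual code: `pages_to_pdf_bytes` and the blank demo pages are hypothetical names introduced for illustration.

```python
import io

import numpy as np
from PIL import Image


def pages_to_pdf_bytes(pages):
    """Combine scanned pages (2-D uint8 arrays) into one in-memory PDF."""
    if not pages:
        raise ValueError("at least one page is required")
    # Convert each array to an RGB Pillow image before writing.
    images = [Image.fromarray(page).convert("RGB") for page in pages]
    buffer = io.BytesIO()
    # save_all + append_images writes every page into the same PDF file.
    images[0].save(buffer, format="PDF", save_all=True, append_images=images[1:])
    return buffer.getvalue()


# Two blank hypothetical pages stand in for real scans.
demo_pages = [np.full((200, 150), 255, dtype=np.uint8) for _ in range(2)]
pdf_bytes = pages_to_pdf_bytes(demo_pages)
```

Returning bytes rather than writing to disk keeps the helper easy to hand to a download button or to save under `data/output/`.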
doc_scanner/
├── app.py
├── src/
│   ├── scanner.py
│   └── utils.py
├── data/
│   ├── input/
│   └── output/
├── requirements.txt
└── README.md
git clone https://github.com/YOUR_USERNAME/doc_scanner.git
cd doc_scanner
conda create --prefix ./venv python=3.10
conda activate ./venv
pip install -r requirements.txt
streamlit run app.py
Then open the link shown in the terminal.
- Upload one or more images
- The app will automatically:
  - detect the document
  - scan and enhance it
- Click Download PDF to get the final output
Edge detection finds the boundaries of objects in the image.
Contour detection finds shapes and identifies the document region.
Perspective transform maps the document into a flat rectangular view.
- Noise removal
- Shadow removal
- Contrast improvement
Thresholding converts the image into high-contrast black & white.
- Live camera scanning
- Automatic cropping
- OCR (text extraction)
Ahan Mondal