Commit 8c785df

minor improvements
1 parent add18eb commit 8c785df

4 files changed: +7 -1 lines changed

README.md

+1
@@ -69,6 +69,7 @@ Bash is still the most-used shell. And the scipts comprise mostly of simple cond
 - https://0xacab.org/jvoisin/mat2
 - https://github.com/NicolasBernaerts/ubuntu-scripts/blob/master/pdf/pdf-repair
 - https://scantailor.org/ (unmantained)
+- [more tools for PDF in my blog post](https://johannesfilter.com/python-and-pdf-a-review-of-existing-tools/)
 
 ## Development

ocr_pdf.sh

+2
@@ -16,6 +16,8 @@ set -x
 # https://ocrmypdf.readthedocs.io/en/latest/docker.html#adding-languages-to-the-docker-image
 # https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages
 # eng - English, deu - German, spa - Spanish, fra - French, por - Portuguese, chi_sim - Chinese simplified
+# to use OCR with more than one language: -l deu+eng
+#
 #
 # Please report issues at https://github.com/jfilter/pdf-scripts/issues
 #
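
Note on the `-l deu+eng` comment added above: Tesseract takes several languages as one value joined with `+`. The sketch below only illustrates how that value is put together. The helper names `join_languages` and `build_ocr_command` are hypothetical, and it assumes the `ocrmypdf` command is invoked directly, which may differ from how `ocr_pdf.sh` actually wraps it.

```python
def join_languages(codes):
    """Join Tesseract language codes into one value for -l, e.g. 'deu+eng'."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def build_ocr_command(input_pdf, output_pdf, codes):
    """Build an argument list for a direct ocrmypdf call (assumption: ocrmypdf is on PATH)."""
    return ["ocrmypdf", "-l", join_languages(codes), input_pdf, output_pdf]
```

The returned list could be passed to `subprocess.run`; the language codes are the ones listed in the script's comment (`eng`, `deu`, `spa`, `fra`, `por`, `chi_sim`).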

setup.sh

+2 -1
@@ -29,5 +29,6 @@ fi
 if [ -f /etc/lsb-release ]; then
 # not sure, TODO
 apt-get update && apt-get install -y parallel ghostscript mupdf-tools qpdf poppler-utils detox libimage-exiftool-perl imagemagick
-apt-get install -y containerd docker.io
+# not sure about Docker installation
+apt-get install -y docker-ce docker.io
 fi

utils/pages_with_text.py

+2
@@ -2,6 +2,8 @@
 
 from pdflib import Document
 
+# TODO: use import `pdftotext`, pdflib is hard to install
+
 parser = argparse.ArgumentParser(
 description="checks for presence of absence of text on images"
 )
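
Note on the TODO above: a minimal sketch of what the switch to `pdftotext` might look like, assuming the `pdftotext` package from PyPI, whose `PDF` object yields one string per page. The function names `pages_with_text` and `load_page_texts` are hypothetical and not taken from the repository, and the script's real output format is not shown in this diff.

```python
def pages_with_text(page_texts):
    """Return 1-based numbers of pages whose extracted text is not only whitespace."""
    return [number for number, text in enumerate(page_texts, start=1) if text.strip()]


def load_page_texts(path):
    """Read a PDF and return the extracted text of each page as a list of strings."""
    # Imported lazily so pages_with_text() stays usable without the package installed.
    import pdftotext

    with open(path, "rb") as handle:
        return list(pdftotext.PDF(handle))
```

Typical use would be `pages_with_text(load_page_texts("scan.pdf"))`. Keeping the check separate from the loading step means the page test can be exercised with plain strings, independent of which PDF library ends up being used.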
