Commit 087d748

Drop python2 support
1 parent 102a584 commit 087d748

25 files changed, 150 insertions(+), 125 deletions(-)

Diff for: .gitignore

+3
@@ -71,3 +71,6 @@ tests/gif/standardized_text.txt
 tests/jpg/standardized_text.txt
 tests/tiff/standardized_text.txt
 tests/pdf/ocr_text.txt
+
+# PyCharm
+.idea/

Diff for: .travis.yml

+4-13
@@ -3,35 +3,26 @@ os: linux
 
 language: python
 python:
-- "2.7"
 - "3.7"
 
 # install system dependencies here with apt-get.
 before_install:
 - sudo ./provision/debian.sh
-- python -m pip install --upgrade pip
+- python -m pip install --upgrade pip setuptools wheel
 
 # install python dependencies including this package in the travis
 # virtualenv
 install:
-
-- if [[ $TRAVIS_PYTHON_VERSION == 3.7 ]];
-then ./provision/python3.sh;
-fi
-- if [[ $TRAVIS_PYTHON_VERSION == 2.7 ]];
-then ./provision/python2.sh;
-fi
-- pip install .[pocketsphinx]
+- ./provision/python.sh
+- pip install .
 
 # commands to run the testing suite. if any of these fail, travic lets us know
 script:
 - cd tests && make && cd -
 - nosetests --with-coverage --cover-package=textract
 - cd tests && pytest && cd -
 # - pycodestyle textract/ bin/textract
-- if [[ $TRAVIS_PYTHON_VERSION == 3.7 ]];
-then cd docs && make html && cd -;
-fi
+- cd docs && make html && cd -;
 
 # commands to run after the tests successfully complete
 after_success:

Diff for: Vagrantfile

+1-2
@@ -27,8 +27,7 @@ Vagrant.configure("2") do |config|
 vb.customize ["modifyvm", :id, "--ioapic", "on"]
 vb.customize ["modifyvm", :id, "--cpus", "2"]
 vb.customize ["modifyvm", :id, "--memory", "2048"]
-override_config.vm.box = "trusty64"
-override_config.vm.box_url = "https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box"
+override_config.vm.box = "ubuntu/focal64"
 end
 
 # steps for provisioning so that these provisioning steps are

Diff for: bin/textract

File mode changed from 100644 to 100755.

Diff for: docs/changelog.rst

+5
@@ -10,6 +10,11 @@ latest changes in development for next release
 ----------------------------------------------
 
 .. THANKS FOR CONTRIBUTING; ADD YOUR UNRELEASED CHANGES HERE!
+1.7.0
+-------------------
+
+* Dropped python2 support
+
 1.6.5
 -------------------
 

Diff for: docs/conf.py

+1-1
@@ -58,7 +58,7 @@
 # built documents.
 #
 # The short X.Y version.
-release = version = "1.6.5"
+release = version = "1.7.0"
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

Diff for: docs/index.rst

+1-1
@@ -86,7 +86,7 @@ file types by either mentioning them on the `issue tracker
 
 * ``.wav`` via `SpeechRecognition`_ and `pocketsphinx`_
 
-* ``.xlsx`` via `xlrd <https://pypi.python.org/pypi/xlrd>`_
+* ``.xlsx`` via `openpyxl <https://pypi.python.org/pypi/openpyxl>`_
 
 * ``.xls`` via `xlrd <https://pypi.python.org/pypi/xlrd>`_
 
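The documentation now routes `.xlsx` to openpyxl while `.xls` stays with xlrd. That split matches xlrd's own history: xlrd 2.0 dropped support for every format except legacy `.xls`, and the `xlrd>=1.2.0` lower bound in requirements/python allows 2.x to be installed. The parser modules are not part of this excerpt, so the following is only a hypothetical sketch, not textract's code: `BACKENDS`, `backend_for` and `xlsx_text` are made-up names, and openpyxl is imported lazily so the dispatch helper works without it.

```python
import os

# Hypothetical dispatch mirroring the docs: .xlsx -> openpyxl, .xls -> xlrd.
BACKENDS = {".xlsx": "openpyxl", ".xls": "xlrd"}


def backend_for(filename: str) -> str:
    """Pick the library the docs list for this spreadsheet extension."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in BACKENDS:
        raise ValueError(f"not a spreadsheet extension: {extension!r}")
    return BACKENDS[extension]


def xlsx_text(filename: str) -> str:
    """Join every cell of every sheet into tab-separated lines via openpyxl."""
    import openpyxl  # third-party; imported lazily so backend_for works without it

    workbook = openpyxl.load_workbook(filename, read_only=True, data_only=True)
    try:
        lines = []
        for sheet in workbook.worksheets:
            for row in sheet.iter_rows():
                values = ["" if cell.value is None else str(cell.value) for cell in row]
                lines.append("\t".join(values).rstrip())
        return "\n".join(lines)
    finally:
        workbook.close()  # read-only workbooks keep the file open until closed
```

Keeping the extension-to-library mapping in one table makes the documentation change and the code change a single edit each.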

Diff for: provision/python3.sh renamed to provision/python.sh

+1-1
@@ -12,5 +12,5 @@ fi
 pip install -U pip
 
 # Install the requirements for this package as well as this module.
-pip install -r requirements/python-dev3
+pip install -r requirements/python-dev
 pip install -r requirements/python-doc

Diff for: provision/python2.sh

-15
This file was deleted.

Diff for: provision/travis-mock.sh

+1-1
@@ -6,7 +6,7 @@
 # if its a problem.
 # http://docs.travis-ci.com/user/languages/python/#Travis-CI-Uses-Isolated-virtualenvs
 sudo apt-get update -qq
-sudo apt-get install -y python-pip python-dev build-essential
+sudo apt-get install -y python3-pip python3-dev build-essential
 
 # install pep8 and nose for testing
 sudo pip install pep8 nose

Diff for: requirements/debian

+3-1
@@ -9,7 +9,7 @@ make
 
 # these packages are required by python-docx, which depends on lxml
 # and requires these things
-python-dev
+python3-dev
 libxml2-dev
 libxslt1-dev
 
@@ -48,3 +48,5 @@ swig
 # libxslt1-dev for compiling lxml.
 # https://github.com/deanmalmgren/textract/issues/19
 zlib1g-dev
+
+python-is-python3
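On Ubuntu 20.04, the new base for both the Docker image and the Vagrant box, the python-is-python3 package makes the bare `python` command run Python 3. That matters here because .travis.yml calls `python -m pip` by bare name. A quick way to confirm which interpreter a bare `python` resolves to is sketched below; `major_version` and `bare_python_is_python3` are hypothetical helper names, and the check simply asks the interpreter to print `sys.version_info[0]`.

```python
import shutil
import subprocess
import sys


def major_version(executable: str) -> int:
    """Ask the interpreter at `executable` for its major version number."""
    result = subprocess.run(
        [executable, "-c", "import sys; print(sys.version_info[0])"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def bare_python_is_python3() -> bool:
    """True if a `python` exists on PATH and it is a Python 3 interpreter."""
    path = shutil.which("python")
    return path is not None and major_version(path) == 3


if __name__ == "__main__":
    print("running under Python", major_version(sys.executable))
    print("bare `python` is Python 3:", bare_python_is_python3())
```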

Diff for: requirements/python

+11-10
@@ -1,13 +1,14 @@
 # This file contains all python dependencies that are required by the textract
 # package in order for it to properly work.
 
-argcomplete~=1.10.0
-beautifulsoup4~=4.8.0
-chardet==3.*
-docx2txt~=0.8
-extract-msg<=0.29.* #Last with python2 support
-pdfminer.six==20191110 #Last with python2 support
-python-pptx~=0.6.18
-six~=1.12.0
-SpeechRecognition~=3.8.1
-xlrd~=1.2.0
+argcomplete>=1.10.0
+beautifulsoup4>=4.8.0
+chardet>=3.*
+docx2txt>=0.8
+extract-msg>=0.29.*
+pdfminer.six>=20191110
+python-pptx>=0.6.18
+six>=1.12.0
+SpeechRecognition>=3.8.1
+xlrd>=1.2.0
+openpyxl>=2.0.0
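This hunk does more than drop the two python2-era pins: every compatible-release pin (`~=`) becomes a bare lower bound (`>=`), so pip may now install new major versions of these dependencies. Under PEP 440, `~=1.10.0` means `>=1.10.0, ==1.10.*`. Separately, `chardet>=3.*` and `extract-msg>=0.29.*` attach a `.*` wildcard to `>=`; PEP 440 only allows that suffix with `==` and `!=`, and recent releases of the `packaging` library, which pip uses to parse requirements, reject such specifiers. The standard-library sketch below illustrates both points. The helper names are hypothetical, and it deliberately handles only purely numeric versions such as `1.10.0`, not the full PEP 440 grammar.

```python
import re
from typing import Tuple

# Hypothetical helpers for plain numeric release versions ("1.10.0").
# Pre/post/dev releases, epochs and local versions are NOT handled.


def parse(version: str) -> Tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


def at_least(candidate: str, minimum: str) -> bool:
    """Semantics of `>=minimum`, padding the shorter version with zeros."""
    c, m = parse(candidate), parse(minimum)
    width = max(len(c), len(m))
    c += (0,) * (width - len(c))
    m += (0,) * (width - len(m))
    return c >= m


def compatible(candidate: str, base: str) -> bool:
    """Semantics of `~=base`: `~=1.10.0` is `>=1.10.0, ==1.10.*`."""
    prefix = parse(base)[:-1]
    if not prefix:
        raise ValueError("~= needs at least two release components")
    return at_least(candidate, base) and parse(candidate)[: len(prefix)] == prefix


# PEP 440 only permits a trailing ".*" with the == and != operators.
_SPECIFIER = re.compile(r"^\s*(~=|==|!=|<=|>=|<|>)\s*(\S+)\s*$")


def misuses_wildcard(specifier: str) -> bool:
    """True for specifiers such as ">=3.*" that pair ".*" with another operator."""
    match = _SPECIFIER.match(specifier)
    if match is None:
        raise ValueError(f"unrecognised specifier: {specifier!r}")
    operator, version = match.groups()
    return version.endswith(".*") and operator not in ("==", "!=")


if __name__ == "__main__":
    print(compatible("1.11.0", "1.10.0"))  # False: ~=1.10.0 stays on 1.10.x
    print(at_least("2.0", "1.10.0"))       # True: >=1.10.0 accepts 2.0
    print(misuses_wildcard(">=3.*"))       # True: as in chardet>=3.*
```

The trade-off is the usual one: `>=` keeps installs working on newer Pythons, at the cost of exposing the package to breaking major releases that `~=` would have excluded.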
File renamed without changes.

Diff for: requirements/python-dev2

-16
This file was deleted.

Diff for: requirements/python-doc

+2
@@ -1,5 +1,7 @@
 # this only includes packages that are needed for documentation build.
 
+jinja2<3.1
 sphinx==2.1.2
 sphinx_rtd_theme==0.4.3
 sphinx-argparse==0.2.5
+pocketsphinx==0.1.15

Diff for: setup.cfg

+1-1
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 1.6.5
+current_version = 1.7.0
 commit = True
 tag = True
 

Diff for: setup.py

+1-1
@@ -42,7 +42,7 @@ def parse_requirements(requirements_filename):
 
 setup(
 name=textract.__name__,
-version="1.6.5",
+version="1.7.0",
 description="extract text from any document. no muss. no fuss.",
 long_description=long_description,
 url=github_url,
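This commit changes the same version string in three files: `current_version` in setup.cfg, `version=` in setup.py, and `release = version =` in docs/conf.py. The `[bumpversion]` section in setup.cfg suggests the bumpversion tool performs these rewrites, though whether its configuration covers all three files is not visible in this diff. A hypothetical check that the three declarations agree might look like this; the regular expressions are written against the line formats shown above and are assumptions about the full files.

```python
import re
from typing import Dict

# Hypothetical check; each pattern matches the line format shown in this diff.
PATTERNS = {
    "setup.cfg": r"^current_version\s*=\s*(\S+)",
    "setup.py": r'^\s*version\s*=\s*"([^"]+)"',
    "docs/conf.py": r'^release\s*=\s*version\s*=\s*"([^"]+)"',
}


def declared_versions(sources: Dict[str, str]) -> Dict[str, str]:
    """Map each file name to the version it declares; `sources` maps name -> text."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, sources[name], re.MULTILINE)
        if match is None:
            raise ValueError(f"no version declaration found in {name}")
        found[name] = match.group(1)
    return found


def versions_agree(sources: Dict[str, str]) -> bool:
    """True when all three files declare the same version string."""
    return len(set(declared_versions(sources).values())) == 1


if __name__ == "__main__":
    sources = {
        "setup.cfg": "[bumpversion]\ncurrent_version = 1.7.0\ncommit = True\n",
        "setup.py": 'setup(\n    name=textract.__name__,\n    version="1.7.0",\n)\n',
        "docs/conf.py": '# The short X.Y version.\nrelease = version = "1.7.0"\n',
    }
    print(versions_agree(sources))  # True
```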

Diff for: tests/Dockerfile

+8-9
@@ -1,16 +1,15 @@
-FROM ubuntu:12.04
+FROM ubuntu:20.04
 MAINTAINER Shawn Milochik <[email protected]>
 ENV DEBIAN_FRONTEND noninteractive
-ENV REFRESHED_AT 2014-08-12b
+ENV REFRESHED_AT 2022-08-17
 RUN apt-get update
-RUN apt-get install python-pip -y
-ADD . /src
-WORKDIR /src
-RUN /bin/bash /src/provision/debian.sh
-RUN /bin/bash /src/provision/python.sh
+RUN apt-get install python3-pip -y
+ADD . /app
+WORKDIR /app
+RUN /bin/bash /app/provision/debian.sh
+RUN /bin/bash /app/provision/python.sh
 RUN adduser --disabled-password --gecos "" --home=/home/textract textract
-VOLUME ["/home/textract/src"]
 ENV PATH $PATH:/home/textract/src/bin
 ENV PYTHONPATH /home/textract/src
 USER textract
-ENTRYPOINT ["/home/textract/src/tests/run.py"]
+ENTRYPOINT ["/home/textract/src/tests/docker_entry.sh"]

Diff for: tests/docker_entry.sh

+1-1
@@ -3,4 +3,4 @@
 # This script gets called from within the
 # Docker container.
 
-./tests/run.py
+cd "$(dirname "$0")" && make && pytest && cd -

Diff for: tests/pdf/raw_text-m=pdfminer.txt

+36-4
@@ -1,59 +1,91 @@
 I  love  word  documents.  They  are  lovely.  They  make  me  so  happy  I  could  smile.  And  
 that’s  why  I  wrote  this  package.  
- 
 
 Sample text is hard. That’s
 where http://hipsum.co comes
 in handy.
 
- 
-
 Semiotics church-key VHS, Truffaut cliche actually vegan. Cray Austin
+
 pop-up disrupt letterpress, kitsch fixie Cosby sweater cliche craft beer
+
 PBR&B. Gentrify cornhole Tonx McSweeney's, Shoreditch keffiyeh
+
 ethnic Marfa 90's kogi American Apparel. Shabby chic distillery church-
+
 key locavore beard, food truck chillwave sartorial deep v flannel authentic
+
 Tumblr narwhal kogi organic. Cred vegan jean shorts Banksy forage
+
 Neutra dreamcatcher, hashtag Bushwick polaroid pork belly flannel
+
 keytar Portland post-ironic. Cred hoodie vegan, food truck leggings
+
 Austin pour-over banjo trust fund before they sold out cray Intelligentsia
+
 plaid typewriter. Williamsburg XOXO plaid Carles Austin tofu.
 
 Carles Tonx keffiyeh, leggings 90's lo-fi kogi viral semiotics Brooklyn
+
 biodiesel tousled bespoke kitsch. Vinyl Tonx art party Thundercats retro,
+
 viral asymmetrical artisan bicycle rights bitters master cleanse Kickstarter
+
 YOLO. Seitan street art semiotics twee skateboard, PBR&B VHS hashtag
+
 meh. Thundercats semiotics shabby chic forage single-origin coffee retro,
+
 3 wolf moon iPhone mumblecore 90's trust fund Intelligentsia. Beard
+
 gluten-free seitan, VHS sartorial pork belly gastropub meh whatever
+
 authentic synth. Beard single-origin coffee irony fixie, before they sold
 
+ 
+ 
 out Pitchfork kitsch readymade. Helvetica butcher wayfarers, lomo artisan
+
 hashtag Brooklyn four loko fanny pack 90's mustache 8-bit.
 
 Meh jean shorts selfies, crucifix selvage Helvetica Carles PBR Vice
+
 Banksy roof party master cleanse ugh PBR&B. Lo-fi freegan salvia photo
+
 booth, Wes Anderson skateboard Odd Future. Etsy art party Bushwick
+
 keffiyeh. Pork belly 3 wolf moon butcher mustache. YOLO raw denim lo-
+
 fi, hoodie gentrify Schlitz 8-bit sriracha Shoreditch retro brunch.
+
 Williamsburg farm-to-table beard, mlkshk Banksy fap kogi Etsy art party
+
 squid semiotics. XOXO church-key Pitchfork mlkshk irony tote bag.
 
 Farm-to-table brunch tattooed hoodie keytar, literally selvage authentic
+
 trust fund deep v Thundercats Kickstarter narwhal locavore. Swag disrupt
+
 chambray, leggings shabby chic gastropub YOLO plaid hoodie
+
 Williamsburg Godard mixtape. Retro Godard keytar biodiesel, freegan
+
 paleo Etsy you probably haven't heard of them Pitchfork Schlitz
+
 readymade small batch cred. Pug trust fund paleo, 90's fixie typewriter
+
 next level banjo. Banksy occupy authentic master cleanse Bushwick
+
 fingerstache selfies, direct trade craft beer cliche +1 cray. Locavore four
+
 loko biodiesel Neutra chia mlkshk. Fanny pack YOLO Portland, mlkshk
+
 PBR&B single-origin coffee drinking vinegar 8-bit flannel gentrify
+
 stumptown pop-up.
+
 Oh. You need a little dummy text for your mockup? How quaint.
 
 I bet you’re still using Bootstrap too…
 
 
-
 

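Every change to this fixture is whitespace: blank lines are inserted between the wrapped lines, and a few lines holding a single space are added or removed, while the words themselves are identical. Regenerating the fixture this way is consistent with the commit unpinning pdfminer.six (from `==20191110` to `>=20191110`), although the diff does not state the reason. The hypothetical helper below, which is not taken from textract's test suite, compares extracted text while ignoring whitespace, which makes it easy to confirm that two fixture versions carry the same content.

```python
def same_words(expected: str, actual: str) -> bool:
    """Compare two extracted texts while ignoring every whitespace difference.

    str.split() with no argument splits on runs of any whitespace and drops
    empty strings, so blank lines and lone spaces disappear.
    """
    return expected.split() == actual.split()


if __name__ == "__main__":
    old = "in handy.\n\n \n\nSemiotics church-key VHS\npop-up disrupt letterpress\n"
    new = "in handy.\n\nSemiotics church-key VHS\n\npop-up disrupt letterpress\n"
    print(same_words(old, new))  # True
    print(old == new)            # False
```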
Diff for: tests/run_docker_tests.sh

+2-7
@@ -5,15 +5,10 @@
 cd $(dirname $0)/..
 base=$(pwd)
 
-image="textract/ubuntu12.04"
-
-cp tests/Dockerfile ./Dockerfile
+image="textract/ubuntu20.04"
 
 # Note: For speed, the image won't be automatically rebuilt. If the dependencies
 # change and the existing image is outdated, just delete it with:
 # docker rmi <image name>
-docker images | grep $image || docker build -t $image .
+docker images | grep $image || docker build -t $image -f tests/Dockerfile .
 docker run --rm -v $base:/home/textract/src $image
-
-rm ./Dockerfile
-
