Commit 05f332d

Added a script to fetch models, changed the pipeline to use a model/tokenizer loaded from a local directory without caching in ~/.cache, and added a Dockerfile to build an image that runs the question answering API.
1 parent fb4e3d6 commit 05f332d

7 files changed: +111 −15 lines

.dockerignore (new file, +1)

models

.gitignore (new file, +1)

models

Dockerfile (new file, +33)

# Pull an image of Ubuntu 20.04 as the base for this container.
FROM ubuntu:20.04

# Set /app to be the root directory of the container.
WORKDIR /app

# Let's go ahead and copy requirements.txt into the container.
COPY ./requirements.txt /app/requirements.txt

# Now we install our dependencies. Don't need sudo. We're root.
RUN apt-get update -y && \
    apt-get install -y python3-pip python3-dev && \
    pip3 install -r requirements.txt

# Copy the rest of the code in after we've downloaded
# and installed the dependencies. That way code changes
# will not require reinstalling PyTorch for every typo.
COPY . /app

# This is the program run as the entrypoint to the container.
# I often comment this part out when testing so I can run:
#   docker run -it image_name:version_tag /bin/bash
# and take a look around inside the container.
ENTRYPOINT ["python3"]

# We're going to execute `python3 question_answering_api.py`
# as if we are in the /app directory whenever we use this container.
# This also needs to be commented out to run:
#   docker run -it image_name:version_tag /bin/bash
# for debugging purposes. Uncommenting it again and running:
#   docker build -t image_name:version_tag .
# will not take long at all unless you change requirements.txt.
CMD ["question_answering_api.py"]

README.md (+41 −13)

@@ -3,17 +3,11 @@
 Hello all! This is a little example of using :hugs: [huggingface transformers](https://github.com/huggingface/transformers) and [Flask-RESTful](https://flask-restful.readthedocs.io/en/latest/index.html) to create a question answering API.
 
 ### Install
-1. The only requirements are [Git](https://www.digitalocean.com/community/tutorials/how-to-install-git-on-ubuntu-20-04) and [Python3](https://docs.python-guide.org/starting/install3/linux/) with [pip](https://pip.pypa.io/en/stable/installing/) installed in a Linux environment. If you are using Windows I recommend [installing Ubuntu for Windows](https://ubuntu.com/tutorials/ubuntu-on-windows). If you don't have pip installed, you can open a terminal and enter:
-```bash
-curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
-python3 get-pip.py
-```
-2. In the same or a new terminal enter:
-```bash
-cd /path/to/question_answering_api # Where ever you forked it to. I don't know!
-python3 -m pip install requirements.txt
-```
-and that should install the requirements for the Question Answering API.
+The only real requirement is a Linux environment. If you are using Windows I recommend [installing Ubuntu for Windows](https://ubuntu.com/tutorials/ubuntu-on-windows). To install the needed software dependencies run:
+```bash
+cd /path/to/question_answering_api
+bash install_dependencies.sh
+```
 
 ### Usage
 1. #### Start the API server
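
As a quick sanity check after the install step above (a sketch, not part of this commit; it simply imports the packages listed in requirements.txt):

```bash
# If the install script succeeded, each of these imports should resolve.
python3 -c "import numpy, flask, flask_restful, transformers; print('dependencies OK')"
```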
@@ -54,9 +48,43 @@ Context:
 largest and most biodiverse tract of tropical rainforest
 in the world, with an estimated 390 billion individual
 trees divided into 16,000 species.
-
 Question:
 Which name is also used to describe the Amazon rainforest in English?
 Answer:
 Amazonia.
-```
+```
+
+### Docker
+
+To run the API inside a container you need to take the following steps:
+1. #### Install docker
+Follow the instructions [here](https://docs.docker.com/engine/install/) to install docker on your system.
+2. #### Download the model and tokenizer
+We don't want to put large machine learning models inside our containers if we don't have to, so we fetch the model from huggingface.co and mount it into the container as a volume. Open a terminal and run:
+```bash
+cd /path/to/question_answering_api
+bash fetch_model.sh
+```
+This will pull the model and save it to a directory we can mount as a volume for our container.
+3. #### Build the container
+In the same or a new terminal, run:
+```bash
+cd /path/to/question_answering_api # Optional if you're in the repo root already.
+# Build the container and name the image qa-api with version tag v1.
+docker build -t qa-api:v1 .
+```
+4. #### Start the container
+In the same terminal, type in:
+```bash
+# Map port 5000 in the container to localhost:5000, and mount the models
+# directory as a volume. Use an absolute path on the host side!
+docker run \
+    -p 5000:5000 \
+    -v /path/to/question_answering_api/models:/app/models \
+    qa-api:v1
+```
+5. #### Run the client
+In a new terminal window (just like before, we need two open), run the following:
+```bash
+# Make sure you're in the repo root!
+python3 question_answering_api.py
+```
+And that's it! If you want to host your container in the cloud, it's now as easy as `docker push`.
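
With the container from step 4 running, you can also smoke-test the API directly from the host. The route and JSON field names below are placeholders; match them to however the Resource is registered in question_answering_api.py, which this diff does not show:

```bash
# Hypothetical request shape; adjust the path and keys to the actual API.
curl -s -X POST http://localhost:5000/ \
    -H "Content-Type: application/json" \
    -d '{"question": "Which name is also used to describe the Amazon rainforest in English?", "context": "The Amazon rainforest, also known in English as Amazonia, is a moist broadleaf forest that covers most of the Amazon basin of South America."}'
```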

install_dependencies.sh (new file, +23)

#!/bin/bash

echo "Installing software dependencies"
echo "Requires sudo..."
sudo apt-get update -y
sudo apt-get install -y \
    git-lfs \
    python3-pip \
    python3-dev
pip3 install -r requirements.txt

echo "Fetching huggingface question answering model..."
mkdir models
cd models

echo "It complains that git lfs clone is the same as git clone,"
echo "but it isn't."
MODEL_NAME=distilbert-base-cased-distilled-squad
git lfs clone https://huggingface.co/$MODEL_NAME
echo "Model and tokenizer have been downloaded to models/${MODEL_NAME}!"
echo "Have a great day!"
# Move back into the repo root.
cd -
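
A quick way to confirm the fetch worked (not part of the commit; the exact file list depends on what the model repo ships, but config, tokenizer, and weight files should all be present and total a few hundred MB):

```bash
# Inspect what git-lfs actually pulled down.
ls -lh models/distilbert-base-cased-distilled-squad
du -sh models/distilbert-base-cased-distilled-squad
```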

question_answering_api.py (+11 −2)

@@ -2,7 +2,11 @@
 import time
 from flask import Flask, request
 from flask_restful import Resource, Api
-from transformers import pipeline
+from transformers import (
+    pipeline,
+    DistilBertTokenizerFast,
+    DistilBertForQuestionAnswering
+)
 
 # Initialize API.
 app = Flask(__name__)
@@ -14,8 +18,13 @@ def load_model():
     Create a transformers pipeline for question answering inference.
     '''
     print(' * Loading model...')
+    model_dir = 'models'
+    model_name = 'distilbert-base-cased-distilled-squad'
+    model_path = f'./{model_dir}/{model_name}'
     start = time.time()
-    nlp = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')
+    tokenizer = DistilBertTokenizerFast.from_pretrained(model_path)
+    model = DistilBertForQuestionAnswering.from_pretrained(model_path)
+    nlp = pipeline('question-answering', model=model, tokenizer=tokenizer)
     print(f' * Model loaded in {time.time()-start} seconds!')
     return nlp
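
Since the model and tokenizer now come from a local path, the server should start without touching the network or writing to ~/.cache. One way to check (a sketch; TRANSFORMERS_OFFLINE is honored by recent transformers releases, so treat it as version-dependent):

```bash
# With Hub access disabled, transformers raises an error on any attempted
# download instead of silently fetching and caching.
TRANSFORMERS_OFFLINE=1 python3 question_answering_api.py
```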

requirements.txt (+1)

@@ -1,3 +1,4 @@
+numpy
 Flask
 Flask-RESTful
 transformers
