Repository with experimental code for solving the NER (Named Entity Recognition) task with BERT (Bidirectional Encoder Representations from Transformers) and CRF (Conditional Random Fields).
This repository may be useful if:
- you want to reuse a ready-to-go BERT model for NER prediction in a TF Serving context;
- you want to train your own NER model or continue training an existing one;
- you have your own BIO dataset and would like to fit the model to it.
- Under your virtual environment, run:
pip install -r requirements.txt
- Install TensorFlow Serving.
- From the project folder, launch the docker image:
sudo docker run -t --rm -p 8501:8501 -v "${PWD}/res:/models" -e MODEL_NAME='BERT_NER_ESTIMATOR' -e MODEL_PATH='/models/BERT_NER_ESTIMATOR' --name='BERT_NER_ESTIMATOR' tensorflow/serving
- Now you can poll the Estimator with curl:
curl -d '{"instances": [{"input_ids": [212, 14, 513, 3,11], "input_masks": [1,1,1,1,1], "y_masks": [1,1,1,1,1]}]}' -X POST http://localhost:8501/v1/models/BERT_NER_ESTIMATOR:predict
Output:
{
  "predictions": [
    ["B-PER", "O", "O", "B-PER", "O"]
  ]
}
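The same request can be made from Python. A minimal sketch using only the standard library; the model name, port, and the feature names input_ids, input_masks, and y_masks follow the curl example above:

```python
import json
import urllib.request

def build_request(input_ids, input_masks, y_masks):
    """Build the JSON payload expected by the TF Serving REST predict API."""
    payload = {"instances": [{
        "input_ids": input_ids,
        "input_masks": input_masks,
        "y_masks": y_masks,
    }]}
    return json.dumps(payload).encode("utf-8")

def predict(body, url="http://localhost:8501/v1/models/BERT_NER_ESTIMATOR:predict"):
    """POST the payload to the serving container and return the predictions."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["predictions"]

body = build_request([212, 14, 513, 3, 11], [1] * 5, [1] * 5)
# predict(body) returns [["B-PER", "O", "O", "B-PER", "O"]]
# once the docker container above is running.
```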
If you would like to test component predictions with easy-to-debug Python scripts:
python ner_predict.py
A TF Estimator is the core of the BERT NER component; unfortunately, BERT requires specific preprocessing, implemented
in bert_ner_preprocessor.py. So if you have data in the CoNLL-2003 NER format (suppose it is placed in
the file <BIO_DATASET_PATH>), you need to convert it to the TensorFlow-friendly tfrecord format.
General usage:
cat <BIO_DATASET_PATH> | ./bio2tf.py <TFRECORD_DATASET_PATH>
cat data/train.txt | ./bio2tf.py data/train.tfrecord
cat data/valid.txt | ./bio2tf.py data/valid.tfrecord
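The internals of bio2tf.py are not reproduced here, but the reading side can be sketched as follows, assuming the usual CoNLL-2003 column layout (token in the first column, BIO tag in the last, blank line between sentences):

```python
def read_bio(lines):
    """Parse CoNLL-style lines into a list of (tokens, tags) sentences."""
    tokens, tags, sentences = [], [], []
    for line in lines:
        line = line.strip()
        if not line:                      # blank line ends a sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])           # first column: the token
        tags.append(parts[-1])            # last column: the BIO tag
    if tokens:                            # flush the final sentence
        sentences.append((tokens, tags))
    return sentences

sample = ["John B-PER", "lives O", "in O", "Berlin B-LOC", "", "Hi O"]
print(read_bio(sample))
# -> [(['John', 'lives', 'in', 'Berlin'], ['B-PER', 'O', 'O', 'B-LOC']),
#     (['Hi'], ['O'])]
```

Each (tokens, tags) pair would then be tokenized for BERT and serialized as a tf.train.Example into the tfrecord file.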
If you want to fit the component to your own data (assuming you have the same set of entities to be predicted by NER), convert your dataset into the tfrecord format. Once you have the tfrecord dataset (placed in <TFRECORD_DATASET_PATH>), you can launch the training process with the following command:
python ner_train.py --train_dataset data/train.tfrecord --model_save_path res/BERT_NER_ESTIMATOR --training_steps 2
For example, with a custom batch size on the lowercased dataset:
python ner_train.py --batch_size 29 --model_save_path res/BERT_NER_ESTIMATOR --train_dataset data/train_lowercased.tfrecord --training_steps 2
If you want to create a new CRF model and fit it to your data, specify the model_save_path attribute in the CLI options:
python ner_train.py --train_dataset data/train.tfrecord --model_save_path res/my_NER_estimator --training_steps 20
To evaluate a trained model on a tfrecord dataset:
python ner_evaluate.py --dataset data/valid.tfrecord
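The internals of ner_evaluate.py are not shown here; for BIO tagging, entity-level metrics are usually computed over extracted spans rather than individual tags. A minimal span-extraction sketch (the helper name is hypothetical, not from the repo):

```python
def bio_spans(tags):
    """Extract (entity_type, start, end_exclusive) spans from a BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and etype != tag[2:]):
            if start is not None:         # close the previous span
                spans.append((etype, start, i))
            start, etype = i, tag[2:]     # open a new span
        elif tag == "O":
            if start is not None:
                spans.append((etype, start, i))
            start, etype = None, None
    if start is not None:                 # flush a span ending at the sequence end
        spans.append((etype, start, len(tags)))
    return spans

print(bio_spans(["B-PER", "I-PER", "O", "B-LOC", "O"]))
# -> [('PER', 0, 2), ('LOC', 3, 4)]
```

Precision, recall, and F1 can then be computed by comparing the span sets of the gold and predicted tag sequences.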
If you want to lowercase a dataset in BIO markup, you can use the script dataset_lowercaser.py.
Run dataset_lowercaser.py --help for usage hints.
Example:
python dataset_lowercaser.py --cased_dataset data/train.txt --target_path data/train_lowercased.txt
If you want non-deterministic lowercasing, you can run with the lowercasing_probability option:
python dataset_lowercaser.py --cased_dataset data/train.txt --target_path data/train_lowercased.txt --lowercasing_probability=0.8
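dataset_lowercaser.py itself is not reproduced here; the core idea behind --lowercasing_probability can be sketched as follows (the function name is hypothetical, and whether the real script decides per token or per sentence is an assumption):

```python
import random

def maybe_lowercase(tokens, probability=1.0, seed=None):
    """Lowercase each token independently with the given probability."""
    rng = random.Random(seed)
    return [t.lower() if rng.random() < probability else t for t in tokens]

tokens = ["John", "lives", "in", "Berlin"]
print(maybe_lowercase(tokens, probability=1.0))   # every token lowercased
print(maybe_lowercase(tokens, probability=0.0))   # tokens left unchanged
```

Such stochastic lowercasing acts as data augmentation: the model sees both cased and uncased variants, so it relies less on capitalization as an entity cue.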
Useful commands:
- List running docker containers:
sudo docker ps
- Stop the serving container:
sudo docker stop BERT_NER_ESTIMATOR
- Launch serving with models from another directory:
sudo docker run -t --rm -p 8501:8501 -v "${PWD}/resources/models_for_serving:/models" -e MODEL_NAME='BERT_NER_ESTIMATOR' -e MODEL_PATH='/models/BERT_NER_ESTIMATOR' --name='BERT_NER_ESTIMATOR' tensorflow/serving
- Monitor training with TensorBoard (point it at your model_save_path directory):
tensorboard --logdir=<MODEL_SAVE_PATH>

Reference on serving a TensorFlow Estimator:
https://guillaumegenthial.github.io/serving-tensorflow-estimator.html