This repository was archived by the owner on Jan 15, 2024. It is now read-only.

Commit 01122db

sxjscience and leezu authored
[Numpy] Numpy version of GluonNLP (#1225)
* numpy version
* Enable Github Actions
* Update unittests.yml
* Update unittests.yml
* Update setup.py
* fix test
* Update README.md
* Update test_models_bert.py
* Update tmpdir
* Enable codecov
* fix a commit id
* Separate codecov per platform
* Revert "Update tmpdir". This reverts commit 6625af9. pytest-dev/pytest#1120
* Remove files
* add symlinks
* update Merge conversion toolkits update unittests by fixing the version update datasets add scripts Delete __init__.py add src update Update setup.py Update setup.py update all tests revise test cases Update unittests.yml Update initializer.py Create preprocessing.py Update __init__.py Update attention_cell.py Update prepare_wmt.py move ubuntu + windows to TODO
* Update unittests.yml
* fix alpha in sentencepiece
* fix bug
* update
* fix README
* Update unittests.yml
* Update README.md
* update

Co-authored-by: Leonard Lausen <[email protected]>
1 parent de7b23d commit 01122db


146 files changed: 28464 additions, 11 deletions.

Large commits have some content hidden by default; only a subset of the 146 changed files is shown below.

.flake8

+4
@@ -0,0 +1,4 @@
```ini
[flake8]
max-line-length = 100
max-complexity = 18
exclude = tests,__init__.py
```
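The `.flake8` file is plain INI syntax, so its settings can be read with Python's standard `configparser`. The sketch below is a hypothetical helper, not part of flake8 or GluonNLP: `load_flake8_settings` and `long_lines` are illustrative names, the line-length check is only a rough approximation of flake8's E501 (assuming length is counted in characters), and `max-complexity` (the McCabe threshold) is not modeled.

```python
import configparser
from typing import List, Tuple

# Contents of the .flake8 file added in this commit.
FLAKE8_CFG = """\
[flake8]
max-line-length = 100
max-complexity = 18
exclude = tests,__init__.py
"""


def load_flake8_settings(text: str) -> Tuple[int, List[str]]:
    """Return (max_line_length, exclude_patterns) from a .flake8-style config."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["flake8"]
    max_len = section.getint("max-line-length")
    # `exclude` is a comma-separated list of patterns.
    raw = section.get("exclude", fallback="")
    excludes = [p.strip() for p in raw.split(",") if p.strip()]
    return max_len, excludes


def long_lines(source: str, max_len: int) -> List[int]:
    """1-based numbers of lines longer than max_len (rough stand-in for E501)."""
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if len(line) > max_len]


if __name__ == "__main__":
    limit, patterns = load_flake8_settings(FLAKE8_CFG)
    print("limit:", limit, "excluded:", patterns)
```

A line of exactly 100 characters passes; 101 characters is reported.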

.github/workflows/unittests.yml

+46
@@ -0,0 +1,46 @@
```yaml
name: continuous build

on: [push, pull_request]

defaults:
  run:
    shell: bash

jobs:
  unittest:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        # TODO Add ubuntu test by "ubuntu-latest", Add windows test by using "windows-latest"
        os: [macos-latest]
        python-version: [ '3.6', '3.7', '3.8' ]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      # Install OS specific dependencies
      - name: Install Linux dependencies
        if: matrix.os == 'ubuntu-latest'
        # TODO https://github.com/apache/incubator-mxnet/issues/18293
        run: sudo apt-get install libopenblas-dev

      - name: Setup python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      - name: Install Other Dependencies
        run: |
          python -m pip install --user --upgrade pip
          python -m pip install --user setuptools pytest pytest-cov
          python -m pip install --upgrade cython
          python -m pip install --pre --user mxnet==2.0.0b20200604 -f https://dist.mxnet.io/python
          python -m pip install --user -e .[extras]
      - name: Test project
        run: |
          python -m pytest --cov=./ --cov-report=xml --durations=50 tests/
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v1
        with:
          env_vars: OS,PYTHON
```
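GitHub Actions runs one job per combination of the values listed under `strategy.matrix`. The sketch below models that cross-product with `itertools.product` to show how many jobs this workflow produces and which of them would run the Linux-only step. It is a simplified model under stated assumptions: `expand_matrix` is a hypothetical helper, and `include`/`exclude` matrix entries (unused in this workflow) are ignored.

```python
from itertools import product
from typing import Dict, List

# Values copied from the `strategy.matrix` section of unittests.yml.
MATRIX: Dict[str, List[str]] = {
    "os": ["macos-latest"],
    "python-version": ["3.6", "3.7", "3.8"],
}


def expand_matrix(matrix: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return one job configuration per combination of matrix values."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


jobs = expand_matrix(MATRIX)
for job in jobs:
    # Mirrors the step condition `if: matrix.os == 'ubuntu-latest'`.
    runs_linux_step = job["os"] == "ubuntu-latest"
    print(job, "linux deps step:", runs_linux_step)
```

With the current matrix this yields three macOS jobs, none of which runs the `libopenblas-dev` step; resolving the TODO by adding `ubuntu-latest` and `windows-latest` would give nine jobs.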

.gitmodules

-10
This file was deleted.

.pytype.cfg

+1 -1
@@ -5,4 +5,4 @@ inputs =
```diff
     src/gluonnlp
 
 # Python version (major.minor) of the target code.
-python_version = 3.5
+python_version = 3.6
```

README.md

+72
@@ -0,0 +1,72 @@
````markdown
# GluonNLP + Numpy

This project implements NLP algorithms using the new NumPy-like interface of MXNet. It also serves as a testbed for the next-generation release of GluonNLP.

This is a work in progress.


# Features

- Data Pipeline for NLP
- AutoML support (TODO)
- Pretrained Model Zoo
- Fast Deployment
    - [TVM](https://tvm.apache.org/) (TODO)
- AWS Integration


# Installation
First, install the MXNet 2.0 pre-release build that the CI tests against (`2.0.0b20200604`). You may use one of the following commands:

```bash
# Install the version with CUDA 10.1
pip install -U --pre mxnet-cu101==2.0.0b20200604 -f https://dist.mxnet.io/python

# Install the CPU-only version
pip install -U --pre mxnet==2.0.0b20200604 -f https://dist.mxnet.io/python
```


Then install GluonNLP from the root of the repository:

```bash
pip install -U -e .

# Optionally, also install all the extra requirements
pip install -U -e .[extras]

# If you use zsh, quote the extras so that zsh does not treat the brackets as a glob pattern
pip install -U -e ."[extras]"
```

If you do not have permission to install packages system-wide, install to the user folder instead:

```bash
pip install -U -e . --user
```

For Windows users, we recommend using the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/about).


# Access the Command-line Toolkits

To help researchers and engineers, we provide command-line toolkits for
downloading and preprocessing NLP datasets. For more details, refer to
[GluonNLP Datasets](./scripts/datasets) and [GluonNLP Preprocessing Tools](./scripts/preprocess).

```bash
# CLI for downloading / preparing the datasets
nlp_data help

# CLI for accessing some common data preprocessing scripts
nlp_preprocess help

# Alternatively, use `python -m` to access the toolkits
python -m gluonnlp.cli.data help
python -m gluonnlp.cli.preprocess help
```

# Run Unittests
See [tests](tests) for instructions on running the unit tests.
````
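After the installation steps, a quick way to confirm that both packages are discoverable by the current interpreter is sketched below. `check_importable` is a hypothetical helper built on the standard library's `importlib.util.find_spec`; it only checks that a top-level module can be located on `sys.path`. It does not import the module, so it cannot tell whether MXNet's native library loads or whether the pinned `2.0.0b20200604` build is the one installed.

```python
import importlib.util
from typing import Dict, Iterable


def check_importable(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found.

    Only pass top-level names: for dotted names, find_spec imports the
    parent package and may raise ModuleNotFoundError.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_importable(["mxnet", "gluonnlp"]).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```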

conftest.py

+208
@@ -0,0 +1,208 @@
```python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""conftest.py contains configuration for pytest.

Configuration file for tests in tests/ and scripts/ folders.

Note that higher-scoped fixtures (such as ``session``) are
instantiated before lower-scoped fixtures (such as ``function``).

"""

import logging
import os
import random

import numpy as np
import mxnet as mx
import gluonnlp
import pytest


def pytest_sessionfinish(session, exitstatus):
    if exitstatus == 5:  # Don't fail if no tests were run
        session.exitstatus = 0


# * Random seed setup
def pytest_configure():
    """Pytest configuration hook to help reproduce test segfaults

    Sets and outputs rng seeds.

    The segfault-debug procedure on a module called test_module.py is:

    1. run "pytest --verbose test_module.py". A seg-faulting output might be:

       [INFO] np, mx and python random seeds = 4018804151
       test_module.test1 ... ok
       test_module.test2 ... Illegal instruction (core dumped)

    2. Copy the module-starting seed into the next command, then run:

       MXNET_MODULE_SEED=4018804151 pytest --log-level=DEBUG --verbose test_module.py

       Output might be:

       [WARNING] **** module-level seed is set: all tests running deterministically ****
       [INFO] np, mx and python random seeds = 4018804151
       test_module.test1 ... [DEBUG] np and mx random seeds = 3935862516
       ok
       test_module.test2 ... [DEBUG] np and mx random seeds = 1435005594
       Illegal instruction (core dumped)

    3. Copy the segfaulting-test seed into the command:

       MXNET_TEST_SEED=1435005594 pytest --log-level=DEBUG --verbose test_module.py::test2

       Output might be:

       [INFO] np, mx and python random seeds = 2481884723
       test_module.test2 ... [DEBUG] np and mx random seeds = 1435005594
       Illegal instruction (core dumped)

    4. Finally reproduce the segfault directly under gdb (might need additional os packages)
       by editing the bottom of test_module.py to be

       if __name__ == '__main__':
           logging.getLogger().setLevel(logging.DEBUG)
           test2()

       MXNET_TEST_SEED=1435005594 gdb -ex r --args python test_module.py

    5. When finished debugging the segfault, remember to unset any exported MXNET_ seed
       variables in the environment to return to non-deterministic testing (a good thing).
    """

    module_seed_str = os.getenv('MXNET_MODULE_SEED')
    if module_seed_str is None:
        seed = np.random.randint(0, np.iinfo(np.int32).max)
    else:
        seed = int(module_seed_str)
        logging.warning('*** module-level seed is set: '
                        'all tests running deterministically ***')
    print('Setting module np/mx/python random seeds, '
          'use MXNET_MODULE_SEED={} to reproduce.'.format(seed))

    np.random.seed(seed)
    mx.npx.random.seed(seed)
    random.seed(seed)

    # The MXNET_TEST_SEED environment variable will override MXNET_MODULE_SEED for tests with
    # the 'with_seed()' decoration. Inform the user of this once here at the module level.
    if os.getenv('MXNET_TEST_SEED') is not None:
        logging.warning('*** test-level seed set: all "@with_seed()" '
                        'tests run deterministically ***')


@pytest.hookimpl(tryfirst=True, hookwrapper=True)
def pytest_runtest_makereport(item, call):
    """Make test outcome available to fixture.

    https://docs.pytest.org/en/latest/example/simple.html#making-test-result-information-available-in-fixtures
    """
    # execute all other hooks to obtain the report object
    outcome = yield
    rep = outcome.get_result()

    # set a report attribute for each phase of a call, which can
    # be "setup", "call", "teardown"
    setattr(item, "rep_" + rep.when, rep)


@pytest.fixture(scope='function', autouse=True)
def function_scope_seed(request):
    """A function scope fixture that manages rng seeds.

    This fixture automatically initializes the python, numpy and mxnet random
    number generators randomly on every test run.

    def test_ok_with_random_data():
        ...

    To fix the seed used for a test case mark the test function with the
    desired seed:

    @pytest.mark.seed(1)
    def test_not_ok_with_random_data():
        '''This testcase actually works.'''
        assert 17 == random.randint(0, 100)

    When a test fails, the fixture outputs the seed used. The user can then set
    the environment variable MXNET_TEST_SEED to the value reported, then rerun
    the test with:

        pytest --verbose -s <test_module_name.py> -k <failing_test>

    To run a test repeatedly, install pytest-repeat and add the --count argument:

        pip install pytest-repeat
        pytest --verbose -s <test_module_name.py> -k <failing_test> --count 1000

    """

    seed = request.node.get_closest_marker('seed')
    env_seed_str = os.getenv('MXNET_TEST_SEED')

    if seed is not None:
        seed = seed.args[0]
        assert isinstance(seed, int)
    elif env_seed_str is not None:
        seed = int(env_seed_str)
    else:
        seed = np.random.randint(0, np.iinfo(np.int32).max)

    post_test_state = np.random.get_state()
    np.random.seed(seed)
    mx.random.seed(seed)
    random.seed(seed)

    seed_message = ('np/mx/python random seeds are set to '
                    '{}, use MXNET_TEST_SEED={} to reproduce.')
    seed_message = seed_message.format(seed, seed)

    # Always log seed on DEBUG log level. This makes sure we can find out the
    # value of the seed even if the test case causes a segfault and subsequent
    # teardown code is not run.
    logging.debug(seed_message)

    yield  # run the test

    if request.node.rep_setup.failed:
        # logging uses %-style placeholders, not str.format-style {}
        logging.info("Setting up a test failed: %s", request.node.nodeid)
    elif request.node.rep_call.outcome == 'failed':
        # Either request.node.rep_setup.failed or request.node.rep_setup.passed
        # should be True
        assert request.node.rep_setup.passed
        # On failure also log seed on INFO log level
        logging.info(seed_message)

    np.random.set_state(post_test_state)


# * Shared test fixtures
@pytest.fixture(params=[True, False])
def hybridize(request):
    return request.param


@pytest.fixture(autouse=True)
def doctest(doctest_namespace):
    doctest_namespace['np'] = np
    doctest_namespace['gluonnlp'] = gluonnlp
    doctest_namespace['mx'] = mx
    doctest_namespace['gluon'] = mx.gluon
    import doctest
    doctest.ELLIPSIS_MARKER = '-etc-'
```
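The seeding in `conftest.py` works at two levels. `pytest_configure` seeds the global generators once per pytest run (from `MXNET_MODULE_SEED` or a random draw). `function_scope_seed` then draws each test's seed from that generator, saves the NumPy state, reseeds, runs the test, and restores the saved state. The sketch below models this using only Python's `random` module, an assumption made so it runs without NumPy or MXNet; `run_with_seeds`, `light`, and `heavy` are hypothetical names.

```python
import random
from typing import Callable, List

# Mirrors np.random.randint(0, np.iinfo(np.int32).max): values in [0, 2**31 - 2].
INT32_MAX = 2**31 - 1


def run_with_seeds(module_seed: int, tests: List[Callable[[], None]]) -> List[int]:
    """Run each test under its own seed drawn from a module-level RNG.

    Stdlib stand-in for conftest.py; returns the per-test seeds used.
    """
    random.seed(module_seed)                     # like pytest_configure()
    used_seeds = []
    for test in tests:
        seed = random.randint(0, INT32_MAX - 1)  # like np.random.randint(...)
        saved_state = random.getstate()          # like np.random.get_state()
        random.seed(seed)                        # like reseeding np/mx/python
        test()                                   # may consume any amount of randomness
        random.setstate(saved_state)             # like np.random.set_state(...)
        used_seeds.append(seed)
    return used_seeds


def light() -> None:
    random.random()


def heavy() -> None:
    for _ in range(1000):
        random.random()


if __name__ == "__main__":
    print(run_with_seeds(4018804151, [light, heavy]))
```

Because the state is restored after every test, the seed given to the n-th test depends only on the module seed and n, not on how much randomness earlier tests consumed. This is why rerunning with the same `MXNET_MODULE_SEED` reproduces the per-test seeds that are logged at DEBUG level.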

pytest.ini

+8
@@ -0,0 +1,8 @@
```ini
[pytest]
markers =
    seed: set the python, numpy and mxnet random seeds to a specified value for test reproducibility
    serial: mark a test that requires more resources to run and is thus only suitable for serial runs.
    remote_required: mark a test that requires internet access.
    gpu: mark a test that requires GPU.
    integration: mark an integration test
    skip_master: mark a test that is temporarily skipped for mxnet master validation.
```
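The `seed` marker registered here is read by `function_scope_seed` in `conftest.py`, which picks each test's seed in a fixed order: the marker value, then the `MXNET_TEST_SEED` environment variable, then a random draw. The sketch below isolates that precedence as a pure function. `choose_test_seed` is a hypothetical name; the environment is passed in as a mapping so it can be exercised without touching `os.environ`; the random fallback uses Python's `random` instead of `np.random`; and an invalid marker raises `TypeError` where `conftest.py` uses an `assert`.

```python
import random
from typing import Mapping, Optional

# Same upper bound as np.random.randint(0, np.iinfo(np.int32).max).
INT32_MAX = 2**31 - 1


def choose_test_seed(marker_seed: Optional[int], env: Mapping[str, str]) -> int:
    """Pick a per-test seed: @pytest.mark.seed(N) > MXNET_TEST_SEED > random draw."""
    if marker_seed is not None:
        if not isinstance(marker_seed, int):
            raise TypeError("the seed marker expects an int")
        return marker_seed
    env_seed = env.get("MXNET_TEST_SEED")
    if env_seed is not None:
        return int(env_seed)
    return random.randint(0, INT32_MAX - 1)


if __name__ == "__main__":
    print(choose_test_seed(None, {"MXNET_TEST_SEED": "1435005594"}))
```

An explicit marker therefore wins even when `MXNET_TEST_SEED` is exported, so a test pinned with `@pytest.mark.seed(1)` is unaffected by the segfault-debugging procedure described in `conftest.py`.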

scripts/__init__.py

Whitespace-only changes.
