MAINT: Fix build failures and execution timeouts #68

rossbar · 2021-03-16T04:54:21Z

This PR combines a plethora of tweaks and fixes to make the build/test/deploy infrastructure more reliable. The end result should be a reduction in CI failures due to things like resource requirements and execution timeouts.

Note that this also includes some changes to the tutorials themselves, most notably the deep learning tutorials, to improve their usability in resource-constrained environments like binder. Two changes to tutorial content need special attention:

The evaluation step of the mnist tutorial has been vectorized. This has the advantage of demonstrating vectorization, but also significantly improves run-time performance.
Reducing the number of episodes in the RL tutorial from 10 -> 3 (along with batch size). This is entirely a run-time argument: running the deep-rl tutorial in e.g. CI or binder would take on the order of 10 minutes for the training step. Furthermore, since there is no tangible output from the training, I decided to reduce it further to make the tutorial more usable. @8bitmp3, your opinion on whether this is acceptable would be much appreciated.
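To illustrate what vectorizing the evaluation step means, here is a minimal sketch comparing a per-image Python loop with a single batched forward pass. It assumes a hypothetical two-layer ReLU network with random weights and random stand-in data; the names (`weights_1`, `weights_2`, `accuracy_loop`, `accuracy_vectorized`) and layer sizes are illustrative and are not the tutorial's actual code.

```python
import numpy as np

# Hypothetical shapes and names for illustration only; the tutorial's
# actual variables and layer sizes may differ.
rng = np.random.default_rng(0)
n_images, n_pixels, n_hidden, n_classes = 1000, 784, 100, 10

test_images = rng.random((n_images, n_pixels))
test_labels = rng.integers(0, n_classes, size=n_images)
weights_1 = 0.2 * rng.random((n_pixels, n_hidden)) - 0.1
weights_2 = 0.2 * rng.random((n_hidden, n_classes)) - 0.1


def relu(x):
    return np.maximum(0, x)


def accuracy_loop(images, labels):
    """Evaluate one image at a time in a Python-level loop."""
    correct = 0
    for image, label in zip(images, labels):
        layer_1 = relu(image @ weights_1)  # shape (n_hidden,)
        layer_2 = layer_1 @ weights_2      # shape (n_classes,)
        correct += int(np.argmax(layer_2) == label)
    return correct / len(images)


def accuracy_vectorized(images, labels):
    """Evaluate the whole test set with two matrix products."""
    layer_1 = relu(images @ weights_1)        # shape (n_images, n_hidden)
    layer_2 = layer_1 @ weights_2             # shape (n_images, n_classes)
    predictions = np.argmax(layer_2, axis=1)  # one predicted class per row
    return np.mean(predictions == labels)
```

Both functions should report the same accuracy; the vectorized version replaces `n_images` Python-level iterations with two matrix products that NumPy dispatches to compiled code, which is where the run-time improvement comes from.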

Most of the other changes are minor build fixes (broken links, docs missing from toctrees, etc.) or additional CI infrastructure needed for new dependencies (particularly atari-py). Once this PR is in, we can also cache the tutorial dependencies, which will significantly reduce build time since we can skip building the atari-py wheel from scratch on every run. I'd also like to experiment with caching the tutorial execution during the build process, which also depends on this PR.


rossbar · 2021-03-16T05:49:51Z

There's also an issue with the atari-py dependency - see #69. b7e8c3c is a temporary workaround for CI.

melissawm

LGTM, @rossbar! Thanks!

8bitmp3 · 2021-03-20T15:35:35Z

@rossbar that's some magic! Thank you 🙏

> Reducing the number of episodes in the RL tutorial from 10 -> 3 (along with batch size). This is entirely a run-time argument: running the deep-rl tutorial in e.g. CI or binder would take on the order of 10 minutes for the training step. Furthermore, since there is no tangible output from the training, I decided to reduce it further to make the tutorial more usable. @8bitmp3 your opinion on whether this is acceptable would be much appreciated.

The original 10 steps were already too low, but we made it clear here: "...let's limit the number of episodes for training to 3. If you are using hardware acceleration (CPUs and GPUs), you can increase the number to 1,000 or beyond."

> The evaluation step of the mnist tutorial has been vectorized. This has the advantage of demonstrating vectorization, but also significantly improves run-time performance.

That's amazing.

rossbar added 4 commits March 15, 2021 21:43

Fix broken external link to mybinder.

7e2e6ca

Add pairing tutorial to toctree in site.

876c5fc

Fixes sphinx warning about orphaned doc. Replicate text from readme in index.

Add requirements for rl tutorial.

71d0f26

Bump execution timeout for CI.

0404f51

rossbar changed the title ~~MAINT: Fix build failures related to pairing.md~~ MAINT: Fix build failures from new tutorial additions Mar 16, 2021

Add build dependencies for atari-py.

b7e8c3c

rossbar added 5 commits March 16, 2021 10:27

Bump circleci context timeout limit.

b89cafa

Add ffmpeg dep for atari-py.

7f5f31a

PERF: subsample test image set in mnist tutorial.

b3cb85e

Vectorize model evaluation in mnist tutorial.

3695ab6

Update wording and numbering in code comments.

e65ac70

rossbar mentioned this pull request Mar 16, 2021

ENH: Vectorize model evaluation in mnist tutorial #67

Closed

Only apply one-hot encoding to subsets.

43c9e02
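The two mnist data-preparation commits above (subsampling the test image set and applying one-hot encoding only to the subsets) come down to an ordering choice: slice the label arrays first, then encode, so no 10-column float array is built for samples that are never used. A minimal sketch, assuming random stand-in labels and hypothetical names and subset sizes rather than the tutorial's actual variables:

```python
import numpy as np


def one_hot_encoding(labels, dimension=10):
    """Return a float array of shape (len(labels), dimension) with one 1.0 per row."""
    # Broadcast (n, 1) against (1, dimension) to get an (n, dimension) boolean mask.
    one_hot_labels = labels[..., None] == np.arange(dimension)[None]
    return one_hot_labels.astype(np.float64)


# Hypothetical stand-ins for the full MNIST label arrays (60,000 train, 10,000 test).
rng = np.random.default_rng(42)
full_train_labels = rng.integers(0, 10, size=60_000)
full_test_labels = rng.integers(0, 10, size=10_000)

# Hypothetical subset sizes; taking the first N samples is an assumption.
training_sample, test_sample = 1000, 1000

# Slice first, then encode: only the rows actually used are expanded to 10 columns.
train_labels_subset = full_train_labels[:training_sample]
test_labels_subset = full_test_labels[:test_sample]
training_labels = one_hot_encoding(train_labels_subset)
test_labels = one_hot_encoding(test_labels_subset)

print(training_labels.shape, test_labels.shape)  # (1000, 10) (1000, 10)
```

For comparison, encoding the full 60,000-label training array would allocate a 60,000 × 10 float64 array (about 4.8 MB), most of which would be discarded once only a subset is used.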

rossbar changed the title ~~MAINT: Fix build failures from new tutorial additions~~ MAINT: Fix build failures and execution timeouts Mar 16, 2021

rossbar added 2 commits March 16, 2021 13:27

Fix broken image links in RL tutorial.

69075f0

PERF: reduce RL episodes and batch size from 10 to 3.

14eb968

melissawm reviewed Mar 17, 2021


melissawm merged commit 8d0f765 into numpy:main Mar 17, 2021
