Implement AD testing and benchmarking (hand rolled) #882

Merged: 6 commits, Apr 14, 2025

penelopeysm (Member) commented Apr 4, 2025

One of two options. The other one at #883.

This PR implements functionality for testing and benchmarking AD. It is largely copied over from my ModelTests repo where I've been playing around with this.

Closes #869

What does it contain?

It basically adds one function DynamicPPL.TestUtils.AD.run_ad. See the docstring for more info.
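
For readers who want a feel for how this is meant to be called, here is a minimal sketch. The toy model `demo` is hypothetical, and the `benchmark` keyword is assumed from the docstring at the time of this PR, so check the docstring for the version you have installed before relying on it.

```julia
using DynamicPPL, Distributions
using ADTypes: AutoForwardDiff
import ForwardDiff  # the backend package itself must be loaded

# Hypothetical toy model, used only for illustration.
@model function demo(y)
    x ~ Normal()
    y ~ Normal(x, 1)
end

model = demo(1.5)

# Check the gradient of the model's log density under ForwardDiff against
# the reference backend, and also time it.
# Assumption: `benchmark` is a keyword argument, as described in the docstring.
result = DynamicPPL.TestUtils.AD.run_ad(model, AutoForwardDiff(); benchmark=true)
display(result)
```

By default the call is expected to raise an error if the value or gradient disagrees with the reference, so it can be dropped into a test suite as-is.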

Why not an extension?

The only new dependencies are Statistics, which is stdlib, and Chairmarks, which itself has no non-stdlib dependencies. I therefore consider it unnecessary to add an extension (which would bring a number of drawbacks, e.g. reduced discoverability as users have to load the trigger packages themselves, us having to faff around with functions declared in src/ and extended in ext/, ...)

Why do I like this one more?

See #883.

github-actions bot (Contributor) commented Apr 4, 2025

Benchmark Report for Commit b107b92

Computer Information

Julia Version 1.11.4
Commit 8561cc3d68d (2025-03-10 11:36 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | forwarddiff |             typed |  false |                  9.3 |                 1.6 |
|           Smorgasbord |       201 | forwarddiff |             typed |  false |                601.0 |                43.1 |
|           Smorgasbord |       201 | forwarddiff | simple_namedtuple |   true |                423.7 |                46.3 |
|           Smorgasbord |       201 | forwarddiff |           untyped |   true |               1180.2 |                29.1 |
|           Smorgasbord |       201 | forwarddiff |       simple_dict |   true |               3827.5 |                20.5 |
|           Smorgasbord |       201 | reversediff |             typed |   true |               1442.6 |                29.1 |
|           Smorgasbord |       201 |    mooncake |             typed |   true |                919.0 |                 5.3 |
|    Loop univariate 1k |      1000 |    mooncake |             typed |   true |               5402.5 |                 4.1 |
|       Multivariate 1k |      1000 |    mooncake |             typed |   true |               1080.6 |                 8.3 |
|   Loop univariate 10k |     10000 |    mooncake |             typed |   true |              59302.8 |                 3.7 |
|      Multivariate 10k |     10000 |    mooncake |             typed |   true |               8930.4 |                 9.6 |
|               Dynamic |        10 |    mooncake |             typed |   true |                133.8 |                11.8 |
|              Submodel |         1 |    mooncake |             typed |   true |                 25.0 |                 8.0 |
|                   LDA |        12 | reversediff |             typed |   true |                382.9 |                 6.1 |

codecov bot commented Apr 4, 2025

Codecov Report

Attention: Patch coverage is 68.96552% with 9 lines in your changes missing coverage. Please review.

Project coverage is 84.80%. Comparing base (c7bdc3f) to head (ef5a1ce).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % |        Lines |
|--------------------------|---------|--------------|
|     src/test_utils/ad.jl |  68.96% | 9 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #882      +/-   ##
==========================================
- Coverage   84.92%   84.80%   -0.13%     
==========================================
  Files          34       35       +1     
  Lines        3814     3843      +29     
==========================================
+ Hits         3239     3259      +20     
- Misses        575      584       +9     

coveralls commented Apr 4, 2025

Pull Request Test Coverage Report for Build 14448023153

Details

  • 0 of 28 (0.0%) changed or added relevant lines in 1 file are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.1%) to 84.892%

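| Changes Missing Coverage | Covered Lines | Changed/Added Lines |    % |
|--------------------------|---------------|---------------------|------|
|     src/test_utils/ad.jl |             0 |                  28 | 0.0% |

| Files with Coverage Reduction | New Missed Lines |      % |
|-------------------------------|------------------|--------|
|                src/varinfo.jl |                3 | 83.83% |
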
Totals Coverage Status
Change from base Build 14392526425: -0.1%
Covered Lines: 3259
Relevant Lines: 3839

penelopeysm mentioned this pull request Apr 8, 2025 (4 tasks)

yebai (Member) commented Apr 14, 2025

Thanks, @penelopeysm. This looks good!

penelopeysm enabled auto-merge Apr 14, 2025 14:19
penelopeysm added this pull request to the merge queue Apr 14, 2025
Merged via the queue into main with commit 60ee68e Apr 14, 2025 (15 of 19 checks passed)
penelopeysm deleted the py/adtest1 branch Apr 14, 2025 15:05
Development

Successfully merging this pull request may close these issues.

AD testing
4 participants