
Commit 85ced2e

Update to Turing 0.38 (#599)
1 parent: 20d304b

10 files changed (+533 -446 lines)

Manifest.toml

+352 -359

Large diffs are not rendered by default.

Project.toml

+1 -2

@@ -48,9 +48,8 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
 StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
 Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
-TuringBenchmarking = "0db1332d-5c25-4deb-809f-459bc696f94f"
 UnPack = "3a884ed6-31ef-47d7-9d2a-63182c4928ed"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

 [compat]
-Turing = "0.37"
+Turing = "0.38"
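For orientation, the dependency side of this change, dropping TuringBenchmarking and bumping the Turing compat bound, could be reproduced in a local copy of the docs environment roughly as follows. This is a hedged sketch, not part of the commit; the `[compat]` entry itself is edited by hand in Project.toml.

```julia
using Pkg

Pkg.activate(".")              # the docs project environment (path illustrative)
Pkg.rm("TuringBenchmarking")   # no longer a dependency after this commit
Pkg.update("Turing")           # re-resolve against the new compat bound, Turing = "0.38"
```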

_quarto.yml

+3 -3

@@ -32,7 +32,7 @@ website:
 text: Team
 right:
 # Current version
-- text: "v0.37"
+- text: "v0.38"
 menu:
 - text: Changelog
 href: https://turinglang.org/docs/changelog.html

@@ -60,7 +60,7 @@ website:
 - usage/custom-distribution/index.qmd
 - usage/probability-interface/index.qmd
 - usage/modifying-logprob/index.qmd
-- usage/generated-quantities/index.qmd
+- usage/tracking-extra-quantities/index.qmd
 - usage/mode-estimation/index.qmd
 - usage/performance-tips/index.qmd
 - usage/sampler-visualisation/index.qmd

@@ -190,7 +190,7 @@ using-turing-external-samplers: tutorials/docs-16-using-turing-external-samplers
 using-turing-mode-estimation: tutorials/docs-17-mode-estimation
 usage-probability-interface: tutorials/usage-probability-interface
 usage-custom-distribution: tutorials/usage-custom-distribution
-usage-generated-quantities: tutorials/usage-generated-quantities
+usage-tracking-extra-quantities: tutorials/tracking-extra-quantities
 usage-modifying-logprob: tutorials/usage-modifying-logprob

 contributing-guide: developers/contributing

tutorials/bayesian-time-series-analysis/index.qmd

+1 -1

@@ -175,7 +175,7 @@ end

 function get_decomposition(model, x, cyclic_features, chain, op)
     chain_params = Turing.MCMCChains.get_sections(chain, :parameters)
-    return generated_quantities(model(x, cyclic_features, op), chain_params)
+    return returned(model(x, cyclic_features, op), chain_params)
 end

 function plot_fit(x, y, decomp, ymax)
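This is the first of several hunks in this commit that swap `generated_quantities` for `returned`; the arguments are unchanged on both sides of the diff. As a quick orientation (not part of the commit; identifiers taken from the hunk above):

```julia
# Old name:                      generated_quantities(model_instance, chain_params)
# New name used in this commit:  returned(model_instance, chain_params)
chain_params = Turing.MCMCChains.get_sections(chain, :parameters)
decomp = returned(model(x, cyclic_features, op), chain_params)
```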

tutorials/gaussian-mixture-models/index.qmd

+1 -1

@@ -403,7 +403,7 @@ chains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initia
 Given a sample from the marginalized posterior, these assignments can be recovered with:

 ```{julia}
-assignments = mean(generated_quantities(gmm_recover(x), chains));
+assignments = mean(returned(gmm_recover(x), chains));
 ```

 ```{julia}

tutorials/gaussian-processes-introduction/index.qmd

+1 -1

@@ -146,7 +146,7 @@ posterior probability of success at any distance we choose:

 ```{julia}
 d_pred = 1:0.2:21
-samples = map(generated_quantities(m_post, chn)[1:10:end]) do x
+samples = map(returned(m_post, chn)[1:10:end]) do x
     return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4)))
 end
 p = plot()

tutorials/hidden-markov-models/index.qmd

+2 -2

@@ -123,11 +123,11 @@ The priors on our transition matrix are noninformative, using `T[i] ~ Dirichlet(
 end;
 ```

-We will use a combination of two samplers ([HMC](https://turinglang.org/dev/docs/library/#Turing.Inference.HMC) and [Particle Gibbs](https://turinglang.org/dev/docs/library/#Turing.Inference.PG)) by passing them to the [Gibbs](https://turinglang.org/dev/docs/library/#Turing.Inference.Gibbs) sampler. The Gibbs sampler allows for compositional inference, where we can utilize different samplers on different parameters.
+We will use a combination of two samplers (HMC and Particle Gibbs) by passing them to the Gibbs sampler. The Gibbs sampler allows for compositional inference, where we can utilize different samplers on different parameters. (For API details of these samplers, please see [Turing.jl's API documentation](https://turinglang.org/Turing.jl/stable/api/Inference/).)

 In this case, we use HMC for `m` and `T`, representing the emission and transition matrices respectively. We use the Particle Gibbs sampler for `s`, the state sequence. You may wonder why it is that we are not assigning `s` to the HMC sampler, and why it is that we need compositional Gibbs sampling at all.

-The parameter `s` is not a continuous variable. It is a vector of **integers**, and thus Hamiltonian methods like HMC and [NUTS](https://turinglang.org/dev/docs/library/#Turing.Inference.NUTS) won't work correctly. Gibbs allows us to apply the right tools to the best effect. If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]({{<meta using-turing-autodiff>}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space.
+The parameter `s` is not a continuous variable. It is a vector of **integers**, and thus Hamiltonian methods like HMC and NUTS won't work correctly. Gibbs allows us to apply the right tools to the best effect. If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]({{<meta using-turing-autodiff>}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space.

 Time to run our sampler.
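The hunk above pairs HMC (for `m` and `T`) with Particle Gibbs (for `s`) inside a Gibbs sampler. A minimal sketch of such a call is shown below; it is not the tutorial's actual code, it assumes the pair-based `Gibbs` constructor of recent Turing releases, and `hmm_model`, the step size, leapfrog steps, and particle count are all placeholders.

```julia
using Turing

# HMC updates the continuous parameters m and T jointly; Particle Gibbs
# updates the discrete state sequence s.
sampler = Gibbs(
    (@varname(m), @varname(T)) => HMC(0.01, 10),
    @varname(s) => PG(20),
)

# `hmm_model` stands in for the model instance defined by the tutorial's @model block.
chain = sample(hmm_model, sampler, 1000)
```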

usage/automatic-differentiation/index.qmd

+20 -9

@@ -14,8 +14,8 @@ Pkg.instantiate();

 ## Switching AD Modes

-Turing currently supports four automatic differentiation (AD) backends for sampling: [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD; and [Mooncake](https://github.com/compintell/Mooncake.jl), [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl), and [Zygote](https://github.com/FluxML/Zygote.jl) for reverse-mode AD.
-`ForwardDiff` is automatically imported by Turing. To utilize `Mooncake`, `Zygote`, or `ReverseDiff` for AD, users must explicitly import them with `import Mooncake`, `import Zygote` or `import ReverseDiff`, alongside `using Turing`.
+Turing currently supports three automatic differentiation (AD) backends for sampling: [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD; and [Mooncake](https://github.com/compintell/Mooncake.jl) and [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl) for reverse-mode AD.
+`ForwardDiff` is automatically imported by Turing. To utilize `Mooncake` or `ReverseDiff` for AD, users must explicitly import them with `import Mooncake` or `import ReverseDiff`, alongside the usual `using Turing`.

 As of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of [`AdTypes.jl`](https://github.com/SciML/ADTypes.jl), allowing users to specify the AD backend for individual samplers independently.
 Users can pass the `adtype` keyword argument to the sampler constructor to select the desired AD backend, with the default being `AutoForwardDiff(; chunksize=0)`.
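As a quick, hedged illustration of the `adtype` mechanism described in the hunk above (not part of the diff; `model` is a placeholder for any Turing model instance):

```julia
using Turing
import ReverseDiff  # reverse-mode backends must be imported explicitly

# Forward-mode AD with an explicit chunk size:
chain_fd = sample(model, NUTS(; adtype=AutoForwardDiff(; chunksize=8)), 1000)

# Reverse-mode AD via ReverseDiff; `compile=true` pre-records the tape and is
# only safe when the model's control flow is identical for every evaluation:
chain_rd = sample(model, NUTS(; adtype=AutoReverseDiff(; compile=true)), 1000)
```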
@@ -24,18 +24,16 @@ For `ForwardDiff`, pass `adtype=AutoForwardDiff(; chunksize)` to the sampler con

 For `ReverseDiff`, pass `adtype=AutoReverseDiff()` to the sampler constructor. An additional keyword argument called `compile` can be provided to `AutoReverseDiff`. It specifies whether to pre-record the tape only once and reuse it later (`compile` is set to `false` by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.

-
-
 Pre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.

 Thus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and `if`-statements should consistently execute the same branches.
 For instance, `if`-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data.
 However, `if`-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect.
 Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter.

-For `Zygote`, pass `adtype=AutoZygote()` to the sampler constructor.
+The previously used interface functions including `ADBackend`, `setadbackend`, `setsafe`, `setchunksize`, and `setrdcache` have been removed.

-And the previously used interface functions including `ADBackend`, `setadbackend`, `setsafe`, `setchunksize`, and `setrdcache` are deprecated and removed.
+For `Mooncake`, pass `adtype=AutoMooncake(; config=nothing)` to the sampler constructor.

 ## Compositional Sampling with Differing AD Modes

@@ -70,9 +68,22 @@ Generally, reverse-mode AD, for instance `ReverseDiff`, is faster when sampling
 If the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is.
 Currently, this defaults to `ForwardDiff`.

-The most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using [`TuringBenchmarking`](https://github.com/TuringLang/TuringBenchmarking.jl):
+The most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using the functionality in DynamicPPL (see [the API documentation](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities)):

 ```{julia}
-using TuringBenchmarking
-benchmark_model(gdemo(1.5, 2), adbackends=[AutoForwardDiff(), AutoReverseDiff()])
+using DynamicPPL.TestUtils.AD: run_ad, ADResult
+using ForwardDiff, ReverseDiff
+
+model = gdemo(1.5, 2)
+
+for adtype in [AutoForwardDiff(), AutoReverseDiff()]
+    result = run_ad(model, adtype; benchmark=true)
+    @show result.time_vs_primal
+end
 ```
+
+In this specific instance, ForwardDiff is clearly faster (due to the small size of the model).
+
+We also have a table of benchmarks for various models and AD backends in [the ADTests website](https://turinglang.org/ADTests/).
+These models aim to capture a variety of different Turing.jl features.
+If you have suggestions for things to include, please do let us know by [creating an issue on GitHub](https://github.com/TuringLang/ADTests/issues/new)!
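Returning to the compiled-tape caveat from the `ReverseDiff` hunk earlier in this file: the sketch below is a hedged illustration, not from the diff, of the kind of model the warning is about. Here the branch taken depends on a sampled parameter, so `AutoReverseDiff(; compile=true)` could silently reuse a tape recorded for the wrong branch.

```julia
using Turing

@model function branching_model(y)
    x ~ Normal(0, 1)
    # The operations executed depend on the sampled value of x, so the
    # recorded tape is only valid for whichever branch was taken first.
    if x > 0
        y ~ Normal(x, 1)
    else
        y ~ Normal(-x, 2)
    end
end
```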

usage/generated-quantities/index.qmd

-68

This file was deleted.

usage/tracking-extra-quantities/index.qmd

+152

@@ -0,0 +1,152 @@
+---
+title: Tracking Extra Quantities
+engine: julia
+aliases:
+  - ../../tutorials/usage-generated-quantities/index.html
+  - ../generated-quantities/index.html
+---
+
+```{julia}
+#| echo: false
+#| output: false
+using Pkg;
+Pkg.instantiate();
+```
+
+Often, there are quantities in models that we might be interested in viewing the values of, but which are not random variables in the model that are explicitly drawn from a distribution.
+
+As a motivating example, the most natural parameterization for a model might not be the most computationally feasible.
+Consider the following (efficiently reparametrized) implementation of Neal's funnel [(Neal, 2003)](https://arxiv.org/abs/physics/0009028):
+
+```{julia}
+using Turing
+setprogress!(false)
+
+@model function Neal()
+    # Raw draws
+    y_raw ~ Normal(0, 1)
+    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
+
+    # Transform:
+    y = 3 * y_raw
+    x = exp.(y ./ 2) .* x_raw
+    return nothing
+end
+```
+
+In this case, the random variables exposed in the chain (`x_raw`, `y_raw`) are not in a helpful form — what we're after are the deterministically transformed variables `x` and `y`.
+
+There are two ways to track these extra quantities in Turing.jl.
+
+## Using `:=` (during inference)
+
+The first way is to use the `:=` operator, which behaves exactly like `=` except that the values of the variables on its left-hand side are automatically added to the chain returned by the sampler.
+For example:
+
+```{julia}
+@model function Neal_coloneq()
+    # Raw draws
+    y_raw ~ Normal(0, 1)
+    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
+
+    # Transform:
+    y := 3 * y_raw
+    x := exp.(y ./ 2) .* x_raw
+end
+
+sample(Neal_coloneq(), NUTS(), 1000)
+```
+
+## Using `returned` (post-inference)
+
+Alternatively, one can specify the extra quantities as part of the model function's return statement:
+
+```{julia}
+@model function Neal_return()
+    # Raw draws
+    y_raw ~ Normal(0, 1)
+    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
+
+    # Transform and return as a NamedTuple
+    y = 3 * y_raw
+    x = exp.(y ./ 2) .* x_raw
+    return (x=x, y=y)
+end
+
+chain = sample(Neal_return(), NUTS(), 1000)
+```
+
+The sampled chain does not contain `x` and `y`, but we can extract the values using the `returned` function.
+Calling this function outputs an array:
+
+```{julia}
+nts = returned(Neal_return(), chain)
+```
+
+where each element is a NamedTuple, as specified in the return statement of the model.
+
+```{julia}
+nts[1]
+```
+
+## Which to use?
+
+There are some pros and cons of using `returned`, as opposed to `:=`.
+
+Firstly, `returned` is more flexible, as it allows you to track any type of object; `:=` only works with variables that can be inserted into an `MCMCChains.Chains` object.
+(Notice that `x` is a vector, and in the first case where we used `:=`, reconstructing the vector value of `x` can also be rather annoying as the chain stores each individual element of `x` separately.)
+
+A drawback is that naively using `returned` can lead to unnecessary computation during inference.
+This is because during the sampling process, the return values are also calculated (since they are part of the model function), but then thrown away.
+So, if the extra quantities are expensive to compute, this can be a problem.
+
+To avoid this, you will essentially have to create two different models, one for inference and one for post-inference.
+The simplest way of doing this is to add a parameter to the model argument:
+
+```{julia}
+@model function Neal_coloneq_optional(track::Bool)
+    # Raw draws
+    y_raw ~ Normal(0, 1)
+    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
+
+    if track
+        y = 3 * y_raw
+        x = exp.(y ./ 2) .* x_raw
+        return (x=x, y=y)
+    else
+        return nothing
+    end
+end
+
+chain = sample(Neal_coloneq_optional(false), NUTS(), 1000)
+```
+
+The above ensures that `x` and `y` are not calculated during inference, but allows us to still use `returned` to extract them:
+
+```{julia}
+returned(Neal_coloneq_optional(true), chain)
+```
+
+Another equivalent option is to use a submodel:
+
+```{julia}
+@model function Neal()
+    y_raw ~ Normal(0, 1)
+    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
+    return (x_raw=x_raw, y_raw=y_raw)
+end
+
+chain = sample(Neal(), NUTS(), 1000)
+
+@model function Neal_with_extras()
+    neal ~ to_submodel(Neal(), false)
+    y = 3 * neal.y_raw
+    x = exp.(y ./ 2) .* neal.x_raw
+    return (x=x, y=y)
+end
+
+returned(Neal_with_extras(), chain)
+```
+
+Note that for the `returned` call to work, the `Neal_with_extras()` model must have the same variable names as stored in `chain`.
+This means the submodel `Neal()` must not be prefixed, i.e. `to_submodel()` must be passed a second parameter `false`.
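As a small follow-up to the new page (a hedged sketch, not part of the commit; `Neal_return` and `chain` refer to the definitions in the added file above), the array of NamedTuples produced by `returned` can be collapsed into posterior summaries:

```julia
using Statistics

nts = returned(Neal_return(), chain)

# Posterior means of the tracked quantities: `y` is a scalar per draw, while
# `x` is a length-9 vector per draw, so its mean below is taken elementwise.
y_mean = mean(nt.y for nt in nts)
x_mean = mean(nt.x for nt in nts)
```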
