
Commit 85ced2e

Update to Turing 0.38 (#599)
1 parent: 20d304b

10 files changed (+533 -446 lines)

Manifest.toml

+352 -359

Large diffs are not rendered by default.

Project.toml

+1 -2

@@ -48,9 +48,8 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
 StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
 Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
-TuringBenchmarking = "0db1332d-5c25-4deb-809f-459bc696f94f"
 UnPack = "3a884ed6-31ef-47d7-9d2a-63182c4928ed"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

 [compat]
-Turing = "0.37"
+Turing = "0.38"
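For orientation, the dependency side of this change, dropping TuringBenchmarking and bumping the Turing compat bound, could be reproduced in a local copy of the docs environment roughly as follows. This is a hedged sketch, not part of the commit; the `[compat]` entry itself is edited by hand in Project.toml.

```julia
using Pkg

Pkg.activate(".")              # the docs project environment (path illustrative)
Pkg.rm("TuringBenchmarking")   # no longer a dependency after this commit
Pkg.update("Turing")           # re-resolve against the new compat bound, Turing = "0.38"
```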

_quarto.yml

+3 -3

@@ -32,7 +32,7 @@ website:
 text: Team
 right:
 # Current version
-- text: "v0.37"
+- text: "v0.38"
 menu:
 - text: Changelog
 href: https://turinglang.org/docs/changelog.html

@@ -60,7 +60,7 @@ website:
 - usage/custom-distribution/index.qmd
 - usage/probability-interface/index.qmd
 - usage/modifying-logprob/index.qmd
-- usage/generated-quantities/index.qmd
+- usage/tracking-extra-quantities/index.qmd
 - usage/mode-estimation/index.qmd
 - usage/performance-tips/index.qmd
 - usage/sampler-visualisation/index.qmd

@@ -190,7 +190,7 @@ using-turing-external-samplers: tutorials/docs-16-using-turing-external-samplers
 using-turing-mode-estimation: tutorials/docs-17-mode-estimation
 usage-probability-interface: tutorials/usage-probability-interface
 usage-custom-distribution: tutorials/usage-custom-distribution
-usage-generated-quantities: tutorials/usage-generated-quantities
+usage-tracking-extra-quantities: tutorials/tracking-extra-quantities
 usage-modifying-logprob: tutorials/usage-modifying-logprob

 contributing-guide: developers/contributing

tutorials/bayesian-time-series-analysis/index.qmd

+1 -1

@@ -175,7 +175,7 @@ end

 function get_decomposition(model, x, cyclic_features, chain, op)
     chain_params = Turing.MCMCChains.get_sections(chain, :parameters)
-    return generated_quantities(model(x, cyclic_features, op), chain_params)
+    return returned(model(x, cyclic_features, op), chain_params)
 end

 function plot_fit(x, y, decomp, ymax)
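This is the first of several hunks in this commit that swap `generated_quantities` for `returned`; the arguments are unchanged on both sides of the diff. As a quick orientation (not part of the commit; identifiers taken from the hunk above):

```julia
# Old name:                      generated_quantities(model_instance, chain_params)
# New name used in this commit:  returned(model_instance, chain_params)
chain_params = Turing.MCMCChains.get_sections(chain, :parameters)
decomp = returned(model(x, cyclic_features, op), chain_params)
```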

tutorials/gaussian-mixture-models/index.qmd

+1 -1

@@ -403,7 +403,7 @@ chains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initia
 Given a sample from the marginalized posterior, these assignments can be recovered with:

 ```{julia}
-assignments = mean(generated_quantities(gmm_recover(x), chains));
+assignments = mean(returned(gmm_recover(x), chains));
 ```

 ```{julia}

tutorials/gaussian-processes-introduction/index.qmd

+1 -1

@@ -146,7 +146,7 @@ posterior probability of success at any distance we choose:

 ```{julia}
 d_pred = 1:0.2:21
-samples = map(generated_quantities(m_post, chn)[1:10:end]) do x
+samples = map(returned(m_post, chn)[1:10:end]) do x
     return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4)))
 end
 p = plot()

tutorials/hidden-markov-models/index.qmd

+2 -2

@@ -123,11 +123,11 @@ The priors on our transition matrix are noninformative, using `T[i] ~ Dirichlet(
 end;
 ```

-We will use a combination of two samplers ([HMC](https://turinglang.org/dev/docs/library/#Turing.Inference.HMC) and [Particle Gibbs](https://turinglang.org/dev/docs/library/#Turing.Inference.PG)) by passing them to the [Gibbs](https://turinglang.org/dev/docs/library/#Turing.Inference.Gibbs) sampler. The Gibbs sampler allows for compositional inference, where we can utilize different samplers on different parameters.
+We will use a combination of two samplers (HMC and Particle Gibbs) by passing them to the Gibbs sampler. The Gibbs sampler allows for compositional inference, where we can utilize different samplers on different parameters. (For API details of these samplers, please see [Turing.jl's API documentation](https://turinglang.org/Turing.jl/stable/api/Inference/).)

 In this case, we use HMC for `m` and `T`, representing the emission and transition matrices respectively. We use the Particle Gibbs sampler for `s`, the state sequence. You may wonder why it is that we are not assigning `s` to the HMC sampler, and why it is that we need compositional Gibbs sampling at all.

-The parameter `s` is not a continuous variable. It is a vector of **integers**, and thus Hamiltonian methods like HMC and [NUTS](https://turinglang.org/dev/docs/library/#Turing.Inference.NUTS) won't work correctly. Gibbs allows us to apply the right tools to the best effect. If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]({{<meta using-turing-autodiff>}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space.
+The parameter `s` is not a continuous variable. It is a vector of **integers**, and thus Hamiltonian methods like HMC and NUTS won't work correctly. Gibbs allows us to apply the right tools to the best effect. If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]({{<meta using-turing-autodiff>}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space.

 Time to run our sampler.
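The hunk above pairs HMC (for `m` and `T`) with Particle Gibbs (for `s`) inside a Gibbs sampler. A minimal sketch of such a call is shown below; it is not the tutorial's actual code, it assumes the pair-based `Gibbs` constructor of recent Turing releases, and `hmm_model`, the step size, leapfrog steps, and particle count are all placeholders.

```julia
using Turing

# HMC updates the continuous parameters m and T jointly; Particle Gibbs
# updates the discrete state sequence s.
sampler = Gibbs(
    (@varname(m), @varname(T)) => HMC(0.01, 10),
    @varname(s) => PG(20),
)

# `hmm_model` stands in for the model instance defined by the tutorial's @model block.
chain = sample(hmm_model, sampler, 1000)
```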

usage/automatic-differentiation/index.qmd

+20 -9

@@ -14,8 +14,8 @@ Pkg.instantiate();

 ## Switching AD Modes

-Turing currently supports four automatic differentiation (AD) backends for sampling: [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD; and [Mooncake](https://github.com/compintell/Mooncake.jl), [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl), and [Zygote](https://github.com/FluxML/Zygote.jl) for reverse-mode AD.
-`ForwardDiff` is automatically imported by Turing. To utilize `Mooncake`, `Zygote`, or `ReverseDiff` for AD, users must explicitly import them with `import Mooncake`, `import Zygote` or `import ReverseDiff`, alongside `using Turing`.
+Turing currently supports three automatic differentiation (AD) backends for sampling: [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD; and [Mooncake](https://github.com/compintell/Mooncake.jl) and [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl) for reverse-mode AD.
+`ForwardDiff` is automatically imported by Turing. To utilize `Mooncake` or `ReverseDiff` for AD, users must explicitly import them with `import Mooncake` or `import ReverseDiff`, alongside the usual `using Turing`.

 As of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of [`AdTypes.jl`](https://github.com/SciML/ADTypes.jl), allowing users to specify the AD backend for individual samplers independently.
 Users can pass the `adtype` keyword argument to the sampler constructor to select the desired AD backend, with the default being `AutoForwardDiff(; chunksize=0)`.
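As a quick, hedged illustration of the `adtype` mechanism described in the hunk above (not part of the diff; `model` is a placeholder for any Turing model instance):

```julia
using Turing
import ReverseDiff  # reverse-mode backends must be imported explicitly

# Forward-mode AD with an explicit chunk size:
chain_fd = sample(model, NUTS(; adtype=AutoForwardDiff(; chunksize=8)), 1000)

# Reverse-mode AD via ReverseDiff; `compile=true` pre-records the tape and is
# only safe when the model's control flow is identical for every evaluation:
chain_rd = sample(model, NUTS(; adtype=AutoReverseDiff(; compile=true)), 1000)
```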
@@ -24,18 +24,16 @@ For `ForwardDiff`, pass `adtype=AutoForwardDiff(; chunksize)` to the sampler con

 For `ReverseDiff`, pass `adtype=AutoReverseDiff()` to the sampler constructor. An additional keyword argument called `compile` can be provided to `AutoReverseDiff`. It specifies whether to pre-record the tape only once and reuse it later (`compile` is set to `false` by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.

-
-
 Pre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.

 Thus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and `if`-statements should consistently execute the same branches.
 For instance, `if`-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data.
 However, `if`-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect.
 Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter.

-For `Zygote`, pass `adtype=AutoZygote()` to the sampler constructor.
+The previously used interface functions including `ADBackend`, `setadbackend`, `setsafe`, `setchunksize`, and `setrdcache` have been removed.

-And the previously used interface functions including `ADBackend`, `setadbackend`, `setsafe`, `setchunksize`, and `setrdcache` are deprecated and removed.
+For `Mooncake`, pass `adtype=AutoMooncake(; config=nothing)` to the sampler constructor.

 ## Compositional Sampling with Differing AD Modes

@@ -70,9 +68,22 @@ Generally, reverse-mode AD, for instance `ReverseDiff`, is faster when sampling
 If the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is.
 Currently, this defaults to `ForwardDiff`.

-The most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using [`TuringBenchmarking`](https://github.com/TuringLang/TuringBenchmarking.jl):
+The most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using the functionality in DynamicPPL (see [the API documentation](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities)):

 ```{julia}
-using TuringBenchmarking
-benchmark_model(gdemo(1.5, 2), adbackends=[AutoForwardDiff(), AutoReverseDiff()])
+using DynamicPPL.TestUtils.AD: run_ad, ADResult
+using ForwardDiff, ReverseDiff
+
+model = gdemo(1.5, 2)
+
+for adtype in [AutoForwardDiff(), AutoReverseDiff()]
+    result = run_ad(model, adtype; benchmark=true)
+    @show result.time_vs_primal
+end
 ```
+
+In this specific instance, ForwardDiff is clearly faster (due to the small size of the model).
+
+We also have a table of benchmarks for various models and AD backends in [the ADTests website](https://turinglang.org/ADTests/).
+These models aim to capture a variety of different Turing.jl features.
+If you have suggestions for things to include, please do let us know by [creating an issue on GitHub](https://github.com/TuringLang/ADTests/issues/new)!
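Returning to the compiled-tape caveat from the `ReverseDiff` hunk earlier in this file: the sketch below is a hedged illustration, not from the diff, of the kind of model the warning is about. Here the branch taken depends on a sampled parameter, so `AutoReverseDiff(; compile=true)` could silently reuse a tape recorded for the wrong branch.

```julia
using Turing

@model function branching_model(y)
    x ~ Normal(0, 1)
    # The operations executed depend on the sampled value of x, so the
    # recorded tape is only valid for whichever branch was taken first.
    if x > 0
        y ~ Normal(x, 1)
    else
        y ~ Normal(-x, 2)
    end
end
```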

usage/generated-quantities/index.qmd

-68

This file was deleted.

usage/tracking-extra-quantities/index.qmd

+152

@@ -0,0 +1,152 @@
+---
+title: Tracking Extra Quantities
+engine: julia
+aliases:
+  - ../../tutorials/usage-generated-quantities/index.html
+  - ../generated-quantities/index.html
+---
+
+```{julia}
+#| echo: false
+#| output: false
+using Pkg;
+Pkg.instantiate();
+```
+
+Often, there are quantities in models that we might be interested in viewing the values of, but which are not random variables in the model that are explicitly drawn from a distribution.
+
+As a motivating example, the most natural parameterization for a model might not be the most computationally feasible.
+Consider the following (efficiently reparametrized) implementation of Neal's funnel [(Neal, 2003)](https://arxiv.org/abs/physics/0009028):
+
+```{julia}
+using Turing
+setprogress!(false)
+
+@model function Neal()
+    # Raw draws
+    y_raw ~ Normal(0, 1)
+    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
+
+    # Transform:
+    y = 3 * y_raw
+    x = exp.(y ./ 2) .* x_raw
+    return nothing
+end
+```
+
+In this case, the random variables exposed in the chain (`x_raw`, `y_raw`) are not in a helpful form — what we're after are the deterministically transformed variables `x` and `y`.
+
+There are two ways to track these extra quantities in Turing.jl.
+
+## Using `:=` (during inference)
+
+The first way is to use the `:=` operator, which behaves exactly like `=` except that the values of the variables on its left-hand side are automatically added to the chain returned by the sampler.
+For example:
+
+```{julia}
+@model function Neal_coloneq()
+    # Raw draws
+    y_raw ~ Normal(0, 1)
+    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
+
+    # Transform:
+    y := 3 * y_raw
+    x := exp.(y ./ 2) .* x_raw
+end
+
+sample(Neal_coloneq(), NUTS(), 1000)
+```
+
+## Using `returned` (post-inference)
+
+Alternatively, one can specify the extra quantities as part of the model function's return statement:
+
+```{julia}
+@model function Neal_return()
+    # Raw draws
+    y_raw ~ Normal(0, 1)
+    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
+
+    # Transform and return as a NamedTuple
+    y = 3 * y_raw
+    x = exp.(y ./ 2) .* x_raw
+    return (x=x, y=y)
+end
+
+chain = sample(Neal_return(), NUTS(), 1000)
+```
+
+The sampled chain does not contain `x` and `y`, but we can extract the values using the `returned` function.
+Calling this function outputs an array:
+
+```{julia}
+nts = returned(Neal_return(), chain)
+```
+
+where each element is a NamedTuple, as specified in the return statement of the model.
+
+```{julia}
+nts[1]
+```
+
+## Which to use?
+
+There are some pros and cons of using `returned`, as opposed to `:=`.
+
+Firstly, `returned` is more flexible, as it allows you to track any type of object; `:=` only works with variables that can be inserted into an `MCMCChains.Chains` object.
+(Notice that `x` is a vector, and in the first case where we used `:=`, reconstructing the vector value of `x` can also be rather annoying as the chain stores each individual element of `x` separately.)
+
+A drawback is that naively using `returned` can lead to unnecessary computation during inference.
+This is because during the sampling process, the return values are also calculated (since they are part of the model function), but then thrown away.
+So, if the extra quantities are expensive to compute, this can be a problem.
+
+To avoid this, you will essentially have to create two different models, one for inference and one for post-inference.
+The simplest way of doing this is to add a parameter to the model argument:
+
+```{julia}
+@model function Neal_coloneq_optional(track::Bool)
+    # Raw draws
+    y_raw ~ Normal(0, 1)
+    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
+
+    if track
+        y = 3 * y_raw
+        x = exp.(y ./ 2) .* x_raw
+        return (x=x, y=y)
+    else
+        return nothing
+    end
+end
+
+chain = sample(Neal_coloneq_optional(false), NUTS(), 1000)
+```
+
+The above ensures that `x` and `y` are not calculated during inference, but allows us to still use `returned` to extract them:
+
+```{julia}
+returned(Neal_coloneq_optional(true), chain)
+```
+
+Another equivalent option is to use a submodel:
+
+```{julia}
+@model function Neal()
+    y_raw ~ Normal(0, 1)
+    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
+    return (x_raw=x_raw, y_raw=y_raw)
+end
+
+chain = sample(Neal(), NUTS(), 1000)
+
+@model function Neal_with_extras()
+    neal ~ to_submodel(Neal(), false)
+    y = 3 * neal.y_raw
+    x = exp.(y ./ 2) .* neal.x_raw
+    return (x=x, y=y)
+end
+
+returned(Neal_with_extras(), chain)
+```
+
+Note that for the `returned` call to work, the `Neal_with_extras()` model must have the same variable names as stored in `chain`.
+This means the submodel `Neal()` must not be prefixed, i.e. `to_submodel()` must be passed a second parameter `false`.
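As a small follow-up to the new page (a hedged sketch, not part of the commit; `Neal_return` and `chain` refer to the definitions in the added file above), the array of NamedTuples produced by `returned` can be collapsed into posterior summaries:

```julia
using Statistics

nts = returned(Neal_return(), chain)

# Posterior means of the tracked quantities: `y` is a scalar per draw, while
# `x` is a length-9 vector per draw, so its mean below is taken elementwise.
y_mean = mean(nt.y for nt in nts)
x_mean = mean(nt.x for nt in nts)
```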
