Commit 6f364e1

Remove references to host rep and fix grammar
1 parent 6a84d2c commit 6f364e1

1 file changed: 35 additions & 28 deletions

docs/tutorials/pigeons.md

@@ -9,21 +9,22 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Introduction
-In this tutorial we will demonstrate how to use the Julia package [Pigeons.jl](https://pigeons.run/stable/) use parallel tempering to parallelize and improve MCMC inference in TreePPL.
+In this tutorial we will demonstrate how to use the Julia package [Pigeons.jl](https://pigeons.run/stable/) to parallelize and improve MCMC inference in TreePPL with parallel tempering.
Parallel tempering is particularly powerful for combinatorial, multimodal and overparametrized posteriors, which abound in statistical phylogenetics.

-We will use the constant rate birth death (CRBD) model as the running example.
+We will use the [constant rate birth-death](../examples/div.md#constant-rate-birth-death) (CRBD) model as the running example.

### What is parallel tempering and Pigeons?
MCMC algorithms are inherently sequential, since the next iteration depends explicitly on the current one.
This makes it difficult to harness the parallel hardware in modern high-performance computers to speed up MCMC inference.
Parallel tempering offers one solution to this problem by executing several communicating MCMC chains in parallel.
These chains interpolate between the prior distribution, from which it is easy to sample and explore new regions of parameter space, and the posterior distribution, which is the distribution we want to sample from.
-The interpolation is achieved by _heating_ (tempering) the intermediary distribution by raising the likelihood to a power (temperature) between $0$ and $1$ – at $0$ we recover the prior and at $1$ we recover the posterior.
-The goal of parallel tempering is thus to effectively move samples from prior to posterior through the intermediary temperatures, and the success hinges both on using an effective communication scheme and making a suitable choice of intermediary temperatures.
+The interpolation is achieved by _tempering_ the posterior distribution, raising the model's likelihood to a power (called the temperature) that lies between $0$ and $1$.
+At temperature $0$ the likelihood is ignored, so we recover the prior, and at temperature $1$ we use the full likelihood and recover the posterior.
+The goal of parallel tempering is thus to effectively move samples from prior to posterior through the intermediary temperatures, and success hinges both on using an effective communication scheme and on making a suitable choice of intermediary temperatures.
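As a concrete way to read the two sentences above, writing the prior density as $p(\theta)$ and the likelihood as $L(\theta)$ (this notation is introduced only for this aside), the intermediary distribution at temperature $\beta$ is

$$
\pi_\beta(\theta) \propto p(\theta)\, L(\theta)^{\beta}, \qquad \beta \in [0, 1],
$$

so that $\beta = 0$ recovers the prior, $\beta = 1$ recovers the posterior, and each chain in parallel tempering targets one such $\pi_\beta$.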

[Pigeons.jl](https://pigeons.run/stable/) is a Julia package for sampling from difficult posterior distributions.
-Along with other algorithms for achieving this task, it implements a state-of-the-art variant of parallel tempering called _non-reversible parallel tempering_ that uses an efficient communication scheme and adaptively sets the heating schedule.
+Along with other algorithms for achieving this task, it implements a state-of-the-art variant of parallel tempering called _non-reversible parallel tempering_ that uses an efficient communication scheme and adaptively sets the heating schedule to maximize the communication between prior and posterior distributions.
It is this algorithm that we can access with TreePPL models and that we will try out in this tutorial.
With Pigeons it is possible to run parallel tempering both on several cores on the same computer and across several machines on an HPC cluster; for more details on the latter, see the [Pigeons documentation on MPI usage](https://pigeons.run/stable/mpi/).
Compared to classical variants of parallel tempering, Pigeons' performance does not deteriorate as the number of chains grows large.
@@ -32,30 +33,30 @@ Compared to classical variants of parallel tempering, Pigeon's performance does

### Installing Pigeons
To proceed in this tutorial, we will assume that you have the programming language [Julia](https://julialang.org/) installed on your machine.
-The latest version of Pigeons, along with some other packages used in this tutorial, can then be installed by starting the Julia REPL and running
+The latest version of Pigeons can then be installed by starting the Julia REPL and running
```julia
import Pkg;
Pkg.add("Pigeons")
```

-If you want to plot the samples immediately in Julia you can also install the [MCMCChains](https://turinglang.org/MCMCChains.jl/stable/) and [StatsPlots](https://docs.juliaplots.org/dev/generated/statsplots/) packages, run
+If you want to plot the samples immediately in Julia, you can also install the [MCMCChains](https://turinglang.org/MCMCChains.jl/stable/) and [StatsPlots](https://docs.juliaplots.org/dev/generated/statsplots/) packages by running
```julia
Pkg.add(["StatsPlots", "MCMCChains"])
```

### Compiling and running the CRBD model

-We first set up the paths to the TreePPL source code for the host repertoire model.
+We first set up the paths to the TreePPL source code for the CRBD model.
Here we assume that the [TreePPL repository](https://github.com/treeppl/treeppl) is available at `./treeppl` in your working directory; please substitute this with the correct path on your machine.
```julia
# Path to the TreePPL repository
treeppl_path = "./treeppl"
-# Path to the subdirectory containing the host repertorie model
+# Path to the subdirectory containing the CRBD model
model_dir = "$treeppl_path/models/diversification"
-# The variant of the host reperotire model we want to execute
+# The name of the CRBD model in the TreePPL model repository
model_name = "crbd"

-# Path to the host repertoire model
+# Path to the CRBD model
model_path = "$(model_dir)/$(model_name).tppl"
# Where to write the compiled binary
bin_path = "$(model_dir)/data/$(model_name).out"
@@ -65,15 +66,15 @@ data_path = "$(model_dir)/data/testdata_$(model_name).json"
output_path = "pigeons_output_dir"
```

-Next we want to compile the host repertoire model and create a `TreePPLTarget` struct.
-This object is later passed to the Pigeons parallel tempering algorithm and contains necessary information about the distribution you are sampling from.
+Next we want to compile the CRBD model and create a `TreePPLTarget` struct.
+This struct is later passed to the Pigeons parallel tempering algorithm and contains the information Pigeons needs to orchestrate parallel tempering on your TreePPL model.
For more information, try typing `?TreePPLTarget` in the Julia REPL.
-To compile the model, Pigeons also needs to have access to the TreePPL compiler; here we assume that it is available as `tpplc` in your `PATH`.
+To compile the model, Pigeons also needs access to the TreePPL compiler; here we assume that it is available as `tpplc` in your `PATH`.

```julia
using Pigeons

-# Compile the host repertoire model
+# Compile the CRBD model
tppl_bin = Pigeons.tppl_compile_model(
    model_path, bin_path;
    local_exploration_steps=10, # The number of MCMC moves to perform in TreePPL before swapping temperatures
@@ -84,8 +85,8 @@ tppl_bin = Pigeons.tppl_compile_model(
tppl_target = Pigeons.tppl_construct_target(tppl_bin, data_path, output_path)
```

-With the TreePPL target ready, we are ready to launch Pigeons!
-The number of rounds determines how many iterations to run: each round successive round double the number of iterations in the previous round
+With the `TreePPLTarget` in hand, we are ready to launch Pigeons!
+The number of rounds determines how many iterations (or scans) to run: each successive round doubles the number of iterations relative to the previous round.
```julia
pt = pigeons(target = tppl_target, n_rounds = 12, n_chains = 10, multithreaded = true)
Pigeons.kill_child_processes(pt) # Kill the TreePPL processes after we are done
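# A hedged illustration of the doubling described above; the number of scans in
# the first round is an assumption made here, not something stated in the tutorial.
base_scans = 2                                           # assumed scans in round 1
scans_per_round = [base_scans * 2^(r - 1) for r in 1:12] # each round doubles the previous one
total_scans = sum(scans_per_round)                       # base_scans * (2^12 - 1) = 8190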
@@ -115,20 +116,20 @@ Pigeons.kill_child_processes(pt) # Kill the TreePPL processes after we are done

The output above gives some other useful information, such as an estimate of the log-normalization constant $\log(Z_1 / Z_0)$ between the prior distribution and the posterior.
This is the model evidence in Bayesian statistics.
-An important technical detail is that this estimate is between the _constrained_ prior, i.e. the prior plus any hard constraints such as `weight 0` statements, and the posterior distribution; this will be updated in the future.
-This means that log-normalization estimate of Pigeons compared to that of TreePPL's SMC algorithms will be different on problems with hard constraints.
+An important technical detail is that this estimate is between the _constrained_ prior, i.e., the prior plus any hard constraints such as `weight 0` statements, and the posterior distribution.
+This means that the log-normalization estimate of Pigeons will differ from that of TreePPL's SMC algorithms on problems with hard constraints; this will be updated in the future.
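If you prefer to retrieve the evidence estimate programmatically instead of reading it off the report, the following minimal sketch should work; check `?Pigeons.stepping_stone` in the REPL to confirm the exact name and return value in your version of Pigeons.
```julia
# Estimate of the log-normalization constant log(Z_1 / Z_0) from the finished run
log_evidence = Pigeons.stepping_stone(pt)
```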

Another quantity of interest is the _global communication barrier_, $\Lambda$ in the output above, which measures how difficult it is to move samples from the prior to the posterior – a large $\Lambda$ means that it is difficult, a small $\Lambda$ that it is easy.
-The global communication barrier also gives a rule of thumb for how many chains are needed; you should use $2\Lambda + 1$ for the algorithm to work efficient.
-If you suspect or know that the MCMC kernel on your problem is heavily autocorrelated it can however be beneficial to use more chains than this,
-though adding more chains has diminishing returns.
+The global communication barrier also gives a rule of thumb for how many chains are needed: you should use $2\Lambda + 1$ chains for the algorithm to work efficiently.
+If you suspect or know that the TreePPL MCMC kernel you are using is heavily autocorrelated on your problem, it can be beneficial to use more chains than this; see [Syed et al., 2022](#non-rev2022) for more details.
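As a small worked example of the $2\Lambda + 1$ rule of thumb (the barrier value below is made up for illustration, not taken from the output above):
```julia
# Apply the 2Λ + 1 rule of thumb to a hypothetical global communication barrier
Λ = 3.4                                   # hypothetical value read off the Pigeons report
recommended_chains = ceil(Int, 2Λ + 1)    # ceil(7.8) = 8 chains
```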

-We of course also want to look at the samples from the cold chain, to do this we first need to compile the samples which are spread across several files into one file
+We also want to look at the posterior samples produced by Pigeons and TreePPL; to do this, we first need to compile the samples, which are spread across several files, into one file:
```julia
Pigeons.tppl_compile_samples(pt, "compiled_samples.json")
```
-We could then analyze the samples using `treeppl-python` or `treepplr`.
-However, the CRBD model has very simple outputs – a single Float representing the birth rate parameter – so we will also show a quick and dirty trace and density plot directly in Julia.
+The sample file can then be analyzed using either of the companion packages `treeppl-python` or `treepplr`.
+However, the CRBD model has very simple outputs – a single float representing the speciation rate parameter – so we will also show a quick-and-dirty trace and density plot directly in Julia.
+This assumes you installed the additional packages in the [installation section](#installing-pigeons).
```julia
using MCMCChains, StatsPlots
# Read the compiled samples file
@@ -143,12 +144,18 @@ plot(ch)
## Further reading

Documentation for the various functions used from the [Pigeons.jl](https://pigeons.run/stable/) package is available in the [Pigeons API reference](https://pigeons.run/stable/reference/).
-Note that it is also possible (and convenient) to access package documentation directly in the Julia REPL by first typing `?` and then the name of the thing you are interested in learning about, e.g. `?Pigeons.tppl_compile_model` to learn what options are available when compiling a model with Pigeons.
+Note that it is also possible, and convenient, to access package documentation directly in the Julia REPL by first typing `?` and then the name of the thing you are interested in learning about, e.g. `?Pigeons.tppl_compile_model` to learn what options are available when compiling a model with Pigeons.

Please browse the Pigeons documentation further to learn more about how to interpret and configure the output from Pigeons.
Have a look at the [documentation on using MPI](https://pigeons.run/stable/mpi/) in particular if you are interested in running Pigeons in a distributed fashion.

## References
-Please make sure to acknowledge the awesome work of the Pigeon's team by citing their paper if you use Pigeons in your work.
+Please make sure to acknowledge the awesome work of the Pigeons team by citing their paper if you use Pigeons in your work.

-Surjanovic, N., Biron-Lattes, M., Tiede, P., Syed, S., Campbell, T., & Bouchard-Côté, A. (2023). Pigeons.jl: Distributed sampling from intractable distributions. [_arXiv:2308.09769_](https://arxiv.org/abs/2308.09769).
+<a id="pigeons2023"></a>
+Surjanovic, N., Biron-Lattes, M., Tiede, P., Syed, S., Campbell, T., Bouchard-Côté, A., 2023. Pigeons.jl: Distributed Sampling From Intractable Distributions. https://doi.org/10.48550/arXiv.2308.09769
+
+If you want to learn more about the theory of the non-reversible parallel tempering algorithm, you are recommended to read the following reference:
+
+<a id="non-rev2022"></a>
+Syed, S., Bouchard-Côté, A., Deligiannidis, G., Doucet, A., 2022. Non-reversible parallel tempering: A scalable highly parallel MCMC scheme. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 84, 321–350. https://doi.org/10.1111/rssb.12464
