poor convergence without tempering #42

Open
samueldmcdermott opened this issue Nov 21, 2023 · 0 comments

The final section of this notebook shows a test example of poor convergence for mclmc without tempering. It compares posterior samples from five different chains (a minimal sketch of how the mclmc chains are initialized follows the list):

  1. results from dynesty nested sampling (no initialization needed)
  2. results from NUTS/HMC as implemented in numpyro, initialized from the final state of the dynesty samples
  3. results from mclmc initialized from the final state of the dynesty samples, denoted mchmcd
  4. results from mclmc initialized from the final state of the numpyro samples, denoted mchmcn
  5. results from mclmc initialized with all parameters equal to 0 (essentially a random point), denoted mchmc0
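
For concreteness, here is a minimal sketch of how the three mclmc chains (items 3–5) differ only in their starting point. Everything here is a hypothetical stand-in: `run_mclmc` is an unadjusted Langevin walk just so the snippet runs, and `logdensity_fn`, the dimensionality, and the placeholder "final states" are illustrative, not the notebook's actual code or the sampler's real API.

```python
import jax
import jax.numpy as jnp

ndim = 8                                            # placeholder dimensionality
logdensity_fn = lambda x: -0.5 * jnp.sum(x ** 2)    # placeholder target, not the real posterior


def run_mclmc(logdensity_fn, init_position, rng_key, num_steps, step_size=0.1):
    """Hypothetical stand-in for the mclmc sampler (an unadjusted Langevin walk
    so the snippet runs); in the notebook this would be the actual mclmc call."""
    grad_fn = jax.grad(logdensity_fn)

    def step(position, key):
        noise = jax.random.normal(key, position.shape)
        position = position + 0.5 * step_size**2 * grad_fn(position) + step_size * noise
        return position, position

    keys = jax.random.split(rng_key, num_steps)
    _, samples = jax.lax.scan(step, init_position, keys)
    return samples


num_steps = 10_000
key = jax.random.PRNGKey(0)

# The three mclmc chains differ only in their initial position:
init_dynesty = jnp.ones(ndim)        # stand-in for the final state of the dynesty run  -> mchmcd
init_numpyro = 0.9 * jnp.ones(ndim)  # stand-in for the final state of the numpyro run  -> mchmcn
init_zeros = jnp.zeros(ndim)         # all parameters equal to zero                     -> mchmc0

chains = {
    name: run_mclmc(logdensity_fn, init, key, num_steps)
    for name, init in [("mchmcd", init_dynesty), ("mchmcn", init_numpyro), ("mchmc0", init_zeros)]
}

# As in the notebook, discard the first half of each chain as burn-in.
chains = {name: samples[num_steps // 2:] for name, samples in chains.items()}
```

The point of the sketch is only that mchmcd, mchmcn, and mchmc0 run the identical sampler and differ in nothing but `init_position`.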

There are a few things to note:

  1. I'm working in a constrained parameter space and would like uniform priors over some range. I found that NUTS/HMC was better behaved with "hard" priors, where the log-likelihood goes to -inf outside a given range, but mclmc gave NaNs with that setup, so I gave it "smooth" priors instead (a sketch of the two options follows this list). This leads to slight disagreements for parameters that are prior dominated, as seen in the output of cell 35 of the notebook. If I have time I'd like to make sure there are no prior-dominated parameters, but I haven't gotten to this yet; in the meantime, this hard vs. smooth implementation of the priors is why there are discrepancies on a few posteriors.
  2. The main takeaway from that plot is that the mchmcd and mchmcn posteriors are very similar, whereas the mchmc0 posteriors (which are initialized from a bad point) are quite different and spend a lot of time far away from the other posteriors, even though I throw out the first half of the samples as burn-in.
  3. In the following cell you can see that the mchmcd and mchmcn chains explore rather similar log-probability values, differing by Δln P = 15, but the maximum log-probability of mchmc0 is lower by about 250 (or Δχ² = +500 if 2·Δln P is χ²-distributed). The numpyro and dynesty results are even lower in ln P, but because of the different priors I don't think that's a fair comparison.
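
On the hard vs. smooth priors in point 1: the notebook's exact prior implementation isn't reproduced here, but a minimal sketch of the two options, assuming a simple box constraint `[lo, hi]` and logistic "walls" for the smooth version (the `sharpness` value is an arbitrary choice), could look like this:

```python
import jax
import jax.numpy as jnp


def log_prior_hard(x, lo, hi):
    # "Hard" uniform prior: log-density is 0 inside the box and -inf outside.
    # A gradient-based sampler that steps outside the box sees -inf (and can
    # produce NaN gradients), consistent with the NaNs seen with mclmc.
    inside = jnp.all((x > lo) & (x < hi))
    return jnp.where(inside, 0.0, -jnp.inf)


def log_prior_smooth(x, lo, hi, sharpness=50.0):
    # "Smooth" uniform-like prior: steep log-sigmoid walls penalize positions
    # outside the box but stay finite and differentiable everywhere.
    lower_wall = jax.nn.log_sigmoid(sharpness * (x - lo))
    upper_wall = jax.nn.log_sigmoid(sharpness * (hi - x))
    return jnp.sum(lower_wall + upper_wall)


x = jnp.array([1.2, 0.5])                # first component lies just outside [0, 1]
print(log_prior_hard(x, 0.0, 1.0))       # -inf
print(log_prior_smooth(x, 0.0, 1.0))     # finite but strongly penalized
```

The larger the sharpness, the more closely the smooth walls approximate the hard box, at the cost of steeper gradients near the boundary; for prior-dominated parameters the two choices give visibly different posteriors, which is the discrepancy mentioned above.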

This doesn't seem to be a bug to me, but @reubenharry suggested I submit an issue since it demonstrates a difference in performance between tempered and non-tempered runs. I'm happy to run the mchmc0 chains with different specifications and different approaches to annealing/tempering if it's useful; just let me know.
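
For reference, one simple form the annealing/tempering could take is a schedule of inverse temperatures applied to the likelihood term. This is only an illustration of the general idea, not the tempering scheme implemented in this repository (`run_sampler`, `log_lik`, and `log_prior` are hypothetical):

```python
import jax.numpy as jnp


def tempered_logdensity(log_lik, log_prior, beta):
    """Log-density with the likelihood tempered by inverse temperature beta:
    beta = 0 recovers the prior, beta = 1 the full posterior."""
    return lambda x: beta * log_lik(x) + log_prior(x)


# Geometric schedule from a small beta up to 1 (the values are illustrative).
betas = jnp.geomspace(1e-3, 1.0, num=20)

# Schematically, the sampler would run for a while at each beta, with the final
# state of one stage used as the initial state of the next:
#
#   state = init_zeros
#   for beta in betas:
#       state, samples = run_sampler(tempered_logdensity(log_lik, log_prior, beta), state)
```

Under a schedule like this, the mchmc0 chain would start close to the prior and only gradually feel the full likelihood, rather than being dropped at a bad point of the untempered posterior.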
