-
Notifications
You must be signed in to change notification settings - Fork 2
Docs refresh #122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs refresh #122
Conversation
8da26da to
cd234f4
Compare
c9dd5f8 to
b71ba93
Compare
|
Hi @mattwthompson, thanks for agreeing to a review! This is obviously a massive changeset and it's going to keep getting bigger, but at this stage I'd just like a review on chapters 1-4 of the "Pablo Book" and any API docs linked therein. I'd specifically like a sanity check on:
Spelling and grammar mistakes would be great if you spot them, but I'd prefer a user-like higher level read for information content at this stage and can go over everything with a fine toothed comb when it's more settled. Feel free to just read the rendered docs and leave a big comment if it's easier! Thanks!! EDIT: Oh, and I know the PDB files I use in the demo code are a bit of a mess (you'll see when you load the page), planning on tidying them up later! |
updates: - [github.com/astral-sh/ruff-pre-commit: v0.12.10 → v0.12.11](astral-sh/ruff-pre-commit@v0.12.10...v0.12.11) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
updates: - [github.com/astral-sh/ruff-pre-commit: v0.12.11 → v0.12.12](astral-sh/ruff-pre-commit@v0.12.11...v0.12.12) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [pre-commit.ci] pre-commit autoupdate updates: - [github.com/astral-sh/ruff-pre-commit: v0.12.12 → v0.13.0](astral-sh/ruff-pre-commit@v0.12.12...v0.13.0) * Update monthly --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Matthew W. Thompson <[email protected]>
updates: - [github.com/astral-sh/ruff-pre-commit: v0.13.0 → v0.13.1](astral-sh/ruff-pre-commit@v0.13.0...v0.13.1)
updates: - [github.com/astral-sh/ruff-pre-commit: v0.13.1 → v0.13.3](astral-sh/ruff-pre-commit@v0.13.1...v0.13.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
953308c to
b63765d
Compare
mattwthompson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is looking great! I would be lying if I said I was excited to read the nitty gritty on loading PDB files and custom modifications therein, but I found this easy to get through. I feel like I understand the tool better now and would be better equipped to leverage the API.
I went through the first four chapters in detail and the API docs with more of a skim. You'll see that most of my comments are a mixture of me not being less of a battle-warn biomolecular modeling practitioner than you and possibly being to nitpicky on some details you're already planning to smooth over.
Comments on "the book" first:
- The first example, loading 2ZUQ, emits a warning about atom ordering. I wouldn't note this except that it comes immediately after telling the user that Pablo "just works" (except for a separate case).
- " ... as long as the ligands are complete and named in the standard way." Complete as in not missing hydrogens? And what's the standard way? (If I should already know both answers, take this as gaps in my background.)
- Multi-model warning with 1A4T, can this be worked around or acknowledged?
- Why does the 5AP1 ligand need
additional_definitionsbut 3H34 "just works" without it? - In the beginning of Chapter 4, custom residue definitions are reference without, I think, fully being introduced. Reordering the next chapter is probably unadvised
- Many (all?) of the links to API methods/functions are filled out the first time one is mentioned and not the later. Hassle, I know, but having them all point to the API docs would be nice.
- 4.2.1 - At first I wondered if this handled protonation and deprotonation, since the example only covers adding protons. The text itself is clear enough that it covers both (assuming I understand the chemistry correctly) but it might be helpful to make this doubly explicit or also include a deprotonation example.
- How would
with_vsite_watermeaningfully "apply to residues added later" ? I understand how this might be relevant for virtual sites on ligands or proteins, but not water. Doesn't that method already do everything the user is likely to need with 4+-site waters? - I struggled to understand what "The patch is a function from a single residue definition to a list of residue definitions." means in prose. The type annotation helps, but I had to look elsewhere for it
- 4.3 - At first I was pretty confused what novel behavior this was showing (I'm right in assuming the CCD would handle this just fine?). It might be useful to make explicit that this is just showing off a way to use a more minimal residue library. If I'm understanding it correctly that is
API/UX questions:
- How often is
$XDG_CACHE_HOMEoverriden from the default? It read at first like an important implementation detail of managing the cache but now seems like something in which the default works fine except for users who might override it. - What's the motivation of the
formatargument? The behavior of the default value covers all of what I'd expect a reasonable user to provide STD_CCD_CACHE.with_looks odd - what's with the trailing underscore?- There are plenty of
with_xmethods and it was confusing at first that some are onCcdCacheand others are onResidueDefinition - High-level docstring of
CcdCache.withoutconfused me at first since, even though the argument takes residue names, the effect is removing residue definitions. At first, I thought there was an asymmetry compared to.with_( - In general, the one-line docstrings of
CcdCache.withXare a little inconsistent. It bears repeating in each, I think, that a newCcdCacheis returned
|
Thanks for your comments Matt! I addressed them today:
Have changed to a different PDB that doesn't have the warning
Yes complete as in not missing hydrogens - have made this more explicit. The standard way is just whatever it says in the CCD, I discuss it in the last section of the quickstart, but I've tried to make this clearer.
Have acknowledged it!
Clarified that this refers to the same standardness discussed previously
I've made this seem a little less like I forgot something at least.
This is the ongoing fight between readability and navigability. I've added more links to things that haven't been linked in a paragraph or two.
I've made it clearer that the given example both adds and removes protons!
People can always add new residue definitions, even for residues already in the cache, but I agree this is covering an unusual edge case. I've softened the language a little.
Hmm I'm used to "function from x to y" meaning "function that takes arguments x and returns y". Have clarified.
Yeah that's right, I just wanted to stress the point that all chemical info in Pablo can be changed by the user API/UX questions:
Clarified that this is a cross-OS standard and de-emphasized the role of the environment variable. I have no idea how many people change this variable but there was a big push a few years ago to get devs to stop writing stuff to
The format argument is mostly there for loading file-like objects.
Oof yeah taking a second look at these I see what you mean. Fixed! |
j-wags
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I focused on reading the book "cover-to-cover" and following my nose down some other functionality rabbit holes. it took me a few hours so I didn't review the non-book code changes too closely. Overall this is absolutely excellent. A lot of suggestions and only a few blockers before I think this is ready for merge.
- (not blocking) Could be good to tell people where the example files live (I'm pretty good at this stuff and it was the 4th place I looked)
- The heme example doesn't actually load it correctly, and may be too clever to be a good intro:
- (blocking) The hemes are supposed to be covalently bound to the protein, but aren't in the pablo-loaded topology (see footnote [1] below) (this is probably a limitation of the preparation process, but it makes it a weaker headline example regardless)
- (not blocking) We can't parameterize iron, even if it were loaded correctly, which makes this not-the-best-first-example
- (not blocking) Could consider putting a simpler topology here, and moving this to a later page where we can give more context, since it is really impressive
- (not blocking) The tooltips are really slick, great work on those
- (not blocking) Could be cool to include a pdbx/mmcif in the quickstart
- (not blocking) It could be good to include something like a philosophy/use case overview between 1.1 and 1.2, like
- Normal usage of pablo expects that things will "just work" if your inputs are made of standard residues, or you'll get a helpful error message explaining how your input file needs to change or which residues need additional information to be loaded. We intend to offer several options in our API for users to teach pablo about nonstandard residues. It's important to note that we intend for it it be very rare for users to custom residues atom-by-atom. Instead we offer several API points to allow users to copy and modify existing residue definitions, and we expect these will be the shortest path to defining custom residues in the vast majority of cases.
- (not blocking) In section 4: "The fact that a residue library admits multiple residue definitions for each name allows alternate protonation states and resonance forms to be included in a residue library." - The inclusion of "resonance forms" here gives me pause. I think this means to say "you could put the charge on the other oxygen in a carboxylate" but I'm thinking like "you could permute the double bonds in an aromatic ring", but then the implication that this would be additive (like, we're telling the user they could have residue definitions for BOTH permutations of double bonds around a ring) creates more questions than answers. I might recommend just leaving this at "protonation states"
- (not blocking) Drawing leaving atoms in light grey by default is SO COOL
[1] The PDB website shows two cysteines binding the hemes, but the loaded complex doesn't have them
However if bonds existed between the protein and hemes, they'd all be molecule 0. Instead molecule 1 is a disconnected heme.
topology.molecule(1).to_smiles()yields
'[H][O][C](=[O])[C]([H])([H])[C]([H])([H])[C]1=[C]2/[C]([H])=[C]3/[C]([C]([H])([H])[C]([H])([H])[C](=[O])[O][H])=[C]([C]([H])([H])[H])[C]4=[N]3->[Fe@]35<-[N]6=[C]([C]([C]([H])=[C]([H])[H])=[C]([C]([H])([H])[H])/[C]6=[C](\\[H])[C](=[C]1[C]([H])([H])[H])[N]23)/[C]([H])=[C]1/[C]([C]([H])([H])[H])=[C]([C]([H])=[C]([H])[H])/[C](=[C]/4[H])[N]15'
I've moved them to a subdirectory of the docs so they live next to the documentation source
Great catch, I've replaced this with a freshly prepared PDB file with a more Sage-friendly noncovalent ligand
Will give this a miss as I think it'll distract from what the library is really trying to do.
Added with some edits
Permuting the double bonds in the aromatic ring is absolutely allowed, and a user could absolutely have both permutations of a double bonds around a ring. This is important for eg. resonance forms of charged substituents of an aromatic system where the charge moves around the ring, changing the formal bond orders in the aromatic system. I've clarified what happens when multiple resonance forms match - as long as they assign the same net formal charge, elements, and connectivity, the first match is chosen. |
Co-authored-by: Jeff Wagner <[email protected]>
jameseastwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of my comments should be considered blocking. Overall, this documentation looks great.
|
|
||
| PDB files are a highly permissive format widely used in molecular modeling. The rules for how a PDB file should be written or read are vague enough --- and the chemistry familiar enough --- that the format often runs on the principle of "you know what I mean". For traditional MD workflows, this permissiveness is an asset. Simulation engines can write PDB files with arbitrary virtual sites or coarse-grainings safe in the knowledge that most tools will be able to make do, at least in the biochemical context where the user has force field parameters. | ||
|
|
||
| The OpenFF ecosystem, however, relies on access to detailed chemical information of the modeled molecular system. In particular, we require a complete chemical graph; every atom, including hydrogens, their atomic numbers and formal charges, and their connectivity with bond orders. We use this rich chemical information to assign accurate force field parameters to an enormous variety of molecules. By contrast, PDB files have no specified facility for the specification of bond orders, and routinely omit bonds, formal charges, and hydrogen atoms altogether. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "PDB files have no facility for the specification of bond orders" accurate? To avoid repeating "specify"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's slightly less accurate - there is an unusual/rare convention that duplicate CONECT records can represent higher bond orders, but it's not in the PDB spec (hence the first specified). Have rephrased the sentence "By contrast, PDB files routinely omit bonds, formal charges, and hydrogen atoms, and the PDB spec provides no way at all to define bond orders for arbitrary chemistries."
docs/book/03-how-loading.md
Outdated
|
|
||
| ## PDBx/mmCIF files | ||
|
|
||
| Pablo has basic support for PDBx/mmCIF files. This works by reading a subset of data entries in the file that are equivalent to the columns of a PDB file's `ATOM/HETATM` records and passing that data through the same matching and topology construction pipeline used for PDB files. This ignores most of the chemical information in the PDBx/mmCIF file. If you have a use-case for the PDBx/mmCIF format that isn't covered by the existing functionality, please let us know --- if there's demand, we'll implement it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's my professional responsibility to stop developers from promising users that we will implement some feature. Can we re-word this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "please let us know so we can consider implementing it!" work?
docs/book/05-resdef.Rmd
Outdated
|
|
||
| Sometimes you need to create a custom residue definition. This might be when you want to augment the CCD cache with a residue of your own, or when creating your own residue library. All residue definitions are instances of the [`ResidueDefinition`] class. That class has a large number of `from_*()` methods that can be used to easily and quickly generate residue definitions. | ||
|
|
||
| For ligands and single-residue molecules, the reference docs for that class might be all you need. Unfortunately, preparing residue definitions that can actually link to each other in a polymer requires a little bit more understanding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omit "Unfortunately"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this omission I felt like the whole paragraph doesn't hold its weight - I intended it as a sort of oblique "sorry this is so complicated". I've removed the paragraph.
Co-authored-by: James Eastwood <[email protected]>
j-wags
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dope. So excited.
Fixes #
Changes made in this Pull Request:
PR Checklist