Update README.md

rdk · web-flow · commit d09945b60c88 · 2024-11-14T11:27:19.000+01:00
diff --git a/README.md b/README.md
@@ -16,7 +16,7 @@ Ligand-binding site prediction based on machine learning.
 
 ### Description
 
-P2Rank is a stand-alone command-line program for the rapid and accurate prediction of ligand-binding sites from protein structures. 
+P2Rank is a stand-alone command-line program for fast and accurate prediction of ligand-binding sites from protein structures. 
 It achieves high prediction success rates without relying on external software for computation of complex features or on a database of known protein-ligand templates.
            
 ### ✨ What's new?
@@ -106,17 +106,17 @@ prank predict -c alphafold   test.ds     # use alphafold config and model (confi
 
 ### Prediction output 
 
-   For each structure file `{struct_file}` in the dataset, P2Rank produces several output files:
-   * `{struct_file}_predictions.csv`: contains an ordered list of predicted pockets, their scores, coordinates 
-   of their centers together with a list of adjacent residues, list of adjacent protein surface atoms, and a calibrated probability of being a ligand-binding site.
-   * `{struct_file}_residues.csv`: contains a list of all residues from the input protein with their scores, 
-   mapping to predicted pockets, and a calibrated probability of being a ligand-binding residue.
-   * PyMol and ChimeraX visualizations in `visualizations/` directory (`.pml` and `.cxc` scripts with data files in `data/`).
-     * Generating visualizations can be turned off with the `-visualizations 0` parameter.
-     * `-vis_renderers 'pymol,chimerax'` parameter can be used to turn individual visualization renderers on/off. 
-     * `-vis_copy_proteins 0` parameter can be used to turn off copying of protein structures to the visualizations directory (faster, but visualizations won't be portable).
-     * Coordinates and ligandability scores of SAS points can be found in `visualizations/data/{struct_file}_points.pdb.gz`. Here, the "Residue sequence number" (23-26 of HETATM record)
-       is the rank of the corresponding pocket (0 means the point doesn't belong to any pocket) and the b-factor column corresponds to the ligandability score.
+   For each structure file `{struct_file}` in the dataset, P2Rank generates several output files:
+   * `{struct_file}_predictions.csv`: lists **predicted pockets** in order of score, including each pocket's score, center coordinates, adjacent residues, adjacent protein surface atoms, and a calibrated probability of being a ligand-binding site.
+   * `{struct_file}_residues.csv`: lists **all residues** from the input protein along with their scores, mapping to predicted pockets, and a calibrated probability of being a ligand-binding residue.
+   * **PyMol and ChimeraX visualizations**: `.pml` and `.cxc` scripts in the `visualizations/` directory, with additional files in `data/`.
+     * Optional settings:
+       * Use `-visualizations 0` to disable visualization generation.
+       * Use `-vis_renderers 'pymol,chimerax'` to toggle specific renderers on/off.
+       * Use `-vis_copy_proteins 0` to prevent copying protein structures to the visualizations directory (faster, but visualizations won't be portable). 
+   * **SAS points data**: coordinates and ligandability scores for solvent-accessible surface (SAS) points are saved in `visualizations/data/{struct_file}_points.pdb.gz`. Here:
+     * The residue sequence number (columns 23-26 of the HETATM record) is the rank of the corresponding pocket (0 means the point doesn't belong to any pocket).
+     * The B-factor column holds the ligandability score.
 
 
 ### Configuration
@@ -142,13 +142,6 @@ To see the complete commented list of all (including undocumented)
 parameters see [Params.groovy](https://github.com/rdk/p2rank/blob/develop/src/main/groovy/cz/siret/prank/program/params/Params.groovy) in the source code.
 
 
-### Evaluate prediction model
-...on a file or a dataset with known ligands.
-
-~~~ruby
-prank eval-predict -f test_data/1fbl.pdb
-prank eval-predict test.ds
-~~~
 
 ### Rescoring (PRANK algorithm)
 
@@ -173,8 +166,6 @@ Rescoring output:
 prank rescore fpocket.ds                   
 prank rescore fpocket.ds -o output_here   # explicitly specify output directory
 prank rescore fpocket.ds -c rescore_2024  # use new experimental rescoring model (recommended for alphafold models)
-   
-prank eval-rescore fpocket.ds             # evaluate rescoring model on a dataset with known ligands
 ~~~
 
 For rescoring, the dataset file needs to have a specific 2-column format. See examples in `test_data/`: `fpocket.ds`, `concavity.ds`, `puresnet.ds`.
@@ -196,6 +187,18 @@ In this case, the dataset file can be a simple list of pdb/cif files since Fpock
 `prank fpocket-rescore` will produce `predictions.csv` as well, so it can be used as an in-place replacement for `prank predict` in most scenarios.
 Note: if you use `fpocket-rescore`, please cite Fpocket as well.
 
+### Evaluate prediction and rescoring models
+
+Use the following commands to calculate prediction metrics (DCA, DCC, ...) for structure files in which ligands are present.
+
+~~~ruby
+prank eval-predict -f test_data/1fbl.pdb         # evaluate default prediction model on a single file
+prank eval-predict              test.ds          # evaluate default prediction model on a dataset with known ligands
+prank eval-predict -c alphafold test.ds          # evaluate specific prediction model on a dataset with known ligands
+
+prank eval-rescore                 fpocket.ds    # evaluate default rescoring model on a dataset with known ligands
+prank eval-rescore -c rescore_2024 fpocket.ds    # evaluate specific rescoring model on a dataset with known ligands
+~~~
 
 
 ## 🏗️ Build from sources