repo/
├─ flexisol/ # Benchmark data (structures, per-method folders)
├─ data/
│ ├─ raw_energies/ # CSV with energies for `populate`
│ ├─ references/ # Experimental references (CSV)
│ └─ results/ # Published raw results (CSV)
├─ output/ # Generated CSVs from evaluations
├─ flexisol_cli/ # Package: CLI and helpers
│ ├─ cli.py # Console entrypoint (`flexisol`)
│ ├─ reader.py # I/O + weighting (Boltzmann/minimum)
│ ├─ evaluation.py # gsolv/pkab assembly vs references
│ ├─ metrics.py # Error metrics, outlier filtering
│ ├─ config.py # Defaults and runtime config
│ └─ registry.json # Method registry (column → folder, type)
├─ pyproject.toml
└─ README.md
- Energy files under each method folder are named `el_energy` or `solv_energy`.
- Avoid committing large outputs; keep `output/` small and reproducible.
- Create a fresh Conda env (recommended): `conda create -n flexisol python=3.12 -y`, then `conda activate flexisol`.
- Install from the repo root:
  - Editable (dev): `pip install -e .`
  - Regular: `pip install .`
- Verify the installation with `flexisol --help` or `flexisol-cli --help`.
- Alternatively, run the module directly: `python -m flexisol_cli.cli --help`.
The general syntax is `flexisol <command> [options]` or `flexisol-cli <command> [options]` (e.g., `flexisol all -h`).
There are two main commands, `evaluate-all` (alias: `all`) and `evaluate-one` (alias: `one`), for evaluating multiple methods or a single method, respectively. The `populate` command copies energies from a CSV into the expected folder structure.
- Populate and evaluate all methods using the provided data folders:
  - `flexisol populate --csv data/raw_energies/energies.csv --benchmark-root flexisol`
  - `flexisol all --benchmark-root flexisol -w boltzmann -g full`
- Get command-specific help with `-h` on any command, e.g. `flexisol all -h`, `flexisol one -h`, or `flexisol populate -h`.
- Show the resolved paths and options (root, references, output, weighting, geometry): `flexisol config --show`
- The CLI needs the path to the benchmark dataset (the directory containing the structure subfolders), called the "benchmark root".
- Set it explicitly with any of these equivalent flags: `--benchmark-root`, `--dataset-root`, or `--root`.
  - Example: `flexisol evaluate-all --benchmark-root /path/to/flexisol`
- Alternatively, set the environment variable `FLEXISOL_ROOT`.
- Defaults: if omitted, the tool looks for `./flexisol` (relative to your current working directory); if that is not found, it uses the current working directory itself.
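The lookup order above can be sketched in Python as follows. This is a minimal illustration, not the code in `flexisol_cli/config.py`: the function name `resolve_benchmark_root` is hypothetical, and the assumption that an explicit flag takes precedence over `FLEXISOL_ROOT` is not stated in the text above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_benchmark_root(
    cli_root: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
    cwd: Optional[Path] = None,
) -> Path:
    """Hypothetical sketch of the benchmark-root lookup order."""
    cwd = Path.cwd() if cwd is None else Path(cwd)
    # 1. Explicit --benchmark-root / --dataset-root / --root value.
    if cli_root:
        return Path(cli_root)
    # 2. FLEXISOL_ROOT environment variable (assumed to rank below the flag).
    env_root = environ.get("FLEXISOL_ROOT")
    if env_root:
        return Path(env_root)
    # 3. ./flexisol relative to the working directory, if it is a directory.
    default = cwd / "flexisol"
    if default.is_dir():
        return default
    # 4. Otherwise fall back to the working directory itself.
    return cwd
```

Passing `environ` and `cwd` explicitly keeps the sketch testable without touching the real environment or file system.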
- Populate from CSV (writes `el_energy`/`solv_energy`): `flexisol populate --csv data/raw_energies/energies.csv --benchmark-root flexisol`
- Evaluate all baselines (writes CSVs to `output/` and prints statistics): `flexisol all --benchmark-root flexisol -w boltzmann -g full`
- Options for statistics filtering (legacy-like): `--sigma 3` (default), `--no-sigma`, `--abs-cutoff 200` (default), `--no-abs-cutoff`.
- Weighting (`-w`, `--weighting`): controls how conformer energies are aggregated per group.
  - `boltzmann` (default): Boltzmann-weighted average over conformers (298.15 K).
  - `minimum`: selects the minimum energy across conformers.
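As an illustration of the two weighting schemes, here is a minimal sketch. It assumes conformer energies in kcal/mol and weights each conformer by the same energy that is being averaged; which energies the tool actually uses for the weights, and in which units, is not specified here, and all names below are hypothetical.

```python
import math
from typing import Sequence

# Gas constant in kcal/(mol*K); assumes energies are given in kcal/mol.
R_KCAL = 1.987204259e-3


def boltzmann_average(energies: Sequence[float], temperature: float = 298.15) -> float:
    """Boltzmann-weighted average of conformer energies."""
    if not energies:
        raise ValueError("need at least one conformer energy")
    kt = R_KCAL * temperature
    e_min = min(energies)
    # Shift by the lowest energy before exponentiating: the normalized
    # weights are unchanged, but exp() can no longer overflow.
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(weights)  # partition function (relative to e_min)
    return sum(w * e for w, e in zip(weights, energies)) / z


def minimum_energy(energies: Sequence[float]) -> float:
    """The 'minimum' scheme: keep only the lowest-energy conformer."""
    return min(energies)


def aggregate(energies: Sequence[float], weighting: str = "boltzmann") -> float:
    """Dispatch on the -w/--weighting value."""
    if weighting == "boltzmann":
        return boltzmann_average(energies)
    if weighting == "minimum":
        return minimum_energy(energies)
    raise ValueError(f"unknown weighting: {weighting!r}")
```

For example, `aggregate([0.0, 0.5, 3.0])` gives roughly 0.16 kcal/mol: the low-lying conformers dominate, while the one 3 kcal/mol higher contributes almost nothing. `aggregate([0.0, 0.5, 3.0], "minimum")` returns 0.0.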
- Geometry (`-g`, `--geometry`): chooses which geometry to use when evaluating.
  - `full` (default): use the gas-phase geometry for gas mode and the solvated geometry for solv mode.
  - `gas`: use the gas-phase geometry for both modes (gas rows are duplicated to solv).
  - `solv`: use the solvated geometry for both modes (solv rows are duplicated to gas).
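The row duplication implied by `-g gas` and `-g solv` can be pictured as a lookup from evaluation mode to source geometry. The sketch below is hypothetical (the names and the plain-string rows are illustrative only) and simply mirrors the three options described above.

```python
from typing import Dict, List

# For each -g option: which geometry's rows feed the gas and the solv mode.
GEOMETRY_SOURCES: Dict[str, Dict[str, str]] = {
    "full": {"gas": "gas", "solv": "solv"},   # each mode uses its own geometry
    "gas": {"gas": "gas", "solv": "gas"},     # gas rows duplicated to solv
    "solv": {"gas": "solv", "solv": "solv"},  # solv rows duplicated to gas
}


def select_rows(
    rows_by_geometry: Dict[str, List[str]], geometry: str = "full"
) -> Dict[str, List[str]]:
    """Return {mode: rows}, taking each mode's rows from the chosen geometry."""
    if geometry not in GEOMETRY_SOURCES:
        raise ValueError(f"unknown geometry: {geometry!r}")
    sources = GEOMETRY_SOURCES[geometry]
    # list(...) copies, so duplicated rows can be modified independently per mode.
    return {mode: list(rows_by_geometry[src]) for mode, src in sources.items()}
```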
Examples:
- Minimum weighting with solvated geometry: `flexisol all --benchmark-root flexisol -w minimum -g solv`
- Gas geometry only: `flexisol one --benchmark-root flexisol -ee el_r2scan-3c -se smd -g gas`
Below is a sample run of `flexisol all` with Boltzmann weighting and full geometry. It shows the header (tool, version, authors), the progress steps with timings, the result path with row counts, and the per-method statistics for each datapoint type.
+------------------------------------+
| FlexiSol Evaluator |
| v 0.1.0 |
| Authors: L. Wittmann, C. E. Selzer |
+------------------------------------+
Working on el_r2scan-3c [weighting=boltzmann, geometry=full]
reading structures ... done (0.20 sec)
reading energies ... done (17.71 sec)
weighting (boltzmann) ... done (1.25 sec)
evaluating gsolv ... done (1.82 sec)
evaluating pkab ... done (1.03 sec)
results written to /home/wittmann/Documents/projects/p02_solvl/paperscript/evaluator/output/el_r2scan-3c-full-boltzmann-results.csv
results: 824 rows (gsolv=530, pkab=294) (total 22.37 sec)
statistics for gsolv (kcal/mol) (abs>200, 3-sigma):
method ME MAE RMSE SD AMAX N
------------------------ -------- -------- -------- -------- -------- ------
alpb -0.54 2.79 3.54 3.50 10.31 526
smd 0.71 2.50 3.27 3.19 10.09 529
...
statistics for pkab (log units) (abs>200, 3-sigma):
method ME MAE RMSE SD AMAX N
------------------------ -------- -------- -------- -------- -------- ------
alpb 1.71 2.47 3.11 2.59 9.25 290
smd -0.09 1.06 1.41 1.41 4.57 291
...
- Progress: each step reports its wall time to help spot bottlenecks.
- Results: the path of the written CSV and its row count, broken down by datapoint type (gsolv, pkab).
- For the statistics, see "Error metrics" below.
- Example (r2scan-3c + SMD, full geometry + Boltzmann weighting): `flexisol evaluate-one --root flexisol -ee el_r2scan-3c -se smd -w boltzmann -g full`
- Always writes a CSV to `output/<ee>-<geometry>-<weighting>-<se>-results.csv` and prints statistics per datapoint type (gsolv, pkab), including N.
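The file-name pattern can be reproduced with a small helper. The `<se>` part follows the pattern above for `evaluate-one`; omitting it for `evaluate-all` is an assumption inferred from the sample run (`el_r2scan-3c-full-boltzmann-results.csv`). The helper name `results_path` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def results_path(
    ee: str,
    geometry: str,
    weighting: str,
    se: Optional[str] = None,
    output_dir: str = "output",
) -> Path:
    """Build output/<ee>-<geometry>-<weighting>[-<se>]-results.csv."""
    parts = [ee, geometry, weighting]
    if se is not None:  # evaluate-one also encodes the solvation method
        parts.append(se)
    return Path(output_dir) / ("-".join(parts) + "-results.csv")
```

For instance, `results_path("el_r2scan-3c", "full", "boltzmann", "smd")` gives `output/el_r2scan-3c-full-boltzmann-smd-results.csv`.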
- Register the method by editing `flexisol_cli/registry.json` (no code changes needed).
- Compute your new method (electronic energy or solvation energy) for all structures.
- Include the energies in one of two ways:
  - place them directly in the `flexisol/` folder (`el_energy` or `solv_energy`), or
  - place them in `data/raw_energies/energies.csv` and run `flexisol populate` to copy them over.
The following error metrics are computed by default:
- Mean Error (ME) (also called Mean Signed Error, MSE)
- Mean Absolute Error (MAE)
- Root Mean Square Error (RMSE)
- Standard Deviation (SD) of the errors
- Maximum Absolute Error (AMAX)
- Count (N) of datapoints after filtering.
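For reference, a minimal sketch of these metrics. It assumes errors are defined as computed minus reference and that SD is the population standard deviation of the errors (so that RMSE² = ME² + SD²); neither convention is confirmed here, and `error_metrics` is a hypothetical name, not the function in `metrics.py`.

```python
import math
from typing import Sequence


def error_metrics(computed: Sequence[float], reference: Sequence[float]) -> dict:
    """ME, MAE, RMSE, SD, AMAX and N for paired computed/reference values."""
    if len(computed) != len(reference):
        raise ValueError("computed and reference must have the same length")
    if not computed:
        raise ValueError("need at least one datapoint")
    # Signed errors; the sign convention (computed - reference) is assumed.
    errors = [c - r for c, r in zip(computed, reference)]
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Population SD of the errors around the mean error.
    sd = math.sqrt(sum((e - me) ** 2 for e in errors) / n)
    amax = max(abs(e) for e in errors)
    return {"ME": me, "MAE": mae, "RMSE": rmse, "SD": sd, "AMAX": amax, "N": n}
```

With this convention a positive ME means the method overestimates the reference on average, and SD measures the scatter that remains after removing that systematic shift.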
Two types of outlier filtering are applied by default:
- Absolute cutoff: removes datapoints whose absolute reference value exceeds 200 (kcal/mol for gsolv, log units for pkab). This filter is not strictly needed and is kept for legacy reasons.
- Sigma clipping: removes datapoints whose error deviates from the mean error by more than 3 standard deviations.
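The two filters can be sketched as below. Assumptions: the absolute cutoff is applied before sigma clipping, clipping is a single pass (not iterated), and the standard deviation is the population SD of the errors. Passing `None` roughly corresponds to `--no-abs-cutoff` / `--no-sigma`. The function name `filter_outliers` is hypothetical.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def filter_outliers(
    computed: Sequence[float],
    reference: Sequence[float],
    abs_cutoff: Optional[float] = 200.0,
    n_sigma: Optional[float] = 3.0,
) -> List[Tuple[float, float]]:
    """Return the (computed, reference) pairs that survive both filters."""
    pairs = list(zip(computed, reference))
    # Legacy filter: drop datapoints whose |reference| exceeds the cutoff.
    if abs_cutoff is not None:
        pairs = [(c, r) for c, r in pairs if abs(r) <= abs_cutoff]
    # Single-pass sigma clipping on the signed errors.
    if n_sigma is not None and len(pairs) > 1:
        errors = [c - r for c, r in pairs]
        me = statistics.fmean(errors)
        sd = statistics.pstdev(errors)
        if sd > 0:  # identical errors: nothing to clip
            pairs = [p for p, e in zip(pairs, errors) if abs(e - me) <= n_sigma * sd]
    return pairs
```

Note that a single outlier can only exceed 3 population standard deviations when there are at least 11 datapoints, so sigma clipping has no effect on very small sets.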