This repository was archived by the owner on Jan 22, 2026. It is now read-only.

Commit f18f33c

add result interpretation
1 parent 7dbb806 commit f18f33c

1 file changed

Lines changed: 73 additions & 0 deletions

README.md

@@ -85,6 +85,79 @@ sphinx-build -E docs ./arctic3d-docs

Then you can open the file `arctic3d-docs/index.html`, which contains all the necessary documentation.

## Result interpretation

After running ARCTIC-3D, results are stored in the output directory (default: `arctic3d-{uniprot_id}/`). The sections below describe each output file and how to interpret it.

### Output files

| File | Description |
|------|-------------|
| `arctic3d.log` | Log file with execution details and warnings |
| `input_data/` | Directory containing copies of input files |
| `{pdb_id}_updated.cif` | Structure file downloaded from PDBe (mmCIF format) |
| `{pdb_id}-{chain}.pdb` | Cleaned PDB structure used for analysis (renumbered to UniProt numbering) |
| `retrieved_interfaces.out` | All interfaces retrieved from PDBe, listing partner IDs and their residue lists |
| `interface_matrix.txt` | Pairwise dissimilarity values between all interfaces (used for clustering) |
| `dendrogram_{linkage}.png` | Hierarchical clustering dendrogram (e.g., `dendrogram_average.png`) |
| `clustered_interfaces.out` | Interfaces grouped into clusters (binding surfaces) |
| `clustered_residues.out` | Residues belonging to each cluster |
| `clustered_residues_probs.out` | Residues ranked by probability within each cluster |
| `{pdb_id}-{chain}_cl{N}.pdb` | PDB structure for cluster N with probabilities encoded in B-factor column |
| `sequence_probability.html` | Interactive bar plot of per-residue probabilities |
| `sequence_probability.json` | JSON data for the interactive plot |

### Understanding the clustering

ARCTIC-3D groups similar interfaces into **binding surfaces** (clusters). Two interfaces are considered similar when they overlap spatially on the protein surface. The dissimilarity is measured using the squared sine of the angle between interface vectors in a Hilbert space representation. Values close to 0 indicate overlapping interfaces, while values close to 1 indicate completely distinct regions.

The `interface_matrix.txt` file contains the pairwise dissimilarity values in the format:
```
interface1 interface2 dissimilarity_value
```
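
As a minimal sketch of how this file could be read, the Python snippet below loads the three-column format into an order-independent lookup. It assumes whitespace-separated fields and no header line; the function name `read_interface_matrix` and the interface names in the sample are hypothetical and not part of ARCTIC-3D.

```python
from io import StringIO


def read_interface_matrix(handle):
    """Parse 'interface1 interface2 dissimilarity_value' lines into a lookup.

    Assumption: fields are whitespace-separated and there is no header line.
    """
    dissimilarity = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        first, second, value = fields[0], fields[1], float(fields[2])
        # Use an order-independent key so (a, b) and (b, a) give the same value.
        dissimilarity[frozenset((first, second))] = value
    return dissimilarity


# Made-up interface names and values, for illustration only.
sample = StringIO("partnerA partnerB 0.12\npartnerA partnerC 0.97\n")
matrix = read_interface_matrix(sample)
low = matrix[frozenset(("partnerB", "partnerA"))]   # overlapping interfaces
high = matrix[frozenset(("partnerA", "partnerC"))]  # nearly distinct regions
```

With a real run, the same function could be called on `open("interface_matrix.txt")` from the output directory.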

### Interpreting residue probabilities

The **probability** (or "contact probability score") represents **the fraction of interfaces within a cluster in which a residue is observed**. It is calculated independently for each cluster:

```
probability = (number of interfaces containing the residue) / (total interfaces in cluster)
```

For each cluster, residues are assigned a probability value between 0 and 1:

- **Probability = 1.0**: The residue appears in every interface within the cluster (a "hotspot" residue)
- **Probability = 0.5**: The residue appears in half of the cluster's interfaces
- **Probability close to 0**: The residue rarely appears at this binding surface

**Important**: Probabilities do NOT sum to 1.0 across clusters for a given residue. A residue can have high probability in multiple clusters if it participates in different binding surfaces. For example, a residue with probability 0.8 in cluster 1 and 0.6 in cluster 2 appears in 80% of cluster 1's interfaces and 60% of cluster 2's interfaces.
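
The formula above can be illustrated with a short Python sketch. This is not the ARCTIC-3D implementation: the function name `residue_probabilities` and the two toy clusters are made up for illustration, and each interface is assumed to be a plain list of residue numbers.

```python
from collections import Counter


def residue_probabilities(cluster_interfaces):
    """Return {residue: fraction of the cluster's interfaces containing it}."""
    total = len(cluster_interfaces)
    if total == 0:
        return {}
    counts = Counter()
    for residues in cluster_interfaces:
        counts.update(set(residues))  # count each residue once per interface
    return {resid: n / total for resid, n in counts.items()}


# Hypothetical clusters; every inner list is one interface.
cluster_1 = [[42, 45, 50], [42, 45], [42, 50], [42, 45, 61]]
cluster_2 = [[42, 70], [70, 71]]

probs_1 = residue_probabilities(cluster_1)
probs_2 = residue_probabilities(cluster_2)
```

In this toy data, residue 42 appears in all four interfaces of the first cluster (probability 1.0) and in one of the two interfaces of the second (probability 0.5), so its values across clusters add up to more than 1.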

The `clustered_residues_probs.out` file lists residues ranked by probability:
```
Cluster 1 : 15 residues
rank resid resname probability
1 42 ALA 1.000
2 45 GLU 0.875
...
```
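
A file laid out like the excerpt above could be loaded with a few lines of Python. The sketch below assumes the real file follows exactly this layout (a `Cluster N : M residues` line, a column header, then whitespace-separated rows); the function name `parse_cluster_probs` is hypothetical and the parser is not part of ARCTIC-3D.

```python
def parse_cluster_probs(text):
    """Group (resid, resname, probability) tuples by cluster label.

    Assumption: the layout matches the excerpt shown in this README.
    """
    clusters = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Cluster" and len(fields) > 1:
            current = fields[1].rstrip(":")  # keep the label as a string
            clusters[current] = []
        elif current is not None and len(fields) == 4 and fields[0].isdigit():
            _rank, resid, resname, prob = fields
            clusters[current].append((int(resid), resname, float(prob)))
        # Anything else (column header, "...") is ignored.
    return clusters


sample = """Cluster 1 : 15 residues
rank resid resname probability
1 42 ALA 1.000
2 45 GLU 0.875
"""
parsed = parse_cluster_probs(sample)
```

Keeping the cluster label as a string avoids guessing how clusters are numbered in the real output.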

### Visualizing probabilities in PDB files

The output PDB files (`{pdb_id}-{chain}_cl{N}.pdb`) encode probabilities in the B-factor column:

- Cluster residues: `B = 50 × (1 + probability)`, ranging from 50 (probability=0) to 100 (probability=1)
- Non-cluster residues: `B = 0`

This allows visualization in molecular viewers (PyMOL, ChimeraX, etc.) using a color spectrum where high B-factors (red) indicate hotspot residues and low values (blue) indicate residues not involved in that binding surface.
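
To recover a probability from a B-factor read out of one of these files, the encoding can be inverted. The following Python sketch only restates the mapping described above; the function names are hypothetical, and values read from a PDB file may carry small rounding differences because of the fixed-width B-factor column.

```python
def probability_to_bfactor(probability):
    """Encode a cluster residue's probability as B = 50 * (1 + probability)."""
    return 50.0 * (1.0 + probability)


def bfactor_to_probability(bfactor):
    """Invert the encoding; return None for non-cluster residues (B = 0)."""
    if bfactor < 50.0:
        return None  # below the cluster range, so not part of this binding surface
    return bfactor / 50.0 - 1.0


encoded = probability_to_bfactor(0.875)
decoded = bfactor_to_probability(encoded)
outside = bfactor_to_probability(0.0)
```

Returning `None` for B-factors below 50 keeps non-cluster residues distinct from cluster residues with probability 0, which are encoded as exactly 50.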
152+
153+
### Interpreting the dendrogram
154+
155+
The dendrogram (`dendrogram_average.png`) shows the hierarchical relationship between all retrieved interfaces. The x-axis represents the dissimilarity between interfaces or groups. Interfaces that merge below the threshold (default: 0.866, corresponding to a 60° angle) form a single binding surface. The threshold can be adjusted with `--threshold` to obtain finer or coarser clustering.
156+
157+
### Using the interactive plot
158+
159+
Open `sequence_probability.html` in a web browser to explore per-residue binding probabilities. Each cluster is shown as a separate colored bar series, allowing you to identify which residues are involved in which binding surface and compare hotspots across different clusters.
160+
88161
## Citing us
89162

90163
If you used ARCTIC-3D in your work please cite the following publication:
