Describe the bug
RFD3's training resources contain most of the required materials for others to reproduce its results. However, a few key data artifacts are currently missing, which makes exact reproducibility unattainable for the time being.
Expected behavior
In no particular order, the following data artifacts (i.e., data files or datasets) should be publicly available for download, to make RFD3's training fully reproducible.
data: /projects/ml/prot_dna/transcriptionFactor_distillation_rf3.newDL.csv

data: ${paths.data.monomer_distillation_parquet_dir}/af2_distillation_facebook.parquet

insulinr:
  input: /projects/ml/aa_design/benchmarks/bcov_af3_ppi_benchmark/insulin_target.pdb
  contig: 100-100,/0,B1-150
  contig_atoms: '{}'
  length: 250-250
  redesign_motif_sidechains: false
  atom_level_hotspots:
    B59:
      CG,CZ: 1
    B83:
      CG,CZ: 1
    B91:
      CG,CZ: 1
pdl1:
  input: /projects/ml/aa_design/benchmarks/bcov_af3_ppi_benchmark/5o45_pdl1.pdb
  contig: 100-100,/0,B1-115
  contig_atoms: '{}'
  length: 215-215
  redesign_motif_sidechains: false
  atom_level_hotspots:
    B40:
      CG,CZ: 1
    B99:
      CG,SD: 1
    B107:
      CG,CZ: 1
vegfr:
  input: /projects/ml/aa_design/benchmarks/bcov_af3_ppi_benchmark/vegfr_2x1w_and_af2_B.pdb
  contig: 100-100,/0,B1-200
  contig_atoms: '{}'
  length: 300-300
  redesign_motif_sidechains: false
  atom_level_hotspots:
    B13:
      CG1,CG2: 1
    B15:
      CG,CZ: 1
    B43:
      CG,CZ: 1
    B75:
      CG,SD: 1
    B89:
      CD1,CG2: 1
    B91:
      CG1,CG2: 1
    B187:
      CG,CD1: 1
rbd:
  input: /projects/ml/aa_design/benchmarks/bcov_af3_ppi_benchmark/COVID19_target.pdb
  contig: 100-100,/0,B1-195
  contig_atoms: '{}'
  length: 295-295
  redesign_motif_sidechains: false
  atom_level_hotspots:
    B89:
      CG,CZ: 1
    B121:
      CG,CZ: 1
    B123:
      CG,CD1: 1
    B124:
      CG,CZ: 1
    B141:
      CG,CZ: 1
    B157:
      CG,CZ: 1
    B163:
      CG,CZ: 1
    B165:
      CG,CZ: 1
    B173:
      CG,CZ: 1
cd28:
  input: /projects/ml/aa_design/benchmarks/bcov_af3_ppi_benchmark/cd28_1yjd_B.pdb
  contig: 100-100,/0,B1-118
  contig_atoms: '{}'
  length: 218-218
  redesign_motif_sidechains: false
  atom_level_hotspots:
    B51:
      CG,CZ: 1
    B61:
      CG,CZ: 1
    B99:
      CG,SD: 1
    B104:
      CG,CZ: 1
il2ra:
  input: /projects/ml/aa_design/benchmarks/bcov_af3_ppi_benchmark/il2ra_1z92_B.pdb
  contig: 100-100,/0,B1-122
  contig_atoms: '{}'
  length: 222-222
  redesign_motif_sidechains: false
  atom_level_hotspots:
    B3:
      CG,CD1: 1
    B26:
      CG,SD: 1
    B43:
      CG,CD1: 1
    B44:
      CG,CZ: 1
    B46:
      CG,CD1: 1
il10ra:
  input: /projects/ml/aa_design/benchmarks/bcov_af3_ppi_benchmark/il10rb_1lqs_B.pdb
  contig: 100-100,/0,B1-207
  contig_atoms: '{}'
  length: 307-307
  redesign_motif_sidechains: false
  atom_level_hotspots:
    B39:
      CG,CD1: 1
    B50:
      CD1,CG2: 1
    B59:
      CG,CZ: 1
    B63:
      CA,CB: 1
    B64:
      CG1,CG2: 1
    B66:
      CG,CD1: 1
tie2:
  input: /projects/ml/aa_design/benchmarks/bcov_af3_ppi_benchmark/tie2_2gy5_official_B.pdb
  contig: 100-100,/0,B1-188
  contig_atoms: '{}'
  length: 288-288
  redesign_motif_sidechains: false
  atom_level_hotspots:
    B132:
      CG1,CG2: 1
    B134:
      CG,CZ: 1
    B135:
      CG,CD: 1
    B139:
      CG,CZ: 1
    B140:
      CD1,CG2: 1
    B154:
      CG1,CG2: 1
    B156:
      CG,CD1: 1
    B167:
      CG,CZ: 1
    B172:
      CD1,CG2: 1

data: ${paths.data.design_benchmark_data_dir}/dna_binder.json

data: ${paths.root_dir}/tests/dna.json

data: ${paths.root_dir}/rfd3/tests/test_data/dna.json

data: ${paths.data.design_benchmark_data_dir}/indexed.json

data: ${paths.data.design_benchmark_data_dir}/mcsa_41_short_rigid_new.json

data: ${paths.data.design_benchmark_data_dir}/mcsa_41.json

data: ${paths.data.design_benchmark_data_dir}/sm_binder_hbonds_sampled.json

data: ${paths.data.design_benchmark_data_dir}/sm_binder_hbonds.json

data: ${paths.data.design_benchmark_data_dir}/unconditional_deep.json

data: ${paths.data.design_benchmark_data_dir}/monomer.json

data: ${paths.data.design_benchmark_data_dir}/unindexed.json

monomer_distillation_data_dir: /squash/af2_distillation_facebook/
monomer_distillation_parquet_dir: /projects/ml/datahub/dfs/distillation/af2_distillation_facebook

design_benchmark_data_dir: /projects/ml/aa_design/benchmarks
design_model_weight_dir: /projects/ml/aa_design/models

cif_cache_dir: /net/tukwila/ncorley/cifutils/cache

- (Probably optional):
residue_cache_dir: /net/tukwila/lschaaf/datahub/MACE-Egret-3-noH/mace_embeddings
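To see which of these entries are unavailable on a given machine, a check along the following lines may help. It is only a minimal sketch: `PATH_VARS`, `resolve`, and `find_missing` are hypothetical names introduced here, and the plain regex substitution merely stands in for whatever interpolation the RFD3 configs actually go through. The variable values are copied from the path settings quoted above; `paths.root_dir` is not given in this issue, so entries that use it are reported as unresolved.

```python
import re
from pathlib import Path

# Hypothetical variable table; values copied from the path settings quoted in this issue.
PATH_VARS = {
    "paths.data.monomer_distillation_parquet_dir":
        "/projects/ml/datahub/dfs/distillation/af2_distillation_facebook",
    "paths.data.design_benchmark_data_dir": "/projects/ml/aa_design/benchmarks",
}

# Matches ${some.dotted.name} placeholders.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(entry, variables):
    """Substitute ${name} placeholders; unknown names are left untouched."""
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), entry)


def find_missing(entries, variables):
    """Return resolved entries that are still unresolved or absent on local disk."""
    missing = []
    for entry in entries:
        path = resolve(entry, variables)
        if PLACEHOLDER.search(path) or not Path(path).exists():
            missing.append(path)
    return missing


if __name__ == "__main__":
    # A few of the `data:` entries listed above.
    entries = [
        "/projects/ml/prot_dna/transcriptionFactor_distillation_rf3.newDL.csv",
        "${paths.data.monomer_distillation_parquet_dir}/af2_distillation_facebook.parquet",
        "${paths.data.design_benchmark_data_dir}/monomer.json",
        "${paths.root_dir}/tests/dna.json",
    ]
    for path in find_missing(entries, PATH_VARS):
        print("missing:", path)
```

Outside the original cluster, every entry is expected to show up as missing until the corresponding files are published.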
Additional context
Thank you for open-sourcing RFD3's code and configs! These are significant resources from which the research community has already begun benefitting.
The entries listed above come from the following config files and lines (commit cee116d), in the same order:
- foundry/models/rfd3/configs/datasets/train/pdb/na_complex_distillation.yaml, line 12
- foundry/models/rfd3/configs/datasets/train/rfd3_monomer_distillation.yaml, line 21
- foundry/models/rfd3/configs/datasets/val/val_examples/bcov_ppi_easy_medium_with_ori.yaml, lines 4 to 151
- foundry/models/rfd3/configs/datasets/val/dna_binder_design5.yaml, line 7
- foundry/models/rfd3/configs/datasets/val/dna_binder_long.yaml, line 7
- foundry/models/rfd3/configs/datasets/val/dna_binder_short.yaml, line 7
- foundry/models/rfd3/configs/datasets/val/indexed.yaml, line 7
- foundry/models/rfd3/configs/datasets/val/mcsa_41_short_rigid.yaml, line 9
- foundry/models/rfd3/configs/datasets/val/mcsa_41.yaml, line 7
- foundry/models/rfd3/configs/datasets/val/sm_binder_hbonds_short.yaml, line 7
- foundry/models/rfd3/configs/datasets/val/sm_binder_hbonds.yaml, line 6
- foundry/models/rfd3/configs/datasets/val/unconditional_deep.yaml, line 7
- foundry/models/rfd3/configs/datasets/val/unconditional.yaml, line 6
- foundry/models/rfd3/configs/datasets/val/unindexed.yaml, line 7
- foundry/models/rfd3/configs/paths/data/default.yaml, lines 6 to 7
- foundry/models/rfd3/configs/paths/data/default.yaml, lines 12 to 13
- foundry/models/rfd3/configs/paths/data/default.yaml, line 18
- foundry/models/rfd3/configs/paths/data/default.yaml, line 16