AMPRecognitionBenchmark

The official code for the paper "A Benchmark for Antimicrobial Peptide Recognition Based on Structure and Sequence Representation"

Antimicrobial peptides (AMPs) are potent therapeutic agents against drug-resistant (DR) microbes; however, their clinical application is constrained by limited activity. Recently, machine learning has shown significant promise in recognizing high-activity AMPs. Nevertheless, existing AMP activity datasets are aggregated from thousands of publications that use varying wet-lab experimental setups and each focus on only one or a few types of DR bacteria. This heterogeneity makes it difficult to evaluate AI methods for AMP recognition fairly, which in turn hinders their advancement. Additionally, while AlphaFold has revolutionized drug discovery through accurate protein structure prediction, the integration of these predicted structures into AMP discovery remains unexplored.

To address these challenges, we present two key contributions:

(a) DRAMPAtlas 1.0: We introduce DRAMPAtlas 1.0, comprising a training set sourced from public databases and a testing set derived from our wet-lab experiments. Each AMP sequence in the atlas is annotated with its 3D structure, activity data against six types of DR bacteria, and toxicity profiles.

(b) Comprehensive AMP Recognition Experiments: We perform extensive experiments on AMP recognition by modeling the 3D structures as voxels or graphs, using structure and sequence information together or either one alone. Our experiments reveal several findings that improve our understanding of AMP activity prediction.

We anticipate that our benchmark and findings will aid the research community in designing more effective algorithms for discovering high-activity AMPs.

DRAMPAtlas 1.0 Dataset

Harvard Dataverse (wet-lab data are not included for now and will be released after publication)

Instructions

Download the code

git clone https://github.com/EricwanAR/AMPRecognitionBenchmark.git
cd AMPRecognitionBenchmark

Dataset Preparation

Put the downloaded dataverse_files.zip in the root directory of this project, AMPRecognitionBenchmark/.
Run bash dataproc.sh in a terminal.
Check that two new folders, metadata/ and pdb/, have been created:

metadata/
├── data_simi.csv
├── data_0920_i.csv
└── ...

pdb/
├── pdb_af/ # AlphaFold predictions
│   └── (pdb files ...)
└── pdb_dbassp/ # HelixFold predictions
    └── (pdb files ...)
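
The folder check described above can also be scripted. The following is a minimal sketch, not part of the repository: the function names `missing_dirs` and `count_files` are hypothetical, the folder names are taken from the tree above, and the `.pdb` file suffix is an assumption.

```python
from pathlib import Path

# Folder names taken from the tree above; adjust if your layout differs.
EXPECTED_DIRS = ("metadata", "pdb/pdb_af", "pdb/pdb_dbassp")


def missing_dirs(root="."):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


def count_files(root=".", subdir="pdb/pdb_af", suffix=".pdb"):
    """Count files with the given suffix directly inside root/subdir.

    The ".pdb" suffix is an assumption about how the structure files are named.
    """
    folder = Path(root) / subdir
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.is_file() and p.suffix == suffix)


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Layout looks complete;", count_files(), "files found in pdb/pdb_af")
```

Run it from the project root after dataproc.sh has finished; an empty result from `missing_dirs()` means all three folders are present.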

Training and Inferencing

cd into any unit folder, e.g. cd t3.1 or cd voxabl.
Run python main.py [args]. The full list of arguments can be found in main.py; refer to runthro.sh for the experiments in our paper.
Checkpoints will be saved in run/.
For inference, simply replace main.py with infer.py and keep the training arguments, e.g. python infer.py [training args].
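
Since inference reuses the training arguments, the matching command can be derived from a training command by swapping the script name. The helper below is a small illustrative sketch, not part of the repository: `to_infer_command` is a hypothetical name, and the `--some-arg` flag in the usage note is a placeholder, not a real argument of main.py.

```python
import shlex


def to_infer_command(train_command):
    """Return the inference command obtained by swapping main.py for infer.py."""
    parts = shlex.split(train_command)
    if "main.py" not in parts:
        raise ValueError("expected a command that runs main.py")
    # Every other token (interpreter, arguments) is kept unchanged.
    swapped = ["infer.py" if p == "main.py" else p for p in parts]
    return shlex.join(swapped)
```

For example, `to_infer_command("python main.py --some-arg 1")` yields the same command with infer.py in place of main.py; see main.py for the arguments that actually exist.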
