Welcome to the MucLiPred repository.
The DOI of our work: https://doi.org/10.1021/acs.jcim.3c01471
In our study, we frequently refer to RBP and DBP. For clarity:
RBP (RNA Binding Protein): This refers to cases where we focus on the interactions or binding residues between proteins and RNA molecules.
DBP (DNA Binding Protein): This term refers to cases where we focus on the interactions or binding residues between proteins and DNA molecules.
The repository is organized into several directories, each serving a distinct purpose in the model development and deployment process.
This directory contains the code for running inference with the pretrained model on your own data.
The /model folder includes the link to the pretrained model stored on Google Drive.
You can run custom_save_pred.py to perform inference.
The input file should be a TSV file containing the following columns: id, sequence, and type.
For the type field, use 0 for DNA, 1 for RNA, and 2 for peptide.
A sample input file is provided in the /data folder to illustrate the expected format.
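As a rough illustration of the input format described above, the following Python sketch builds a TSV with the id, sequence, and type columns and the 0/1/2 type codes. The header row, the column order, the helper name build_tsv, and the example records are assumptions made for illustration; check the sample file in the /data folder for the exact layout expected by custom_save_pred.py.

```python
import csv
import io

# Type codes as stated in this README: 0 = DNA, 1 = RNA, 2 = peptide.
LIGAND_TYPES = {"DNA": 0, "RNA": 1, "peptide": 2}


def build_tsv(records):
    """Return TSV text with a header row and one row per (id, sequence, ligand).

    Assumption: the file starts with an 'id<TAB>sequence<TAB>type' header.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "sequence", "type"])
    for seq_id, sequence, ligand in records:
        if ligand not in LIGAND_TYPES:
            raise ValueError(f"unknown ligand type: {ligand!r}")
        writer.writerow([seq_id, sequence, LIGAND_TYPES[ligand]])
    return buf.getvalue()


# Hypothetical records; the sequences are placeholders, not real data.
records = [
    ("example_1", "MKTAYIAKQR", "DNA"),
    ("example_2", "GSHMLEDPVA", "RNA"),
]
tsv_text = build_tsv(records)
print(tsv_text)
```

The resulting text can be written to a file of your choice and passed to the inference script in whatever way that script expects.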
protBert_main.py: Contains the code for training the model. This script is the entry point for initiating the training process.
/other: This directory includes various experimental code snippets that were used during the development and testing of the model.
prot_bert.py: Houses the code defining the architecture of the model. This script details the layers and structure of the neural network used for predictions.
- This directory contains the datasets used for training and testing the model. Be sure to follow any licensing or usage restrictions associated with the data.
config.py: Stores the configuration parameters used during training. Parameters such as the learning rate, number of epochs, and other hyperparameters can be adjusted here.
- This directory contains the pretrained models.
To get started with MucLiPred, clone the repository and install the required dependencies:
git clone https://github.com/sethzhangjs/MucLiPred