Drug-Drug Interaction Prediction

This project aims to predict the severity of drug-drug interactions based on the molecular properties of drugs. Drug interactions can lead to adverse effects in patients, and predicting these interactions can help in avoiding dangerous drug combinations. The primary goal of this project is to build a machine learning model that can accurately predict the potential severity of drug interactions.

Features:

Data Cleaning: The raw dataset is preprocessed to handle missing values, duplicates, and normalization of features.
Modeling: The project uses a Random Forest Classifier to predict drug interaction levels based on drug properties.
Visualization: Visualizations such as feature importance and the distribution of interaction levels are generated to understand the underlying patterns.
Future Enhancements: Models like SVM, CNN, and others will be integrated soon to further improve prediction accuracy.

Introduction

Drug-drug interactions (DDIs) occur when two or more drugs interact and cause an adverse reaction in a patient. Understanding these interactions can improve patient safety and reduce the occurrence of side effects. This project uses machine learning techniques to predict DDIs based on drug properties such as molecular weight, solubility, and other related features.

Currently, a Random Forest Classifier is used to predict the severity of these interactions, and visualizations such as feature importance plots are generated to provide insights into the model's behavior.

Getting Started

This section will guide you through setting up the project locally on your machine and running it.

Prerequisites

Ensure you have Python 3.x installed on your machine along with pip for package management.

Installation

Clone the repository to your local machine:

git clone https://github.com/yourusername/ddi-prediction.git
cd ddi-prediction

Create a virtual environment and activate it:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required dependencies:

pip install -r requirements.txt

Usage

Once the dependencies are installed, you can run the main.py script to start the model training and evaluation.

Running the Model

To run the model and generate predictions, execute the following command:

python main.py

This will:

Load and clean the data: The data is preprocessed, and any missing values or duplicates are removed.
Train the model: A Random Forest Classifier is trained to predict drug interaction levels.
Evaluate the model: The script will display the accuracy score and a classification report.
Visualize results: It will generate visualizations, including:
- Feature Importance: Bar plot showing the most important features used by the model.
- Interaction Levels: Distribution of different interaction levels (e.g., severe, moderate, or no interaction).

Expected Output

The script will output the following:

Accuracy of the Random Forest model.
A classification report with precision, recall, and F1 scores for each interaction level.
A feature importance plot to see how much each drug property contributes to the prediction.
A count plot of the interaction levels showing how the data is distributed.

Project Structure

Here’s how the repository is organized:

dataset/
│── merged_data/
│   └── merged_data.csv
│── raw_data/
│   ├── raw_drug.csv
│   └── raw_properties.xlsx
│── train_data/
│   └── train_set.csv
│── featuring_data/
│   └── ML1.ipynb
model/
│── data/
│── ddi_notebook.ipynb
│── main.py
│── readme.md
│── requirements.txt

data: Contains datasets (both raw and cleaned).
notebooks: Contains Jupyter notebooks used for data analysis and exploration.
main.py: Main script for training the model, evaluating it, and visualizing results.
requirements.txt: A list of Python libraries needed to run the project.

Future Enhancements

1. SVM (Support Vector Machine) Classifier

Support Vector Machines will be integrated into the project to compare performance with the Random Forest Classifier. The goal is to understand if SVM provides a better decision boundary for drug interactions.

2. Convolutional Neural Networks (CNNs)

We are planning to experiment with CNNs for feature extraction from drug property data. This will be a significant step in advancing the model's capability by using deep learning techniques to process structured data.

3. Hyperparameter Tuning

For better performance, hyperparameter tuning using techniques like Grid Search and Randomized Search will be applied to optimize the model parameters.

4. Model Evaluation

More evaluation metrics such as confusion matrices and ROC curves will be added to assess the performance more comprehensively.

5. Deployment

Eventually, we plan to deploy the model as a web application where healthcare professionals can input drug combinations to check potential interactions.

6. Additional Models

Logistic Regression and Decision Trees will be explored for comparison purposes.
XGBoost and LightGBM are potential alternatives to Random Forest that will be tested for better prediction power.

Acknowledgments

We would like to acknowledge the developers of the libraries used in this project, including:

Scikit-learn for machine learning tools and algorithms.
Pandas for data manipulation.
Matplotlib and Seaborn for visualization.
Jupyter for exploratory data analysis.

Conclusion

This project demonstrates the potential of machine learning in predicting drug-drug interactions and improving patient safety. With future enhancements like the addition of new models and deployment of the system, this can be a valuable tool for healthcare providers to prevent adverse drug reactions.

You can adjust the GitHub repository links or any other project-specific details in the README.md based on your actual project. Let me know if you need further adjustments!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Drug-Drug Interaction Prediction

Features:

Table of Contents

Introduction

Getting Started

Prerequisites

Installation

Usage

Running the Model

Expected Output

Project Structure

Future Enhancements

1. SVM (Support Vector Machine) Classifier

2. Convolutional Neural Networks (CNNs)

3. Hyperparameter Tuning

4. Model Evaluation

5. Deployment

6. Additional Models

Acknowledgments

Conclusion

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
dataset		dataset
featuring_data		featuring_data
model		model
DDI.ipynb		DDI.ipynb
main.py		main.py
readme.md		readme.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Drug-Drug Interaction Prediction

Features:

Table of Contents

Introduction

Getting Started

Prerequisites

Installation

Usage

Running the Model

Expected Output

Project Structure

Future Enhancements

1. SVM (Support Vector Machine) Classifier

2. Convolutional Neural Networks (CNNs)

3. Hyperparameter Tuning

4. Model Evaluation

5. Deployment

6. Additional Models

Acknowledgments

Conclusion

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages