The problem we are solving is the inconvenience and expense of doctor appointments for common health concerns.
Our project aims to offer a reliable medicine search system that recommends appropriate medications. It draws on extensive datasets of medicine reviews, side effects, and interactions to give users fast, cost-effective healthcare guidance.
We chose this problem to address the barriers to timely healthcare, such as long waits at the hospital. Language and cultural barriers, limited access to primary care, and difficulties in doctor-patient communication further underline the need for a recommendation system that improves healthcare accessibility and quality.
To launch the web app, follow these steps:
- Install all the required packages by executing the following command in your terminal:
  pip install -r requirements.txt
- Change into the MAIS202_WebAPP directory of this repository using cd:
  cd path/to/MAIS202_WebAPP
- Run the web app using the following command:
  python web_app.py
- Open your web browser and navigate to http://localhost:5000.
Now, you should be able to access and interact with the web app locally.
The Y_label is derived from the column containing the respective drugs, while the X_features consist of columns containing symptom and patient information.
Drugs are transformed into standardized codes using either the ATC Third Level or AHFS (Pharmacologic-Therapeutic Classification System) coding systems, ensuring uniformity and consistency in drug representation.
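As a rough illustration of how the labels might be built, the sketch below maps drug names to ATC third-level codes and turns them into a binary label matrix. The lookup table `DRUG_TO_ATC3`, the `records` list, and the helper `to_codes` are hypothetical and only stand in for the real dataset columns. The multi-label encoding with `MultiLabelBinarizer` is an assumption, not necessarily the project's actual pipeline.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical lookup from drug name to ATC third-level code (tiny illustrative subset).
DRUG_TO_ATC3 = {
    "paracetamol": "N02B",
    "ibuprofen": "M01A",
    "amoxicillin": "J01C",
}

# Hypothetical records: symptom text (X side) and prescribed drugs (Y side).
records = [
    {"symptoms": "headache fever", "drugs": ["paracetamol"]},
    {"symptoms": "joint pain fever", "drugs": ["ibuprofen", "paracetamol"]},
    {"symptoms": "sore throat cough", "drugs": ["amoxicillin"]},
]


def to_codes(drugs):
    """Map drug names to a sorted, de-duplicated list of ATC third-level codes."""
    return sorted({DRUG_TO_ATC3[name.lower()] for name in drugs})


label_sets = [to_codes(record["drugs"]) for record in records]

# One binary column per drug code; a row can have several codes set.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)

print(list(mlb.classes_))
print(Y)
```

Grouping drugs at the third ATC level keeps the number of target classes manageable, since many individual products collapse into one pharmacological subgroup.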
- Feature Representation:
  - Utilizing techniques such as one-hot encoding, Bag-of-Words (BOW), or Term Frequency-Inverse Document Frequency (TF-IDF) based on task-specific requirements.
- Feature Selection:
  - Narrowing down symptom features to the 500 most frequent symptoms to limit computational cost while keeping the features informative.
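The representation and selection steps above can be sketched with scikit-learn's text vectorizers. This is a minimal example under stated assumptions: the `symptom_texts` list is made up, and using `max_features` to keep the most frequent terms is one plausible way to realize the top-500 cut, not necessarily the one used in this project.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical symptom descriptions, one string per patient record.
symptom_texts = [
    "headache fever",
    "joint pain fever",
    "sore throat cough",
]

TOP_K = 500  # keep at most the 500 most frequent symptom terms

# Bag-of-Words with binary=True gives presence/absence indicators per term.
bow = CountVectorizer(max_features=TOP_K, binary=True)
X_bow = bow.fit_transform(symptom_texts)

# TF-IDF down-weights terms that appear in many records.
tfidf = TfidfVectorizer(max_features=TOP_K)
X_tfidf = tfidf.fit_transform(symptom_texts)

print(X_bow.shape, X_tfidf.shape)
print(sorted(bow.vocabulary_))
```

With only seven distinct terms in the toy data, the cap of 500 is never reached. On the real dataset, `max_features` would drop the rarer symptom terms.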
As a classification problem, our model recommends drugs based on symptoms. The primary evaluation metrics include:
- Confusion Matrix and Related Metrics:
  - Provides insight into True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
  - Accuracy = (TP + TN) / (TP + TN + FP + FN)
  - Precision = TP / (TP + FP)
  - Recall = TP / (TP + FN)
  - F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
- Logistic Loss (Log Loss):
  - Evaluates prediction confidence by penalizing wrong predictions made with high confidence.
- Jaccard Coefficient and F1 Score:
  - [Figure: Jaccard Similarity and F1 Score comparison between models]
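To make the metric definitions above concrete, here is a small sketch that computes them by hand from the confusion matrix and checks them against scikit-learn. The toy arrays `y_true`, `y_pred`, and `y_prob` are invented for illustration and describe a single binary label, not the project's real predictions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    jaccard_score,
    log_loss,
    precision_score,
    recall_score,
)

# Toy binary example (hypothetical values).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]  # predicted P(label = 1)

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
jaccard = tp / (tp + fp + fn)  # intersection over union of the positive sets

# The hand-computed values should agree with scikit-learn's implementations.
assert abs(accuracy - accuracy_score(y_true, y_pred)) < 1e-12
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
assert abs(jaccard - jaccard_score(y_true, y_pred)) < 1e-12

# Log loss uses probabilities, so confident mistakes cost more than hesitant ones.
loss = log_loss(y_true, y_prob)

print(accuracy, precision, recall, f1, jaccard, loss)
```

The Jaccard coefficient ignores true negatives, which makes it a stricter measure than accuracy when most drug labels are absent for a given patient.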
The initial evaluation of the Random Forest Classifier, wrapped in a MultiOutputClassifier with specific hyperparameters, yielded a Jaccard score of 0.4100 and an F1 score of 0.5607. After fine-tuning with BayesSearchCV, the chosen hyperparameters (shown below) improved performance:

    MultiOutputClassifier(estimator=RandomForestClassifier(bootstrap=False,
                                                           max_depth=37,
                                                           min_samples_leaf=4,
                                                           min_samples_split=14,
                                                           n_estimators=112,
                                                           random_state=42))

The Jaccard score increased from 0.4100 to 0.4298, and the F1 score from 0.5607 to 0.5802. This gain on both metrics supports the hyperparameter choices: the tuned model captures the underlying patterns in the data better than the initial configuration.
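The following sketch shows how a model with the tuned hyperparameters listed above could be fit and scored. It runs on synthetic data generated here for illustration, so its scores are unrelated to the figures reported in this README. The `average="samples"` setting is an assumption about how the multi-label Jaccard and F1 scores were averaged, and the BayesSearchCV search itself is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in data: 20 binary symptom features, 3 binary drug labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 20))
Y = np.column_stack([
    X[:, 0] | X[:, 1],  # label 0 depends on features 0 and 1
    X[:, 2] & X[:, 3],  # label 1 depends on features 2 and 3
    X[:, 4],            # label 2 copies feature 4
])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=42
)

# Hyperparameters taken from the tuned configuration reported above.
model = MultiOutputClassifier(
    estimator=RandomForestClassifier(
        bootstrap=False,
        max_depth=37,
        min_samples_leaf=4,
        min_samples_split=14,
        n_estimators=112,
        random_state=42,
    )
)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Per-sample averaging (assumed): score each record's label set, then take the mean.
jaccard = jaccard_score(Y_test, Y_pred, average="samples", zero_division=0)
f1 = f1_score(Y_test, Y_pred, average="samples", zero_division=0)

print(f"Jaccard: {jaccard:.4f}  F1: {f1:.4f}")
```

MultiOutputClassifier trains one Random Forest per label column, so each drug code gets its own independent binary classifier.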




