Pandemic_2019_Predictions

Overview

This project analyses COVID-19 data and builds predictive models intended to aid early detection or forecasting of COVID-19 infections. The analysis is written in Python and relies on standard libraries for data manipulation, visualisation, and machine learning.

Project Structure

The notebook is organised into several distinct sections, each dedicated to a specific aspect of the workflow:

Importing Packages: Relevant Python libraries such as NumPy, pandas, Matplotlib, seaborn, and scikit-learn are imported for performing data analysis, visualisation, and machine learning tasks.
Data Loading: The data is loaded into a pandas DataFrame from an Excel file (dataset.xlsx). Place the file in the same directory as the notebook, or adjust the path in the loading cell to point to where the file is stored.
Exploratory Data Analysis (EDA): Initial exploration of the dataset is conducted to understand its features, data types, missing values, and other preliminary insights through visualisations.
Data Preprocessing:
- Handling missing values
- Encoding categorical variables
- Standardising numerical variables
Feature Engineering: Selection and creation of relevant features based on the exploratory analysis.
Model Building:
- Splitting the data into training and testing sets
- Fitting classification algorithms such as Logistic Regression, Decision Trees, or Random Forest (the notebook determines which models are actually used)
Model Evaluation:
- Performance metrics such as accuracy, precision, recall, ROC-AUC, and F1 score.
- Confusion matrix and ROC curves for visual performance evaluation.
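The preprocessing, model building, and evaluation stages listed above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the synthetic DataFrame stands in for dataset.xlsx, the column names (age, temperature, sex, result) are hypothetical, and Logistic Regression is picked only as one of the candidate classifiers.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for dataset.xlsx; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 90, n).astype(float),
    "temperature": rng.normal(37.0, 0.8, n),
    "sex": rng.choice(["F", "M"], n),
})
# Tie the label loosely to temperature so the model has a signal to learn.
df["result"] = (df["temperature"] + rng.normal(0, 0.5, n) > 37.3).astype(int)
# Blank out some ages so the imputation step has work to do.
df.loc[rng.choice(n, 30, replace=False), "age"] = np.nan

numeric_cols = ["age", "temperature"]
categorical_cols = ["sex"]

# Numeric columns: fill missing values with the median, then standardise.
# Categorical columns: one-hot encode.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([("preprocess", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])

X = df.drop(columns="result")
y = df["result"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
cm = confusion_matrix(y_test, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

Wrapping preprocessing and the classifier in a single Pipeline means the imputer and scaler are fitted on the training split only, which avoids leaking test-set statistics into the model.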

Dataset

The project reads its data from an Excel file (dataset.xlsx). The file should contain COVID-19 patient or case data with features useful for prediction, such as demographic information, symptoms, pre-existing conditions, or clinical findings.
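A quick way to check whether a given file is usable is to load it and summarise each column's type and missing values. The helper names below (load_dataset, summarise) are hypothetical and not taken from the notebook; the sketch also assumes openpyxl is installed, since pandas relies on it to read .xlsx files.

```python
import pandas as pd


def load_dataset(path="dataset.xlsx"):
    """Read the Excel file into a DataFrame (pandas needs openpyxl for .xlsx)."""
    return pd.read_excel(path)


def summarise(df):
    """Return one row per column: dtype, missing count, and missing percentage."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
```

With the file in place, summarise(load_dataset()) gives a compact table that shows which columns need imputation or encoding before modelling.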

Running the Project

Requirements

Python (version >= 3.6)
Libraries: pandas, NumPy, Matplotlib, seaborn, scikit-learn, and openpyxl (pandas uses openpyxl to read .xlsx files)

Install the libraries using pip:

pip install numpy pandas matplotlib seaborn scikit-learn openpyxl
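After installing, a short check like the one below can confirm that the packages are importable before opening the notebook. It is only a sketch: the function name missing_packages is hypothetical, and openpyxl is included on the assumption that the dataset is read from an .xlsx file through pandas.

```python
import importlib.util
import sys

# Import names, not pip names: scikit-learn is imported as "sklearn".
# openpyxl is an assumption: pandas uses it to read .xlsx files.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "openpyxl"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```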

Steps

Place dataset.xlsx in the same directory as the Jupyter Notebook (Pandemic_2019_Predictions kksr.ipynb).
Run the notebook cells in order, from top to bottom, to reproduce the results.
Modify and experiment with different models or preprocessing methods to see whether prediction accuracy improves.

Modifying the Project

Dataset Update: Replace or update dataset.xlsx with new data or expanded datasets.
Model Adjustments: Experiment with different algorithms, hyperparameters, or cross-validation techniques.
Feature Changes: Include or remove features to test impacts on model performance.
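As one possible starting point for the model adjustments above, the sketch below combines stratified cross-validation with a small hyperparameter grid. It runs on synthetic data from make_classification rather than dataset.xlsx, and the Random Forest grid and the F1 scoring choice are illustrative assumptions, not settings taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic binary classification data standing in for the real features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)

# Hypothetical grid; widen or change it to suit the real dataset.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="f1", cv=cv)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Mean cross-validated F1:", round(search.best_score_, 3))
```

Swapping the estimator and grid for another classifier, or changing the scoring argument to "roc_auc" or "recall", lets the same loop compare alternatives on equal footing.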

Contributions

Feel free to clone, fork, and improve this project. Contributions that improve accuracy, code optimisation, or feature engineering are welcome.

Author: Kaustubh Ramekar

Date: 10/05/2025
