GitHub - datascientistshorya/WineQT-EDA: Wine Quality (QT) – Exploratory Data Analysis project focused on understanding chemical properties affecting wine quality. Performed univariate, bivariate, and correlation analysis, handled missing values/outliers, and identified the most influential features driving quality prediction using statistical insights and visualizations.

🍷 Wine Quality (QT) – Exploratory Data Analysis

📌 Project Overview

This project performs Exploratory Data Analysis (EDA) on the Wine Quality (QT) dataset to understand the chemical properties that influence wine quality and identify the most important features for predicting quality.

The objectives are to:

Analyze feature distributions

Study relationships between variables

Detect patterns, outliers, and correlations

Identify the most influential predictors of wine quality

📂 Dataset Information

The dataset contains physicochemical properties of wine samples along with a quality score (target variable).

🔎 Features Include:

Fixed Acidity

Volatile Acidity

Citric Acid

Residual Sugar

Chlorides

Free Sulfur Dioxide

Total Sulfur Dioxide

Density

pH

Sulphates

Alcohol

Quality (Target Variable)

🧠 Problem Type

Type: Supervised Learning

Task: Regression (Predicting quality score)

(Can also be treated as classification if quality is grouped into categories)
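As a rough illustration of the classification framing, the sketch below groups quality scores into three bands with `pandas.cut`. The cut points and the labels `low`, `medium`, and `high` are assumptions chosen for the example, and the scores are toy values rather than rows from WineQT.csv.

```python
import pandas as pd

# Toy quality scores, made up for illustration (not taken from WineQT.csv).
quality = pd.Series([3, 4, 5, 5, 6, 6, 7, 8], name="quality")

# Hypothetical cut points: scores up to 4 are "low", 5-6 "medium", 7+ "high".
# pd.cut uses right-closed intervals by default: (0, 4], (4, 6], (6, 10].
bins = [0, 4, 6, 10]
labels = ["low", "medium", "high"]
quality_band = pd.cut(quality, bins=bins, labels=labels)

# Count how many samples fall into each band.
print(quality_band.value_counts().sort_index())
```

Different cut points would change the class balance, so they are worth revisiting once the real quality distribution has been inspected.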

🔬 EDA Workflow

1️⃣ Data Understanding

Checked dataset structure and data types

Checked for missing values and duplicates

Reviewed summary statistics
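The checks in this step could look roughly like the sketch below. It uses a small made-up frame so that it runs on its own; with the real data the frame would presumably come from something like `pd.read_csv("WineQT.csv")`. The helper name `summarize` and the lowercase column names are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; it includes one missing value and two
# duplicated rows so that each check has something to report.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 9.8, 10.5, np.nan, 9.4],
    "pH": [3.51, 3.20, 3.20, 3.16, 3.30, 3.51],
    "quality": [5, 5, 5, 6, 6, 5],
})

def summarize(frame):
    """Collect basic structure checks (hypothetical helper)."""
    return {
        "shape": frame.shape,
        "dtypes": frame.dtypes.astype(str).to_dict(),
        "missing": frame.isna().sum().to_dict(),
        "duplicate_rows": int(frame.duplicated().sum()),
    }

report = summarize(df)
print(report)

# Summary statistics (count, mean, std, quartiles) for numeric columns.
print(df.describe())
```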

2️⃣ Univariate Analysis

Distribution plots (Histograms, KDE plots)

Boxplots for outlier detection

Skewness evaluation
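A minimal sketch of the numeric side of this step is shown below: skewness via `Series.skew` and outlier flagging with the 1.5 × IQR rule, which is the same rule a standard boxplot uses for its whiskers. The helper `iqr_outliers` is a hypothetical name, and the residual sugar values are invented for the example.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return a boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Made-up residual sugar readings with one unusually high value.
sugar = pd.Series([1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 15.0],
                  name="residual sugar")

mask = iqr_outliers(sugar)
skew = sugar.skew()  # positive values indicate a long right tail

print("flagged values:", sugar[mask].tolist())
print("skewness:", round(skew, 2))
```

A flagged value is only a candidate for review; whether to keep, cap, or drop it is a separate decision.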

3️⃣ Bivariate Analysis

Correlation heatmap

Feature vs Quality analysis

Scatterplots and trend observations
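The numbers behind a correlation heatmap and a feature-vs-quality comparison can be computed as in the sketch below. The data is a toy frame with invented values, and the lowercase column names are an assumption about how the CSV labels its columns.

```python
import pandas as pd

# Made-up samples for illustration only.
df = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 10.5, 11.0, 12.0],
    "volatile acidity": [0.80, 0.70, 0.60, 0.55, 0.40, 0.30],
    "quality": [4, 5, 5, 6, 6, 7],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))

# Feature vs quality: average of each feature within each quality score.
mean_by_quality = df.groupby("quality").mean()
print(mean_by_quality)
```

The `corr` matrix is the kind of input a heatmap function such as `seaborn.heatmap` takes, and `mean_by_quality` gives a compact numeric view of the trends that scatterplots show visually.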

4️⃣ Multivariate Insights

Strong positive and negative correlations

Interaction effects between chemical properties

📊 Key Findings

Alcohol showed a strong positive correlation with quality.

Volatile Acidity showed a negative correlation with quality.

Sulphates and Citric Acid also correlate positively with quality.

Some features, such as Residual Sugar and pH, show a weaker standalone relationship with quality.

Outliers exist but may represent valid chemical variations rather than errors.

🎯 Most Important Features for Quality Prediction

Based on correlation analysis and visual inspection:

Alcohol

Volatile Acidity

Sulphates

Citric Acid

These features are likely to play a major role in building predictive models.
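One simple way to produce a shortlist like this is to rank features by the absolute value of their correlation with the target, as sketched below. The helper `rank_by_abs_corr` is hypothetical, the data is invented, and this is not necessarily how the notebook arrived at its list.

```python
import pandas as pd

def rank_by_abs_corr(frame, target="quality", top_k=4):
    """Return the top_k features by |Pearson r| with the target, keeping signs."""
    corr = frame.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order).head(top_k)

# Made-up samples for illustration only.
df = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 10.5, 11.0, 12.0],
    "volatile acidity": [0.80, 0.70, 0.60, 0.55, 0.40, 0.30],
    "pH": [3.3, 3.5, 3.2, 3.4, 3.3, 3.4],
    "quality": [4, 5, 5, 6, 6, 7],
})

ranked = rank_by_abs_corr(df, top_k=2)
print(ranked.round(2))
```

Ranking on the absolute value keeps strongly negative predictors, such as volatile acidity in this toy frame, alongside the positive ones. Correlation only captures linear, one-feature-at-a-time relationships, so the ranking is a starting point rather than a final selection.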

🛠 Tools & Libraries Used

Python

Pandas

Matplotlib

Seaborn

🚀 Next Steps

Feature transformation (if required)

Train/Test split

Model building (Linear Regression / Random Forest / XGBoost)

Feature importance validation

Model evaluation
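The split, fit, and evaluate steps could be prototyped along the lines of the sketch below. To stay dependency-light it uses plain NumPy least squares as a stand-in for a Linear Regression library call, and the data is synthetic: the coefficients, noise level, and 80/20 split are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic features and target, generated for illustration only.
alcohol = rng.uniform(9.0, 13.0, size=n)
volatile_acidity = rng.uniform(0.2, 1.0, size=n)
quality = 0.5 * alcohol - 2.0 * volatile_acidity + rng.normal(0.0, 0.1, size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), alcohol, volatile_acidity])

# Train/test split: shuffle the row indices and hold out 20%.
idx = rng.permutation(n)
n_test = n // 5
test_idx, train_idx = idx[:n_test], idx[n_test:]

# Fit ordinary least squares on the training rows only.
coef, *_ = np.linalg.lstsq(X[train_idx], quality[train_idx], rcond=None)

# Evaluate on the held-out rows with root mean squared error.
pred = X[test_idx] @ coef
rmse = float(np.sqrt(np.mean((pred - quality[test_idx]) ** 2)))

print("coefficients (intercept, alcohol, volatile acidity):", coef.round(3))
print("test RMSE:", round(rmse, 3))
```

Evaluating on rows the model never saw is the point of the split; the same structure carries over if the fit step is swapped for Random Forest or XGBoost.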

📌 Project Goal

The goal is to build a strong analytical foundation before moving on to machine learning modeling, and to ensure that feature selection is data-driven.

📬 Connect With Me

If you found this project helpful or want to collaborate:

🔗 LinkedIn: https://www.linkedin.com/in/shorya-bisht-a20144349/

⭐ If you like this project, consider giving it a star!

