🍷 Wine Quality (QT) – Exploratory Data Analysis
📌 Project Overview
This project performs Exploratory Data Analysis (EDA) on the Wine Quality (QT) dataset to understand which chemical properties are associated with wine quality and to identify the features most useful for predicting it.
The objectives are to:
Analyze feature distributions
Study relationships between variables
Detect patterns, outliers, and correlations
Identify the most influential predictors of wine quality
📂 Dataset Information
The dataset contains physicochemical properties of wine samples along with a quality score (target variable).
🔎 Features Include:
Fixed Acidity
Volatile Acidity
Citric Acid
Residual Sugar
Chlorides
Free Sulfur Dioxide
Total Sulfur Dioxide
Density
pH
Sulphates
Alcohol
Quality (Target Variable)
🧠 Problem Type
Type: Supervised Learning
Task: Regression (Predicting quality score)
(Can also be treated as classification if quality is grouped into categories)
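As a rough illustration of the classification framing, the sketch below groups quality scores into categories with pandas. The bin edges and the labels `low`, `medium`, and `high` are assumptions chosen for the example, not thresholds defined by this project.

```python
import pandas as pd

# Example quality scores, made up for illustration only.
quality = pd.Series([3, 4, 5, 6, 7, 8], name="quality")

# Hypothetical thresholds: scores up to 4 are "low", 5-6 "medium", 7+ "high".
# pd.cut uses right-closed intervals by default: (0, 4], (4, 6], (6, 10].
quality_group = pd.cut(
    quality,
    bins=[0, 4, 6, 10],
    labels=["low", "medium", "high"],
)

print(pd.concat([quality, quality_group.rename("quality_group")], axis=1))
```

Different cut points change the class balance, so it is worth checking the resulting group sizes before modeling.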
🔬 EDA Workflow
1️⃣ Data Understanding
Checked dataset structure and data types
Checked for missing values and duplicate rows
Reviewed summary statistics
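The checks above might look like the following minimal sketch. The rows are toy values invented for the example (including one deliberately missing entry), not the real dataset; in practice the DataFrame would come from loading your own copy of the data.

```python
import pandas as pd

# Toy rows made up for illustration only; NOT taken from the real dataset.
df = pd.DataFrame(
    {
        "alcohol": [9.4, 9.8, 9.8, 11.2, 9.4, None],
        "volatile acidity": [0.70, 0.88, 0.88, 0.28, 0.70, 0.66],
        "quality": [5, 5, 5, 6, 5, 5],
    }
)

# Structure and data types
print(df.shape)
print(df.dtypes)

# Missing values per column, and number of rows that repeat an earlier row
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())
print(missing_per_column)
print("duplicate rows:", duplicate_rows)

# Summary statistics (count, mean, std, min, quartiles, max)
summary = df.describe()
print(summary)
```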
2️⃣ Univariate Analysis
Distribution plots (Histograms, KDE plots)
Boxplots for outlier detection
Skewness evaluation
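The numbers behind the boxplots and the skewness check can be computed directly. This sketch assumes the common 1.5 × IQR boxplot rule for flagging outliers and uses a synthetic right-skewed column as a stand-in for a real feature.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed values standing in for a real feature column.
rng = np.random.default_rng(0)
values = pd.Series(rng.lognormal(mean=0.0, sigma=0.5, size=500), name="feature")

# Sample skewness: positive means a longer right tail
skewness = values.skew()

# Boxplot-style outlier fences at 1.5 * IQR beyond the quartiles
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(f"skewness: {skewness:.2f}")
print(f"fences: [{lower:.2f}, {upper:.2f}], flagged points: {len(outliers)}")
```

Points flagged this way are candidates for inspection, not automatic removals.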
3️⃣ Bivariate Analysis
Correlation heatmap
Feature vs Quality analysis
Scatterplots and trend observations
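A minimal sketch of the bivariate step follows. The data is synthetic, with a positive alcohol effect and a negative volatile acidity effect built in by construction, so the output demonstrates the method only and says nothing about real wine.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the relationships are built in on purpose.
rng = np.random.default_rng(42)
n = 300
alcohol = rng.normal(10.5, 1.0, n)
volatile_acidity = rng.normal(0.5, 0.15, n)
score = (
    5.5
    + 0.5 * (alcohol - 10.5)
    - 2.0 * (volatile_acidity - 0.5)
    + rng.normal(0, 0.5, n)
)
df = pd.DataFrame(
    {
        "alcohol": alcohol,
        "volatile acidity": volatile_acidity,
        "quality": np.clip(np.round(score), 3, 8).astype(int),
    }
)

# Pairwise Pearson correlations: the matrix a heatmap would display
corr = df.corr()
print(corr.round(2))

# Feature vs quality: mean of each feature within each quality score
feature_means_by_quality = df.groupby("quality")[["alcohol", "volatile acidity"]].mean()
print(feature_means_by_quality.round(2))
```

The `corr` matrix can be passed to Seaborn's `heatmap` function to draw the correlation heatmap.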
4️⃣ Multivariate Insights
Strong positive and negative correlations
Interaction effects between chemical properties
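One simple way to look at two features jointly is a table of mean quality for each combination of feature levels. The sketch below uses a median split on synthetic data; the split rule and the column names `alcohol_level` and `va_level` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; effects are built in by construction.
rng = np.random.default_rng(42)
n = 300
alcohol = rng.normal(10.5, 1.0, n)
volatile_acidity = rng.normal(0.5, 0.15, n)
quality = (
    5.5
    + 0.5 * (alcohol - 10.5)
    - 2.0 * (volatile_acidity - 0.5)
    + rng.normal(0, 0.5, n)
)
df = pd.DataFrame(
    {"alcohol": alcohol, "volatile acidity": volatile_acidity, "quality": quality}
)

# Median split of each feature into "high" and "low" (hypothetical rule)
df["alcohol_level"] = np.where(df["alcohol"] >= df["alcohol"].median(), "high", "low")
df["va_level"] = np.where(
    df["volatile acidity"] >= df["volatile acidity"].median(), "high", "low"
)

# Mean quality in each of the four combinations
cell_means = df.pivot_table(
    index="alcohol_level", columns="va_level", values="quality", aggfunc="mean"
)
print(cell_means.round(2))
```

If the gap between the two columns differs noticeably from one row to the other, that hints at an interaction worth examining more closely.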
📊 Key Findings
Alcohol showed a strong positive correlation with quality.
Volatile Acidity showed a negative correlation with quality.
Sulphates and Citric Acid were also positively correlated with quality.
Some features, such as Residual Sugar and pH, showed a weaker standalone relationship with quality.
Outliers exist but may represent valid chemical variations rather than errors.
🎯 Most Important Features for Quality Prediction
Based on correlation analysis and visual inspection:
Alcohol
Volatile Acidity
Sulphates
Citric Acid
These features are likely to play a major role in building predictive models.
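A correlation-based ranking like the one above can be sketched as follows. The data is synthetic and includes a made-up `noise` column with no relationship to quality, so the resulting order reflects how the example was constructed rather than the project's actual results.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; "noise" is a hypothetical unrelated feature.
rng = np.random.default_rng(42)
n = 300
alcohol = rng.normal(10.5, 1.0, n)
volatile_acidity = rng.normal(0.5, 0.15, n)
noise = rng.normal(0, 1.0, n)
quality = (
    5.5
    + 0.5 * (alcohol - 10.5)
    - 2.0 * (volatile_acidity - 0.5)
    + rng.normal(0, 0.5, n)
)
df = pd.DataFrame(
    {
        "alcohol": alcohol,
        "volatile acidity": volatile_acidity,
        "noise": noise,
        "quality": quality,
    }
)

# Correlation of every feature with the target, ordered by absolute strength
target_corr = df.corr()["quality"].drop("quality")
order = target_corr.abs().sort_values(ascending=False).index
ranking = target_corr.loc[order]
print(ranking.round(2))
```

Sorting by absolute value keeps strongly negative predictors near the top alongside strongly positive ones. Pearson correlation only captures linear, one-feature-at-a-time relationships, which is why the ranking is later validated with model-based feature importance.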
🛠 Tools & Libraries Used
Python
Pandas
Matplotlib
Seaborn
🚀 Next Steps
Feature transformation (if required)
Train/Test split
Model building (Linear Regression / Random Forest / XGBoost)
Feature importance validation
Model evaluation
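The split, fit, and evaluate steps could be sketched as below. To stay dependency-light, this example swaps in ordinary least squares via NumPy in place of a modeling library; the synthetic data, the 80/20 split ratio, and the helper `add_intercept` are all assumptions made for illustration.

```python
import numpy as np

# Synthetic features and target; coefficients are built in by construction.
rng = np.random.default_rng(7)
n = 300
X = np.column_stack([rng.normal(10.5, 1.0, n), rng.normal(0.5, 0.15, n)])
y = 5.5 + 0.5 * (X[:, 0] - 10.5) - 2.0 * (X[:, 1] - 0.5) + rng.normal(0, 0.5, n)

# Shuffle the row indices, then hold out 20% for testing
indices = rng.permutation(n)
split = int(0.8 * n)
train_idx, test_idx = indices[:split], indices[split:]


def add_intercept(features):
    """Prepend a column of ones so the fit includes an intercept term."""
    return np.column_stack([np.ones(len(features)), features])


# Fit linear regression by least squares on the training rows only
coef, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Evaluate on held-out rows against a predict-the-training-mean baseline
predictions = add_intercept(X[test_idx]) @ coef
rmse = float(np.sqrt(np.mean((y[test_idx] - predictions) ** 2)))
baseline_rmse = float(np.sqrt(np.mean((y[test_idx] - y[train_idx].mean()) ** 2)))

print(f"test RMSE: {rmse:.3f} (baseline: {baseline_rmse:.3f})")
```

Comparing against a simple baseline shows whether the selected features add predictive value beyond always guessing the average score.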
📌 Project Goal
To build a strong analytical foundation before moving into machine learning modeling, and to ensure that feature selection is data-driven.
📬 Connect With Me
If you found this project helpful or want to collaborate:
🔗 LinkedIn: https://www.linkedin.com/in/shorya-bisht-a20144349/
⭐ If you like this project, consider giving it a star!