
The iris_visualization.ipynb notebook analyzes the Iris dataset using Python, creating visualizations such as scatter plots, histograms, and correlation heatmaps to explore differences between species. It uses pandas, matplotlib, seaborn, numpy, and joypy for data manipulation and plotting. The repository includes Iris.csv and an HTML export of the notebook for easy sharing.


pranavasree/Big_Data_Project


Data Visualization Using Different Graph Types Project

Overview

This project, iris_visualization.ipynb, focuses on visualizing the Iris dataset using various graph types to analyze differences between species (Iris-setosa, Iris-versicolor, Iris-virginica). It includes visualizations like histograms, scatter plots, box plots, ridgeline plots, line graphs, bar charts, pie charts, and correlation heatmaps, created with Python libraries.

The project was developed as part of a Big Data course by Group 13:

  • Harshaanjaneyavarma Samanthapudi
  • Pranava Sree Pottipati
  • Sana Reddy Gaddam
  • Vijaya Raghava Ponnganti
  • Ravi Chandra Polavarapu
  • Rishik Pendurthi

Dataset

The Iris dataset (Iris.csv) contains 150 samples (50 per species) with four measurements: sepal length, sepal width, petal length, and petal width, in centimeters. The dataset is included in the repository.
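A minimal sketch for loading Iris.csv with pandas and checking the per-species counts described above. It assumes the file follows the common Kaggle layout, with an Id column, measurement columns named SepalLengthCm, SepalWidthCm, PetalLengthCm, and PetalWidthCm, and a Species column; those column names and the helper functions load_iris and species_counts are assumptions for illustration, not code taken from the notebook.

```python
import pandas as pd

# Assumed column names (Kaggle-style Iris.csv); adjust if the real header differs.
MEASUREMENTS = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
SPECIES = "Species"


def load_iris(source="Iris.csv"):
    """Load the Iris data from a path or file-like object and check the expected columns exist."""
    df = pd.read_csv(source)
    missing = [c for c in MEASUREMENTS + [SPECIES] if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


def species_counts(df):
    """Return the number of samples per species, sorted by species name."""
    return df[SPECIES].value_counts().sort_index()


if __name__ == "__main__":
    iris = load_iris()
    print(len(iris))  # the README describes 150 samples
    print(species_counts(iris))  # and 50 samples per species
```

Failing early on missing columns makes a header mismatch obvious before any plotting code runs.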

Prerequisites

  • Python 3.13.2 (or a compatible version)
  • Visual Studio Code (recommended IDE with Jupyter support)
  • Git (optional, for cloning the repository)

Setup Instructions

Clone the Repository (if applicable):

git clone https://github.com/pranavasree/Big_Data_Project.git
cd Big_Data_Project

Install Python:

Download and install Python from python.org. Verify the installation:

python --version

Set Up a Virtual Environment:

Create a virtual environment in the project folder:

python -m venv .venv

Activate the virtual environment:

  • On Windows:
    .venv\Scripts\activate
  • On macOS/Linux:
    source .venv/bin/activate

Install Required Libraries:

Install the necessary Python libraries:

pip install pandas matplotlib seaborn numpy joypy jupyter

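Alternatively, to install specific versions for compatibility, pin them instead:

pip install pandas==2.2.2 matplotlib==3.8.4 seaborn==0.13.2 numpy==1.26.4 joypy==0.2.6

These pinned releases predate Python 3.13 and may not provide prebuilt wheels for it; if the pinned install fails on Python 3.13, use the unpinned command above or a Python 3.12 environment.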
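Whichever install command is used, a small standard-library check like the following can confirm which versions actually ended up in the activated .venv. This is a sketch, not part of the repository; installed_versions is a hypothetical helper name, and it relies only on importlib.metadata from the Python standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip commands above.
PACKAGES = ["pandas", "matplotlib", "seaborn", "numpy", "joypy"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version string, or None if it is not installed."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:<12} {ver or 'NOT INSTALLED'}")
```

Run it with the virtual environment's interpreter; running it with a different Python reports that interpreter's packages instead.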

Set Up VS Code:

Open VS Code and install the Python and Jupyter extensions:
Go to Extensions (Ctrl+Shift+X), search for "Python" and "Jupyter", and install both extensions published by Microsoft.

Open iris_visualization.ipynb in VS Code.
Select the .venv interpreter as the notebook's kernel (use Select Kernel in the top-right corner of the notebook editor).
If prompted, install additional Jupyter dependencies like ipykernel.

Usage

Run the Notebook:

Open iris_visualization.ipynb in VS Code.
Click Run All in the notebook toolbar, or press Shift+Enter to run cells one at a time, to generate the visualizations. Outputs include:

  • Tables (e.g., group2.head(6) for sample data)
  • Plots saved as images (e.g., histogram_view.png, scatter_sepal.png)
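Saving a plot as an image file, as with histogram_view.png, comes down to drawing on a matplotlib figure and calling savefig. The sketch below is illustrative rather than the notebook's actual cell: save_histogram is a hypothetical helper, and the SepalLengthCm column name assumes the Kaggle-style Iris.csv header. The matplotlib.use("Agg") line is only needed when running as a standalone script without a display; inside the notebook it can be dropped so plots still render inline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: figures can be saved without a display
import matplotlib.pyplot as plt
import pandas as pd


def save_histogram(df, column, path, bins=20):
    """Draw a histogram of one measurement column and save it as a PNG file."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(df[column].dropna(), bins=bins, edgecolor="black")
    ax.set_xlabel(column)
    ax.set_ylabel("Count")
    ax.set_title(f"Distribution of {column}")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)  # release the figure so saving many plots does not accumulate memory
    return path


if __name__ == "__main__":
    iris = pd.read_csv("Iris.csv")
    save_histogram(iris, "SepalLengthCm", "histogram_view.png")
```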

Generate HTML Output:

Convert the notebook to HTML for sharing:

jupyter nbconvert --to html iris_visualization.ipynb

This creates iris_visualization.html in the project folder, containing all code, explanations, and plots.

View Results:

  • Open the saved images (e.g., histogram_view.png) to see the visualizations.
  • Open iris_visualization.html in a web browser for a complete report.

Project Structure

  • iris_visualization.ipynb: The main Jupyter notebook with code and visualizations.
  • Iris.csv: The Iris dataset file.
  • *.png: Generated plot images (e.g., scatter_sepal.png, correlation_matrix.png).
  • iris_visualization.html: HTML output of the notebook (after conversion).

Visualizations Included

  • Histograms: Show the distribution of measurements (e.g., sepal length).
  • Density and Ridgeline Plots: Compare distributions across species.
  • Line, Bar, and Pie Charts: Display averages and counts of species.
  • Box and Violin Plots: Summarize measurement distributions by species.
  • Scatter Plots: Show relationships between measurements (e.g., sepal length vs. width).
  • Correlation Heatmaps: Visualize relationships between all measurements.
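Two of the views above rest on simple aggregates: per-species averages (behind the line and bar charts) and the pairwise correlation matrix (behind the heatmap). The sketch below computes both with pandas and draws an annotated heatmap with plain matplotlib. The notebook may well use seaborn for its heatmap; this version swaps in matplotlib's imshow so it needs only pandas, numpy, and matplotlib. The helper names and the Kaggle-style column names are assumptions, and the Agg backend line can be dropped inside the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # save figures without a display when run as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed column names (Kaggle-style Iris.csv).
MEASUREMENTS = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]


def species_means(df, species_col="Species"):
    """Average of each measurement per species (the data behind line and bar charts)."""
    return df.groupby(species_col)[MEASUREMENTS].mean()


def correlation_heatmap(df, path):
    """Compute the Pearson correlation matrix and save it as an annotated heatmap."""
    corr = df[MEASUREMENTS].corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ticks = np.arange(len(MEASUREMENTS))
    ax.set_xticks(ticks)
    ax.set_xticklabels(MEASUREMENTS, rotation=45, ha="right")
    ax.set_yticks(ticks)
    ax.set_yticklabels(MEASUREMENTS)
    # Print each coefficient in its cell, similar to seaborn's annot=True.
    for i in range(len(MEASUREMENTS)):
        for j in range(len(MEASUREMENTS)):
            ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return corr


if __name__ == "__main__":
    iris = pd.read_csv("Iris.csv")
    print(species_means(iris).round(2))
    correlation_heatmap(iris, "correlation_matrix.png")
```

Fixing vmin and vmax at -1 and 1 keeps the color scale comparable across heatmaps, since correlation coefficients always fall in that range.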

Libraries Used

  • pandas: For data loading and manipulation
  • matplotlib: For creating and customizing plots
  • seaborn: For high-level statistical visualizations
  • numpy: For numerical computations
  • joypy: For ridgeline plots
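The project uses joypy for its ridgeline plots. To show what a ridgeline plot computes, here is a dependency-light sketch that swaps in a hand-written Gaussian kernel density estimate (normal-reference rule-of-thumb bandwidth, h = 1.06·σ·n^(-1/5)) with numpy and matplotlib, then stacks one density curve per species with vertical offsets. It illustrates the technique only and is neither joypy's implementation nor the notebook's code; gaussian_kde_1d, ridgeline, and the overlap parameter are hypothetical names, and the column names assume the Kaggle-style Iris.csv header.

```python
import matplotlib

matplotlib.use("Agg")  # save figures without a display when run as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def gaussian_kde_1d(values, grid):
    """Gaussian kernel density estimate evaluated on `grid` (rule-of-thumb bandwidth)."""
    values = np.asarray(values, dtype=float)
    n = values.size
    bandwidth = 1.06 * values.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - values[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


def ridgeline(df, column, path, by="Species", overlap=0.7):
    """Stack one density curve per group, offset vertically, and save the figure."""
    groups = list(df.groupby(by)[column])
    grid = np.linspace(df[column].min() - 1, df[column].max() + 1, 200)
    densities = [gaussian_kde_1d(vals, grid) for _, vals in groups]
    # Vertical spacing between baselines; a larger overlap packs the curves closer together.
    step = max(d.max() for d in densities) * (1 - overlap)
    fig, ax = plt.subplots(figsize=(6, 4))
    for k, ((name, _), dens) in enumerate(zip(groups, densities)):
        base = k * step
        ax.fill_between(grid, base, base + dens, alpha=0.6)
        ax.plot(grid, base + dens, color="black", linewidth=0.8)
        ax.text(grid[0], base, str(name), va="bottom")
    ax.set_yticks([])
    ax.set_xlabel(column)
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return grid, densities


if __name__ == "__main__":
    iris = pd.read_csv("Iris.csv")
    ridgeline(iris, "SepalLengthCm", "ridgeline_sepal_length.png")
```

In practice joypy handles the density estimation, layout, and labeling in one call; the sketch only makes those steps explicit.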

Presentation

A PowerPoint presentation (Final-Presentation-Group-13.pptx) with 22 slides summarizes the project findings and visualizations.
The notebook can be presented by running cells in VS Code while showing the slides, as outlined in the presentation script.

Project Walkthrough

License

This project is licensed under the MIT License (Group 13, Spring 2025). See the LICENSE file for details.

Contact

For questions or feedback, reach out to Group 13 at:
