🏏 Batting Through Numbers: A Data-Driven Analysis of Cricket

Welcome to our repository where data meets passion in the realm of cricket! This project is a detailed exploration of cricket analytics, where we use data science to uncover hidden patterns and optimize strategies.

🌟 Project Overview

"Batting Through Numbers" is not just about cricket analytics; it's a blend of our love for the game and statistical analysis. We delve deep into datasets to discover what lies beneath the surface of scorecards and player statistics.

🎯 Research Questions Addressed

Association Between Batting Position and Dismissal:
- We investigate how a player’s batting position correlates with their likelihood of getting dismissed, providing insights that could influence batting order decisions.
Distribution of Dismissal Types for Top-Scoring Batsmen:
- This analysis focuses on understanding how top performers typically get out, which may help coaches tailor specific training to mitigate these dismissals.
Forecasting Batting Performance Using Machine Learning:
- By employing historical data on similar batting positions, we develop predictive models to forecast batting performances, offering a tool for strategy formulation.

🎯 Objectives

Analyzing T20 Ball-by-Ball Data: Using detailed ESPN datasets, we uncover complex play patterns and performance metrics.
Predictive Modeling: We build models to predict match outcomes for the LUMS cricket team and other international teams, leveraging historical performance data.
Cricket Storytelling: Beyond numbers, we narrate stories of cricket matches, offering insights that are often overlooked.

📂 Repository Structure

models: Contains machine learning models saved as .pkl files for various international cricket teams.
data: Includes datasets such as:
- ball_by_ball_it20(train dataset).csv: Detailed ball-by-ball play data for training our models.
- LUMS Cricket Dataset - Batting.csv: Specific dataset for batting performance of the LUMS team.
- LUMS Cricket Dataset - Balling.csv: Specific dataset for bowling performance of the LUMS team.
CricketDataAnalysis.ipynb: Jupyter notebook containing all the analysis, from data preprocessing to model training and evaluation.

🔧 Tools & Technologies Used

Python: For all computing and data analysis.
Pandas & NumPy: For data manipulation.
Matplotlib & Seaborn: For data visualization.
Scikit-Learn: For predictive modeling and machine learning.
Jupyter Notebook: For interactive code development and testing.

🔧 Data Preparation and Transformation

Data Creation

Data Prep:
- Loaded, cleaned, and simplified batting/bowling datasets.
- Handled null values and dropped unnecessary columns.
Data Transformation:
- Developed specific functions for dataset generation.
- Imputed numeric nulls, mapped categorical data, and standardized columns.
Summarized Data:
- Condensed ball-by-ball details into summarized player stats per match.
- Key metrics include runs, balls faced, batting position, etc.
Result:
- Created a concise dataset focusing on player performance in matches.
- Columns include: batter, runs, ballsPlayed, howOut, battingPosition, teamName, batFirst, targetScore, extraRuns, winner, matchId.
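
The condensing step above can be sketched with a pandas group-by. This is a minimal illustration, not the notebook's actual code: the input column `batterRuns`, the toy rows, the zero-fill for missing runs, and the "not out" label are all assumptions. Only the output names `batter`, `runs`, `ballsPlayed`, `howOut`, `battingPosition`, and `matchId` come from the column list above.

```python
import pandas as pd

# Toy ball-by-ball rows. The input column "batterRuns" is a hypothetical
# stand-in; the real dataset's schema may differ.
balls = pd.DataFrame({
    "matchId":         [1, 1, 1, 1, 1, 1, 1],
    "batter":          ["A", "A", "B", "A", "B", "B", "C"],
    "battingPosition": [1, 1, 2, 1, 2, 2, 3],
    "batterRuns":      [4, 1, 0, None, 6, 0, 2],
    "howOut":          [None, None, None, "caught", None, "bowled", None],
})


def summarize_innings(balls: pd.DataFrame) -> pd.DataFrame:
    """Collapse ball-by-ball rows into one row per batter per match."""
    df = balls.copy()
    # Assumption: a missing run value is treated as 0 runs off that ball.
    df["batterRuns"] = df["batterRuns"].fillna(0)
    summary = df.groupby(["matchId", "batter"], as_index=False).agg(
        runs=("batterRuns", "sum"),
        ballsPlayed=("batterRuns", "count"),
        battingPosition=("battingPosition", "first"),
        howOut=("howOut", "last"),  # last non-null dismissal, if any
    )
    # Assumption: batters with no recorded dismissal are labelled "not out".
    summary["howOut"] = summary["howOut"].fillna("not out")
    return summary


summary = summarize_innings(balls)
print(summary)
```

Match-level columns such as `teamName`, `targetScore`, and `winner` could be carried through the same aggregation with `"first"`, assuming they are constant within a match.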

📈 Statistical Analysis, Results, and Key Findings

Statistical Test

Chi-square statistic: 324.20
P-value: 3.84e-62 (far below the conventional 0.05 threshold, so the association is statistically significant)
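
For readers who want to run a test of this kind, the sketch below applies a Chi-squared test of independence to a contingency table of batting position against dismissal. The rows are made up, the "not out" label is an assumption, and using `scipy.stats.chi2_contingency` is our choice for illustration; the figures reported above come from the notebook, not from this snippet.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up per-innings records. Only the column names battingPosition and
# howOut are taken from the summarized dataset; the values are invented.
innings = pd.DataFrame({
    "battingPosition": [1, 1, 1, 1, 2, 2, 2, 2, 7, 7, 7, 7],
    "howOut": ["caught", "bowled", "caught", "not out",
               "caught", "lbw", "bowled", "caught",
               "not out", "not out", "run out", "not out"],
})

# Assumption: any howOut value other than "not out" counts as a dismissal.
innings["dismissed"] = innings["howOut"] != "not out"

# Rows are batting positions, columns are dismissed / not dismissed.
table = pd.crosstab(innings["battingPosition"], innings["dismissed"])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
```

With so few rows the expected counts are small, so the toy p-value is not meaningful; the point is only the shape of the computation.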

Average Treatment Effect (ATE)

ATE: 0.177
Inference: The positive estimate suggests that the dismissal outcome under study is more common, on average, among top-scoring batsmen than among non-top-scorers.
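
As a rough illustration of what an ATE estimate measures, the sketch below computes a naive difference in means between a treated and a control group. The README does not state which estimator or outcome variable was used, so the columns `isTopScorer` and `caught`, the data, and the estimator itself are all assumptions. A difference in means only equals the ATE when there is no confounding.

```python
import pandas as pd

# Hypothetical data: isTopScorer is the "treatment" flag and caught is a
# binary dismissal outcome. Neither column name comes from the project.
df = pd.DataFrame({
    "isTopScorer": [1, 1, 1, 1, 0, 0, 0, 0],
    "caught":      [1, 1, 1, 0, 1, 0, 0, 1],
})


def naive_ate(df: pd.DataFrame, treatment: str, outcome: str) -> float:
    """Unadjusted difference in mean outcome between treated and control."""
    treated_mean = df.loc[df[treatment] == 1, outcome].mean()
    control_mean = df.loc[df[treatment] == 0, outcome].mean()
    return float(treated_mean - control_mean)


ate = naive_ate(df, "isTopScorer", "caught")
print(f"Naive ATE estimate: {ate:.3f}")
```
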
Statistical Analysis: We used statistical tests such as the Chi-squared test to explore the association between a player's batting position and their likelihood of dismissal.
Predictive Performance: Our models, trained on the ball-by-ball and LUMS datasets, are evaluated with error metrics such as RMSE (Root Mean Squared Error), where lower values mean more accurate predictions.
Strategic Insights: We derive recommendations on batting order and player positioning that may affect match outcomes.
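
To make the RMSE evaluation concrete, here is a small sketch that fits a scikit-learn regressor and scores it on a held-out split. The data are synthetic and the choice of `LinearRegression` and of the two features is purely illustrative; the README does not specify which model types or features the saved models use.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic innings: runs depend on balls played and batting position plus
# noise. These relationships are invented for the example.
rng = np.random.default_rng(0)
n = 200
batting_position = rng.integers(1, 12, size=n)
balls_played = rng.integers(1, 60, size=n)
runs = 1.2 * balls_played - 0.5 * batting_position + rng.normal(0, 5, size=n)

X = np.column_stack([batting_position, balls_played])
X_train, X_test, y_train, y_test = train_test_split(
    X, runs, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# RMSE is the square root of the mean squared error, in the same units as
# the target (runs here).
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
print(f"RMSE: {rmse:.2f} runs")
```

Because RMSE is expressed in runs, it can be read directly as a typical size of the prediction error per innings.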

🚀 How to Use This Repository

Clone the repository:

git clone https://github.com/yourusername/CricketDataAnalysis.git

Install the necessary Python packages:
```
 pip install -r requirements.txt
```
Run the Jupyter notebooks:
```
 jupyter notebook
```

📝 Conclusion

Our project bridges the gap between raw data and actionable cricket insights. It demonstrates how data-driven strategies can be applied to enhance understanding and performance in sports analytics.

📚 Additional Resources

Final Project Blog: Complete insights into the project. Read here
EDA Blog: Deep dive into the exploratory data analysis. Read here

🤝 Contributing

Feel free to fork this repo, suggest changes via pull requests, or discuss ideas in issues. Your contributions are always welcome!

🌐 Contact

For any further questions, please email us at [email protected], or raise an issue in this repository.

⭐ Support

If you find this repository helpful, please consider giving it a star to help others find it!

Fayzan-Ali-Akhtar/T20-Cricket-Data-Analysis
