Sanjaya Uwacha: AI-Powered Football Commentary Generation πŸŽ™οΈβš½

Python PyTorch License

An intelligent system that automatically generates real-time football commentary by analyzing match footage using computer vision and natural language processing.

πŸ“‹ Table of Contents

  • 🎯 Overview
  • ✨ Features
  • πŸ—οΈ System Architecture
  • πŸš€ Installation
  • πŸ’» Usage
  • πŸ“ Project Structure
  • πŸ”§ Technical Implementation
  • πŸ—“οΈ Roadmap
  • πŸ“Š Performance Metrics
  • πŸ§ͺ Tracking Algorithm Experiments
  • 🀝 Contributing
  • πŸ‘₯ Team
  • πŸ“„ License
  • πŸ™ Acknowledgments
  • πŸ“š References
  • πŸ“§ Contact
  • 🌟 Support

🎯 Overview

Sanjaya Uwacha is an AI-driven football commentary generation system that transforms raw match footage into engaging, context-aware commentary. The system leverages state-of-the-art computer vision models for player and ball detection, tracking algorithms for possession analysis, and natural language generation for creating dynamic commentary.

Key Capabilities

  • Real-time Player Detection & Tracking: Identifies and tracks all players, referees, and the ball
  • Event Analysis: Determines which event has occurred during the match and generates commentary for it
  • Automated Commentary Generation: Creates natural language commentary based on detected events

✨ Features

Current Implementation (MVP Phase 1)

  • βœ… Player & Ball Detection: YOLOv8-based object detection using Roboflow API
  • βœ… Multi-Object Tracking: ByteTrack algorithm for consistent player identification
  • βœ… Player Identification: Individual player name mapping and labeling
  • βœ… Possession Detection: Distance-based algorithm to determine ball control
  • βœ… Basic Commentary: Text-to-speech commentary for possession changes
  • βœ… Video Annotation: Visual overlay of detections, tracking IDs, and possession indicators

Upcoming Features (MVP Phase 2-3)

  • πŸ”„ Advanced Event Detection: Goals, passes, fouls, corners, throw-ins
  • πŸ”„ Team Classification: Automatic team identification using color analysis
  • πŸ”„ Contextual Commentary: LLM-powered natural commentary generation
  • πŸ”„ Action Classification: Detailed activity recognition (shooting, dribbling, tackling)
  • πŸ”„ Multi-language Support: Commentary in English, Hindi, and Nepali

πŸ—οΈ System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     INPUT VIDEO STREAM                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              COMPUTER VISION MODULE                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚   YOLOv8     β”‚β†’ β”‚  ByteTrack   β”‚β†’ β”‚  Event       β”‚     β”‚
β”‚  β”‚  Detection   β”‚  β”‚   Tracking   β”‚  β”‚  Analysis    β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              EVENT DETECTION MODULE                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚   Temporal   β”‚β†’ β”‚   Action     β”‚β†’ β”‚    Event     β”‚     β”‚
β”‚  β”‚   Analysis   β”‚  β”‚ Recognition  β”‚  β”‚ Classificationβ”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         NATURAL LANGUAGE GENERATION MODULE                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚    LLM       β”‚β†’ β”‚   Context    β”‚β†’ β”‚     TTS      β”‚     β”‚
β”‚  β”‚              β”‚  β”‚   Builder    β”‚  β”‚  Synthesis   β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              AUDIO-VIDEO SYNCHRONIZATION                     β”‚
β”‚                    & OUTPUT GENERATION                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Installation

Prerequisites

  • Python 3.8 or higher
  • CUDA-compatible GPU (recommended for real-time processing)
  • FFmpeg (for video processing)

Setup Instructions

  1. Clone the repository

```bash
git clone https://github.com/fuseai-fellowship/Football-Commentary-Generation.git
cd Football-Commentary-Generation
```

  2. Install uv (if not already installed)

```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or using pip
pip install uv
```

  3. Install project dependencies

```bash
# Install all dependencies including dev dependencies
uv sync

# Or install only production dependencies
uv sync --no-dev
```

  4. Configure environment variables

```bash
cp .env.example .env
# Edit .env and add your Roboflow API key:
# ROBOFLOW_API_KEY=your_api_key_here
```

  5. Install FFmpeg (if not already installed)

```bash
# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows: download from https://ffmpeg.org/download.html
```

Alternative: Using pip

If you prefer using pip instead of uv:

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies using pyproject.toml
pip install -e .

# For development dependencies
pip install -e ".[dev]"
```

πŸ’» Usage

Running the Notebooks

```bash
# Navigate to notebooks directory
cd notebooks

# Run basic detection and tracking
jupyter notebook first_step.ipynb

# Run possession detection with commentary
jupyter notebook possession_with_commentary.ipynb
```

πŸ“ Project Structure

Football-Commentary-Generation/
β”‚
β”œβ”€β”€ notebooks/
β”‚   β”œβ”€β”€ first_step.ipynb                  # Basic detection & tracking
β”‚   β”œβ”€β”€ possession_with_commentary.ipynb  # Possession + commentary
β”‚   β”œβ”€β”€ tracking_experiments/             # Tracker comparison experiments
β”‚   β”‚   β”œβ”€β”€ 01_deepsort_experiment.ipynb
β”‚   β”‚   β”œβ”€β”€ 02_sort_experiment.ipynb
β”‚   β”‚   β”œβ”€β”€ 03_comparison_analysis.ipynb
β”‚   β”‚   └── tracker_utils.py
β”‚   └── play.mp4                          # Sample input video
β”‚
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ detection/
β”‚   β”‚   β”œβ”€β”€ player_detector.py            # Player detection module
β”‚   β”‚   └── ball_detector.py              # Ball detection module
β”‚   β”œβ”€β”€ tracking/
β”‚   β”‚   └── tracker.py                    # Multi-object tracking
β”‚   β”œβ”€β”€ possession/
β”‚   β”‚   └── possession_analyzer.py        # Possession detection
β”‚   └── commentary/
β”‚       β”œβ”€β”€ event_detector.py             # Event detection
β”‚       └── text_generator.py             # NLG module
β”‚
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ raw/                              # Raw video inputs
β”‚   β”œβ”€β”€ processed/                        # Processed outputs
β”‚   └── annotations/                      # Training annotations
β”‚
β”œβ”€β”€ results/
β”‚   β”œβ”€β”€ videos/                           # Output videos
β”‚   β”œβ”€β”€ metrics/                          # Performance metrics
β”‚   └── comparison_report.md              # Tracker comparison report
β”‚
β”œβ”€β”€ .gitignore                            # Git ignore rules
β”œβ”€β”€ requirements.txt                      # Python dependencies
β”œβ”€β”€ .env.example                          # Environment variables template
β”œβ”€β”€ readme.md                             # This file
└── LICENSE                               # MIT License

πŸ”§ Technical Implementation

1. Object Detection & Tracking

Model: YOLOv8 (via Roboflow API)

  • Classes: Players (Team A, Team B, Goalkeepers), Referees (Main, Assistant), Ball
  • Input Resolution: 640Γ—640 pixels
  • Confidence Threshold: 0.3
  • NMS Threshold: 0.5
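The detection parameters above can be collected into a single config object; the class labels below are illustrative names, as the model's exact label strings are not listed in this README:

```python
# Detection parameters from this README; class label strings are placeholders.
DETECTION_CONFIG = {
    "input_resolution": (640, 640),   # pixels, width x height
    "confidence_threshold": 0.3,
    "nms_threshold": 0.5,
    "classes": [
        "player_team_a", "player_team_b", "goalkeeper",
        "referee", "assistant_referee", "ball",
    ],
}
```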

Tracking Algorithm: ByteTrack

  • Features: Robust multi-object tracking with occlusion handling
  • Frame-to-frame association using Kalman filtering
  • Handles: Player identity consistency across frames
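The frame-to-frame association idea can be sketched as greedy IoU matching between existing track boxes and new detection boxes. This is a simplification for illustration only; ByteTrack itself performs two-stage matching with Kalman-predicted boxes and also recovers low-confidence detections:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, iou_min=0.3):
    """Greedily match each track ID to its best-overlapping detection index."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best, best_iou = None, iou_min
        for j, dbox in enumerate(detections):
            if j in used:
                continue
            score = iou(tbox, dbox)
            if score > best_iou:
                best, best_iou = j, score
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches
```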

2. Possession Detection

Algorithm: Distance-based proximity detection

```python
import numpy as np

def detect_possession(player_centers, ball_center, threshold=100):
    # Index of the player nearest the ball, or None if none is within threshold px
    distances = np.linalg.norm(np.asarray(player_centers, dtype=float) - ball_center, axis=1)
    return int(np.argmin(distances)) if distances.min() < threshold else None
```

Parameters:

  • Proximity threshold: 100 pixels (adjustable)
  • Update frequency: Per frame (30 FPS)

3. Commentary Generation

Current: Text-to-Speech (gTTS)

  • Generates audio for player names on possession change
  • Synchronizes with video timestamp using FFmpeg
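One piece of the synchronization step is converting the frame index at which a possession change fired into a millisecond offset (e.g. for FFmpeg's `adelay` audio filter). A minimal helper, assuming the 30 FPS rate stated above:

```python
def audio_delay_ms(frame_index, fps=30):
    """Millisecond offset at which a commentary clip should start,
    given the video frame where the event was detected."""
    return round(frame_index / fps * 1000)

# A possession change detected at frame 375 of a 30 FPS video starts 12.5 s in.
```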

Future: LLM-based generation (Gemini/GPT-4)

  • Context-aware commentary
  • Multiple commentary styles (analytical, entertaining)
  • Real-time event descriptions
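The planned LLM stage would assemble detected-event context into a prompt. A hypothetical sketch of such a prompt builder (the function name, event schema, and style options are assumptions, not the repo's API):

```python
def build_commentary_prompt(event, player, minute, style="entertaining"):
    """Assemble an LLM prompt from a detected event (illustrative schema)."""
    return (
        f"You are a {style} football commentator. "
        f"In minute {minute}, {player} performed: {event}. "
        "Produce one short, vivid line of live commentary."
    )
```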

4. Video Processing Pipeline

1. Frame Extraction (30 FPS)
2. Object Detection (YOLOv8)
3. Tracking Update (ByteTrack)
4. Possession Analysis
5. Event Detection [Future]
6. Commentary Generation
7. Audio Overlay (FFmpeg)
8. Output Video Creation
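The per-frame portion of the pipeline (steps 2 to 6) can be sketched as a driver loop. All callables are injected stand-ins for the repo's actual modules, not its real API:

```python
def run_pipeline(frames, detect, track, possession, commentate):
    """Run detection, tracking, and possession analysis over frames,
    emitting a commentary line whenever the ball holder changes."""
    lines, holder_prev = [], None
    for i, frame in enumerate(frames):
        tracks = track(detect(frame))        # steps 2-3: detection + tracking
        holder = possession(tracks)          # step 4: possession analysis
        if holder is not None and holder != holder_prev:
            lines.append((i, commentate(holder)))  # step 6: commentary
            holder_prev = holder
    return lines
```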

5. Performance Optimization

  • GPU Acceleration: CUDA-enabled inference using ONNX Runtime
  • Batch Processing: Frame batching for efficient GPU utilization
  • Model Optimization: Quantization and pruning for faster inference
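The frame-batching idea amounts to grouping consecutive frames so each GPU forward pass processes several at once; a minimal sketch:

```python
def batch_frames(frames, batch_size=8):
    """Yield fixed-size groups of frames for a single batched inference call
    (the final batch may be smaller)."""
    for i in range(0, len(frames), batch_size):
        yield frames[i:i + batch_size]
```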

πŸ—“οΈ Roadmap

Phase 1: Basic Detection & Tracking βœ… (Completed)

  • Player detection using YOLOv8
  • Ball detection and tracking
  • ByteTrack integration
  • Player identification system
  • Basic possession detection
  • Text-to-speech commentary

Phase 2: Event Detection & Commentary πŸ”„ (In Progress)

  • Team classification using color clustering
  • Advanced event detection (passes, shots, fouls)
  • Temporal action recognition
  • Tracker comparison experiments (ByteTrack vs DeepSORT vs SORT)
  • Enhanced possession accuracy
  • Contextual commentary generation
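The color-based team classification planned above reduces, in its simplest form, to assigning a player crop's mean jersey color to the nearest known team color. A minimal sketch, assuming reference colors are obtained separately (e.g. by clustering):

```python
def classify_team(mean_rgb, team_colors):
    """Assign a player's mean jersey RGB to the nearest reference team color."""
    def dist2(a, b):
        # Squared Euclidean distance in RGB space
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(team_colors, key=lambda t: dist2(mean_rgb, team_colors[t]))
```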

Phase 3: Advanced NLG & Multi-language πŸ“… (Planned)

  • LLM integration (Gemini/GPT-4)
  • Context-aware commentary
  • Multi-language support (Hindi, Nepali)
  • Real-time streaming capability
  • Performance optimization

Phase 4: Production Deployment 🎯 (Future)

  • Model quantization for edge devices
  • REST API development
  • Web-based interface
  • Cloud deployment
  • Mobile application

πŸ“Š Performance Metrics

To be published once the tracker comparison experiments and benchmark evaluation are complete.

πŸ§ͺ Tracking Algorithm Experiments

As part of our research phase, we're conducting comprehensive experiments to compare different tracking algorithms:

| Algorithm | Pros | Cons | Use Case |
|-----------|------|------|----------|
| ByteTrack | Fast, handles occlusions well | Requires tuning | Real-time tracking |
| DeepSORT | High accuracy, uses appearance | Slower, GPU intensive | Offline processing |
| SORT | Very fast, simple | Lower accuracy | Quick prototyping |

Experiment Results: Coming soon in results/comparison_report.md

🀝 Contributing

We welcome contributions! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guide for Python code
  • Add docstrings to all functions and classes
  • Write unit tests for new features
  • Update documentation as needed

πŸ‘₯ Team

FuseAI Fellowship - Football Commentary Generation Team

  • Bijay Shrestha
  • Sudip Shrestha

Mentor: Sushil Dyopla
Program: FuseAI Fellowship 2024

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Roboflow for object detection infrastructure
  • Supervision for computer vision utilities
  • ByteTrack for multi-object tracking
  • gTTS for text-to-speech synthesis
  • FFmpeg for audio-video processing
  • FuseAI Fellowship for mentorship and support

πŸ“š References

  1. Zhang, Y., et al. (2022). "ByteTrack: Multi-Object Tracking by Associating Every Detection Box." ECCV 2022.
  2. Wojke, N., et al. (2017). "Simple Online and Realtime Tracking with a Deep Association Metric." ICIP 2017.
  3. Redmon, J., et al. (2016). "You Only Look Once: Unified, Real-Time Object Detection." CVPR 2016.

πŸ“§ Contact

For questions, suggestions, or collaboration opportunities:

🌟 Support

If you find this project useful, please consider:

  • ⭐ Starring the repository
  • πŸ› Reporting bugs and issues
  • πŸ’‘ Suggesting new features
  • πŸ“– Improving documentation
  • 🀝 Contributing code

Note: This project is under active development as part of the FuseAI Fellowship program. Features and documentation are continuously being updated. Star ⭐ the repo to stay updated with the latest developments!

Last Updated: October 2024
