Sanjaya Uwacha: AI-Powered Football Commentary Generation πŸŽ™οΈβš½

Python PyTorch License

An intelligent system that automatically generates real-time football commentary by analyzing match footage using computer vision and natural language processing.

πŸ“‹ Table of Contents

  • 🎯 Overview
  • ✨ Features
  • πŸ—οΈ System Architecture
  • πŸš€ Installation
  • πŸ’» Usage
  • πŸ“ Project Structure
  • πŸ”§ Technical Implementation
  • πŸ—“οΈ Roadmap
  • πŸ“Š Performance Metrics
  • πŸ§ͺ Tracking Algorithm Experiments
  • 🀝 Contributing
  • πŸ‘₯ Team
  • πŸ“„ License
  • πŸ™ Acknowledgments
  • πŸ“š References
  • πŸ“§ Contact
  • 🌟 Support

🎯 Overview

Sanjaya Uwacha is an AI-driven football commentary generation system that transforms raw match footage into engaging, context-aware commentary. The system leverages state-of-the-art computer vision models for player and ball detection, tracking algorithms for possession analysis, and natural language generation for creating dynamic commentary.

Key Capabilities

  • Real-time Player Detection & Tracking: Identifies and tracks all players, referees, and the ball
  • Event Analysis: Determines which event has occurred during the match and generates commentary for it
  • Automated Commentary Generation: Creates natural language commentary based on detected events

✨ Features

Current Implementation (MVP Phase 1)

  • βœ… Player & Ball Detection: YOLOv8-based object detection using Roboflow API
  • βœ… Multi-Object Tracking: ByteTrack algorithm for consistent player identification
  • βœ… Player Identification: Individual player name mapping and labeling
  • βœ… Possession Detection: Distance-based algorithm to determine ball control
  • βœ… Basic Commentary: Text-to-speech commentary for possession changes
  • βœ… Video Annotation: Visual overlay of detections, tracking IDs, and possession indicators

Upcoming Features (MVP Phase 2-3)

  • πŸ”„ Advanced Event Detection: Goals, passes, fouls, corners, throw-ins
  • πŸ”„ Team Classification: Automatic team identification using color analysis
  • πŸ”„ Contextual Commentary: LLM-powered natural commentary generation
  • πŸ”„ Action Classification: Detailed activity recognition (shooting, dribbling, tackling)
  • πŸ”„ Multi-language Support: Commentary in English, Hindi, and Nepali

πŸ—οΈ System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     INPUT VIDEO STREAM                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              COMPUTER VISION MODULE                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚   YOLOv8     β”‚β†’ β”‚  ByteTrack   β”‚β†’ β”‚  Event       β”‚     β”‚
β”‚  β”‚  Detection   β”‚  β”‚   Tracking   β”‚  β”‚  Analysis    β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              EVENT DETECTION MODULE                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚   Temporal   β”‚β†’ β”‚   Action     β”‚β†’ β”‚    Event     β”‚     β”‚
β”‚  β”‚   Analysis   β”‚  β”‚ Recognition  β”‚  β”‚ Classificationβ”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         NATURAL LANGUAGE GENERATION MODULE                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚    LLM       β”‚β†’ β”‚   Context    β”‚β†’ β”‚     TTS      β”‚     β”‚
β”‚  β”‚              β”‚  β”‚   Builder    β”‚  β”‚  Synthesis   β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              AUDIO-VIDEO SYNCHRONIZATION                     β”‚
β”‚                    & OUTPUT GENERATION                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Installation

Prerequisites

  • Python 3.8 or higher
  • CUDA-compatible GPU (recommended for real-time processing)
  • FFmpeg (for video processing)

Setup Instructions

  1. Clone the repository

```bash
git clone https://github.com/fuseai-fellowship/Football-Commentary-Generation.git
cd Football-Commentary-Generation
```

  2. Install uv (if not already installed)

```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or using pip
pip install uv
```

  3. Install project dependencies

```bash
# Install all dependencies including dev dependencies
uv sync

# Or install only production dependencies
uv sync --no-dev
```

  4. Configure environment variables

```bash
cp .env.example .env
# Edit .env and add your Roboflow API key:
# ROBOFLOW_API_KEY=your_api_key_here
```

  5. Install FFmpeg (if not already installed)

```bash
# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows: download from https://ffmpeg.org/download.html
```

Alternative: Using pip

If you prefer using pip instead of uv:

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies using pyproject.toml
pip install -e .

# For development dependencies
pip install -e ".[dev]"
```

πŸ’» Usage

Running the Notebooks

```bash
# Navigate to notebooks directory
cd notebooks

# Run basic detection and tracking
jupyter notebook first_step.ipynb

# Run possession detection with commentary
jupyter notebook possession_with_commentary.ipynb
```

πŸ“ Project Structure

Football-Commentary-Generation/
β”‚
β”œβ”€β”€ notebooks/
β”‚   β”œβ”€β”€ first_step.ipynb                  # Basic detection & tracking
β”‚   β”œβ”€β”€ possession_with_commentary.ipynb  # Possession + commentary
β”‚   β”œβ”€β”€ tracking_experiments/             # Tracker comparison experiments
β”‚   β”‚   β”œβ”€β”€ 01_deepsort_experiment.ipynb
β”‚   β”‚   β”œβ”€β”€ 02_sort_experiment.ipynb
β”‚   β”‚   β”œβ”€β”€ 03_comparison_analysis.ipynb
β”‚   β”‚   └── tracker_utils.py
β”‚   └── play.mp4                          # Sample input video
β”‚
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ detection/
β”‚   β”‚   β”œβ”€β”€ player_detector.py            # Player detection module
β”‚   β”‚   └── ball_detector.py              # Ball detection module
β”‚   β”œβ”€β”€ tracking/
β”‚   β”‚   └── tracker.py                    # Multi-object tracking
β”‚   β”œβ”€β”€ possession/
β”‚   β”‚   └── possession_analyzer.py        # Possession detection
β”‚   └── commentary/
β”‚       β”œβ”€β”€ event_detector.py             # Event detection
β”‚       └── text_generator.py             # NLG module
β”‚
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ raw/                              # Raw video inputs
β”‚   β”œβ”€β”€ processed/                        # Processed outputs
β”‚   └── annotations/                      # Training annotations
β”‚
β”œβ”€β”€ results/
β”‚   β”œβ”€β”€ videos/                           # Output videos
β”‚   β”œβ”€β”€ metrics/                          # Performance metrics
β”‚   └── comparison_report.md              # Tracker comparison report
β”‚
β”œβ”€β”€ .gitignore                            # Git ignore rules
β”œβ”€β”€ requirements.txt                      # Python dependencies
β”œβ”€β”€ .env.example                          # Environment variables template
β”œβ”€β”€ readme.md                             # This file
└── LICENSE                               # MIT License

πŸ”§ Technical Implementation

1. Object Detection & Tracking

Model: YOLOv8 (via Roboflow API)

  • Classes: Players (Team A, Team B, Goalkeepers), Referees (Main, Assistant), Ball
  • Input Resolution: 640Γ—640 pixels
  • Confidence Threshold: 0.3
  • NMS Threshold: 0.5
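The detection parameters above can be collected into a single config object; the class labels below are illustrative names, as the model's exact label strings are not listed in this README:

```python
# Detection parameters from this README; class label strings are placeholders.
DETECTION_CONFIG = {
    "input_resolution": (640, 640),   # pixels, width x height
    "confidence_threshold": 0.3,
    "nms_threshold": 0.5,
    "classes": [
        "player_team_a", "player_team_b", "goalkeeper",
        "referee", "assistant_referee", "ball",
    ],
}
```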

Tracking Algorithm: ByteTrack

  • Features: Robust multi-object tracking with occlusion handling
  • Frame-to-frame association using Kalman filtering
  • Handles: Player identity consistency across frames
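The frame-to-frame association idea can be sketched as greedy IoU matching between existing track boxes and new detection boxes. This is a simplification for illustration only; ByteTrack itself performs two-stage matching with Kalman-predicted boxes and also recovers low-confidence detections:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, iou_min=0.3):
    """Greedily match each track ID to its best-overlapping detection index."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best, best_iou = None, iou_min
        for j, dbox in enumerate(detections):
            if j in used:
                continue
            score = iou(tbox, dbox)
            if score > best_iou:
                best, best_iou = j, score
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches
```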

2. Possession Detection

Algorithm: Distance-based proximity detection

```python
import numpy as np

def detect_possession(player_centers, ball_center, threshold=100):
    # Index of the player nearest the ball, or None if none is within threshold px
    distances = np.linalg.norm(np.asarray(player_centers, dtype=float) - ball_center, axis=1)
    return int(np.argmin(distances)) if distances.min() < threshold else None
```

Parameters:

  • Proximity threshold: 100 pixels (adjustable)
  • Update frequency: Per frame (30 FPS)

3. Commentary Generation

Current: Text-to-Speech (gTTS)

  • Generates audio for player names on possession change
  • Synchronizes with video timestamp using FFmpeg
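One piece of the synchronization step is converting the frame index at which a possession change fired into a millisecond offset (e.g. for FFmpeg's `adelay` audio filter). A minimal helper, assuming the 30 FPS rate stated above:

```python
def audio_delay_ms(frame_index, fps=30):
    """Millisecond offset at which a commentary clip should start,
    given the video frame where the event was detected."""
    return round(frame_index / fps * 1000)

# A possession change detected at frame 375 of a 30 FPS video starts 12.5 s in.
```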

Future: LLM-based generation (Gemini/GPT-4)

  • Context-aware commentary
  • Multiple commentary styles (analytical, entertaining)
  • Real-time event descriptions
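The planned LLM stage would assemble detected-event context into a prompt. A hypothetical sketch of such a prompt builder (the function name, event schema, and style options are assumptions, not the repo's API):

```python
def build_commentary_prompt(event, player, minute, style="entertaining"):
    """Assemble an LLM prompt from a detected event (illustrative schema)."""
    return (
        f"You are a {style} football commentator. "
        f"In minute {minute}, {player} performed: {event}. "
        "Produce one short, vivid line of live commentary."
    )
```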

4. Video Processing Pipeline

1. Frame Extraction (30 FPS)
2. Object Detection (YOLOv8)
3. Tracking Update (ByteTrack)
4. Possession Analysis
5. Event Detection [Future]
6. Commentary Generation
7. Audio Overlay (FFmpeg)
8. Output Video Creation
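The per-frame portion of the pipeline (steps 2 to 6) can be sketched as a driver loop. All callables are injected stand-ins for the repo's actual modules, not its real API:

```python
def run_pipeline(frames, detect, track, possession, commentate):
    """Run detection, tracking, and possession analysis over frames,
    emitting a commentary line whenever the ball holder changes."""
    lines, holder_prev = [], None
    for i, frame in enumerate(frames):
        tracks = track(detect(frame))        # steps 2-3: detection + tracking
        holder = possession(tracks)          # step 4: possession analysis
        if holder is not None and holder != holder_prev:
            lines.append((i, commentate(holder)))  # step 6: commentary
            holder_prev = holder
    return lines
```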

5. Performance Optimization

  • GPU Acceleration: CUDA-enabled inference using ONNX Runtime
  • Batch Processing: Frame batching for efficient GPU utilization
  • Model Optimization: Quantization and pruning for faster inference
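The frame-batching idea amounts to grouping consecutive frames so each GPU forward pass processes several at once; a minimal sketch:

```python
def batch_frames(frames, batch_size=8):
    """Yield fixed-size groups of frames for a single batched inference call
    (the final batch may be smaller)."""
    for i in range(0, len(frames), batch_size):
        yield frames[i:i + batch_size]
```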

πŸ—“οΈ Roadmap

Phase 1: Basic Detection & Tracking βœ… (Completed)

  • Player detection using YOLOv8
  • Ball detection and tracking
  • ByteTrack integration
  • Player identification system
  • Basic possession detection
  • Text-to-speech commentary

Phase 2: Event Detection & Commentary πŸ”„ (In Progress)

  • Team classification using color clustering
  • Advanced event detection (passes, shots, fouls)
  • Temporal action recognition
  • Tracker comparison experiments (ByteTrack vs DeepSORT vs SORT)
  • Enhanced possession accuracy
  • Contextual commentary generation
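The color-based team classification planned above reduces, in its simplest form, to assigning a player crop's mean jersey color to the nearest known team color. A minimal sketch, assuming reference colors are obtained separately (e.g. by clustering):

```python
def classify_team(mean_rgb, team_colors):
    """Assign a player's mean jersey RGB to the nearest reference team color."""
    def dist2(a, b):
        # Squared Euclidean distance in RGB space
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(team_colors, key=lambda t: dist2(mean_rgb, team_colors[t]))
```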

Phase 3: Advanced NLG & Multi-language πŸ“… (Planned)

  • LLM integration (Gemini/GPT-4)
  • Context-aware commentary
  • Multi-language support (Hindi, Nepali)
  • Real-time streaming capability
  • Performance optimization

Phase 4: Production Deployment 🎯 (Future)

  • Model quantization for edge devices
  • REST API development
  • Web-based interface
  • Cloud deployment
  • Mobile application

πŸ“Š Performance Metrics

To be published once the tracker comparison experiments and benchmark evaluation are complete.

πŸ§ͺ Tracking Algorithm Experiments

As part of our research phase, we're conducting comprehensive experiments to compare different tracking algorithms:

| Algorithm | Pros | Cons | Use Case |
|-----------|------|------|----------|
| ByteTrack | Fast, handles occlusions well | Requires tuning | Real-time tracking |
| DeepSORT | High accuracy, uses appearance | Slower, GPU intensive | Offline processing |
| SORT | Very fast, simple | Lower accuracy | Quick prototyping |

Experiment Results: Coming soon in results/comparison_report.md

🀝 Contributing

We welcome contributions! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guide for Python code
  • Add docstrings to all functions and classes
  • Write unit tests for new features
  • Update documentation as needed

πŸ‘₯ Team

FuseAI Fellowship - Football Commentary Generation Team

  • Bijay Shrestha
  • Sudip Shrestha

Mentor: Sushil Dyopla
Program: FuseAI Fellowship 2024

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Roboflow for object detection infrastructure
  • Supervision for computer vision utilities
  • ByteTrack for multi-object tracking
  • gTTS for text-to-speech synthesis
  • FFmpeg for audio-video processing
  • FuseAI Fellowship for mentorship and support

πŸ“š References

  1. Zhang, Y., et al. (2022). "ByteTrack: Multi-Object Tracking by Associating Every Detection Box." ECCV 2022.
  2. Wojke, N., et al. (2017). "Simple Online and Realtime Tracking with a Deep Association Metric." ICIP 2017.
  3. Redmon, J., et al. (2016). "You Only Look Once: Unified, Real-Time Object Detection." CVPR 2016.

πŸ“§ Contact

For questions, suggestions, or collaboration opportunities:

🌟 Support

If you find this project useful, please consider:

  • ⭐ Starring the repository
  • πŸ› Reporting bugs and issues
  • πŸ’‘ Suggesting new features
  • πŸ“– Improving documentation
  • 🀝 Contributing code

Note: This project is under active development as part of the FuseAI Fellowship program. Features and documentation are continuously being updated. Star ⭐ the repo to stay updated with the latest developments!

Last Updated: October 2024
