This project implements a Hand Gesture Recognition System using computer vision and deep learning. The system captures hand gestures in real-time, processes them with Mediapipe, and classifies them into predefined categories using a trained PyTorch model.
- Real-Time Hand Gesture Detection: Uses Mediapipe to extract hand landmarks from live video.
- Customizable Dataset: Collects and processes gesture data for training.
- Deep Learning Model: Implements a PyTorch-based neural network for gesture classification.
- Interactive Inference: Displays bounding boxes and predicted gestures live on the webcam feed.
- Python >= 3.7
- Libraries:
  - opencv-python
  - mediapipe
  - numpy
  - torch
  - scikit-learn
  - pickle (part of the Python standard library, so it needs no installation)
Install the dependencies using pip:
pip install opencv-python mediapipe numpy torch scikit-learn

- Data Collection: Captures gesture data and saves it as labeled images.
- Data Processing: Extracts normalized hand landmarks and stores them in a structured format.
- Model Training: Trains a PyTorch model on the processed data.
- Model Testing: Evaluates the model’s performance on unseen data.
- Live Inference: Runs real-time gesture detection and classification.
- Start the data collection script:
python data_collection.py
- Use the webcam to capture gestures for each class. Press 'Q' when ready to start capturing data.
- Each class’s data is stored in a separate folder within `./data`.
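
As a rough illustration of the folder layout described above, the sketch below builds the image paths a collection run might write. The helper names, the per-class folder names (`0`, `1`, ...) and the `.jpg` file naming are assumptions for illustration, not taken from `data_collection.py`.

```python
from pathlib import Path


def make_class_dirs(data_dir, number_of_classes):
    """Create one sub-folder per gesture class (hypothetical naming: 0, 1, ...)."""
    for class_id in range(number_of_classes):
        (Path(data_dir) / str(class_id)).mkdir(parents=True, exist_ok=True)


def sample_paths(data_dir, number_of_classes, dataset_size):
    """Return the image paths for each class, keyed by class index."""
    layout = {}
    for class_id in range(number_of_classes):
        class_dir = Path(data_dir) / str(class_id)
        # Assumed file naming: <sample index>.jpg inside the class folder.
        layout[class_id] = [class_dir / f"{i}.jpg" for i in range(dataset_size)]
    return layout
```

With `number_of_classes=3` and `dataset_size=100`, this would describe 300 images spread over three folders under `./data`.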
- Run the data processing script:
python data_processing.py
- This extracts hand landmarks using Mediapipe, normalizes them, and saves the data in `data.pickle`.
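
The normalization step is not spelled out here, so the following is only a minimal sketch of one common approach: shifting each hand's landmarks so that its smallest x and y become zero, which makes the features independent of where the hand sits in the frame. The function names and the `data`/`labels` keys in the pickle file are assumptions, and the actual script may also scale the coordinates.

```python
import pickle


def normalize_landmarks(points):
    """Flatten (x, y) landmarks after shifting them to start at the origin.

    Assumption: translation by the minimum x and y only, no scaling.
    """
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    features = []
    for x, y in points:
        features.extend([x - min_x, y - min_y])
    return features


def save_dataset(path, data, labels):
    """Store feature vectors and labels in one pickle file (assumed keys)."""
    with open(path, "wb") as f:
        pickle.dump({"data": data, "labels": labels}, f)
```

Mediapipe Hands reports 21 landmarks per hand, so using only x and y as above would give 42 features per sample.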
- Train the gesture classification model:
python model_training.py
- The trained model is saved as `model.pth`.
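
For orientation, a feedforward classifier and training loop of the kind described could look roughly like the sketch below. The class name `GestureNet`, the layer sizes, the optimizer, and the hyperparameters are all assumptions; the real `model_training.py` may differ.

```python
import torch
from torch import nn


class GestureNet(nn.Module):
    """Hypothetical two-layer feedforward classifier for landmark features."""

    def __init__(self, input_size=42, hidden_size=64, output_size=3):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x):
        # Returns raw class scores (logits); CrossEntropyLoss applies softmax.
        return self.layers(x)


def train(model, features, labels, epochs=20, lr=1e-3):
    """Run full-batch training and return the last loss value."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    last_loss = float("nan")
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
        last_loss = loss.item()
    return last_loss
```

After training, `torch.save(model.state_dict(), "model.pth")` is one way to produce the saved file. Whether the project stores the state dict or the whole model object is not stated here.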
- Run the live inference script:
python live_inference.py
- The script displays the webcam feed with predictions and bounding boxes for recognized gestures. Press 'Q' to quit.
- Number of Classes: Adjust the `number_of_classes` variable in the data collection script.
- Dataset Size: Modify `dataset_size` to control the number of samples per class.
- Model Parameters: Change the input size, hidden size, and output size in the model training script to suit your dataset.
- Mediapipe Hands: Detects hand landmarks and connections.
- PyTorch Model: A simple feedforward neural network for gesture classification.
- Label Mapping: Customize labels for gestures in the `labels_dict` dictionary.
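
To show how a label mapping of this kind is typically used, here is a small sketch that turns the model's class scores into a readable gesture name. The example labels and the `decode_prediction` helper are hypothetical; only the `labels_dict` name comes from the project.

```python
# Hypothetical mapping from class index to gesture name.
labels_dict = {0: "A", 1: "B", 2: "L"}


def decode_prediction(scores, labels):
    """Return the label of the highest-scoring class, or "unknown" if unmapped."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return labels.get(best_index, "unknown")
```

The number of entries in `labels_dict` should match the model's output size, otherwise some predictions would have no name.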
- Real-time webcam feed with:
  - Bounding Box: Highlights detected hands.
  - Predicted Gesture: Displays the recognized gesture above the bounding box.
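
One way to derive such a bounding box is from the landmarks themselves. Mediapipe reports landmark coordinates normalized to the image width and height, so they need scaling to pixels first. The sketch below assumes a hypothetical helper name and a fixed padding of 10 pixels; it is not the project's actual drawing code.

```python
def landmarks_to_bbox(points, frame_width, frame_height, padding=10):
    """Convert normalized (x, y) landmarks to a padded pixel box (x1, y1, x2, y2)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Scale to pixels, pad, and clamp so the box stays inside the frame.
    x1 = max(int(min(xs) * frame_width) - padding, 0)
    y1 = max(int(min(ys) * frame_height) - padding, 0)
    x2 = min(int(max(xs) * frame_width) + padding, frame_width - 1)
    y2 = min(int(max(ys) * frame_height) + padding, frame_height - 1)
    return x1, y1, x2, y2
```

The resulting corners can be passed to OpenCV's `cv2.rectangle`, with `cv2.putText` placing the predicted label just above `(x1, y1)`.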
- Lighting and Backgrounds: Train the model with diverse lighting conditions and backgrounds for robustness.
- Class Overlap: Ensure gestures are visually distinct for better accuracy.
- Additional Classes: Expand the dataset to include more gestures.
This project leverages:
- Mediapipe for efficient hand tracking and landmark detection.
- PyTorch for building and training the neural network.
This project is open-source under the MIT License. Contributions and improvements are welcome!
For questions or contributions, please open an issue or pull request on GitHub.