This project contains a PyTorch-based LSTM model for real-time sign language detection using MediaPipe keypoints.
- `improved_model.py` - Improved LSTM model definition and training script with data augmentation
- `load_model.py` - Model loading and real-time prediction
- `improved_sign_language_model.pth` - Trained model weights (12.7 MB)
- `requirements_pytorch.txt` - Python dependencies
- `main.py` - Data collection script for capturing sign language gestures
- `utils.py` - Utility functions for MediaPipe detection and keypoint extraction
- `MP_Data/` - Training data directory containing keypoint sequences for:
  - `hello/` - Hello gesture data (30 sequences)
  - `thanks/` - Thanks gesture data (30 sequences)
  - `iloveyou/` - I love you gesture data (30 sequences)
- `env/` - Virtual environment with all dependencies
```bash
# Create and activate virtual environment
python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate

# Install dependencies
pip install -r requirements_pytorch.txt

# Train the model
python improved_model.py

# Run real-time webcam prediction
python load_model.py
```

The improved model uses a robust LSTM-based architecture with data augmentation:
- Input: 30 frames × 1662 keypoints (pose + face + hands)
- LSTM Layers: 2-layer LSTM with 128 hidden units and dropout (0.2)
- Fully Connected: 128 → 64 → 3 units with ReLU activation
- Regularization: Dropout (0.3) at multiple layers
- Output: 3 classes (hello, thanks, iloveyou) with softmax
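A minimal PyTorch sketch of this architecture, assuming the hyperparameters listed above; the class name and exact layer arrangement in `improved_model.py` may differ:

```python
import torch.nn as nn

class SignLanguageLSTM(nn.Module):
    """Illustrative sketch: 2-layer LSTM (128 hidden) + 128 -> 64 -> 3 head."""

    def __init__(self, input_size=1662, hidden_size=128, num_classes=3):
        super().__init__()
        # 2-layer LSTM; dropout=0.2 applies between the LSTM layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2,
                            batch_first=True, dropout=0.2)
        self.head = nn.Sequential(
            nn.Linear(hidden_size, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        # x: (batch, 30 frames, 1662 keypoints)
        out, _ = self.lstm(x)
        # Classify from the final time step; softmax is applied at inference
        # time (training uses the raw logits with CrossEntropyLoss)
        return self.head(out[:, -1, :])
```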
- Data Augmentation: Noise addition and time shifting
- Class Balancing: 3x augmentation for "hello" class
- Early Stopping: Prevents overfitting
- Learning Rate Scheduling: Adaptive learning rate
- Validation: 20% test split with stratification
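The two augmentations named above could look roughly like this; the function name, noise scale, and shift range are illustrative assumptions, not values taken from `improved_model.py`:

```python
import numpy as np

def augment_sequence(sequence, noise_std=0.01, max_shift=3, rng=None):
    """Noise addition + time shifting for one (30, 1662) keypoint sequence."""
    rng = rng if rng is not None else np.random.default_rng()
    # Noise addition: small Gaussian jitter on every keypoint coordinate
    noisy = sequence + rng.normal(0.0, noise_std, size=sequence.shape)
    # Time shifting: roll the frame axis by a few steps in either direction
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(noisy, shift, axis=0)
```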
- Training Accuracy: 100%
- Validation Accuracy: 100%
- Test Accuracy: 100% on all classes
- Model Size: 12.7MB
- Hello - Wave gesture
- Thanks - Thank you gesture
- I Love You - ILY sign gesture
- Run `python main.py` to collect training data
- Follow the on-screen instructions to record gestures
- Each gesture requires 30 sequences of 30 frames each
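Each recorded frame is reduced to a 1662-value keypoint vector, which in MediaPipe Holistic corresponds to 33 pose landmarks × 4 values, 468 face landmarks × 3, and 2 × 21 hand landmarks × 3. A sketch of the kind of extraction `utils.py` performs; the actual function may differ:

```python
import numpy as np

def extract_keypoints(results):
    """Flatten MediaPipe Holistic results into one 1662-value vector.

    Sketch only; missing landmarks are zero-filled so the length is fixed.
    """
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, lh, rh])  # length 1662
```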
- Run `python improved_model.py` to train the model
- Training includes data augmentation and validation
- The best model is automatically saved as `improved_sign_language_model.pth`
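The early stopping and best-model checkpointing mentioned above typically combine like this; the function, its patience value, and the `validate` callback are illustrative, not the exact code in `improved_model.py`:

```python
import torch
import torch.nn as nn

def train_with_early_stopping(model: nn.Module, validate, max_epochs=200, patience=10):
    """Save the best checkpoint; stop when validation stops improving.

    `validate` is any callable that runs one training epoch and returns
    the current validation accuracy. Names and values are illustrative.
    """
    best_acc, stale = 0.0, 0
    for epoch in range(max_epochs):
        acc = validate()
        if acc > best_acc:
            best_acc, stale = acc, 0
            torch.save(model.state_dict(), "improved_sign_language_model.pth")
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping: no improvement for `patience` epochs
    return best_acc
```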
- Run `python load_model.py` for webcam prediction
- Make sign language gestures in front of the camera
- Ensure good lighting and clear hand visibility
- Press 'q' to quit
```python
from load_model import load_model, predict_sign

# Load the trained model
model, actions = load_model()

# Make predictions on a sequence of 30 frames x 1662 keypoints
predicted_sign, confidence, probabilities = predict_sign(model, actions, sequence_data)
```
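The snippet assumes `sequence_data` already holds the last 30 frames of keypoints. A hedged sketch of assembling it from a webcam with MediaPipe Holistic, reusing the `extract_keypoints` sketch from earlier (`load_model.py` may implement this loop differently):

```python
import cv2
import mediapipe as mp
import numpy as np

# Sliding-window sketch for feeding predict_sign with the latest 30 frames.
mp_holistic = mp.solutions.holistic
sequence = []

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))
        sequence = sequence[-30:]  # keep only the most recent 30 frames
        if len(sequence) == 30:
            sequence_data = np.array(sequence)  # shape (30, 1662)
            # predicted_sign, confidence, probabilities = predict_sign(model, actions, sequence_data)
        cv2.imshow("Sign Language Detection", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```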
- Python 3.8+
- PyTorch 2.0+
- MediaPipe
- OpenCV
- NumPy
- Scikit-learn
- TensorBoard (for training logs)
See requirements_pytorch.txt for exact versions.
- Ensure good lighting for MediaPipe detection
- Position hands clearly in front of the camera
- Use the same gestures as in the training data
- Check hand visibility - hands must be fully visible
- The model achieves 100% accuracy on training data
- Real-time performance depends on webcam conditions
- Lower the confidence threshold if needed for real-time use (see the sketch below)
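As an example of that last tip, a threshold check around the `predict_sign` call shown earlier; the 0.7 value is an illustrative default, not taken from `load_model.py`:

```python
CONFIDENCE_THRESHOLD = 0.7  # illustrative; lower it if valid signs are being dropped

predicted_sign, confidence, probabilities = predict_sign(model, actions, sequence_data)
if confidence >= CONFIDENCE_THRESHOLD:
    print(f"Detected: {predicted_sign} ({confidence:.2f})")
```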
```
sign_language_detection/
├── improved_model.py                  # Training script
├── improved_sign_language_model.pth   # Model file
├── load_model.py                      # Model loading & prediction
├── main.py                            # Data collection
├── utils.py                           # Utilities
├── requirements_pytorch.txt           # Dependencies
├── README.md                          # This file
└── MP_Data/                           # Training data
    ├── hello/                         # Hello gestures
    ├── thanks/                        # Thanks gestures
    └── iloveyou/                      # I love you gestures
```