A comprehensive Video Management System with AI-powered object detection, face recognition, and intelligent video analysis capabilities. Built with Python, PySide6 (Qt6), and ONNX Runtime for real-time inference.
This VMS Client is a desktop application that provides:
- Real-time video detection from cameras or video files
- AI-powered object detection using YOLO-based ONNX models
- Face recognition with database management
- Video playback analysis with natural language queries
- Detection database for storing and querying results
- SOS alert system for security events
- Multi-model support with configurable detection classes
VMSPython/
├── vms_gui.py # Main entry point
├── register_face.py # Face registration utility
├── requirements.txt # Python dependencies
│
├── vms_gui/ # Main application package
│ ├── app.py # Main application window
│ ├── config.py # Configuration and constants
│ │
│ ├── detection/ # Detection engine
│ │ ├── engine.py # DetectionEngine, ONNXRunner, VideoCapture
│ │ ├── face_recognition.py # FaceRecognizer with embedding extraction
│ │ ├── face_database.py # Face embeddings database
│ │ └── detection_database.py # Detection results database
│ │
│ └── gui/ # GUI components
│ ├── components.py # TopBar, BottomBar UI components
│ ├── model_config.py # Model configuration panel
│ ├── video_display.py # Live video display widget
│ ├── video_player.py # Video playback widget
│ ├── results_panel.py # Detection results display
│ ├── chatbot.py # Natural language query interface
│ └── gemini_parser.py # Query parser (Gemini API + fallback)
│
├── models/ # ONNX model files
│ ├── best.onnx # Face detection model
│ ├── w600k_mbf.onnx # Face recognition/embedding model
│ ├── yolo11npRETRAINED.onnx # General object detection (COCO)
│ └── Fire_Event_best.onnx # Fire/smoke detection
│
└── storage/ # Data storage
└── db/ # SQLite databases
├── app.db # Application database
├── events.sqlite # Events database
├── face_embeddings.db # Face recognition database
└── detection_results.db # Detection results database
1. Video Input → VideoCapture (supports cameras, video files, V4L2 devices)
2. Frame Processing → DetectionEngine.process_frame()
3. Model Inference → ONNXRunner.infer() (multiple models supported)
4. Face Recognition → FaceRecognizer.recognize_face() (if enabled)
5. Results Storage → DetectionDatabase.save_detection()
6. GUI Display → VideoDisplay/VideoPlayer with real-time visualization
7. Query System → ChatBot → QueryParser → database queries
- Real-time camera feed with AI detection overlay
- Support for USB webcams, V4L2 devices (Linux/Raspberry Pi), and video files
- Configurable resolution and frame rate
- Multiple camera source selection
- Bounding box visualization with class labels and confidence scores
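The camera source handling above can be sketched with OpenCV. This is a minimal illustration, not the project's VideoCapture class: parse_source() and open_source() are hypothetical helpers, the 1280x720 at 30 FPS defaults are placeholders, and the assumed convention is that a bare number is a camera index, a /dev/videoN path is a V4L2 device, and anything else is a video file. OpenCV is imported inside open_source() so the parsing logic works without it.

```python
import re
from typing import Tuple, Union


def parse_source(text: str) -> Tuple[Union[int, str], str]:
    """Classify a user-entered video source.

    Returns (source, kind), where kind is "camera", "v4l2" or "file".
    """
    text = text.strip()
    if text.isdecimal():                        # "0", "1", ... -> camera index
        return int(text), "camera"
    if re.fullmatch(r"/dev/video\d+", text):    # Linux V4L2 device node
        return text, "v4l2"
    return text, "file"                         # anything else: a video file path


def open_source(text: str, width: int = 1280, height: int = 720, fps: int = 30):
    """Open a source with OpenCV and request a resolution and frame rate."""
    import cv2  # imported lazily so parse_source() works without OpenCV

    source, kind = parse_source(text)
    if kind == "v4l2":
        cap = cv2.VideoCapture(source, cv2.CAP_V4L2)  # force the V4L2 backend
    else:
        cap = cv2.VideoCapture(source)
    if kind != "file":
        # These are requests only: the driver may ignore or round them.
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
        cap.set(cv2.CAP_PROP_FPS, fps)
    return cap
```

Because drivers often snap to the nearest supported mode, reading back cap.get(cv2.CAP_PROP_FRAME_WIDTH) after opening is the reliable way to see which resolution was actually applied.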
- Face Detection: Detect faces in real time (best.onnx)
- Object Detection: COCO classes (person, car, etc.) (yolo11npRETRAINED.onnx)
- Fire/Smoke Detection: Fire and smoke detection (Fire_Event_best.onnx)
- Custom Models: Support for any ONNX YOLO-based model
- Per-model configuration (confidence threshold, enabled classes)
- Automatic class detection from model metadata
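One way to implement automatic class detection from metadata is to read the custom metadata map that ONNX Runtime exposes through InferenceSession.get_modelmeta().custom_metadata_map. The sketch below assumes the Ultralytics export convention, in which a "names" entry stores a Python-literal dict such as {0: 'person', 1: 'bicycle'}; models exported by other tools may not carry this key. class_names_from_metadata() is a hypothetical helper.

```python
import ast
from typing import Dict, List, Optional


def class_names_from_metadata(custom_metadata: Dict[str, str]) -> Optional[List[str]]:
    """Return class names stored in an ONNX model's metadata, or None.

    Assumes the Ultralytics convention: metadata["names"] is a string such as
    "{0: 'person', 1: 'bicycle'}". None lets the caller fall back to other
    sources (filename rules, a built-in COCO list, the output shape).
    """
    raw = custom_metadata.get("names")
    if raw is None:
        return None
    try:
        names = ast.literal_eval(raw)  # evaluates literals only, never arbitrary code
    except (ValueError, SyntaxError, TypeError):
        return None
    if isinstance(names, dict):
        return [str(names[key]) for key in sorted(names)]  # order by class index
    if isinstance(names, (list, tuple)):
        return [str(name) for name in names]
    return None


# Usage with ONNX Runtime (requires the onnxruntime package):
#   import onnxruntime as ort
#   session = ort.InferenceSession("models/Fire_Event_best.onnx")
#   meta = session.get_modelmeta().custom_metadata_map
#   print(class_names_from_metadata(meta))
```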
- Embedding-based recognition using w600k_mbf.onnx
- Face database with multiple embeddings per person
- Registration tool (register_face.py) captures 90 images per person (30 from each angle)
- Real-time recognition during live detection
- Known/Unknown face classification
- Cosine similarity matching with configurable threshold
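Cosine-similarity matching against a database that holds several embeddings per person can be sketched with NumPy. This is an illustration under assumptions: match_face() is a hypothetical helper, the gallery is a dict mapping each name to an (N, D) array of that person's embeddings, a person's score is the best similarity over their embeddings, and the 0.4 default threshold is a placeholder rather than the project's setting.

```python
from typing import Dict, Tuple

import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale vectors (last axis) to unit length; epsilon avoids division by zero."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)


def match_face(query: np.ndarray,
               gallery: Dict[str, np.ndarray],
               threshold: float = 0.4) -> Tuple[str, float]:
    """Return (name, score) of the best match, or ("Unknown", score) below threshold.

    gallery maps each person to an (N, D) array of their N registered embeddings.
    """
    q = l2_normalize(np.asarray(query, dtype=np.float32))
    best_name, best_score = "Unknown", -1.0
    for name, embeddings in gallery.items():
        # Dot products of unit vectors are cosine similarities, one per stored embedding.
        sims = l2_normalize(np.asarray(embeddings, dtype=np.float32)) @ q
        score = float(sims.max())  # best of this person's embeddings
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return "Unknown", best_score
    return best_name, best_score
```

Taking the maximum over a person's embeddings lets front, left, and right captures each match their own pose; averaging them into one template is a common alternative that trades some pose robustness for faster lookups.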
- Load and analyze video files
- Frame-by-frame detection processing
- Detection results panel with thumbnails
- Jump to detection timestamps
- Export detection frames
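Jumping to a detection timestamp needs only a conversion from seconds to a display string and to a seek position. A small sketch; the MM:SS / H:MM:SS format and the helper names time_string() and frame_index() are assumptions, not taken from the project.

```python
def time_string(seconds: float) -> str:
    """Format a position as MM:SS, or H:MM:SS once it passes an hour."""
    total = int(seconds)               # drop fractional seconds
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def frame_index(seconds: float, fps: float) -> int:
    """Nearest frame number for a timestamp, assuming a constant frame rate."""
    return int(round(seconds * fps))
```

With OpenCV, seeking is typically done with cap.set(cv2.CAP_PROP_POS_MSEC, seconds * 1000) or cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index(seconds, fps)); how precisely the seek lands depends on the container and codec.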
- Chatbot interface for video analysis queries
- Gemini API integration for intelligent query parsing (with fallback parser)
- Example queries:
- "find all humans from 10 min to 15 min and save them"
- "find tigers in the video"
- "find all unknown faces from 5:00 to 10:00"
- "find whatever you see and save to database"
- Query results displayed in results panel
- Save detection images to filesystem
- SQLite-based storage for all detection results
- Stores: timestamp, class, confidence, bounding box, frame image, model name
- Query interface with filters (class, time range, recognized name)
- Statistics by class and model
- Efficient indexing for fast queries
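The filtered queries and indexing described above can be sketched with Python's built-in sqlite3 module. The table here is a trimmed, hypothetical subset of the detections schema (no bounding box or image columns), and query_detections() is an illustrative helper; filter values are bound as ? parameters rather than formatted into the SQL string.

```python
import sqlite3
from typing import List, Optional, Tuple

# Trimmed, hypothetical subset of the detections table (no bbox or image columns).
SCHEMA = """
CREATE TABLE IF NOT EXISTS detections (
    id INTEGER PRIMARY KEY,
    video_path TEXT,
    timestamp REAL,
    class_name TEXT,
    recognized_name TEXT,
    confidence REAL,
    model_name TEXT
);
-- Composite index: "class X between t0 and t1" becomes an index range scan.
CREATE INDEX IF NOT EXISTS idx_detections_class_time
    ON detections (class_name, timestamp);
"""


def query_detections(conn: sqlite3.Connection,
                     class_name: Optional[str] = None,
                     time_start: Optional[float] = None,
                     time_end: Optional[float] = None,
                     recognized_name: Optional[str] = None) -> List[Tuple]:
    """Return matching rows, building the WHERE clause from the filters given."""
    clauses: List[str] = []
    params: List[object] = []
    for sql, value in (("class_name = ?", class_name),
                       ("timestamp >= ?", time_start),
                       ("timestamp <= ?", time_end),
                       ("recognized_name = ?", recognized_name)):
        if value is not None:
            clauses.append(sql)
            params.append(value)  # bound as a parameter, never string-formatted
    query = "SELECT timestamp, class_name, recognized_name, confidence FROM detections"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return conn.execute(query + " ORDER BY timestamp", params).fetchall()


# Demo on an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO detections (video_path, timestamp, class_name, recognized_name,"
    " confidence, model_name) VALUES (?, ?, ?, ?, ?, ?)",
    [("demo.mp4", 30.0, "person", None, 0.91, "yolo11npRETRAINED"),
     ("demo.mp4", 650.0, "person", None, 0.88, "yolo11npRETRAINED"),
     ("demo.mp4", 700.0, "face", "Unknown", 0.77, "best")],
)
rows = query_detections(conn, class_name="person", time_start=600, time_end=900)
```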
- Configurable triggers based on detection counts
- Unknown face alerts: Trigger when unknown faces are detected
- Known face alerts: Trigger when specific known faces are detected
- Class-based alerts: Custom thresholds for any detection class
- Visual SOS indicator in top bar
- User confirmation before triggering
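The count-based triggers can be expressed as a pure function over one frame's detections. A sketch under assumptions: SOSConfig and check_sos() are hypothetical, each detection is a dict with class_name and, for faces, recognized_name, and a threshold of None means the rule is disabled. The returned reasons would then feed the user-confirmation step rather than raising an alert directly.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SOSConfig:
    unknown_face_threshold: Optional[int] = None   # None disables the rule
    known_face_threshold: Optional[int] = None
    watched_names: List[str] = field(default_factory=list)  # empty: any known face
    class_thresholds: Dict[str, int] = field(default_factory=dict)


def check_sos(detections: List[dict], cfg: SOSConfig) -> List[str]:
    """Return a reason string for every rule whose count threshold is reached."""
    reasons: List[str] = []
    face_names = [d.get("recognized_name") for d in detections
                  if d.get("class_name") == "face"]
    unknown = sum(1 for n in face_names if n == "Unknown")
    known = sum(1 for n in face_names
                if n and n != "Unknown"
                and (not cfg.watched_names or n in cfg.watched_names))
    if cfg.unknown_face_threshold is not None and unknown >= cfg.unknown_face_threshold:
        reasons.append(f"{unknown} unknown face(s)")
    if cfg.known_face_threshold is not None and known >= cfg.known_face_threshold:
        reasons.append(f"{known} known face(s)")
    counts = Counter(d.get("class_name") for d in detections)
    for cls, limit in cfg.class_thresholds.items():
        if counts[cls] >= limit:
            reasons.append(f"{counts[cls]} x {cls}")
    return reasons
```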
- Enable/disable models independently
- Adjust confidence thresholds per model
- Filter by detection classes (enable only specific classes)
- Real-time configuration changes
- Model auto-discovery from the models/ directory
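Per-model thresholds and class filters amount to a filter applied to each model's raw detections. A minimal sketch; ModelSettings and apply_settings() are hypothetical names, and the 0.5 default threshold is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ModelSettings:
    enabled: bool = True
    conf_threshold: float = 0.5                  # placeholder default
    enabled_classes: Optional[Set[str]] = None   # None: keep every class


def apply_settings(detections: List[dict], settings: ModelSettings) -> List[dict]:
    """Drop detections from disabled models, below threshold, or of filtered classes."""
    if not settings.enabled:
        return []
    return [d for d in detections
            if d["confidence"] >= settings.conf_threshold
            and (settings.enabled_classes is None
                 or d["class_name"] in settings.enabled_classes)]
```

If the settings object is read on every frame, edits made in the GUI take effect on the next processed frame, which is one straightforward way to support real-time configuration changes.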
- Python: 3.9 or higher (3.12 recommended)
- OS: Windows 10/11, Linux (Ubuntu/Debian/Raspberry Pi OS), macOS 10.15+
- RAM: 2GB minimum (4GB+ recommended)
- Storage: 1GB free space
- Camera: USB webcam, V4L2 device, or video files
All dependencies are listed in requirements.txt:
- PySide6 (Qt6 GUI framework)
- OpenCV (computer vision)
- ONNX Runtime (AI model inference)
- NumPy (numerical computing)
- SQLite3 (database; part of Python's standard library, so no separate install)
- Google Generative AI (optional, for Gemini query parsing)
git clone <repository-url>
cd VMSPython
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Place your ONNX model files in the models/ directory:
- best.onnx: Face detection model
- w600k_mbf.onnx: Face recognition/embedding model
- yolo11npRETRAINED.onnx: General object detection (COCO classes)
- Fire_Event_best.onnx: Fire/smoke detection
Note: Models are auto-discovered on startup. The application will detect model types from filenames and metadata.
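Startup discovery can be sketched as a directory scan plus a filename heuristic, with class names as a fallback. The substring rules and type labels below are assumptions chosen to fit the bundled filenames, not the project's actual logic; guess_model_type() and discover_models() are hypothetical. A generic name such as best.onnx matches no rule, which is why a metadata check (for example, a model whose only class is "face") is needed as well.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical substring rules, checked in order; the first match wins.
FILENAME_RULES = [
    ("w600k", "face_recognition"),
    ("fire", "fire_smoke"),
    ("yolo", "object_detection"),
]


def guess_model_type(filename: str, class_names: Optional[List[str]] = None) -> str:
    """Guess a model's role from its filename, then from its class names."""
    stem = Path(filename).stem.lower()
    for key, kind in FILENAME_RULES:
        if key in stem:
            return kind
    if class_names == ["face"]:
        return "face_detection"
    if class_names:
        return "object_detection"
    return "unknown"


def discover_models(models_dir: str = "models") -> Dict[str, str]:
    """Map every .onnx file in models_dir to a guessed type (filenames only)."""
    return {path.name: guess_model_type(path.name)
            for path in sorted(Path(models_dir).glob("*.onnx"))}
```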
# Activate virtual environment (if using)
source venv/bin/activate
# Run the application
python vms_gui.py
1. Select Video Source:
   - Camera index (0, 1, 2, etc.)
   - Linux device path (/dev/video0, /dev/video1)
   - Video file path
2. Configure Models:
   - Enable/disable models in the left panel
   - Adjust confidence thresholds
   - Select detection classes to filter
3. Start Detection:
   - Click the "Start" button
   - Detection results appear in real time
   - Bounding boxes show detected objects
4. Face Recognition (if enabled):
   - Load the face recognition model in the model config panel
   - Recognized faces show name labels
   - Unknown faces are marked as "Unknown"
5. Switch to Playback Tab:
   - Click the "PlayBack" tab in the top bar
   - Load a video file using the file dialog
6. Analyze Video:
   - Video plays with detection overlay
   - Detection results are saved to the database automatically
   - Results panel shows all detections
7. Query Detections:
   - Use the chatbot to query detections
   - Examples:
     - "find all persons from 0:00 to 5:00"
     - "find unknown faces and save them"
     - "find cars in the video"
   - Results appear in the results panel
   - Click a result to jump to its timestamp
Register faces for recognition:
python register_face.py
Process:
- Enter the person's name
- Capture 30 images from the front angle
- Capture 30 images from the left angle
- Capture 30 images from the right angle
- Total: 90 images per person for robust recognition
Requirements:
- Face detection model (best.onnx)
- Face recognition model (w600k_mbf.onnx)
- Camera access
Models are configured in the GUI:
- Enable/Disable: Toggle model on/off
- Confidence Threshold: Minimum confidence for detections (0.0-1.0)
- Class Filtering: Enable/disable specific detection classes
- Face Recognition: Load recognition model for face identification
Camera settings:
- Source Selection: Choose camera index or device path
- Resolution: Select preset or custom resolution
- Live Resolution Change: Change resolution while the camera is running
Configure SOS alerts in the Model Config panel:
- Unknown Face SOS: Enable and set count threshold
- Known Face SOS: Enable and set count threshold
- Class-based SOS: Set thresholds for any detection class
Databases are stored in storage/db/:
- face_embeddings.db: Face recognition database
- detection_results.db: Detection results database
- app.db: Application database
- events.sqlite: Events database
- Face Detection (best.onnx)
  - Classes: ["face"]
  - Detects faces in video frames
  - Used with face recognition
- Object Detection (yolo11npRETRAINED.onnx)
  - Classes: COCO classes (80 classes)
  - Person, car, bicycle, etc.
  - General purpose detection
- Fire/Smoke Detection (Fire_Event_best.onnx)
  - Classes: ["fire", "smoke"]
  - Fire and smoke detection
- Face Recognition (w600k_mbf.onnx)
  - Not a detection model
  - Extracts face embeddings for recognition
  - Used with face detection model
To add a custom model:
- Place the .onnx file in the models/ directory
- The model type is detected from the filename or metadata
- Classes are auto-detected from the model output
- Enable and configure it in the GUI
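Auto-detecting classes from the model output usually relies on the output tensor shape. The sketch below assumes the Ultralytics YOLOv8/YOLO11 detection export layout: an output of shape (1, 4 + num_classes, N) whose first four rows are (cx, cy, w, h) in input-image pixels and whose remaining rows are per-class scores. Other YOLO versions use different layouts. It decodes that tensor and applies class-wise non-maximum suppression in plain NumPy; decode_yolo() and iou() are hypothetical helpers, and the thresholds are placeholders.

```python
from typing import List, Tuple

import numpy as np


def iou(box: np.ndarray, others: np.ndarray) -> np.ndarray:
    """IoU between one (x1, y1, x2, y2) box and an (M, 4) array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter + 1e-9)


def decode_yolo(output: np.ndarray, conf_threshold: float = 0.25,
                iou_threshold: float = 0.45) -> List[Tuple[int, float, np.ndarray]]:
    """Turn a (1, 4 + num_classes, N) output into (class_id, score, xyxy) tuples."""
    preds = output[0].T                         # (N, 4 + num_classes)
    num_classes = preds.shape[1] - 4            # class count inferred from the shape
    if num_classes <= 0:
        raise ValueError("unexpected output layout")
    boxes_cxcywh, scores = preds[:, :4], preds[:, 4:]
    class_ids = scores.argmax(axis=1)
    confs = scores[np.arange(len(scores)), class_ids]
    keep = confs >= conf_threshold
    boxes_cxcywh, class_ids, confs = boxes_cxcywh[keep], class_ids[keep], confs[keep]
    centers, half = boxes_cxcywh[:, :2], boxes_cxcywh[:, 2:] / 2
    boxes = np.concatenate([centers - half, centers + half], axis=1)  # x1, y1, x2, y2

    results = []
    for cls in np.unique(class_ids):            # NMS within each class
        idx = np.where(class_ids == cls)[0]
        idx = idx[np.argsort(-confs[idx])]      # highest score first
        while idx.size:
            best, rest = idx[0], idx[1:]
            results.append((int(cls), float(confs[best]), boxes[best]))
            idx = rest[iou(boxes[best], boxes[rest]) < iou_threshold]
    return sorted(results, key=lambda r: -r[1])
```

The returned boxes are in the model's input coordinates (for example 640x640); if frames were resized or letterboxed before inference, the same scaling and padding have to be undone before drawing boxes on the original frame.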
CREATE TABLE detections (
id INTEGER PRIMARY KEY,
video_path TEXT,
timestamp REAL,
time_string TEXT,
class_name TEXT,
recognized_name TEXT,
confidence REAL,
bbox_x1 INTEGER,
bbox_y1 INTEGER,
bbox_x2 INTEGER,
bbox_y2 INTEGER,
model_name TEXT,
frame_image BLOB,
num_objects INTEGER,
created_at TIMESTAMP
);
face_embeddings.db stores face embeddings for recognition:
- Face ID, name, embedding vector
- Multiple embeddings per person supported
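Storing several embeddings per person fits naturally in one SQLite table with the vector kept as a BLOB. The table layout and the helpers add_embedding() and load_gallery() are assumptions (the README states only that the database holds a face ID, name, and embedding); embeddings are serialized as raw float32 bytes with NumPy's tobytes() and read back with np.frombuffer().

```python
import sqlite3
from typing import Dict, List, Sequence

import numpy as np

# Hypothetical layout: one row per stored embedding, several rows per person.
SCHEMA = """
CREATE TABLE IF NOT EXISTS faces (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    embedding BLOB NOT NULL
)
"""


def add_embedding(conn: sqlite3.Connection, name: str, embedding: Sequence[float]) -> None:
    """Store one embedding as raw float32 bytes."""
    blob = np.asarray(embedding, dtype=np.float32).tobytes()
    conn.execute("INSERT INTO faces (name, embedding) VALUES (?, ?)", (name, blob))


def load_gallery(conn: sqlite3.Connection) -> Dict[str, np.ndarray]:
    """Group stored embeddings per person into (N, D) float32 arrays."""
    groups: Dict[str, List[np.ndarray]] = {}
    for name, blob in conn.execute("SELECT name, embedding FROM faces ORDER BY id"):
        groups.setdefault(name, []).append(np.frombuffer(blob, dtype=np.float32))
    return {name: np.stack(vectors) for name, vectors in groups.items()}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
add_embedding(conn, "alice", [1.0, 0.0, 0.0])
add_embedding(conn, "alice", [0.9, 0.1, 0.0])
add_embedding(conn, "bob", [0.0, 1.0, 0.0])
gallery = load_gallery(conn)
```

Raw bytes carry no shape or dtype information, so the dtype (float32 here) must be fixed identically on write and read; the per-person (N, D) arrays are the shape a cosine-similarity matcher over multiple embeddings needs.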
The chatbot uses Gemini API (with fallback parser) to understand queries:
Supported Actions:
- find: Search for detections
- save: Save detections to filesystem
Query Examples:
"find all humans from 10 min to 15 min and save them"
"find tigers in the video"
"find all unknown faces from 5:00 to 10:00"
"find whatever you see and save to database"
"find persons and save them"
Query Parameters:
- class_name: Object class (person, car, face, etc.)
- recognized_name: For faces ("Unknown", "known", or a specific name)
- time_start: Start time in seconds
- time_end: End time in seconds
- save_to_db: Whether to save images to filesystem
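A rule-based fallback parser that fills in these parameters can be sketched with regular expressions. This is illustrative only, not the project's QueryParser: parse_query(), to_seconds(), the SYNONYMS table, and the naive plural rule are assumptions, and a bare number without a unit is treated as seconds.

```python
import re
from typing import Optional

# Hypothetical word-to-class table; other words pass through (plural stripped).
SYNONYMS = {"human": "person", "humans": "person", "people": "person",
            "persons": "person", "faces": "face", "cars": "car"}
TIME = r"(\d+(?::\d{1,2})?)\s*(min(?:ute)?s?|sec(?:ond)?s?)?"


def to_seconds(value: str, unit: Optional[str]) -> float:
    """'5:00' -> 300; '10' with 'min' -> 600; a bare number is taken as seconds."""
    if ":" in value:
        minutes, secs = value.split(":")
        return float(int(minutes) * 60 + int(secs))
    n = int(value)
    return float(n * 60 if unit and unit.startswith("min") else n)


def parse_query(text: str) -> dict:
    q = text.lower()
    result = {"class_name": None, "recognized_name": None,
              "time_start": None, "time_end": None,
              "save_to_db": "save" in q}
    m = re.search(rf"from\s+{TIME}\s+to\s+{TIME}", q)
    if m:
        start, start_unit, end, end_unit = m.groups()
        # "from 10 to 15 min": a unit given on one end applies to both.
        result["time_start"] = to_seconds(start, start_unit or end_unit)
        result["time_end"] = to_seconds(end, end_unit or start_unit)
    if "unknown face" in q:          # checked first: it contains "known face"
        result["class_name"], result["recognized_name"] = "face", "Unknown"
    elif "known face" in q:
        result["class_name"], result["recognized_name"] = "face", "known"
    else:
        m = re.search(r"find\s+(?:all\s+)?(?:the\s+)?([a-z]+)", q)
        if m and m.group(1) not in {"whatever", "everything", "anything"}:
            word = m.group(1)
            if word in SYNONYMS:
                word = SYNONYMS[word]
            elif len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
                word = word[:-1]  # naive plural rule: "tigers" -> "tiger"
            result["class_name"] = word
    return result
```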
Linux/Raspberry Pi:
# Check available cameras
ls /dev/video*
# List V4L2 devices (v4l2-ctl is provided by the v4l-utils package)
v4l2-ctl --list-devices
Windows:
- Check Device Manager for camera
- Ensure no other application is using camera
- Try different camera indices (0, 1, 2)
- Verify .onnx files are in the models/ directory
- Check file permissions
- Ensure models are valid ONNX format
- Check console for error messages
- Ensure face detection model is enabled
- Load face recognition model in model config
- Register faces using register_face.py
- Check that the face database has registered faces
- Reduce camera resolution
- Lower target FPS
- Disable unused models
- Use smaller model files
- Close other applications
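Lowering the target FPS usually means running inference on only every Nth frame. A small sketch of that arithmetic; frame_stride() and frames_to_process() are hypothetical helpers, not the project's code.

```python
def frame_stride(source_fps: float, target_fps: float) -> int:
    """Frames to advance per processed frame so inference runs near target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(1, round(source_fps / target_fps))


def frames_to_process(total_frames: int, source_fps: float, target_fps: float) -> range:
    """Indices of the frames that get inference; the others are skipped."""
    return range(0, total_frames, frame_stride(source_fps, target_fps))
```

For a live camera, the skipped frames should still be pulled from the device (for example with cap.grab()) so the capture buffer does not fill with stale frames and add latency.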
- Check that the storage/db/ directory exists
- Verify write permissions
- Check disk space
- Database files are SQLite, can be opened with SQLite tools
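The storage checks above can be scripted with the standard library. A sketch under assumptions: check_storage() is a hypothetical helper, the 100 MB free-space floor is a placeholder, and every *.db and *.sqlite file in the directory is assumed to be an SQLite database, checked with SQLite's PRAGMA integrity_check.

```python
import os
import shutil
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import List


def check_storage(db_dir: str = "storage/db", min_free_mb: int = 100) -> List[str]:
    """Return a list of problems with the database directory; empty means OK."""
    path = Path(db_dir)
    if not path.is_dir():
        return [f"{path} does not exist"]
    problems: List[str] = []
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable")
    free_mb = shutil.disk_usage(str(path)).free // (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb} MB free on {path}")
    for db_file in sorted([*path.glob("*.db"), *path.glob("*.sqlite")]):
        try:
            with closing(sqlite3.connect(str(db_file))) as conn:
                status = conn.execute("PRAGMA integrity_check").fetchone()[0]
        except sqlite3.DatabaseError as exc:  # e.g. "file is not a database"
            status = str(exc)
        if status != "ok":
            problems.append(f"{db_file.name}: {status}")
    return problems
```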
- Local Operation: All processing happens locally
- No Internet Required: Core functionality works offline
- Data Storage: All data stored locally in SQLite databases
- Face Privacy: Face embeddings stored locally, not shared
- Optional Gemini API: Only used for query parsing; when enabled, query text is sent to Google's Gemini API. The fallback parser can be used instead.
- vms_gui/: Main application package
- vms_gui/detection/: Detection engine and models
- vms_gui/gui/: GUI components
- models/: ONNX model files
- storage/: Data storage
- DetectionEngine: Main detection orchestrator
- ONNXRunner: ONNX model inference
- VideoCapture: Camera/video file handling
- FaceRecognizer: Face recognition with embeddings
- DetectionDatabase: Detection results storage
- VMSClientApp: Main application window
- Add New Model Type: Update detect_model_classes() in engine.py
- Add New Detection Class: Update COCO_CLASSES in config.py
- Custom Query Parser: Extend QueryParser in gemini_parser.py
- New GUI Component: Add to vms_gui/gui/
[Specify your license here]
[Contributing guidelines]
For issues and questions:
- Check this README
- Review console output for errors
- Check database files for data integrity
- Verify model files are valid
✅ Real-time video detection
✅ Multi-model support (face, object, fire detection)
✅ Face recognition with database
✅ Video playback analysis
✅ Natural language query system
✅ Detection database with SQLite
✅ SOS alert system
✅ Configurable model settings
✅ Cross-platform support (Windows, Linux, macOS)
✅ Raspberry Pi compatible
Built with: Python, PySide6, OpenCV, ONNX Runtime, SQLite