DeepStream applications using NVIDIA GPU acceleration with Rust, Python, and CUDA.
Zero-copy CUDA rectangle drawing on live RTSP video. See detailed section below.
GPU-accelerated video scaling with RTSP input/output support.
Quick Start:
# Test with default settings (test pattern, 640x480)
./test_scale.sh
# Scale to 1280x720
./test_scale.sh test 1280 720
# Scale from RTSP input to RTSP output (1920x1080)
./test_scale_rtsp_output.sh
# Scale to 4K from RTSP source
./test_scale_rtsp_output.sh rtsp://172.20.96.1:8554/live 3840 2160
# Scale to 720p with custom output URL
./test_scale_rtsp_output.sh rtsp://172.20.96.1:8554/live 1280 720 rtsp://localhost:8557/scaled
Test Scripts:
- ./test_scale.sh [input] [width] [height] - General testing (test pattern, camera, file, RTSP)
- ./test_scale_rtsp_output.sh [input_url] [width] [height] [output_url] - RTSP input to RTSP output
RTSP Output Usage:
# Default: scales rtsp://172.20.96.1:8554/live to 1920x1080
./test_scale_rtsp_output.sh
# View the scaled stream
ffplay rtsp://localhost:8557/ds-scale
vlc rtsp://localhost:8557/ds-scale
Environment Variables:
- RTSP_URL - Input RTSP stream URL (default: rtsp://172.20.96.1:8554/live)
- OUTPUT_WIDTH - Output width (default: 1920 for RTSP, 640 for display)
- OUTPUT_HEIGHT - Output height (default: 1080 for RTSP, 480 for display)
- RTSP_OUTPUT - Enable RTSP output (set to "enabled")
- RTSP_OUTPUT_PORT - RTSP server port (default: 8557)
- SHOW_DISPLAY - Show X11 window (default: false when RTSP enabled)
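As a rough sketch of how these variables and their defaults fit together, the snippet below resolves them from an environment mapping. `ScaleConfig` and `resolve_config` are hypothetical names used only for illustration. Treating `SHOW_DISPLAY=true` as the value that turns the window on, and defaulting the window to on when RTSP output is disabled, are assumptions; the actual scripts may parse these differently.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ScaleConfig:
    rtsp_url: str
    width: int
    height: int
    rtsp_output: bool
    rtsp_output_port: int
    show_display: bool


def resolve_config(env: Mapping[str, str]) -> ScaleConfig:
    """Apply the documented defaults to an environment mapping (hypothetical helper)."""
    rtsp_output = env.get("RTSP_OUTPUT") == "enabled"
    # Documented defaults: 1920x1080 when RTSP output is on, 640x480 for display.
    default_w, default_h = (1920, 1080) if rtsp_output else (640, 480)
    # Assumption: the window defaults to on unless RTSP output is enabled.
    show_default = "false" if rtsp_output else "true"
    return ScaleConfig(
        rtsp_url=env.get("RTSP_URL", "rtsp://172.20.96.1:8554/live"),
        width=int(env.get("OUTPUT_WIDTH", default_w)),
        height=int(env.get("OUTPUT_HEIGHT", default_h)),
        rtsp_output=rtsp_output,
        rtsp_output_port=int(env.get("RTSP_OUTPUT_PORT", 8557)),
        show_display=env.get("SHOW_DISPLAY", show_default).lower() == "true",
    )
```

Calling `resolve_config(os.environ)` after `import os` would read the values from the current shell environment.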
Features:
- GPU-accelerated scaling (NVIDIA nvvideoconvert)
- High-quality interpolation (method=5)
- Zero-copy GPU processing (NVMM memory)
- Scale up or down to any resolution
- RTSP server output for remote viewing
- H.264 encoding at 4Mbps bitrate
Port Assignments:
- Detect (Rust): 8555
- Detect (Python): 8556
- Scale: 8557
- Draw Rectangle: 8555 (processor), 8556 (server output)
Examples:
Scale down for streaming:
./test_scale_rtsp_output.sh rtsp://camera:554/high 640 480
Scale up for display:
./test_scale_rtsp_output.sh rtsp://camera:554/low 1920 1080
Scale to standard resolutions:
# 480p
./test_scale_rtsp_output.sh rtsp://172.20.96.1:8554/live 854 480
# 720p
./test_scale_rtsp_output.sh rtsp://172.20.96.1:8554/live 1280 720
# 1080p
./test_scale_rtsp_output.sh rtsp://172.20.96.1:8554/live 1920 1080
# 4K
./test_scale_rtsp_output.sh rtsp://172.20.96.1:8554/live 3840 2160
Notes:
- Video is stretched to the exact dimensions requested; the aspect ratio is not preserved
- To preserve the aspect ratio, calculate matching dimensions manually
- The RTSP port is extracted automatically from the output URL
- All processing happens on GPU for maximum efficiency
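The two manual steps in the notes above can be sketched in Python. `fit_width` and `rtsp_port` are hypothetical helpers, not part of the test scripts. Rounding the width to an even number is an assumption, based on H.264 encoders commonly expecting even dimensions.

```python
from urllib.parse import urlsplit


def fit_width(src_w: int, src_h: int, target_h: int) -> tuple[int, int]:
    """Return (width, height) for target_h that keeps the source aspect ratio.

    The width is rounded to the nearest even number (assumption: the
    encoder prefers even dimensions).
    """
    width = round(src_w * target_h / src_h / 2) * 2
    return width, target_h


def rtsp_port(url: str, default: int = 554) -> int:
    """Extract the port from an RTSP URL, falling back to the standard port 554."""
    port = urlsplit(url).port
    return port if port is not None else default
```

For a 1920x1080 source, `fit_width(1920, 1080, 480)` gives the 854x480 pair used in the 480p example above, and `rtsp_port("rtsp://localhost:8557/scaled")` gives 8557.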
Zero-copy CUDA rectangle drawing on live RTSP video streams. Draws 5-pixel thick white rectangle outlines directly on GPU memory.
Quick Start:
cd draw_rect
# Start both processor and server
./run_split.sh
# View the output stream
ffplay rtsp://localhost:8556/live
Architecture:
- Split architecture for instant client connections
- Processor (port 8555): Connects to source, draws rectangles with CUDA
- Server (port 8556): Re-streams for multiple clients with zero delay
Features:
- ✅ Zero-copy GPU drawing (no CPU transfers)
- ✅ 5-pixel thick rectangles for high visibility
- ✅ Hardware H.264 encoding (NVIDIA NVENC)
- ✅ Low latency (~50-100ms end-to-end)
- ✅ Multiple rectangles per frame
- ✅ Production-ready Docker deployment
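To illustrate what drawing a 5-pixel outline involves, here is a CPU-side sketch of the per-pixel test in Python. It is not the project's CUDA kernel. A GPU kernel would typically evaluate a similar predicate once per thread, directly on the frame in GPU memory. The function names, and the choice to grow the outline inward from the rectangle's edges, are assumptions.

```python
def on_outline(px: int, py: int, x: int, y: int, w: int, h: int,
               thickness: int = 5) -> bool:
    """True if pixel (px, py) lies on the outline of the w x h rectangle at (x, y).

    Assumption: the outline grows inward from the rectangle's edges.
    """
    inside_outer = x <= px < x + w and y <= py < y + h
    inside_inner = (x + thickness <= px < x + w - thickness
                    and y + thickness <= py < y + h - thickness)
    return inside_outer and not inside_inner


def draw_outline(frame: list[list[int]], x: int, y: int, w: int, h: int,
                 value: int = 255, thickness: int = 5) -> None:
    """Set outline pixels in a 2D single-channel frame, in place.

    This loop is a CPU stand-in: on a GPU, each (px, py) pair would be
    handled by its own thread instead of nested loops.
    """
    for py, row in enumerate(frame):
        for px in range(len(row)):
            if on_outline(px, py, x, y, w, h, thickness):
                row[px] = value
```

Multiple rectangles per frame would amount to repeating the test for each rectangle.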
Custom Configuration:
# Custom source and resolution
INPUT_RTSP=rtsp://camera:554/live WIDTH=1920 HEIGHT=1080 ./run_split.sh
# Custom ports
PROCESSOR_PORT=8557 SERVER_PORT=8558 ./run_split.sh
Performance (960x540 @ 60fps):
- GPU Compute: ~5%
- NVENC: ~20-30%
- CPU: ~5-10% per container
- Latency: ~50-100ms
Documentation:
- draw_rect/README.md - Complete documentation
- draw_rect/GSTREAMER_COMPONENTS_EXPLAINED.md - Pipeline deep dive
- draw_rect/ENCODER_ANALYSIS.md - Encoder optimization guide
Port Assignments:
- 8554: Source RTSP input
- 8555: Processor output (with CUDA rectangles)
- 8556: Final output for clients
GPU-accelerated object detection with YOLO11 models using Rust.
Quick Start:
# Detect people
./test_detect.sh person
# Detect cups
./test_detect.sh cup
# Detect other objects (80 COCO classes available)
./test_detect.sh "cell phone"
Using Different Models:
# Fast nano model (11MB, 2-3x faster)
MODEL_CONFIG=/models/config_infer_yolo11n.txt ./test_detect.sh person
# Standard model (37MB, better accuracy) - default
./test_detect.sh person
Environment Variables:
- MODEL_CONFIG - Model configuration file (default: yolo11s)
- OUTPUT_WIDTH - Display width (default: 1280)
- OUTPUT_HEIGHT - Display height (default: 720)
- RTSP_OUTPUT - Enable RTSP output (set to "enabled")
- RTSP_OUTPUT_PORT - RTSP output port (default: 8554)
- SHOW_DISPLAY - Show X11 window (default: true)
The detect application can stream the processed video with bounding boxes to an RTSP server. This allows you to view the detection stream remotely or integrate it with other applications.
Basic Usage:
./test_detect_rtsp_output.sh person
Detect Different Objects:
./test_detect_rtsp_output.sh car # Detect cars
./test_detect_rtsp_output.sh dog # Detect dogs
./test_detect_rtsp_output.sh bicycle # Detect bicycles
./test_detect_rtsp_output.sh "cell phone" # Multi-word objects need quotes
View the Stream:
In another terminal, connect to the RTSP stream:
# Using ffplay (recommended for low latency)
ffplay rtsp://localhost:8555/ds-detect
# Using VLC
vlc rtsp://localhost:8555/ds-detect
# Using GStreamer directly
gst-launch-1.0 rtspsrc location=rtsp://localhost:8555/ds-detect ! decodebin ! autovideosink
Advanced Configuration:
Use environment variables to customize the RTSP server:
# Use faster nano model for better performance
MODEL_CONFIG=/models/config_infer_yolo11n.txt ./test_detect_rtsp_output.sh person
# Change RTSP port (if 8555 is already in use)
RTSP_OUTPUT_PORT=8556 ./test_detect_rtsp_output.sh car
# Change input source
RTSP_URL=rtsp://192.168.1.100:8554/camera1 ./test_detect_rtsp_output.sh person
# Change output resolution
OUTPUT_WIDTH=1920 OUTPUT_HEIGHT=1080 ./test_detect_rtsp_output.sh car
# Combine multiple options
MODEL_CONFIG=/models/config_infer_yolo11n.txt \
RTSP_OUTPUT_PORT=8556 \
OUTPUT_WIDTH=1920 \
OUTPUT_HEIGHT=1080 \
./test_detect_rtsp_output.sh bicycle
Environment Variables:
| Variable | Description | Default |
|---|---|---|
| MODEL_CONFIG | Model configuration file path | /models/config_infer_yolo11n.txt |
| RTSP_URL | Input RTSP stream URL | rtsp://172.20.96.1:8554/live |
| RTSP_OUTPUT_PORT | RTSP server output port | 8555 |
| OUTPUT_WIDTH | Stream output width | 1280 |
| OUTPUT_HEIGHT | Stream output height | 720 |
RTSP Stream Details:
- URL: rtsp://localhost:<PORT>/ds-detect
- Default Port: 8555
- Protocol: H.264 over RTP
- Latency: Optimized for low-latency streaming
- GPU Acceleration: Uses NVENC hardware encoder for minimal CPU usage
Troubleshooting:
If you can't connect to the RTSP stream:
- Check if the server is running:
  ss -tln | grep 8555
- Check for port conflicts by switching to another port:
  RTSP_OUTPUT_PORT=8556 ./test_detect_rtsp_output.sh person
- Test with ffplay first (simpler than VLC):
  ffplay -rtsp_transport tcp rtsp://localhost:8555/ds-detect
- Check the Docker network: The container uses --network host, so the RTSP port is directly accessible on localhost.
- Verify the engine file exists: DeepStream needs a TensorRT engine before it can stream. If none is cached, the first run builds it from the ONNX model, which takes roughly 1-3 minutes, and the stream may not be reachable until the build finishes.
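The first check can also be scripted. The sketch below is a Python equivalent of `ss -tln | grep 8555` that works from any machine able to reach the host. `stream_url` and `port_is_listening` are hypothetical helper names. A successful TCP connection only shows that something is listening on the port, not that it is serving a valid RTSP stream.

```python
import socket


def stream_url(port: int = 8555, host: str = "localhost",
               path: str = "ds-detect") -> str:
    """Build the stream URL in the documented rtsp://localhost:<PORT>/ds-detect form."""
    return f"rtsp://{host}:{port}/{path}"


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_listening("localhost", 8555):
        print("Server is listening; try:", stream_url())
    else:
        print("Nothing is listening on 8555 yet")
```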
Python implementation of the detect application - identical features to the Rust version.
Quick Start:
cd detect_py
# Detect people
./test_detect_rtsp_output_py.sh person
# Detect cars
./test_detect_rtsp_output_py.sh car
# Detect with custom port
./test_detect_rtsp_output_py.sh dog 8556
View the Stream:
ffplay rtsp://localhost:8556/ds-detect
Note: Python version uses port 8556 by default (Rust version uses 8555).
Why Python?
- Faster development and prototyping
- Easier to modify and experiment with
- Same performance (uses native GStreamer plugins)
- Pre-installed dependencies in DeepStream container
- Great for learning and testing
Full Documentation: See detect_py/README.md for complete Python version documentation.
Comparison: Both Rust and Python versions use identical GStreamer pipelines and achieve the same performance. Choose Python for quick experiments and Rust for production deployments requiring maximum type safety.
Available Objects (80 COCO classes): person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush
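A small lookup like the following can validate an object name before it is passed to a test script. It is a sketch: `class_id` is a hypothetical helper, and it assumes the model's numeric class IDs follow the order of the list above, which is the standard COCO ordering. Whether the detect applications filter by name or by ID is not stated here.

```python
COCO_CLASSES = [name.strip() for name in """
person, bicycle, car, motorcycle, airplane, bus, train, truck, boat,
traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat,
dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack,
umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball,
kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket,
bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple,
sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair,
couch, potted plant, bed, dining table, toilet, tv, laptop, mouse,
remote, keyboard, cell phone, microwave, oven, toaster, sink,
refrigerator, book, clock, vase, scissors, teddy bear, hair drier,
toothbrush
""".replace("\n", " ").split(",")]


def class_id(name: str) -> int:
    """Return the list index of a class name, ignoring case and outer whitespace.

    Assumption: the index equals the class ID the model emits.
    """
    normalized = name.strip().lower()
    try:
        return COCO_CLASSES.index(normalized)
    except ValueError:
        raise ValueError(f"unknown object class: {name!r}") from None
```

Multi-word names such as "cell phone" are single entries, which is why they need quoting on the command line.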
Two YOLO11 models are available:
| Model | Size | Speed | Best For |
|---|---|---|---|
| yolo11n | 11MB | Fastest | Real-time, multiple cameras |
| yolo11s | 37MB | Fast | General purpose (default) |
Both detect the same 80 COCO object classes with GPU acceleration via TensorRT.
- Clone DeepStream-YOLO (external dependency):
./setup_deepstream_yolo.sh
This script:
- Clones the DeepStream-YOLO repository
- Applies CUDA 12.8 compatibility patch
- Compiles the YOLO parser library
Note: The deepstream-yolo/ directory is excluded from git (managed separately).
- Docker with NVIDIA GPU support
- NVIDIA drivers installed
- X11 forwarding for display
To use custom YOLO11 models, you need to export them with DeepStream compatibility:
Export Script:
./export_yolo11_fixed.sh <model.pt> <model_directory>
Examples:
# Export yolo11n.pt from models directory
./export_yolo11_fixed.sh yolo11n.pt /mnt/d/github/deepstream_tests/models
# Export yolo11m.pt from a custom location
./export_yolo11_fixed.sh yolo11m.pt /path/to/your/models
What it does:
- Runs in PyTorch 2.4 Docker container (compatible version)
- Adds DeepStreamOutput layer to fix bounding box coordinates
- Exports to ONNX format with --dynamic flag
- Outputs <model>.pt.onnx in the specified directory
Important:
- Standard Ultralytics YOLO export produces incorrect bounding boxes
- The DeepStreamOutput layer transposes output and extracts coordinates correctly
- Without this layer, all bounding boxes appear in the upper-left corner
Supported Models:
- yolo11n (nano), yolo11s (small), yolo11m (medium), yolo11l (large), yolo11x (xlarge)
- All variants export to the same format with 80 COCO classes
On first run, DeepStream builds a GPU-optimized TensorRT engine from the ONNX model:
Build Process:
- Takes ~1-2 minutes on first run
- Creates .engine files in the models/ directory
- Example: yolo11n.pt.onnx → yolo11n.pt.onnx_b1_gpu0_fp32.engine
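The naming pattern in the example can be sketched as a pair of path helpers, for instance to check whether a cached engine already exists before starting a run. `onnx_path` and `engine_path` are hypothetical names. Reading `b1`, `gpu0`, and `fp32` as batch size, GPU index, and precision is an assumption drawn from the example file name.

```python
from pathlib import Path


def onnx_path(model_pt: str, model_dir: str) -> Path:
    """Path of the ONNX file the export script is documented to write (<model>.pt.onnx)."""
    return Path(model_dir) / f"{Path(model_pt).name}.onnx"


def engine_path(onnx: Path, batch: int = 1, gpu: int = 0,
                precision: str = "fp32") -> Path:
    """Engine path following the pattern seen in the example file name."""
    return onnx.with_name(f"{onnx.name}_b{batch}_gpu{gpu}_{precision}.engine")


def engine_is_cached(onnx: Path) -> bool:
    """True if an engine with the default naming sits next to the ONNX file."""
    return engine_path(onnx).is_file()
```

For example, `engine_path(onnx_path("yolo11n.pt", "models")).name` evaluates to `yolo11n.pt.onnx_b1_gpu0_fp32.engine`.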
Caching:
- Engine files are cached and reused on subsequent runs
- Startup time with a cached engine: ~2 seconds (vs. ~1-2 minutes for a first-run build)
- Rebuilds automatically if ONNX file changes
Important:
- .engine files are GPU-specific (not portable between different GPUs)
- Large files (100-200MB+)
- Excluded from git (in .gitignore)
- Safe to delete - will rebuild automatically when needed
- Detection works with RTSP streams, local video files, webcams, or test patterns
- Only one object class can be detected at a time (filtered for performance)