Recognize the following gestures captured by a webcam attached to a smart TV and perform the corresponding operation for each gesture:
Thumbs up: Increase the volume
Thumbs down: Decrease the volume
Left swipe: 'Jump' backwards 10 seconds
Right swipe: 'Jump' forward 10 seconds
Stop: Pause the movie
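
The gesture-to-operation mapping above could be wired up roughly as follows. This is a minimal sketch, not the project's actual code: the `Gesture` enum, the `Player` class, `apply_gesture`, the volume step of 1, and the volume range of 0 to 100 are all hypothetical names and values chosen for illustration. Only the 10-second jump comes from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Gesture(Enum):
    """The five gestures the model is expected to predict."""
    THUMBS_UP = "thumbs_up"
    THUMBS_DOWN = "thumbs_down"
    LEFT_SWIPE = "left_swipe"
    RIGHT_SWIPE = "right_swipe"
    STOP = "stop"


SEEK_SECONDS = 10   # from the gesture list: jump 10 seconds
VOLUME_STEP = 1     # assumption: the source does not state a step size
MAX_VOLUME = 100    # assumption: hypothetical volume range 0..100


@dataclass
class Player:
    """Hypothetical stand-in for the smart TV's media player state."""
    volume: int = 10
    position_s: int = 0
    playing: bool = True


def apply_gesture(player: Player, gesture: Gesture) -> Player:
    """Update the player state according to the recognized gesture."""
    if gesture is Gesture.THUMBS_UP:
        player.volume = min(player.volume + VOLUME_STEP, MAX_VOLUME)
    elif gesture is Gesture.THUMBS_DOWN:
        player.volume = max(player.volume - VOLUME_STEP, 0)
    elif gesture is Gesture.LEFT_SWIPE:
        # Jump backwards, but never before the start of the movie.
        player.position_s = max(player.position_s - SEEK_SECONDS, 0)
    elif gesture is Gesture.RIGHT_SWIPE:
        player.position_s += SEEK_SECONDS
    elif gesture is Gesture.STOP:
        player.playing = False
    return player


player = Player(volume=10, position_s=5, playing=True)
apply_gesture(player, Gesture.THUMBS_UP)    # volume becomes 11
apply_gesture(player, Gesture.LEFT_SWIPE)   # position clamps to 0
apply_gesture(player, Gesture.STOP)         # playback pauses
```

In a real pipeline, the classifier's predicted class index would be converted to a `Gesture` before calling `apply_gesture`.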
In this project, we use the following two architectures to build models that predict the five gestures:
- Convolution - RNN architecture
- CNN with 3D convolution
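
The two architectures differ mainly in how they treat the time axis of a video clip. The sketch below traces tensor shapes through one convolution layer in each case, using plain Python. The clip size (30 frames of 120x120 RGB), the kernel size of 3, the 16 filters, and the use of unpadded ("valid") convolutions with stride 1 are assumptions for illustration only; the source does not specify them, and the helper functions are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Output length along one axis for an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def conv2d_per_frame_shape(clip_shape, kernel=3, filters=16):
    """Convolution-RNN: a 2D convolution is applied to each frame separately,
    so the number of frames is unchanged."""
    frames, height, width, _channels = clip_shape
    return (frames, conv_out(height, kernel), conv_out(width, kernel), filters)


def conv3d_shape(clip_shape, kernel=3, filters=16):
    """3D CNN: the kernel also slides along the time axis, so the frame
    dimension shrinks like the spatial ones."""
    frames, height, width, _channels = clip_shape
    return (
        conv_out(frames, kernel),
        conv_out(height, kernel),
        conv_out(width, kernel),
        filters,
    )


# Assumed clip layout: (frames, height, width, channels)
clip = (30, 120, 120, 3)

per_frame = conv2d_per_frame_shape(clip)
# Each frame's feature map is flattened into one vector; the resulting
# sequence of vectors is what the RNN would consume, one step per frame.
rnn_input = (per_frame[0], per_frame[1] * per_frame[2] * per_frame[3])

volumetric = conv3d_shape(clip)

print(per_frame)   # (30, 118, 118, 16)
print(rnn_input)   # (30, 222784)
print(volumetric)  # (28, 118, 118, 16)
```

In the Convolution-RNN case, temporal information is modeled only by the recurrent layer, whereas the 3D convolution mixes neighboring frames inside the convolution itself.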