Jarvis is an extensible desktop voice assistant built in Python. The project started as a hobby in Class 8, evolved gradually over the years, and was showcased at a school exhibition. Today it supports voice commands, web automation, translation, media control, smart-home triggers, and more.
Jarvis is a voice-driven personal assistant built with modular Python scripts. It listens to commands through speech recognition and performs actions such as:
- Searching Wikipedia
- Controlling system apps
- Managing Spotify music
- Opening games & software
- Translating speech
- Reading news
- Opening the camera
- Setting alarms
- Automating smart bulb actions
- Keyboard & volume automation

…and much more. The architecture is split into multiple small modules, making the code more maintainable and scalable.
- Hotword-less continuous listening
- Speech recognition (Google)
- High-quality offline TTS using pyttsx3
- Wikipedia search
- Google queries & website opening
- Live weather extraction from Google
- News headlines via NewsAPI
- Open Chrome, CMD, VS Code, Steam, Photoshop, Audacity, Premiere Pro, Minecraft, Fall Guys
- PC volume control using pynput keyboard
- YouTube controls (pause/play/mute) using pyautogui
- Spotify search
- Auto-click based playback/like/add-to-playlist
- Live webcam feed using OpenCV
- Dedicated alarm.py script
- Config saved through Alarmtext.txt
- Auto music playback when time matches
- Trigger smart bulb automation via keyboard search + UI navigation
- Color change, brightness interactions
- Real-time speech translation using googletrans + gTTS
- Auto-generated voice output
- Joke generator using pyjokes
Each module in the codebase is explained below.
jarvis.py is the core engine and the brain of the assistant. It handles:
- Voice input/output
- Command parsing
- All high-level task routing
- Wikipedia search
- App automation
- Smart device interactions
- Temperature extraction
- Camera access
- Email sending
- Spotify automation
- Translation via the Translator module
- News reading via the NewsRead module
- Calling the alarm system
alarm.py handles alarm scheduling and ringing:
- Reads time from Alarmtext.txt
- Compares current PC time with target
- Plays music (music.mp3)
- Auto-reset alarm text file
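
The alarm steps listed above (read the time, compare it with the clock, play the music, reset the file) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `HH:MM` storage format and the function names `read_alarm_time`, `is_due`, `reset_alarm`, and `run_alarm` are assumptions, and the player is passed in as a callable so the logic can be tried without audio.

```python
import time
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional

ALARM_FILE = Path("Alarmtext.txt")  # file name taken from the project
MUSIC_FILE = "music.mp3"            # file name taken from the project
TIME_FORMAT = "%H:%M"               # assumption: the stored format is not documented


def read_alarm_time(path: Path = ALARM_FILE) -> Optional[str]:
    """Return the stored alarm time normalised to HH:MM, or None if unset."""
    if not path.exists():
        return None
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return None
    # strptime raises ValueError on malformed input, so bad files fail early.
    return datetime.strptime(text, TIME_FORMAT).strftime(TIME_FORMAT)


def is_due(alarm: str, now: datetime) -> bool:
    """True while the current clock minute equals the alarm minute."""
    return now.strftime(TIME_FORMAT) == alarm


def reset_alarm(path: Path = ALARM_FILE) -> None:
    """Empty the file so the same alarm does not fire again."""
    path.write_text("", encoding="utf-8")


def run_alarm(play: Callable[[str], None], poll_seconds: float = 1.0) -> None:
    """Block until the stored time is reached, then play the music and reset."""
    alarm = read_alarm_time()
    if alarm is None:
        return
    while not is_due(alarm, datetime.now()):
        time.sleep(poll_seconds)
    play(MUSIC_FILE)
    reset_alarm()
```

To ring a real alarm, `run_alarm` could be given `playsound` from the `playsound` package as its `play` argument.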
click coordinate.py is a utility tool:
- Prints current mouse coordinates
- Helps you record pyautogui click positions for automation
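
A coordinate printer of this kind might look like the sketch below. The function names `format_position` and `watch_mouse` are hypothetical, and the position source is injected as a callable so the loop can be exercised without a display.

```python
import time
from typing import Callable, Optional, Tuple


def format_position(x: int, y: int) -> str:
    """Format a screen coordinate so it is easy to copy into a script."""
    return f"x={x}, y={y}"


def watch_mouse(
    get_position: Callable[[], Tuple[int, int]],
    interval: float = 0.5,
    samples: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Print the pointer position every `interval` seconds.

    Runs until Ctrl+C when `samples` is None and returns the last reading.
    """
    last = None
    taken = 0
    try:
        while samples is None or taken < samples:
            x, y = get_position()
            last = (x, y)
            print(format_position(x, y))
            taken += 1
            time.sleep(interval)
    except KeyboardInterrupt:
        pass  # Ctrl+C is the expected way to stop the watcher
    return last
```

With pyautogui installed, calling `watch_mouse(pyautogui.position)` prints live coordinates, and a recorded pair can later be passed to `pyautogui.click(x, y)`.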
greetme.py provides a reusable greeting function:
- Detects morning/afternoon/evening
- Speaks greeting messages
- Reusable inside other modules
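
The time-of-day detection can be illustrated with a short sketch. The hour boundaries (12 and 18), the greeting text, and the names `greeting_for_hour` and `greet_me` are assumptions made for this example; the real module may differ.

```python
from datetime import datetime
from typing import Callable, Optional


def greeting_for_hour(hour: int) -> str:
    """Map a 24-hour clock value to a greeting (boundaries are assumed)."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if hour < 12:
        return "Good morning"
    if hour < 18:
        return "Good afternoon"
    return "Good evening"


def greet_me(
    speak: Callable[[str], None] = print,
    now: Optional[datetime] = None,
) -> str:
    """Build the greeting for the current time and hand it to `speak`."""
    current = now or datetime.now()
    message = f"{greeting_for_hour(current.hour)}! Jarvis is ready."
    speak(message)
    return message
```

Passing the speech function in as `speak` keeps the greeting reusable: other modules can supply a pyttsx3-backed function, while `print` is enough for a quick check.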
keyboard.py is a custom keyboard controller built on pynput:
- Volume up/down functions
- Smooth press-release loops
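
A press-release loop for volume control could be sketched as below. The helper names `tap`, `volume_up`, and `volume_down` and the default of five steps are assumptions; `Key.media_volume_up` and `Key.media_volume_down` are the media keys pynput exposes.

```python
import time
from typing import Any


def tap(controller: Any, key: Any, times: int = 5, delay: float = 0.1) -> None:
    """Press and release `key` repeatedly, pausing between taps."""
    for _ in range(times):
        controller.press(key)
        controller.release(key)
        time.sleep(delay)


def volume_up(times: int = 5) -> None:
    """Raise the system volume by `times` steps (requires pynput)."""
    # Imported lazily so `tap` can be used and tested without pynput.
    from pynput.keyboard import Controller, Key

    tap(Controller(), Key.media_volume_up, times)


def volume_down(times: int = 5) -> None:
    """Lower the system volume by `times` steps (requires pynput)."""
    from pynput.keyboard import Controller, Key

    tap(Controller(), Key.media_volume_down, times)
```

The short delay between taps is what makes the change feel gradual instead of jumping in one step.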
NewsRead.py is the news-reading engine:
- Fetches category-wise headlines
- Uses NewsAPI
- Reads each headline aloud
- Offers "continue/stop" interactive CLI
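
The fetch-and-read loop might be sketched as follows, using only the standard library. It assumes NewsAPI's `top-headlines` endpoint with `country`, `category`, and `apiKey` parameters; the function names and the default country are hypothetical, and `speak` and `ask` are injected so the loop can run without audio or a keyboard.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

TOP_HEADLINES = "https://newsapi.org/v2/top-headlines"


def build_url(category: str, api_key: str, country: str = "us") -> str:
    """Build the request URL for one news category."""
    query = urlencode({"country": country, "category": category, "apiKey": api_key})
    return f"{TOP_HEADLINES}?{query}"


def extract_titles(payload: Dict) -> List[str]:
    """Pull the non-empty headline titles out of a NewsAPI response."""
    if payload.get("status") != "ok":
        raise RuntimeError(payload.get("message", "NewsAPI request failed"))
    return [a["title"] for a in payload.get("articles", []) if a.get("title")]


def fetch_headlines(category: str, api_key: str, country: str = "us") -> List[str]:
    """Download and parse the headlines for a category (needs network access)."""
    with urlopen(build_url(category, api_key, country), timeout=10) as response:
        return extract_titles(json.load(response))


def read_headlines(
    titles: List[str],
    speak: Callable[[str], None] = print,
    ask: Callable[[str], str] = input,
) -> int:
    """Read headlines one at a time, stopping when the user answers 'stop'."""
    read = 0
    for index, title in enumerate(titles):
        speak(title)
        read += 1
        is_last = index == len(titles) - 1
        if not is_last and ask("continue or stop? ").strip().lower() == "stop":
            break
    return read
```

A typical call would be `read_headlines(fetch_headlines("technology", api_key))`, with `api_key` being your own NewsAPI key.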
Translator.py is a full speech → translation → speech pipeline:
- Listens via microphone
- Uses googletrans to translate any input
- Uses gTTS to create temporary output audio
- Plays the file, then deletes it
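
The translate, synthesise, play, and delete sequence can be sketched like this. The microphone step is left out, the names `play_temp_audio` and `translate_and_speak` are hypothetical, and the sketch assumes the same language code (for example `hi`) is valid for both googletrans and gTTS, which is not true for every language.

```python
import os
import tempfile
from typing import Callable


def play_temp_audio(
    write_audio: Callable[[str], None],
    play: Callable[[str], None],
) -> str:
    """Write audio to a temporary .mp3, play it, and always delete it."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)  # only the path is needed; the writer reopens the file
    try:
        write_audio(path)
        play(path)
    finally:
        os.remove(path)  # clean up even if synthesis or playback fails
    return path


def translate_and_speak(text: str, dest: str = "hi") -> str:
    """Translate `text`, speak the result, and return the translated string."""
    # Third-party imports stay local so the helper above works without them.
    from googletrans import Translator
    from gtts import gTTS
    from playsound import playsound

    translated = Translator().translate(text, dest=dest).text
    play_temp_audio(gTTS(text=translated, lang=dest).save, playsound)
    return translated
```

Using `try`/`finally` for the cleanup means no stray audio files are left behind when playback fails partway through.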
- Alarmtext.txt → stores temporary alarm time
- music.mp3 → plays when alarm triggers
- settings.json (VS Code) → personal environment setup
- __pycache__/ → Python cache files (ignored in README)
Jarvis/
│
├── jarvis.py
├── alarm.py
├── click coordinate.py
├── greetme.py
├── keyboard.py
├── NewsRead.py
├── Translator.py
│
├── Alarmtext.txt
├── music.mp3
├── settings.json (if present)
│
├── __pycache__/
└── README.md (generated)

Install the required libraries:
pip install pyttsx3 speechrecognition wikipedia pyautogui opencv-python
pip install requests beautifulsoup4 pyjokes selenium googletrans==3.1.0a0
pip install gTTS playsound pynput keyboard

🔧 Additional Setup
- Chrome and app paths: Update the Chrome and application paths inside jarvis.py to match your machine
- NewsAPI key: Replace with your own API key inside NewsRead.py
- Alarm music: Ensure music.mp3 exists in project root
- Microphone required
- Jarvis boots & greets the user
- Begins listening continuously
- Speech is converted to text
- Query is matched with command blocks
- Corresponding module is triggered
- Module performs action (web automation, opening apps, etc.)
- Jarvis confirms completion & waits for next instruction

The system is based on simple conditional command parsing, making it easy to add new commands.
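
The matching step in this workflow can be illustrated with a small keyword router. The project itself uses conditional blocks; this sketch swaps in a lookup table to show the same idea in fewer lines. The names `route`, `wikipedia_topic`, and `RULES` are hypothetical, and the handlers only return a description of the action instead of performing it.

```python
from typing import Callable, List, Optional, Tuple

Handler = Callable[[str], str]


def route(query: str, rules: List[Tuple[str, Handler]]) -> Optional[str]:
    """Run the first handler whose keyword occurs in the spoken query."""
    text = query.lower().strip()
    for keyword, handler in rules:
        if keyword in text:
            return handler(text)
    return None  # no rule matched; the assistant keeps listening


def wikipedia_topic(text: str) -> str:
    """Strip the trigger word so only the search topic remains."""
    return text.replace("wikipedia", "", 1).strip()


# Order matters: the first matching keyword wins.
RULES: List[Tuple[str, Handler]] = [
    ("wikipedia", lambda text: f"search wikipedia for {wikipedia_topic(text)!r}"),
    ("open command prompt", lambda text: "launch cmd"),
    ("open chrome", lambda text: "launch chrome"),
    ("joke", lambda text: "tell a joke"),
]
```

Adding a command then means appending one `(keyword, handler)` pair, which mirrors adding one more conditional branch in the original design.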
Speak commands such as:
- “Wikipedia Elon Musk”
- “Search Instagram”
- “Open Chrome”
- “Open Command Prompt”
- “Open VS Code”
- “Open Steam”
- “Open Photoshop”
- “Play music”
- “Like this song”
- “Add this song to the playlist”
- “Open camera”
- “Set an alarm”
- “Tell me a joke”
- Add an NLP model (transformer-based) to understand natural language beyond keywords
- Build a context-aware conversation engine
- Train a lightweight intent classifier for smarter routing
- Add embeddings to remember user preferences
- Integrate offline models (e.g., GPT4All or LLaMA for language, Whisper for speech recognition)
- Add wake-word engine (“Hey Jarvis”)
- Add GUI dashboard
- Create plugin system for custom commands
- Add robust error-handling & logs
- Cloud sync for preferences and history
- Python 3.x
- pyttsx3 (Offline TTS)
- speech_recognition (Voice input)
- wikipedia
- opencv-python (Camera module)
- pyautogui (Automation)
- requests + BeautifulSoup (Web scraping)
- pynput (Keyboard/volume control)
- googletrans + gTTS (Translation)
- selenium (Browser automation)
- NewsAPI (News)
- Modular Python design
- Rule-based command engine
- Web automation
- Multimedia automation
- Basic natural language processing (keyword parsing)
- Concurrent operation (the alarm runs in its own alarm.py script)
- System-level automation
- Developer: Darsh Yadav
- Project started in Class 8, gradually improved over the years
- Presented at a school exhibition
- Powered entirely by Python and open-source libraries