
Ares.AI 📱⚡

Control your phone with one sentence. No taps. No scrolls. Just intent → action.

Ares.AI turns your high-level instructions like "Open WhatsApp and message John" into precise, automated mobile actions. Unlike traditional voice assistants that stop at simple tasks, Ares understands your screen and handles full workflows—like ordering a product or booking a ride—with real-time UI interaction.


🧠 Inspiration

Siri and Google Assistant can set alarms or play music—but what if you want to complete multi-step tasks like:

  • Booking an Uber
  • Ordering your cart on Amazon
  • Navigating settings and toggling modes

We asked: Why does this still take dozens of taps?
Ares.AI was born from this frustration—to act like a smart human assistant with AI precision.


🚀 What It Does

  • 🔍 Understands high-level instructions like "Message Alex on Instagram"
  • 🧩 Breaks it into step-by-step UI actions (tap, scroll, type, etc.)
  • 👁️ Analyzes your phone screen via screenshots
  • 🧠 Maintains context of what’s done, what’s next, and what failed
  • 🔄 Retries, scrolls, adapts, or gives fallback instructions if stuck
  • ✅ Executes real-time interactions with a closed feedback loop (sketched below)
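
Under the hood this is one perception → plan → act feedback loop. Below is a minimal sketch of that loop in Python; `Action`, the injected callables, and `MAX_STEPS` are illustrative stand-ins, not the actual Ares.AI API:

```python
# Minimal sketch of the perception -> plan -> act feedback loop.
# Action, the injected callables, and MAX_STEPS are illustrative
# stand-ins, not the actual Ares.AI API.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str                        # "tap", "type", "scroll", "done", ...
    args: dict = field(default_factory=dict)

MAX_STEPS = 25  # hard cap so a confused agent cannot loop forever

def run_goal(goal: str, capture_screen, plan_next_action, execute) -> bool:
    """capture_screen() -> PNG bytes, plan_next_action(...) -> Action,
    execute(Action) -> None; all three are supplied by the device bridge
    and the model layer."""
    history: list[Action] = []
    for _ in range(MAX_STEPS):
        screenshot = capture_screen()                         # perception
        action = plan_next_action(goal, screenshot, history)  # planning
        if action.name == "done":
            return True                                       # goal reached
        execute(action)                                       # act on device
        history.append(action)                                # keep context
    return False                                              # give up
```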

⚙️ Core Architecture

🧭 Goal Planning with Gemini 2.5 Pro

  • Input: "Book Uber to airport"
  • Output: Sequence of structured atomic actions like:
    • Open app → Tap search bar → Type destination → Tap "Book"

Built with function calling on Gemini 2.5 Pro, so each planned step arrives as a structured call rather than free-form text.
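
The exact tool schema isn't published; the sketch below shows the general pattern with the google-genai Python SDK, using made-up `tap`/`type_text` declarations in place of the project's real atomic actions:

```python
# Sketch of planning via Gemini function calling (google-genai SDK).
# The tap/type_text schema is illustrative, not the project's real one.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

ui_tools = types.Tool(function_declarations=[
    types.FunctionDeclaration(
        name="tap",
        description="Tap the UI element described by 'target'.",
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={"target": types.Schema(type=types.Type.STRING)},
            required=["target"],
        ),
    ),
    types.FunctionDeclaration(
        name="type_text",
        description="Type 'text' into the currently focused field.",
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={"text": types.Schema(type=types.Type.STRING)},
            required=["text"],
        ),
    ),
])

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Book an Uber to the airport. Emit the next atomic UI action.",
    config=types.GenerateContentConfig(tools=[ui_tools]),
)

# Each returned part may carry a structured call instead of free text.
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(part.function_call.name, dict(part.function_call.args))
```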

🧠 Stateful Goal Execution

  • Tracks progress per instruction
  • Detects failure loops or stuck states
  • Retries intelligently or attempts fallback actions
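
The repo doesn't document its bookkeeping structures, so here is one plausible shape for per-step state with a bounded retry budget and a one-shot fallback; the names and the policy are assumptions:

```python
# Sketch of per-step state tracking with retry and fallback.
# Field names and the retry policy are assumptions, not the repo's API.
from dataclasses import dataclass
from enum import Enum, auto

class Status(Enum):
    PENDING = auto()
    DONE = auto()
    FAILED = auto()

@dataclass
class Step:
    description: str              # e.g. "Tap the search bar"
    status: Status = Status.PENDING
    attempts: int = 0
    max_attempts: int = 3
    fallback: str | None = None   # e.g. "Scroll down and retry"

    def record_failure(self) -> str | None:
        """Returns the next thing to try, or None if the step is dead."""
        self.attempts += 1
        if self.attempts < self.max_attempts:
            return self.description            # plain retry
        if self.fallback is not None:
            fb, self.fallback = self.fallback, None
            self.attempts = 0                  # fresh budget for the fallback
            return fb                          # escalate to fallback action
        self.status = Status.FAILED            # stuck: surface to the planner
        return None
```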

👁️ Visual Grounding via Gemini Vision

  • Screenshots are sent to Gemini Vision with contextual prompts
  • Identifies the correct bounding box to tap/type/scroll
  • Uses screenshot hashing to detect redundant frames
  • Adapts if the element is missing (scroll, wait, retry)
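
Neither the grounding prompt nor the hashing scheme is published, so the sketch below makes assumptions: it uses the google-genai Python SDK, a JSON response format invented for illustration, and a plain SHA-256 content hash (which only catches byte-identical frames; a perceptual hash would also tolerate near-duplicates):

```python
# Sketch of frame de-duplication plus visual grounding via Gemini.
# The JSON schema, prompt, and hashing scheme are assumptions.
import hashlib
import json
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

_last_hash: str | None = None

def is_new_frame(png_bytes: bytes) -> bool:
    """Skip the vision call when the screenshot is byte-identical
    to the previous one."""
    global _last_hash
    digest = hashlib.sha256(png_bytes).hexdigest()
    fresh = digest != _last_hash
    _last_hash = digest
    return fresh

def locate(png_bytes: bytes, element: str) -> dict:
    """Ask Gemini for a bounding box for `element` on the screenshot."""
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=[
            types.Part.from_bytes(data=png_bytes, mime_type="image/png"),
            f"Find the UI element: {element}. Reply as JSON: "
            '{"found": true/false, "box": [x0, y0, x1, y1]}',
        ],
        config=types.GenerateContentConfig(
            response_mime_type="application/json"),
    )
    return json.loads(response.text)
```

Skipping byte-identical frames saves a full model round trip whenever the screen hasn't changed between polls.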

🧱 Built With

  • Android Studio — UI automation + screen capture
  • Kotlin — Native Android agent logic
  • Python — Server + reasoning loop
  • Gemini 2.5 Pro + Vision — Planning and grounding
  • Figma — UI prototyping

🚧 Challenges We Faced

  • 🌀 UI Inconsistency — Varying app layouts required adaptive vision grounding
  • ⚡ Real-Time Performance — Balancing model-call frequency against end-to-end latency
  • 🧭 State Recovery — Detecting dead ends and designing recovery heuristics
  • 🧠 Loop Avoidance — Keeping the agent from repeating the same step when stuck (see the sketch below)
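
One common heuristic for the last two points, assumed here rather than taken from the repo: remember a short window of recent (action, screen-hash) pairs and treat any repeat as a stuck state that should trigger recovery.

```python
# Sketch of a loop-avoidance heuristic: if the same action on the same
# screen shows up twice in a short window, the agent is probably stuck.
# This rule is an assumption; the repo doesn't document its exact check.
from collections import deque

class LoopGuard:
    def __init__(self, window: int = 6):
        self._recent: deque[tuple[str, str]] = deque(maxlen=window)

    def looks_stuck(self, action_name: str, screen_hash: str) -> bool:
        key = (action_name, screen_hash)
        stuck = key in self._recent     # same action on the same screen again
        self._recent.append(key)
        return stuck                    # caller should escalate or re-plan
```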

🤖 Outcome

Ares.AI feels like a human assistant with:

  • AI-level consistency
  • Visual awareness
  • Resilience in unknown app flows

From intent → screen understanding → action execution
Ares closes the loop in mobile automation.


📍 Roadmap (What's Next)

  • 🔐 Permission-aware automation (auto-detect required permissions)
  • 🧠 Long-term memory for task continuity across sessions
  • 🌐 Web interface for remote task triggering
  • 📊 Logs & analytics for debugging agent behavior
  • 🤝 Community plugin system (custom atomic actions)

🎥 Demo

Watch the demo


🤝 Contributing

We welcome PRs, feature ideas, and collaborations!
Please open an issue to start the conversation.


📜 License

MIT License


🧠 Shoutout

Inspired by the simplicity of real human assistants and powered by Google’s Gemini.
Ares doesn’t ask how—it just gets it done.
