A Flask application that combines Foundry Local AI models with Playwright browser automation to drive adaptive, AI-guided web browsing. The system uses AI for task planning, step-by-step execution, and CAPTCHA avoidance.
- Dynamic Task Planning: AI analyzes natural language tasks and creates detailed step-by-step automation plans
- Intelligent Action Generation: Real-time decision making for browser actions based on page context
- Adaptive Execution: Smart responses to different website layouts and content
- Context-Aware Navigation: AI understands page content to make optimal choices
- Multi-Pattern Detection: Detects several CAPTCHA types (reCAPTCHA, hCaptcha, Cloudflare challenges, etc.)
- Smart Evasion: Automatically navigates to alternative sites when CAPTCHAs are encountered
- Keyword Analysis: Text-based CAPTCHA detection for comprehensive coverage
- Seamless Fallbacks: Continues automation on privacy-focused alternatives
- Step-by-Step Documentation: Before/after screenshots for every automation step
- Visual Progress Tracking: Complete visual record of the automation process
- Interactive Web Viewer: Click to expand screenshots in the browser interface
- Organized Storage: Timestamped screenshots with unique session IDs
- Real-Time Planning: AI generates 3-5 step plans tailored to each specific task
- Error Recovery: Graceful handling of failures with continued execution
- Progressive Actions: NAVIGATE, FILL, CLICK, PRESS, SCROLL, WAIT, EXTRACT, SCREENSHOT
- Success Verification: AI validates task completion at each step
- Improved Prompts: More specific, constrained AI prompts for better decision making
- Lower Temperature: Reduced randomness (0.1) for consistent, reliable actions
- Fallback Selectors: Multiple CSS selector strategies for finding elements
- Smart URL Generation: Task-specific starting URLs (DuckDuckGo for search, Wikipedia for info, etc.)
- Multiple Selector Attempts: Tries common selectors if AI-generated ones fail
- Graceful Degradation: Continues automation even if individual steps fail
- Extended Timeouts: Longer waits for page loads and network activity (30s)
- Fallback Actions: Uses Enter key if click actions fail
- Curated Site List: Only uses proven reliable sites (DuckDuckGo, Wikipedia, Reddit, BBC)
- Smart Starting Points: Automatically chooses best starting URL based on task type
- CAPTCHA Avoidance: Prefers sites with minimal bot detection
- Robust Browser Config: Optimized launch arguments for better compatibility
- Error Documentation: Screenshots captured even when steps fail
- Better Organization: Clearer naming with before/after/error states
- Session Tracking: Unique IDs prevent screenshot conflicts
- Visual Debugging: Complete automation trail for troubleshooting
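The fallback behavior listed above (try the AI-generated selector, then common selectors, then the Enter key) might look roughly like the sketch below. It is written against plain callables so it runs without a browser: `click` and `press_enter` are hypothetical hooks standing in for whatever browser calls the app makes, and the selector list is an illustrative assumption, not the app's actual list.

```python
from typing import Callable, Iterable, Optional

# Illustrative fallback selectors (an assumption, not the app's real list).
COMMON_SUBMIT_SELECTORS = [
    "button[type='submit']",
    "input[type='submit']",
    "form button",
]


def click_with_fallbacks(
    ai_selector: str,
    click: Callable[[str], None],
    press_enter: Callable[[], None],
    fallbacks: Iterable[str] = COMMON_SUBMIT_SELECTORS,
) -> Optional[str]:
    """Try the AI-generated selector, then common ones, then the Enter key.

    Returns the selector that worked, "ENTER" if the key press was used,
    or None if every attempt failed, so the caller can continue with the
    next step instead of aborting the run.
    """
    for selector in [ai_selector, *fallbacks]:
        try:
            click(selector)  # hypothetical hook; expected to raise on failure
            return selector
        except Exception:
            continue  # try the next candidate
    try:
        press_enter()
        return "ENTER"
    except Exception:
        return None
```

Returning `None` rather than raising keeps the "graceful degradation" decision with the caller, which can capture an error screenshot and move on.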
- app.py - Main Flask application with AI integration
- config.py - Configuration for AI models and Flask settings
- Foundry Local Integration - Local AI model for planning and actions
- Playwright Automation - Headless browser control for actual web interactions
- Task Analysis - AI breaks down natural language requests into actionable steps
- Dynamic Planning - Generate 4-8 specific steps with context awareness
- Action Generation - AI determines exact browser actions (selectors, inputs, clicks)
- Execution Monitoring - Real-time adaptation based on page responses
- CAPTCHA Handling - Automatic detection and alternative routing
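As a rough illustration of the action-generation step, the sketch below turns a model reply into one of the supported actions (NAVIGATE, FILL, CLICK, PRESS, SCROLL, WAIT, EXTRACT, SCREENSHOT). The reply format `ACTION: target | value` is an assumption made for this example; the README does not specify how the model's output is structured, and `BrowserAction` and `parse_action` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Action names come from the feature list; the reply format is assumed.
ALLOWED_ACTIONS = {
    "NAVIGATE", "FILL", "CLICK", "PRESS",
    "SCROLL", "WAIT", "EXTRACT", "SCREENSHOT",
}


@dataclass
class BrowserAction:
    name: str
    target: str = ""  # URL, CSS selector, or key name, depending on the action
    value: str = ""   # text to type for FILL; unused otherwise


def parse_action(reply: str) -> Optional[BrowserAction]:
    """Parse a reply such as "FILL: input[name='q'] | python tutorials".

    Returns None when no line names a known action, so the caller
    can fall back to a default or skip the step.
    """
    for line in reply.strip().splitlines():
        name, _, rest = line.partition(":")
        name = name.strip().upper()
        if name not in ALLOWED_ACTIONS:
            continue  # ignore chatter around the action line
        target, _, value = rest.partition("|")
        return BrowserAction(name, target.strip(), value.strip())
    return None
```

Rejecting anything outside the allowed set is one way to keep a low-temperature model's output constrained to actions the executor actually implements. A selector that itself contains `|` would need a different separator.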
pip install -r requirements.txt
playwright install
- Ensure Foundry Local is installed and running
- Configure model access
- Verify connection to local AI endpoint
python app.py
http://127.0.0.1:5000
├── app.py                 # Main Flask application with AI automation
├── config.py              # Configuration settings
├── requirements.txt       # Python dependencies
├── templates/             # HTML templates
│   ├── index.html         # Home page with interface selection
│   ├── chat.html          # Direct AI chat interface
│   └── browser.html       # Browser automation interface
└── static/
    └── screenshots/       # Generated automation screenshots
- GET / - Home page with interface selection
- GET /chat - Direct AI chat interface
- GET /browser - Browser automation interface
- POST /generate - Generate AI responses for chat
- POST /automate - Execute browser automation with AI planning
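For scripting against the automation endpoint, a client call could be sketched as follows using only the standard library. The JSON field name `task` is an assumption (the request schema is not documented here, so check app.py), and `build_automate_request` and `run_task` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # default host and port from config.py


def build_automate_request(task: str) -> urllib.request.Request:
    """Build a POST /automate request carrying the task as JSON.

    The "task" field name is an assumption; the real schema may differ.
    """
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/automate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_task(task: str, timeout: float = 120.0) -> str:
    """Send the request and return the raw response body.

    Requires the Flask app to be running locally.
    """
    with urllib.request.urlopen(build_automate_request(task), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A generous client timeout is used because a multi-step browser run can take much longer than a single page load.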
- AI Task Planning - Converts natural language to browser actions
- Dynamic Step Execution - Real-time action generation based on page state
- CAPTCHA Detection - Multi-layer detection with smart alternatives
- Screenshot Documentation - Complete visual automation records
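The multi-layer CAPTCHA detection mentioned above could be approximated as two passes over the page: one over the markup and one over the visible text, followed by a switch to an alternative site. The marker strings, keyword list, and site URLs below are illustrative assumptions, not the app's real lists, and the function names are hypothetical.

```python
from typing import Optional

# Illustrative markers only; the app's real selector patterns may differ.
CAPTCHA_MARKUP_HINTS = ("g-recaptcha", "h-captcha", "cf-challenge")
CAPTCHA_KEYWORDS = (
    "verify you are human",
    "i'm not a robot",
    "unusual traffic",
    "checking your browser",
)
# Home pages of the alternative sites named in this README (URLs assumed).
ALTERNATIVE_SITES = (
    "https://duckduckgo.com",
    "https://www.bing.com",
    "https://search.yahoo.com",
    "https://www.startpage.com",
)


def detect_captcha(html: str, visible_text: str) -> Optional[str]:
    """Return a short reason string if the page looks like a CAPTCHA, else None."""
    lowered_html = html.lower()
    for hint in CAPTCHA_MARKUP_HINTS:
        if hint in lowered_html:
            return f"markup:{hint}"
    lowered_text = visible_text.lower()
    for keyword in CAPTCHA_KEYWORDS:
        if keyword in lowered_text:
            return f"keyword:{keyword}"
    return None


def next_alternative(current_url: str) -> Optional[str]:
    """Pick the first alternative site that is not the one currently blocked."""
    for site in ALTERNATIVE_SITES:
        if not current_url.startswith(site):
            return site
    return None
```

Returning the reason rather than a bare boolean makes it easier to log which layer fired when reviewing the screenshots for a run.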
The AI can handle complex automation tasks like:
"Search for Python programming tutorials on Google"
"Find the latest news about artificial intelligence"
"Look up information about climate change on Wikipedia"
"Browse Reddit for technology discussions"
"Check the weather forecast for New York"
"Find open source projects on GitHub"
- Default Model: Phi-4 via Foundry Local
- Planning Mode: Low temperature (0.3) for consistent planning
- Action Mode: Very low temperature (0.2) for precise actions
- Context Awareness: Analyzes page content for smart decisions
- Task Understanding: Converts natural language to automation steps
- Selector Intelligence: AI generates appropriate CSS selectors
- Context Adaptation: Actions adapt to different website layouts
- Error Recovery: Smart fallbacks when actions fail
- Selector-Based: Multiple CSS selector patterns
- Content Analysis: Keyword detection in page text
- Visual Verification: Screenshot analysis capabilities
- Behavioral Patterns: Recognition of common CAPTCHA flows
- Alternative Sites: DuckDuckGo, Bing, Yahoo, StartPage
- Privacy-Focused: Preference for CAPTCHA-resistant platforms
- Smart Routing: Automatic redirection when CAPTCHAs detected
- Session Management: Clean browser profiles to avoid triggers
- Before/After Shots: Every action documented visually
- Session Tracking: Unique IDs for organized screenshot sets
- Step Descriptions: AI-generated descriptions for each screenshot
- Web Interface: Interactive viewing with click-to-expand
- Timestamped Files: Easy chronological organization
- Session Grouping: Screenshots grouped by automation session
- Metadata Integration: Descriptions and context stored with images
- Cleanup Tools: Automatic management of old screenshots
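One possible naming and cleanup scheme for the screenshots described above is sketched below. The filename layout, the 8-character session ID, and the age-based cleanup rule are assumptions for illustration; only the `static/screenshots` directory and the before/after/error states come from this README.

```python
import time
import uuid
from datetime import datetime
from pathlib import Path

SCREENSHOT_DIR = Path("static/screenshots")  # directory from the project tree


def new_session_id() -> str:
    """Short random ID so separate runs do not overwrite each other's files."""
    return uuid.uuid4().hex[:8]


def screenshot_path(session_id: str, step: int, state: str, now: datetime) -> Path:
    """Build a name like 20240101_120000_ab12cd34_step02_before.png.

    `state` is one of "before", "after", or "error".
    """
    if state not in {"before", "after", "error"}:
        raise ValueError(f"unknown screenshot state: {state}")
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return SCREENSHOT_DIR / f"{stamp}_{session_id}_step{step:02d}_{state}.png"


def remove_old_screenshots(directory: Path, max_age_days: float) -> int:
    """Delete PNG files older than max_age_days; return how many were removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for path in directory.glob("*.png"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed
```

Putting the timestamp first means a plain alphabetical listing is already chronological, while the session ID keeps files from one run easy to filter.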
# AI Model Configuration
AI_CONFIG = {
    'default_model': 'phi-4',  # Foundry Local model
    'max_tokens': 4096,        # Response length limit
    'temperature': 0.7,        # Response creativity
    'timeout': 30              # Request timeout
}

# Flask Configuration
FLASK_CONFIG = {
    'debug': True,             # Development mode
    'host': '127.0.0.1',       # Server host
    'port': 5000               # Server port
}
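The configuration above sets a default temperature of 0.7, while planning and action modes are described as using 0.3 and 0.2. One way these could fit together is a per-mode override on top of `AI_CONFIG`, sketched below. `MODE_TEMPERATURES` and `completion_params` are hypothetical names, and how app.py actually applies the lower temperatures is not shown in this README.

```python
# Copied from config.py so the sketch runs on its own.
AI_CONFIG = {
    'default_model': 'phi-4',
    'max_tokens': 4096,
    'temperature': 0.7,
    'timeout': 30,
}

# Temperatures this README states for each mode; the dict itself is hypothetical.
MODE_TEMPERATURES = {'planning': 0.3, 'action': 0.2}


def completion_params(mode: str = 'chat') -> dict:
    """Return keyword arguments for a chat-completion call in the given mode.

    Unknown modes (including plain chat) fall back to the configured default.
    """
    return {
        'model': AI_CONFIG['default_model'],
        'max_tokens': AI_CONFIG['max_tokens'],
        'temperature': MODE_TEMPERATURES.get(mode, AI_CONFIG['temperature']),
    }
```

For example, `completion_params('planning')` yields a temperature of 0.3, while `completion_params()` keeps the 0.7 default used for chat.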
- Flask - Web framework for the interface
- Playwright - Browser automation engine
- OpenAI - API client compatible with Foundry Local
- Foundry Local SDK - Local AI model management
- Python
- Foundry Local installed and configured
- Playwright browsers installed (playwright install)
- Browser Visibility: Automation runs in non-headless mode for demonstration
- CAPTCHA Awareness: System automatically detects and avoids CAPTCHAs
- AI Planning: Each task gets a custom AI-generated execution plan
- Screenshot Storage: All automation steps are visually documented
MIT License - Open source and free to modify for your automation needs.