Professional-grade open source intelligence tool for dark web investigations
Features • Installation • Usage • Configuration • Documentation • Contributing
- Overview
- Features
- Installation
- Quick Start
- Usage
- Configuration
- Architecture
- Troubleshooting
- Security
- Contributing
- License
- Acknowledgments
Robin is an advanced AI-powered Open Source Intelligence (OSINT) tool designed for conducting dark web investigations. It combines the power of Large Language Models (LLMs) with automated dark web search and content analysis to provide comprehensive threat intelligence reports.
- Intelligent Query Refinement: Uses AI to optimize search queries for better dark web results
- Multi-Engine Search: Searches across 15+ dark web search engines simultaneously
- AI-Powered Filtering: Automatically filters and ranks results by relevance
- Content Extraction: Scrapes and analyzes content from dark web sites
- IOC Extraction: Automatically extracts Indicators of Compromise (IPs, domains, emails, hashes, crypto addresses, etc.)
- Comprehensive Reporting: Generates detailed investigation summaries with actionable insights
- Dual Interface: Command-line interface for automation and web UI for interactive use
- People Search (OSINT): Person-centric deep people search across dark web, Telegram, clear web, and optional people APIs (Hunter, EmailRep, HIBP), with unified person profile and narrative summary
Robin includes a People Search mode for person-centric OSINT. You provide one or more identifiers (name, email, username, phone); Robin expands them into search queries, runs dark web + Telegram + clear web search, optionally calls people APIs (Hunter, EmailRep, HIBP), and produces a person profile plus an investigation summary and IOCs.
- Inputs: At least one of name, email, username, phone (comma-separated for multiple emails/usernames).
- Sources: Existing dark web (15+ engines) and optional Telegram; clear web (DuckDuckGo, optional Google Custom Search); optional people APIs (Hunter.io, EmailRep.io, Have I Been Pwned for breach presence only).
- Output: Structured person profile (emails, usernames, phones, social links, dark/clear web mentions, IOCs, API snippets) and a people-focused narrative summary. Same export options (Markdown, JSON, PDF, IOCs).
- Legal / ethics: People search must be used only for lawful purposes (e.g. authorized investigations, research). Do not use for stalking or harassment. Only public or semi-public data is aggregated; HIBP is used only for breach presence with API key and ToS compliance.
- CLI: `robin people --name "John Doe" --email j@example.com --username johndoe`
- API: `POST /investigate/people` with JSON body `{ "name", "email", "username", "phone" }`
- Web UI: Select "People Search" mode and fill in the person identifier fields.
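As a rough sketch, an API call to the people endpoint could look like the following (the endpoint and body fields are from this README; the port assumes the default API server settings described under Usage, and the `X-API-Key` header name is an assumption that applies only if `ROBIN_API_KEY` is configured):

```bash
# Hedged example: header name and port are assumptions, adjust to your deployment
curl -s -X POST http://localhost:8000/investigate/people \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $ROBIN_API_KEY" \
  -d '{"name": "John Doe", "email": "j@example.com", "username": "johndoe", "phone": ""}'
```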
Multi-Model LLM Support
- OpenAI GPT-4o, GPT-4.1
- Anthropic Claude 3.5 Sonnet
- Google Gemini 2.5 Flash
- Local models via Ollama (Llama 3.1, etc.)
Advanced Search Capabilities
- Concurrent search across 15+ dark web search engines
- Automatic search engine health monitoring
- Priority-based engine selection
- Result deduplication and ranking
Intelligent Scraping
- Concurrent multi-threaded scraping
- Automatic Tor routing for .onion sites
- User-Agent rotation
- Content cleaning and extraction
- Retry mechanisms with exponential backoff
AI-Powered Analysis
- Query refinement for optimal search results
- Intelligent result filtering (top 20 most relevant)
- Comprehensive investigation summary generation
- Context-aware artifact extraction
Tor Integration
- Automatic Tor circuit rotation
- Multiple Tor instance support for improved performance
- Circuit health monitoring
- Exit node information tracking
- Connection verification and retry logic
IOC Extraction
- Automatic extraction of 11+ IOC types:
- IPv4/IPv6 addresses
- Domain names (including .onion)
- Email addresses
- URLs
- Hash values (MD5, SHA1, SHA256)
- Cryptocurrency addresses (Bitcoin, Ethereum)
- Phone numbers
- IOC deduplication and merging
- Multiple export formats (JSON, CSV, Text)
Export Options
- Markdown reports
- JSON with full metadata
- CSV for structured data
- Separate IOC exports
- Customizable output formats
Web UI (Streamlit)
- Real-time progress tracking with percentages
- Interactive IOC visualization with tabs
- Search history and saved queries
- Result preview with expandable sections
- Tor status dashboard
- Statistics and metrics display
- Advanced settings panel
- Multiple export format selection
CLI Interface
- Full-featured command-line interface
- Progress indicators with spinners
- Configurable logging levels
- Batch processing support
- Script-friendly output
Resilience & Reliability
- Comprehensive error handling
- Retry mechanisms with exponential backoff
- Graceful degradation on failures
- Connection pooling for performance
- Health monitoring and automatic recovery
Observability
- Structured logging system
- Configurable log levels (DEBUG, INFO, WARNING, ERROR)
- File and console logging
- Performance metrics tracking
- Operation statistics
Security
- Input validation and sanitization
- Query length limits
- URL format validation
- Secure API key handling
- Tor circuit isolation
- Error message sanitization
- Tor: Required for dark web access
  - Linux/Windows (WSL): `sudo apt install tor`
  - macOS: `brew install tor`
  - Verify Tor is installed: `tor --version`
- Python 3.10+ (for development installation)
- Docker (for containerized deployment)
The easiest way to run Robin with all dependencies:
# Pull the latest image
docker pull apurvsg/robin:latest
# Run with Web UI
docker run --rm \
-v "$(pwd)/.env:/app/.env" \
--add-host=host.docker.internal:host-gateway \
-p 8501:8501 \
apurvsg/robin:latest ui --ui-port 8501 --ui-host 0.0.0.0
# Run CLI mode
docker run --rm \
-v "$(pwd)/.env:/app/.env" \
--add-host=host.docker.internal:host-gateway \
  apurvsg/robin:latest cli -m gpt4o -q "your query here"

Download the appropriate binary for your system from the latest release:
# Linux
wget https://github.com/apurvsinghgautam/robin/releases/latest/download/robin-linux.zip
unzip robin-linux.zip
chmod +x robin
./robin --help
# macOS
wget https://github.com/apurvsinghgautam/robin/releases/latest/download/robin-macos.zip
unzip robin-macos.zip
chmod +x robin
./robin --help

For development or customization:
# Clone the repository
git clone https://github.com/apurvsinghgautam/robin.git
cd robin
# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Verify installation
python main.py --help

Create a .env file in the project root:
cp .env.example .env

Edit .env and add your API keys:
# Required: At least one LLM provider API key
OPENAI_API_KEY=your_openai_api_key_here
# OR
ANTHROPIC_API_KEY=your_anthropic_api_key_here
# OR
GOOGLE_API_KEY=your_google_api_key_here
# Optional: For local models
OLLAMA_BASE_URL=http://127.0.0.1:11434

Ensure Tor is running:
# Check if Tor is running
tor --version
# Start Tor service (if not running)
# Linux/WSL
sudo systemctl start tor
# macOS
brew services start tor

CLI Mode:
robin cli -m gpt4o -q "ransomware payments" -t 8 --extract-iocsWeb UI Mode:
robin ui --ui-port 8501
# Open http://localhost:8501 in your browser

robin cli -m gpt4o -q "your search query" -t 12

# With IOC extraction and JSON export
robin cli -m claude-3-5-sonnet-latest \
-q "data breach credentials" \
-t 8 \
--extract-iocs \
--format json \
--output investigation_report
# With custom logging
robin cli -m gpt4o \
-q "zero-day exploits" \
--log-level DEBUG \
--log-file robin.log \
--extract-iocs \
  --format both

| Option | Short | Description | Default |
|---|---|---|---|
| `--model` | `-m` | LLM model (gpt4o, gpt-4.1, claude-3-5-sonnet-latest, llama3.1, gemini-2.5-flash) | gpt4o |
| `--query` | `-q` | Dark web search query (required) | - |
| `--threads` | `-t` | Number of concurrent threads for scraping | 5 |
| `--output` | `-o` | Output filename (without extension) | Auto-generated |
| `--format` | `-f` | Output format (markdown, json, both, pdf, all) | markdown |
| `--extract-iocs` | - | Extract and export Indicators of Compromise | false |
| `--telegram` | - | Include Telegram OSINT search (public posts and joined chats) | false |
| `--rotate-circuit` | - | Enable Tor circuit rotation during scraping | false |
| `--rotate-interval` | - | Rotate Tor circuit after N requests | TOR_ROTATE_INTERVAL |
| `--skip-health-check` | - | Skip search engine health check for faster startup | false |
| `--save-db` | - | Save investigation to SQLite database | false |
| `--log-level` | - | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |
| `--log-file` | - | Optional log file path | None |
# Basic investigation
robin cli -m gpt4o -q "ransomware payments"
# High-performance investigation with IOC extraction
robin cli -m gpt-4.1 -q "sensitive credentials exposure" -t 16 --extract-iocs --format both
# Using local Ollama model
robin cli -m llama3.1 -q "zero days" -t 8
# With detailed logging
robin cli -m gemini-2.5-flash -q "threat actor profiles" --log-level DEBUG --log-file debug.log
# With Telegram OSINT (requires TELEGRAM_* env vars)
robin cli -m gpt4o -q "ransomware" --telegram --extract-iocs
# With circuit rotation and PDF output
robin cli -m gpt4o -q "ransomware" --rotate-circuit --format pdf --extract-iocs
# Save to database
robin cli -m gpt4o -q "data breach" --extract-iocs --save-db
# People Search (at least one of --name, --email, --username, --phone)
robin people --name "John Doe" --email j@example.com --username johndoe --extract-iocs --format json
robin people --email target@example.com --telegram

Process multiple queries from a file (one query per line):
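For reference, `queries.txt` is just a plain text file with one query per line; a minimal illustrative example (entries are placeholders only):

```bash
# Create an illustrative queries.txt (entries are placeholders only)
cat > queries.txt <<'EOF'
ransomware payments
stolen credentials marketplaces
initial access broker listings
EOF
```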
robin batch -b queries.txt -m gpt4o -t 8 --extract-iocs --format all

Run the REST API for programmatic access:
# Start API server (default: http://0.0.0.0:8000)
robin api --port 8000
# With API key (set ROBIN_API_KEY in .env)
robin api -p 8000

Endpoints: `GET /health`, `POST /search`, `POST /investigate`. Docs at `/docs`.
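A quick way to exercise the API from the shell (a hedged sketch: the `X-API-Key` header applies only if `ROBIN_API_KEY` is set, and the POST body fields are illustrative; the interactive docs at `/docs` show the actual request and response schemas):

```bash
# Health check
curl -s http://localhost:8000/health

# Kick off an investigation (illustrative payload; consult /docs for the real schema)
curl -s -X POST http://localhost:8000/investigate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $ROBIN_API_KEY" \
  -d '{"query": "ransomware payments", "model": "gpt4o"}'
```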
# Default (localhost:8501)
robin ui
# Custom port and host
robin ui --ui-port 8080 --ui-host 0.0.0.0
Settings Panel (Sidebar)
- LLM model selection
- Thread count configuration
- IOC extraction toggle
- Include Telegram search (when configured)
- Export format selection
Advanced Settings (Expandable)
- Tor circuit rotation
- Multi-instance Tor configuration
- Timeout settings
Search History
- View recent queries
- Quick re-run from history
- Save favorite queries
Tor Status Dashboard
- Connection status
- Active circuit count
- Exit node information
- Rotation statistics
Statistics Panel
- Total queries executed
- IOCs extracted
- Results found
- Average query time
Real-time Progress
- Progress bars with percentages
- Stage-by-stage status updates
- ETA calculations
IOC Visualization
- Tabs organized by IOC type
- Count metrics per type
- Export options (JSON, CSV, Text)
Result Preview
- Expandable result cards
- URL and title display
- Content preview (first 200 chars)
Export Options
- Multiple format downloads
- Separate IOC exports
- Custom filename support
Create a .env file in the project root with the following variables:
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GOOGLE_API_KEY=your_google_api_key
OLLAMA_BASE_URL=http://127.0.0.1:11434   # For local Ollama

# Tor Control Port (for circuit rotation)
TOR_CONTROL_PORT=9051
# Tor Control Password (if configured)
TOR_CONTROL_PASSWORD=
# Circuit Rotation Settings
TOR_ROTATE_INTERVAL=5 # Rotate after N requests
TOR_ROTATE_ON_ERROR=true # Rotate on errors
# Multi-Instance Tor (for performance)
TOR_MULTI_INSTANCE=false # Enable multiple Tor instances
TOR_INSTANCE_COUNT=3 # Number of instances
TOR_INSTANCE_START_PORT=9050   # Starting port
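Circuit rotation talks to Tor's control port, which is usually disabled by default. Enabling it is standard Tor configuration rather than anything Robin-specific; a minimal sketch of the `torrc` changes might be:

```bash
# Generate a hashed control password with standard Tor tooling
tor --hash-password 'your_password_here'

# Then add to your torrc (e.g. /etc/tor/torrc) and restart Tor:
#   ControlPort 9051
#   HashedControlPassword 16:<hash printed by the command above>
```

The plaintext password would then go into `TOR_CONTROL_PASSWORD` above.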
SEARCH_TIMEOUT=20   # Search request timeout (seconds)
SCRAPE_TIMEOUT=45   # Scraping timeout (seconds)

To include Telegram in investigations (public posts and joined chats), obtain API credentials from my.telegram.org and set:
TELEGRAM_API_ID=your_api_id # Integer from my.telegram.org
TELEGRAM_API_HASH=your_api_hash
TELEGRAM_SESSION_PATH=robin_telegram.session # Optional; default robin_telegram.session
TELEGRAM_ENABLED=true

- First-time login: The first time you use Telegram OSINT, you must authorize the app (phone number + code). Run a query with `--telegram` (CLI) or enable "Include Telegram search" (UI); if the session is not yet authorized, follow the instructions to complete login. Session data is stored in `TELEGRAM_SESSION_PATH` so you do not need to log in again.
- CLI: Use the `--telegram` flag to merge Telegram results with dark web results.
- Web UI: Enable the "Include Telegram search" checkbox in Settings.
- Legal / ToS: Use only for lawful OSINT (e.g. threat intelligence, authorized investigations). Comply with Telegram's Terms of Service and applicable laws. Only public channel posts and (optionally) search within your own joined chats are used; no access to private chats.
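For example, a first run with Telegram enabled might look like this; the interactive phone-number/code prompt appears only if the session file is not yet authorized:

```bash
# One-time interactive Telegram login; the session is saved to TELEGRAM_SESSION_PATH
robin cli -m gpt4o -q "test query" --telegram
```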
People Search uses clear-web search (DuckDuckGo, optional Google CSE) and optional people APIs for enrichment:
# Clear-web search (People Search)
CLEAR_WEB_SEARCH_ENABLED=true
DUCKDUCKGO_ENABLED=true
GOOGLE_CSE_ID= # Optional; requires GOOGLE_API_KEY
CLEAR_WEB_MAX_RESULTS=30
CLEAR_WEB_TIMEOUT=15
# People APIs (Hunter, EmailRep, HIBP)
PEOPLE_APIS_ENABLED=false
HUNTER_API_KEY=
EMAILREP_API_KEY=
HIBP_API_KEY=      # Have I Been Pwned - breach presence only

- People search must be used only for lawful purposes (e.g. authorized investigations, research). Do not use for stalking or harassment.
- HIBP is used only for breach presence (has this email been in a breach?) with API key and ToS compliance; no raw breach data.
Edit .streamlit/config.toml for UI customization:
[server]
runOnSave = true
[theme]
base = "dark"
primaryColor = "#FF4B4B"
backgroundColor = "#0E1117"
secondaryBackgroundColor = "#262730"
textColor = "#FAFAFA"
font = "sans serif"βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β User Interface Layer β
β ββββββββββββββββ ββββββββββββββββ β
β β CLI Mode β β Web UI β β
β β (main.py) β β (ui.py) β β
β ββββββββ¬ββββββββ ββββββββ¬ββββββββ β
βββββββββββΌβββββββββββββββββββββββββββββββΌββββββββββββββββ
β β
ββββββββββββββββ¬βββββββββββββββββ
β
βββββββββββββββΌββββββββββββββββ
β Core Workflow Engine β
β (main.py) β
βββββββββββββββ¬ββββββββββββββββ
β
βββββββββββββββββΌββββββββββββββββ
β β β
βββββββββΌβββββββ ββββββββΌβββββββ ββββββββΌβββββββ
β LLM Layer β β Search Layerβ β Scrape Layerβ
β (llm.py) β β (search.py) β β (scrape.py) β
βββββββββ¬βββββββ ββββββββ¬βββββββ ββββββββ¬βββββββ
β β β
βββββββββββββββββΌββββββββββββββββ
β
βββββββββββββββΌββββββββββββββββ
β Utility Layer β
β (utils.py) β
β - Logging β
β - Validation β
β - Retry Mechanisms β
β - IOC Extraction β
βββββββββββββββββββββββββββββββ
β
βββββββββββββββΌββββββββββββββββ
β Tor Management Layer β
β - tor_controller.py β
β - tor_pool.py β
βββββββββββββββββββββββββββββββ
- User Input → Query validation
- Query Refinement → LLM optimizes search query
- Dark Web Search → Concurrent search across 15+ engines via Tor
- Result Filtering → LLM selects top 20 relevant results
- Content Scraping → Concurrent scraping with Tor routing
- IOC Extraction → Automatic extraction (if enabled)
- Summary Generation → LLM generates comprehensive report
- Export → Multiple format options
- `main.py`: CLI entry point and workflow orchestration
- `ui.py`: Streamlit web interface
- `llm.py`: LLM operations (refinement, filtering, summarization)
- `llm_utils.py`: LLM configuration and model management
- `search.py`: Dark web search engine integration
- `telegram_osint.py`: Telegram OSINT (public posts and joined-chat search via Telethon)
- `scrape.py`: Content scraping with Tor support (and pre-filled content for Telegram)
- `tor_controller.py`: Tor circuit rotation and management
- `tor_pool.py`: Multiple Tor instance management
- `utils.py`: Utilities (logging, validation, IOC extraction, retry mechanisms)
- `config.py`: Configuration management
Problem: Tor connection verification failed
Solutions:
- Verify Tor is installed: `tor --version`
- Check Tor service status: `sudo systemctl status tor` (Linux/WSL) or `brew services list | grep tor` (macOS)
- Restart the Tor service: `sudo systemctl restart tor` (Linux) or `brew services restart tor` (macOS)
- Verify the Tor SOCKS port (default: 9050) is listening: `netstat -an | grep 9050`
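If Tor itself looks healthy, one extra check (generic Tor tooling, not Robin-specific) is to confirm that requests actually exit through the local SOCKS proxy:

```bash
# Should return JSON containing "IsTor": true when routed through Tor (default SOCKS port 9050)
curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/api/ip
```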
Problem: Failed to initialize LLM
Solutions:
- Verify the API key is set in the `.env` file
- Check API key validity
- Verify API quota/credits are available
- For Ollama: Ensure Ollama is running and accessible: `curl http://127.0.0.1:11434/api/tags`
Problem: No results found
Solutions:
- Try refining your query (be more specific)
- Check Tor connection status
- Verify search engines are accessible
- Increase timeout values in `.env`
- Check logs for specific errors: `robin cli -m gpt4o -q "test" --log-level DEBUG --log-file debug.log`
Problem: Failed to scrape results
Solutions:
- Reduce thread count (`-t 3` instead of `-t 16`)
- Increase the scrape timeout in `.env`
- Enable circuit rotation for better anonymity
- Check Tor circuit health
Problem: Application crashes or becomes slow
Solutions:
- Reduce thread count
- Limit number of results processed
- Use IOC extraction selectively
- Clear cache in Web UI
Enable detailed logging for troubleshooting:
# CLI with debug logging
robin cli -m gpt4o -q "your query" --log-level DEBUG --log-file debug.log
# Check log file
tail -f debug.log
- Increase Threads: Use more threads for faster processing (`robin cli -m gpt4o -q "query" -t 16`)
- Enable Multi-Instance Tor: For better concurrency (`TOR_MULTI_INSTANCE=true`, `TOR_INSTANCE_COUNT=5`)
- Optimize Timeouts: Adjust based on your network (`SEARCH_TIMEOUT=15`, `SCRAPE_TIMEOUT=30`)
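As a rough illustration, the tuning options above can be combined by adding them to `.env` and raising the thread count (values are illustrative, not recommendations):

```bash
# Append illustrative tuning values to .env, then run with more threads
cat >> .env <<'EOF'
TOR_MULTI_INSTANCE=true
TOR_INSTANCE_COUNT=5
SEARCH_TIMEOUT=15
SCRAPE_TIMEOUT=30
EOF

robin cli -m gpt4o -q "ransomware payments" -t 16 --skip-health-check
```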
API Key Security
- Never commit `.env` files to version control
- Use environment variables in production
- Rotate API keys regularly
- Use separate keys for development/production
Tor Security
- Keep Tor updated to latest version
- Use circuit rotation for sensitive investigations
- Monitor exit node information
- Consider using VPN in addition to Tor
Data Privacy
- Be cautious with sensitive queries
- Review LLM provider privacy policies
- Encrypt stored results if containing sensitive data
- Implement data retention policies
Input Validation
- All queries are validated and sanitized
- URL format validation before processing
- Length limits prevent abuse
- Legitimate security research
- Authorized penetration testing
- Law enforcement investigations (with proper authorization)
- Academic research
- Threat intelligence gathering (for defensive purposes)
Do NOT use for:
- Unauthorized access to systems
- Illegal activities
- Harassment or doxxing
- Violating terms of service
Always ensure compliance with:
- Local and international laws
- Institutional policies
- Terms of service of APIs and services used
- Ethical guidelines for security research
- CHANGELOG.md: Version history and changes
- DEEP_ANALYSIS.md: Comprehensive codebase analysis
- RESEARCH_AND_IMPROVEMENTS.md: Research findings and recommendations
- IMPLEMENTATION_SUMMARY.md: Implementation details
- QUICK_IMPROVEMENTS.md: Quick reference guide
For programmatic usage, see the inline documentation in source files:
- `main.py`: CLI command reference
- `llm.py`: LLM operation functions
- `search.py`: Search engine integration
- `scrape.py`: Scraping functions
- `utils.py`: Utility functions
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes
- Add tests if applicable
- Update documentation
- Commit: `git commit -m 'Add amazing feature'`
- Push: `git push origin feature/amazing-feature`
- Open a Pull Request
# Clone your fork
git clone https://github.com/your-username/robin.git
cd robin
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Install development dependencies
pip install black flake8 mypy pytest
# Run tests (when available)
pytest
# Format code
black .
# Lint code
flake8 .

We welcome contributions in:
- New search engines
- Additional LLM providers
- UI/UX improvements
- Performance optimizations
- Documentation
- Bug fixes
- Test coverage
- Security enhancements
- Follow PEP 8 style guide
- Use type hints
- Add docstrings to functions
- Write clear commit messages
- Update CHANGELOG.md for user-facing changes
Typical performance metrics (varies by query and network):
- Query Refinement: 2-5 seconds
- Search (15 engines): 30-60 seconds
- Filtering: 5-10 seconds
- Scraping (20 URLs): 60-120 seconds
- Summary Generation: 10-30 seconds
- Total Time: ~2-4 minutes per investigation
- Use appropriate thread count (8-12 for most systems)
- Enable multi-instance Tor for better concurrency
- Cache results when possible
- Use faster LLM models for non-critical operations
- Process results in batches for large investigations
- API server mode (RESTful API)
- Database integration for result storage (SQLite)
- Threat intelligence platform integration (STIX, MISP export)
- Advanced analytics and visualization
- Query templates and saved searches
- Batch processing mode
- PDF report generation
- Multi-language support
- Plugin system for extensibility
- Unit and integration tests
See RESEARCH_AND_IMPROVEMENTS.md for detailed roadmap.
This project is licensed under the MIT License - see the LICENSE file for details.
- Idea Inspiration: Thomas Roccia and his demo of Perplexity of the Dark Web
- Tools Inspiration: OSINT Tools for the Dark Web repository
- LLM Prompt Inspiration: OSINT-Assistant repository
- Logo Design: Tanishq Rupaal
- LangChain - LLM framework
- Streamlit - Web UI framework
- BeautifulSoup - HTML parsing
- Stem - Tor control library
- Click - CLI framework
- Tor Project - Anonymity network
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See Documentation section
When reporting bugs, please include:
- Robin version
- Operating system
- Python version (if using development install)
- Steps to reproduce
- Error messages/logs
- Configuration (sanitized, no API keys)
We welcome feature requests! Please:
- Check existing issues first
- Provide detailed use case
- Explain expected behavior
- Consider implementation complexity
If you find Robin useful, please consider giving it a star on GitHub!
Made with ❤️ by Apurv Singh Gautam

