Reasonlytics is an intelligent data analysis agent that bridges the gap between raw data and meaningful insights. Built on LangGraph and powered by local LLMs via Ollama, Reasonlytics enables users to interact with their datasets using natural language queries and receive comprehensive analysis with human-readable explanations.
Reasonlytics combines the power of multiple AI agents working in harmony to deliver a seamless data analysis experience:
- 🧠 Intelligent Query Understanding: Automatically classifies whether you want visualizations or data analysis
- 🐍 Dynamic Code Generation: Creates optimized pandas code tailored to your specific dataset and query
- ⚡ Safe Code Execution: Runs analysis in a secure, isolated environment
- 📊 Smart Visualization: Generates charts and plots when requested
- 💡 Contextual Reasoning: Provides clear, business-friendly explanations of results
- 🔍 Instant Dataset Insights: Automatically analyzes new datasets and suggests exploration questions
The system follows a modular, agent-based architecture powered by LangGraph, open-source LLMs, and Pandas for structured, explainable data analysis workflows.
- Orchestrates the multi-step reasoning flow using `MessagesState`
- Executes 8 interconnected nodes covering data ingestion, insight generation, query classification, code synthesis, execution, and result explanation
- Employs custom `@tool` decorators for modular tool execution (e.g., `DataFrameSummaryTool`, `DataInsightAgent`, `CodeExecutionTool`)
- Deterministic routing ensures reproducible outputs (`DATA_INPUT → INSIGHT → QUERY → CODE_GEN → EXECUTION → EXPLANATION`); a minimal workflow sketch follows this list
- Model: Qwen2.5-Coder-7B-Instruct-Q4_K_M (configurable open-source LLM)
- Inference Engine: Ollama / vLLM for local and GPU-accelerated deployments
- Prompt Templates: Modular prompt blocks for summary, query classification, and code generation
- Output Parsing: Structured text extraction with safety filtering for Python and visualization code
- Data Engine: Pandas DataFrame (loaded from CSV, Excel, or SQL sources)
- Schema Summary: Auto-generated dataset overview including types, missing values, and size metadata
- Validation: Pre-execution code checks ensure only safe read-only operations (no file I/O, no external writes)
- Renderer: Matplotlib / Seaborn for chart generation
- Result Display: Inline rendering of visual and textual insights
- Reasoning Layer: Generates natural-language explanations summarizing trends, outliers, and actionable insights
- Integrated logging for every node step with timestamped traces
- Easily extensible to support other LLMs (Mistral, Gemma2, Llama3)
- Can integrate with external data APIs or cloud storage connectors
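To make the orchestration concrete, here is a minimal sketch of how such a LangGraph workflow could be wired up. The node function names and their bodies are illustrative assumptions, not the exact code in `compile_agent.py`; only the node sequence follows the routing described above.

```python
# Minimal sketch of the node sequence described above.
# Node names and bodies are illustrative; the real workflow lives in compile_agent.py.
from langgraph.graph import StateGraph, MessagesState, START, END
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen2.5-coder:7b-instruct-q4_K_M", temperature=0)

def data_input(state: MessagesState):
    # Load the dataset and attach a schema summary to the conversation state.
    return {"messages": [("system", "Dataset loaded: 3 columns, 1,200 rows")]}

def insight(state: MessagesState):
    # Ask the LLM for an initial dataset overview and suggested questions.
    return {"messages": [llm.invoke(state["messages"])]}

def query_classification(state: MessagesState):
    # Decide whether the user wants a chart or a tabular/statistical answer.
    return {"messages": [llm.invoke(state["messages"])]}

def code_gen(state: MessagesState):
    # Generate pandas (and optionally matplotlib) code for the query.
    return {"messages": [llm.invoke(state["messages"])]}

def execution(state: MessagesState):
    # Run the generated code in a restricted namespace and capture the result.
    return {"messages": [("system", "Execution result attached")]}

def explanation(state: MessagesState):
    # Turn the raw result into a business-friendly explanation.
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
for name, fn in [
    ("data_input", data_input),
    ("insight", insight),
    ("query_classification", query_classification),
    ("code_gen", code_gen),
    ("execution", execution),
    ("explanation", explanation),
]:
    graph.add_node(name, fn)

# Deterministic routing: a straight edge chain gives reproducible runs.
graph.add_edge(START, "data_input")
graph.add_edge("data_input", "insight")
graph.add_edge("insight", "query_classification")
graph.add_edge("query_classification", "code_gen")
graph.add_edge("code_gen", "execution")
graph.add_edge("execution", "explanation")
graph.add_edge("explanation", END)

agent = graph.compile()
```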
- 🗣️ Natural Language Interface: Ask data-driven questions like "Show me sales trends by region" or "What's the correlation between price and sales?" with no coding required. The agent understands intent and translates your query into executable Python or SQL automatically.
- 🤖 Automated Insights & Reasoning: On every dataset upload or query, the agent instantly summarizes key patterns and relationships. It also provides clear explanations of results (e.g., "North region leads with 35% of total sales") using an integrated Reasoning LLM.
- 📊 Multi-Modal Data Analysis: Seamlessly handles both data analytics and visualization requests. The system dynamically decides whether to return a table, chart, or statistical summary based on the query context.
- 💡 Code Transparency & Safe Execution: Every output comes with the generated pandas/matplotlib code for verification and learning. Code execution is sandboxed to prevent unauthorized operations (a sketch of such a pre-execution check follows this list).
- 🔒 Local, Private, and Configurable: Runs entirely on your own infrastructure using Ollama and LangGraph, ensuring full data privacy. Supports multiple open-source LLMs such as Qwen2.5, CodeGemma, Llama 3, and Mistral, with easy configuration for different workflows.
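As an illustration of the kind of pre-execution check described above, the sketch below rejects generated code that imports risky modules or performs writes. It is an assumed, simplified design for illustration, not necessarily the actual validator in `agent_tools.py`.

```python
# Illustrative pre-execution safety check (assumed design, not the project's exact validator).
import ast

BLOCKED_CALLS = {"open", "exec", "eval", "__import__"}
BLOCKED_MODULES = {"os", "sys", "subprocess", "shutil", "pathlib", "socket"}

def is_code_safe(code: str) -> bool:
    """Return True only if the generated code avoids risky imports and calls
    that could read/write files or spawn processes."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [alias.name.split(".")[0] for alias in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                names.append(node.module.split(".")[0])
            if any(n in BLOCKED_MODULES for n in names):
                return False
        if isinstance(node, ast.Call):
            fn = node.func
            if isinstance(fn, ast.Name) and fn.id in BLOCKED_CALLS:
                return False
            # Block DataFrame writes such as df.to_csv(...) / df.to_excel(...)
            if isinstance(fn, ast.Attribute) and fn.attr.startswith("to_"):
                return False
    return True

# Example: a pandas aggregation passes, a file write does not.
assert is_code_safe("result = df.groupby('region')['sales'].sum()")
assert not is_code_safe("df.to_csv('dump.csv')")
```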
```
📦 llm-data-analyst-agent-langgraph-ollama
│
├── configuration.py     # Environment setup and configurations
├── FastAPI.py           # API layer for backend integration
├── compile_agent.py     # Core LangGraph workflow
├── streamlit_app.py     # Streamlit frontend for user interaction
├── agent_tools.py       # Core agent and tools logic
├── README.md
└── LICENSE
```
- 📈 Sales Performance Analysis: “Show me total revenue by product category for the last quarter.”
The agent automatically aggregates the data, generates a bar chart, and explains key insights — such as which regions or products drive the highest revenue.
- 🏪 Retail Demand Forecasting: “Visualize weekly sales trends for top 5 products.”
The agent produces time-series plots, highlights seasonal patterns, and provides a reasoning summary to support inventory or marketing decisions.
- 👩💼 HR Analytics Dashboard: “What’s the average salary by department?” or “Plot employee attrition by age group.”
The agent creates pandas aggregations and visual insights to help HR teams identify trends and optimize workforce planning.
- 💰 Financial Data Insights: “Compare average returns across investment portfolios” or “Show me expense distribution by category.”
It generates precise visual summaries and explains financial performance differences in natural language.
- 🧠 Exploratory Data Analysis (EDA) Assistant: “Give me a quick summary and possible questions to explore.”
The agent detects the schema and missing values, then suggests meaningful questions to explore next (a minimal sketch of such a dataset summary follows below).
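For reference, a dataset overview covering types, missing values, and size can be produced with a few lines of pandas. This is a minimal sketch of that idea; the exact fields reported by `DataFrameSummaryTool` may differ, and `sales.csv` is just a placeholder path.

```python
# Minimal sketch of an auto-generated dataset overview (types, missing values, size).
# The actual DataFrameSummaryTool may report different or additional fields.
import pandas as pd

def summarize_dataframe(df: pd.DataFrame) -> dict:
    return {
        "shape": {"rows": len(df), "columns": df.shape[1]},
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing_values": df.isna().sum().to_dict(),
        "numeric_summary": df.describe(include="number").round(2).to_dict(),
    }

df = pd.read_csv("sales.csv")  # placeholder; CSV, Excel, or SQL sources are supported
print(summarize_dataframe(df))
```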
Streamlit Interface
- Python 3.10+
- CUDA-compatible GPU (optional, for faster processing)
- 8GB+ RAM recommended
```bash
git clone https://github.com/Ginga1402/llm-data-analyst-agent-langgraph-ollama.git
cd llm-data-analyst-agent-langgraph-ollama
pip install -r requirements.txt

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the required model
ollama pull codegemma:7b-instruct-v1.1-q4_K_S
```

Update the paths in configuration.py to match your system:

```python
model_name = "qwen2.5-coder:7b-instruct-q4_K_M"
```

- Start the FastAPI Server:

```bash
python FastAPI.py
```

The API will be available at http://localhost:8000

- Launch the Streamlit Interface:

```bash
streamlit run streamlit_app.py
```

The web interface will open at http://localhost:8501
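Once both services are running, the backend can also be called directly. The endpoint path and payload below are hypothetical placeholders for illustration; check FastAPI.py (or the interactive docs at http://localhost:8000/docs) for the actual route names and request schema.

```python
# Hypothetical example of calling the backend directly; the real route and
# payload are defined in FastAPI.py (see http://localhost:8000/docs).
import requests

resp = requests.post(
    "http://localhost:8000/analyze",  # placeholder endpoint name
    json={"query": "Show me sales trends by region"},
    timeout=120,
)
print(resp.json())
```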
| Technology | Description | Link |
|---|---|---|
| LangChain | Framework for building LLM-driven applications and chains | LangChain |
| LangGraph | State-based agent orchestration for complex LLM workflows | LangGraph |
| Ollama | Local LLM inference engine for privacy-focused AI | Ollama |
| Mistral 7B (Q4_K_M) | Quantized instruction-tuned model for query classification | Mistral AI |
| Qwen2.5-Coder 7B (Q4_K_M) | Specialized code generation model for pandas operations | Qwen Models |
| Qwen2.5 7B (Q4_K_M) | General-purpose reasoning model for data insights | Qwen Models |
| Pandas | Data manipulation and analysis library for Python | Pandas |
| Matplotlib | Comprehensive plotting library for data visualization | Matplotlib |
| Streamlit | Web framework for building interactive data applications | Streamlit |
| PyTorch | Deep learning framework with CUDA support | PyTorch |
| NumPy | Fundamental package for scientific computing | NumPy |
| FastAPI | High-performance API framework for Python | FastAPI |
| Pydantic | Data validation using Python type annotations | pydantic.dev |
Contributions to this project are welcome! If you have ideas for improvements, bug fixes, or new features, feel free to open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
If you find Reasonlytics useful, please consider giving it a star ⭐ on GitHub!