Ask your database anything β in plain English. Get SQL, data, and interactive charts.
Local LLMs only. No data leaves your machine. Works with DuckDB, Exasol, PostgreSQL, MySQL, SQLite, and anything SQLAlchemy supports.
- Natural language β SQL β powered by any local LLM (Ollama, MLX, LM Studio, vLLM, etc.)
- 4 tabs: Ask (chat), Build (visual query builder), Metrics (saved KPIs), Schema (ER diagram)
- Interactive charts β Plotly bar, line, area, scatter, pie with dual-axis support and live controls
- Visual Query Builder β table / dimension / measure selector with filters, sort, and limit β no SQL required
- Schema Relationship Map β auto-generated Mermaid ER diagram with detected join paths
- Metrics Catalog β define, save, and reuse KPI queries with one click
- Auto-correct retry loop β failed queries are automatically diagnosed and fixed by the LLM, up to 3 attempts; the UI shows live attempt progress and a summary of what was corrected
- Enriched Knowledge Base β 203 domain SQL patterns across eCommerce, Finance, Marketing, Product, and BI; each pattern includes inflation/deflation causes, causal relationships, graduated SQL assets, and anti-patterns
- Semantic embeddings β optional in-process embeddings (
pip install talonsight[embeddings]) improve SQL pattern retrieval and schema table matching; understands business vocabulary ("burn rate" β expense tables, "churn" β cancellation tables) - Smart schema retrieval β for databases with 15+ tables, only the relevant subset is sent to the LLM instead of the full schema; join-connected tables are always included
- Knowledge Base β ChromaDB-backed store for SQL patterns; injects similar patterns as few-shot examples into the prompt
- Join inference β detects join paths by exact and fuzzy column-name matching; explicitly warns the LLM about table pairs that cannot be joined
- Access Control β restrict queries to specific schemas and/or tables; SQL safety validator (allowlist SELECT/WITH only)
- Dialect hints β built-in prompt guidance for DuckDB (
QUALIFY,GROUP BY ALL,TRY_CAST, date functions), PostgreSQL (timestamp cast safety, interval arithmetic), and Exasol - MLX auto-start β on Apple Silicon, the MLX server starts automatically on first query; model is downloaded once if not already cached
- Pre-fill with
.envβ set default paths, model, and URL so the UI is ready on launch
pip install talonsight # core β DuckDB, PostgreSQL, SQLite, MySQL
pip install talonsight[embeddings] # + semantic embeddings (recommended)
pip install talonsight[exasol] # + Exasol support
pip install talonsight[mlx] # + Apple Silicon MLX LLM backend
pip install talonsight[all] # everythingRequirements: Python β₯ 3.9, and a local LLM server (see LLM Setup below).
| Extra | Package | What it does |
|---|---|---|
embeddings |
fastembed |
In-process semantic embeddings via ONNX. Model (~130 MB) downloaded once on first use. No server needed. |
exasol |
pyexasol, sqlalchemy-exasol |
Exasol database connectivity |
mlx |
mlx-lm, transformers>=4.47 |
Apple Silicon LLM inference (M-series only) |
postgres |
psycopg2-binary |
PostgreSQL driver |
mysql |
pymysql |
MySQL driver |
# Install Ollama (macOS / Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model β qwen3:8b is the recommended starting point
ollama pull qwen3:8b # recommended β best quality/speed on modern hardware
ollama pull qwen2.5-coder:7b # good alternative for pure SQL tasks
ollama pull qwen2.5-coder:14b # better quality, needs more RAMApple Silicon (M-series)? Use the MLX backend β see MLX Setup. The server and model are managed automatically; no terminal command needed.
Using LM Studio or vLLM? Choose "OpenAI-compatible API" in the sidebar's LLM Backend expander.
pip install talonsight[embeddings]The nomic-ai/nomic-embed-text-v1.5 model (~130 MB) is downloaded automatically on first connect. After installing, select FastEmbed (in-process) in the sidebar's Embeddings expander before connecting.
talonsightOpens at http://localhost:8501.
The sidebar keeps the most important controls above the fold:
- Database β pick connection type (DuckDB / Exasol / PostgreSQL / SQLAlchemy URL) and enter credentials or a file path
- Click π Load schemas & tables to preview available schemas (optional but recommended)
- Access Control β select the schema and optionally restrict to specific tables
- Click β‘ Connect
Advanced settings β LLM backend, Embeddings, Knowledge Base, Metrics directory, and Options β are in collapsed expanders below the buttons.
After connecting, talonsight generates 5 starter questions based on your actual schema and data profile.
Type in plain English β talonsight generates SQL, runs it read-only, and shows a plain-English summary, an interactive chart, and the raw data table.
If the generated SQL fails, talonsight automatically retries up to 3 times: the LLM diagnoses the error, rewrites the SQL, re-validates safety, and re-executes β all without any action from you. The UI shows live attempt progress ("
Use the chart controls row below each answer to switch chart type, change the x-axis, or select which measures to plot β without re-running the query.
Create .env in your working directory (gitignored):
EXACHAT_DUCKDB_PATH=/path/to/your/database.duckdb
EXACHAT_OLLAMA_URL=http://localhost:11434
EXACHAT_OLLAMA_MODEL=qwen3:8bType a question, get SQL + a plain-English summary + an interactive Plotly chart + the raw data table.
- Follow-up questions work naturally β "now filter by last 90 days", "also show average order value"
- Every answer shows a Generated SQL expander, timing, and suggested follow-up questions
- If a query fails and is auto-corrected, a collapsed green expander shows the original SQL, the error, and what was fixed
Pick a table, add dimensions (GROUP BY columns) and measures (aggregated columns with SUM / AVG / COUNT / MIN / MAX), set filters, sort order, and row limit β then click βΆ Run.
The builder generates clean, schema-qualified SQL and renders the same interactive chart + table as the Ask tab. Dimensions can be reordered with ββ buttons. Great for ad-hoc exploration without writing SQL.
Define a metric once (name + SQL or question), save it to the Metrics Catalog, and re-run it in any future session with one click. Metrics persist to disk as JSON files in the configured directory.
Auto-generated entity-relationship diagram using Mermaid.js. Tables show all column names and their SQL data types. Solid lines indicate exact column-name join paths; dashed lines indicate fuzzy root matches. Tables with no detected join path are shown in isolation.
from talonsight import TalonSight
# DuckDB (local file)
chat = TalonSight("duckdb:///path/to/analytics.duckdb")
chat = TalonSight("./my_data.duckdb") # bare path works too
# Exasol
chat = TalonSight("exa+pyexasol://user:pass@host:8563/MY_SCHEMA")
# PostgreSQL
chat = TalonSight("postgresql://user:pass@localhost:5432/mydb")
# SQLite / MySQL / anything SQLAlchemy supports
chat = TalonSight("sqlite:///local.db")
chat = TalonSight("mysql+pymysql://user:pass@host:3306/db")result = chat.ask("Top 10 customers by total spend")
print(result.summary) # "The top customer is Acme Corp with $2.3M..."
print(result.sql) # SELECT customer_name, SUM(total) AS total_spend ...
print(result.data) # pandas DataFrame
print(result.chart_config) # {"chart_type": "bar", "x": "customer_name", ...}
# Auto-correction fields β populated when a query failed and recovered
print(result.auto_corrected) # True / False
print(result.original_sql) # the SQL that first failed (if corrected)
print(result.original_error) # the DB error from the first attempt
print(result.correction_explanation) # what the LLM says it fixedfrom talonsight.llm import OllamaBackend, OpenAICompatibleBackend, MLXBackend
# Ollama
llm = OllamaBackend(model="qwen3:8b")
# OpenAI-compatible (LM Studio, vLLM, etc.)
llm = OpenAICompatibleBackend(base_url="http://localhost:1234/v1", model="qwen2.5-coder-14b")
# Apple Silicon MLX β server starts automatically on first query
llm = MLXBackend(base_url="http://localhost:8080/v1", model="mlx-community/Qwen3-8B-4bit")
chat = TalonSight("./data.duckdb", llm=llm)# In-process via fastembed (recommended β pip install talonsight[embeddings])
chat = TalonSight("./data.duckdb", embedding_backend="fastembed")
# Via Ollama embedding server (ollama pull nomic-embed-text)
chat = TalonSight("./data.duckdb",
embedding_backend="ollama",
embedding_url="http://localhost:11434",
embedding_model="nomic-embed-text",
)
# Via any OpenAI-compatible embedding API
chat = TalonSight("./data.duckdb",
embedding_backend="openai",
embedding_url="http://localhost:1234/v1",
embedding_model="nomic-embed-text",
)chat = TalonSight(
"exa+pyexasol://readonly_user:pass@host:8563/PROD",
allowed_schemas=["SALES", "ANALYTICS"],
allowed_tables=["CUSTOMERS", "ORDERS", "PRODUCTS"],
extra_context="""
- revenue columns are in EUR
- fiscal year starts April 1
- ORDERS.status: 'active', 'cancelled', 'refunded'
""",
)from talonsight import TalonSight
with TalonSight("duckdb:///sales.duckdb") as chat:
monthly = chat.ask("Monthly revenue for the last 12 months")
top_products = chat.ask("Top 5 products by units sold this quarter")
monthly.data.to_csv("monthly_revenue.csv", index=False)
top_products.data.to_csv("top_products.csv", index=False)| Model | Command | Quality | Notes |
|---|---|---|---|
qwen3:8b |
ollama pull qwen3:8b |
βββββ | Recommended default |
qwen2.5-coder:7b |
ollama pull qwen2.5-coder:7b |
ββββ | Good for SQL-heavy workloads |
qwen2.5-coder:14b |
ollama pull qwen2.5-coder:14b |
βββββ | Better quality, needs more RAM |
deepseek-coder-v2:16b |
ollama pull deepseek-coder-v2:16b |
βββββ | Excellent for complex joins |
MLX runs models natively on Apple M-series chips via Metal β typically 20β30% faster than Ollama on the same hardware.
pip install talonsight[mlx]In the sidebar: LLM Backend β MLX (Apple Silicon).
The server starts automatically. On first query, talonsight spawns the MLX server in the background, waits for the model to load into memory (~30 s the first time), and then runs your query. If the model isn't cached locally, it is downloaded automatically before the server starts. No terminal command needed at any point.
Default server URL: http://localhost:8080/v1, model: mlx-community/Qwen3-8B-4bit.
| Model | Size | Notes |
|---|---|---|
mlx-community/Qwen3-8B-4bit |
~5 GB | Recommended |
mlx-community/Qwen3-8B-8bit |
~9 GB | Higher quality |
mlx-community/Qwen2.5-Coder-7B-Instruct-4bit |
~4 GB | SQL-focused |
Any server implementing /v1/chat/completions works β LM Studio, vLLM, text-generation-webui, LocalAI. Select "OpenAI-compatible API" in the LLM Backend expander.
When a generated SQL query fails at execution, talonsight does not surface the raw database error immediately. Instead, it automatically retries up to 3 attempts:
- The DB error is passed back to the LLM along with the original question and failed SQL
- The LLM diagnoses the error and returns a corrected query
- The corrected SQL is re-validated for safety and re-executed
- If it succeeds, the result is returned normally with a collapsed notice showing what was fixed
The Streamlit UI shows live status during retries: "
On success after a retry, a collapsed green expander shows:
- The original (failed) SQL
- The database error it produced
- The corrected SQL that was used
- The LLM's explanation of what was wrong
If all 3 attempts fail, the final error is shown with a collapsible detail panel.
The QueryResult object exposes the full correction metadata:
result = chat.ask("...")
if result.auto_corrected:
print("Original SQL:", result.original_sql)
print("Error:", result.original_error)
print("Fix:", result.correction_explanation)Embeddings power two things in talonsight:
- SQL pattern retrieval β finds relevant SQL technique patterns (window functions, YoY comparisons, etc.) to inject as examples into the prompt
- Schema table retrieval β for databases with 15+ tables, retrieves only the relevant tables per query instead of dumping the full schema into the prompt
| Backend | Setup | Quality | Notes |
|---|---|---|---|
| Bag of words (default) | None β works offline | Keyword matching only | Good for small schemas and standard SQL vocabulary |
| FastEmbed (recommended) | pip install talonsight[embeddings] |
Semantic | In-process ONNX, no server. Model auto-downloaded (~130 MB). |
| Ollama | ollama pull nomic-embed-text |
Semantic | Requires Ollama running separately |
| OpenAI-compatible | Any /v1/embeddings server |
Semantic | LM Studio, etc. |
pip install talonsight[embeddings]That's it. On first connect, the nomic-ai/nomic-embed-text-v1.5 model is downloaded (~130 MB) and cached at ~/.cache/fastembed/. All subsequent connects are instant.
In the sidebar: Embeddings β FastEmbed (in-process).
With bag-of-words, retrieval is purely keyword-based. With semantic embeddings:
"burn rate"correctly retrieves yourMONTHLY_EXPENSEStable"churn"matchesCANCELLATIONSeven with no shared words"headcount trend"finds theEMPLOYEESandDEPARTMENTStables- SQL patterns like
"year-over-year comparison"match a question phrased as"how has revenue changed vs last year?"
| Schema size | Behaviour |
|---|---|
| β€ 15 tables | Full schema always included β maximum accuracy |
| > 15 tables | Top 10 most relevant tables retrieved per query; FK-connected tables always included to preserve JOIN paths |
talonsight ships with 203 enriched SQL patterns across five business domains. Each pattern includes:
- Metric nature β what the metric measures, what it does not, and common misconceptions
- Graduated SQL assets β basic, intermediate, and advanced SQL templates
- Inflation & deflation causes β common reasons the metric is over- or under-counted, each with a diagnostic SQL snippet
- Causal relationships β if/then chains linking symptoms to root causes
- Anti-patterns β specific SQL mistakes with explanations of why they break the metric
| Domain | Patterns | Examples |
|---|---|---|
| eCommerce Orders & Revenue | 10 | GMV, net revenue, AOV, refund rate, discount rate, RPV |
| eCommerce Customer Lifecycle | 10 | CLV, repeat rate, RFM, cohort retention, CAC, TBO |
| eCommerce Funnel & Conversion | 9 | Conversion rate, cart abandonment, checkout funnel, PDP-to-cart |
| eCommerce Inventory & Merchandising | 10 | Inventory turnover, DIO, stockout, sell-through, dead stock |
| Finance Revenue | 9 | Recognized revenue, deferred revenue, bookings vs billings, upsell rate |
| Finance Unit Economics | 4 | Gross margin, contribution margin, COGS per unit, operating leverage |
| Marketing Attribution & Spend | 7 | CAC by channel, ROAS, CPC, CPL, blended CAC, payback period |
| Marketing Campaign Performance | 5 | CPA by creative, audience overlap, incrementality, frequency fatigue |
| Product Engagement | 8 | DAU/MAU, session length, feature adoption, stickiness, breadth of use |
Patterns are embedded using the same embedding backend as schema retrieval, so business vocabulary in questions ("conversion dropped", "what's our payback?") maps correctly to the right pattern.
You can also load your own patterns:
chat = TalonSight("./data.duckdb", kb_path="/path/to/your/patterns/")Or seed the KB with a questionβSQL pair:
chat.kb.load_file("/path/to/pattern.json")Patterns persist at ~/.talonsight/kb/ by default. Point the UI to a custom directory via the π Knowledge Base expander or TALONSIGHT_KB_PATH in .env.
- Allowlist-only: Only
SELECTandWITH(CTE) pass.INSERT,UPDATE,DELETE,DROP,ALTER,CREATE,EXEC,CALL,EXPORT,IMPORT,COPY,ATTACH,DETACH,INSTALL,LOADare all blocked before execution. - No
exec()oreval(): LLM output is never executed as Python code. - Pattern matching: Blocks
read_csv/read_parquet/read_json(DuckDB file access),pg_sleep,BENCHMARK, statement stacking (;-separated queries),SET,PRAGMA. - Access control enforcement: The LLM prompt explicitly lists allowed tables; the safety validator cross-checks the generated SQL against the allowlist β including after auto-correction rewrites.
- Read-only connections: DuckDB files always opened with
read_only=True. SQLAlchemy usesSET TRANSACTION READ ONLYwhere supported. - Suspicious query warnings:
UNION SELECT, tautology injections, and system table access trigger a visible warning badge without blocking execution.
Use a read-only database user in production. The safety layer is defence-in-depth, not a substitute for proper DB permissions.
The diagram shows the full SQL generation pipeline: from a natural language question through KB retrieval, schema narrowing, LLM generation, safety validation, auto-correct retry loop, and result enrichment.
See ARCHITECTURE.md for the full component breakdown and a Mermaid flowchart of the pipeline.
from talonsight import TalonSight
from talonsight.llm import OllamaBackend
chat = TalonSight(
connection="duckdb:///sales.duckdb",
llm=OllamaBackend(model="qwen3:8b"),
# Schema scoping
schema="main",
# Access control
allowed_schemas=["SALES", "ANALYTICS"],
allowed_tables=["CUSTOMERS", "ORDERS", "PRODUCTS"],
# Business context injected into every prompt
extra_context="revenue is in EUR. fiscal year starts April 1.",
# Query limits
max_rows=10000,
# Embeddings β controls both KB pattern retrieval and schema table retrieval
# "bow" (default) | "fastembed" | "ollama" | "openai"
embedding_backend="fastembed",
embedding_url="", # not needed for fastembed
embedding_model="nomic-ai/nomic-embed-text-v1.5",
# Knowledge Base
kb_path=None, # path to extra KB JSON files (built-in patterns always loaded)
# Charts
chart_library="auto", # "plotly", "altair", or "auto"
# Metrics
metrics_path=None, # path to metrics JSON directory (~/.talonsight/metrics/ by default)
)| Module | Purpose |
|---|---|
app.py |
Streamlit UI β 4 tabs (Ask / Build / Metrics / Schema), compact sidebar, chart controls, retry status |
app_builder.py |
Visual Query Builder β dimension / measure / filter / sort UI β SQL |
core.py |
Engine β orchestrates the full ask() pipeline including auto-correct retry loop |
llm.py |
LLM backends β Ollama, MLX (auto-start), OpenAI-compatible; dialect hints; fix_sql() for auto-correction |
schema.py |
Schema introspection + join inference (exact + fuzzy column-name matching) |
safety.py |
SQL validation β allowlist, DDL/DML blocking, injection pattern detection |
connection.py |
Connection management β pyexasol, DuckDB native (read-only), SQLAlchemy |
builder.py |
QueryBuilder β programmatic SELECT / GROUP BY / filter / sort β schema-qualified SQL |
metrics.py |
Metrics Catalog β save / load / run named KPI queries from JSON |
kb.py |
Knowledge Base + Schema Index β ChromaDB store; 203 built-in domain patterns; bag-of-words / FastEmbed / Ollama / OpenAI embeddings |
charts.py |
Auto-charting β Plotly bar / line / area / scatter / pie |
cli.py |
CLI entrypoint β setup wizard, MLX server auto-start, model auto-download |
setup_wizard.py |
First-time setup β LLM backend selection, MLX model cache check and download |
- SQL accuracy = LLM quality. Smaller models produce worse SQL. 7B+ recommended; 14B+ for complex schemas or many tables.
- Safety layer is regex-based. It catches known patterns but is not a full SQL parser. Always use a read-only database user.
- Join inference is heuristic. Column-name similarity works well for conventional naming; semantic joins (different names, same concept) are not detected automatically, though semantic embeddings reduce this gap for schema retrieval.
- Charts are LLM-suggested. Usually correct β use the chart controls in the UI to override type, axes, and measures.
- Embedding model download required on first use. FastEmbed downloads ~130 MB on first connect; requires an internet connection once.
- MLX is Apple Silicon only. The MLX backend requires an M-series Mac. On other platforms, use Ollama or an OpenAI-compatible API instead.
MIT





