Releases: Agentic-Analyst/stock-analyst
v1.0.0 — Production-Grade Multi-Agent Equity Research Engine
Autonomous equity research — from raw financial data to institutional-quality analyst reports — in under 7 minutes. A traditional analyst workflow takes 6–12 hours per report; this system runs end-to-end without human intervention and has been battle-tested across hundreds of analyses on production infrastructure at vynnai.com.
| Step | Traditional Analyst | This System |
|---|---|---|
| Financial data collection | 30–60 min | < 5 seconds |
| DCF model (9-tab Excel) | 2–4 hours | < 10 seconds |
| News research (17–18 articles) | 1–2 hours | ~3 minutes |
| Professional analyst report | 3–6 hours | ~3 minutes |
| Total | 6–12 hours | < 7 minutes |
Architecture
Supervisor-worker architecture on LangGraph's cyclical state graph. An LLM-powered supervisor classifies user intent (COMPREHENSIVE / MODEL_ONLY / QUICK_NEWS / CUSTOM), extracts tickers from natural language, and routes to specialized agents with strict dependency enforcement. If LLM routing fails, a deterministic rule-based fallback takes over via _resolve_dependencies() — no silent failures.
Dependency chain: financial_data_agent → model_generation_agent → news_analysis_agent → report_generator_agent
The supervisor enforces this ordering even if the LLM suggests otherwise. Objective-driven early termination allows MODEL_ONLY workflows to stop after model + summary, and QUICK_NEWS to stop after news + summary.
Technical Highlights
The LLM Never Invents Numbers
The core design principle. All financial calculations — expected returns, price targets, rating bands, sensitivity matrices — are computed deterministically in RecommendationCalculator. The LLM only writes the narrative explanation. A 3-layer architecture ensures integrity:
1. `RecommendationCalculator` (deterministic) — pure Python math with sector-aware premiums, volatility caps, and time decay. Outputs `FixedNumbers` (immutable, auditable).
2. `EvidenceExtractor` → LLM Explainer — builds an evidence pack with unique IDs (E1, E2, …) and source quality scoring (primary > tier-1 > syndication), then prompts the LLM to write the narrative using only the provided numbers and evidence.
3. `RecommendationValidator` — regex-based number verification against `FixedNumbers`, a citation coverage check (≥95% required), and auto-correction of LLM number deviations. Blocks publication if validation fails.
Rating bands: STRONG BUY (>20%) · BUY (10–20%) · HOLD (−5% to +10%) · SELL (−20% to −5%) · STRONG SELL (<−20%).
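A minimal sketch of how a deterministic calculator could map expected return to these bands. The thresholds come from the line above; the function name and the inclusive/exclusive boundary handling are assumptions, not the repo's actual `RecommendationCalculator` logic:

```python
def rating_from_expected_return(expected_return: float) -> str:
    """Map a computed expected return (as a fraction, e.g. 0.12 = +12%)
    to the published rating bands. Boundary inclusivity is an assumption."""
    if expected_return > 0.20:
        return "STRONG BUY"
    if expected_return >= 0.10:
        return "BUY"
    if expected_return >= -0.05:
        return "HOLD"
    if expected_return >= -0.20:
        return "SELL"
    return "STRONG SELL"
```

Keeping this mapping in code rather than in a prompt is the point of the design: the LLM explains the rating, but can never change it.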
9-Tab DCF with Custom Formula Evaluator
The Financial Model Agent generates a banker-grade Excel workbook with 9 tabs (Raw, Keys_Map, Assumptions, Historical, Projections, Valuation Perpetual Growth, Valuation Exit Multiple, Sensitivity, Summary) plus a hidden LLM_Inferred tab storing raw LLM assumptions.
Dual DCF valuation: Perpetual Growth method (Gordon Growth Model) + Exit Multiple method (terminal EV/EBITDA) with blended fair value. Sensitivity matrices: WACC vs. Terminal Growth and Growth vs. Margin.
The system needs to generate Excel files (for human analysts) and use computed values programmatically (for downstream LLM agents). Rather than requiring Excel, a custom FormulaEvaluator (1,293 lines) interprets the same formulas that appear in the Excel tabs — resolving cell references, cross-tab references, arithmetic, and functions like SUMIFS. This ensures the workbook and JSON output are always consistent.
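A toy version of that idea — resolve cell and cross-tab references, then evaluate the remaining arithmetic — fits in a few lines. The real 1,293-line `FormulaEvaluator` also handles functions like SUMIFS; this sketch covers references and arithmetic only, and its use of `eval` is for illustration, not untrusted input:

```python
import re

def evaluate_formula(formula: str, cells: dict[str, float]) -> float:
    """Tiny illustrative evaluator: substitute references like 'B2' or
    'Assumptions!B2' with their numeric values, then evaluate the arithmetic.
    Not the repo's actual implementation."""
    expr = formula.lstrip("=")
    # Replace cell references (optionally tab-qualified) with literals.
    expr = re.sub(
        r"(?:[A-Za-z_]+!)?[A-Z]+[0-9]+",
        lambda m: repr(cells[m.group(0)]),
        expr,
    )
    # Evaluate with builtins stripped; sketch only, not a sandbox.
    return eval(expr, {"__builtins__": {}})
```

Because the Excel tabs and the JSON output are produced from the same formula strings, an evaluator like this is what keeps the two in lockstep without requiring Excel at runtime.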
6 sector-specific strategies: Generic DCF, SaaS (Rule of 40), REIT (FFO/AFFO), Bank (Excess Returns), Utility, Energy NAV — pluggable via Strategy pattern without code changes.
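The Strategy pattern described above might look like the following. Class names, the registry, and the numeric multiples are all illustrative assumptions; only the sector list and the Rule of 40 concept come from the text:

```python
from abc import ABC, abstractmethod

class ValuationStrategy(ABC):
    """Illustrative Strategy-pattern base class; names are assumptions."""
    @abstractmethod
    def fair_value(self, financials: dict) -> float: ...

class GenericDCFStrategy(ValuationStrategy):
    def fair_value(self, financials: dict) -> float:
        return financials["fcf"] * 15.0  # placeholder FCF multiple

class SaaSStrategy(ValuationStrategy):
    def fair_value(self, financials: dict) -> float:
        # Rule of 40: revenue growth % + FCF margin % >= 40 earns a premium.
        premium = financials["revenue_growth"] + financials["fcf_margin"] >= 40
        return financials["revenue"] * (10.0 if premium else 6.0)  # illustrative

REGISTRY: dict[str, type[ValuationStrategy]] = {
    "saas": SaaSStrategy,
    # "reit": REITStrategy, "bank": BankStrategy, ...  (omitted)
}

def strategy_for(sector: str) -> ValuationStrategy:
    """Unknown sectors fall back to the generic DCF; new sectors plug into
    the registry with no changes to calling code."""
    return REGISTRY.get(sector, GenericDCFStrategy)()
```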
News Intelligence Pipeline
A 3-stage autonomous pipeline:
| Stage | Module | Description |
|---|---|---|
| Scraping | `article_scraper.py` (747 lines) | AI-generated search queries across financial, management, industry, and competitive categories → Google News via SerpAPI → newspaper3k parsing |
| Filtering | `article_filter.py` (564 lines) | LLM batch scoring (0–10) against the investment thesis → MongoDB persistence |
| Screening | `article_screener.py` (815 lines) | Deep LLM analysis → structured `Catalyst`, `Risk`, and `Mitigation` dataclasses with confidence scores, direct quotes, source URLs, evidence chains, and timelines |
Database-aware caching: skips re-scraping when recent articles exist. If MongoDB is unavailable, falls back to local file storage.
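The caching logic can be sketched with the local-file fallback (the MongoDB path is analogous: query for documents newer than the cutoff before scraping). Function and parameter names here are illustrative assumptions:

```python
import json
import time
from pathlib import Path

def load_or_scrape(ticker: str, cache_dir: Path, scrape, max_age_s: float = 86400):
    """Sketch of database-aware caching: reuse recent cached articles,
    otherwise scrape fresh and persist. Names are assumptions, not the
    repo's actual API; `scrape` is an injected scraping callable."""
    path = cache_dir / f"{ticker}_articles.json"
    if path.exists() and time.time() - path.stat().st_mtime < max_age_s:
        return json.loads(path.read_text())  # recent cache hit: skip scraping
    articles = scrape(ticker)  # cache miss or stale entry: scrape fresh
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(articles))
    return articles
```

The second analysis of the same ticker within the freshness window then costs zero SerpAPI calls.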
Anti-Hallucination Safeguards
Embedded in all 33 externalized prompt templates: explicit "NEVER fabricate data" instructions, hard rules for number formatting and citation requirements, structured JSON output schemas, and source verification gates ("If you cannot find the source article, DO NOT include the claim").
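A citation-coverage gate like the one the validator enforces (≥95%, per the architecture section) could be as simple as the following. The evidence-ID format `[E1]` mirrors the IDs described earlier; the function itself is an illustrative sketch, not the repo's code:

```python
import re

def citation_coverage_ok(sentences: list[str], min_coverage: float = 0.95) -> bool:
    """Check that at least `min_coverage` of report sentences cite an
    evidence ID such as [E1], [E2], ... Illustrative sketch only."""
    if not sentences:
        return False
    cited = sum(1 for s in sentences if re.search(r"\[E\d+\]", s))
    return cited / len(sentences) >= min_coverage
```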
Experiment Results
Three experiments evaluating system performance, reproducibility, and output quality. These informed the production deployment.
Latency & Component Breakdown
| Component | Avg. Time | % of Total |
|---|---|---|
| News Analysis (scraping + filtering + screening) | 189.4s | 49.4% |
| Report Generator | 167.6s | 43.8% |
| Financial Data (Yahoo Finance) | 4.7s | 1.2% |
| Model Generation (9-tab DCF) | 5.2s | 1.3% |
| Supervisor Overhead | 16.2s | 4.2% |
| Total | ~383s (6.4 min) | 100% |
LLM-intensive operations account for ~93% of total execution time. Financial data collection and DCF model building are near-instantaneous (<10s combined). Supervisor routing overhead is ~4%, validating the lightweight orchestration design.
Reproducibility & Stability (9 runs)
| Ticker | Success Rate | Mean Duration | CV (σ/μ) | Reproducibility Score |
|---|---|---|---|---|
| NVDA | 100% | 384.5s | 0.016 | 0.985 |
| AAPL | 100% | 215.8s | 0.033 | 0.969 |
| MSFT | 100% | 195.6s | 0.035 | 0.965 |
Three paraphrased prompts for NVDA ("Analyze NVDA stock…", "Give me a comprehensive analysis of NVIDIA…", "What's your investment recommendation for NVDA?") all correctly extracted the ticker, triggered identical 4-agent workflows, and completed within a 13-second window (378.9s–391.7s). Overall stability score: 0.983.
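The CV column is the standard coefficient of variation over run durations. For reference, it can be computed as below (whether the experiment used sample or population standard deviation is an assumption; sample standard deviation is shown):

```python
import statistics

def coefficient_of_variation(durations: list[float]) -> float:
    """CV = sigma / mu over repeated run durations; lower means more
    stable run-to-run timing. Sample stdev is an assumption."""
    return statistics.stdev(durations) / statistics.mean(durations)
```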
Qualitative Case Studies
| Company | Articles | Catalysts | Risks | DCF Fair Value | Market Price | Implied Upside |
|---|---|---|---|---|---|---|
| META | 18 | 7 (90% top conf.) | 6 | $604.06 | $621.71 | −2.8% (fairly valued) |
| NVDA | 17 | 7 (90% top conf.) | 5 | $208.82 | $188.15 | +11.0% (undervalued) |
AAPL demonstrated the supervisor's ability to intelligently route simple queries to a single agent instead of running the full 4-agent pipeline — optimizing both cost and latency.
Design Patterns
| Pattern | Implementation | Why |
|---|---|---|
| Supervisor + Worker | LangGraph cyclical graph with conditional edges | LLM proposes routing; dependency resolver enforces valid sequencing |
| Blackboard State | `FinancialState` dataclass passed between all agents | Single source of truth; every agent reads/writes to shared state |
| Builder | Each Excel tab has a dedicated builder class | Tabs can be tested and modified independently |
| Deterministic Math + LLM Narrative | `RecommendationCalculator` → `EvidenceExtractor` → LLM → `RecommendationValidator` | Numbers are code; narrative is LLM; validator ensures integrity |
| Strategy | Pluggable DCF strategies (SaaS, REIT, Bank, Utility, Energy) | Sector-aware modeling without code changes |
| Prompt Externalization | 33 markdown templates in `prompts/` | Version-controlled, auditable, hot-swappable without deployments |
Deployment
- Containerized as `fuzanwenn/stock-analyst:latest` (~975 MB, `python:3.11-slim` base)
- Multi-arch Docker builds (linux/amd64 + linux/arm64)
- Spawned as ephemeral containers by the API layer via Docker-in-Docker
- Complete process isolation — memory leaks in one analysis never affect others
Known Limitations
- Agent execution is strictly sequential; parallel execution is architecturally possible but not yet implemented
- The engine itself does not stream intermediate agent reasoning; progress streaming to the user is handled at the API layer via SSE
- System throughput: ~10 full analyses per hour (sequential execution)
Tech Stack
Python 3.11 · LangGraph · OpenAI API · Anthropic API · MongoDB · Docker · yfinance · openpyxl · SerpAPI · newspaper3k