Releases: Agentic-Analyst/stock-analyst
v1.0.0 — Production-Grade Multi-Agent Equity Research Engine
Autonomous equity research — from raw financial data to institutional-quality analyst reports — in under 7 minutes. A traditional analyst workflow takes 6–12 hours per report; this system runs end-to-end without human intervention and has been battle-tested across hundreds of analyses on production infrastructure at vynnai.com.
| Step | Traditional Analyst | This System |
|---|---|---|
| Financial data collection | 30–60 min | < 5 seconds |
| DCF model (9-tab Excel) | 2–4 hours | < 10 seconds |
| News research (17–18 articles) | 1–2 hours | ~3 minutes |
| Professional analyst report | 3–6 hours | ~3 minutes |
| Total | 6–12 hours | < 7 minutes |
Architecture
Supervisor-worker architecture on LangGraph's cyclical state graph. An LLM-powered supervisor classifies user intent (COMPREHENSIVE / MODEL_ONLY / QUICK_NEWS / CUSTOM), extracts tickers from natural language, and routes to specialized agents with strict dependency enforcement. If LLM routing fails, a deterministic rule-based fallback takes over via _resolve_dependencies() — no silent failures.
Dependency chain: financial_data_agent → model_generation_agent → news_analysis_agent → report_generator_agent
The supervisor enforces this ordering even if the LLM suggests otherwise. Objective-driven early termination allows MODEL_ONLY workflows to stop after model + summary, and QUICK_NEWS to stop after news + summary.
Technical Highlights
The LLM Never Invents Numbers
The core design principle. All financial calculations — expected returns, price targets, rating bands, sensitivity matrices — are computed deterministically in RecommendationCalculator. The LLM only writes the narrative explanation. A 3-layer architecture ensures integrity:
1. `RecommendationCalculator` (deterministic) — pure Python math with sector-aware premiums, volatility caps, and time decay. Outputs `FixedNumbers` (immutable, auditable).
2. `EvidenceExtractor` → LLM Explainer — builds an evidence pack with unique IDs (E1, E2, …) and source quality scoring (primary > tier-1 > syndication), then prompts the LLM to write the narrative using only the provided numbers and evidence.
3. `RecommendationValidator` — regex-based number verification against `FixedNumbers`, a citation coverage check (≥95% required), and auto-correction of LLM number deviations. Blocks publication if validation fails.
Rating bands: STRONG BUY (>20%) · BUY (10–20%) · HOLD (−5% to +10%) · SELL (−20% to −5%) · STRONG SELL (<−20%).
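A minimal sketch of how a deterministic calculator could map expected return to these bands. The thresholds come from the line above; the function name and the inclusive/exclusive boundary handling are assumptions, not the repo's actual `RecommendationCalculator` logic:

```python
def rating_from_expected_return(expected_return: float) -> str:
    """Map a computed expected return (as a fraction, e.g. 0.12 = +12%)
    to the published rating bands. Boundary inclusivity is an assumption."""
    if expected_return > 0.20:
        return "STRONG BUY"
    if expected_return >= 0.10:
        return "BUY"
    if expected_return >= -0.05:
        return "HOLD"
    if expected_return >= -0.20:
        return "SELL"
    return "STRONG SELL"
```

Keeping this mapping in code rather than in a prompt is the point of the design: the LLM explains the rating, but can never change it.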
9-Tab DCF with Custom Formula Evaluator
The Financial Model Agent generates a banker-grade Excel workbook with 9 tabs (Raw, Keys_Map, Assumptions, Historical, Projections, Valuation Perpetual Growth, Valuation Exit Multiple, Sensitivity, Summary) plus a hidden LLM_Inferred tab storing raw LLM assumptions.
Dual DCF valuation: Perpetual Growth method (Gordon Growth Model) + Exit Multiple method (terminal EV/EBITDA) with blended fair value. Sensitivity matrices: WACC vs. Terminal Growth and Growth vs. Margin.
The system needs to generate Excel files (for human analysts) and use computed values programmatically (for downstream LLM agents). Rather than requiring Excel, a custom FormulaEvaluator (1,293 lines) interprets the same formulas that appear in the Excel tabs — resolving cell references, cross-tab references, arithmetic, and functions like SUMIFS. This ensures the workbook and JSON output are always consistent.
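A toy version of that idea — resolve cell and cross-tab references, then evaluate the remaining arithmetic — fits in a few lines. The real 1,293-line `FormulaEvaluator` also handles functions like SUMIFS; this sketch covers references and arithmetic only, and its use of `eval` is for illustration, not untrusted input:

```python
import re

def evaluate_formula(formula: str, cells: dict[str, float]) -> float:
    """Tiny illustrative evaluator: substitute references like 'B2' or
    'Assumptions!B2' with their numeric values, then evaluate the arithmetic.
    Not the repo's actual implementation."""
    expr = formula.lstrip("=")
    # Replace cell references (optionally tab-qualified) with literals.
    expr = re.sub(
        r"(?:[A-Za-z_]+!)?[A-Z]+[0-9]+",
        lambda m: repr(cells[m.group(0)]),
        expr,
    )
    # Evaluate with builtins stripped; sketch only, not a sandbox.
    return eval(expr, {"__builtins__": {}})
```

Because the Excel tabs and the JSON output are produced from the same formula strings, an evaluator like this is what keeps the two in lockstep without requiring Excel at runtime.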
6 sector-specific strategies: Generic DCF, SaaS (Rule of 40), REIT (FFO/AFFO), Bank (Excess Returns), Utility, Energy NAV — pluggable via Strategy pattern without code changes.
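The Strategy pattern described above might look like the following. Class names, the registry, and the numeric multiples are all illustrative assumptions; only the sector list and the Rule of 40 concept come from the text:

```python
from abc import ABC, abstractmethod

class ValuationStrategy(ABC):
    """Illustrative Strategy-pattern base class; names are assumptions."""
    @abstractmethod
    def fair_value(self, financials: dict) -> float: ...

class GenericDCFStrategy(ValuationStrategy):
    def fair_value(self, financials: dict) -> float:
        return financials["fcf"] * 15.0  # placeholder FCF multiple

class SaaSStrategy(ValuationStrategy):
    def fair_value(self, financials: dict) -> float:
        # Rule of 40: revenue growth % + FCF margin % >= 40 earns a premium.
        premium = financials["revenue_growth"] + financials["fcf_margin"] >= 40
        return financials["revenue"] * (10.0 if premium else 6.0)  # illustrative

REGISTRY: dict[str, type[ValuationStrategy]] = {
    "saas": SaaSStrategy,
    # "reit": REITStrategy, "bank": BankStrategy, ...  (omitted)
}

def strategy_for(sector: str) -> ValuationStrategy:
    """Unknown sectors fall back to the generic DCF; new sectors plug into
    the registry with no changes to calling code."""
    return REGISTRY.get(sector, GenericDCFStrategy)()
```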
News Intelligence Pipeline
A 3-stage autonomous pipeline:
| Stage | Module | Description |
|---|---|---|
| Scraping | `article_scraper.py` (747 lines) | AI-generated search queries across financial, management, industry, and competitive categories → Google News via SerpAPI → newspaper3k parsing |
| Filtering | `article_filter.py` (564 lines) | LLM batch scoring (0–10) against the investment thesis → MongoDB persistence |
| Screening | `article_screener.py` (815 lines) | Deep LLM analysis → structured `Catalyst`, `Risk`, and `Mitigation` dataclasses with confidence scores, direct quotes, source URLs, evidence chains, and timelines |
Database-aware caching: skips re-scraping when recent articles exist. If MongoDB is unavailable, falls back to local file storage.
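The caching logic can be sketched with the local-file fallback (the MongoDB path is analogous: query for documents newer than the cutoff before scraping). Function and parameter names here are illustrative assumptions:

```python
import json
import time
from pathlib import Path

def load_or_scrape(ticker: str, cache_dir: Path, scrape, max_age_s: float = 86400):
    """Sketch of database-aware caching: reuse recent cached articles,
    otherwise scrape fresh and persist. Names are assumptions, not the
    repo's actual API; `scrape` is an injected scraping callable."""
    path = cache_dir / f"{ticker}_articles.json"
    if path.exists() and time.time() - path.stat().st_mtime < max_age_s:
        return json.loads(path.read_text())  # recent cache hit: skip scraping
    articles = scrape(ticker)  # cache miss or stale entry: scrape fresh
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(articles))
    return articles
```

The second analysis of the same ticker within the freshness window then costs zero SerpAPI calls.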
Anti-Hallucination Safeguards
Embedded in all 33 externalized prompt templates: explicit "NEVER fabricate data" instructions, hard rules for number formatting and citation requirements, structured JSON output schemas, and source verification gates ("If you cannot find the source article, DO NOT include the claim").
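A citation-coverage gate like the one the validator enforces (≥95%, per the architecture section) could be as simple as the following. The evidence-ID format `[E1]` mirrors the IDs described earlier; the function itself is an illustrative sketch, not the repo's code:

```python
import re

def citation_coverage_ok(sentences: list[str], min_coverage: float = 0.95) -> bool:
    """Check that at least `min_coverage` of report sentences cite an
    evidence ID such as [E1], [E2], ... Illustrative sketch only."""
    if not sentences:
        return False
    cited = sum(1 for s in sentences if re.search(r"\[E\d+\]", s))
    return cited / len(sentences) >= min_coverage
```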
Experiment Results
Three experiments evaluating system performance, reproducibility, and output quality. These informed the production deployment.
Latency & Component Breakdown
| Component | Avg. Time | % of Total |
|---|---|---|
| News Analysis (scraping + filtering + screening) | 189.4s | 49.4% |
| Report Generator | 167.6s | 43.8% |
| Financial Data (Yahoo Finance) | 4.7s | 1.2% |
| Model Generation (9-tab DCF) | 5.2s | 1.3% |
| Supervisor Overhead | 16.2s | 4.2% |
| Total | ~383s (6.4 min) | 100% |
LLM-intensive operations account for ~93% of total execution time. Financial data collection and DCF model building are near-instantaneous (<10s combined). Supervisor routing overhead is ~4%, validating the lightweight orchestration design.
Reproducibility & Stability (9 runs)
| Ticker | Success Rate | Mean Duration | CV (σ/μ) | Reproducibility Score |
|---|---|---|---|---|
| NVDA | 100% | 384.5s | 0.016 | 0.985 |
| AAPL | 100% | 215.8s | 0.033 | 0.969 |
| MSFT | 100% | 195.6s | 0.035 | 0.965 |
Three paraphrased prompts for NVDA ("Analyze NVDA stock…", "Give me a comprehensive analysis of NVIDIA…", "What's your investment recommendation for NVDA?") all correctly extracted the ticker, triggered identical 4-agent workflows, and completed within a 13-second window (378.9s–391.7s). Overall stability score: 0.983.
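The CV column is the standard coefficient of variation over run durations. For reference, it can be computed as below (whether the experiment used sample or population standard deviation is an assumption; sample standard deviation is shown):

```python
import statistics

def coefficient_of_variation(durations: list[float]) -> float:
    """CV = sigma / mu over repeated run durations; lower means more
    stable run-to-run timing. Sample stdev is an assumption."""
    return statistics.stdev(durations) / statistics.mean(durations)
```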
Qualitative Case Studies
| Company | Articles | Catalysts | Risks | DCF Fair Value | Market Price | Implied Upside |
|---|---|---|---|---|---|---|
| META | 18 | 7 (90% top conf.) | 6 | $604.06 | $621.71 | −2.8% (fairly valued) |
| NVDA | 17 | 7 (90% top conf.) | 5 | $208.82 | $188.15 | +11.0% (undervalued) |
AAPL demonstrated the supervisor's ability to intelligently route simple queries to a single agent instead of running the full 4-agent pipeline — optimizing both cost and latency.
Design Patterns
| Pattern | Implementation | Why |
|---|---|---|
| Supervisor + Worker | LangGraph cyclical graph with conditional edges | LLM proposes routing; dependency resolver enforces valid sequencing |
| Blackboard State | `FinancialState` dataclass passed between all agents | Single source of truth; every agent reads/writes to shared state |
| Builder | Each Excel tab has a dedicated builder class | Tabs can be tested and modified independently |
| Deterministic Math + LLM Narrative | `RecommendationCalculator` → `EvidenceExtractor` → LLM → `RecommendationValidator` | Numbers are code; narrative is LLM; validator ensures integrity |
| Strategy | Pluggable DCF strategies (SaaS, REIT, Bank, Utility, Energy) | Sector-aware modeling without code changes |
| Prompt Externalization | 33 markdown templates in `prompts/` | Version-controlled, auditable, hot-swappable without deployments |
Deployment
- Containerized as `fuzanwenn/stock-analyst:latest` (~975 MB, `python:3.11-slim` base)
- Multi-arch Docker builds (linux/amd64 + linux/arm64)
- Spawned as ephemeral containers by the API layer via Docker-in-Docker
- Complete process isolation — memory leaks in one analysis never affect others
Known Limitations
- Agent execution is strictly sequential; parallel execution is architecturally possible but not yet implemented
- The engine itself does not stream intermediate agent reasoning; progress streaming to the user is handled at the API layer via SSE
- System throughput: ~10 full analyses per hour (sequential execution)
Tech Stack
Python 3.11 · LangGraph · OpenAI API · Anthropic API · MongoDB · Docker · yfinance · openpyxl · SerpAPI · newspaper3k