Releases: Agentic-Analyst/stock-analyst

v1.0.0 — Production-Grade Multi-Agent Equity Research Engine

01 Apr 05:19
1582df1

Autonomous equity research — from raw financial data to institutional-quality analyst reports — in under 7 minutes. A traditional analyst workflow takes 6–12 hours per report. This system runs end-to-end without human intervention and has been battle-tested across hundreds of analyses on production infrastructure at vynnai.com.

| Step | Traditional Analyst | This System |
| --- | --- | --- |
| Financial data collection | 30–60 min | < 5 seconds |
| DCF model (9-tab Excel) | 2–4 hours | < 10 seconds |
| News research (17–18 articles) | 1–2 hours | ~3 minutes |
| Professional analyst report | 3–6 hours | ~3 minutes |
| **Total** | **6–12 hours** | **< 7 minutes** |

Architecture

Supervisor-worker architecture on LangGraph's cyclical state graph. An LLM-powered supervisor classifies user intent (COMPREHENSIVE / MODEL_ONLY / QUICK_NEWS / CUSTOM), extracts tickers from natural language, and routes to specialized agents with strict dependency enforcement. If LLM routing fails, a deterministic rule-based fallback takes over via _resolve_dependencies() — no silent failures.

Dependency chain: financial_data_agent → model_generation_agent → news_analysis_agent → report_generator_agent

The supervisor enforces this ordering even if the LLM suggests otherwise. Objective-driven early termination allows MODEL_ONLY workflows to stop after model + summary, and QUICK_NEWS to stop after news + summary.


Technical Highlights

The LLM Never Invents Numbers

The core design principle. All financial calculations — expected returns, price targets, rating bands, sensitivity matrices — are computed deterministically in RecommendationCalculator. The LLM only writes the narrative explanation. A 3-layer architecture ensures integrity:

  1. RecommendationCalculator (deterministic) — pure Python math with sector-aware premiums, volatility caps, and time decay. Outputs FixedNumbers (immutable, auditable).
  2. EvidenceExtractor → LLM Explainer — builds an evidence pack with unique IDs (E1, E2, …), source quality scoring (primary > tier-1 > syndication), then prompts the LLM to write narrative using only the provided numbers and evidence.
  3. RecommendationValidator — regex-based number verification against FixedNumbers, citation coverage check (≥95% required), auto-correction of LLM number deviations. Blocks publication if validation fails.

Rating bands: STRONG BUY (>20%) · BUY (10–20%) · HOLD (−5% to +10%) · SELL (−20% to −5%) · STRONG SELL (<−20%).
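The band thresholds map directly to a deterministic lookup; boundary handling (`>=` vs. `>`) is an assumption not specified in the release notes:

```python
def rating_from_expected_return(expected_return_pct: float) -> str:
    """Map a deterministically computed expected return (%) to a rating band.

    Thresholds mirror the published bands; inclusive/exclusive boundaries
    are assumed.
    """
    if expected_return_pct > 20:
        return "STRONG BUY"
    if expected_return_pct >= 10:
        return "BUY"
    if expected_return_pct >= -5:
        return "HOLD"
    if expected_return_pct >= -20:
        return "SELL"
    return "STRONG SELL"
```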

9-Tab DCF with Custom Formula Evaluator

The Financial Model Agent generates a banker-grade Excel workbook with 9 tabs (Raw, Keys_Map, Assumptions, Historical, Projections, Valuation Perpetual Growth, Valuation Exit Multiple, Sensitivity, Summary) plus a hidden LLM_Inferred tab storing raw LLM assumptions.

Dual DCF valuation: Perpetual Growth method (Gordon Growth Model) + Exit Multiple method (terminal EV/EBITDA) with blended fair value. Sensitivity matrices: WACC vs. Terminal Growth and Growth vs. Margin.
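The two terminal-value methods reduce to standard formulas. This sketch uses the textbook Gordon Growth and EV/EBITDA expressions; the 50/50 blend weight is an assumption, not the repository's actual calibration:

```python
def terminal_values(fcf_final: float, wacc: float, g: float,
                    ebitda_final: float, exit_multiple: float) -> tuple[float, float]:
    """Terminal values for the two DCF methods (illustrative)."""
    tv_perpetual = fcf_final * (1 + g) / (wacc - g)  # Gordon Growth Model
    tv_exit = ebitda_final * exit_multiple           # terminal EV/EBITDA
    return tv_perpetual, tv_exit

# Example inputs (hypothetical, in $M): final-year FCF 100, WACC 9%,
# terminal growth 2.5%, final-year EBITDA 180, exit multiple 12x.
tv_pg, tv_em = terminal_values(100.0, 0.09, 0.025, 180.0, 12.0)
blended = 0.5 * tv_pg + 0.5 * tv_em  # equal weighting assumed
```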

The system needs to generate Excel files (for human analysts) and use computed values programmatically (for downstream LLM agents). Rather than requiring Excel, a custom FormulaEvaluator (1,293 lines) interprets the same formulas that appear in the Excel tabs — resolving cell references, cross-tab references, arithmetic, and functions like SUMIFS. This ensures the workbook and JSON output are always consistent.
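A heavily simplified illustration of the idea, assuming nothing about the real evaluator's internals: substitute cell references with stored values, then evaluate the remaining arithmetic. The actual 1,293-line FormulaEvaluator additionally handles functions like SUMIFS and full cross-tab resolution:

```python
import re

def evaluate_formula(formula: str, cells: dict[str, float]) -> float:
    """Toy Excel-style formula evaluation against a cell store (sketch only)."""
    expr = formula.lstrip("=")
    # Replace references like 'Assumptions!B2' or 'B3' with their values.
    expr = re.sub(
        r"(?:[A-Za-z_]+!)?[A-Z]{1,2}\d+",
        lambda m: str(cells[m.group(0)]),
        expr,
    )
    # Arithmetic only in this sketch; builtins are stripped from eval's scope.
    return eval(expr, {"__builtins__": {}})

cells = {"Assumptions!B2": 0.08, "B3": 100.0}
fv = evaluate_formula("=B3*(1+Assumptions!B2)", cells)  # ≈ 108.0
```

Evaluating the same formula strings that are written into the workbook is what keeps the Excel and JSON outputs consistent by construction.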

6 sector-specific strategies: Generic DCF, SaaS (Rule of 40), REIT (FFO/AFFO), Bank (Excess Returns), Utility, Energy NAV — pluggable via Strategy pattern without code changes.
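The pluggable registration might be structured like this; class names, the registry, and the SaaS multiples are illustrative assumptions, not the repo's actual API:

```python
from abc import ABC, abstractmethod

class ValuationStrategy(ABC):
    @abstractmethod
    def fair_value(self, financials: dict) -> float: ...

# Registry populated at import time; adding a sector is a new class, not an edit.
STRATEGIES: dict[str, type[ValuationStrategy]] = {}

def register(sector: str):
    def wrap(cls):
        STRATEGIES[sector] = cls
        return cls
    return wrap

@register("SaaS")
class SaaSStrategy(ValuationStrategy):
    def fair_value(self, financials: dict) -> float:
        # Rule of 40: revenue growth % + FCF margin %; here it only picks
        # a revenue multiple (the 12x/8x values are hypothetical).
        rule_of_40 = financials["growth_pct"] + financials["fcf_margin_pct"]
        multiple = 12.0 if rule_of_40 >= 40 else 8.0
        return financials["revenue"] * multiple / financials["shares"]

strategy = STRATEGIES["SaaS"]()
```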

News Intelligence Pipeline

A 3-stage autonomous pipeline:

| Stage | Module | Description |
| --- | --- | --- |
| Scraping | `article_scraper.py` (747 lines) | AI-generated search queries across financial, management, industry, and competitive categories → Google News via SerpAPI → newspaper3k parsing |
| Filtering | `article_filter.py` (564 lines) | LLM batch scoring (0–10) against investment thesis → MongoDB persistence |
| Screening | `article_screener.py` (815 lines) | Deep LLM analysis → structured Catalyst, Risk, Mitigation dataclasses with confidence scores, direct quotes, source URLs, evidence chains, and timelines |

Database-aware caching: skips re-scraping when recent articles exist. If MongoDB is unavailable, falls back to local file storage.

Anti-Hallucination Safeguards

Embedded in all 33 externalized prompt templates: explicit "NEVER fabricate data" instructions, hard rules for number formatting and citation requirements, structured JSON output schemas, and source verification gates ("If you cannot find the source article, DO NOT include the claim").


Experiment Results

Three experiments evaluating system performance, reproducibility, and output quality. These informed the production deployment.

Latency & Component Breakdown

| Component | Avg. Time | % of Total |
| --- | --- | --- |
| News Analysis (scraping + filtering + screening) | 189.4s | 49.4% |
| Report Generator | 167.6s | 43.8% |
| Financial Data (Yahoo Finance) | 4.7s | 1.2% |
| Model Generation (9-tab DCF) | 5.2s | 1.3% |
| Supervisor Overhead | 16.2s | 4.2% |
| **Total** | **~383s (6.4 min)** | **100%** |

LLM-intensive operations account for ~93% of total execution time. Financial data collection and DCF model building are near-instantaneous (<10s combined). Supervisor routing overhead is ~4%, validating the lightweight orchestration design.

Reproducibility & Stability (9 runs)

| Ticker | Success Rate | Mean Duration | CV (σ/μ) | Reproducibility Score |
| --- | --- | --- | --- | --- |
| NVDA | 100% | 384.5s | 0.016 | 0.985 |
| AAPL | 100% | 215.8s | 0.033 | 0.969 |
| MSFT | 100% | 195.6s | 0.035 | 0.965 |

Three paraphrased prompts for NVDA ("Analyze NVDA stock…", "Give me a comprehensive analysis of NVIDIA…", "What's your investment recommendation for NVDA?") all correctly extracted the ticker, triggered identical 4-agent workflows, and completed within a 13-second window (378.9s–391.7s). Overall stability score: 0.983.
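The coefficient of variation is simply σ/μ over the run durations; the reproducibility score is not defined in the release notes, but 1 − CV is an assumption consistent with the table above:

```python
import statistics

def stability_metrics(durations_s: list[float]) -> tuple[float, float]:
    """Coefficient of variation (sample σ / μ) and an assumed 1 − CV score."""
    mu = statistics.mean(durations_s)
    sigma = statistics.stdev(durations_s)
    cv = sigma / mu
    return cv, 1.0 - cv
```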

Qualitative Case Studies

| Company | Articles | Catalysts | Risks | DCF Fair Value | Market Price | Implied Upside |
| --- | --- | --- | --- | --- | --- | --- |
| META | 18 | 7 (90% top conf.) | 6 | $604.06 | $621.71 | −2.8% (fairly valued) |
| NVDA | 17 | 7 (90% top conf.) | 5 | $208.82 | $188.15 | +11.0% (undervalued) |

AAPL demonstrated the supervisor's ability to intelligently route simple queries to a single agent instead of running the full 4-agent pipeline — optimizing both cost and latency.


Design Patterns

| Pattern | Implementation | Why |
| --- | --- | --- |
| Supervisor + Worker | LangGraph cyclical graph with conditional edges | LLM proposes routing; dependency resolver enforces valid sequencing |
| Blackboard State | FinancialState dataclass passed between all agents | Single source of truth; every agent reads/writes shared state |
| Builder | Each Excel tab has a dedicated builder class | Tabs can be tested and modified independently |
| Deterministic Math + LLM Narrative | RecommendationCalculator → EvidenceExtractor → LLM → RecommendationValidator | Numbers are code; narrative is LLM; validator ensures integrity |
| Strategy | Pluggable DCF strategies (SaaS, REIT, Bank, Utility, Energy) | Sector-aware modeling without code changes |
| Prompt Externalization | 33 markdown templates in prompts/ | Version-controlled, auditable, hot-swappable without deployments |

Deployment

  • Containerized as fuzanwenn/stock-analyst:latest (~975 MB, python:3.11-slim base)
  • Multi-arch Docker builds (linux/amd64 + linux/arm64)
  • Spawned as ephemeral containers by the API layer via Docker-in-Docker
  • Complete process isolation — memory leaks in one analysis never affect others

Known Limitations

  • Agent execution is strictly sequential; parallel execution is architecturally possible but not yet implemented
  • No streaming of intermediate agent reasoning from the engine itself (user-facing streaming is handled at the API layer via SSE)
  • System throughput: ~10 full analyses per hour (sequential execution)

Tech Stack

Python 3.11 · LangGraph · OpenAI API · Anthropic API · MongoDB · Docker · yfinance · openpyxl · SerpAPI · newspaper3k