Forecast monthly profit and revenue for a gift-hamper business. The stack combines company sales data, Firecrawl market scraping, a TensorFlow LSTM model, and Grafana dashboards — all runnable with Docker Compose.
Built for seasonal occasions (Christmas, Easter, corporate gifting) where demand spikes are predictable but hard to plan without both internal history and external market signals.
- Overview
- Architecture
- Data flow
- Project structure
- Prerequisites
- Quick start
- Configuration
- API reference
- Database schema
- Operator workflow
- Local development
- Troubleshooting
- License
| Layer | Technology | Role |
|---|---|---|
| Ingestion | FastAPI, CSV/JSON | Company monthly sales, revenue, profit |
| Market intel | Firecrawl | Scrape/search competitor and trend pages |
| Storage | PostgreSQL 16 | Time series, documents, forecasts |
| ML | TensorFlow/Keras LSTM | 6-month profit + revenue forecast |
| Viz | Grafana | Actual vs forecast dashboards |
Key behaviours
- Monthly grain aligned to retail seasonality
- Multivariate LSTM (sales, revenue, profit + market keyword features)
- Seasonal naive fallback when history is too short (< 18 months by default)
- Scheduled scrape, retrain, and forecast jobs via APScheduler
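The fallback rule above can be sketched in a few lines. This is a minimal illustration of a seasonal naive forecast, not the code in `ml/`: the function names `seasonal_naive_forecast` and `choose_method` are hypothetical, and the assumption is that history is an ordered list of monthly values for one metric.

```python
from typing import List

MIN_TRAINING_ROWS = 18  # default threshold listed in the configuration table
SEASON_LENGTH = 12      # months in one seasonal cycle


def seasonal_naive_forecast(history: List[float], horizon: int = 6) -> List[float]:
    """Repeat the value seen in the same calendar month one year earlier."""
    if len(history) < SEASON_LENGTH:
        raise ValueError("need at least 12 months of history")
    last_season = history[-SEASON_LENGTH:]
    # Step i of the forecast reuses month i of the most recent 12-month cycle.
    return [last_season[i % SEASON_LENGTH] for i in range(horizon)]


def choose_method(n_rows: int, min_rows: int = MIN_TRAINING_ROWS) -> str:
    """Hypothetical selector: LSTM only when enough monthly rows exist."""
    return "lstm" if n_rows >= min_rows else "seasonal_naive"
```

With 14 months of history, `choose_method(14)` returns `"seasonal_naive"`, and the forecast for the next six months simply replays the values from the same months of the previous year.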
flowchart TB
subgraph clients [Clients]
Ops[Operator / CSV upload]
GFUI[Grafana UI]
end
subgraph ingest [Ingestion]
FC[Firecrawl API]
CSV[CSV or JSON uploads]
end
subgraph app [Python app container]
API[FastAPI]
SCH[APScheduler]
ETL[Market feature builder]
TRAIN[LSTM training]
INF[Forecast inference]
end
subgraph store [Data layer]
PG[(PostgreSQL)]
MV[(Model volume)]
end
subgraph viz [Visualization]
GF[Grafana]
end
Ops --> API
CSV --> API
API --> PG
SCH --> FC
FC --> ETL
ETL --> PG
PG --> TRAIN
TRAIN --> MV
TRAIN --> PG
PG --> INF
MV --> INF
INF --> PG
PG --> GF
GFUI --> GF
sequenceDiagram
participant Op as Operator
participant API as FastAPI
participant FC as Firecrawl
participant PG as PostgreSQL
participant ML as LSTM pipeline
participant GF as Grafana
Op->>API: POST /ingest/company-monthly/csv
API->>PG: Upsert company_monthly
Op->>API: POST /jobs/scrape-market
API->>FC: Search + scrape sources
FC-->>API: Markdown documents
API->>PG: market_documents + market_features_monthly
Op->>API: POST /jobs/retrain
API->>PG: Load time series
ML->>PG: Save model_run + artifact
Op->>API: POST /jobs/forecast
ML->>PG: Write forecasts
GF->>PG: SQL queries for dashboards
Op->>GF: View profit vs forecast
.
├── config/
│ └── market_sources.yaml # Firecrawl queries and seed URLs
├── data/
│ └── sample_company_monthly.csv
├── grafana/provisioning/ # Datasource + dashboard as code
├── ml/ # Dataset, LSTM model, train/infer
├── sql/migrations/ # Postgres init schema
├── src/
│ ├── main.py # FastAPI + scheduler entrypoint
│ ├── routers/ # Ingest and job endpoints
│ └── services/ # Firecrawl client, feature aggregation
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── .env.example
└── LICENSE
- Docker Desktop (or Docker Engine + Compose v2)
- Optional: Firecrawl API key for market scraping
- Ports available: 3000 (Grafana), 8001 (API — mapped from container 8000)
Security: Never commit `.env` or API keys. If a key was exposed, rotate it in the Firecrawl dashboard before use.
git clone <your-repo-url>
cd Lstm
cp .env.example .env

Edit `.env` and set FIRECRAWL_API_KEY (optional for testing without scraping).

docker compose up --build -d

| Service | URL | Default credentials |
|---|---|---|
| API | http://localhost:8001 | — |
| API docs (Swagger) | http://localhost:8001/docs | — |
| Grafana | http://localhost:3000 | admin / admin |
Postgres runs on the internal Docker network only (not exposed to the host by default).
curl -X POST "http://localhost:8001/ingest/company-monthly/csv" \
-F "file=@data/sample_company_monthly.csv"
curl -X POST http://localhost:8001/jobs/retrain
curl -X POST http://localhost:8001/jobs/forecast

Optional: refresh market signals (requires a Firecrawl key):

curl -X POST http://localhost:8001/jobs/scrape-market

Open Grafana → folder Hamper Forecast → dashboard Hamper Profit Forecast.
Copy .env.example to .env:
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql://hamper:...@postgres:5432/hamper_forecast` | SQLAlchemy connection string |
| `FIRECRAWL_API_KEY` | — | Firecrawl Bearer token |
| `GRAFANA_ADMIN_USER` | `admin` | Grafana login |
| `GRAFANA_ADMIN_PASSWORD` | `admin` | Change in production |
| `MODEL_DIR` | `/models` | Path for saved `.keras` model |
| `LSTM_WINDOW` | `12` | Months of history per training window |
| `FORECAST_HORIZON` | `6` | Months to predict ahead |
| `MIN_TRAINING_ROWS` | `18` | Minimum monthly rows before LSTM (else fallback) |
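As a rough illustration of how the numeric settings in the table could be read with their documented defaults, here is a small sketch. The `Settings` dataclass and `load_settings` helper are hypothetical names, not the project's actual configuration code; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    model_dir: str
    lstm_window: int
    forecast_horizon: int
    min_training_rows: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to table defaults."""
    return Settings(
        model_dir=env.get("MODEL_DIR", "/models"),
        lstm_window=int(env.get("LSTM_WINDOW", "12")),
        forecast_horizon=int(env.get("FORECAST_HORIZON", "6")),
        min_training_rows=int(env.get("MIN_TRAINING_ROWS", "18")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in a unit test, for example `load_settings({"LSTM_WINDOW": "24"})`.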
Market scrape targets live in config/market_sources.yaml (search queries, competitor URLs, keyword list).
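The keyword list feeds the monthly market features. The real aggregation lives in `src/services` and is not shown here, so the following is only a guess at its general shape: it counts case-insensitive keyword hits per scraped document and sums them per month. The helper names and the `(period, markdown)` input shape are assumptions; the output keys mirror the `doc_count`, `gift_hits`, and `hamper_hits` columns of `market_features_monthly`.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_hits(text: str, keyword: str) -> int:
    """Count case-insensitive whole-word matches, allowing a plural 's'."""
    pattern = rf"\b{re.escape(keyword)}s?\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def aggregate_monthly(docs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Sum per-document counts by month; docs are (period 'YYYY-MM', markdown)."""
    out: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"doc_count": 0, "gift_hits": 0, "hamper_hits": 0}
    )
    for period, markdown in docs:
        row = out[period]
        row["doc_count"] += 1
        row["gift_hits"] += count_hits(markdown, "gift")
        row["hamper_hits"] += count_hits(markdown, "hamper")
    return dict(out)
```

How `avg_keyword_score` is derived is not documented in this README, so it is deliberately left out of the sketch.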
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check |
| POST | `/ingest/company-monthly` | JSON batch of monthly rows |
| POST | `/ingest/company-monthly/csv` | CSV file upload |
| POST | `/ingest/products` | Product lineup metadata |
| POST | `/jobs/scrape-market` | Firecrawl scrape + feature aggregation |
| POST | `/jobs/aggregate-market-features` | Recompute monthly market features |
| POST | `/jobs/retrain` | Train LSTM and record model_runs |
| POST | `/jobs/forecast` | Write profit/revenue forecasts |
period,sales_volume,revenue,profit,currency,notes
2024-01-01,440,13200,3300,GBP,January

Supported date formats: YYYY-MM-DD, YYYY-MM, MM/YYYY, DD/MM/YYYY.
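One way to accept all four date formats is to try them in order with `datetime.strptime`. The sketch below is an assumption about how such parsing could work, not the project's parser: `parse_period` is a hypothetical name, and normalising every value to the first day of the month is a guess based on the monthly grain and the `2024-01-01` sample row.

```python
from datetime import date, datetime

# Formats listed in the README, tried in order.
_FORMATS = ("%Y-%m-%d", "%Y-%m", "%m/%Y", "%d/%m/%Y")


def parse_period(raw: str) -> date:
    """Parse a period string and snap it to the first day of its month."""
    text = raw.strip()
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next supported format
        return date(parsed.year, parsed.month, 1)
    raise ValueError(f"unsupported period format: {raw!r}")
```

Slash-separated dates with three parts are read as day first, so `15/04/2024` becomes April 2024; a US-style `04/15/2024` would be rejected rather than silently misread.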
erDiagram
company_monthly {
date period PK
numeric sales_volume
numeric revenue
numeric profit
varchar currency
}
products {
int id PK
varchar name
varchar category
}
market_documents {
int id PK
text source_url
timestamptz scraped_at
text markdown
varchar query_tag
}
market_features_monthly {
date period PK
int doc_count
numeric avg_keyword_score
int gift_hits
int hamper_hits
}
model_runs {
int id PK
timestamptz started_at
jsonb metrics
text artifact_path
}
forecasts {
int id PK
timestamptz generated_at
date target_month
varchar metric
numeric point_estimate
int model_run_id FK
}
model_runs ||--o{ forecasts : produces
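To show how the `forecasts` and `company_monthly` tables relate, here is a self-contained sketch of an "actual vs forecast" query. It runs against in-memory SQLite purely so the example needs no database server; the project itself uses PostgreSQL, and the Grafana dashboard queries are not reproduced here. The metric value `'profit'`, the choice of "latest run = highest `model_runs.id`", and all inserted numbers are illustrative assumptions.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a trimmed copy of the schema with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE company_monthly (period TEXT PRIMARY KEY, profit REAL);
        CREATE TABLE model_runs (id INTEGER PRIMARY KEY, started_at TEXT);
        CREATE TABLE forecasts (
            id INTEGER PRIMARY KEY,
            target_month TEXT,
            metric TEXT,
            point_estimate REAL,
            model_run_id INTEGER REFERENCES model_runs(id)
        );
        INSERT INTO company_monthly VALUES ('2025-01-01', 3300);
        INSERT INTO model_runs VALUES (1, '2024-12-01'), (2, '2025-01-01');
        INSERT INTO forecasts (target_month, metric, point_estimate, model_run_id)
        VALUES ('2025-01-01', 'profit', 3000, 1),
               ('2025-01-01', 'profit', 3400, 2),
               ('2025-02-01', 'profit', 2900, 2),
               ('2025-01-01', 'revenue', 13000, 2);
        """
    )
    return conn


def latest_profit_vs_actual(conn: sqlite3.Connection) -> list:
    """Join the newest run's profit forecasts to actuals (NULL if not yet known)."""
    query = """
        SELECT f.target_month, c.profit, f.point_estimate
        FROM forecasts AS f
        LEFT JOIN company_monthly AS c ON c.period = f.target_month
        WHERE f.metric = 'profit'
          AND f.model_run_id = (SELECT MAX(id) FROM model_runs)
        ORDER BY f.target_month
    """
    return conn.execute(query).fetchall()
```

The `LEFT JOIN` keeps future months in the result even though no actual value exists for them yet, which is what a forecast chart needs.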
flowchart LR
A[Upload monthly CSV] --> B{Need market refresh?}
B -->|Yes| C[POST /jobs/scrape-market]
B -->|No| D[POST /jobs/retrain]
C --> D
D --> E[POST /jobs/forecast]
E --> F[Review Grafana dashboard]
F --> G[Adjust market_sources.yaml]
G --> C
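The workflow above reduces to an ordered list of job calls. The sketch below builds those requests with the standard library without sending them; `plan_job_requests` is a hypothetical helper, and the base URL assumes the default port mapping from the quick start.

```python
from typing import List
from urllib.request import Request

BASE_URL = "http://localhost:8001"  # default API address from the quick start


def plan_job_requests(need_market_refresh: bool, base_url: str = BASE_URL) -> List[Request]:
    """Return the POST requests for one operator cycle, in execution order."""
    paths = []
    if need_market_refresh:
        paths.append("/jobs/scrape-market")  # optional; needs a Firecrawl key
    paths += ["/jobs/retrain", "/jobs/forecast"]
    return [Request(base_url + path, method="POST") for path in paths]
```

With the stack running, each request can be passed to `urllib.request.urlopen` in turn. Whether the job endpoints block until completion is not stated in this README, so a real script may need to poll or wait between steps.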
| Schedule | Job |
|---|---|
| Daily 02:00 | Market scrape + feature aggregation |
| 1st of month 03:00 | Retrain LSTM + generate forecast |
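To make the two schedules concrete, the following sketch computes when each job would next fire from a given moment. It does not use APScheduler and is not the project's scheduler code; the function names are hypothetical, and the timezone of the schedule is not documented, so naive datetimes are assumed.

```python
from datetime import datetime, timedelta


def next_daily_scrape(now: datetime) -> datetime:
    """Next 02:00 strictly after `now`."""
    candidate = now.replace(hour=2, minute=0, second=0, microsecond=0)
    return candidate if candidate > now else candidate + timedelta(days=1)


def next_monthly_retrain(now: datetime) -> datetime:
    """Next 03:00 on the 1st of a month, strictly after `now`."""
    candidate = now.replace(day=1, hour=3, minute=0, second=0, microsecond=0)
    if candidate > now:
        return candidate
    # Roll over to the first day of the following month.
    if now.month == 12:
        return candidate.replace(year=now.year + 1, month=1)
    return candidate.replace(month=now.month + 1)
```

On the 1st of a month the scrape at 02:00 runs an hour before the retrain at 03:00, so the monthly model can see that day's market features, assuming the scrape finishes within the hour.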
- Window: 12 months (`LSTM_WINDOW`)
- Horizon: 6 months (`FORECAST_HORIZON`)
- Fallback: seasonal naive from last 12 months when data or trained model is unavailable
- Artifacts: persisted in Docker volume `modelstore` at `/models`
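The window and horizon settings imply a sliding-window dataset. Below is a minimal sketch of that technique in plain Python; `make_windows` is a hypothetical name, and it assumes the model predicts all horizon months at once from each window, which this README does not confirm.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]  # one month of features, e.g. sales, revenue, profit


def make_windows(
    rows: Sequence[Row], window: int = 12, horizon: int = 6
) -> Tuple[List[List[Row]], List[List[Row]]]:
    """Slice ordered monthly rows into (input window, target horizon) pairs."""
    inputs: List[List[Row]] = []
    targets: List[List[Row]] = []
    # Each sample needs `window` input months followed by `horizon` target months.
    for start in range(len(rows) - window - horizon + 1):
        inputs.append(list(rows[start:start + window]))
        targets.append(list(rows[start + window:start + window + horizon]))
    return inputs, targets
```

Under these assumptions, 24 monthly rows yield 7 training samples and 18 rows yield exactly one, since 12 + 6 = 18. That matches the default `MIN_TRAINING_ROWS`, although the README does not say this is why the threshold was chosen.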
- Profit — actual vs forecast (time series)
- Latest forecast table (profit + revenue)
- Market documents per month
- Revenue — actual vs forecast
Early versions had no test suite, so pytest reported "collected 0 items". Tests are now organized by marker:
| Marker | Requires | What it covers |
|---|---|---|
| `unit` | Nothing extra | Date parsing, dataset helpers, Firecrawl error handling |
| `integration` | PostgreSQL on port 5433 | API ingest, jobs, market features |
| `ml` | PostgreSQL + TensorFlow | Full CSV → train → LSTM forecast pipeline |
docker compose up -d postgres
docker compose --profile test run --rm test

Or on Windows:

.\scripts\test.ps1

Unit tests need nothing beyond the dev dependencies:

pip install -r requirements-dev.txt
pytest -m unit

Start Postgres with the exposed test port, then:

pip install -r requirements.txt -r requirements-dev.txt
# Windows (use export instead of set on macOS / Linux)
set TEST_DATABASE_URL=postgresql://hamper:hamper_secret@localhost:5433/hamper_forecast_test
pytest -m "integration or ml"

Note: ML tests need TensorFlow (included in the Docker image). Local Python 3.13 may not support TensorFlow; use Docker for the full suite.
Without Docker (Postgres must be reachable separately):
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install -r requirements.txt
# Point at local or containerized Postgres
# Windows
set DATABASE_URL=postgresql://hamper:hamper_secret@localhost:5432/hamper_forecast
# macOS / Linux
export DATABASE_URL=postgresql://hamper:hamper_secret@localhost:5432/hamper_forecast
python -m src.main

Run only database + Grafana via Docker:

docker compose up postgres grafana -d

| Issue | Fix |
|---|---|
| pytest collects 0 items | Tests live under tests/; run from the repo root, or use the Docker command above for the full suite |
| ModuleNotFoundError locally | Install deps with Docker, or pip install -r requirements.txt -r requirements-dev.txt on Python 3.11 |
| Empty Grafana charts | Ingest data, then POST /jobs/forecast |
| LSTM training skipped | Need ≥ 18 monthly rows; sample CSV has 24 |
| Firecrawl errors | Check API key, quota, and URLs in market_sources.yaml |
| Port 8001 in use | Change "8001:8000" in docker-compose.yml |
| Postgres connection failed | Use host postgres inside Docker, localhost when running app on host |
This project is licensed under the MIT License.