A production-ready FastAPI template for building AI agent applications with LangGraph integration. This template provides a robust foundation for building scalable, secure, and maintainable AI agent services.
## Production-Ready Architecture
- FastAPI for high-performance async API endpoints with uvloop optimization
- LangGraph integration for AI agent workflows with state persistence
- Langfuse for LLM observability and monitoring
- Structured logging with environment-specific formatting and request context
- Rate limiting with configurable rules per endpoint
- PostgreSQL with pgvector for data persistence and vector storage
- Docker and Docker Compose support
- Prometheus metrics and Grafana dashboards for monitoring
## AI & LLM Features
- Long-term memory with mem0ai and pgvector for semantic memory storage
- LLM Service with automatic retry logic using tenacity
- Multiple LLM model support (GPT-4o, GPT-4o-mini, GPT-5, GPT-5-mini, GPT-5-nano)
- Streaming responses for real-time chat interactions
- Tool calling and function execution capabilities
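The streaming bullet above can be sketched with a plain generator that yields Server-Sent-Events-style frames. This is a minimal illustration of the pattern, not the template's actual endpoint code; the framing and sentinel value are assumptions:

```python
from typing import Iterator

def stream_reply(tokens: list[str]) -> Iterator[str]:
    """Yield SSE-style chunks, one token at a time."""
    for token in tokens:
        yield f"data: {token}\n\n"  # SSE frame: "data: <payload>" + blank line
    yield "data: [DONE]\n\n"        # sentinel so the client knows the stream ended

chunks = list(stream_reply(["Hello", "world"]))
```

A FastAPI route would wrap such a generator in a `StreamingResponse` so the client receives tokens as they are produced instead of waiting for the full completion.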
## Security
- JWT-based authentication
- Session management
- Input sanitization
- CORS configuration
- Rate limiting protection
## Developer Experience
- Environment-specific configuration with automatic .env file loading
- Comprehensive logging system with context binding
- Clear project structure following best practices
- Type hints throughout for better IDE support
- Easy local development setup with Makefile commands
- Automatic retry logic with exponential backoff for resilience
## Model Evaluation Framework
- Automated metric-based evaluation of model outputs
- Integration with Langfuse for trace analysis
- Detailed JSON reports with success/failure metrics
- Interactive command-line interface
- Customizable evaluation metrics
- Python 3.13+
- PostgreSQL (see Database setup)
- Docker and Docker Compose (optional)
- Clone the repository:

```bash
git clone <repository-url>
cd <project-directory>
```

- Create and activate a virtual environment:

```bash
uv sync
```

- Copy the example environment file:

```bash
cp .env.example .env.[development|staging|production]  # e.g. .env.development
```

- Update the `.env` file with your configuration (see `.env.example` for reference)
- Create a PostgreSQL database (e.g. Supabase or local PostgreSQL)
- Update the database connection settings in your `.env` file:

```bash
POSTGRES_HOST=db
POSTGRES_PORT=5432
POSTGRES_DB=cool_db
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
```

- You don't have to create the tables manually; the ORM handles that for you. If you run into any issues, run the `schemas.sql` file to create the tables yourself.
- Install dependencies:

```bash
uv sync
```

- Run the application:

```bash
make [dev|staging|prod]  # e.g. make dev
```

- Go to the Swagger UI: `http://localhost:8000/docs`
- Build and run with Docker Compose:

```bash
make docker-build-env ENV=[development|staging|production]  # e.g. make docker-build-env ENV=development
make docker-run-env ENV=[development|staging|production]  # e.g. make docker-run-env ENV=development
```

- Access the monitoring stack:

```bash
# Prometheus metrics
http://localhost:9090

# Grafana dashboards
http://localhost:3000
```

Default credentials:

- Username: admin
- Password: admin

The Docker setup includes:
- FastAPI application
- PostgreSQL database
- Prometheus for metrics collection
- Grafana for metrics visualization
- Pre-configured dashboards for:
  - API performance metrics
  - Rate limiting statistics
  - Database performance
  - System resource usage
The project includes a robust evaluation framework for measuring and tracking model performance over time. The evaluator automatically fetches traces from Langfuse, applies evaluation metrics, and generates detailed reports.
You can run evaluations with different options using the provided Makefile commands:

```bash
# Interactive mode with step-by-step prompts
make eval [ENV=development|staging|production]

# Quick mode with default settings (no prompts)
make eval-quick [ENV=development|staging|production]

# Evaluation without report generation
make eval-no-report [ENV=development|staging|production]
```

- Interactive CLI: User-friendly interface with colored output and progress bars
- Flexible Configuration: Set default values or customize at runtime
- Detailed Reports: JSON reports with comprehensive metrics, including:
  - Overall success rate
  - Metric-specific performance
  - Duration and timing information
  - Trace-level success/failure details
Evaluation metrics are defined in `evals/metrics/prompts/` as markdown files:

- Create a new markdown file (e.g., `my_metric.md`) in the prompts directory
- Define the evaluation criteria and scoring logic
- The evaluator will automatically discover and apply your new metric
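The automatic discovery step above amounts to scanning the prompts directory for markdown files. A minimal sketch of that idea, using a throwaway directory in place of `evals/metrics/prompts/` (the function name and return shape are assumptions, not the evaluator's real API):

```python
import tempfile
from pathlib import Path

def discover_metrics(prompts_dir: Path) -> dict[str, str]:
    """Map metric name (file stem) -> prompt text for every .md file found."""
    return {p.stem: p.read_text() for p in sorted(prompts_dir.glob("*.md"))}

# Demo against a temporary directory standing in for evals/metrics/prompts/
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "correctness.md").write_text("Score 1 if the answer is factually correct.")
    metrics = discover_metrics(Path(d))
```

Because discovery is name-based, dropping a new `my_metric.md` into the directory is all that is needed for it to be picked up on the next run.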
Reports are automatically generated in the `evals/reports/` directory with timestamps in the filename:

```
evals/reports/evaluation_report_YYYYMMDD_HHMMSS.json
```
Each report includes:
- High-level statistics (total trace count, success rate, etc.)
- Per-metric performance metrics
- Detailed trace-level information for debugging
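To make the report shape concrete, here is a sketch of aggregating per-trace results into high-level statistics and a timestamped filename. The field names (`total_traces`, `success_rate`, `traces`) are illustrative assumptions, not the evaluator's exact schema:

```python
from datetime import datetime

def build_report(results: list[dict]) -> dict:
    """Aggregate per-trace pass/fail results into a summary report."""
    passed = sum(r["passed"] for r in results)
    return {
        "total_traces": len(results),
        "success_rate": passed / len(results),
        "traces": results,  # trace-level detail kept for debugging
    }

report = build_report([
    {"trace_id": "t1", "metric": "correctness", "passed": True},
    {"trace_id": "t2", "metric": "correctness", "passed": False},
])
# Filename follows the evaluation_report_YYYYMMDD_HHMMSS.json pattern
filename = f"evaluation_report_{datetime(2025, 1, 2, 3, 4, 5):%Y%m%d_%H%M%S}.json"
```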
The application uses a flexible configuration system with environment-specific settings:
- `.env.development` - Local development settings
- `.env.staging` - Staging environment settings
- `.env.production` - Production environment settings
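Mechanically, loading one of these files means picking `.env.<environment>` and parsing its `KEY=VALUE` lines. A stdlib-only sketch of that idea (the template's real config code may rely on a settings library instead):

```python
import tempfile
from pathlib import Path

def load_env_file(env: str, base_dir: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines from .env.<env>, skipping blanks and comments."""
    values: dict[str, str] = {}
    for line in (base_dir / f".env.{env}").read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")  # split on the first '=' only
            values[key.strip()] = value.strip()
    return values

# Demo against a temporary directory standing in for the project root
with tempfile.TemporaryDirectory() as d:
    (Path(d) / ".env.development").write_text("DEBUG=true\n# comment\nPOSTGRES_PORT=5432\n")
    cfg = load_env_file("development", Path(d))
```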
Key configuration variables include:
```bash
# Application
APP_ENV=development
PROJECT_NAME="FastAPI LangGraph Agent"
DEBUG=true

# Database
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=mydb
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres

# LLM Configuration
OPENAI_API_KEY=your_openai_api_key
DEFAULT_LLM_MODEL=gpt-4o
DEFAULT_LLM_TEMPERATURE=0.7
MAX_TOKENS=4096

# Long-Term Memory
LONG_TERM_MEMORY_COLLECTION_NAME=agent_memories
LONG_TERM_MEMORY_MODEL=gpt-4o-mini
LONG_TERM_MEMORY_EMBEDDER_MODEL=text-embedding-3-small

# Observability
LANGFUSE_PUBLIC_KEY=your_public_key
LANGFUSE_SECRET_KEY=your_secret_key
LANGFUSE_HOST=https://cloud.langfuse.com

# Security
SECRET_KEY=your_secret_key_here
ACCESS_TOKEN_EXPIRE_MINUTES=30

# Rate Limiting
RATE_LIMIT_ENABLED=true
```

The application includes a sophisticated long-term memory system powered by mem0ai and pgvector:
- Semantic Memory Storage: Stores and retrieves memories based on semantic similarity
- User-Specific Memories: Each user has their own isolated memory space
- Automatic Memory Management: Memories are automatically extracted, stored, and retrieved
- Vector Search: Uses pgvector for efficient similarity search
- Configurable Models: Separate models for memory processing and embeddings
- Memory Addition: During conversations, important information is automatically extracted and stored
- Memory Retrieval: Relevant memories are retrieved based on conversation context
- Memory Search: Semantic search finds related memories across conversations
- Memory Updates: Existing memories can be updated as new information becomes available
The LLM service provides robust, production-ready language model interactions with automatic retry logic and multiple model support.
- Multiple Model Support: Pre-configured support for GPT-4o, GPT-4o-mini, and the GPT-5 family (gpt-5, gpt-5-mini, gpt-5-nano)
- Automatic Retries: Uses tenacity for exponential backoff retry logic
- Reasoning Configuration: GPT-5 models support configurable reasoning effort levels
- Environment-Specific Tuning: Different parameters for development vs production
- Fallback Mechanisms: Graceful degradation when primary models fail
| Model | Use Case | Reasoning Effort |
|---|---|---|
| gpt-5 | Complex reasoning tasks | Medium |
| gpt-5-mini | Balanced performance | Low |
| gpt-5-nano | Fast responses | Minimal |
| gpt-4o | Production workloads | N/A |
| gpt-4o-mini | Cost-effective tasks | N/A |
- Automatically retries on API timeouts, rate limits, and temporary errors
- Max Attempts: 3
- Wait Strategy: Exponential backoff (1s, 2s, 4s)
- Logging: All retry attempts are logged with context
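The retry behavior described above (3 attempts, 1s/2s/4s backoff) can be sketched in a few lines. The real service uses tenacity for this; the version below is a hand-rolled illustration with an injectable `sleep` so the demo records the waits instead of actually sleeping:

```python
import time

def with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn, retrying on failure with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))

# Demo: a call that fails twice, then succeeds; record the waits instead of sleeping.
waits: list[float] = []
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient upstream error")
    return "ok"

result = with_retries(flaky, sleep=waits.append)
```

A production version would catch only transient error types (timeouts, rate limits) rather than bare `Exception`, which is exactly the filtering tenacity's `retry_if_exception_type` provides.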
The application uses structlog for structured, contextual logging with automatic request tracking.
- Structured Logging: All logs are structured with consistent fields
- Request Context: Automatic binding of request_id, session_id, and user_id
- Environment-Specific Formatting: JSON in production, colored console in development
- Performance Tracking: Automatic logging of request duration and status
- Exception Tracking: Full stack traces with context preservation
Every request automatically gets:
- Unique request ID
- Session ID (if authenticated)
- User ID (if authenticated)
- Request path and method
- Response status and duration
- Event Names: lowercase_with_underscores
- No F-Strings: Pass variables as kwargs for proper filtering
- Context Binding: Always include relevant IDs and context
- Appropriate Levels: debug, info, warning, error, exception
The application uses uvloop for enhanced async performance (automatically enabled via Makefile):
Performance Improvements:
- 2-4x faster asyncio operations
- Lower latency for I/O-bound tasks
- Better connection pool management
- Reduced CPU usage for concurrent requests
- Database: Async connection pooling with configurable pool size
- LangGraph Checkpointing: Shared connection pool for state persistence
- Redis (optional): Connection pool for caching
- Only successful responses are cached
- Configurable TTL based on data volatility
- Cache invalidation on updates
- Supports Redis or in-memory caching
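The TTL-and-invalidation behavior above can be sketched as a small in-memory cache with lazy expiry. This is an illustration of the caching semantics, not the template's actual cache layer; the demo injects a fake clock so nothing has to sleep:

```python
import time

class TTLCache:
    """In-memory cache with per-entry expiry, evicting stale entries on read."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable for tests
        self._store: dict = {}

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # lazy eviction: stale entries die on access
            return None
        return value

# Demo with a fake clock so expiry can be tested without waiting
now = [0.0]
cache = TTLCache(ttl=5.0, clock=lambda: now[0])
cache.set("greeting", "hello")
fresh = cache.get("greeting")  # within TTL
now[0] = 6.0
stale = cache.get("greeting")  # past TTL, so evicted
```

Swapping `_store` for a Redis client with `SETEX` gives the same semantics across processes, which is why the cache layer can support either backend.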
- `POST /api/v1/auth/register` - Register a new user
- `POST /api/v1/auth/login` - Authenticate and receive a JWT token
- `POST /api/v1/auth/logout` - Logout and invalidate the session

- `POST /api/v1/chatbot/chat` - Send a message and receive a response
- `POST /api/v1/chatbot/chat/stream` - Send a message with a streaming response
- `GET /api/v1/chatbot/history` - Get conversation history
- `DELETE /api/v1/chatbot/history` - Clear chat history

- `GET /health` - Health check with database status
- `GET /metrics` - Prometheus metrics endpoint
For detailed API documentation, visit /docs (Swagger UI) or /redoc (ReDoc) when running the application.
```
whatsapp-food-order/
├── app/
│   ├── api/
│   │   └── v1/
│   │       ├── auth.py            # Authentication endpoints
│   │       ├── chatbot.py         # Chat endpoints
│   │       └── api.py             # API router aggregation
│   ├── core/
│   │   ├── config.py              # Configuration management
│   │   ├── logging.py             # Logging setup
│   │   ├── metrics.py             # Prometheus metrics
│   │   ├── middleware.py          # Custom middleware
│   │   ├── limiter.py             # Rate limiting
│   │   ├── langgraph/
│   │   │   ├── graph.py           # LangGraph agent
│   │   │   └── tools.py           # Agent tools
│   │   └── prompts/
│   │       ├── __init__.py        # Prompt loader
│   │       └── system.md          # System prompts
│   ├── models/
│   │   ├── user.py                # User model
│   │   └── session.py             # Session model
│   ├── schemas/
│   │   ├── auth.py                # Auth schemas
│   │   ├── chat.py                # Chat schemas
│   │   └── graph.py               # Graph state schemas
│   ├── services/
│   │   ├── database.py            # Database service
│   │   └── llm.py                 # LLM service with retries
│   ├── utils/
│   │   ├── __init__.py
│   │   └── graph.py               # Graph utility functions
│   └── main.py                    # Application entry point
├── evals/
│   ├── evaluator.py               # Evaluation logic
│   ├── main.py                    # Evaluation CLI
│   ├── metrics/
│   │   └── prompts/               # Evaluation metric definitions
│   └── reports/                   # Generated evaluation reports
├── grafana/                       # Grafana dashboards
├── prometheus/                    # Prometheus configuration
├── scripts/                       # Utility scripts
├── docker-compose.yml             # Docker Compose configuration
├── Dockerfile                     # Application Docker image
├── Makefile                       # Development commands
├── pyproject.toml                 # Python dependencies
├── schema.sql                     # Database schema
├── SECURITY.md                    # Security policy
└── README.md                      # This file
```
For security concerns, please review our Security Policy.
This project is licensed under the terms specified in the LICENSE file.
Contributions are welcome! Please ensure:
- Code follows the project's coding standards
- All tests pass
- New features include appropriate tests
- Documentation is updated
- Commit messages follow conventional commits format
For issues, questions, or contributions, please open an issue on the project repository.