OGsiji/data-contract-validator

πŸ›‘οΈ Data Contract Validator

Prevent production API breaks by validating data contracts between your data pipelines and API frameworks

🎯 What This Solves

Ever deployed a DBT model change only to break your FastAPI service in production? This tool prevents that by validating the data contract between your data pipelines and your APIs before deployment.

DBT Models          Contract           FastAPI Models
(What data          Validator          (What APIs
 produces)          ↕️ VALIDATES ↕️      expect)
     ↓                   ↓                   ↓
   Schema              Finds              Schema
 Extraction          Mismatches         Extraction

⚡ Quick Start

Installation

pip install data-contract-validator

30-Second Setup

# 1. Initialize in your project
contract-validator init --interactive

# 2. Test setup
contract-validator test

# 3. Validate contracts
contract-validator validate

# 4. Commit and push - you're protected! 🛡️

Basic Usage

# Validate local DBT project against FastAPI models
contract-validator validate \
  --dbt-project ./my-dbt-project \
  --fastapi-local ./my-api/models.py

# Validate across repositories (microservices)
contract-validator validate \
  --dbt-project . \
  --fastapi-repo "my-org/my-api-repo" \
  --fastapi-path "app/models.py"

πŸ” Real Example: Production Validation

Actual output from a production analytics project:

$ contract-validator validate

πŸ” Starting contract validation...
πŸ“Š Extracting source schemas...
   βœ… Found 14 DBT models (user_analytics_summary: 54 columns)
🎯 Extracting target schemas...  
   βœ… Found 3 FastAPI models
πŸ” Validating schema compatibility...

πŸ›‘οΈ Results:
βœ… PASSED - 0 critical issues (no production breaks!)
⚠️  42 warnings (type mismatches to review)

Issues caught:
⚠️  user_analytics_summary.age_years: source 'varchar' vs target 'integer'
⚠️  user_analytics_summary.is_verified: source 'varchar' vs target 'boolean'
⚠️  user_analytics_summary.user_created_at: source 'varchar' vs target 'timestamp'

πŸŽ‰ Your API contracts are protected!
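
The warnings above compare the type a column has in the warehouse with the type the API model declares. Here is a minimal sketch of that comparison. The mapping table and the function name find_type_mismatches are illustrative assumptions, not the tool's actual implementation.

```python
from datetime import datetime

# Hypothetical mapping from Python annotation types to generic SQL type names.
PYTHON_TO_SQL = {
    int: "integer",
    float: "float",
    bool: "boolean",
    str: "varchar",
    datetime: "timestamp",
}


def find_type_mismatches(source_columns, target_fields):
    """Return (column, source_type, target_type) for columns whose types differ.

    source_columns: column name -> SQL type string reported by the data source
    target_fields:  field name  -> Python type declared by the API model
    """
    mismatches = []
    for name, py_type in target_fields.items():
        if name not in source_columns:
            continue  # missing columns belong to a separate check
        expected = PYTHON_TO_SQL.get(py_type)
        actual = source_columns[name].lower()
        if expected is not None and expected != actual:
            mismatches.append((name, actual, expected))
    return mismatches


source = {
    "age_years": "varchar",
    "is_verified": "varchar",
    "user_created_at": "varchar",
    "email": "varchar",
}
target = {
    "age_years": int,
    "is_verified": bool,
    "user_created_at": datetime,
    "email": str,
}

for column, actual, expected in find_type_mismatches(source, target):
    print(f"user_analytics_summary.{column}: source '{actual}' vs target '{expected}'")
```

Types with no entry in the mapping are skipped rather than flagged. A real validator would need a policy for those.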

🚨 What It Prevents

Before Data Contract Validation:

-- Analytics team changes DBT model
select
    user_id,
    email,
    -- total_orders,  ❌ REMOVED this column
    revenue
from users

# API team's FastAPI model (unchanged)
class UserAnalytics(BaseModel):
    user_id: str
    email: str
    total_orders: int  # ❌ Still expects this!
    revenue: float

Result: 💥 Production API breaks, angry customers, 2AM debugging

After Data Contract Validation:

$ git push

❌ VALIDATION FAILED
💥 user_analytics.total_orders: FastAPI REQUIRES column but DBT removed it
🔧 Fix: Add 'total_orders' back to DBT model or update FastAPI model

# Push blocked until fixed ✋

Result: 🛡️ Production protected, issues caught in CI/CD

πŸ› οΈ Pre-commit Integration

Automatic Setup (Recommended)

# Initialize with pre-commit support
contract-validator init --interactive
contract-validator setup-precommit --install-hooks

# Now every commit validates contracts automatically! 🛡️

Manual Setup

If you prefer manual setup:

  1. Install pre-commit:

    pip install pre-commit
  2. Add to .pre-commit-config.yaml:

    repos:
      - repo: https://github.com/OGsiji/data-contract-validator
        rev: v1.0.0
        hooks:
          - id: contract-validation
            name: Validate Data Contracts
            files: '^(.*models.*\.(sql|py)|\.retl-validator\.yml|dbt_project\.yml)$'
  3. Install hooks:

    pre-commit install
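
To see which staged files would trigger the hook, you can test paths against the files pattern from step 2. This sketch assumes pre-commit applies the pattern to repo-relative paths with a regex search. The helper name triggers_hook is hypothetical.

```python
import re

# Pattern copied from the `files` entry in .pre-commit-config.yaml above.
FILES_PATTERN = re.compile(
    r"^(.*models.*\.(sql|py)|\.retl-validator\.yml|dbt_project\.yml)$"
)


def triggers_hook(path):
    """Return True if a repo-relative path matches the hook's `files` pattern."""
    return FILES_PATTERN.search(path) is not None


for path in [
    "models/user_analytics.sql",
    "app/models.py",
    "dbt_project.yml",
    "sub/dbt_project.yml",
    "README.md",
]:
    print(f"{path}: {triggers_hook(path)}")
```

Because the pattern is anchored, dbt_project.yml and .retl-validator.yml match only at the repository root. A DBT project in a subdirectory would need an adjusted pattern.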

How It Works

$ git add models/user_analytics.sql
$ git commit -m "update user analytics model"

# Pre-commit automatically runs:
πŸ” Validating Data Contracts...
βœ… Contract validation passed
[main abc1234] update user analytics model

On Validation Failure

$ git commit -m "remove important column"

πŸ” Validating Data Contracts...
❌ CRITICAL: user_analytics.total_revenue missing
πŸ’‘ Fix the issue before committing

# Commit blocked until fixed! πŸ›‘οΈ

Skip Validation (Emergency Only)

# Only for emergencies!
git commit -m "emergency fix" --no-verify

Benefits of Pre-commit Integration

  • ✅ Catches issues before they reach CI/CD
  • ✅ Faster feedback loop (seconds, not minutes)
  • ✅ No broken commits in your git history
  • ✅ Team protection - everyone gets validation
  • ✅ Zero configuration after setup

📦 GitHub Actions Integration

Add this to .github/workflows/validate-contracts.yml:

name: πŸ›‘οΈ Data Contract Validation

on:
  pull_request:
    paths:
      - 'models/**/*.sql'
      - 'dbt_project.yml'
      - '**/*models*.py'

jobs:
  validate-contracts:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v4
      with:
        python-version: '3.9'
    
    - name: Validate contracts
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        pip install data-contract-validator
        contract-validator validate

This workflow file is generated automatically when you run contract-validator init.

🔧 Configuration

Auto-Generated Config (.retl-validator.yml)

version: '1.0'
name: 'my-project-contracts'

source:
  dbt:
    project_path: '.'
    auto_compile: true

target:
  fastapi:
    # For GitHub repos
    type: "github"
    repo: "my-org/my-api"
    path: "app/models.py"
    
    # For local files
    # type: "local"
    # path: "../my-api/models.py"

validation:
  fail_on: ['missing_tables', 'missing_required_columns']
  warn_on: ['type_mismatches', 'missing_optional_columns']
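
The validation block sorts each kind of issue into one that fails the run or one that only warns. A minimal sketch of that mapping follows. The function name classify_issue is hypothetical, and treating unlisted kinds as "ignored" is an assumption rather than documented behavior.

```python
def classify_issue(kind, fail_on, warn_on):
    """Map an issue kind to 'critical', 'warning', or 'ignored' using the config lists."""
    if kind in fail_on:
        return "critical"
    if kind in warn_on:
        return "warning"
    return "ignored"  # assumption: kinds not listed in either list are not reported


# Same values as the validation block in .retl-validator.yml above.
validation = {
    "fail_on": ["missing_tables", "missing_required_columns"],
    "warn_on": ["type_mismatches", "missing_optional_columns"],
}

found = ["type_mismatches", "missing_required_columns"]
severities = [
    classify_issue(kind, validation["fail_on"], validation["warn_on"]) for kind in found
]
passed = "critical" not in severities
print(severities, "PASSED" if passed else "FAILED")
```

Moving 'type_mismatches' from warn_on to fail_on would turn the 42 warnings in the earlier production example into blocking failures.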

Command Line Options

# --dbt-project   DBT project path
# --fastapi-repo  GitHub repo
# --fastapi-path  Path to models
# --github-token  For private repos
# --output        json, terminal, github
contract-validator validate \
  --dbt-project ./dbt-project \
  --fastapi-repo "org/repo" \
  --fastapi-path "app/models.py" \
  --github-token "$GITHUB_TOKEN" \
  --output json
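
If you drive the CLI from a Python script, it helps to build the argument list in one place. The sketch below only assembles the flags listed above. The helper build_validate_command is hypothetical and the command is printed, not executed.

```python
import shlex


def build_validate_command(dbt_project, fastapi_repo, fastapi_path,
                           output="terminal", github_token=None):
    """Assemble the `contract-validator validate` argument list from the flags above."""
    command = [
        "contract-validator", "validate",
        "--dbt-project", dbt_project,
        "--fastapi-repo", fastapi_repo,
        "--fastapi-path", fastapi_path,
        "--output", output,
    ]
    if github_token:
        # Only needed for private repositories.
        command += ["--github-token", github_token]
    return command


command = build_validate_command("./dbt-project", "org/repo", "app/models.py", output="json")
print(shlex.join(command))
```

Passing the list to subprocess.run(command) avoids shell quoting problems and keeps the token out of an interpolated string.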

🚀 Supported Frameworks

Data Sources ✅

  • DBT (all adapters: Snowflake, BigQuery, Redshift, etc.)

API Frameworks ✅

  • FastAPI (Pydantic + SQLModel)

Coming Soon 🔄

🎯 Output Formats

Terminal (Default)

πŸ›‘οΈ Data Contract Validation Results:
Status: βœ… PASSED
Critical: 0 | Warnings: 5

⚠️  Warnings:
  user_analytics.age: Type mismatch (varchar vs integer)
  user_analytics.country: Type mismatch (integer vs varchar)

πŸŽ‰ Your API contracts are protected!

JSON (for CI/CD)

{
  "success": true,
  "critical_issues": 0,
  "warnings": 5,
  "issues": [
    {
      "severity": "warning",
      "table": "user_analytics", 
      "column": "age",
      "message": "Type mismatch: source 'varchar' vs target 'integer'",
      "suggested_fix": "Update target to expect 'varchar' or fix source type"
    }
  ]
}
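
In a CI script, the JSON report can be reduced to an exit code and a few summary lines. This sketch relies only on the fields shown in the sample above. The helper summarize_report is hypothetical, and real reports may carry additional fields.

```python
import json


def summarize_report(report_text):
    """Parse a JSON report and return (exit_code, one summary line per issue)."""
    report = json.loads(report_text)
    lines = [
        f"{issue['severity'].upper()}: {issue['table']}.{issue['column']}: {issue['message']}"
        for issue in report["issues"]
    ]
    # Fail the job only when the report is unsuccessful or has critical issues.
    exit_code = 0 if report["success"] and report["critical_issues"] == 0 else 1
    return exit_code, lines


sample = """
{
  "success": true,
  "critical_issues": 0,
  "warnings": 5,
  "issues": [
    {
      "severity": "warning",
      "table": "user_analytics",
      "column": "age",
      "message": "Type mismatch: source 'varchar' vs target 'integer'",
      "suggested_fix": "Update target to expect 'varchar' or fix source type"
    }
  ]
}
"""

exit_code, lines = summarize_report(sample)
print(exit_code)
for line in lines:
    print(line)
```

In practice, report_text would be the captured standard output of contract-validator validate --output json.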

GitHub Actions

::warning::user_analytics.age: Type mismatch detected
✅ Contract validation passed - no critical issues
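
Lines beginning with ::warning:: are GitHub Actions workflow commands, which the runner turns into annotations. Below is a sketch of producing them from issue records shaped like the JSON example. Mapping 'critical' to ::error:: is an assumption, as is the helper name to_annotation.

```python
def escape_data(message):
    """Escape characters that GitHub workflow commands treat specially in the message."""
    return message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def to_annotation(issue):
    """Format one issue as a GitHub Actions workflow command line."""
    # Assumption: critical issues become errors, everything else a warning.
    level = "error" if issue["severity"] == "critical" else "warning"
    text = f"{issue['table']}.{issue['column']}: {issue['message']}"
    return f"::{level}::{escape_data(text)}"


issue = {
    "severity": "warning",
    "table": "user_analytics",
    "column": "age",
    "message": "Type mismatch detected",
}
print(to_annotation(issue))
```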

πŸ—οΈ Architecture

Simple Python API

from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor

# Initialize extractors
dbt = DBTExtractor(project_path='./dbt-project')
fastapi = FastAPIExtractor.from_github_repo('my-org/my-api', 'app/models.py')

# Run validation
validator = ContractValidator(source=dbt, target=fastapi)
result = validator.validate()

if not result.success:
    print(f"❌ {len(result.critical_issues)} critical issues found")
    for issue in result.critical_issues:
        print(f"💥 {issue.table}.{issue.column}: {issue.message}")

CLI Interface

# Interactive setup
contract-validator init --interactive

# Test configuration
contract-validator test

# Run validation
contract-validator validate

# Setup pre-commit hooks
contract-validator setup-precommit --install-hooks

# Multiple output formats
contract-validator validate --output json

🔄 Development Workflow

With Pre-commit (Recommended)

# Team workflow with automated validation
git clone your-dbt-project
cd your-dbt-project

# One-time setup for new team members
contract-validator init --interactive
contract-validator setup-precommit --install-hooks

# Protected development workflow:
# 1. Make changes to DBT models
# 2. git add models/my_model.sql
# 3. git commit -m "update model"  # ← Validation runs here automatically
# 4. If validation passes → commit succeeds
# 5. If validation fails → fix issues first
# 6. git push  # ← CI/CD validation as backup

Manual Workflow

# Traditional workflow
# 1. Make changes
# 2. contract-validator validate  # Manual validation
# 3. git commit
# 4. git push

🤝 Contributing

We welcome contributions! This tool is actively used in production.

Development Setup

git clone https://github.com/OGsiji/data-contract-validator
cd data-contract-validator
pip install -e ".[dev]"
pytest

Adding New Extractors

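from typing import Dict

from retl_validator.extractors import BaseExtractor

class MyFrameworkExtractor(BaseExtractor):
    def extract_schemas(self) -> Dict[str, Schema]:
        # Return a mapping of table/model name -> Schema.
        # `Schema` is the package's schema type (its import path is not shown here).
        schemas: Dict[str, Schema] = {}
        # Your implementation: populate `schemas` here
        return schemas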

Reporting Issues

📚 Documentation

🎉 Real-World Usage

This tool is actively preventing production incidents in:

  • Analytics pipelines with 50+ DBT models
  • Microservices architectures with multiple APIs
  • Data engineering teams using Snowflake, BigQuery, Redshift
  • Cross-repository validation in large organizations

Proven to catch:

  • ✅ Type mismatches (varchar vs integer)
  • ✅ Missing columns (API expects columns DBT doesn't provide)
  • ✅ Schema drift (gradual model changes)
  • ✅ Breaking changes before they reach production

🛡️ Multiple Layers of Protection

  1. Pre-commit hooks: Immediate feedback (fastest)
  2. CI/CD validation: Team protection (backup)
  3. Manual validation: Development testing
  4. Configuration files: Team standards

This creates a comprehensive safety net for your data contracts.

📄 License

MIT License - see LICENSE for details.

🆘 Support

⭐ Star the Project

If this tool helps you prevent production incidents, please ⭐ star the repository!


πŸ›‘οΈ Built by data engineers, for data engineers. Stop breaking production with data changes!

πŸš€ Get Started Now

pip install data-contract-validator
contract-validator init --interactive
contract-validator setup-precommit --install-hooks
# 2 minutes to production protection with automated validation!

About

An open-source contract validation system that prevents production breaks by ensuring data transformation outputs (DBT, Databricks, etc.) match API expectations (FastAPI, Django, etc.).
