vtnphan/rcta-app

RCTA Backend - Single-Cell RNA Analysis API

FastAPI backend for Rare Cell Type Analysis (RCTA) — Novel and Rare cell detection pipeline.

Features

  • Dataset Management: Upload and manage .h5ad files; datasets >50k cells are automatically subsampled to 10% (stratified by cell type, matching notebook Cell 6)
  • RCTA Novel (Figure 2A): KNN neighbourhood enrichment for unsupervised novel population detection
  • RCTA Rare (Figure 2B): Centroid-based deviation scoring with iterative refinement and bootstrap validation
  • 4-Model Validation: Side-by-side comparison of RCTA Rare, RCTA Novel, CIARA, and scNovel with AUROC/F1 metrics
  • Optuna Optimisation: TPE hyperparameter search for all four models
  • Preprocessing Pipeline: Matches notebook Cell 13 exactly — filter_genes → normalize → log1p → top-10k HVGs by variance → z-score → sklearn PCA(50) → sklearn KNN(15) → UMAP → Leiden(0.8)
  • AI Agent: LangChain-powered analysis insights
  • Async Processing: Background job execution with progress tracking
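The subsampling rule in the Dataset Management item can be illustrated with a short standard-library sketch. It is not the code in file_service.py: the function name, the fixed seed, and the choice to keep at least one cell per type are assumptions made for illustration. Only the 50k threshold, the 10% fraction, and stratification by cell type come from the description above.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction=0.10, threshold=50_000, seed=0):
    """Return the sorted indices of the cells to keep.

    labels: one cell-type label per cell.
    Datasets with at most `threshold` cells are returned unchanged.
    """
    n_cells = len(labels)
    if n_cells <= threshold:
        return list(range(n_cells))

    # Group cell indices by their cell-type label.
    by_type = defaultdict(list)
    for idx, label in enumerate(labels):
        by_type[label].append(idx)

    rng = random.Random(seed)
    keep = []
    for indices in by_type.values():
        # Assumption of this sketch: keep at least one cell per type so
        # that very small populations are not dropped entirely.
        n_keep = max(1, round(len(indices) * fraction))
        keep.extend(rng.sample(indices, n_keep))
    return sorted(keep)


labels = ["T cell"] * 54_000 + ["B cell"] * 5_990 + ["rare"] * 10
kept = stratified_subsample(labels)
print(len(kept))  # 6000
```

Sampling within each cell type keeps the type proportions of the subsample close to those of the full dataset, which matters when the downstream goal is to detect small populations.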

Installation

1. Create Virtual Environment

cd rcta-app
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

2. Install Dependencies

pip install -r requirements.txt

Running the Server

Important: Always run from the rcta-app/ directory on port 8000 (the Angular frontend expects port 8000).

cd rcta-app
source venv/bin/activate

# Development mode (auto-reload)
python app.py --dev --port 8000

# Or via uvicorn directly
uvicorn app:app --reload --host 0.0.0.0 --port 8000

Production Mode

uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4

The API will be available at: http://localhost:8000

API Documentation

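Once the server is running, the interactive documentation that FastAPI serves by default (unless disabled in app.py) is available at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc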

Directory Structure

rcta-app/
├── app.py                            # Main FastAPI application
├── requirements.txt                  # Python dependencies
├── services/
│   ├── __init__.py
│   ├── preprocessing_service.py      # Full preprocessing pipeline (notebook Cell 13)
│   ├── novel_cell_detection_service.py  # RCTA Novel (Figure 2A)
│   ├── rare_cell_detection_service.py   # RCTA Rare (Figure 2B)
│   ├── validation_service.py         # 4-model comparison (RCTA Rare/RCTA Novel/CIARA/scNovel)
│   ├── optuna_service.py             # Hyperparameter optimisation
│   ├── visualization_service.py      # Plot generation
│   ├── ai_agent_service.py           # AI agent tools
│   └── file_service.py              # Upload + 10% subsampling
├── uploads/                          # Original uploaded files
├── temp_data/                        # Processing intermediates
└── visualizations/                   # Generated plots

API Endpoints

Health Check

  • GET / — API status
  • GET /health — Detailed service health

Dataset Management

  • POST /api/datasets/upload — Upload .h5ad file (auto-subsamples >50k cells)
  • GET /api/datasets — List uploaded datasets
  • GET /api/datasets/{dataset_id} — Get dataset metadata
  • DELETE /api/datasets/{dataset_id} — Delete dataset

Preprocessing

  • POST /api/preprocess/filter — QC cell filtering preview (does not run full pipeline)

Analysis (full RCTA pipeline)

  • POST /api/analysis/run — Async: preprocess → RCTA Novel → RCTA Rare → synthesise results
  • GET /api/analysis/job/{job_id} — Poll job status
  • GET /api/analysis/job/{job_id}/results — Get completed results

Novel & Rare (standalone)

  • POST /api/analysis/novel-cells — Run RCTA Novel on a preprocessed .h5ad
  • POST /api/analysis/rare-cells — Run RCTA Rare on a preprocessed .h5ad

4-Model Validation

  • GET /api/validation/datasets — List preprocessed .h5ad files available for validation (newest first)
  • POST /api/validation/run — Async: run RCTA Rare / RCTA Novel / CIARA / scNovel comparison
  • GET /api/validation/job/{job_id} — Poll validation job
  • GET /api/validation/job/{job_id}/results — Get validation metrics

Optimisation (Optuna)

  • POST /api/optimization/run — Async: Optuna TPE search over all model hyperparameters
  • GET /api/optimization/job/{job_id} — Poll optimisation job
  • GET /api/optimization/job/{job_id}/results — Get best hyperparameters
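The analysis, validation, and optimisation endpoints listed above share one job pattern: a status URL and a results URL under the same job ID. A small helper can build both. The function and constant names here are hypothetical; only the URL layout is taken from the endpoint lists.

```python
BASE_URL = "http://localhost:8000"
JOB_FAMILIES = ("analysis", "validation", "optimization")


def job_urls(family, job_id, base_url=BASE_URL):
    """Return the status and results URLs for an asynchronous job."""
    if family not in JOB_FAMILIES:
        raise ValueError(f"Unknown job family: {family!r}")
    status_url = f"{base_url}/api/{family}/job/{job_id}"
    return {"status": status_url, "results": f"{status_url}/results"}


urls = job_urls("validation", "abc123")
print(urls["status"])   # http://localhost:8000/api/validation/job/abc123
print(urls["results"])  # http://localhost:8000/api/validation/job/abc123/results
```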

AI Agent

  • POST /api/ai/preprocessing-recommendations — Recommend QC parameters
  • POST /api/ai/interpret-qc — Interpret QC metrics
  • POST /api/ai/interpret-uncertainty — Interpret uncertainty analysis
  • POST /api/ai/cell-type-context — Biological context for cell types
  • POST /api/ai/workflow-recommendations — Full workflow guidance
  • POST /api/agent/chat — Chat with AI

Usage Example

1. Upload Dataset

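import time

import requests

# Open the file in a context manager so the handle is closed after the upload
with open("my_data.h5ad", "rb") as f:
    r = requests.post(
        "http://localhost:8000/api/datasets/upload",
        files={"file": f},
    )
r.raise_for_status()  # Stop here if the upload was rejected
dataset_id = r.json()["dataset_id"]
print(f"Dataset ID: {dataset_id}")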

2. (Optional) QC Filter Preview

r = requests.post("http://localhost:8000/api/preprocess/filter", json={
    "dataset_id": dataset_id,
    "preprocessing": {
        "min_count_rna": 200, "max_count_rna": 5000,
        "min_feature_rna": 200, "max_feature_rna": 2500,
        "min_percent_mt": 0, "max_percent_mt": 20,
        "run_marker_validation": False
    }
})
print(r.json())  # filtered_cells, filtered_genes, cells_removed, ...
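To make the six thresholds concrete, the sketch below applies them to a few hand-written cells. It assumes each min/max pair is an inclusive window on total RNA counts, detected genes, and mitochondrial percentage. The per-cell field names (count_rna, feature_rna, percent_mt) are hypothetical, and the server's own filter may differ in detail.

```python
def passes_qc(cell, params):
    """True if a cell's QC metrics fall inside every [min, max] window."""
    return (
        params["min_count_rna"] <= cell["count_rna"] <= params["max_count_rna"]
        and params["min_feature_rna"] <= cell["feature_rna"] <= params["max_feature_rna"]
        and params["min_percent_mt"] <= cell["percent_mt"] <= params["max_percent_mt"]
    )


params = {
    "min_count_rna": 200, "max_count_rna": 5000,
    "min_feature_rna": 200, "max_feature_rna": 2500,
    "min_percent_mt": 0, "max_percent_mt": 20,
}

# Hypothetical per-cell QC metrics.
cells = [
    {"id": "c1", "count_rna": 1500, "feature_rna": 800, "percent_mt": 4.2},
    {"id": "c2", "count_rna": 150, "feature_rna": 90, "percent_mt": 1.0},     # too few counts
    {"id": "c3", "count_rna": 3000, "feature_rna": 1200, "percent_mt": 35.0},  # high mitochondrial share
    {"id": "c4", "count_rna": 9000, "feature_rna": 4100, "percent_mt": 3.0},   # too many counts
]

kept = [c["id"] for c in cells if passes_qc(c, params)]
print(kept)  # ['c1']
```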

3. Run Full RCTA Analysis

r = requests.post("http://localhost:8000/api/analysis/run", json={
    "dataset_id": dataset_id,
    "preprocessing": {
        "min_count_rna": 200, "max_count_rna": 5000,
        "min_feature_rna": 200, "max_feature_rna": 2500,
        "min_percent_mt": 0, "max_percent_mt": 20,
        "run_marker_validation": False
    }
})
job_id = r.json()["job_id"]

# Poll until the job reaches a terminal state
while True:
    status = requests.get(f"http://localhost:8000/api/analysis/job/{job_id}").json()
    print(f"Status: {status['status']}  {status['progress']}%")
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(3)

# Results exist only for completed jobs, so stop here if the job failed
if status["status"] == "failed":
    raise RuntimeError(f"Analysis job {job_id} failed: {status}")

results = requests.get(f"http://localhost:8000/api/analysis/job/{job_id}/results").json()
print(f"Clusters: {list(results['predictions']['cell_type_counts'].keys())}")
print(f"Novel cells: {results['novel_cells']['novel_cells_count']}")
print(f"Rare cells:  {results['rare_cells']['rare_cells_count']}")
print(f"Preprocessed file: {results['preprocessing']['adata_path']}")
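The polling loop above runs until the job reports a terminal state, with no upper bound on the wait. One possible refinement is a reusable helper with a timeout. The helper name, its parameters, and the "running" status value are assumptions; only "completed" and "failed" appear in the example above. The status source is passed in as a callable so the same helper works for analysis, validation, and optimisation jobs.

```python
import time


def wait_for_job(fetch_status, interval=3.0, timeout=3600.0, sleep=time.sleep):
    """Poll fetch_status() until the job completes, fails, or times out.

    fetch_status: zero-argument callable returning a dict with a "status"
    key, for example lambda: requests.get(status_url).json().
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            raise RuntimeError(f"Job failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still running after {timeout} s")
        sleep(interval)


# Demonstration with canned responses instead of a live server.
fake_responses = iter([
    {"status": "running", "progress": 40},
    {"status": "running", "progress": 80},
    {"status": "completed", "progress": 100},
])
final = wait_for_job(lambda: next(fake_responses), interval=0)
print(final["progress"])  # 100
```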

4. Run 4-Model Validation

# List available preprocessed files (newest first)
files = requests.get("http://localhost:8000/api/validation/datasets").json()["datasets"]
adata_path = files[0]["path"]  # most recently created

r = requests.post("http://localhost:8000/api/validation/run", json={"adata_path": adata_path})
job_id = r.json()["job_id"]

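while True:
    status = requests.get(f"http://localhost:8000/api/validation/job/{job_id}").json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(3)

# Results exist only for completed jobs, so stop here if the job failed
if status["status"] == "failed":
    raise RuntimeError(f"Validation job {job_id} failed: {status}")

val = requests.get(f"http://localhost:8000/api/validation/job/{job_id}/results").json()
print(val)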
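For readers unfamiliar with the two metrics used in the 4-model comparison, here is a minimal standard-library sketch of F1 and AUROC on binary labels (1 = rare or novel cell). It is meant only to show what the numbers measure; the function names are illustrative, and the validation service may compute its metrics differently, for instance with scikit-learn.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P * N), which is
    fine for illustration but slow for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.45 else 0 for s in scores]  # arbitrary threshold

print(auroc(y_true, scores))     # 0.875
print(f1_score(y_true, y_pred))  # 0.5
```

AUROC is independent of the decision threshold, while F1 depends on it, which is why the two are usually reported together.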

Troubleshooting

Port Already in Use

lsof -ti:8000 | xargs kill -9
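The command above force-kills whatever is listening on port 8000. To check first whether the port is actually taken, a small standard-library probe such as the following may help. The function name is hypothetical, and it only tests whether a TCP connection to the port succeeds.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return s.connect_ex((host, port)) == 0


print(port_in_use(8000))
```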

CORS Issues

Ensure the Angular dev server (http://localhost:4200) is in the allowed origins list in app.py (it is included by default).

"No preprocessed files found" in the UI

  1. Upload your .h5ad file using the file picker
  2. Click Run RCTA Analysis — this runs the full pipeline and saves a *_preprocessed.h5ad file
  3. The "Preprocessed Dataset" dropdown refreshes automatically after analysis completes
  4. You can also click the ↻ button to manually refresh the list

File not found errors

Always start the server from the rcta-app/ directory so that temp_data/ and uploads/ resolve correctly:

cd /path/to/rcta-app
python app.py --dev --port 8000
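A quick way to confirm the working directory before starting the server is to check for the data directories shown in the directory structure. This sketch assumes those three directories already exist in a correct checkout; if the app creates them on startup, the check is only meaningful after a first run. The helper name is hypothetical.

```python
from pathlib import Path

EXPECTED_DIRS = ("uploads", "temp_data", "visualizations")


def missing_data_dirs(base=None):
    """Return the expected data directories that are absent under base."""
    base = Path(base) if base is not None else Path.cwd()
    return [name for name in EXPECTED_DIRS if not (base / name).is_dir()]


missing = missing_data_dirs()
if missing:
    print(f"Possibly not in rcta-app/: missing {', '.join(missing)}")
```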

Production Deployment

  • Use Gunicorn/Uvicorn workers
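  • Replace the in-memory jobs dict with a Redis-backed job queue. An in-memory dict is not shared between worker processes, so with --workers greater than 1 a status poll can reach a worker that has no record of the job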
  • Add authentication/authorization
  • Use S3 or object storage for datasets
  • Implement rate limiting
  • Add structured logging and monitoring
