RCTA Backend - Single-Cell RNA Analysis API

FastAPI backend for Rare Cell Type Analysis (RCTA) — Novel and Rare cell detection pipeline.

Features

Dataset Management: Upload and manage .h5ad files; datasets with more than 50,000 cells are automatically subsampled to 10% of their cells (stratified by cell type, matching notebook Cell 6)
RCTA Novel (Figure 2A): KNN neighbourhood enrichment for unsupervised novel population detection
RCTA Rare (Figure 2B): Centroid-based deviation scoring with iterative refinement and bootstrap validation
4-Model Validation: Side-by-side comparison of RCTA Rare, RCTA Novel, CIARA, and scNovel with AUROC/F1 metrics
Optuna Optimisation: TPE hyperparameter search for all four models
Preprocessing Pipeline: Matches notebook Cell 13 exactly — filter_genes → normalize → log1p → top-10k HVGs by variance → z-score → sklearn PCA(50) → sklearn KNN(15) → UMAP → Leiden(0.8)
AI Agent: LangChain-powered analysis insights
Async Processing: Background job execution with progress tracking
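The stratified subsampling mentioned under Dataset Management can be illustrated with a small standard-library sketch. It assumes the cell-type labels arrive as a plain list; the function name stratified_subsample, the fixed seed, rounding each type's share with round(), and the keep-at-least-one-cell rule are illustrative assumptions, not a description of what file_service.py actually does.

```python
import random
from collections import defaultdict


def stratified_subsample(cell_types, fraction=0.10, threshold=50_000, seed=0):
    """Hypothetical sketch: return sorted indices of the cells to keep.

    Datasets with at most `threshold` cells are returned unchanged; larger ones
    keep round(fraction * n) cells of each type, with at least one cell per type
    (the rounding and minimum-of-one rules are assumptions).
    """
    n_cells = len(cell_types)
    if n_cells <= threshold:
        return list(range(n_cells))

    # Group cell indices by their cell-type label
    by_type = defaultdict(list)
    for idx, label in enumerate(cell_types):
        by_type[label].append(idx)

    rng = random.Random(seed)  # fixed seed so repeated runs give the same subsample
    keep = []
    for indices in by_type.values():
        n_keep = max(1, round(len(indices) * fraction))
        keep.extend(rng.sample(indices, n_keep))
    return sorted(keep)
```

Sampling within each cell type, rather than from the whole dataset, keeps every population represented in roughly its original proportion, which matters when the downstream goal is finding rare cells.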

Installation

1. Create Virtual Environment

cd rcta-app
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

2. Install Dependencies

pip install -r requirements.txt

Running the Server

Important: Always run from the rcta-app/ directory on port 8000 (the Angular frontend expects port 8000).

cd rcta-app
source venv/bin/activate

# Development mode (auto-reload)
python app.py --dev --port 8000

# Or via uvicorn directly
uvicorn app:app --reload --host 0.0.0.0 --port 8000

Production Mode

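uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4

Note: jobs are tracked in an in-memory jobs dict, and each worker process keeps its own copy. With --workers 4, a status or results request can be routed to a worker that never saw the job. Run a single worker until jobs are moved to a shared store (see Production Deployment).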

The API will be available at: http://localhost:8000

API Documentation

Once the server is running, visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Directory Structure

rcta-app/
├── app.py                            # Main FastAPI application
├── requirements.txt                  # Python dependencies
├── services/
│   ├── __init__.py
│   ├── preprocessing_service.py      # Full preprocessing pipeline (notebook Cell 13)
│   ├── novel_cell_detection_service.py  # RCTA Novel (Figure 2A)
│   ├── rare_cell_detection_service.py   # RCTA Rare (Figure 2B)
│   ├── validation_service.py         # 4-model comparison (RCTA Rare/RCTA Novel/CIARA/scNovel)
│   ├── optuna_service.py             # Hyperparameter optimisation
│   ├── visualization_service.py      # Plot generation
│   ├── ai_agent_service.py           # AI agent tools
│   └── file_service.py               # Upload + 10% subsampling
├── uploads/                          # Original uploaded files
├── temp_data/                        # Processing intermediates
└── visualizations/                   # Generated plots

API Endpoints

Health Check

GET / — API status
GET /health — Detailed service health

Dataset Management

POST /api/datasets/upload — Upload .h5ad file (auto-subsamples >50k cells)
GET /api/datasets — List uploaded datasets
GET /api/datasets/{dataset_id} — Get dataset metadata
DELETE /api/datasets/{dataset_id} — Delete dataset

Preprocessing

POST /api/preprocess/filter — QC cell filtering preview (does not run full pipeline)
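The QC preview applies per-cell windows on total RNA counts, detected features (genes) and mitochondrial percentage. The sketch below shows that logic on precomputed per-cell metrics. Field names mirror the preprocessing payload used in the usage example, and the defaults simply copy those example values; the QCThresholds class, inclusive bounds, and the summary dict are assumptions, not the server's implementation.

```python
from dataclasses import dataclass


@dataclass
class QCThresholds:
    # Hypothetical: field names mirror the "preprocessing" payload; defaults copy the usage example
    min_count_rna: float = 200
    max_count_rna: float = 5000
    min_feature_rna: float = 200
    max_feature_rna: float = 2500
    min_percent_mt: float = 0
    max_percent_mt: float = 20


def qc_pass(n_counts, n_features, percent_mt, t):
    """True if a cell lies inside every [min, max] window (inclusive bounds assumed)."""
    return (
        t.min_count_rna <= n_counts <= t.max_count_rna
        and t.min_feature_rna <= n_features <= t.max_feature_rna
        and t.min_percent_mt <= percent_mt <= t.max_percent_mt
    )


def filter_preview(cells, t):
    """Summarise a filter run over (n_counts, n_features, percent_mt) tuples."""
    kept = [c for c in cells if qc_pass(*c, t)]
    return {"filtered_cells": len(kept), "cells_removed": len(cells) - len(kept)}
```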

Analysis (full RCTA pipeline)

POST /api/analysis/run — Async: preprocess → RCTA Novel → RCTA Rare → synthesise results
GET /api/analysis/job/{job_id} — Poll job status
GET /api/analysis/job/{job_id}/results — Get completed results
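The first stage of /api/analysis/run is the preprocessing pipeline listed under Features. A minimal standard-library sketch of its front half (normalize, log1p, top-N genes by variance, z-score) on a small dense cells-by-genes matrix follows. It is not preprocessing_service.py: the 1e4 target sum, the population standard deviation, and the function names are assumptions, and gene filtering, PCA(50), KNN(15), UMAP and Leiden are left to the libraries the pipeline names.

```python
import math
from statistics import mean, pstdev, pvariance


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts (assumed value), then apply log1p."""
    out = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total else 0.0  # empty cells stay all-zero
        out.append([math.log1p(x * scale) for x in row])
    return out


def top_variance_genes(matrix, n_top):
    """Column indices of the n_top highest-variance genes, returned in column order."""
    variances = [pvariance(col) for col in zip(*matrix)]
    ranked = sorted(range(len(variances)), key=variances.__getitem__, reverse=True)
    return sorted(ranked[:n_top])


def zscore_columns(matrix):
    """Centre each gene to mean 0 and unit (population) std; constant genes become 0."""
    columns = []
    for col in zip(*matrix):
        mu, sd = mean(col), pstdev(col)
        columns.append([(x - mu) / sd if sd else 0.0 for x in col])
    return [list(row) for row in zip(*columns)]


def preprocess_front(counts, n_top=10_000):
    """Hypothetical: normalize -> log1p -> top-n HVGs by variance -> z-score (PCA onwards not shown)."""
    logged = normalize_log1p(counts)
    genes = top_variance_genes(logged, n_top)
    selected = [[row[g] for g in genes] for row in logged]
    return zscore_columns(selected), genes
```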

Novel & Rare (standalone)

POST /api/analysis/novel-cells — Run RCTA Novel on a preprocessed .h5ad
POST /api/analysis/rare-cells — Run RCTA Rare on a preprocessed .h5ad
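RCTA Novel is described as KNN neighbourhood enrichment. One generic form of that idea compares how often a cell's k nearest neighbours share its cluster label with the rate expected from cluster size alone. The sketch below computes that observed/expected ratio per label, with a brute-force O(n²) neighbour search standing in for sklearn's KNN. The scoring formula is an illustrative assumption, and how RCTA Novel turns such scores into novel-population calls is not described here.

```python
import math
from collections import Counter


def knn_indices(points, k):
    """Brute-force k nearest neighbours (Euclidean), excluding the point itself."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def neighbourhood_enrichment(points, labels, k=15):
    """Hypothetical score: observed / expected fraction of same-label neighbours, per label.

    For a label with m of n cells, uniformly random neighbours would share the
    label with probability (m - 1) / (n - 1); values well above 1 mean the
    label forms a tight, self-contained neighbourhood.
    """
    n = len(points)
    sizes = Counter(labels)
    same = Counter()
    for i, nbrs in enumerate(knn_indices(points, k)):
        same[labels[i]] += sum(labels[j] == labels[i] for j in nbrs)
    scores = {}
    for label, m in sizes.items():
        observed = same[label] / (m * k)
        expected = (m - 1) / (n - 1)
        scores[label] = observed / expected if expected else float("inf")
    return scores
```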

4-Model Validation

GET /api/validation/datasets — List preprocessed .h5ad files available for validation (newest first)
POST /api/validation/run — Async: run RCTA Rare / RCTA Novel / CIARA / scNovel comparison
GET /api/validation/job/{job_id} — Poll validation job
GET /api/validation/job/{job_id}/results — Get validation metrics
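The validation job reports AUROC and F1 per model. For reference, both metrics can be computed from per-cell ground-truth labels (1 = rare/novel, 0 = other), prediction scores, and binary calls as sketched below; sklearn.metrics.roc_auc_score and sklearn.metrics.f1_score compute the same quantities. The pairwise AUROC loop is quadratic and only shows the definition, and the 1/0 label encoding is an assumption about how validation_service.py frames the task.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1(y_true, y_pred):
    """F1 for the positive class (1 = rare/novel cell, an assumed encoding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

AUROC is threshold-free and judges how well a model ranks true rare cells above the rest, while F1 judges the final binary calls; reporting both separates ranking quality from thresholding choices.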

Optimisation (Optuna)

POST /api/optimization/run — Async: Optuna TPE search over all model hyperparameters
GET /api/optimization/job/{job_id} — Poll optimisation job
GET /api/optimization/job/{job_id}/results — Get best hyperparameters
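Analysis, validation and optimisation jobs all follow the same submit-then-poll pattern, so the polling loop from the usage examples can be factored into one helper. This is a sketch: wait_for_job and its timeout defaults are hypothetical, the "completed"/"failed" values come from the usage examples, and get_status is injected so the helper can be tested without a running server.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], dict],
    poll_s: float = 3.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Hypothetical helper: poll get_status() until the job completes.

    Returns the final status dict, raises RuntimeError if the job failed and
    TimeoutError if it is still running after timeout_s seconds.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            raise RuntimeError(f"job failed: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} s")
        sleep(poll_s)


# Usage against the running server (requires the requests package):
#   url = f"http://localhost:8000/api/optimization/job/{job_id}"
#   wait_for_job(lambda: requests.get(url).json())
#   best = requests.get(f"{url}/results").json()
```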

AI Agent

POST /api/ai/preprocessing-recommendations — Recommend QC parameters
POST /api/ai/interpret-qc — Interpret QC metrics
POST /api/ai/interpret-uncertainty — Interpret uncertainty analysis
POST /api/ai/cell-type-context — Biological context for cell types
POST /api/ai/workflow-recommendations — Full workflow guidance
POST /api/agent/chat — Chat with AI

Usage Example

1. Upload Dataset

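import time

import requests

# Open the file in a context manager so the handle is closed after the upload
with open("my_data.h5ad", "rb") as fh:
    r = requests.post("http://localhost:8000/api/datasets/upload", files={"file": fh})
r.raise_for_status()  # surface HTTP errors instead of a KeyError on the next line
dataset_id = r.json()["dataset_id"]
print(f"Dataset ID: {dataset_id}")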

2. (Optional) QC Filter Preview

r = requests.post("http://localhost:8000/api/preprocess/filter", json={
    "dataset_id": dataset_id,
    "preprocessing": {
        "min_count_rna": 200, "max_count_rna": 5000,
        "min_feature_rna": 200, "max_feature_rna": 2500,
        "min_percent_mt": 0, "max_percent_mt": 20,
        "run_marker_validation": False
    }
})
print(r.json())  # filtered_cells, filtered_genes, cells_removed, ...

3. Run Full RCTA Analysis

r = requests.post("http://localhost:8000/api/analysis/run", json={
    "dataset_id": dataset_id,
    "preprocessing": {
        "min_count_rna": 200, "max_count_rna": 5000,
        "min_feature_rna": 200, "max_feature_rna": 2500,
        "min_percent_mt": 0, "max_percent_mt": 20,
        "run_marker_validation": False
    }
})
job_id = r.json()["job_id"]

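# Poll until the job finishes; give up after 30 minutes (adjust for large datasets)
deadline = time.time() + 30 * 60
while True:
    status = requests.get(f"http://localhost:8000/api/analysis/job/{job_id}").json()
    print(f"Status: {status['status']}  {status.get('progress', 0)}%")
    if status["status"] in ("completed", "failed"):
        break
    if time.time() > deadline:
        raise TimeoutError(f"Analysis job {job_id} still running after 30 minutes")
    time.sleep(3)

# A failed job has no results to fetch
if status["status"] == "failed":
    raise RuntimeError(f"Analysis job failed: {status}")
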
results = requests.get(f"http://localhost:8000/api/analysis/job/{job_id}/results").json()
print(f"Clusters: {list(results['predictions']['cell_type_counts'].keys())}")
print(f"Novel cells: {results['novel_cells']['novel_cells_count']}")
print(f"Rare cells:  {results['rare_cells']['rare_cells_count']}")
print(f"Preprocessed file: {results['preprocessing']['adata_path']}")
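RCTA Rare is described as centroid-based deviation scoring with iterative refinement and bootstrap validation. The sketch below illustrates only the first two ideas on toy coordinates (for example PCA embeddings) with one label per cell: each cell's Euclidean distance to its cell type's centroid is z-scored against the unflagged cells of that type, and centroids are rebuilt without flagged cells until the flagged set stops changing. The distance metric, z-score rule, 2.5 threshold and stopping rule are assumptions, bootstrap validation is omitted, and this is not the code behind rare_cells_count.

```python
import math
from statistics import mean, pstdev


def centroid(points):
    """Coordinate-wise mean of equal-length vectors."""
    return [mean(coord) for coord in zip(*points)]


def rare_cell_flags(points, labels, z_threshold=2.5, max_iter=10):
    """Hypothetical sketch: indices of cells unusually far from their type's centroid.

    Each round rebuilds every centroid (and the distance mean/std) from the cells
    not flagged in the previous round, so outliers stop dragging the reference
    toward themselves. Stops when the flagged set is stable or after max_iter rounds.
    """
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)

    flagged = set()
    for _ in range(max_iter):
        new_flagged = set()
        for members in groups.values():
            core = [i for i in members if i not in flagged] or members
            c = centroid([points[i] for i in core])
            core_d = [math.dist(points[i], c) for i in core]
            mu, sd = mean(core_d), pstdev(core_d)
            if sd == 0:
                continue  # no spread to score against
            for i in members:
                if (math.dist(points[i], c) - mu) / sd > z_threshold:
                    new_flagged.add(i)
        if new_flagged == flagged:
            break
        flagged = new_flagged
    return sorted(flagged)
```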

4. Run 4-Model Validation

# List available preprocessed files (newest first)
files = requests.get("http://localhost:8000/api/validation/datasets").json()["datasets"]
if not files:
    raise SystemExit("No preprocessed files found: run the full RCTA analysis (step 3) first")
adata_path = files[0]["path"]  # most recently created

r = requests.post("http://localhost:8000/api/validation/run", json={"adata_path": adata_path})
r.raise_for_status()
job_id = r.json()["job_id"]

while True:
    status = requests.get(f"http://localhost:8000/api/validation/job/{job_id}").json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(3)

if status["status"] == "failed":
    raise RuntimeError(f"Validation job failed: {status}")

val = requests.get(f"http://localhost:8000/api/validation/job/{job_id}/results").json()
print(val)

Troubleshooting

Port Already in Use

lsof -ti:8000 | xargs kill -9
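lsof is not available on Windows. A portable way to check whether port 8000 is already taken before starting the server is a quick TCP connect with Python's standard socket module. This sketch (port_in_use is a hypothetical helper) only detects a listener; it does not identify or stop the process holding the port.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP listener already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on refusal
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8000 in use:", port_in_use(8000))
```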

CORS Issues

Ensure the Angular dev server origin (http://localhost:4200) is in the allowed origins list in app.py. It is included by default.

"No preprocessed files found" in the UI

Upload your .h5ad file using the file picker
Click Run RCTA Analysis — this runs the full pipeline and saves a *_preprocessed.h5ad file
The "Preprocessed Dataset" dropdown refreshes automatically after analysis completes
You can also click the ↻ button to manually refresh the list
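To confirm from the rcta-app/ directory that an analysis actually wrote a preprocessed file, the files can be listed newest first, the same ordering GET /api/validation/datasets uses. The sketch assumes the *_preprocessed.h5ad files land somewhere under temp_data/ (the processing-intermediates directory); the real location and the server's listing logic may differ, and list_preprocessed is a hypothetical helper.

```python
from pathlib import Path


def list_preprocessed(root="temp_data"):
    """Hypothetical: *_preprocessed.h5ad files under `root` (assumed location), newest first by mtime."""
    files = Path(root).rglob("*_preprocessed.h5ad")
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in list_preprocessed():
        print(path)
```

An empty listing after a completed analysis suggests the server was started from a different working directory, which is the situation covered under "File not found errors".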

File not found errors

Always start the server from the rcta-app/ directory so that temp_data/ and uploads/ resolve correctly:

cd /path/to/rcta-app
python app.py --dev --port 8000

Production Deployment

Use Gunicorn/Uvicorn workers
Replace the in-memory jobs dict with a shared job queue/store such as Redis (required before running multiple workers)
Add authentication/authorization
Use S3 or object storage for datasets
Implement rate limiting
Add structured logging and monitoring
