FastAPI backend for Rare Cell Type Analysis (RCTA) — Novel and Rare cell detection pipeline.
- Dataset Management: Upload and manage .h5ad files; datasets >50k cells are automatically subsampled to 10% (stratified by cell type, matching notebook Cell 6)
- RCTA Novel (Figure 2A): KNN neighbourhood enrichment for unsupervised novel population detection
- RCTA Rare (Figure 2B): Centroid-based deviation scoring with iterative refinement and bootstrap validation
- 4-Model Validation: Side-by-side comparison of RCTA Rare, RCTA Novel, CIARA, and scNovel with AUROC/F1 metrics
- Optuna Optimisation: TPE hyperparameter search for all four models
- Preprocessing Pipeline: Matches notebook Cell 13 exactly — filter_genes → normalize → log1p → top-10k HVGs by variance → z-score → sklearn PCA(50) → sklearn KNN(15) → UMAP → Leiden(0.8)
- AI Agent: LangChain-powered analysis insights
- Async Processing: Background job execution with progress tracking
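The preprocessing order listed above can be sketched on a small dense matrix as follows. This is an illustration only, not the code in preprocessing_service.py: the function name `preprocess_sketch` and the `min_cells=3` and `target_sum=1e4` defaults are assumptions, PCA is done with a plain NumPy SVD and KNN with brute-force distances in place of the sklearn calls, and the UMAP and Leiden steps are left out.

```python
import numpy as np


def preprocess_sketch(counts, n_hvg=10_000, n_pcs=50, k=15,
                      min_cells=3, target_sum=1e4):
    """Illustrative stand-in for the pipeline order; not the service code."""
    X = np.asarray(counts, dtype=float)

    # filter_genes: keep genes detected in at least `min_cells` cells (assumed threshold)
    X = X[:, (X > 0).sum(axis=0) >= min_cells]

    # normalize each cell to the same total count (assumed target), then log1p
    totals = X.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    X = np.log1p(X / totals * target_sum)

    # keep the top genes ranked by variance
    n_hvg = min(n_hvg, X.shape[1])
    top = np.argsort(X.var(axis=0))[::-1][:n_hvg]
    X = X[:, top]

    # z-score each gene (constant genes are left at zero)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=0)) / std

    # PCA via SVD; X is already centred by the z-score step
    n_pcs = min(n_pcs, min(X.shape) - 1)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    pcs = U[:, :n_pcs] * S[:n_pcs]

    # brute-force Euclidean KNN in PCA space, excluding each cell itself
    k = min(k, pcs.shape[0] - 1)
    sq = (pcs ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * pcs @ pcs.T
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    return pcs, neighbours


# Toy data: 200 cells x 500 genes of Poisson counts
rng = np.random.default_rng(0)
toy = rng.poisson(2.0, size=(200, 500))
pcs, neighbours = preprocess_sketch(toy, n_hvg=100, n_pcs=10, k=5)
print(pcs.shape, neighbours.shape)
```

The brute-force distance matrix is quadratic in the number of cells, so this form is only practical for small inputs.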
cd rcta-app
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Important: Always run the server from the rcta-app/ directory on port 8000 (the Angular frontend expects port 8000).
cd rcta-app
source venv/bin/activate
# Development mode (auto-reload)
python app.py --dev --port 8000
# Or via uvicorn directly
uvicorn app:app --reload --host 0.0.0.0 --port 8000
# Production-style run (multiple workers, no auto-reload)
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4
The API will be available at: http://localhost:8000
Once the server is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
rcta-app/
├── app.py # Main FastAPI application
├── requirements.txt # Python dependencies
├── services/
│ ├── __init__.py
│ ├── preprocessing_service.py # Full preprocessing pipeline (notebook Cell 13)
│ ├── novel_cell_detection_service.py # RCTA Novel (Figure 2A)
│ ├── rare_cell_detection_service.py # RCTA Rare (Figure 2B)
│ ├── validation_service.py # 4-model comparison (RCTA/CIARA/scNovel)
│ ├── optuna_service.py # Hyperparameter optimisation
│ ├── visualization_service.py # Plot generation
│ ├── ai_agent_service.py # AI agent tools
│ └── file_service.py # Upload + 10% subsampling
├── uploads/ # Original uploaded files
├── temp_data/ # Processing intermediates
└── visualizations/ # Generated plots
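The 10% stratified subsampling applied to uploads above 50k cells could look roughly like the sketch below. It is not the code in file_service.py: the function name `stratified_subsample`, the fixed seed, and the rule of keeping at least one cell per type are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample(cell_types, fraction=0.10, threshold=50_000, seed=0):
    """Return sorted indices to keep; every index if the dataset is not above the threshold."""
    n = len(cell_types)
    if n <= threshold:
        return list(range(n))

    # group cell indices by their cell-type label
    by_type = defaultdict(list)
    for idx, cell_type in enumerate(cell_types):
        by_type[cell_type].append(idx)

    rng = random.Random(seed)
    keep = []
    for idxs in by_type.values():
        # assumption: keep at least one cell so very small types are not dropped
        n_keep = max(1, round(len(idxs) * fraction))
        keep.extend(rng.sample(idxs, n_keep))
    return sorted(keep)


labels = ["T cell"] * 40_000 + ["B cell"] * 19_950 + ["rare"] * 50
kept = stratified_subsample(labels)
print(len(kept), sum(labels[i] == "rare" for i in kept))
```

Sampling within each type keeps the cell-type proportions of the original dataset, which matters here because the rare populations are the ones the pipeline is trying to find.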
- GET /: API status
- GET /health: Detailed service health

- POST /api/datasets/upload: Upload .h5ad file (auto-subsamples >50k cells)
- GET /api/datasets: List uploaded datasets
- GET /api/datasets/{dataset_id}: Get dataset metadata
- DELETE /api/datasets/{dataset_id}: Delete dataset

- POST /api/preprocess/filter: QC cell filtering preview (does not run full pipeline)

- POST /api/analysis/run: Async job that runs preprocess → RCTA Novel → RCTA Rare → synthesise results
- GET /api/analysis/job/{job_id}: Poll job status
- GET /api/analysis/job/{job_id}/results: Get completed results

- POST /api/analysis/novel-cells: Run RCTA Novel on a preprocessed .h5ad
- POST /api/analysis/rare-cells: Run RCTA Rare on a preprocessed .h5ad

- GET /api/validation/datasets: List preprocessed .h5ad files available for validation (newest first)
- POST /api/validation/run: Async job that runs the RCTA Rare / RCTA Novel / CIARA / scNovel comparison
- GET /api/validation/job/{job_id}: Poll validation job
- GET /api/validation/job/{job_id}/results: Get validation metrics
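For readers unfamiliar with the two validation metrics, here is a minimal pure-Python sketch of how F1 and AUROC are defined. It is not the code in validation_service.py, which may well use a library implementation; the function names `f1` and `auroc`, the toy labels, and the 0.5 decision threshold are assumptions.

```python
def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = rare/novel)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
print(round(auroc(y_true, scores), 3), round(f1(y_true, y_pred), 3))
```

AUROC is computed from the raw scores and needs no threshold, whereas F1 depends on the cut-off chosen to turn scores into labels, so the two can rank models differently.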
- POST /api/optimization/run: Async Optuna TPE search over all model hyperparameters
- GET /api/optimization/job/{job_id}: Poll optimisation job
- GET /api/optimization/job/{job_id}/results: Get best hyperparameters

- POST /api/ai/preprocessing-recommendations: Recommend QC parameters
- POST /api/ai/interpret-qc: Interpret QC metrics
- POST /api/ai/interpret-uncertainty: Interpret uncertainty analysis
- POST /api/ai/cell-type-context: Biological context for cell types
- POST /api/ai/workflow-recommendations: Full workflow guidance
- POST /api/agent/chat: Chat with AI
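The analysis, validation, and optimisation endpoints all follow the same submit, poll, fetch-results pattern, so client code may benefit from a small polling helper with a timeout. The sketch below is one possible shape, not part of the project: `wait_for_job` and its parameters are hypothetical, and the "running" status in the demonstration is an assumed intermediate value (only "completed" and "failed" appear in this README).

```python
import time


def wait_for_job(fetch_status, poll_seconds=3.0, timeout_seconds=3600.0,
                 sleep=time.sleep):
    """Call fetch_status() until the job is completed or failed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status["status"] in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(poll_seconds)


# Demonstration with canned responses instead of a live server
canned = iter([
    {"status": "running", "progress": 10},   # "running" is an assumed value
    {"status": "running", "progress": 60},
    {"status": "completed", "progress": 100},
])
final = wait_for_job(lambda: next(canned), sleep=lambda seconds: None)
print(final["status"], final["progress"])
```

Against a running server, `fetch_status` would be something like `lambda: requests.get(f"http://localhost:8000/api/analysis/job/{job_id}").json()`. Passing the fetch function in keeps the helper usable for all three job types and easy to test offline.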
import time

import requests

# Upload a dataset
with open("my_data.h5ad", "rb") as fh:
    r = requests.post(
        "http://localhost:8000/api/datasets/upload",
        files={"file": fh},
    )
dataset_id = r.json()["dataset_id"]
print(f"Dataset ID: {dataset_id}")

# Preview QC filtering (does not run the full pipeline)
r = requests.post("http://localhost:8000/api/preprocess/filter", json={
    "dataset_id": dataset_id,
    "preprocessing": {
        "min_count_rna": 200, "max_count_rna": 5000,
        "min_feature_rna": 200, "max_feature_rna": 2500,
        "min_percent_mt": 0, "max_percent_mt": 20,
        "run_marker_validation": False
    }
})
print(r.json())  # filtered_cells, filtered_genes, cells_removed, ...

# Run the full analysis as a background job
r = requests.post("http://localhost:8000/api/analysis/run", json={
    "dataset_id": dataset_id,
    "preprocessing": {
        "min_count_rna": 200, "max_count_rna": 5000,
        "min_feature_rna": 200, "max_feature_rna": 2500,
        "min_percent_mt": 0, "max_percent_mt": 20,
        "run_marker_validation": False
    }
})
job_id = r.json()["job_id"]

# Poll until complete
while True:
    status = requests.get(f"http://localhost:8000/api/analysis/job/{job_id}").json()
    print(f"Status: {status['status']} {status['progress']}%")
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(3)

# Results only exist for completed jobs
if status["status"] == "failed":
    raise RuntimeError(f"Analysis job {job_id} failed")

results = requests.get(f"http://localhost:8000/api/analysis/job/{job_id}/results").json()
print(f"Clusters: {list(results['predictions']['cell_type_counts'].keys())}")
print(f"Novel cells: {results['novel_cells']['novel_cells_count']}")
print(f"Rare cells: {results['rare_cells']['rare_cells_count']}")
print(f"Preprocessed file: {results['preprocessing']['adata_path']}")

# List available preprocessed files (newest first)
files = requests.get("http://localhost:8000/api/validation/datasets").json()["datasets"]
adata_path = files[0]["path"]  # most recently created

# Run the 4-model validation as a background job
r = requests.post("http://localhost:8000/api/validation/run", json={"adata_path": adata_path})
job_id = r.json()["job_id"]
while True:
    status = requests.get(f"http://localhost:8000/api/validation/job/{job_id}").json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(3)
val = requests.get(f"http://localhost:8000/api/validation/job/{job_id}/results").json()
print(val)

If port 8000 is already in use, kill the process holding it:
lsof -ti:8000 | xargs kill -9

If the browser reports CORS errors, ensure the Angular dev server (http://localhost:4200) is in the allowed origins list in app.py (already set by default).
- Upload your .h5ad file using the file picker
- Click Run RCTA Analysis; this runs the full pipeline and saves a *_preprocessed.h5ad file
- The "Preprocessed Dataset" dropdown refreshes automatically after analysis completes
- You can also click the ↻ button to manually refresh the list
Always start the server from the rcta-app/ directory so that temp_data/ and uploads/ resolve correctly:
cd /path/to/rcta-app
python app.py --dev --port 8000

For production deployments, consider the following:
- Use Gunicorn/Uvicorn workers
- Use Redis for the job queue instead of the in-memory jobs dict
- Add authentication/authorization
- Use S3 or object storage for datasets
- Implement rate limiting
- Add structured logging and monitoring
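One reason the in-memory jobs dict does not carry over to production: each Uvicorn or Gunicorn worker is a separate process with its own memory, so a job created in one worker is invisible to a status request served by another. A low-effort way to prepare for a Redis swap is to route all job access through a small store class. The sketch below is hypothetical, not the code in app.py: the class name `InMemoryJobStore`, its three methods, and the initial "pending" status are assumptions.

```python
import threading
import uuid
from typing import Optional


class InMemoryJobStore:
    """Thread-safe job registry; a Redis-backed class could expose the same three methods."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def create(self) -> str:
        """Register a new job and return its id."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "pending", "progress": 0}  # assumed fields
        return job_id

    def update(self, job_id: str, **fields) -> None:
        """Merge new fields (status, progress, ...) into an existing job."""
        with self._lock:
            self._jobs[job_id].update(fields)

    def get(self, job_id: str) -> Optional[dict]:
        """Return a copy of the job record, or None if the id is unknown."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job is not None else None


store = InMemoryJobStore()
job_id = store.create()
store.update(job_id, status="completed", progress=100)
print(store.get(job_id))
```

With endpoints depending only on `create`, `update`, and `get`, moving to Redis would mean writing a second class with the same methods rather than touching every route.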