SynaptiScan is a comprehensive, AI-powered screening application designed to analyze biomarkers associated with Parkinson's Disease (PD). It leverages a combination of multiple machine-learning models to evaluate voice acoustics, keystroke dynamics, mouse kinematics, rest tremor characteristics, and handwriting (spiral drawing) patterns to generate a comprehensive risk assessment score.
<details>
<summary>Click to view screenshots</summary>

| ![]() Landing Page | ![]() Dashboard | ![]() Health Dashboard |
|---|---|---|
| ![]() Cognitive Test | ![]() Voice Test | ![]() Drawing Test |
| ![]() Keystroke Test | ![]() Mouse Test | ![]() Tremor Test |

</details>
- Multi-Modal Assessment: Combines six separate biomarker tests (Voice, Keystroke, Mouse, Tremor, Handwriting, and Cognition).
- Robust Anti-Spam & Validation: Uses intelligent thresholds (e.g., cursor speed, duration) and integrates `faster-whisper` for strict voice evaluation, along with validation checks across handwriting and cognition tests, to prevent anomalous or fraudulent test submissions.
- Real-Time Biomarker Extraction: Uses advanced techniques like webcam-based spatial tracking (MediaPipe), audio processing, and fine-motor kinematic tracking via the browser.
- Predictive ML Pipelines: Machine learning models trained on robust clinical datasets utilizing advanced class-balancing (SMOTE) and probabilistic calibrations.
- Comprehensive Dashboard: Interactive data visualization of assessment results using React and Recharts.
- Framework: React 19 with Vite
- Routing: React Router
- Styling: Tailwind CSS v4
- Animations: Framer Motion
- Icons: Lucide React
- Data Visualization: Recharts
- Network Requests: Axios
- Framework: FastAPI (Python 3.12+)
- Server: Uvicorn
- Database & ORM: PostgreSQL / SQLite with SQLAlchemy
- Authentication: JWT (JSON Web Tokens) with Passlib & bcrypt
- Machine Learning Algorithms: Scikit-Learn, XGBoost, PyTorch, Imbalanced-learn (SMOTE)
- Audio & Signal Processing: Praat-Parselmouth (acoustic extraction), faster-whisper (speech verification)
- Computer Vision & Tracking: OpenCV Headless, MediaPipe (client-side pose/hand land-marking)
- Data Manipulation: Pandas, NumPy, SciPy
The following diagram illustrates the complete end-to-end data pipeline from the moment a user begins a test to when the risk score is surfaced on their dashboard.
```mermaid
graph TD
    classDef frontend fill:#3b82f6,stroke:#1d4ed8,stroke-width:2px,color:#fff;
    classDef backend fill:#10b981,stroke:#047857,stroke-width:2px,color:#fff;
    classDef model fill:#8b5cf6,stroke:#6d28d9,stroke-width:2px,color:#fff;
    classDef database fill:#f59e0b,stroke:#b45309,stroke-width:2px,color:#fff;

    subgraph "Client (Frontend UI)"
        UI_V[Voice Test]:::frontend
        UI_K[Keystroke Test]:::frontend
        UI_M[Mouse Test]:::frontend
        UI_T[Tremor Test]:::frontend
        UI_H[Handwriting]:::frontend
        UI_C[Cognition]:::frontend
        DASH[Dashboard Overview]:::frontend
    end

    subgraph "Server (FastAPI Backend)"
        API[Ingestion API Endpoints]:::backend
        subgraph Processing [Data Pipeline & ML]
            SPAM[Anti-Spam Filter <br/> e.g., Whisper, Kinematic limits]:::backend
            EXTRACT[Feature Extraction Engine <br/> Praat, NumPy, SciPy]:::backend
            EVAL[ML Prediction Ensembles <br/> RF, XGBoost, SVM]:::model
        end
        DB[(PostgreSQL / SQLite <br/> Session Data)]:::database
    end

    UI_V & UI_K & UI_M & UI_T & UI_H & UI_C --->|Raw Sensor Data & WebM Blobs| API
    API --> SPAM
    SPAM -->|Validated Inputs| EXTRACT
    EXTRACT -->|Computed Feature Vectors| EVAL
    EVAL -->|Probabilistic Risk Scores| DB
    DB -.->|Trend Analysis & Queries| DASH
```
SynaptiScan relies on six specifically calibrated models to evaluate the user's inputs. Due to the imbalanced nature of clinical datasets, most models leverage SMOTE (Synthetic Minority Over-sampling Technique) to establish balanced priors. The primary classification algorithm used across most tests is a Soft-Voting Ensemble comprising Random Forest, Gradient Boosting (GBM), eXtreme Gradient Boosting (XGBoost), and Support Vector Machines (SVM) wrapped with Isotonic Calibration to output true probabilistic risk scores rather than binary classifications.
Analyzes vocal tremors, phonation stability, and micro-fluctuations in speech.
- Dataset: UCI Parkinson's Disease Dataset (195 recordings).
- Extracted Features (16 MDVP Features):
  - Pitch Metrics: `MDVP:Fo(Hz)` (average), `MDVP:Fhi(Hz)` (maximum), `MDVP:Flo(Hz)` (minimum)
  - Jitter Metrics: `MDVP:Jitter(%)`, `MDVP:Jitter(Abs)`, `MDVP:RAP`, `MDVP:PPQ`, `Jitter:DDP`
  - Shimmer Metrics: `MDVP:Shimmer`, `MDVP:Shimmer(dB)`, `Shimmer:APQ3`, `Shimmer:APQ5`, `MDVP:APQ`, `Shimmer:DDA`
  - Tonal/Noise Ratios: `NHR` (Noise-to-Harmonics), `HNR` (Harmonics-to-Noise)
- Algorithm: SMOTE + Calibrated Soft-Voting Ensemble (Random Forest + GBM + XGBoost + SVM).
- Validation: Utilizes `faster-whisper` for real-time transcription validation to ensure the submitted audio correctly matches the prompted sentence, filtering out unintelligible or spam recordings.
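The validation gate can be sketched as below. In practice the transcript would come from a `faster-whisper` call such as `WhisperModel(...).transcribe(audio_path)`; the matching helper and its 0.7 threshold are illustrative assumptions, not the project's actual logic.

```python
import re

def prompt_match_ratio(transcript: str, prompt: str) -> float:
    """Fraction of prompt words that appear in the transcript (order-insensitive)."""
    norm = lambda s: re.sub(r"[^a-z0-9 ]", "", s.lower()).split()
    prompt_words = norm(prompt)
    transcript_words = set(norm(transcript))
    if not prompt_words:
        return 0.0
    hits = sum(1 for w in prompt_words if w in transcript_words)
    return hits / len(prompt_words)

def is_valid_recording(transcript: str, prompt: str, threshold: float = 0.7) -> bool:
    """Reject recordings whose transcript diverges too far from the prompted sentence."""
    return prompt_match_ratio(transcript, prompt) >= threshold
```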
Evaluates typing hesitation, dwell times, and flight times, which correlate with bradykinesia and muscle rigidity.
- Dataset: PhysioNet Tappy Dataset (227 participants, ~200MB keystroke log data).
- Extracted Features (8 Features):
  - `mean_dwell_time`, `std_dwell_time`, `dwell_iqr` (millisecond durations a key is depressed)
  - `mean_flight_time`, `std_flight_time`, `flight_iqr` (millisecond gaps between key releases and subsequent presses)
  - `typing_speed` (characters per second)
  - `error_rate` (backspace usage ratio)
- Algorithm: SMOTE + Calibrated Ensemble. Outputs are probabilistically corrected via Bayes' theorem to account for general-population screening priors (a conservative 5% prevalence prior).
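One way such a prior correction can work is shown below. The helper and the 50% training prior are assumptions for illustration (SMOTE-balanced training data implies roughly equal class priors); only the 5% population prior comes from the description above.

```python
def adjust_for_prevalence(p_model: float,
                          train_prior: float = 0.5,
                          population_prior: float = 0.05) -> float:
    """Re-weight a classifier probability from the training-set class prior
    to the real-world screening prevalence using Bayes' rule."""
    pos = p_model * (population_prior / train_prior)
    neg = (1.0 - p_model) * ((1.0 - population_prior) / (1.0 - train_prior))
    return pos / (pos + neg)
```

With neutral evidence (`p_model = 0.5`) the corrected score collapses to the 5% population prevalence, which is exactly the conservative behaviour a screening tool wants.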
Measures fine-motor control, velocity jitter, and directional changes via mouse movements.
- Dataset: ALAMEDA Accelerometer Dataset (accelerometer signals physiologically mapped to continuous 2D screen tracking).
- Extracted Features (11 Features):
  - Spatial: `path_length` (total pixels traversed), `direction_changes` (X/Y velocity zero-crossings)
  - Temporal: `movement_time`, `average_velocity`, `velocity_jitter`
  - Kinematic Moments: `mean_magnitude`, `variance`, `skewness`, `kurtosis`
  - PCA Variants: `pc1_rms`, `pc1_std`
- Algorithm: SMOTE + Ensemble Predictors (Random Forest + GBM + XGBoost + SVM).
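A minimal NumPy sketch of how a few of these features can be derived from raw cursor samples; the exact definitions used by the project may differ.

```python
import numpy as np

def mouse_features(xs, ys, ts):
    """Derive basic kinematic features from cursor samples.
    xs/ys are pixel coordinates; ts are timestamps in seconds."""
    xs, ys, ts = map(np.asarray, (xs, ys, ts))
    dx, dy, dt = np.diff(xs), np.diff(ys), np.diff(ts)
    step = np.hypot(dx, dy)          # per-sample displacement in pixels
    speed = step / dt                # per-sample velocity in px/s
    return {
        "path_length": float(step.sum()),
        "movement_time": float(ts[-1] - ts[0]),
        "average_velocity": float(speed.mean()),
        "velocity_jitter": float(speed.std()),
        # sign flips of the x/y velocity components (zero-crossings)
        "direction_changes": int((np.diff(np.sign(dx)) != 0).sum()
                                 + (np.diff(np.sign(dy)) != 0).sum()),
    }
```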
Quantifies rest tremors via webcam feed tracking localized hand landmarks.
- Dataset: ALAMEDA Accelerometer Dataset (Translating 3D positional shift into spectral properties).
- Extracted Features (8 Custom Frequency-Domain Features):
  - Frequency Analysis: `peak_frequency_hz` (dominant FFT band between 3-12 Hz), `spectral_entropy`, `pc1_dom_freq`, `pc1_entropy`
  - Power Distribution: `amplitude_mean` (signal amplitude), `total_power`, `power_at_dom_freq`, `fft_rms` (root-mean-square of the FFT spectrum)
- Algorithm: SMOTE + Ensemble Predictors. Integrates MediaPipe Tasks Vision (`hand_landmarker.task`) locally for precise wrist displacement tracking before securely evaluating physiological tremor frequency derivatives on the backend.
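The two core spectral features can be sketched with NumPy as below. The 30 Hz sampling rate (a typical webcam frame rate) and the exact normalisation are assumptions; the project's extractor may differ in detail.

```python
import numpy as np

def tremor_spectrum_features(signal, fs=30.0, band=(3.0, 12.0)):
    """Peak tremor frequency and spectral entropy of a wrist-displacement trace.
    fs: sampling rate in Hz (e.g. a ~30 fps webcam); band: PD rest-tremor band."""
    sig = np.asarray(signal, dtype=float)
    sig = sig - sig.mean()                      # remove DC offset
    spectrum = np.abs(np.fft.rfft(sig)) ** 2    # power spectrum
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    peak_frequency_hz = float(freqs[mask][np.argmax(spectrum[mask])])
    p = spectrum / spectrum.sum()               # normalise to a distribution
    spectral_entropy = float(-(p[p > 0] * np.log2(p[p > 0])).sum())
    return peak_frequency_hz, spectral_entropy
```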
Assesses micrographia and non-smooth drawing patterns typical of PD patients.
- Dataset: Shubhamjha97 Parkinson's Spirals/Meander kinematic dataset (77 clinical recordings).
- Extracted Features (15 Normalised Rate Features):
  - Speed & Magnitude: `speed_st`, `speed_dy`, `magnitude_vel_st`, `magnitude_vel_dy`
  - Acceleration & Jerk: `magnitude_acc_st`, `magnitude_acc_dy`, `magnitude_jerk_st`, `magnitude_jerk_dy`
  - Vector Fluctuation: `ncv_st`, `ncv_dy` (Number of Changes in Velocity), `nca_st`, `nca_dy` (Number of Changes in Acceleration)
  - Timings: `in_air_stcp` (pen-up time), `on_surface_st`, `on_surface_dy` (drawing time)
- Algorithm: SMOTE + Isotonically Calibrated Gradient Boosting Classifier (GBM). Adjusts `ncv`/`nca` values from variable dataset sampling rates to standard per-second browser polling rates (~60 Hz).
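The rate normalisation can be sketched as below. The definition of NCV used here (sign flips of the velocity derivative) is a common convention and an assumption about the project's exact formula; the point is that dividing by elapsed time makes counts comparable across a 100 Hz tablet dataset and a ~60 Hz browser stream.

```python
import numpy as np

def ncv_per_second(velocity, duration_s):
    """Number of Changes in Velocity, normalised to a per-second rate.
    velocity: sampled speed values; duration_s: stroke duration in seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    dv = np.diff(np.asarray(velocity, dtype=float))
    sign_flips = int((np.diff(np.sign(dv)) != 0).sum())  # direction reversals of dv
    return sign_flips / duration_s
```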
Evaluates executive dysfunction and delayed reaction times using a web-based Stroop task.
- Dataset: High-fidelity simulated clinical dataset (100,000 algorithmic profiles mapping clinical Gaussian mixtures to non-linear noise distributions).
- Extracted Features (4 Features):
  - `congruent_rt_mean` (average latency in ms for matching colors)
  - `incongruent_rt_mean` (average latency in ms for mismatched text/colors)
  - `stroop_effect` (interference delay: incongruent minus congruent mean latency)
  - `error_rate` (proportion of incorrect responses)
- Algorithm: SMOTE + Isotonically Calibrated XGBoost Classifier (tuned via GridSearchCV).
- Validation: Implements bounds constraints and spam detection through accuracy thresholds and minimal response times to invalidate random clicking.
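Assembling the four cognitive features from a session is straightforward; a sketch follows, with the function name and argument shapes as illustrative assumptions.

```python
from statistics import mean

def stroop_features(congruent_rts_ms, incongruent_rts_ms, n_errors, n_trials):
    """Summarise one Stroop session into the four model inputs."""
    congruent_rt_mean = mean(congruent_rts_ms)
    incongruent_rt_mean = mean(incongruent_rts_ms)
    return {
        "congruent_rt_mean": congruent_rt_mean,
        "incongruent_rt_mean": incongruent_rt_mean,
        # interference delay: extra time needed when word and colour conflict
        "stroop_effect": incongruent_rt_mean - congruent_rt_mean,
        "error_rate": n_errors / n_trials,
    }
```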
Across all six screening modalities, SynaptiScan's ensemble models demonstrate high sensitivity and specificity. The following metrics represent performance on held-out test sets from a combination of clinical datasets (UCI, PhysioNet, Zenodo) and high-fidelity clinical-distribution simulations.
| Assessment Mode | Accuracy | ROC-AUC | Sensitivity (Recall) | F1-Score (PD) |
|---|---|---|---|---|
| 🎙️ Voice Acoustics | 74.0% | 0.830 | 78.4% | 0.817 |
| ⌨️ Keystroke Dynamics | 99.4% | 0.99 | 98.8% | 0.994 |
| 🖱️ Mouse Kinematics | 98.0% | 0.98 | 98.1% | 0.981 |
| 🫨 Rest Tremor | 76.0% | 0.856 | 78.8% | 0.774 |
| ✍️ Handwriting | 96.7% | 0.98 | 96.7% | 0.967 |
| 🧠 Cognitive (Stroop) | 93.2% | 0.971 | 86.7% | 0.794 |
The voice model achieves a strong balance between identifying healthy controls and PD patients, with realistic overlap handling.
- Precision (PD): 85%
- Recall (PD): 78%
- Healthy F1: 0.55
Evaluated on the PhysioNet Tappy and ALAMEDA distributions, these models leverage SMOTE to handle class imbalance, resulting in near-perfect separation on kinematic features like velocity jitter and dwell-time variance.
The XGBoost ensemble for cognitive screening handles the non-linear overlap between elderly healthy controls and early-stage PD patients.
- ROC-AUC: 0.971
- PD F1-Score: 0.794
- Precision (PD): 73.2%
> [!TIP]
> All models are wrapped with Isotonic Calibration, ensuring that the probability scores surfaced in the results dashboard correspond to actual clinical risk frequencies.
- Node.js (v18 or higher)
- Python 3.12+
- `uv` package manager (recommended for backend)
- Navigate to the backend directory:

  ```bash
  cd backend
  ```

- Create a `.env` file in the `backend` directory. Example:

  ```env
  PORT=8000
  CLIENT_URL=http://localhost:5173
  DATABASE_URL=sqlite:///./synaptiscan.db
  SECRET_KEY=your_secret_key_here
  ```

- Install dependencies (this creates a `.venv` using `uv.lock`):

  ```bash
  uv sync
  ```

- Activate the virtual environment (optional if using `uv run`):

  ```bash
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```

- Run the model training pipeline to generate the models:

  ```bash
  uv run python app/ml/training/train_models.py
  ```

- Start the FastAPI server (in development mode):

  ```bash
  uv run fastapi dev app/main.py --port 8000
  ```

  The API will be available at `http://localhost:8000`.
- Navigate to the frontend directory:

  ```bash
  cd frontend
  ```

- Create a `.env` file in the `frontend` directory:

  ```env
  VITE_API_URL=http://localhost:8000/api
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Start the development server:

  ```bash
  npm run dev
  ```

  The application will be accessible at `http://localhost:5173`.