Graph Genome Viewer

⚠️ AI-GENERATED PROTOTYPE ⚠️

This repository was assembled through iterative prompting of AI models (GitHub Copilot/Claude) to sketch a prototype code base for exploration.

NOT FOR PRODUCTION USE - This is a proof-of-concept for exploring pangenome visualization approaches.

Code may contain bugs, inefficiencies, or incorrect implementations

Not validated for scientific accuracy

No warranty or support provided

Use at your own risk for research/educational purposes only

A pangenome visualization tool built in Rust with egui for GPU-accelerated rendering.

AI-generated caveats and review focus

Parser fragility: The GFA and GAF parsers default missing or unparseable numeric fields to 0 and in places call unwrap() when parsing orientations (e.g., parse_ref in src/io/gfa.rs, parse_line in src/io/gaf.rs), so malformed input can either cause a panic or silently mask bad data.
Walk metadata defaults: GFA walk haplotype/start/end fields fall back to zero on parse failure (parse_walk in src/io/gfa.rs), which can hide upstream data issues.
Stats robustness: Alignment identity statistics are sorted with partial_cmp(...).unwrap() and the median is taken as a single middle element (src/analysis/stats.rs), so any NaN causes a panic and medians of even-length samples are not the average of the two middle values.
Layout expectations: The Hierarchical layout option is unimplemented and currently reuses the Tube Map output (src/app.rs), while the force-directed layout uses random initialization without stabilization for large graphs (src/layout/force.rs), so layouts may be nondeterministic or overlap.
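A minimal sketch of how the parser and stats caveats could be hardened, assuming GFA 1 path steps of the form name+ or name-. The functions parse_oriented_ref, parse_u64_field, and median are hypothetical illustrations, not the repository's parse_ref or stats code: they return errors instead of panicking or substituting 0, and the median skips NaNs and averages the two middle values for even-length input.

```rust
/// Orientation of a segment within a GFA path step (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Orientation {
    Forward,
    Reverse,
}

/// Parses a GFA 1 path step such as "s12+" into (segment name, orientation).
/// Returns an error instead of panicking when the orientation or name is missing.
pub fn parse_oriented_ref(step: &str) -> Result<(String, Orientation), String> {
    let step = step.trim();
    let orientation = match step.chars().last() {
        Some('+') => Orientation::Forward,
        Some('-') => Orientation::Reverse,
        _ => return Err(format!("missing or invalid orientation in step {step:?}")),
    };
    // '+' and '-' are one byte each, so this slice stays on a char boundary.
    let name = &step[..step.len() - 1];
    if name.is_empty() {
        return Err(format!("empty segment name in step {step:?}"));
    }
    Ok((name.to_string(), orientation))
}

/// Parses a numeric field, reporting the failure rather than defaulting to 0.
pub fn parse_u64_field(field: &str, what: &str) -> Result<u64, String> {
    field
        .trim()
        .parse::<u64>()
        .map_err(|e| format!("invalid {what} {field:?}: {e}"))
}

/// Median that skips NaN values (so it never panics) and averages the two
/// middle values for even-length input. Returns None if no values remain.
pub fn median(values: &[f64]) -> Option<f64> {
    let mut v: Vec<f64> = values.iter().copied().filter(|x| !x.is_nan()).collect();
    if v.is_empty() {
        return None;
    }
    // total_cmp gives a total order on f64, so no unwrap is needed.
    v.sort_by(|a, b| a.total_cmp(b));
    let mid = v.len() / 2;
    Some(if v.len() % 2 == 0 {
        (v[mid - 1] + v[mid]) / 2.0
    } else {
        v[mid]
    })
}
```

Returning Result lets the caller decide whether to skip a bad line or abort the load, instead of the parser deciding silently.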

Features

Graph Visualization: Render graph genomes from GFA/GFA2 files
Structural Variant View: Clear visualization of SVs (deletions, insertions, inversions, complex events)
Multiple Layout Algorithms:
- Tube Map (linear layout inspired by transit maps)
- Force-Directed (Fruchterman-Reingold algorithm)
Long-Read Alignment Analysis: Load and visualize alignment data (GAF format)
Interactive Navigation: Pan, zoom, and select segments
Color Schemes: Coverage heatmaps, GC content, alignment identity
Cross-Platform: Builds on Linux, macOS, and Windows; a fully static Linux binary can be built with the musl target
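For readers unfamiliar with the Force-Directed option, a minimal sketch of the Fruchterman-Reingold iteration follows. It is not the code in src/layout/force.rs: it replaces random initialization with nodes evenly spaced on a circle, so repeated runs give identical layouts, and it uses linear cooling of the "temperature" that caps per-iteration movement. The function name and parameters are hypothetical; node indices in edges are assumed to be below n, and width and height are assumed positive.

```rust
/// A 2D node position in layout space.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Pos {
    pub x: f64,
    pub y: f64,
}

/// Fruchterman-Reingold layout for `n` nodes and undirected `edges`
/// inside a `width` x `height` frame, with deterministic initialization.
pub fn fruchterman_reingold(
    n: usize,
    edges: &[(usize, usize)],
    width: f64,
    height: f64,
    iterations: usize,
) -> Vec<Pos> {
    if n == 0 {
        return Vec::new();
    }
    // Ideal pairwise distance k = sqrt(area / n).
    let k = (width * height / n as f64).sqrt();
    let radius = 0.4 * width.min(height);
    let mut pos: Vec<Pos> = (0..n)
        .map(|i| {
            let angle = 2.0 * std::f64::consts::PI * i as f64 / n as f64;
            Pos { x: width / 2.0 + radius * angle.cos(), y: height / 2.0 + radius * angle.sin() }
        })
        .collect();
    // The temperature caps how far a node may move per iteration; it cools linearly.
    let mut temperature = width.min(height) / 10.0;
    let cooling = temperature / (iterations as f64 + 1.0);
    for _ in 0..iterations {
        let mut disp = vec![(0.0_f64, 0.0_f64); n];
        // Repulsive force k^2 / d between every pair of nodes (O(n^2)).
        for i in 0..n {
            for j in (i + 1)..n {
                let (dx, dy) = (pos[i].x - pos[j].x, pos[i].y - pos[j].y);
                let d = (dx * dx + dy * dy).sqrt().max(1e-9);
                let f = k * k / d;
                let (fx, fy) = (dx / d * f, dy / d * f);
                disp[i].0 += fx;
                disp[i].1 += fy;
                disp[j].0 -= fx;
                disp[j].1 -= fy;
            }
        }
        // Attractive force d^2 / k pulls the endpoints of each edge together.
        for &(u, v) in edges {
            let (dx, dy) = (pos[u].x - pos[v].x, pos[u].y - pos[v].y);
            let d = (dx * dx + dy * dy).sqrt().max(1e-9);
            let f = d * d / k;
            let (fx, fy) = (dx / d * f, dy / d * f);
            disp[u].0 -= fx;
            disp[u].1 -= fy;
            disp[v].0 += fx;
            disp[v].1 += fy;
        }
        // Move each node at most `temperature`, then keep it inside the frame.
        for (p, &(dx, dy)) in pos.iter_mut().zip(&disp) {
            let len = (dx * dx + dy * dy).sqrt();
            if len > 0.0 {
                let step = len.min(temperature);
                p.x = (p.x + dx / len * step).clamp(0.0, width);
                p.y = (p.y + dy / len * step).clamp(0.0, height);
            }
        }
        temperature -= cooling;
    }
    pos
}
```

The all-pairs repulsion is quadratic in node count, which is one reason large graphs are slow to settle; grid- or tree-based approximations are the usual remedy.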

Quick Start: NA19240 SV Visualization

To visualize the structural variant representation at chr17:10984564-10993960 (hg38):

# Run the GUI and load the example file
cargo run --release

# Then drag and drop data/na19240_chr17_sv.gfa onto the window, or use File → Open GFA

Or run the command-line evaluation:

cargo run --example eval_na19240_sv

This prints an ASCII diagram of the known SVs in the NA19240 sample:

Legend: [REF]=Reference  [DEL]=Deletion  [INS]=Insertion  [INV]=Inversion  [CPX]=Complex

  GRCh38 hap0     │ [===]─[===]─[REF]─[REF]─[REF]─[---]─[===]─[===]  (5520 bp)
  NA19240 hap1    │ [===]─[===]─[DEL]─[---]─[===]─[===]              (3760 bp)
  NA19240 hap2    │ [===]─[===]─[INS]─[REF]─[REF]─[---]─[===]─[===]  (7263 bp)
  HG005 hap1      │ [===]─[===]─[CPX]─<[CPX]>─[REF]─[---]─[===]─[===] (4960 bp)
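The bp totals on the right of the diagram are presumably path lengths, i.e. the summed lengths of the segments each haplotype path visits. A minimal sketch of that computation, assuming blunt-ended links (0M overlaps, so nothing is subtracted) and a hypothetical path_length_bp helper that is not taken from the eval_na19240_sv example:

```rust
use std::collections::HashMap;

/// Spelled length in bp of a path: the sum of the lengths of the segments it
/// visits. Assumes blunt (0M) links, so no overlap is subtracted; orientation
/// does not change a segment's length.
pub fn path_length_bp(steps: &[&str], segment_len: &HashMap<&str, u64>) -> Result<u64, String> {
    steps.iter().try_fold(0u64, |total, step| {
        segment_len
            .get(step)
            .map(|len| total + len)
            .ok_or_else(|| format!("unknown segment {step:?}"))
    })
}
```

Under this reading, a deletion shortens a haplotype because its path skips segments, and an insertion lengthens it because the path visits extra ones.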

Building

Prerequisites

Rust 1.70+ (install via rustup)

Build Release Binary

cargo build --release

The binary will be at target/release/graph-genome-viewer.

Build Static Binary (Linux)

rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl

Usage

Running the Application

cargo run --release

Or run the binary directly:

./target/release/graph-genome-viewer

Loading Files

File → Open GFA: Load a graph genome in GFA or GFA2 format
File → Open Alignments (GAF): Load alignment data
Drag & Drop: Drop GFA files directly onto the window
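GAF alignment lines start with 12 mandatory tab-separated columns (query name, length, start, end, strand, path, path length, path start, path end, residue matches, alignment block length, mapping quality), optionally followed by SAM-style tags. A minimal sketch of reading those columns without defaulting bad numbers to 0 follows; GafRecord and parse_gaf_line are hypothetical names, and computing identity as matches divided by block length is an assumption, not necessarily how src/analysis/stats.rs defines it.

```rust
/// The 12 mandatory columns of a GAF record (optional tags are ignored).
#[derive(Debug, Clone, PartialEq)]
pub struct GafRecord {
    pub query_name: String,
    pub query_len: u64,
    pub query_start: u64,
    pub query_end: u64,
    pub strand: char,
    pub path: String,
    pub path_len: u64,
    pub path_start: u64,
    pub path_end: u64,
    pub matches: u64,
    pub block_len: u64,
    pub mapq: u8,
}

impl GafRecord {
    /// Residue matches divided by alignment block length, if the block is non-empty.
    pub fn identity(&self) -> Option<f64> {
        (self.block_len > 0).then(|| self.matches as f64 / self.block_len as f64)
    }
}

/// Parses one tab-separated GAF line, returning an error for short lines,
/// non-numeric columns, or an invalid strand.
pub fn parse_gaf_line(line: &str) -> Result<GafRecord, String> {
    let f: Vec<&str> = line.trim_end().split('\t').collect();
    if f.len() < 12 {
        return Err(format!("expected at least 12 columns, found {}", f.len()));
    }
    // Parses column i (0-based) as an unsigned integer.
    let num = |i: usize| -> Result<u64, String> {
        f[i].parse::<u64>()
            .map_err(|e| format!("column {} ({:?}): {e}", i + 1, f[i]))
    };
    let strand = match f[4] {
        "+" => '+',
        "-" => '-',
        other => return Err(format!("invalid strand {other:?}")),
    };
    let mapq = f[11]
        .parse::<u8>()
        .map_err(|e| format!("column 12 ({:?}): {e}", f[11]))?;
    Ok(GafRecord {
        query_name: f[0].to_string(),
        query_len: num(1)?,
        query_start: num(2)?,
        query_end: num(3)?,
        strand,
        path: f[5].to_string(),
        path_len: num(6)?,
        path_start: num(7)?,
        path_end: num(8)?,
        matches: num(9)?,
        block_len: num(10)?,
        mapq,
    })
}
```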

Navigation

Pan: Click and drag
Zoom: Mouse wheel
Select: Click on segments
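Pan and zoom are typically implemented as an affine transform, screen = world * scale + offset. A minimal sketch using plain f64 tuples (not egui types, and not the viewer's actual state struct; View and its methods are hypothetical): dragging adds the pointer delta to the offset, and wheel zoom re-solves the offset so the world point under the cursor stays put.

```rust
/// Maps world coordinates to screen coordinates: screen = world * scale + offset.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct View {
    pub offset: (f64, f64),
    pub scale: f64,
}

impl View {
    pub fn to_screen(&self, p: (f64, f64)) -> (f64, f64) {
        (p.0 * self.scale + self.offset.0, p.1 * self.scale + self.offset.1)
    }

    pub fn to_world(&self, s: (f64, f64)) -> (f64, f64) {
        ((s.0 - self.offset.0) / self.scale, (s.1 - self.offset.1) / self.scale)
    }

    /// Click and drag: shift the view by the pointer's screen-space delta.
    pub fn pan(&mut self, delta: (f64, f64)) {
        self.offset.0 += delta.0;
        self.offset.1 += delta.1;
    }

    /// Mouse wheel: zoom by `factor` while keeping the world point under
    /// the cursor fixed on screen. Scale is clamped to avoid degenerate views.
    pub fn zoom_at(&mut self, cursor: (f64, f64), factor: f64) {
        let anchor = self.to_world(cursor);
        self.scale = (self.scale * factor).clamp(1e-4, 1e4);
        self.offset = (cursor.0 - anchor.0 * self.scale, cursor.1 - anchor.1 * self.scale);
    }
}
```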

Keyboard Shortcuts

Ctrl+O: Open file
Ctrl+Q: Quit
R: Reset view
F: Fit to window
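Fit to window amounts to choosing a uniform scale and offset that place the layout's bounding box inside the viewport. A minimal sketch, assuming the transform screen = world * scale + offset and a hypothetical fit_to_window function that returns (scale, offset):

```rust
/// Computes (scale, offset) so the bounding box of `points` fills the viewport
/// minus `margin` on each side and is centered. Returns None for no points.
pub fn fit_to_window(
    points: &[(f64, f64)],
    viewport: (f64, f64),
    margin: f64,
) -> Option<(f64, (f64, f64))> {
    let &(x0, y0) = points.first()?;
    let (mut min_x, mut min_y, mut max_x, mut max_y) = (x0, y0, x0, y0);
    for &(x, y) in points {
        min_x = min_x.min(x);
        min_y = min_y.min(y);
        max_x = max_x.max(x);
        max_y = max_y.max(y);
    }
    // Guard against zero-width or zero-height extents (e.g. a single node).
    let width = (max_x - min_x).max(1e-9);
    let height = (max_y - min_y).max(1e-9);
    let avail_w = (viewport.0 - 2.0 * margin).max(1e-9);
    let avail_h = (viewport.1 - 2.0 * margin).max(1e-9);
    // Uniform scale: the tighter axis decides, so the aspect ratio is preserved.
    let scale = (avail_w / width).min(avail_h / height);
    // Center the scaled bounding box in the viewport.
    let offset_x = (viewport.0 - (max_x - min_x) * scale) / 2.0 - min_x * scale;
    let offset_y = (viewport.1 - (max_y - min_y) * scale) / 2.0 - min_y * scale;
    Some((scale, (offset_x, offset_y)))
}
```

Reset view (R) would then simply restore a default scale of 1 and zero offset, under the same assumed transform.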

Supported Formats

Input

GFA 1.0/1.1/2.0 (graph genomes)
GAF (graph alignments)
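In GFA 1, segments are "S name sequence", links are "L from fromOrient to toOrient overlap", and paths are "P name segmentNames overlaps" with comma-separated oriented steps such as s1+,s2-. A minimal sketch of a line parser for those three record types follows; GfaRecord and parse_gfa1_line are hypothetical, GFA 1.1 walk (W) lines and the different GFA2 record layouts are deliberately left as Other, and this is not the parser in src/io/gfa.rs.

```rust
/// A subset of GFA 1 records: segments (S), links (L), and paths (P).
#[derive(Debug, Clone, PartialEq)]
pub enum GfaRecord {
    Segment { name: String, sequence: String },
    Link { from: String, from_orient: char, to: String, to_orient: char, overlap: String },
    Path { name: String, steps: Vec<String>, overlaps: String },
    /// Any other record type (H, W, C, ...), kept only by its type tag.
    Other(String),
}

fn parse_orient(s: &str) -> Result<char, String> {
    match s {
        "+" => Ok('+'),
        "-" => Ok('-'),
        _ => Err(format!("invalid orientation {s:?}")),
    }
}

/// Parses one tab-separated GFA 1 line. Blank lines and '#' comments yield Ok(None).
pub fn parse_gfa1_line(line: &str) -> Result<Option<GfaRecord>, String> {
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    if line.is_empty() || line.starts_with('#') {
        return Ok(None);
    }
    let f: Vec<&str> = line.split('\t').collect();
    // Checks that the record has at least `n` tab-separated columns.
    let need = |n: usize| -> Result<(), String> {
        if f.len() >= n {
            Ok(())
        } else {
            Err(format!("{} record needs {n} columns: {line:?}", f[0]))
        }
    };
    let record = match f[0] {
        "S" => {
            need(3)?;
            GfaRecord::Segment { name: f[1].to_string(), sequence: f[2].to_string() }
        }
        "L" => {
            need(6)?;
            GfaRecord::Link {
                from: f[1].to_string(),
                from_orient: parse_orient(f[2])?,
                to: f[3].to_string(),
                to_orient: parse_orient(f[4])?,
                overlap: f[5].to_string(),
            }
        }
        "P" => {
            need(4)?;
            GfaRecord::Path {
                name: f[1].to_string(),
                steps: f[2].split(',').map(str::to_string).collect(),
                overlaps: f[3].to_string(),
            }
        }
        other => GfaRecord::Other(other.to_string()),
    };
    Ok(Some(record))
}
```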

Export

PNG (planned)
SVG (planned)

Project Structure

src/
├── main.rs           # Entry point
├── app.rs            # Application state and eframe App impl
├── graph/            # Core graph data structures
│   ├── mod.rs        # GraphGenome, Orientation
│   ├── segment.rs    # Segment (node) type
│   ├── link.rs       # Link (edge) type
│   └── path.rs       # Path type
├── io/               # File parsing
│   ├── gfa.rs        # GFA parser
│   ├── gaf.rs        # GAF parser
│   └── alignment.rs  # Alignment data structures
├── layout/           # Graph layout algorithms
│   ├── tubemap.rs    # Linear tube map layout
│   └── force.rs      # Force-directed layout
├── render/           # Visualization rendering
│   ├── graph_renderer.rs
│   └── colors.rs     # Color palettes
├── analysis/         # Alignment statistics
│   ├── stats.rs      # Alignment statistics
│   └── fit.rs        # Goodness-of-fit metrics
└── ui/               # UI components
    ├── side_panel.rs
    ├── top_menu.rs
    └── status_bar.rs
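The src/graph/ modules name GraphGenome, Orientation, Segment, Link, and Path. A minimal sketch of how such types might fit together follows; every field name is an assumption for illustration, not read from src/graph. Segments are stored in a Vec and addressed by index, with a name-to-index map so links and paths can refer to segments cheaply.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Orientation {
    Forward,
    Reverse,
}

#[derive(Debug, Clone)]
pub struct Segment {
    pub name: String,
    pub sequence: String,
}

/// An edge between two oriented segments, by segment index.
#[derive(Debug, Clone)]
pub struct Link {
    pub from: usize,
    pub from_orient: Orientation,
    pub to: usize,
    pub to_orient: Orientation,
}

/// A named walk through the graph, e.g. one haplotype.
#[derive(Debug, Clone)]
pub struct Path {
    pub name: String,
    pub steps: Vec<(usize, Orientation)>,
}

#[derive(Debug, Default)]
pub struct GraphGenome {
    pub segments: Vec<Segment>,
    pub index: HashMap<String, usize>,
    pub links: Vec<Link>,
    pub paths: Vec<Path>,
}

impl GraphGenome {
    /// Adds a segment (or returns the existing index if the name is known).
    pub fn add_segment(&mut self, name: &str, sequence: &str) -> usize {
        if let Some(&id) = self.index.get(name) {
            return id;
        }
        let id = self.segments.len();
        self.segments.push(Segment { name: name.to_string(), sequence: sequence.to_string() });
        self.index.insert(name.to_string(), id);
        id
    }

    /// Segments reachable by one link from `id`, treating links as directed
    /// from -> to and ignoring orientation (a simplification).
    pub fn successors(&self, id: usize) -> Vec<usize> {
        self.links.iter().filter(|l| l.from == id).map(|l| l.to).collect()
    }
}
```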

License

MIT

Testing

Running Tests

# Run all tests
cargo test

# Run a single integration test target (e.g. tests/gfa_tests.rs)
cargo test --test gfa_tests

# Run with verbose output
cargo test --verbose

Test Data

Sample GFA files are provided in the data/ directory:

simple_bubble.gfa: A simple bubble structure for basic testing
na19240_chr17_sv.gfa: Simulated chr17 structural variant region for NA19240
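A simple bubble is a source segment whose branches diverge and then reconverge on a common sink, as with a SNP (two alternative single segments) or a deletion (one branch skips directly to the sink). A minimal sketch of detecting that motif on a directed adjacency list follows, assuming orientation is ignored, node indices are below n, and each branch holds at most one intermediate segment; simple_bubbles is hypothetical and not necessarily how the viewer classifies SVs. General superbubble detection is considerably more involved.

```rust
use std::collections::BTreeSet;

/// Finds (source, sink) pairs where the source has 2+ branches and every branch
/// is either the sink itself or a single node whose only successor is the sink.
pub fn simple_bubbles(n: usize, edges: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut out: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); n];
    for &(u, v) in edges {
        out[u].insert(v);
    }
    let mut bubbles = Vec::new();
    for source in 0..n {
        let branches = &out[source];
        if branches.len() < 2 {
            continue;
        }
        // Each single-successor branch nominates its successor as the sink.
        let candidates: BTreeSet<usize> = branches
            .iter()
            .filter(|&&b| out[b].len() == 1)
            .filter_map(|&b| out[b].iter().next().copied())
            .collect();
        for &sink in &candidates {
            let closes = sink != source
                && branches
                    .iter()
                    .all(|&b| b == sink || (out[b].len() == 1 && out[b].contains(&sink)));
            if closes {
                bubbles.push((source, sink));
                break;
            }
        }
    }
    bubbles
}
```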

Data Processing Scripts

Python scripts in scripts/ help process real-world pangenome data.

Setup

cd scripts
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Download Real NA19240 Data

The download_na19240_data.py script downloads real pangenome data from the Human Pangenome Reference Consortium (HPRC) and 1000 Genomes Project.

# Show information about available data sources
python download_na19240_data.py --info

# Download small test files (quick, for testing)
python download_na19240_data.py --output ../data/ --test

# Download HPRC chr17 pangenome data (~500 MB to 1 GB)
python download_na19240_data.py --output ../data/ --chr17

# Download 1000 Genomes structural variant VCF
python download_na19240_data.py --output ../data/ --sv-vcf

Process Pangenome Data

The process_pangenome.py script generates, inspects, validates, and subsets GFA files:

# Generate sample test data
python process_pangenome.py generate-sample --output ../data/sample.gfa

# Generate realistic chr17 structural variant data
python process_pangenome.py generate-realistic --output ../data/chr17_sv.gfa

# Display GFA statistics
python process_pangenome.py stats --input ../data/na19240_chr17_sv.gfa

# Validate a GFA file
python process_pangenome.py validate --input ../data/sample.gfa

# Extract a genomic region from a larger GFA
python process_pangenome.py extract-region --input full.gfa --output region.gfa \
    --chrom chr17 --start 10984564 --end 10993960
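Region extraction generally starts by walking the reference path and accumulating segment lengths to find which segments overlap the requested coordinates. A minimal sketch of that step in Rust (the actual script is Python, and its internals are not shown here), assuming 0-based half-open coordinates and a hypothetical segments_in_region function. Whether extract-region treats --start/--end as 0- or 1-based is not stated; a 1-based inclusive region [s, e] corresponds to the 0-based half-open [s - 1, e). Collecting off-reference alleles inside bubbles would additionally require traversing the graph from the selected segments.

```rust
/// Given a reference path as ordered (segment name, length) steps starting at
/// `path_start` (0-based), returns the segments overlapping [start, end).
pub fn segments_in_region<'a>(
    steps: &[(&'a str, u64)],
    path_start: u64,
    start: u64,
    end: u64,
) -> Vec<&'a str> {
    let mut pos = path_start;
    let mut hits = Vec::new();
    for &(name, len) in steps {
        let (seg_start, seg_end) = (pos, pos + len);
        // Half-open intervals overlap when each starts before the other ends.
        if seg_start < end && seg_end > start {
            hits.push(name);
        }
        pos = seg_end;
        // Steps are ordered along the path, so nothing later can overlap.
        if pos >= end {
            break;
        }
    }
    hits
}
```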

Target Region

The scripts are configured to process the chr17:10984564-10993960 (hg38) region, which contains a known structural variant of interest in NA19240.

Continuous Integration

The project uses GitHub Actions for CI/CD:

Test Suite: Runs on every push/PR (Linux + macOS, stable + beta Rust)
Release Builds: Creates binaries for Linux and macOS (x86_64 + ARM64)
Data Processing: Validates Python scripts and sample data generation
