Skip to content

ForomePlatform/gs-annotator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GSAnnotator-Java

Vert.x 4.5.9 purple

Overview

GS-Annotator is a next-generation variant annotation platform designed for demanding research and clinical genomics workflows. It introduces an architecture optimized for speed, extensibility, and deployment simplicity.

The system processes standard VCF files and integrates annotations from curated gene models, clinical databases, population frequencies, and in silico prediction algorithms. Its flexible plugin framework and open API facilitate integration of custom annotation sources such as specialized variant scoring algorithms or domain-specific databases.

GS-Annotator’s backend leverages AStorage - a specialized server built on RocksDB, a high-performance embedded key-value store renowned for its efficient read/write operations on large datasets. This design is particularly beneficial in genomics, where queries must handle high variant counts with minimal latency. The system can support interactive analyses and batch processing of millions of variants while offloading compute-intensive tasks, substantially minimizing local resource consumption.

The Java-based API server utilizes Vert.x, an event-driven toolkit that ensures responsive performance across concurrent requests. Users can access functionality through an intuitive command-line interface or a RESTful API (documented with OpenAPI), enabling seamless integration with bioinformatics pipelines and downstream applications.

Additional features include support for genome liftover between reference assemblies and deep integration with AnFiSA, a comprehensive variant curation platform. Together, these tools create an end-to-end solution from variant calling to clinical interpretation.

Supported annotation databases:

  • dbNSFP v4.3a

  • gnomAD v4

  • SpliceAI v1.3

  • ClinVar

  • GERP

  • dbSNP

Supported assembly versions:

  • GRCh37

  • GRCh38

Setup: Building and Running [Linux/MacOS]

Clone the master branch and package the application as a JAR file:

git clone [email protected]:ForomePlatform/gs-annotator.git
cd gs-annotator
./mvnw clean package

The JAR file will be generated inside the target directory as annotator-1.0.0.jar.

  • On the first run the application creates a data folder in the user’s home directory with the name GSAnnotator by default if not specified otherwise.

  • The service is running on port 8000 by default if not specified otherwise.

Note
These properties can be adjusted using a config.json file.

config.json example:

{
    "dataDirectoryPath": "/home/user/ExampleAnnotator",
    "serverPort": 8000,
    "aStorageServerUrl": "http://localhost:8080/"
}

To start the application run:

cd target
java -jar annotator-1.0.0.jar [config_json_path]
Note
Annotator logs are being written in <dataDirectoryPath>/output_<currentTimeMillis>.log file. Some of the output is printed in terminal where the program is being run.

For detailed API specification access the OpenAPI UI via: http://localhost:8000/api.

Annotation

To perform annotation of variants in your specified VCF you’ll need three files: input .vcf file, annotation configuration .cfg file and PLINK sample information .fam file. In case of performing VCF liftover additional files will be required.

Note
To use the annotation service AStorage service should be running on the configured AStorage server URL.

Required files:

Annotator config .cfg file in JSON format:

The configuration file supports these fields:

  • "assembly": Final assembly version of the given VCF file (After VCF liftover, if performed).

  • "performLiftover": Boolean field to indicate if liftover should be performed to the given VCF.

Note
If "performLiftover" field is set to true, "chainFilePath" and "fastaFilePath" are mandatory. These files can be downloaded from UCSC Genome Browser.
Note
VCF liftover and all the necessary steps for it are performed using tools from Picard.
  • "chainFilePath": Local path to an appropriate Chain file.

  • "fastaFilePath": Local path to an appropriate FASTA file.

Note
VCF liftover requires an appropraite .dict file alongside the FASTA file. If such file can’t be located it’ll be automatically generated in the same directory as FASTA.

.cfg file example if the original VCF is in GRCh37, and we want to liftover it to GRCh38:

{
    "assembly": "GRCh38",
    "performLiftover": true,
    "chainFilePath": "/hg19ToHg38.over.chain",
    "fastaFilePath": "/hg38.fa"
}

.fam file example:

1	bgm9001a1	bgm9001u2	bgm9001u1	1	2
1	bgm9001u1	0	0	2	1
1	bgm9001u2	0	0	1	1

Input .vcf file:

Note
VCF file name should be structured as <case>_<platform>_<project>.vcf to correctly generate annotation metadata.

To run the annotation:

curl -X POST -OJ 'http://localhost:8000/annotation/anfisa' -H 'accept: application/jsonl' -H 'Content-Type: multipart/form-data' -F 'cfgFile=@<path to .cfg file>' -F 'famFile=@<path to .fam file>' -F 'vcfFile=@<path to .vcf file>'

API reference: Anfisa Annotation.

About

Next generation variant annotator for the results of genome sequencing

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages