MuhammadMuneeb007/GWASPokerforPRS


GWASPokerforPRS

GWAS Summary Statistic Tool: A Meta-Analysis and Parsing Tool for Polygenic Risk Score Calculation

GWAS (genome-wide association study) summary statistic files are used to calculate polygenic risk scores (PRS), and multiple research groups often publish such files for the same phenotype or disease. A scan of the GWAS Catalog showed that a single disease can have multiple GWAS files that differ in population, number of identified associations, sample size, validation samples, type of analysis used to generate the GWAS, genome build, and the information they list. Finding and verifying the right GWAS file for a phenotype is therefore laborious, because each candidate file has to be downloaded, extracted, parsed, and scanned manually.

We propose a tool, developed from an analysis of 60,400 GWAS summary statistic files in the GWAS Catalog, that can scan a GWAS file without downloading it in full. The workflow searches for and downloads the metadata of all GWAS related to a phenotype, partially downloads each GWAS file, parses and cleans it into a readable table, searches for specific column headers, and lists the columns needed for PRS calculation. The tool also extracts an article's DOI and citation from its PMID and includes a Python code generator that maps the original GWAS columns to those required by PRS tools.
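The PMID-to-citation step can be sketched with NCBI's public E-utilities ESummary endpoint. This is an assumption about how such a lookup could work, not necessarily how the tool does it: `fetch_pubmed_summary` and `citation_from_summary` are hypothetical helpers, the field names (`articleids`, `authors`, `pubdate`, `source`, `title`) follow the ESummary JSON output for PubMed, and the citation format is illustrative only.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities ESummary endpoint (public; heavy use should add an API key).
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def fetch_pubmed_summary(pmid: str) -> dict:
    """Download the ESummary JSON record for one PubMed ID."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    with urllib.request.urlopen(f"{ESUMMARY_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def citation_from_summary(summary: dict, pmid: str) -> dict:
    """Pull the DOI and a short citation string out of an ESummary JSON payload."""
    record = summary["result"][pmid]
    # The DOI, when present, is one of the record's article identifiers.
    doi = next(
        (a["value"] for a in record.get("articleids", []) if a.get("idtype") == "doi"),
        None,
    )
    authors = [a["name"] for a in record.get("authors", [])]
    if authors:
        first_author = authors[0] + (" et al." if len(authors) > 1 else "")
    else:
        first_author = "Unknown"
    year = record.get("pubdate", "")[:4]  # pubdate looks like "2020 Jan 15"
    citation = f"{first_author} ({year}). {record.get('title', '')} {record.get('source', '')}."
    return {"pmid": pmid, "doi": doi, "citation": citation}
```

Usage: `citation_from_summary(fetch_pubmed_summary(pmid), pmid)` for a PMID taken from the catalog metadata.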


conda create --name genetics --file environment.txt

Module 1

wget https://www.ebi.ac.uk/gwas/api/search/downloads/studies/v1.0.3.1
mv v1.0.3.1 summary_statistics_table_export.tsv
python Module1-SearchPhenotypeandPopulation.py --phenotype asthma --population European
python Module1-SearchPhenotypeandPopulation.py --phenotype migraine
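The search in Module 1 can be approximated as a text filter over the catalog export. A minimal sketch, assuming the export has `DISEASE/TRAIT` and `INITIAL SAMPLE SIZE` columns and that the population is matched as free text inside the sample-size description; verify both column names against the header of `summary_statistics_table_export.tsv`. `search_studies` and `load_studies` are hypothetical helpers, not the script's API.

```python
import csv
from typing import Iterable, List, Optional

# Assumed column names in the GWAS Catalog studies export (check before use).
TRAIT_COL = "DISEASE/TRAIT"
SAMPLE_COL = "INITIAL SAMPLE SIZE"


def search_studies(
    rows: Iterable[dict],
    phenotype: str,
    population: Optional[str] = None,
    trait_col: str = TRAIT_COL,
    sample_col: str = SAMPLE_COL,
) -> List[dict]:
    """Keep studies whose trait mentions the phenotype (and, optionally, the population)."""
    phenotype = phenotype.lower()
    hits = []
    for row in rows:
        if phenotype not in (row.get(trait_col) or "").lower():
            continue
        # Population is matched as a substring of the sample description,
        # e.g. "asian" matches "East Asian ancestry cases".
        if population and population.lower() not in (row.get(sample_col) or "").lower():
            continue
        hits.append(row)
    return hits


def load_studies(path: str) -> List[dict]:
    """Read the tab-separated catalog export into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Usage: `search_studies(load_studies("summary_statistics_table_export.tsv"), "asthma", "European")`. Substring matching is deliberately loose, so the hits still need the manual review that Module 2 expects.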

Module 2

After you manually process the output file from Module 1, keep its file name unchanged; the code depends on it.
python Module2-Search_Poke_Normalize_Scan.py --processedfile migraine.csv
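The partial-download ("poke") idea can be sketched with an HTTP Range request: fetch only the first few kilobytes, gunzip them if needed, and read the header plus a few complete rows. This is an assumption about the technique, not Module 2's actual code; `fetch_prefix`, `decode_prefix`, and `peek_table` are hypothetical names, and the 64 KiB default and delimiter heuristic are arbitrary choices.

```python
import urllib.request
import zlib
from typing import List, Tuple


def fetch_prefix(url: str, n_bytes: int = 65536) -> bytes:
    """Fetch only the first n_bytes of a remote file using an HTTP Range request."""
    request = urllib.request.Request(url, headers={"Range": f"bytes=0-{n_bytes - 1}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        # If the server ignores Range and sends the whole file, read() still stops at n_bytes.
        return response.read(n_bytes)


def decode_prefix(chunk: bytes) -> str:
    """Turn a possibly truncated (and possibly gzipped) byte prefix into text."""
    if chunk[:2] == b"\x1f\x8b":  # gzip magic number
        # A decompressobj returns whatever it can inflate from truncated input.
        chunk = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(chunk)
    return chunk.decode("utf-8", errors="replace")


def _split(line: str, delimiter: str) -> List[str]:
    # A single space stands for "any run of whitespace".
    return line.split() if delimiter == " " else line.split(delimiter)


def peek_table(text: str, max_rows: int = 5) -> Tuple[List[str], List[List[str]]]:
    """Return the header and the first few complete rows of a delimited text prefix."""
    lines = text.splitlines()
    if text and not text.endswith("\n"):
        lines = lines[:-1]  # the last line was cut off by the partial download
    lines = [line for line in lines if line.strip()]
    if not lines:
        return [], []
    # Pick the delimiter that splits the header into the most fields.
    delimiter = max(("\t", ",", " "), key=lambda d: len(_split(lines[0], d)))
    header = _split(lines[0], delimiter)
    rows = [_split(line, delimiter) for line in lines[1 : 1 + max_rows]]
    return header, rows
```

A streaming `zlib.decompressobj` is used rather than `gzip.decompress`, because the latter raises `EOFError` on a truncated stream. Usage: `peek_table(decode_prefix(fetch_prefix(url)))` returns the column names and first rows of a GWAS Catalog file, provided the server honors Range requests.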

Module 3

python Module3-DownloadGWAS.py --processedfile Input-Module3-Migraine.csv --indexer 1
Input-Module3-Migraine.csv contains the following information. The Name column gives the directory into which the file is downloaded; the files produced in later steps are stored in the same directory.
Name,Download Link
migraine,http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90038001-GCST90039000/GCST90038646/GCST90038646_buildGRCh37.tsv

Provide your Hugging Chat email and password; they are required to run the code.
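Selecting a row of the input CSV and downloading its file into the `Name` directory can be sketched as follows. The `Name` and `Download Link` column names come from the example above; treating `--indexer` as a 1-based position among the data rows (header excluded) is an assumption, and `select_row`, `target_path`, and `download_gwas` are hypothetical helpers. The Hugging Chat step is not reproduced here.

```python
import csv
import os
import urllib.request
from typing import Iterable, Tuple


def select_row(lines: Iterable[str], indexer: int) -> Tuple[str, str]:
    """Return (Name, Download Link) for the indexer-th data row (assumed 1-based)."""
    rows = list(csv.DictReader(lines))
    if not 1 <= indexer <= len(rows):
        raise IndexError(f"indexer {indexer} is outside 1..{len(rows)}")
    row = rows[indexer - 1]
    return row["Name"].strip(), row["Download Link"].strip()


def target_path(name: str, link: str) -> str:
    """Save the file under a directory named after the phenotype (the Name column)."""
    return os.path.join(name, os.path.basename(link))


def download_gwas(csv_path: str, indexer: int) -> str:
    """Download the selected GWAS file into its phenotype directory; return the local path."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        name, link = select_row(handle, indexer)
    os.makedirs(name, exist_ok=True)
    path = target_path(name, link)
    urllib.request.urlretrieve(link, path)
    return path
```

Usage: `download_gwas("Input-Module3-Migraine.csv", 1)` would save `GCST90038646_buildGRCh37.tsv` under `migraine/`.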

Module 4

python Module4-ExtractGWAS.py --processedfile Input-Module3-Migraine.csv --indexer 1
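One plausible reading of the extraction step is "decompress if needed, then rewrite the table as comma-separated text in `gwas.csv.modified`". The sketch below assumes exactly that; it does not reproduce the `Output.py` transformation file, and `normalize_rows` and `extract_gwas` are hypothetical helpers. The delimiter is detected from the header line only, and comma splitting is naive (quoted commas are not handled).

```python
import csv
import gzip
from typing import Iterable, Iterator, List, Optional


def detect_delimiter(header: str) -> Optional[str]:
    """Tab or comma if present in the header line, else None (meaning any whitespace)."""
    for delimiter in ("\t", ","):
        if delimiter in header:
            return delimiter
    return None


def normalize_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the stripped fields of each non-blank line, using the header's delimiter."""
    delimiter: Optional[str] = None
    seen_header = False
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if not seen_header:
            delimiter = detect_delimiter(line)
            seen_header = True
        # str.split(None) splits on runs of whitespace.
        yield [field.strip() for field in line.rstrip("\r\n").split(delimiter)]


def extract_gwas(src_path: str, dst_path: str = "gwas.csv.modified") -> None:
    """Stream a (possibly gzipped) GWAS file into a comma-separated copy."""
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt", encoding="utf-8", errors="replace") as src:
        with open(dst_path, "w", encoding="utf-8", newline="") as dst:
            csv.writer(dst, lineterminator="\n").writerows(normalize_rows(src))
```

Streaming line by line keeps memory flat, which matters because full GWAS files can be several gigabytes.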

Module 5

To check the columns in your own GWAS file, run Module 5. The code accepts files in .csv format.
python Module5-ListPRSColumns.py --gwasfile gwas.csv.modified
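Listing the PRS columns and generating mapping code can be sketched as a synonym lookup over the header followed by emitting a small pandas script. The synonym lists, the target names (`SNP`, `A1`, `P`, ...), and the generated pandas code are illustrative assumptions, not the tool's own dictionary or its `Output.py`; `map_columns` and `generate_rename_code` are hypothetical helpers.

```python
from typing import Dict, List

# Illustrative synonyms (an assumption) for columns PRS tools commonly need.
# Matching is case-insensitive and takes the first synonym found.
PRS_COLUMN_SYNONYMS: Dict[str, List[str]] = {
    "SNP": ["snp", "rsid", "rs_id", "variant_id", "markername"],
    "CHR": ["chr", "chromosome", "chrom"],
    "BP": ["bp", "pos", "position", "base_pair_location"],
    "A1": ["a1", "effect_allele", "allele1"],
    "A2": ["a2", "other_allele", "allele2", "non_effect_allele"],
    "BETA": ["beta", "b"],
    "OR": ["or", "odds_ratio"],
    "SE": ["se", "standard_error", "stderr"],
    "P": ["p", "pval", "p_value", "pvalue"],
    "N": ["n", "n_total", "sample_size"],
}


def map_columns(header: List[str]) -> Dict[str, str]:
    """Map each required PRS column to the first matching original GWAS column."""
    lowered = {col.strip().lower(): col for col in header}
    mapping: Dict[str, str] = {}
    for target, synonyms in PRS_COLUMN_SYNONYMS.items():
        for synonym in synonyms:
            if synonym in lowered:
                mapping[target] = lowered[synonym]
                break
    return mapping


def generate_rename_code(mapping: Dict[str, str], path: str = "gwas.csv.modified") -> str:
    """Emit a pandas script that renames the original columns to PRS names and keeps only those."""
    rename = {original: target for target, original in mapping.items()}
    return (
        "import pandas as pd\n"
        f"gwas = pd.read_csv({path!r})\n"
        f"gwas = gwas.rename(columns={rename!r})[{list(rename.values())!r}]\n"
    )
```

Columns with no match (for example `OR` when only `beta` is present) are simply left out of the mapping, so the listing also shows which required fields a file lacks.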
| Index | File | Input | Input parameters | Output |
| --- | --- | --- | --- | --- |
| 0 | Module 0 - GWAS File Analysis | GWAS metafile downloaded from the GWAS Catalog | NA | Frequency plots, word-cloud plots |
| 1 | Module1-SearchPhenotypeandPopulation.py | GWAS metafile downloaded from the GWAS Catalog | phenotype (e.g., asthma); population (optional, e.g., asian) | GWAS files for a specific phenotype/disease |
| 2 | Module2-Search_Poke_Normalize_Scan.py | Manually processed file | processedfile (e.g., migraine.csv) | migraine.html, a file containing information about the GWAS headers |
| 3 | Module3-DownloadGWAS.py | Manually processed file containing the phenotype (directory) name and GWAS link (name,link) | processedfile; indexer (e.g., 1), the row that you want to download | Complete GWAS file in a directory |
| 4 | Module4-ExtractGWAS.py | Manually processed file containing the phenotype (directory) name and GWAS link (name,link) | processedfile; indexer (e.g., 1), the row that you want to download | Processed GWAS file (gwas.csv.modified); Output.py file containing the transformation |
| 5 | Module5-ListPRSColumns.py | GWAS file | gwasfile (e.g., gwas.csv.modified) | Output.py file containing the transformation and mapping |
