[Moved from https://github.com/Bioconductor/GenomicFeatures/issues/65 on March 22, 2024]
Question: How to make a TxDb object for the T2T-CHM13v2.0 genome (telomere to telomere Human genome), a.k.a. the hs1 genome at UCSC.
Answer: Unfortunately, makeTxDbFromUCSC() doesn't support hs1 at the moment, so we're going to use the GFF file provided by NCBI.
Download GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/
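If you prefer to fetch the file from within R, base R's download.file() can do it. The sketch below assumes the file goes into the current working directory by default. The helper name fetch_t2t_gff() is hypothetical, and raising the download timeout is an assumption meant to keep R's default 60-second limit from cutting off a large file on a slow connection.

```r
## NCBI FTP location of the T2T-CHM13v2.0 RefSeq annotation (URL given above)
gff_dir  <- "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0"
gff_file <- "GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz"
gff_url  <- paste(gff_dir, gff_file, sep="/")

## Hypothetical helper: download the GFF only if it is not already there,
## then return its local path.
fetch_t2t_gff <- function(destdir=".") {
    destfile <- file.path(destdir, gff_file)
    if (!file.exists(destfile)) {
        ## Raise the download timeout (default 60 seconds) for this call only.
        old <- options(timeout=max(600, getOption("timeout")))
        on.exit(options(old))
        ## mode="wb" keeps the gzip file intact on Windows.
        download.file(gff_url, destfile=destfile, mode="wb")
    }
    destfile
}
```

The returned path can then be passed straight to import(), e.g. gff <- import(fetch_t2t_gff()).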
Import the GFF file as a GRanges object:
library(rtracklayer)
## Takes < 1 min, consumes about 7 GB of RAM
gff <- import("GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz")
Note that the sequence names in the GRanges object are RefSeq accessions:
seqlevels(gff)
# [1] "NC_060925.1" "NC_060926.1" "NC_060927.1" "NC_060928.1" "NC_060929.1"
# [6] "NC_060930.1" "NC_060931.1" "NC_060932.1" "NC_060933.1" "NC_060934.1"
# [11] "NC_060935.1" "NC_060936.1" "NC_060937.1" "NC_060938.1" "NC_060939.1"
# [16] "NC_060940.1" "NC_060941.1" "NC_060942.1" "NC_060943.1" "NC_060944.1"
# [21] "NC_060945.1" "NC_060946.1" "NC_060947.1" "NC_060948.1"
Let's change them to the official chromosome names:
library(GenomeInfoDb)
chrominfo <- getChromInfoFromNCBI("T2T-CHM13v2.0")
seqlevels(gff) <- setNames(chrominfo$SequenceName, chrominfo$RefSeqAccn)
seqlevels(gff)
# [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
# [16] "16" "17" "18" "19" "20" "21" "22" "X" "Y" "MT"
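The replacement value setNames(chrominfo$SequenceName, chrominfo$RefSeqAccn) is a named character vector that works as a lookup table: its names are the old sequence names (RefSeq accessions) and its values are the new ones. The base-R sketch below builds a toy version of that table for the 24 accessions listed earlier, pairing them by position with the names printed after the renaming. It only illustrates how the lookup works; the toy table is an assumption derived from those two listings, and the real mapping should come from getChromInfoFromNCBI().

```r
## Toy stand-in for chrominfo (hypothetical): the 24 accessions printed by
## seqlevels(gff) before renaming, paired by position with the names
## printed after renaming.
refseq_accn <- sprintf("NC_%06d.1", 60925:60948)
seq_name    <- c(as.character(1:22), "X", "Y")

## Named vector: names = old seqlevels, values = new seqlevels
accn2name <- setNames(seq_name, refseq_accn)

## Translating accessions is a lookup by name
unname(accn2name[c("NC_060925.1", "NC_060947.1", "NC_060948.1")])
# [1] "1" "X" "Y"
```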
Add the complete sequence info to the GRanges object:
seqinfo(gff) <- Seqinfo(genome="T2T-CHM13v2.0")
seqinfo(gff)
# Seqinfo object with 25 sequences (1 circular) from T2T-CHM13v2.0 genome:
#   seqnames seqlengths isCircular        genome
#   1         248387328      FALSE T2T-CHM13v2.0
#   2         242696752      FALSE T2T-CHM13v2.0
#   3         201105948      FALSE T2T-CHM13v2.0
#   4         193574945      FALSE T2T-CHM13v2.0
#   5         182045439      FALSE T2T-CHM13v2.0
#   ...             ...        ...           ...
#   21         45090682      FALSE T2T-CHM13v2.0
#   22         51324926      FALSE T2T-CHM13v2.0
#   X         154259566      FALSE T2T-CHM13v2.0
#   Y          62460029      FALSE T2T-CHM13v2.0
#   MT            16569       TRUE T2T-CHM13v2.0
Use makeTxDbFromGRanges() to make a TxDb object from the GRanges object:
library(txdbmaker)
## This will emit 3 warnings that can be ignored.
txdb <- makeTxDbFromGRanges(gff, taxonomyId=9606)
txdb
# TxDb object:
## Db type: TxDb
## Supporting package: GenomicFeatures
## Genome: T2T-CHM13v2.0
## Organism: Homo sapiens
## Taxonomy ID: 9606
## Nb of transcripts: 188205
## Db created by: txdbmaker package from Bioconductor
## Creation time: 2024-03-22 16:56:52 -0700 (Fri, 22 Mar 2024)
## txdbmaker version at creation time: 0.99.7
## RSQLite version at creation time: 2.3.5
## DBSCHEMAVERSION: 1.2
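Because the GFF import alone needs several GB of RAM, it may be worth building the TxDb once, saving it to an SQLite file, and reloading it in later sessions with AnnotationDbi's saveDb() and loadDb(). The wrapper below is a hypothetical sketch: the function name, the default file name, and the build argument (a zero-argument function that runs the steps above and returns the TxDb) are all assumptions.

```r
## Hypothetical helper: reuse a saved TxDb if the SQLite file exists,
## otherwise build it with `build()` and save it for next time.
get_t2t_txdb <- function(build, sqlite_file="TxDb.T2T-CHM13v2.0.sqlite") {
    if (file.exists(sqlite_file))
        return(AnnotationDbi::loadDb(sqlite_file))
    txdb <- build()
    AnnotationDbi::saveDb(txdb, file=sqlite_file)
    txdb
}
```

If build wraps the whole pipeline (import(), the seqlevels and seqinfo assignments, and makeTxDbFromGRanges()), the memory-hungry import only runs when no saved copy exists.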
Note that if you need the UCSC chromosome names instead of the NCBI ones, you can switch them with seqlevelsStyle():
seqlevelsStyle(txdb)
# [1] "NCBI"
seqlevelsStyle(txdb) <- "UCSC"
seqlevelsStyle(txdb)
# [1] "UCSC"
seqlevels(txdb)
# [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9"
# [10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
# [19] "chr19" "chr20" "chr21" "chr22" "chrX" "chrY" "chrM"
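For this assembly, the output above shows that switching to UCSC style amounts to prefixing each name with "chr", with MT becoming chrM. The hypothetical base-R helper below reproduces that mapping for a plain character vector, such as a column of chromosome names in a table. It is a sketch derived only from the output above; for TxDb or GRanges objects, seqlevelsStyle() remains the tool to use.

```r
## Hypothetical helper reproducing the NCBI -> UCSC renaming shown above
## for T2T-CHM13v2.0: "1".."22", "X", "Y" get a "chr" prefix; "MT" becomes "chrM".
ncbi_to_ucsc <- function(x) {
    out <- paste0("chr", x)
    out[x == "MT"] <- "chrM"
    out
}

ncbi_to_ucsc(c("1", "22", "X", "Y", "MT"))
# returns c("chr1", "chr22", "chrX", "chrY", "chrM")
```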
H.