Hi @hpages
I have stumbled upon a peculiar case where the makeTxDbFromGRanges function drops a large number of transcripts:
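## Define the annotation release
x <- 49
## Import GFF (the %>% pipe needs magrittr, or a package that re-exports it, to be attached)
library(magrittr)
txdb <- paste0("https://toxodb.org/common/downloads/release-", x, "/TgondiiME49/gff/data/ToxoDB-", x, "_TgondiiME49.gff") %>% txdbmaker::makeTxDbFromGFF(format = "gff3")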
Warning message:
In makeTxDbFromGRanges(gr, metadata = metadata) :
The following transcripts were dropped because their exon ranks could
not be inferred (either because the exons are not on the same
chromosome/strand or because they are not separated by introns):
TGME49_200010-t26_1, TGME49_200110-t26_1, TGME49_200130-t26_1,
TGME49_200230-t26_1, TGME49_200240-t26_1, TGME49_200250-t26_1,
TGME49_200260-t26_1, TGME49_200270-t26_1, TGME49_200280-t26_1,
TGME49_200290-t26_1, TGME49_200295-t26_1, TGME49_200300-t26_1,
TGME49_200310-t26_1, TGME49_200320-t26_1, TGME49_200330-t26_1,
TGME49_200340-t26_1, TGME49_200350-t26_1, TGME49_200360-t26_1,
TGME49_200370-t26_1, TGME49_200375-t26_1, TGME49_200385-t26_1,
TGME49_200400-t26_1, TGME49_200410-t26_1, TGME49_200430-t26_1,
TGME49_200440-t26_1, TGME49_200450-t26_1, TGME49_200460-t26_1,
TGME49_200470-t26_1, TGME49_200480-t26_1, TGME49_200590-t26_1,
TGME49_200595-t26_1, TGME49_200600-t26_1, TGME49_200700-t26_1,
TGME49_201100-t26_1, TGME49_201110-t26_1, TGME49_201120-t26_1,
TGME49_201130-t26 [... truncated]
Only the release 49 annotation is problematic for this parasite. Having looked at the GFF file, I still cannot understand why makeTxDbFromGRanges discards TGME49_200010-t26_1 in release 49 but has no issue with TGME49_200010.R447 in release 68. Is it because the mRNA feature is missing from the release 49 GFF?
## Release 49
TGME49_chrXII VEuPathDB gene 2245476 2249210 . - . ID=TGME49_200010;description=hypothetical protein
TGME49_chrXII VEuPathDB protein_coding_gene 2245476 2249210 . - . ID=TGME49_200010-t26_1;Parent=TGME49_200010;description=hypothetical protein
TGME49_chrXII VEuPathDB exon 2245476 2247209 . - . ID=exon_TGME49_200010-t26_1-E2;Parent=TGME49_200010-t26_1
TGME49_chrXII VEuPathDB exon 2247554 2249210 . - . ID=exon_TGME49_200010-t26_1-E1;Parent=TGME49_200010-t26_1
TGME49_chrXII VEuPathDB CDS 2246111 2247209 . - 1 ID=TGME49_200010-t26_1-p1-CDS2;Parent=TGME49_200010-t26_1;protein_source_id=TGME49_200010-t26_1-p1
TGME49_chrXII VEuPathDB CDS 2247554 2247696 . - 0 ID=TGME49_200010-t26_1-p1-CDS1;Parent=TGME49_200010-t26_1;protein_source_id=TGME49_200010-t26_1-p1
TGME49_chrXII VEuPathDB three_prime_UTR 2245476 2246110 . - . ID=utr_TGME49_200010-t26_1_1;Parent=TGME49_200010-t26_1
TGME49_chrXII VEuPathDB five_prime_UTR 2247697 2249210 . - . ID=utr_TGME49_200010-t26_1_2;Parent=TGME49_200010-t26_1
## Release 68
TGME49_chrXII VEuPathDB protein_coding_gene 2245476 2248187 . - . ID=TGME49_200010;Name=GRA20;description=dense granule protein GRA20;ebi_biotype=protein_coding
TGME49_chrXII VEuPathDB mRNA 2245476 2248187 . - . ID=TGME49_200010.R447;Parent=TGME49_200010;description=dense granule protein GRA20;gene_ebi_biotype=protein_coding
TGME49_chrXII VEuPathDB exon 2245476 2247209 . - . ID=exon_TGME49_200010.R447-E2;Parent=TGME49_200010.R447;gene_id=TGME49_200010
TGME49_chrXII VEuPathDB exon 2247554 2248187 . - . ID=exon_TGME49_200010.R447-E1;Parent=TGME49_200010.R447;gene_id=TGME49_200010
TGME49_chrXII VEuPathDB CDS 2246111 2247209 . - 1 ID=TGME49_200010.P447-CDS2;Parent=TGME49_200010.R447;gene_id=TGME49_200010;protein_source_id=TGME49_200010.P447
TGME49_chrXII VEuPathDB CDS 2247554 2247696 . - 0 ID=TGME49_200010.P447-CDS1;Parent=TGME49_200010.R447;gene_id=TGME49_200010;protein_source_id=TGME49_200010.P447
TGME49_chrXII VEuPathDB three_prime_UTR 2245476 2246110 . - . ID=utr_TGME49_200010.R447_1;Parent=TGME49_200010.R447
TGME49_chrXII VEuPathDB five_prime_UTR 2247697 2248187 . - . ID=utr_TGME49_200010.R447_2;Parent=TGME49_200010.R447
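One way to make the difference between the two excerpts explicit is to look up the feature type of each exon's parent. The base R sketch below transcribes only the type, ID and Parent values from the lines above. The helper name `parent_types` and the data frame layout are hypothetical, chosen for illustration. It shows which feature type sits at the transcript level in each release. It does not by itself establish why makeTxDbFromGRanges drops the transcripts.

```r
# Type (column 3), ID and Parent transcribed from the release 49 excerpt above.
rel49 <- data.frame(
  type = c("gene", "protein_coding_gene", "exon", "exon", "CDS", "CDS",
           "three_prime_UTR", "five_prime_UTR"),
  id = c("TGME49_200010", "TGME49_200010-t26_1",
         "exon_TGME49_200010-t26_1-E2", "exon_TGME49_200010-t26_1-E1",
         "TGME49_200010-t26_1-p1-CDS2", "TGME49_200010-t26_1-p1-CDS1",
         "utr_TGME49_200010-t26_1_1", "utr_TGME49_200010-t26_1_2"),
  parent = c(NA, "TGME49_200010", rep("TGME49_200010-t26_1", 6)),
  stringsAsFactors = FALSE
)

# Same three fields transcribed from the release 68 excerpt above.
rel68 <- data.frame(
  type = c("protein_coding_gene", "mRNA", "exon", "exon", "CDS", "CDS",
           "three_prime_UTR", "five_prime_UTR"),
  id = c("TGME49_200010", "TGME49_200010.R447",
         "exon_TGME49_200010.R447-E2", "exon_TGME49_200010.R447-E1",
         "TGME49_200010.P447-CDS2", "TGME49_200010.P447-CDS1",
         "utr_TGME49_200010.R447_1", "utr_TGME49_200010.R447_2"),
  parent = c(NA, "TGME49_200010", rep("TGME49_200010.R447", 6)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the feature type(s) of the parents of all
# features of a given type (exons by default).
parent_types <- function(gff, child_type = "exon") {
  parents <- unique(gff$parent[gff$type == child_type])
  unique(gff$type[match(parents, gff$id)])
}

print(parent_types(rel49))  # "protein_coding_gene"
print(parent_types(rel68))  # "mRNA"
```

In release 49 the exons hang off a feature typed protein_coding_gene, whereas in release 68 they hang off an mRNA feature. The same lookup could presumably be run on the full file by taking the type, ID and Parent columns from an imported GFF, but that step is not shown here.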