
ConvertToRefFlat skips many transcripts #333

@auesro

Affected tool(s)

ConvertToRefFlat

Affected version(s)

  • Latest public release version [2.5.1]
  • Latest development/master branch as of [date of test?]

Description

I am analyzing a public dataset generated with Drop-seq (GSE103892). To merge the resulting expression matrix with other 10x Genomics datasets, including my own, I used the genome and GTF files provided by Pool Lab, which are an improved version of the reference files provided by 10x Genomics.

When running the provided script to generate the required Drop-seq metadata files, I see that many transcripts and some exons are skipped at the ConvertToRefFlat step, as shown in the attached log file.

Then, during the next step, ReduceGtf, the same skipping warnings are printed, along with many warnings of this form:

    WARNING 2023-03-14 09:14:56 EnhanceGTFRecords gene GTFRecord(chrX:103356476-103396092 + . [Chic1 gene]) != GeneFromGTF(chrX:103356476-103409092 + Chic1) -- skipping
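In this warning, the gene's own `gene` line (GTFRecord) ends at 103396092, while the gene assembled from the rest of the annotation (GeneFromGTF) ends at 103409092. A minimal Python sketch for listing every gene with this kind of disagreement follows. It assumes, without checking the Drop-seq tools source, that the comparison is roughly "span of the `gene` line" versus "union of the gene's exon coordinates". The function names and toy records are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical helper regex: matches `key "value"` pairs in GTF column 9.
ATTR_RE = re.compile(r'([A-Za-z_]\w*)\s+"([^"]*)"')


def gene_id_of(attr_field):
    """Return the gene_id value from a GTF attribute column, or None."""
    for key, value in ATTR_RE.findall(attr_field):
        if key == "gene_id":
            return value
    return None


def gene_span_mismatches(gtf_lines):
    """List genes whose 'gene' line span differs from the span of their exons.

    Assumption: this approximates the check behind the EnhanceGTFRecords
    warning; it is not the tool's actual implementation.
    """
    gene_spans = {}                                # gene_id -> (chrom, start, end)
    exon_spans = defaultdict(lambda: [None, None])  # gene_id -> [min start, max end]
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        gene_id = gene_id_of(cols[8])
        if gene_id is None:
            continue
        chrom, feature = cols[0], cols[2]
        start, end = int(cols[3]), int(cols[4])
        if feature == "gene":
            gene_spans[gene_id] = (chrom, start, end)
        elif feature == "exon":
            span = exon_spans[gene_id]
            span[0] = start if span[0] is None else min(span[0], start)
            span[1] = end if span[1] is None else max(span[1], end)
    mismatches = []
    for gene_id, (chrom, g_start, g_end) in gene_spans.items():
        if gene_id not in exon_spans:
            continue  # gene line without exons: nothing to compare against
        e_start, e_end = exon_spans[gene_id]
        if (g_start, g_end) != (e_start, e_end):
            mismatches.append((gene_id, (chrom, g_start, g_end), (chrom, e_start, e_end)))
    return mismatches


# Toy annotation (made up): GENE_A's exons extend past its gene line, GENE_B agrees.
toy_gtf = [
    'chrX\ttoy\tgene\t100\t500\t.\t+\t.\tgene_id "GENE_A";',
    'chrX\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chrX\ttoy\texon\t450\t700\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chrX\ttoy\tgene\t900\t1000\t.\t+\t.\tgene_id "GENE_B";',
    'chrX\ttoy\texon\t900\t1000\t.\t+\t.\tgene_id "GENE_B"; transcript_id "TX_B1";',
]
for gene_id, gene_span, exon_span in gene_span_mismatches(toy_gtf):
    print(gene_id, gene_span, exon_span)
# prints: GENE_A ('chrX', 100, 500) ('chrX', 100, 700)
```

To run the same check on the real annotation, pass an open file handle (e.g. `open("genes.gtf")`, a placeholder path) instead of the toy list.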

Both types of warnings can be seen in the attached log file: logfile.txt

The resulting refFlat file is 10 times smaller than the one you provide in the Cookbook.

Is this normal?
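Because file size is only a rough proxy, comparing transcript IDs gives a more direct check. The minimal Python sketch below assumes that the refFlat output uses the UCSC refFlat layout (gene name in column 1, transcript name in column 2) and that column 2 holds the GTF `transcript_id`. If the tool writes a different attribute there, change `key`. The function names and toy records are hypothetical.

```python
import re

# Hypothetical helper regex: matches `key "value"` pairs in GTF column 9.
ATTR_RE = re.compile(r'([A-Za-z_]\w*)\s+"([^"]*)"')


def gtf_transcript_ids(gtf_lines, key="transcript_id"):
    """Collect the distinct values of `key` from the GTF attribute column."""
    ids = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        for attr_key, value in ATTR_RE.findall(cols[8]):
            if attr_key == key:
                ids.add(value)
                break
    return ids


def refflat_transcript_ids(refflat_lines):
    """Collect column 2 (transcript name, assuming UCSC refFlat layout)."""
    ids = set()
    for line in refflat_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2 and cols[1]:
            ids.add(cols[1])
    return ids


def missing_transcripts(gtf_lines, refflat_lines):
    """Return sorted transcript IDs present in the GTF but absent from the refFlat."""
    return sorted(gtf_transcript_ids(gtf_lines) - refflat_transcript_ids(refflat_lines))


# Toy inputs (made up): the GTF has T1 and T2, the refFlat only kept T1.
toy_gtf = [
    'chr1\ttoy\tgene\t10\t120\t.\t+\t.\tgene_id "G1";',
    'chr1\ttoy\texon\t10\t50\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t80\t120\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]
toy_refflat = ["Gene1\tT1\tchr1\t+\t9\t50\t9\t50\t1\t9,\t50,"]
missing = missing_transcripts(toy_gtf, toy_refflat)
print(f"{len(missing)} of {len(gtf_transcript_ids(toy_gtf))} GTF transcripts missing: {missing}")
# prints: 1 of 2 GTF transcripts missing: ['T2']
```

Open file handles for the real GTF and refFlat can be passed in place of the toy lists. For gzipped inputs, `gzip.open(path, "rt")` yields text lines that work the same way.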

Expected behavior

GTF to refFlat conversion includes all transcripts in the GTF

Actual behavior

Many transcripts are missing from the refFlat file.
