
maxim-h commented Nov 19, 2025

I followed the OSCA introduction and saw that 100% of genes were classified as mitochondrial.
The GRanges accessor `seqnames()` returns a factor-Rle object, and `any()` collapses the comparison
against "MT" into a single logical value, which will be TRUE whenever at least one gene lies on MT.

Not sure whether something changed in GenomicRanges or whether `any()` has special behaviour for GRangesList objects,
but the book shows adding rowRanges as a plain GRanges object. The conversion to a per-gene index
could be done differently, and maybe faster, e.g. `as.logical(seqnames(location)=="MT")`, but I don't
see much difference in practice.
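
A minimal sketch of the failure mode described above, using a small hypothetical GRanges built by hand rather than real rowRanges (it assumes only that GenomicRanges is installed). `any()` returns one value for the whole object, while `as.logical()` keeps one value per gene:

```r
library(GenomicRanges)

# Hypothetical toy rowRanges: three genes, one range each, stored as a GRanges.
location <- GRanges(
    seqnames = c("1", "MT", "2"),
    ranges = IRanges(start = c(100, 200, 300), width = 10)
)

# seqnames() returns a factor-Rle, so the comparison yields a logical Rle.
hits <- seqnames(location) == "MT"

# any() summarizes the whole Rle: a single TRUE, not one value per gene.
collapsed <- any(hits)

# as.logical() expands the Rle into one logical per gene instead.
is.mito <- as.logical(hits)
```

Using `collapsed` as a gene subset is what flags everything as mitochondrial; `is.mito` has the same length as `location`.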

alanocallaghan (Collaborator)

Thanks! I'd like to make sure we have checks in the build to guard against this.

LTLA (Collaborator) commented Nov 20, 2025

Are you sure? The code here works fine for me in Bioconductor 3.22:

```r
library(scRNAseq)
sce.416b <- LunSpikeInData(which="416b")
sce.416b$block <- factor(sce.416b$block)

library(SingleCellExperiment)
# Identifying the mitochondrial transcripts in our SingleCellExperiment.
location <- rowRanges(sce.416b)
is.mito <- any(seqnames(location)=="MT")

summary(is.mito)
##    Mode   FALSE    TRUE 
## logical   46567      37 
```

It also works fine on the build system: the mitochondrial percentages in the compiled book range from 4.5 to 15.6.

This should work for both a GRanges object and a GRangesList object,
if the latter doesn't actually have any hierarchical structure.
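
To see why the book's code yields one value per gene when rowRanges is a GRangesList, here is a minimal sketch on a hypothetical two-gene GRangesList (toy data, not the 416b rowRanges; it assumes only GenomicRanges). `seqnames()` returns an RleList, and `any()` summarizes each list element separately, so a gene with several ranges is flagged if any of them is on MT:

```r
library(GenomicRanges)

# Hypothetical toy rowRanges stored as a GRangesList: gene "a" has one range,
# gene "b" has two ranges, both on MT.
location <- GRangesList(
    a = GRanges("1", IRanges(100, width = 10)),
    b = GRanges(c("MT", "MT"), IRanges(c(200, 400), width = 10))
)

# seqnames() on a GRangesList returns an RleList (one Rle per gene), so the
# comparison gives one logical Rle per gene.
hits <- seqnames(location) == "MT"

# any() on the RleList is applied per element: one logical per gene.
is.mito <- any(hits)
```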

maxim-h (Author) commented Nov 23, 2025

Ok, apologies for the confusion.

The current code definitely works with the example data in the book. Where I ran into a problem was analyzing another dataset: I went here to see how to actually populate rowRanges and ended up with a GRanges object, which fails with the described approach. Definitely a reading-comprehension issue on my part, but I wonder whether other people might also assume that it should work with any SCE object.

So the main problem is just that rowRanges can hold either a GRanges or a GRangesList object, and I don't know of a simple one-liner that works with both in all cases. I've added one more change that makes it compatible with both the example dataset (GRangesList) and my own data (GRanges). However, this probably won't work if the GRangesList actually contains multiple locations per gene, which I now understand is what the current approach is supposed to handle. In the example data it doesn't matter, since there's only one location per gene.

At this point I'm not sure the change is worth it, since the current code doesn't work on a GRanges and the new one won't work on a complex GRangesList. Maybe you have some alternative ideas; otherwise, feel free to close this if it's out of scope.

LTLA (Collaborator) commented Nov 23, 2025

Reminds me of Bioconductor/GenomicRanges#52, actually.

maxim-h (Author) commented Nov 24, 2025

I see, not a new problem.

What about something like this:

```r
is.mito <- sapply(seqnames(location)=="MT", any)
```

It still works on the Rle produced from a GRanges object, and if we get an RleList from a GRangesList it should be equivalent to calling `any()` on the RleList. At least, that equivalence is stated for the other AtomicList summarization methods such as `sum`, `mean` and so on.

Revert "Switch to gene names"

This reverts commit c7ae896.

LTLA (Collaborator) commented Nov 24, 2025

I'll leave your suggestion to the discretion of @PeteHaitch. Though it might also be a good time to nudge @hpages to have a fresh look at Bioconductor/GenomicRanges#60, now that we have a real user running into this problem.

maxim-h (Author) commented Nov 24, 2025

I'll also note the main drawback I found with the sapply approach: it's painfully slow on a GRangesList object, around 30 s on my laptop with the 416b dataset, which is ~4000 times slower than the current code in the book.

With a GRanges object it's still instant.
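
One possible way to handle both containers without `sapply()` is to branch on the class of `location` and use the vectorized operation in each case. The sketch below uses a hypothetical helper, `isOnSeqname()` (not part of the book, GenomicRanges or scuttle), and assumes genes map one-to-one onto the elements of rowRanges. Because it calls `any()` once on the whole RleList instead of looping over genes in R, it should avoid the per-gene overhead behind the slowdown reported above, but it has not been benchmarked on the 416b data:

```r
library(GenomicRanges)

# Hypothetical helper (not from the book or GenomicRanges): one logical per
# gene, whether rowRanges() holds a GRanges or a GRangesList.
isOnSeqname <- function(location, chrom = "MT") {
    hits <- seqnames(location) == chrom
    if (is(location, "GRangesList")) {
        # RleList: any() is applied per list element, so a gene with several
        # ranges is flagged if any of its ranges lies on `chrom`.
        any(hits)
    } else {
        # Rle: expand to a plain logical vector, one value per range (= gene).
        as.logical(hits)
    }
}

# Usage on toy data: the same two genes as a GRanges and as a GRangesList.
gr <- GRanges(c("1", "MT"), IRanges(c(1, 50), width = 5))
grl <- GRangesList(a = gr[1], b = c(gr[2], gr[2]))
mito.gr <- isOnSeqname(gr)
mito.grl <- isOnSeqname(grl)
```

In the book's setting this would be called as `is.mito <- isOnSeqname(rowRanges(sce.416b))`.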
