Distance score and isoform match to mapped data for transcript-level quantification #158

lianov · 2024-02-23T22:33:58Z

lianov
Feb 23, 2024

Hello all!

First, thank you for developing this great tool. Together with my colleague @atrull314, I have implemented it as part of our nf-core/scnanoseq pipeline, which is currently under development (nearing a first release). In our case, we use the --read_group feature to quantify gene- and transcript-level counts at the single-cell level.

There is, however, one question I wanted to raise to ensure we are interpreting the algorithm correctly when applying it to single-cell data derived from Oxford Nanopore sequencing. In our analysis, the overall number of features detected is higher for transcripts than for genes, which is what we expect.

However, when we compute the mean number of features per cell, we typically see a higher number of features per cell for genes than for transcripts. There are several reasons why this may be happening, but mainly the following come to mind:

1. Single-cell data can already be sparse, and given that transcript-level quantification is expected to be a subset of the gene-level quantification, we would encounter a higher drop-out rate per feature.
2. Another reason is the choice of isoform. Specifically, in my understanding, IsoQuant does not quantify isoforms fractionally. It relies on the distance score to the exon and splice profiles to choose the isoform of highest confidence (e.g., in Supplementary Figure 4 of your paper, my understanding is that the orange profile with score d=0 would be chosen, rather than sharing that quantification across all three options). Could you confirm this is indeed the case?
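
To make the two quantities being compared concrete, here is a minimal sketch of how "total features detected" and "mean features per cell" could be computed from sparse counts. The function name `feature_stats`, the cell and feature labels, and the toy counts are all hypothetical and are not taken from IsoQuant or nf-core/scnanoseq output.

```python
def feature_stats(counts, cells):
    """Return (total features detected, mean features per cell).

    counts: dict mapping (feature, cell) -> read count (sparse; zeros omitted).
    cells:  list of all cell barcodes, including cells with no counts.
    """
    # A feature is "detected" if it has a non-zero count in at least one cell.
    detected = {feature for (feature, _), n in counts.items() if n > 0}
    # Number of distinct features with a non-zero count in each cell.
    per_cell = {cell: 0 for cell in cells}
    for (_, cell), n in counts.items():
        if n > 0:
            per_cell[cell] += 1
    return len(detected), sum(per_cell.values()) / len(cells)


# Hypothetical toy data: cell c3 has gene-level counts but no transcript counts.
cells = ["c1", "c2", "c3"]
gene_counts = {
    ("G1", "c1"): 2, ("G2", "c1"): 1,
    ("G1", "c2"): 1, ("G2", "c2"): 3,
    ("G1", "c3"): 3,
}
transcript_counts = {
    ("G1.t1", "c1"): 2, ("G2.t1", "c1"): 1,
    ("G1.t2", "c2"): 1, ("G2.t2", "c2"): 2,
}

gene_total, gene_mean = feature_stats(gene_counts, cells)
tx_total, tx_mean = feature_stats(transcript_counts, cells)
print(gene_total, round(gene_mean, 2))  # 2 1.67
print(tx_total, round(tx_mean, 2))      # 4 1.33
```

In this toy case more transcripts than genes are detected overall (4 vs 2), yet the mean number of features per cell is lower for transcripts (1.33 vs 1.67), which is the pattern described above.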

Let me know if our interpretation above is correct, specifically regarding point 2. We just want to ensure that our current application of IsoQuant to single-cell data will be clear to the community (we have already validated a set of results against public data, and it all seems to match what we expect).

Thanks!

andrewprzh · 2024-02-28T22:36:54Z

andrewprzh
Feb 28, 2024
Maintainer

Dear @lianov

Thanks for the positive feedback and for bringing up this discussion!

1. I am not sure I entirely understood this statement, but it sounds about right. Could you elaborate, just in case?
2. Yes, that's correct. IsoQuant assigns long reads based on splice site matches. Moreover, a feature is reported only if there is at least one unique (unambiguous) read assignment.
   Thus, it may happen, for example, that in a certain cell there are 3 reads mapping to a gene, but none of these reads can be unambiguously assigned to an isoform. For this cell, the reported gene count will be 3, but all isoforms will receive a count of 0.
   It therefore seems normal to see more genes than isoforms per cell.
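
The 3-read example above can be sketched as a toy model. This is only a simplified illustration of the counting rule as described in this thread, not IsoQuant's actual code; the function `count_reads`, the read tuples, and the isoform names are hypothetical.

```python
from collections import Counter


def count_reads(reads):
    """Toy counting rule: every read adds to its gene's count, but only
    reads compatible with exactly one isoform add to an isoform count.

    reads: iterable of (cell, gene, set_of_compatible_isoforms).
    """
    gene_counts = Counter()
    isoform_counts = Counter()
    for cell, gene, isoforms in reads:
        gene_counts[(cell, gene)] += 1
        if len(isoforms) == 1:  # unambiguous assignment
            (isoform,) = isoforms
            isoform_counts[(cell, isoform)] += 1
    return gene_counts, isoform_counts


# Three reads from one cell, all mapping to GENE1, none unambiguous.
reads = [
    ("cellA", "GENE1", {"GENE1.t1", "GENE1.t2"}),
    ("cellA", "GENE1", {"GENE1.t1", "GENE1.t2", "GENE1.t3"}),
    ("cellA", "GENE1", {"GENE1.t2", "GENE1.t3"}),
]
gene_counts, isoform_counts = count_reads(reads)
print(gene_counts[("cellA", "GENE1")])  # 3
print(sum(isoform_counts.values()))     # 0
```

Under this rule the cell contributes one detected gene and zero detected isoforms, so per-cell feature counts can be lower at the transcript level even when the annotation contains more transcripts than genes.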

If you feel that this behavior should be changed, we can discuss the possibility of implementing another strategy.

P.S. If at some point you plan to use --count_exons, I'd like to warn you that it takes a lot of RAM when used with --read_group on single-cell data. I'm working on optimizations.

Best
Andrey

6 replies

andrewprzh Mar 27, 2024
Maintainer

@lianov thanks for getting back!

I think I will also switch to the unique_only strategy as the default in the next release.

I hope IsoQuant 3.4 will be released within a few weeks.

Best
Andrey

lianov Mar 27, 2024
Author

Great! Looking forward to it!

lianov Apr 23, 2024
Author

Just to follow up: our pipeline (which implements IsoQuant) has now been merged into the nf-core GitHub organization and can be found at https://github.com/nf-core/scnanoseq.

It is still in the final phase of development, and we hope to have a first release soon. I just wanted to share this, as it was part of this discussion.

andrewprzh Apr 24, 2024
Maintainer

Great! I am also close to finalizing the new IsoQuant release.

By the way, I am also working on some features for long-read single-cell data, i.e. barcode detection and UMI deduplication. You might be interested in using those functions as well.

lianov Apr 24, 2024
Author

@andrewprzh: we would absolutely be interested in testing and potentially implementing them! We already have these tasks implemented in the pipeline through alternative approaches, but we can always enable additional methods and functions as well (especially if they turn out to be faster and/or more accurate). Looking forward to your future updates!
