Distance score and isoform match to mapped data for transcript-level quantification #158
Hello all! First, thank you for developing this great tool. I, along with my colleague @atrull314, have implemented it as part of our [...] There is, however, one question I wanted to bring up to make sure we are interpreting the algorithm correctly when applying it to single-cell data from Oxford Nanopore sequencing. In our analysis, the total number of detected features is higher for transcripts than for genes, which makes perfect sense and is what we expect. However, when we compute the mean number of features per cell, we typically see more features per cell for genes than for transcripts. There are a number of reasons we think this may be happening, but mainly the following come to mind:
Let me know if our interpretation above is correct, specifically regarding point 2. We just want to make sure that our current application of IsoQuant to single-cell data will be clear to the community (we have already validated a set of results against public data, and it all seems to match what we expect). Thanks!
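For concreteness, the two metrics compared above can be sketched as follows. This is only a toy illustration: the matrices are made-up numbers, the layout (rows are features, columns are cells) is an assumption, and the helper names `detected_overall` and `mean_detected_per_cell` are hypothetical rather than part of IsoQuant or any single-cell package.

```python
# Toy count matrices (hypothetical data): rows = features, columns = cells.
gene_counts = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
transcript_counts = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]


def detected_overall(counts):
    """Number of features with a non-zero count in at least one cell."""
    return sum(1 for row in counts if any(c > 0 for c in row))


def mean_detected_per_cell(counts):
    """Mean number of features with a non-zero count, averaged over cells."""
    n_cells = len(counts[0])
    per_cell = [sum(1 for row in counts if row[j] > 0) for j in range(n_cells)]
    return sum(per_cell) / n_cells


print(detected_overall(gene_counts), detected_overall(transcript_counts))
print(mean_detected_per_cell(gene_counts), mean_detected_per_cell(transcript_counts))
```

In this toy data more transcripts than genes are detected across the whole dataset, yet the per-cell mean is lower for transcripts, which is the pattern described above.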
Replies: 1 comment 6 replies
Dear @lianov, thanks for the positive feedback and for bringing up this discussion!
If you feel that this behavior should be changed, we can discuss the possibility of implementing another strategy. P.S. If at some point you plan to use --count_exons, I'd like to warn you that it takes a lot of RAM when used with --read_groups on single-cell data. I'm working on optimizations. Best
Dear @lianov
Thanks for the positive feedback and for bringing up this discussion!
I am not sure I entirely got this statement, but it sounds about right. Could you elaborate, just in case?
Yes, that's correct. IsoQuant assigns long reads based on splice site matches. Moreover, a feature is reported only if there is at least one unique (unambiguous) read assignment.
Thus, it may happen, for example, that in a certain cell there are 3 reads mapping to a gene, but none of these reads can be unambiguously assigned to an isoform. In that case, the reported gene count for this cell will be 3, but all isoforms will receive a count of 0.
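The example above can be sketched in a few lines of Python. This is a deliberately simplified model, not IsoQuant's actual implementation: the read records, the gene and isoform names, and the `count_cell` helper are all hypothetical, and the sketch assumes that only reads with exactly one candidate isoform contribute to isoform counts.

```python
from collections import Counter

# Hypothetical read assignments for a single cell: (gene, candidate isoforms).
# A read compatible with more than one isoform is treated as ambiguous.
reads = [
    ("GENE1", {"GENE1-201", "GENE1-202"}),
    ("GENE1", {"GENE1-201", "GENE1-202"}),
    ("GENE1", {"GENE1-202", "GENE1-203"}),
]


def count_cell(reads):
    """Count reads per gene, and per isoform for unambiguous reads only."""
    gene_counts = Counter()
    isoform_counts = Counter()
    for gene, isoforms in reads:
        # Every read is unambiguous at the gene level in this toy example.
        gene_counts[gene] += 1
        # Assumption: only uniquely assigned reads count towards an isoform.
        if len(isoforms) == 1:
            isoform_counts[next(iter(isoforms))] += 1
    return gene_counts, isoform_counts


gene_counts, isoform_counts = count_cell(reads)
print(dict(gene_counts))     # {'GENE1': 3}
print(dict(isoform_counts))  # {}
```

Under these assumptions the cell contributes one detected gene and zero detected isoforms, even though all three reads come from the same locus.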
Thus it seems normal to see that per-cell you see more genes than i…