Many reads fail QC during TALON run but meet primary, coverage, and identity filters #98
Your intuition that internal priming or the reproducibility filter should not be affecting these numbers is correct. I'm looking into other possible causes. I checked a log file I had lying around and found something similar :/ As far as I can tell, this should not be happening. I will update you when I find anything.
Thank you for the reply and for looking into it. When I have time, I will look into these failing reads and their characteristics.
If you're also planning to look into it on your end, here's some code that might be useful as a starting point: https://github.com/fairliereese/220421_talon_debug/blob/master/check_talon_log.ipynb
I looked into it a bit more and I am still at a loss as to why some reads are failing. This was consistent across multiple samples, although they were all processed the same way, so it's possible I am doing something unusual.
Any update on this? Following @fairliereese's advice, I checked my own data and found the same issue.
Using TALON v5, installed with `python setup.py install`, on an HPC running Debian with Python 3.6.7.
I kept the default thresholds of 0.9 for fraction aligned and 0.8 for identity.
I was rooting through the TALON QC log file because we are seeing many reads filtered out, even though we used cap-trapping and oligo-dT priming and are confident the data are of good quality. I found one issue that may explain many alignments with a low fraction aligned: with my library prep, pychopper did not effectively trim the polyA tails from the FASTQ reads. However, I also saw an additional subset of alignments that were filtered out even though they were primary alignments and passed both the fraction-aligned and identity filters.
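To illustrate why untrimmed polyA tails can push reads below the 0.9 cutoff, here is a small worked example. It assumes fraction aligned is approximately (read length minus unaligned bases) / read length, with the untrimmed tail counted as unaligned (e.g. soft-clipped); TALON's exact calculation may differ, and the read and tail lengths are hypothetical.

```python
MIN_FRAC_ALIGNED = 0.9  # default threshold mentioned in this thread


def fraction_aligned(read_length: int, unaligned_bases: int) -> float:
    """Approximate fraction of a read covered by its alignment.

    Assumption: unaligned_bases covers clipped ends such as an untrimmed
    polyA tail; this is a simplification, not TALON's own code.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return (read_length - unaligned_bases) / read_length


# Hypothetical 1,000-nt read whose 120-nt polyA tail was not trimmed:
# even if every other base aligns, it falls just under the cutoff.
frac = fraction_aligned(1000, 120)
print(f"{frac:.2f}", frac >= MIN_FRAC_ALIGNED)  # 0.88 False
```

Under this approximation, any read whose untrimmed tail exceeds 10% of its length fails the default filter regardless of how well the transcript body aligns.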
I have attached an UpSet plot showing the reasons each alignment passed to TALON either passed or failed the QC step. The third column shows around 3.5M reads that failed with no reason recorded.
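The tallying behind a plot like this can be sketched as follows. It re-applies the documented filters (primary alignment, fraction aligned, identity) to each row of the QC log and lists reads marked as failing with none of those reasons. The column names below are an assumption about the `*_QC.log` layout, so check them against your own file's header; the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumed tab-separated QC log layout; sample rows are hypothetical.
SAMPLE_LOG = (
    "read_ID\tpassed_QC\tprimary_mapped\tread_length\tfraction_aligned\tidentity\n"
    "r1\t1\t1\t1500\t0.98\t0.95\n"
    "r2\t0\t0\t1200\t0.97\t0.93\n"
    "r3\t0\t1\t900\t0.85\t0.91\n"
    "r4\t0\t1\t1100\t0.96\t0.94\n"
)

MIN_FRAC_ALIGNED = 0.9
MIN_IDENTITY = 0.8


def failure_reasons(row):
    """Return the set of documented filters this read fails."""
    reasons = set()
    if row["primary_mapped"] != "1":
        reasons.add("not_primary")
    if float(row["fraction_aligned"]) < MIN_FRAC_ALIGNED:
        reasons.add("low_fraction_aligned")
    if float(row["identity"]) < MIN_IDENTITY:
        reasons.add("low_identity")
    return frozenset(reasons)


def read_rows(log_text):
    """Parse the tab-separated log text into dicts keyed by column name."""
    return csv.DictReader(io.StringIO(log_text), delimiter="\t")


def tally(log_text):
    """Count reads per (passed_QC, reasons) combination, UpSet-style."""
    return Counter((row["passed_QC"], failure_reasons(row)) for row in read_rows(log_text))


def unexplained_failures(log_text):
    """Read IDs marked as failing QC although no documented filter explains it."""
    return [
        row["read_ID"]
        for row in read_rows(log_text)
        if row["passed_QC"] == "0" and not failure_reasons(row)
    ]


print(unexplained_failures(SAMPLE_LOG))  # ['r4']
```

For a real log, pass the file contents (e.g. `open(path).read()`) instead of `SAMPLE_LOG`; for very large logs, iterating over the file object directly with `csv.DictReader` avoids holding the whole file in memory.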
Looking through the TALON_label log, I saw roughly 0.5M reads with evidence of internal priming, but as I understand it, internal priming doesn't factor into generating the TALON database.
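One way to rule internal priming in or out is to check how many of the unexplained QC failures were also flagged for internal priming. The sketch below assumes you have already extracted both sets of read IDs (for example, the unexplained failures from the QC log and the flagged reads from the TALON_label output); how to extract them is not shown, and the IDs in the example are hypothetical.

```python
def overlap_report(unexplained_ids, internal_priming_ids):
    """Summarize how many unexplained QC failures were also flagged for internal priming."""
    unexplained = set(unexplained_ids)
    primed = set(internal_priming_ids)
    shared = unexplained & primed
    return {
        "unexplained": len(unexplained),
        "internal_priming": len(primed),
        "shared": len(shared),
        "unexplained_not_primed": len(unexplained - primed),
    }


# Hypothetical read IDs, for illustration only
print(overlap_report(["r4", "r7", "r9"], ["r7", "r12"]))
# {'unexplained': 3, 'internal_priming': 2, 'shared': 1, 'unexplained_not_primed': 2}
```

With roughly 0.5M internally primed reads against about 3.5M unexplained failures, internal priming alone could account for at most a fraction of them, which is consistent with the question below.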
Is there some other behind the scenes filtering going on during database generation that isn't reported in the QC log?