Adding Strandcheck and Strandflip-correction #375
Co-authored-by: Copilot <copilot@github.com>
OK, I'm opening this for code review now. It's probably hard to review, sorry @nevrome. The important points:
Finally, I have not battle-tested this new feature yet, and I am planning to reach out to users who have reported strand flips in the past. So while this is ready for review, I would like to delay merging until I've had a chance to see it in practice.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master     #375      +/-   ##
==========================================
+ Coverage   51.80%   52.15%   +0.34%
==========================================
  Files          36       36
  Lines        5870     5881      +11
  Branches      641      642       +1
==========================================
+ Hits         3041     3067      +26
+ Misses       2188     2172      -16
- Partials      641      642       +1
nevrome left a comment
I went through the code. Looks very good overall. My main concerns are with the interface, but maybe I just don't understand exactly what you wanted to achieve.
parseStrandCheck = OP.switch (
    OP.long "strandCheck" <>
    OP.help "Whether to allow strand flips in the genotype data. Note that this will remove \
            \any A/T and G/C SNPs from the data, as for those we cannot determine the correct strand orientation.")
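To make the A/T and G/C restriction in this help text concrete, here is a minimal sketch of how the two alleles of one SNP in a new package could be compared against those of a reference package. The type `AlleleRelation` and the functions `complementBase`, `isAmbiguous` and `classifyAlleles` are hypothetical illustrations, not trident's actual implementation, and alleles are simplified to single `Char`s.

```haskell
-- Hypothetical sketch: how do a SNP's alleles in a new package relate to the
-- alleles of the same SNP in a reference package? Not trident's real API.
data AlleleRelation
  = SameStrand         -- identical alleles, nothing to do
  | SameStrandSwapped  -- same strand, ref/alt order swapped
  | FlippedStrand      -- alleles are the complement of the reference
  | FlippedSwapped     -- complemented and swapped
  | Ambiguous          -- A/T or G/C SNP: a flip cannot be detected
  | Incongruent        -- alleles cannot be reconciled at all
  deriving (Eq, Show)

-- | Watson-Crick complement of a single nucleotide.
complementBase :: Char -> Maybe Char
complementBase 'A' = Just 'T'
complementBase 'T' = Just 'A'
complementBase 'C' = Just 'G'
complementBase 'G' = Just 'C'
complementBase _   = Nothing

-- | A/T and G/C SNPs read the same on both strands (up to allele order).
isAmbiguous :: (Char, Char) -> Bool
isAmbiguous (a, b) = complementBase a == Just b

-- | Ambiguous SNPs are checked first, mirroring the help text: they are
-- dropped because a flip is indistinguishable from a ref/alt swap.
classifyAlleles :: (Char, Char) -> (Char, Char) -> AlleleRelation
classifyAlleles ref@(r1, r2) new@(n1, n2)
  | isAmbiguous ref || isAmbiguous new = Ambiguous
  | new == ref                         = SameStrand
  | new == (r2, r1)                    = SameStrandSwapped
  | flipped == Just ref                = FlippedStrand
  | flipped == Just (r2, r1)           = FlippedSwapped
  | otherwise                          = Incongruent
  where
    -- complement both alleles of the new package (Nothing for non-ACGT codes)
    flipped = (,) <$> complementBase n1 <*> complementBase n2
```

For example, `classifyAlleles ('A','C') ('T','G')` yields `FlippedStrand`, while for an A/T SNP the complement of `('A','T')` is `('T','A')`, which is indistinguishable from a plain swap on the same strand; that is why such SNPs can only be removed.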
About the interface in general, as a slightly oblivious user myself:
Does that mean the default behaviour of trident does not change, if this option is not active? Is this a good default? Should --strandCheck not be active by default?
You wrote:
If a user suspects strand flips (they will notice because the merged number of SNPs will be very low in case there are strand flips but trident assumes none) ...
but I wonder if I myself would actually notice this. Should there not be a warning in these cases? Maybe a short report at the end of the forge run that states the number of lost SNPs and makes suggestions like "use --strandCheck" under certain conditions?
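A minimal sketch of the kind of end-of-forge report suggested here, assuming the forge run can count how many SNPs it dropped because of mismatching allele codes. `ForgeSummary`, `forgeReport` and the threshold parameter are hypothetical; trident does not necessarily track these numbers today.

```haskell
import Text.Printf (printf)

-- Hypothetical counters collected during a forge run (not trident's real types).
data ForgeSummary = ForgeSummary
  { snpsConsidered :: Int -- SNPs compared across the input packages
  , snpsMismatched :: Int -- SNPs dropped because their allele codes disagreed
  }

-- | Build report lines for the end of a forge run. If the share of
-- mismatched SNPs exceeds the (assumed) threshold, add a hint to try --strandCheck.
forgeReport :: Double -> ForgeSummary -> [String]
forgeReport threshold (ForgeSummary total mismatched)
  | total <= 0 = ["No SNPs were processed."]
  | otherwise  = summaryLine : [hint | share > threshold]
  where
    share :: Double
    share = fromIntegral mismatched / fromIntegral total
    summaryLine :: String
    summaryLine =
      printf "%d of %d SNPs (%.1f%%) were dropped due to mismatching alleles."
        mismatched total (100 * share)
    hint :: String
    hint = "This may indicate strand flips between packages; consider rerunning with --strandCheck."
```

With a threshold of `0.1`, `forgeReport 0.1 (ForgeSummary 1000 500)` would produce the summary line plus the hint, whereas 10 mismatches out of 1000 would only produce the summary line. Where exactly to set such a threshold is an open design question.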
Yes, the current setting leaves the default behaviour unchanged: by default we i) assume that all SNPs live on the same strand, and ii) therefore do not check or flip anything.
There are arguments for setting a different default, I will raise this in a new comment below.
 blocks <- liftIO $ catch (
     runSafeT $ do
-        eigenstratProd <- getJointGenotypeData logA False relevantPackages Nothing
+        eigenstratProd <- getJointGenotypeData logA False False relevantPackages Nothing
I understand now what you mean by
they should first forge (and thereby correct strand flips), and then run any additional command. At least I as a user would find it a little bit "too smart" if trident were able to correct strand flips somewhat under the hood implicitly every time it reads in multiple datasets...
I was wondering which subcommand you had in mind, completely forgetting about xerxes 🤦. Good that we have this code in here now, so we can't postpone this decision any longer!
I'm not sure about it, honestly. But I feel xerxes should at least tell me that there is a potential issue if I run it on malformed data. And if I could run --strandCheck here, instead of tediously forging everything beforehand, I would probably appreciate the convenience. 🤔
Hmm, yes, I expected this opinion from you 😁... and I was going back and forth on this. OK, let's discuss this below more fully.
OK, so @nevrome raised three important points here: Should
OK, thanks for the explanation. I think what you already partially implemented in 46b17ac is the right way to go: stop if incongruent SNPs are detected, and force the user to decide how to deal with them. That's the kind of hand-holding I appreciate. If I understand correctly, then there may be issues that
I'm also wondering about the naming of these options:
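One way to express "stop unless the user has decided" is to let the per-SNP resolution return an `Either`, so that a run which finds non-concordant SNPs without an explicit decision fails with a message instead of silently continuing. This is only a sketch under assumed names (`SnpStatus`, `Decision`, `resolve`); it is not the code in 46b17ac, and the actual option names are still under discussion.

```haskell
-- Simplified outcome of comparing one SNP's alleles across packages (illustrative).
data SnpStatus = Concordant | Flippable | Unresolvable
  deriving (Eq, Show)

-- Hypothetical user decision; 'Undecided' models the default of not choosing.
data Decision = Undecided | DropMismatches | CorrectFlips
  deriving (Eq, Show)

-- | Return the IDs of the SNPs to keep, or an error message if non-concordant
-- SNPs were found and the user has not decided how to handle them.
resolve :: Decision -> [(String, SnpStatus)] -> Either String [String]
resolve decision snps
  | null problems = Right (map fst snps)
  | otherwise = case decision of
      Undecided ->
        Left (show (length problems) ++ " SNP(s) with incongruent alleles found; stopping. Please choose explicitly how to handle them.")
      -- drop every SNP that is not concordant, whether flippable or not
      DropMismatches -> Right [snpId | (snpId, Concordant) <- snps]
      -- keep concordant SNPs and those that a strand flip can fix
      CorrectFlips -> Right [snpId | (snpId, st) <- snps, st /= Unresolvable]
  where
    problems = [snpId | (snpId, st) <- snps, st /= Concordant]
```

The design choice here is that data without any problems passes through regardless of the decision, so the extra hand-holding only kicks in when it is actually needed.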
This is going to add an important feature. For now it is just work in progress; I will update this message in due time. The branch is based on v2.0.0.0, so the list of commits in this PR will become simpler once #369 is merged.