GroupReadsByUmi may fail when marking duplicates including secondary/supplementary reads #964

Open: nh13 wants to merge 6 commits into main from nh_markdup_order_issue

@nh13 nh13 commented Jan 30, 2024

There's an open issue in hts-specs about how we want to handle getting the primary alignment information when looking at a secondary or supplementary read: samtools/hts-specs#755

This PR adds the read primary "rp" tag, which stores the primary alignment for the same read end as the current secondary/supplementary alignment, in the same format as the "SA" tag. The mate's primary alignment is stored in the "mp" tag. Both tags are currently lowercase because they are not reserved tags.
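
For readers unfamiliar with the "SA" layout, here is a minimal sketch of reading one entry in that format, which per the SAM specification is `rname,pos,strand,CIGAR,mapQ,NM;`. The `PrimaryAlignmentInfo` class and its `parse` method are hypothetical names used for illustration only; they are not the code in this PR and not part of the fgbio API.

```scala
import scala.util.Try

// Hypothetical illustration only: models one entry of an SA-formatted tag value,
// i.e. "rname,pos,strand,CIGAR,mapQ,NM;" (pos is 1-based; strand is "+" or "-").
final case class PrimaryAlignmentInfo(
  refName: String,
  start: Int,
  positiveStrand: Boolean,
  cigar: String,
  mapq: Int,
  nm: Int
)

object PrimaryAlignmentInfo {
  /** Parses the first ';'-delimited entry; returns None if it is malformed. */
  def parse(value: String): Option[PrimaryAlignmentInfo] = {
    val first = value.split(';').headOption.getOrElse("")
    first.split(',') match {
      case Array(ref, pos, strand, cigar, mapq, nm) if strand == "+" || strand == "-" =>
        for {
          p <- Try(pos.toInt).toOption
          q <- Try(mapq.toInt).toOption
          n <- Try(nm.toInt).toOption
        } yield PrimaryAlignmentInfo(ref, p, strand == "+", cigar, q, n)
      case _ => None
    }
  }
}
```

Returning an `Option` rather than throwing keeps a malformed or missing tag from aborting a whole sort; whether the actual implementation makes the same choice is not stated here.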

I have tested that ZipperBams now adds these tags, that SortBam sorts correctly in template-coordinate order, and that GroupReadsByUmi passes. I added tests for GroupReadsByUmi and SamOrder.

Also, in my hands, secondary and supplementary records will never be output by GroupReadsByUmi as currently only primary alignments are output.

codecov bot commented Jan 30, 2024

Codecov Report

Attention: Patch coverage is 93.54839% with 4 lines in your changes missing coverage. Please review.

Project coverage is 95.67%. Comparing base (f93fdfb) to head (bf2dba3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/main/scala/com/fulcrumgenomics/bam/Bams.scala | 95.45% | 2 Missing ⚠️ |
| ...n/scala/com/fulcrumgenomics/bam/api/SamOrder.scala | 86.66% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #964      +/-   ##
==========================================
+ Coverage   95.65%   95.67%   +0.02%     
==========================================
  Files         126      126              
  Lines        7403     7447      +44     
  Branches      521      506      -15     
==========================================
+ Hits         7081     7125      +44     
  Misses        322      322              

| Flag | Coverage Δ |
|---|---|
| unittests | 95.67% <93.54%> (+0.02%) ⬆️ |

@@ -208,21 +208,31 @@ case class Cigar(elems: IndexedSeq[CigarElem]) extends Iterable[CigarElem] {
def trailingSoftClippedBases: Int = stats.trailingSoftClippedBases

/** Returns the number of bases that are hard-clipped at the start of the sequence. */
- def leadingHardClippedBases = this.headOption.map { elem =>
+ def leadingHardClippedBases: Int = this.headOption.map { elem =>

Public defs should have explicit return types, here and below.

@nh13 nh13 force-pushed the nh_markdup_order_issue branch from 36c78a6 to bbfaeeb Compare February 5, 2025 02:07
nh13 added 4 commits February 4, 2025 19:07
…nd supplementary reads

Secondary and supplementary reads must use the coordinates of the
primary alignments within the template; otherwise they are not
guaranteed to be next to the primary alignments in the file.  Therefore,
we've added the "rp" and "mp" tags to store the SA-tag equivalent
information for the primary alignment.  This keeps information about the
primary alignments with the secondary and supplementary alignments.
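
The ordering idea in the commit message above can be sketched with a deliberately simplified model. Everything here (`Rec`, `sortKey`, the field names) is hypothetical and for illustration only; the real template-coordinate sort in fgbio considers more than this, and this is not the PR's implementation. The point shown is only that a non-primary record keyed by its primary's coordinates lands next to that primary.

```scala
object TemplateKeySketch {
  // Hypothetical, simplified record: the primary* fields stand in for what an
  // "rp"-style tag would carry on a secondary/supplementary alignment.
  final case class Rec(
    name: String,
    refIndex: Int,
    start: Int,
    isPrimary: Boolean,
    primaryRefIndex: Option[Int] = None,
    primaryStart: Option[Int] = None
  )

  /** Sort key: non-primary records borrow the primary's coordinates when available. */
  def sortKey(r: Rec): (Int, Int, String, Boolean) = {
    val ref = if (r.isPrimary) r.refIndex else r.primaryRefIndex.getOrElse(r.refIndex)
    val pos = if (r.isPrimary) r.start else r.primaryStart.getOrElse(r.start)
    // false sorts before true, so the primary precedes its non-primary records
    (ref, pos, r.name, !r.isPrimary)
  }

  val records: Seq[Rec] = Seq(
    Rec("q1", refIndex = 1, start = 5000, isPrimary = false,
      primaryRefIndex = Some(0), primaryStart = Some(100)),
    Rec("q2", refIndex = 0, start = 200, isPrimary = true),
    Rec("q1", refIndex = 0, start = 100, isPrimary = true)
  )

  // With the borrowed coordinates, both q1 records sort together ahead of q2.
  val sorted: Seq[Rec] = records.sortBy(sortKey)
}
```

If the supplementary q1 record were keyed by its own coordinates (reference 1, position 5000), it would sort after q2 and be separated from its primary, which is the failure mode the commit message describes.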
@nh13 nh13 force-pushed the nh_markdup_order_issue branch from bbfaeeb to f9390ab Compare February 5, 2025 02:10
@nh13 nh13 temporarily deployed to github-actions February 5, 2025 02:10 — with GitHub Actions Inactive
@nh13 nh13 marked this pull request as ready for review February 5, 2025 02:11
@nh13 nh13 requested a review from tfenne as a code owner February 5, 2025 02:11
@nh13 nh13 temporarily deployed to github-actions February 5, 2025 02:11 — with GitHub Actions Inactive
Successfully merging this pull request may close these issues.

GroupReadsByUmi duplicate marking may fail when secondary and supplementary alignments are included