TALON support for CIGAR strings found in pbmm2 sam files #132

Open
ddpinto opened this issue Jul 13, 2023 · 0 comments

ddpinto commented Jul 13, 2023

When reads are aligned with pbmm2, the CIGAR strings use the extended 'X' (mismatch) and '=' (match) operations instead of 'M'. This breaks the split_cigar function in transcript_utils.py: its regex strips only uppercase letters, so '=' stays attached to the counts and the int() conversion fails. I am trying the following patch to add support for pbmm2 CIGAR strings, but there may be other places in the code where this could be an issue. I would appreciate any thoughts on the patch below and on supporting X/= CIGAR strings in general.
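A minimal reproduction of the split_cigar failure. It uses a standalone copy of the pre-patch logic taken from the diff context. The names split_cigar_original and split_cigar_patched are hypothetical and exist only for this sketch. TALON itself defines only split_cigar.

```python
import re


def split_cigar_original(cigar):
    """Pre-patch split_cigar logic (hypothetical standalone name)."""
    alignTypes = re.sub('[0-9]', " ", cigar).split()
    counts = re.sub('[A-Z]', " ", cigar).split()  # '=' is NOT removed here
    counts = [int(i) for i in counts]
    return alignTypes, counts


def split_cigar_patched(cigar):
    """Same logic with the character class widened to include '='."""
    alignTypes = re.sub('[0-9]', " ", cigar).split()
    counts = re.sub('[A-Z=]', " ", cigar).split()
    counts = [int(i) for i in counts]
    return alignTypes, counts


# An M-style CIGAR parses fine with the original logic.
print(split_cigar_original("15M"))

# An extended CIGAR of the kind pbmm2 emits: '=' stays glued to the digits
# (e.g. "10=1"), so int() raises ValueError.
try:
    split_cigar_original("10=1X5=")
except ValueError as err:
    print("ValueError:", err)

# With '=' in the character class, the counts split cleanly.
print(split_cigar_patched("10=1X5="))
```

The second change in the patch widens the character class in the same way.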

--- /sc/arion/work/miniconda3/envs/talon/lib/python3.8/site-packages/talon/transcript_utils.py~	2023-06-25 10:41:50.811784000 -0400
+++ /sc/arion/work/miniconda3/envs/talon/lib/python3.8/site-packages/talon/transcript_utils.py	2023-07-13 17:26:59.877099000 -0400
@@ -68,6 +68,10 @@
     for op,ct in zip(ops, counts):
         if op == "M":
             matches += ct
+        if op == "X":
+            matches += ct
+        if op == "=":
+            matches += ct
         if op == "D":
             total_bases += ct
 
@@ -108,7 +112,7 @@
         the number of bases that each operation applies to. """
 
     alignTypes = re.sub('[0-9]', " ", cigar).split()
-    counts = re.sub('[A-Z]', " ", cigar).split()
+    counts = re.sub('[A-Z=]', " ", cigar).split()
     counts = [int(i) for i in counts]
 
     return alignTypes, counts
@@ -130,7 +134,7 @@
 
     ops, counts = split_cigar(cigar)
     for op,ct in zip(ops, counts):
-        if op in ["H", "M", "N", "D"]:
+        if op in ["H", "M", "N", "D", "X", "="]:
             end += ct
 
     return end - 1
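The patch keeps the two-regex approach. One possible alternative is to parse the CIGAR string with a single pattern over the full SAM operation alphabet (MIDNSHP=X) and to group operations into sets. Each caller then checks set membership instead of comparing against 'M'.

This is a sketch only:

- parse_cigar, aligned_bases, reference_end, ALIGNED and REF_CONSUMING are hypothetical names, not TALON functions.
- pos is assumed to be the 1-based SAM POS field.

Per the SAM specification, M, D, N, = and X consume reference bases, while H and S do not. reference_end therefore leaves out H, unlike the op list in the third hunk above. It may be worth checking whether including H there is intended.

```python
import re
from typing import List, Tuple

# One <count><op> pair; ops are the full SAM CIGAR alphabet.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Ops that consume reference bases according to the SAM specification.
REF_CONSUMING = {"M", "D", "N", "=", "X"}
# Ops that represent read bases aligned to the reference (match or mismatch).
ALIGNED = {"M", "=", "X"}


def parse_cigar(cigar: str) -> Tuple[List[str], List[int]]:
    """Split a CIGAR string into parallel lists of ops and counts.

    Raises ValueError if the string is not made up entirely of
    <count><op> pairs, instead of returning misaligned lists.
    """
    pairs = _CIGAR_RE.findall(cigar)
    if not pairs or "".join(n + op for n, op in pairs) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    ops = [op for _, op in pairs]
    counts = [int(n) for n, _ in pairs]
    return ops, counts


def aligned_bases(cigar: str) -> int:
    """Count read bases aligned to the reference (M, =, X)."""
    ops, counts = parse_cigar(cigar)
    return sum(ct for op, ct in zip(ops, counts) if op in ALIGNED)


def reference_end(pos: int, cigar: str) -> int:
    """1-based inclusive reference end, given the 1-based SAM POS."""
    ops, counts = parse_cigar(cigar)
    ref_len = sum(ct for op, ct in zip(ops, counts) if op in REF_CONSUMING)
    return pos + ref_len - 1


if __name__ == "__main__":
    # The same alignment written with M and with extended =/X ops.
    for cigar in ("5S20M100N30M2H", "5S12=1X7=100N30=2H"):
        print(cigar, aligned_bases(cigar), reference_end(1000, cigar))
```

Both spellings of the same alignment yield 50 aligned bases and an end coordinate of 1149 for POS 1000. Producing the same result for either spelling is the goal of the patch. A malformed string, or the SAM placeholder '*', raises ValueError rather than producing misaligned op/count lists.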