Strange read-splitting behaviour #184

@apredeus

Dear bamtofastq developer team,

I recently came across some very interesting behaviour. I am trying to reprocess a public dataset that consists of 22 10x GEX runs (I've checked, and I'm fairly confident that none of them are ATAC, etc.). Here is the link to the dataset:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138669

SRA failed to recognise the "technical" R1 read, so they made the submitter's 10x BAM files available. However, after running the latest bamtofastq (v1.4.1; every job completed successfully), I discovered that the samples split into two big groups. Group 1 (GSM4115877-GSM4115889) produced the expected index read, technical R1 (26 bp), and biological R2 (98 bp). Group 2 (GSM4115868-GSM4115876), however, produced four reads: the index, an R1 that is the biological read (98 bp), an R2 containing the cell barcode (16 bp), and an R3 containing the UMI (10 bp).

GSM4115868 I1 R1 R2 R3
GSM4115869 I1 R1 R2 R3
GSM4115870 I1 R1 R2 R3
GSM4115871 I1 R1 R2 R3
GSM4115872 I1 R1 R2 R3
GSM4115873 I1 R1 R2 R3
GSM4115874 I1 R1 R2 R3
GSM4115875 I1 R1 R2 R3
GSM4115876 I1 R1 R2 R3
GSM4115877 I1 R1 R2
GSM4115878 I1 R1 R2
GSM4115879 I1 R1 R2
GSM4115880 I1 R1 R2
GSM4115881 I1 R1 R2
GSM4115882 I1 R1 R2
GSM4115883 I1 R1 R2
GSM4115884 I1 R1 R2
GSM4115885 I1 R1 R2
GSM4115886 I1 R1 R2
GSM4115887 I1 R1 R2
GSM4115888 I1 R1 R2
GSM4115889 I1 R1 R2
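
In case it helps with reproducing the grouping, here is a minimal Python sketch of how a listing like the one above can be tabulated by read layout. The function name `group_by_layout` is my own for illustration, and the `example` text is just a four-line subset of the listing, not the full set of samples.

```python
from collections import defaultdict


def group_by_layout(listing: str) -> dict:
    """Map each read layout (e.g. I1 R1 R2) to the samples that produced it."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample, reads = fields[0], tuple(fields[1:])
        groups[reads].append(sample)
    return dict(groups)


# Subset of the listing above, used only as example input.
example = """\
GSM4115868 I1 R1 R2 R3
GSM4115876 I1 R1 R2 R3
GSM4115877 I1 R1 R2
GSM4115889 I1 R1 R2
"""

for layout, samples in group_by_layout(example).items():
    print(" ".join(layout), "->", ", ".join(samples))
```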

All BAM tags/headers appear to be the same, and the files even seem to have been made by the same version of Cell Ranger (v3, I think).

SRR10254548.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY
SRR10254549.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY xf
..............
SRR10254569.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY xf
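
A per-file tag list like the one above can be gathered roughly as in the following Python sketch. It assumes the alignments are available as SAM text (for example, piped from something like `samtools view`); the function name `tags_in_sam` and the two records in the example are invented for illustration and are not taken from the real BAM files.

```python
def tags_in_sam(sam_text: str) -> list:
    """Return the optional-field tag names seen in SAM-formatted text."""
    tags = set()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        # The first 11 columns are mandatory; the rest are TAG:TYPE:VALUE.
        for opt in fields[11:]:
            tags.add(opt.split(":", 1)[0])
    # Case-insensitive sort, to match the ordering used in the listing above.
    return sorted(tags, key=str.lower)


# Two made-up records (hypothetical values), differing only in the xf tag.
rec_a = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1\tCB:Z:AAAC-1"
rec_b = rec_a + "\txf:i:25"

tags_a = tags_in_sam(rec_a)
tags_b = tags_in_sam(rec_b)
print("only in one file:", sorted(set(tags_a) ^ set(tags_b)))
```

Scanning only the first few thousand records is usually enough for a quick look, although tags that occur rarely could be missed that way.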

Do you know what is causing it, and how can I fix it?

For your convenience, here are (NCBI) links to one "offending" and one "normal-behaving" BAM file:

bad BAM: https://sra-pub-src-2.s3.amazonaws.com/SRR10254550/SC4possorted_genome_bam.bam.1
good BAM: https://sra-pub-src-2.s3.amazonaws.com/SRR10254567/SC185possorted_genome_bam.bam.1

Thank you in advance!

-- Alex
