Difference in number of calls: CRAM vs BAM #344

Open
nicolechai opened this issue Nov 6, 2024 · 4 comments

@nicolechai

Hi there,

I’m currently comparing the results of Clair3 v1.0.5 when alignments are stored within a BAM file vs CRAM. I am using HG002 replicates, and CRAM files were converted from the BAM file.

When comparing the total number of variants called from the BAM file vs the converted CRAM, I am seeing a difference of 1 to 5 variants for 6 out of 8 HG002 replicates (out of, on average, 6 million variants called per replicate). Another thing I found interesting is that once the CRAM is converted back to BAM and processed through Clair3, the total number of variant calls from the converted BAM matches the number of calls from the original BAM.

Here is an example of what I am seeing:

| Sample | File type | Clair3 total number of calls |
| --- | --- | --- |
| HG002_replicate_A | BAM | 6095272 |
| HG002_replicate_A | CRAM | 6095277 |
| HG002_replicate_A | BAM (converted back from CRAM) | 6095272 |
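
For anyone wanting to reproduce totals like these, here is a minimal sketch of one way to count the records in a VCF. It is an assumption about how such counts could be obtained, not necessarily how the numbers above were produced. The function name `count_calls` and the example paths are hypothetical. The sketch counts every non-header line, so LowQual records are included.

```shell
# Count variant records (every line that is not a '#' header line) in a VCF.
# Handles plain-text VCFs and gzip/bgzip-compressed ones (*.gz).
count_calls() {
    case "$1" in
        *.gz) gzip -dc "$1" | grep -vc '^#' ;;
        *)    grep -vc '^#' "$1" ;;
    esac
}

# Hypothetical usage, one Clair3 output directory per input type:
#   count_calls bam_run/merge_output.vcf.gz
#   count_calls cram_run/merge_output.vcf.gz
```

Counting lines rather than alleles keeps the comparison simple; a multi-allelic record counts once.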

Would you know why this might be happening?

Clair3 command used:

run_clair3.sh \
--bam_fn=$IN_ALN \
--ref_fn=$REF \
--threads=16 \
--platform="ont" \
--var_pct_full=0.7 \
--ref_pct_full=0.1 \
--snp_min_af=0.08 \
--indel_min_af=0.15 \
--model_path=$MODEL \
--output=$OUTPUT_DIR \
--remove_intermediate_dir

Kind regards,
Nicole

@aquaskyline
Member

Hi Nicole,

Would you be able to show us the variant differences for the replicates you have? At the very least, we would like to know where the differing variants are located.

@nicolechai
Author

Sure, these are the variant differences that were seen in the replicates:

Replicate 1
Only present in CRAM file:

chr1	46788044	.	TA	T	0.00	LowQual	F	GT:GQ:DP:AD:AF	0/1:0:23:5,3:0.1304
chr7	215642	.	C	G	5.64	PASS	F	GT:GQ:DP:AD:AF	0/1:5:27:12,5:0.1852
chr7	215753	.	T	G	5.53	PASS	F	GT:GQ:DP:AD:AF	0/1:5:27:6,4:0.1481
chr7	215855	.	C	T	5.47	PASS	F	GT:GQ:DP:AD:AF	0/1:5:27:12,5:0.1852
chr7	216365	.	TGG	T	2.29	PASS	F	GT:GQ:DP:AD:AF	0/1:2:26:13,6:0.2308

Replicate 2
Only present in BAM file:

chr12   31648360        .       GT      G       0.00    LowQual F       GT:GQ:DP:AD:AF  0/1:0:30:6,7:0.2333

Replicate 3
Only present in BAM file:

chr2    175914347       .       G       GA      0.00    LowQual F       GT:GQ:DP:AD:AF  0/1:0:12:5,3:0.2500

Replicate 4
Only present in BAM file:

chr1    57957991        .       C       CT      0.00    LowQual F       GT:GQ:DP:AD:AF  0/1:0:22:5,4:0.1818
chr5    53203136        .       CG      C       9.82    PASS    F       GT:GQ:DP:AD:AF  0/1:9:18:10,7:0.3889

Replicate 5
Only present in CRAM file:

chr17   32741382        .       CA      C       0.00    LowQual F       GT:GQ:DP:AD:AF  0/1:0:28:12,5:0.1786

Replicate 6
Only present in CRAM file:

chr7    115261227       .       C       CT      0.83    LowQual F       GT:GQ:DP:AD:AF  0/1:0:30:24,5:0.1667

Looking in IGV, many of these calls appear to be at the start of, or within, repetitive regions.
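
A per-replicate list like the ones above can be sketched with standard Unix tools. This is only an illustration under stated assumptions, not necessarily the method used in this thread. The function names `vcf_keys` and `only_in_first` are hypothetical. Records are keyed on CHROM, POS, REF and ALT, so two calls at the same site with different QUAL or genotype are treated as the same variant.

```shell
# Print a CHROM:POS:REF:ALT key for every non-header record, sorted and de-duplicated.
# Default awk field splitting is used, so tab- and space-separated records both work.
vcf_keys() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' | LC_ALL=C sort -u
}

# Print the keys found in the first VCF but not in the second.
only_in_first() {
    keys_a=$(mktemp) || return 1
    keys_b=$(mktemp) || return 1
    vcf_keys "$1" > "$keys_a"
    vcf_keys "$2" > "$keys_b"
    # comm -23 suppresses lines unique to file 2 and lines common to both.
    LC_ALL=C comm -23 "$keys_a" "$keys_b"
    rm -f "$keys_a" "$keys_b"
}

# Hypothetical usage:
#   only_in_first cram_run/merge_output.vcf.gz bam_run/merge_output.vcf.gz   # only in CRAM
#   only_in_first bam_run/merge_output.vcf.gz cram_run/merge_output.vcf.gz   # only in BAM
```

Sorting and comparing under `LC_ALL=C` keeps `sort` and `comm` in agreement about ordering regardless of the system locale.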

@aquaskyline
Member

Thank you for the details. We will start investigating.

@nicolechai
Author

Sounds good, thanks a lot!
