contigN50 too small #259

shiyi-pan opened this issue Apr 12, 2023 · 13 comments

@shiyi-pan

Hi, I've used wtdbg2 to assemble a soybean genome from 62-fold HiFi data. The resulting FASTA file has 4604 contigs with a contig N50 of 595316 bp, which is too small.
Here is my script:
wtdbg2 -x ccs -g 1144.6m -t 8 -i Glycine_max.ccs.bam.fasta.gz -fo wtdbg2
wtpoa-cns -t 8 -i wtdbg2.ctg.lay.gz -fo wtdbg2.raw.fa
Could you give me some advice on improving the contig N50? Thank you very much.
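
For comparing runs in a thread like this, it helps to compute the contig count, total length and N50 the same way every time. The following is a minimal sketch, not part of wtdbg2: the function name `fasta_n50` is made up for illustration, and it assumes plain (uncompressed) FASTA on standard input. N50 here is the length of the contig at which the cumulative length, taken from longest to shortest, first reaches half of the total.

```shell
# Hypothetical helper: contig count, total length and N50 from FASTA on stdin.
# Assumes uncompressed FASTA; sequence may span several lines per record.
fasta_n50() {
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }  # header: flush previous contig length
        { len += length($0) }                            # sequence line: accumulate length
        END { if (len > 0) print len }                   # flush the last contig
    ' | sort -rn | awk '
        { lens[NR] = $1; total += $1 }                   # lengths arrive longest first
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                cum += lens[i]
                if (cum >= half) {
                    printf "contigs=%d total=%d N50=%d\n", NR, total, lens[i]
                    exit
                }
            }
        }
    '
}
```

Usage would look like `fasta_n50 < wtdbg2.raw.fa`, or `gzip -dc file.fa.gz | fasta_n50` for a compressed file.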

@ruanjue
Owner

ruanjue commented Apr 12, 2023

The results look so bad. Have you tried hifiasm?

@shiyi-pan
Author

Yes, hifiasm produces an assembly with 1389 contigs and a contig N50 of 26905840 bp from the same data.
Here is the hifiasm command:
hifiasm -o NN1138_hifiasm -t 8 Glycine_max.ccs.bam.fasta.gz -l0 -f0

Could you give me some advice on getting a better assembly with wtdbg2?

@ruanjue
Owner

ruanjue commented Apr 12, 2023

The ccs preset is: -p 21 -k 0 -AS 4 -K 0.05 -s 0.5
Try -p 0 with -k 19 or 21, and try -s 0.6 or 0.7, and so on.

@shiyi-pan
Author

Hi, thank you for your reply. Do you mean I should run the following commands one by one and choose the best result?
wtdbg2 -x ccs -g 1144.6m -t 8 -i Glycine_max.ccs.bam.fasta.gz -fo wtdbg2 -p 0 -k 19 -s 0.6
wtdbg2 -x ccs -g 1144.6m -t 8 -i Glycine_max.ccs.bam.fasta.gz -fo wtdbg2 -p 0 -k 19 -s 0.7
wtdbg2 -x ccs -g 1144.6m -t 8 -i Glycine_max.ccs.bam.fasta.gz -fo wtdbg2 -p 0 -k 21 -s 0.6
wtdbg2 -x ccs -g 1144.6m -t 8 -i Glycine_max.ccs.bam.fasta.gz -fo wtdbg2 -p 0 -k 21 -s 0.7
Thank you again.
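
Note that the four commands above all write to the same `-fo wtdbg2` prefix, so each run would overwrite the previous one. A minimal sketch of the same sweep with one output prefix per combination is below. It only prints the commands (a dry run); the function name `sweep_cmds` and the prefix pattern `wtdbg2_k<K>_s<S>` are illustrative assumptions, and all wtdbg2 options are copied from the commands in this thread.

```shell
# Dry run: print one wtdbg2 command per (-k, -s) combination, each with its
# own output prefix so that runs do not overwrite each other.
# The prefix pattern wtdbg2_k<K>_s<S> is an illustrative assumption.
sweep_cmds() {
    reads=$1
    for k in 19 21; do
        for s in 0.6 0.7; do
            prefix="wtdbg2_k${k}_s${s}"
            echo "wtdbg2 -x ccs -g 1144.6m -t 8 -i $reads -fo $prefix -p 0 -k $k -s $s"
        done
    done
}

sweep_cmds Glycine_max.ccs.bam.fasta.gz
```

After checking the printed commands, they can be run one at a time, or the output can be piped to `sh`.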

@shiyi-pan
Author

Hi, I tried the following script; the contig count is 5601 and the N50 is 424654 bp.

wtdbg2 -x ccs -g 1144.6m -t 8 -i Glycine_max.ccs.bam.fasta.gz -fo NN1138_wtdbg2_test1 -p 0 -k 19 -s 0.6
wtpoa-cns -t 8 -i NN1138_wtdbg2_test2.ctg.lay.gz -fo NN1138_wtdbg2_test2.raw.fa

@ruanjue
Owner

ruanjue commented Apr 19, 2023

Please decrease -s to 0.4 or 0.3; the error rate of those CCS reads may be a bit higher.

@shiyi-pan
Author

I really appreciate your reply, ruanjue, and will try -s 0.4 and 0.3.

@shiyi-pan
Author

Hi, I have tried the '-p 0 -k 19 -s 0.4' parameters, but the result was worse, with a contig N50 of 141547 bp.
Here is my log. Could you give me some other advice? Thanks a lot.
184279.err.txt

@ruanjue
Owner

ruanjue commented May 4, 2023

From the log file, it looks like a highly repetitive genome. It is better to increase -k to 21 or 23, and also try increasing -s to 0.5, 0.6 or 0.7. If there is any progress, please send the full log file.

@shiyi-pan
Author

shiyi-pan commented May 8, 2023

Thank you for your help, Dr. ruanjue. This time I tried -p 0 -k 23 -s 0.7 and got the best assembly so far, with the longest total length (905541315 bp) and contig N50 (724860 bp). It is still shorter than the results from other assemblers.
The species I am studying is soybean; TEs make up 50% of its genome.
Here are my logs. I will try -p 0 -k 23 -s 0.6 next.
199135.err.txt
199135.out.txt
Thank you again.

@ruanjue
Owner

ruanjue commented May 8, 2023

Also try changing --node-max to a larger value. You can use --load-alignments to reuse the alignments from a previous run and set a different --node-max. Don't invoke wtpoa-cns until you get a good result, because it is the most time-consuming step.
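
A rough sketch of what that suggestion might look like follows. It is a dry run that only prints commands. Several things here are assumptions rather than anything confirmed in this thread: that the earlier run wrote its alignments to `<prefix>.alignments.gz`, that the previous prefix was `wtdbg2_k23_s0.7`, and the `--node-max` values 400 and 800, which are arbitrary examples and not recommendations. The function name `node_max_cmds` is also made up for illustration.

```shell
# Dry run: print wtdbg2 commands that reuse saved alignments while varying --node-max.
# Assumptions: the earlier run used "-fo $prev" and left "$prev.alignments.gz" on disk;
# the --node-max values passed in are arbitrary examples, not recommendations.
node_max_cmds() {
    reads=$1
    prev=$2
    shift 2
    for nm in "$@"; do
        echo "wtdbg2 -x ccs -g 1144.6m -t 8 -i $reads -fo ${prev}_nm${nm} -p 0 -k 23 -s 0.7 --load-alignments ${prev}.alignments.gz --node-max $nm"
    done
}

node_max_cmds Glycine_max.ccs.bam.fasta.gz wtdbg2_k23_s0.7 400 800
```

Each printed command writes to its own prefix, so the alignment file from the earlier run stays untouched and wtpoa-cns can be left until one of the layouts looks good.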

@shiyi-pan
Author

Thank you for your help, Dr. ruanjue, and sorry for the late reply. I didn't know about the --load-alignments parameter, so I had deleted the alignment file to save disk space.
I tried -p 0 -k 23 -s 0.6, but it is worse than before. The final assembly has 889781517 bp with a contig N50 of 527048 bp. Here are my log files.
251452.err.txt
268921.err.txt
