Long overhang in POA causes bad global alignment #40

Open
rlorigro opened this issue Mar 6, 2022 · 4 comments

@rlorigro
Collaborator

rlorigro commented Mar 6, 2022

Some bicliques have dramatically different overlap sizes:

L   s14.utg028499l +  s14.utg050706l +  20122M
L  s14.utg002821l_L   -  s14.utg028499l -  9027M172D

And create bad global alignments:
image

This happens because abPOA does not allow large indels at one end. Do we need hemi-global alignment? @jeizenga

@jeizenga
Collaborator

jeizenga commented Mar 7, 2022

It looks like abPOA has a prefix-to-prefix option, but no suffix-to-suffix option. Also, it doesn't allow you to mix between global/local/prefix for different sequences in the alignment. Still, I think this might be something we could accomplish with a fork, or maybe a feature request.
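
To illustrate why the alignment mode matters for an overhang like this, here is a minimal score-only sketch in Python. It is not abPOA's implementation or API; the function name `alignment_score`, the `free_end_gaps` flag, and the scoring values are all hypothetical. It compares plain global alignment against a variant where gaps at the sequence ends are free:

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, free_end_gaps=False):
    """Best pairwise alignment score with linear gap costs (illustrative only).

    free_end_gaps=False: classic global alignment, every overhang base
    pays the gap penalty.
    free_end_gaps=True: leading and trailing gaps are free, so an
    overhang at either end of either sequence costs nothing.
    """
    n, m = len(a), len(b)
    # First DP row: aligning an empty prefix of `a` against b[:j].
    prev = [0 if free_end_gaps else j * gap for j in range(m + 1)]
    best_last_col = prev[m]  # best score seen in the last column so far
    for i in range(1, n + 1):
        cur = [0 if free_end_gaps else i * gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,  # match or mismatch
                         prev[j] + gap,      # gap in b
                         cur[j - 1] + gap)   # gap in a
        best_last_col = max(best_last_col, cur[m])
        prev = cur
    if not free_end_gaps:
        return prev[m]
    # With free end gaps the alignment may stop anywhere in the
    # last row or the last column.
    return max(best_last_col, max(prev))


core = "ACGTACGT"
overhang = "TTTT"
print(alignment_score(core + overhang, core))                      # global
print(alignment_score(core + overhang, core, free_end_gaps=True))  # free ends
```

In global mode the four overhang bases are forced into gaps and drag the score down, while the free-end variant scores only the part that actually overlaps. A real POA would also need this per sequence and per end, which is the limitation described above.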

@rlorigro
Collaborator Author

Just updating this: it seems that a local alignment may improve the outcome in situations like this. However, @Sebastien-Raguideau it does look like there are large portions of the node that probably shouldn't be in the overlap.

In the provided example:

L   s14.utg028499l +  s14.utg050706l +  20122M
L  s14.utg002821l_L   -  s14.utg028499l -  9027M172D

It says that you have a 172bp deletion, which indicates that your exact aligner has found the hifiasm approximate overlap to be too long for one of the sequences. When we run local alignment it produces this graph:
image
It shows a 172bp portion that can't align at all to the other sequence.
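
To make the CIGAR arithmetic concrete, here is a small Python sketch that sums how many bases each side of an overlap consumes. It assumes SAM-style operator semantics (M consumes both sequences, D consumes only the reference side); which of the two GFA segments plays the "reference" role is an assumption here, and the helper name `cigar_lengths` is hypothetical:

```python
import re

# SAM-style CIGAR operators (assumed to apply to these GFA overlaps).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def cigar_lengths(cigar):
    """Return (query_bases, ref_bases) consumed by a CIGAR string."""
    query_len = ref_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        if op in CONSUMES_QUERY:
            query_len += int(count)
        if op in CONSUMES_REF:
            ref_len += int(count)
    return query_len, ref_len


for cigar in ("20122M", "9027M172D"):
    q, r = cigar_lengths(cigar)
    print(cigar, "query:", q, "ref:", r, "difference:", r - q)
```

For `9027M172D` the two sides differ by exactly 172 bases, which is the stretch that has no counterpart in the other sequence.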

A dotplot of the overlaps shows a small gap (which appears as many bubbles in the GFA POA graph):
image

@Sebastien-Raguideau

Hi @rlorigro, nice plot :) I saw that a lot when generating the CIGARs. It is pretty weird because the non-overlapping bit is at the edge of the unitig. So rather than the overlap length being too long, the overlap doesn't start at the edge and instead starts 172 nucleotides from it. It feels as if some strange tips were not removed from the assembly graph.

@rlorigro
Collaborator Author

Yeah, I looked a bit longer and realized that as well, so I guess there is no easy solution for these. I agree it's odd that 9,000bp would be a perfect match and then diverge into random sequence for only 172bp.
