For aligning chromosomes of different species over 100MYA or div is it better to use .masked files? #383

Isoris · 2024-03-20T09:44:20Z

Hello,

I would like to know whether using masked genomes is more efficient than using unmasked genomes for all-vs-all cross-species alignments.

Thank you in advance for your answer

Quentin

AndreaGuarracino · 2024-10-16T23:58:33Z

Using masked genomes is more efficient because it saves computation (masked sequence is not aligned), at the price of possibly losing interesting alignments, and therefore relationships between genomes.
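To get a feel for that trade-off before choosing, it can help to measure how much of each genome is masked. The sketch below is only an illustration: the function name `masking_summary` is hypothetical, it assumes soft masking is encoded as lowercase bases and hard masking as `N`, and it says nothing about how a particular aligner treats either kind of masking.

```python
from typing import Dict, Iterable


def masking_summary(fasta_lines: Iterable[str]) -> Dict[str, float]:
    """Count soft-masked (lowercase) and hard-masked (N) bases in FASTA text."""
    total = soft = hard = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        for base in line:
            total += 1
            if base in "Nn":
                hard += 1  # N (either case) is counted as hard-masked
            elif base.islower():
                soft += 1  # any other lowercase base is counted as soft-masked
    if total == 0:
        return {"total": 0, "soft_fraction": 0.0, "hard_fraction": 0.0}
    return {
        "total": total,
        "soft_fraction": soft / total,
        "hard_fraction": hard / total,
    }
```

For a real genome, the lines could come from an open file handle, for example `masking_summary(open("genome.masked.fa"))`, where the file name is a placeholder. A large masked fraction means a large saving in computation, but also more sequence that cannot contribute alignments.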

Isoris · 2024-10-17T00:12:23Z

Ok, another question. Is there any way I could use PGGB for reference-guided genome scaffolding? I am working on a catfish genome assembly. The QV is > 50, but there are still 500+ gaps after Hi-C scaffolding. For instance, say chromosome 1 has 35 gaps. Would it be possible to use minimap2 to find the matching homologous chromosomes in other species, then use PGGB to align the chromosome 1 sequences (possibly masked) in order to find the correct path in the GFA file in Bandage? Can PGGB accommodate Nanopore and HiFi reads, or would it be better to extract the path from the graph and then map the reads onto it to see whether the gaps can be closed? This is not really related to this GitHub issue, but I am wondering whether PGGB has ever been used for scaffolding or gap closing with related species, and if so, how. Thank you very much in advance, Andrea.

AndreaGuarracino · 2024-10-19T14:49:38Z

If I understand your questions correctly/partially, you would like to curate an assembly using assemblies of other related species modeled in a pangenome graph. It sounds like a task for tools able to align sequences against a graph, like Minigraph and GraphAligner, where you align your gapped assembly (scaffolds with NNNNNs for the gaps) against the other-species graphs.
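Before aligning a gapped assembly against a graph, it is useful to know where the gaps actually are. A minimal sketch, assuming gaps are encoded as runs of `N` in the scaffold sequence; the function name `find_gaps` and the `min_len` parameter are hypothetical and not part of PGGB, Minigraph, or GraphAligner:

```python
import re
from typing import List, Tuple


def find_gaps(sequence: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N runs in a sequence."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len  # ignore runs shorter than min_len
    ]
```

The resulting intervals can be compared with the coordinates of the scaffold's alignments against the graph, to see which gaps are spanned by sequence from the related species.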

PGGB can accommodate everything, but its first step is an all-vs-all alignment, and you don't want to put millions of reads in the input. Moreover, PGGB is a 'trash-in -> trash-out' pipeline, so if your reads are noisy, your noise will smear your output.

I suspect PGGB could be used for scaffolding/gap closing somehow, but we don't have a pipeline for that (we've never used it that way).

Isoris · 2024-10-20T03:39:17Z

Because in your paper you did this:

To identify which chromosomes were represented in each community, we partitioned all contigs by mapping them against both T2T-CHM13v1.1 and GRCh38 human reference genomes with WFMASH, this time requiring homologous regions at least 150 kb long and nucleotide identity of at least 90%.

wfmash chm13+grch38.fa HPRCy1.fa -s 50k -l 150k -p 90 -n 1 -H 0.001 -m -N 

We disabled the contig splitting (-N) during mapping to obtain homologous regions covering the whole contigs. For the unmapped contigs, we repeated the mapping with the same parameters, but allowing the contig splitting (without specifying -N). We labelled contigs ‘p’ or ‘q’ depending on whether they cover the short arm or the long arm of the chromosome they belonged to. Contigs fully spanning the centromeres were labelled ‘pq’. We used such labels to identify the chromosome composition of the communities detected in the mapping graph obtained without reference sequences, and to annotate the nodes in the mapping graph.
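The partitioning step quoted above can be approximated from the mapping output. The sketch below is an assumption-laden illustration, not the paper's actual procedure: it assumes the mappings are in PAF format (tab-separated, query name in column 1, query start and end in columns 3 and 4, target name in column 6), and it assigns each contig to the target that accumulates the most mapped query bases. The function name `assign_contigs` is hypothetical, and overlapping mappings are counted more than once.

```python
from collections import defaultdict
from typing import Dict, Iterable


def assign_contigs(paf_lines: Iterable[str]) -> Dict[str, str]:
    """Map each query contig to the target with the most mapped query bases."""
    covered = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record (12 mandatory columns)
        query, target = fields[0], fields[5]
        q_start, q_end = int(fields[2]), int(fields[3])
        covered[query][target] += q_end - q_start  # sum of mapped query spans
    return {
        query: max(targets.items(), key=lambda kv: kv[1])[0]
        for query, targets in covered.items()
    }
```

Labelling contigs as 'p', 'q', or 'pq' would additionally require centromere coordinates for each reference chromosome, which this sketch does not attempt.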

Yes, you understood my question correctly. Do you think it would be possible to first use PGGB on related species to build an initial graph, and then use GraphAligner to map the reads onto the PGGB graph? Or is that complete nonsense?

Isoris · 2024-10-20T03:47:53Z

Yes, the toolkit is complete; there should be new applications of PGGB in the future to help genome assembly. RagTag is quite limited.

AndreaGuarracino · 2024-10-26T16:53:46Z

A pangenome-based scaffolder would be hot, but I've never delved deeply enough into the problems to start hacking on them. Happy to chat separately more about that. PGGB+GraphAligner would make sense if the karyotypes are stable and veeeeeery similar between the different species.

AndreaGuarracino added the question Further information is requested label Oct 16, 2024

AndreaGuarracino closed this as completed Oct 16, 2024
