Throw an error "error: trying to access a snarl tree node of the wrong type" in haplotypes #4381

Open
wjwei-handsome opened this issue Aug 21, 2024 · 11 comments

@wjwei-handsome
1. What were you trying to do?

Haplotype sampling

First step is preprocessing the graph

2. What did you want to happen?

Successfully generate sample.hapl file

3. What actually happened?

Loading GBZ from test.gbz
Generating haplotype information
Guessing that distance index is test.dist
Loading distance index from test.dist
Building minimizer index
Built the minimizer index in 918.913 seconds
Guessing that r-index is test.ri
Loading r-index from test.ri
Determining construction jobs
Using contig name chr12 for chain 0
Partitioned 1 components into 1 jobs in 2.37036 seconds
Running 16 jobs in parallel
error: [job 0]: error: trying to access a snarl tree node of the wrong type

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Place stacktrace here.

5. What data and command can the vg dev team use to make the problem happen?

vg haplotypes -v3 -t16 -H test.hapl test.gbz

6. What does running vg version say?

vg version v1.58.0 "Cartari"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by [email protected]

Interestingly, I ran into the same error earlier when building the distance index: #3884

So I wonder: is there a problem with the distance index that shows up during haplotype sampling?

Looking forward to your reply :)

@jltsiren
Contributor

jltsiren commented Sep 3, 2024

This sounds like an issue with the graph or the distance index. What kind of data do you have, how did you obtain/build the graph, and how did you obtain/build the distance index?

@wjwei-handsome
Author

The graph is clip.gfa from the Minigraph-Cactus pipeline, and the distance index was built using the command vg index -t 32 -j test.clip.dist test.clip.gbz.

Additionally, I ran vg haplotypes on the graph for each chromosome and found that only chr12 failed.
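
A per-chromosome check like this can be scripted. The sketch below is only an illustration: the base names (chr11, chr12) are hypothetical, and it assumes each graph has its .dist and .ri files next to it so that vg can guess them, as the log in the report shows. It defaults to a dry run that only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
set -u

# Run haplotype preprocessing for each base name and report which ones fail.
# With DRY_RUN=1 (the default here) the commands are only printed, not run.
sample_each() {
    failed=""
    for base in "$@"; do
        echo "+ vg haplotypes -v3 -t16 -H $base.hapl $base.gbz"
        if [ "${DRY_RUN:-1}" = "0" ]; then
            # Same command as in the report; remember the graph if vg exits non-zero.
            vg haplotypes -v3 -t16 -H "$base.hapl" "$base.gbz" || failed="$failed $base"
        fi
    done
    echo "failed:$failed"
}

# Hypothetical base names; replace with the real per-chromosome graphs.
sample_each chr11 chr12
```

In a real run, the last line lists the graphs for which vg haplotypes exited with an error.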

@jltsiren
Contributor

jltsiren commented Sep 6, 2024

Can you share the graph? I don't think we can figure this out without it.

@wjwei-handsome
Author

Of course, thank you very much for your assistance, which will greatly help us!

I sent a Google Drive link to your work mailbox (ucsc.edu). Please let me know if you need any other data.

@jltsiren
Contributor

@xchang1 There seems to be something wrong with the distance index. I'm iterating over the only top-level chain. The last net handles that look correct correspond to nodes (22649686, reverse) and (22649687, reverse). The next node on the haplotypes is (23053794, reverse), which has a self-loop on the right side and a simple snarl on the left side. Instead, we arrive at (22649676, reverse), which has been visited a bit earlier. Then the error message comes from trying to get its parent with SnarlDistanceIndex::get_parent().

Here is the subgraph: subgraph.pdf

You can find the graph and the distance index at /private/groups/cgl/jlsiren/issue_4381.

@xchang1
Contributor

xchang1 commented Sep 11, 2024

I just made a PR (#4395) that should fix this. I haven't tested it on the full graph yet, though.

@wjwei-handsome
Author

Shocked by your speed and efficiency!

Thanks again for your help! @jltsiren @xchang1

I will try it on the full graph. If there are any follow-up questions, I will keep in touch with you.

BTW, compiling the source code is still a struggle for me. I would be grateful if you could provide a compiled binary of the latest fixed version. @xchang1

@xchang1
Contributor

xchang1 commented Sep 12, 2024

Haha, more like I wrote a lot of dumb bugs that are fast to fix once someone points them out, but thanks! I hope it works.

Here's a gzipped binary. It's for commit c5ff42, which is the current master branch plus my changes.

vg.gz

@wjwei-handsome
Author

Hi @xchang1

Unfortunately, when I tried the new version you provided on the full graph, the same error occurred.

Using contig name GRch38.chr12 for chain 0
Partitioned 1 components into 1 jobs in 1.26687 seconds
Running 32 jobs in parallel
error: [job 0]: error: trying to access a snarl tree node of the wrong type

The version:

version v1.59.0-26-gc5ff4208e "Casatico"

Thank you very much for your help so far. I would be grateful if you could keep looking into this stubborn error!

@xchang1
Contributor

xchang1 commented Sep 13, 2024

Ah sorry, I forgot to say: you have to rebuild the distance index. I'm still running it on the chr12 graph, but it's getting farther than before at least.
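
A minimal sketch of that rebuild step, reusing the two commands quoted earlier in this thread. It assumes the graph is test.gbz with test.ri next to it, and that the patched vg binary is the one on PATH; the run and rebuild_and_sample helpers are made up for illustration. It defaults to a dry run that only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
set -eu

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Rebuild the distance index with the patched vg, then rerun the preprocessing.
# Arguments: base name of the graph (expects BASE.gbz and BASE.ri), thread count.
rebuild_and_sample() {
    base=$1
    threads=${2:-16}
    # An index built by the older vg is not fixed by upgrading, so rebuild it first.
    run vg index -t "$threads" -j "$base.dist" "$base.gbz"
    # Same command as in the report; vg guesses BASE.dist and BASE.ri from the graph name.
    run vg haplotypes -v3 -t "$threads" -H "$base.hapl" "$base.gbz"
}

rebuild_and_sample test 16
```

Writing the new index to the name that vg haplotypes guesses (test.dist here) means the second command picks up the rebuilt file rather than the old one.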

@wjwei-handsome
Author

Oh, I should have thought of that! Sorry, I'll keep trying.
