Add fast update function #322

Open
wants to merge 16 commits into
base: master

johnlees
Member

Fixes #321

Collaborator

@nickjcroucher nickjcroucher left a comment

Nice - it took me a while to understand the cluster merging, but it looks good. Sorry for not spotting earlier that this needed a review. Is it worth adding a test for the fast pruning mode? Also, based on the current CI tests, it looks like the extractReferences call in __main__ is missing a merged_queries argument.

Review threads (resolved):
PopPUNK/assign.py
PopPUNK/network.py
@nickjcroucher nickjcroucher mentioned this pull request Oct 9, 2024
3 tasks
@johnlees johnlees changed the title from "unword consonants as list" to "Add fast update function" Dec 17, 2024
@johnlees
Member Author

johnlees commented Jan 6, 2025

python /home/jlees/installs/PopPUNK/poppunk_assign-runner.py --db Salmonella_progressive/update_0 --previous-clustering Salmonella_core_threshold --distances sal_sketch40k/sal_sketch40k.dists --query Salmonella_query_files/salmonella_split.txtab --model-dir Salmonella_core_threshold --threads 16 --output Salmonella_progressive/update_1_fast --update-db fast

Graph-tools OpenMP parallelisation enabled: with 40 threads
Looking for existing sketches in Salmonella_progressive/update_1/update_1.h5
Loading previously refined model
Completed model loading
48180 refs 31935 queries

WARNING: versions of input databases sketches are different, results may not be compatible
Calculating distances using 40 thread(s)
Progress (CPU): 100.0%
Loading network from Salmonella_core_threshold/Salmonella_core_threshold_graph.gt
Network loaded: 48180 samples
Loading previous cluster assignments from Salmonella_core_threshold/Salmonella_core_threshold_clusters.csv
1538628300 assignments 48180 refs 31935 queries

109549943 tuples

Calculating all query-query distances
Calculating random match chances using Monte Carlo
Calculating distances using 40 thread(s)
Progress (CPU): 100.0%

509906145 assignments 80115 refs 80115 queries

37646978 tuples
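
The assignment counts in the log above are consistent with simple pair counting: 48180 refs × 31935 queries for the query-to-reference pass, and n(n-1)/2 unordered pairs among the 31935 queries for the query-query pass. A minimal sketch of that arithmetic follows; the function names are hypothetical and not part of PopPUNK.

```python
def ref_query_pairs(n_refs: int, n_queries: int) -> int:
    """Number of distances when every query is compared to every reference."""
    return n_refs * n_queries


def query_query_pairs(n_queries: int) -> int:
    """Number of unordered pairs among the queries (no self-comparisons)."""
    return n_queries * (n_queries - 1) // 2


# Sample counts taken from the log in this comment
full_db_pairs = ref_query_pairs(48180, 31935)
qq_pairs = query_query_pairs(31935)

print(full_db_pairs)  # 1538628300
print(qq_pairs)       # 509906145
```

Because the query-to-reference count scales linearly with the number of references, loading a smaller reference set reduces that pass proportionally, while the query-query pass is unaffected.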

@johnlees
Member Author

johnlees commented Jan 7, 2025

core_threshold: 1333 refs
update_0: 1746 refs

@johnlees
Member Author

johnlees commented Jan 7, 2025

Most recent run:

Loading previously refined model
Completed model loading
1333 refs 31935 queries

WARNING: versions of input databases sketches are different, results may not be compatible
Calculating distances using 16 thread(s)
Progress (CPU): 100.0%
Loading network from Salmonella_core_threshold/Salmonella_core_threshold.refs_graph.gt
Network loaded: 1333 samples
Loading previous cluster assignments from Salmonella_core_threshold/Salmonella_core_threshold_clusters.csv
42569355 assignments 1333 refs 31935 queries

Calculating all query-query distances
Calculating random match chances using Monte Carlo
Calculating distances using 16 thread(s)
Progress (CPU): 100.0%
509906145 assignments 33268 refs 33268 queries

Clusters 16,995 have merged into 16_995
Clusters 82,168 have merged into 82_168
Clusters 136,550 have merged into 136_550
Clusters 231,563 have merged into 231_563
Clusters 243,466 have merged into 243_466
Clusters 320,1052 have merged into 320_1052
Clusters 359,666 have merged into 359_666
Clusters 348,464 have merged into 348_464
Clusters 329,756 have merged into 329_756
Clusters 428,696 have merged into 428_696
Clusters 620,887 have merged into 620_887
Updating reference database to Salmonella_progressive/update_1_fast
Updating random match chances
Calculating random match chances using Monte Carlo
Saving model and network
Finding references (fast)
Running quick reference picking
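
The "have merged into" lines above suggest a naming pattern in which the IDs of the merged clusters are joined with an underscore in ascending numeric order. The sketch below reproduces that message format under this assumption; it is inferred from the log only, and `merged_cluster_name` and `report_merges` are hypothetical helpers, not PopPUNK functions.

```python
def merged_cluster_name(cluster_ids):
    """Join previous cluster IDs into one name, e.g. ['320', '1052'] -> '320_1052'."""
    ordered = sorted(set(cluster_ids), key=int)  # numeric order, duplicates dropped
    return "_".join(ordered)


def report_merges(components):
    """Build one message per group of previous clusters linked by new queries.

    `components` is an iterable of collections of previous cluster IDs
    (as strings); groups containing a single cluster are not merges.
    """
    messages = []
    for ids in components:
        unique = sorted(set(ids), key=int)
        if len(unique) > 1:
            messages.append(
                f"Clusters {','.join(unique)} have merged into "
                f"{merged_cluster_name(unique)}"
            )
    return messages


messages = report_merges([["16", "995"], ["1052", "320"], ["7"]])
for line in messages:
    print(line)
```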

then errors:

multiprocessing.pool.MaybeEncodingError: Error sending result:
...
Reason: 'RuntimeError('Pickling of "graph_tool.libgraph_tool_core.Vertex" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)')'

Note: some <invalid Vertex object at 0x7f7986d6e440> entries could be causing this?
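
For context, multiprocessing raises MaybeEncodingError when a worker's return value cannot be pickled for transfer back to the parent process. One possible workaround, sketched here rather than taken from this PR, is to have workers return plain integer vertex indices instead of Vertex objects. The example uses a hypothetical stand-in class (`FakeVertex`) so that it runs without graph-tool, and calls `pickle.dumps` directly to mimic what the pool does with results. With real graph-tool vertices, `int(v)` gives the vertex index.

```python
import pickle


class FakeVertex:
    """Hypothetical stand-in for an object that refuses to be pickled."""

    def __init__(self, index):
        self._index = index

    def __int__(self):
        return self._index

    def __reduce__(self):
        # Mimics the error raised for unpicklable Boost.Python objects
        raise RuntimeError("Pickling of FakeVertex instances is not enabled")


def pick_references_raw(vertices):
    # Returns vertex objects: these cannot be sent back from a worker process
    return [v for v in vertices if int(v) % 2 == 0]


def pick_references_safe(vertices):
    # Returns plain ints: these pickle without trouble
    return [int(v) for v in vertices if int(v) % 2 == 0]


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, as a pool result must."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


vertices = [FakeVertex(i) for i in range(6)]
raw_ok = can_pickle(pick_references_raw(vertices))
safe_result = pick_references_safe(vertices)
safe_ok = can_pickle(safe_result)

print(raw_ok, safe_ok, safe_result)
```

The selection rule (even indices) is arbitrary and only serves to produce a result list. The point is the return type that crosses the process boundary.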

Development

Successfully merging this pull request may close these issues.

--fit-model error
2 participants