Add fast update function #322
base: master
Nice - took me a while to understand the cluster merging, but looks good - sorry for not spotting earlier that this needed a review. Is it worth adding a test for the fast pruning mode? Also, based on the current CI tests, it looks like the `extractReferences` call in `__main__` is missing a `merged_queries` argument.
6305795 to 817b767
python /home/jlees/installs/PopPUNK/poppunk_assign-runner.py --db Salmonella_progressive/update_0 --previous-clustering Salmonella_core_threshold --distances sal_sketch40k/sal_sketch40k.dists --query Salmonella_query_files/salmonella_split.txtab --model-dir Salmonella_core_threshold --threads 16 --output Salmonella_progressive/update_1_fast --update-db fast
Graph-tools OpenMP parallelisation enabled: with 40 threads
Looking for existing sketches in Salmonella_progressive/update_1/update_1.h5
Loading previously refined model
Completed model loading
48180 refs 31935 queries
WARNING: versions of input databases sketches are different, results may not be compatible
Calculating distances using 40 thread(s)
Progress (CPU): 100.0%
Loading network from Salmonella_core_threshold/Salmonella_core_threshold_graph.gt
Network loaded: 48180 samples
Loading previous cluster assignments from Salmonella_core_threshold/Salmonella_core_threshold_clusters.csv
1538628300 assignments 48180 refs 31935 queries
109549943 tuples
Calculating all query-query distances
Calculating random match chances using Monte Carlo
Calculating distances using 40 thread(s)
Progress (CPU): 100.0%
509906145 assignments 80115 refs 80115 queries
37646978 tuples
core_threshold: 1333 refs
Most recent run:
then errors: multiprocessing.pool.MaybeEncodingError: Error sending result:
...
Reason: 'RuntimeError('Pickling of "graph_tool.libgraph_tool_core.Vertex" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)')' note some
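For anyone hitting the same error: `multiprocessing` pickles every worker result before sending it back to the parent, so a result that still contains vertex descriptors fails at that step. Below is a minimal sketch of the failure mode and one possible workaround (returning plain integer indices instead). It does not use graph-tool or PopPUNK code: `FakeVertex` and both worker functions are hypothetical stand-ins, and the sketch assumes a vertex can be converted to its index with `int(v)`. The pickling step is emulated with `pickle.dumps` rather than a real pool.

```python
import pickle


class FakeVertex:
    """Hypothetical stand-in for a vertex descriptor that refuses pickling."""

    def __init__(self, index):
        self.index = index

    def __reduce__(self):
        # Emulates the behaviour reported in the error message above.
        raise RuntimeError("Pickling of FakeVertex instances is not enabled")

    def __int__(self):
        # Assumption: a vertex converts to its integer index.
        return self.index


def worker_returning_vertices(indices):
    # Mirrors the failing pattern: the result still holds vertex objects.
    return [FakeVertex(i) for i in indices]


def worker_returning_indices(indices):
    # Possible workaround: convert to plain ints before the result
    # leaves the worker, so only picklable data crosses processes.
    return [int(v) for v in worker_returning_vertices(indices)]


def can_cross_process_boundary(result):
    # A pool pickles each worker result; emulate that step directly.
    try:
        pickle.dumps(result)
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    print(can_cross_process_boundary(worker_returning_vertices([0, 1, 2])))
    print(can_cross_process_boundary(worker_returning_indices([0, 1, 2])))
```

Whether this matches the actual cause here depends on what the pooled function returns; the sketch only shows why returning indices sidesteps the pickling step.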
Fixes #321