Assembly is running around a month and going strong - or is it stalled? #170
Comments
Hi, could you paste the content of some files: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align*/nextDenovo.sh.e to here? |
Sure! Here is the last one (09 to 36 are like this one - only an sh script in the folder): /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align36/nextDenovo.sh #!/bin/bash |
And here is the first one (01-08 are similar to this): hostname
|
Try to increase |
Ok! I'll kill things and restart fresh.... |
Actually - rather than starting totally fresh, I updated the run file and just deleted folders 2 and 3 to save a little time and see more quickly how things go with your update. Given the setup, how long would you expect things to run? Just so I know when it's going over. |
I don't know how long it will take, but you can try it first. No need to start fresh, just continue running from the breakpoint. You can also set PS: try to check the value ( |
minimap2-nd started up - first 8 of 36 jobs again - mid_occ is now under 1000 at 427 - jobs are still running under status S but that might be ok. hostname
|
Well - things haven't advanced past the first 8 jobs after several days now. It feels like it might be similar to before... Is there anything you might suggest I check or try? |
You can try to increase |
Good news - I left things running and one of the initial 8 jobs finished after 4 days and a second one after 5 - so two new jobs now running - and I'm hoping the next 6 will finish soon to advance through the remaining 26 jobs or so - currently they are at 850 CPU hours each. Seems like it will be 2-3 weeks to finish them all. |
The remaining 6 jobs of the initial round finished at around 7 days / 1300 CPU hours per 7-CPU job - so the second of 4-5 rounds is now fully underway. Not sure if this is a normal timeframe for a ~human-sized genome and ~45x coverage with 60 CPUs and half a TB of RAM. I'm estimating 5 weeks for this stage in the pipeline, start to finish. As long as it can finish, I'm very happy! If you have ideas for making it more efficient without going outside what is tested / known on your side for the parameters, I'd love to hear them - but I also think you might have covered everything. Thank you for your help on this! |
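The 5-week estimate above follows from simple arithmetic: 36 jobs with 8 running at a time is 5 rounds, and at about 7 days per round that comes to roughly 35 days. A minimal Python sketch of that calculation, assuming every job takes about the same time and jobs run in lockstep rounds (the thread suggests this is only approximately true); the function names are hypothetical and not part of NextDenovo:

```python
import math


def estimate_rounds(total_jobs, parallel_jobs):
    """Number of rounds needed if jobs run in batches of parallel_jobs."""
    return math.ceil(total_jobs / parallel_jobs)


def estimate_wall_days(total_jobs, parallel_jobs, days_per_round):
    """Rough wall-clock estimate, assuming every round takes the same time."""
    return estimate_rounds(total_jobs, parallel_jobs) * days_per_round


# Values taken from this thread: 36 cns_align jobs, parallel_jobs = 8,
# and about 7 days for the first round to finish.
rounds = estimate_rounds(36, 8)
days = estimate_wall_days(36, 8, 7.0)
print(f"{rounds} rounds, about {days:.0f} days ({days / 7:.0f} weeks)")
```

Because finished jobs are replaced as soon as a slot frees up, the real schedule is not strictly round-based, so this is best read as an upper-end guess.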
The running time is largely determined by genome complexity and input data size. For a ~human genome, it usually completes within 1-2 days. Obviously, the genome you assembled is highly repetitive (you can check this by k-mer spectrum using short reads), so you can try wtdbg2, which should be able to finish assembly very quickly. |
Wow - so this is really running long already and still weeks to go. I'm actually trying to improve on a wtdbg2 assembly - do you think NextDenovo is likely to offer improvement? It did great with the sponge data and was super fast. A different octopus with ONT reads took around 4-5 weeks a few months ago and it seemed reasonably good overall - very good considering the data going in I thought. I'll probably let it finish regardless out of curiosity at this point, as long as the machine isn't needed otherwise. I can update how it goes! Thanks again. |
The assembly result is hard to predict, because the genome you assembled is not typical, and the default parameters may not be suitable. But anyway, wait for this assembly task to finish first. |
Describe the bug
Unsure whether the assembly of an octopus (human-sized) genome with 43x seed depth is active or stalled after running for almost a month with 500 GB RAM, 60 CPUs and 2 TB of disk.
Error message
There is no error, but a month ago I used NextDenovo on this machine to successfully assemble a sponge genome 1/10 the size overnight, whereas the current octopus genome is only 10x larger but has been running far longer.
Memory use on the 8 running jobs (~7 CPUs each) cycles between 3.8 and 5.6 GB of RAM over hours, so it seems like it could be active, with a very steady 87% of all CPUs on the machine and 40% of memory in use. However, Glances and top show a status of S, which looks stalled, for the minimap2-nd step (see attached screenshots). Every once in a while one of the jobs will drop from 7 CPUs to 1 for minutes to maybe an hour, but then return to 7.
Previous jobs unrelated to NextDenovo sometimes have a status of S but finish no problem - so I wasn't sure how critical the status is - it is a very steady S.
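On Linux, a process state of S means interruptible sleep (waiting on an event such as I/O or a lock), not stopped, and the state shown for a multi-threaded program can be S while its worker threads are busy. One way to tell an active job from a stalled one is to check whether its accumulated CPU time keeps growing. A minimal Python sketch, assuming a Linux /proc filesystem; the helper names are hypothetical and not part of NextDenovo:

```python
import os
import time


def parse_stat(stat_line):
    """Return (state, cpu_ticks) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before reading the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    state = rest[0]            # field 3: process state (R, S, D, T, Z, ...)
    utime = int(rest[11])      # field 14: user-mode CPU time, in clock ticks
    stime = int(rest[12])      # field 15: kernel-mode CPU time, in clock ticks
    return state, utime + stime


def cpu_seconds_used(pid, interval=60.0):
    """Sample a process twice; return (state, CPU seconds used in between)."""
    def read():
        with open(f"/proc/{pid}/stat") as fh:
            return parse_stat(fh.read())

    _, before = read()
    time.sleep(interval)
    state, after = read()
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    return state, (after - before) / ticks_per_second
```

For example, calling cpu_seconds_used on one of the job PIDs with a 60-second interval and getting back several hundred CPU seconds for a 7-thread job would suggest it is still computing, whatever the reported state letter is.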
I previously restarted the job after 2 weeks, given it was more than 10x longer in run time than sponge at that point - but restart went almost all the way back to the beginning, as there is no output / update from the minimap2-nd step. And I did a fresh start with a few short (minute or less) initial restarts before the current month-long run - so fresh from the initial 2-week run.
The last pid log readout indicates 36 jobs for cns_align.sh - with the largest job number of the 8 jobs at the start being 59306 (see below). Within a day or so the largest job was 59311 (see screenshot) - suggesting nextDenovo is on the last round of jobs to reach the allotted 36 - but then things have simply stayed here for weeks.
Here are details on this:
[59245 INFO] 2023-02-24 12:04:29 skip step: db_split
[59245 INFO] 2023-02-24 12:04:29 skip step: raw_align
[59245 INFO] 2023-02-24 12:04:29 skip step: sort_align
[59245 INFO] 2023-02-24 12:04:29 skip step: seed_cns
[59245 INFO] 2023-02-24 12:04:29 seed_cns finished, and final corrected reads file:
[59245 INFO] 2023-02-24 12:04:29 /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns*/cns.fasta
[59245 INFO] 2023-02-24 12:04:29 Total jobs: 36
[59245 INFO] 2023-02-24 12:04:29 Submitted jobID:[59246] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align01/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:29 Submitted jobID:[59252] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align02/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:30 Submitted jobID:[59261] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align03/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:30 Submitted jobID:[59270] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align04/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:31 Submitted jobID:[59279] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align05/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:31 Submitted jobID:[59288] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align06/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:32 Submitted jobID:[59297] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align07/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:32 Submitted jobID:[59306] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align08/nextDenovo.sh] in the local_cycle.
RAM usage is 40% and CPU usage is 87% - the general setup is similar to what I did for sponge, but rescaled to the new genome size. I wonder if somehow my calculations might have been off and it doesn't have the resources to output or finish at this point...?
Genome characteristics
Genome size is estimated around 3 Gb - high repeat content - likely high heterozygosity.
Input data
Total base count, sequencing depth, average/N50 read length...
rerun: 3
task: all
deltmp: 1
rewrite: 1
read_type: clr
job_type: local
input_type: raw
parallel_jobs: 8
read_cutoff: 15k
pa_correction: 7
seed_cutfiles: 7
seed_depth: 43.64
genome_size: 2.8g
seed_cutoff: 15001
blocksize: 11726373
job_prefix: nextDenovo
ctg_cns_options: -p 7
nextgraph_options: -a 1
sort_options: -m 70g -t 8 -k 38
minimap2_options_map: -x map-pb
minimap2_options_raw: -t 8 -x ava-pb
correction_options: -p 7 -max_lq_length 1000 -min_len_seed 7500
minimap2_options_cns: -t 7 -x ava-pb -k 17 -w 17 --minlen 1500 --maxhan1 5000
input_fofn: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/input.fofn
workdir: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly
raw_aligndir: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/01.raw_align
cns_aligndir: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align
ctg_graphdir: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/03.ctg_graph
[59245 INFO] 2023-02-24 12:04:29 summary of input data:
file: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/01.raw_align/input.reads.stat
[Read length stat]
Types       Count (#)     Length (bp)
N10            266015           39329
N20            608403           32855
N30           1007493           28710
N40           1458869           25618
N50           1961372           23141
N60           2515311           21068
N70           3122091           19277
N80           3783903           17704
N90           4503641           16293

Types       Count (#)      Bases (bp)    Depth (X)
Raw          28758338    245628751872        87.72
Filtered     23472855    123430622516        44.08
Clean         5285483    122198129356        43.64
*Suggested seed_cutoff (genome size: 2800.00Mb, expected seed depth: 45, real seed depth: 43.64): 15001 bp
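The depth column in the summary above can be reproduced by dividing each base count by the configured genome size. A short Python sketch of that check, assuming depth is simply bases divided by genome_size (2.8g read as 2,800,000,000 bp); the function name is hypothetical:

```python
def depth(total_bases, genome_size_bp):
    """Sequencing depth as total bases divided by genome size."""
    return total_bases / genome_size_bp


GENOME_SIZE = 2_800_000_000  # genome_size: 2.8g from the run config

# Base counts copied from the input.reads.stat summary in this report.
bases = {
    "Raw": 245_628_751_872,
    "Filtered": 123_430_622_516,
    "Clean": 122_198_129_356,
}

for label, n in bases.items():
    print(f"{label:<9}{depth(n, GENOME_SIZE):6.2f}X")

# The Filtered and Clean rows add up exactly to the Raw row.
print(bases["Filtered"] + bases["Clean"] == bases["Raw"])
```

The computed values match the reported 87.72, 44.08 and 43.64, so the depth figures are consistent with the 2.8 Gb genome size given in the config.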
Config file
Please paste the complete content of the Config file (run.cfg) to here.
[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 8 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = clr # clr, ont, hifi
input_fofn = input.fofn
workdir = output/3-nextDenovo-assembly
[correct_option]
read_cutoff = 15k
genome_size = 2.8g # estimated genome size
sort_options = -m 70g -t 8
minimap2_options_raw = -t 8
pa_correction = 7 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -p 7
[assemble_option]
minimap2_options_cns = -t 7
nextgraph_options = -a 1
see https://nextdenovo.readthedocs.io/en/latest/OPTION.html for a detailed introduction about all the parameters
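The comment on pa_correction in the config gives a memory rule of thumb (about TOTAL_INPUT_BASES/4 bytes per correction task) that can be checked against the input statistics in this report. A rough Python sketch, assuming TOTAL_INPUT_BASES means the raw base count (245628751872) from input.reads.stat and that GB means 10^9 bytes; the function names are hypothetical:

```python
def bytes_per_correction_task(total_input_bases):
    """Rule of thumb from the config comment: ~TOTAL_INPUT_BASES / 4 bytes."""
    return total_input_bases // 4


def total_correction_bytes(total_input_bases, pa_correction):
    """Memory needed if pa_correction tasks run at the same time."""
    return bytes_per_correction_task(total_input_bases) * pa_correction


RAW_BASES = 245_628_751_872   # assumed to be TOTAL_INPUT_BASES
PA_CORRECTION = 7             # from run.cfg in this report

per_task = bytes_per_correction_task(RAW_BASES)
total = total_correction_bytes(RAW_BASES, PA_CORRECTION)
print(f"per task: {per_task / 1e9:.1f} GB, total: {total / 1e9:.1f} GB")
```

Under those assumptions, 7 parallel correction tasks would need roughly 430 GB, which fits within the 500 GB on this machine but leaves little headroom. This only covers the correction step, not the minimap2-nd step discussed in the comments.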
Operating system
Which operating system and version are you using? You can use the command lsb_release -a to get it.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
GCC
What version of GCC are you using? You can use the command gcc -v to get it.
Salk :) gcc -v
Reading specs from /nadata/mnlsc/home/eedsinger/anaconda3/bin/../lib/gcc/x86_64-conda-linux-gnu/7.5.0/specs
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/nadata/mnlsc/home/eedsinger/anaconda3/bin/../libexec/gcc/x86_64-conda-linux-gnu/7.5.0/lto-wrapper
Target: x86_64-conda-linux-gnu
Configured with: /home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/.build/x86_64-conda-linux-gnu/src/gcc/configure --build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu --target=x86_64-conda-linux-gnu --prefix=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/gcc_built --with-sysroot=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/gcc_built/x86_64-conda-linux-gnu/sysroot --enable-languages=c,c++,fortran,objc,obj-c++ --with-pkgversion='crosstool-NG 1.24.0.131_87df0e6_dirty' --enable-__cxa_atexit --disable-libmudflap --enable-libgomp --disable-libssp --enable-libquadmath --enable-libquadmath-support --enable-libsanitizer --enable-libmpx --with-gmp=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/.build/x86_64-conda-linux-gnu/buildtools --with-mpfr=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/.build/x86_64-conda-linux-gnu/buildtools --with-mpc=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/.build/x86_64-conda-linux-gnu/buildtools --with-isl=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/.build/x86_64-conda-linux-gnu/buildtools --enable-lto --enable-threads=posix --enable-target-optspace --enable-plugin --enable-gold --disable-nls --disable-multilib --with-local-prefix=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/gcc_built/x86_64-conda-linux-gnu/sysroot --enable-long-long --enable-default-pie
Thread model: posix
gcc version 7.5.0 (crosstool-NG 1.24.0.131_87df0e6_dirty)
Python
What version of Python are you using? You can use the command python --version to get it.
Python 3.8.12
NextDenovo
What version of NextDenovo are you using? You can use the command nextDenovo -v to get it.
nextDenovo v2.5.0
Any suggestions would be greatly appreciated - NextDenovo did simply fantastic on sponge - just not sure what is going on now with octopus. Some sort of user error but I am just stuck as to what it might be at this point.
Thank you very much :)
Eric