Timing out on minimap-nd tasks #203
Comments
Hi @wgallin. I'm still trying to figure out how to run NextDenovo in an HPC environment using SLURM. Would you be able to share your script.slurm.sh with me?
Hi,
So I did figure out why I was having a problem, and worked around it.
The basic problem was that, when running in Grid mode, SLURM allows system administrators to set a default wall time for jobs that are submitted without an explicit wall time value.
In my case about 10% of the jobs in one step ran over that time, so the whole job crashed because these jobs would not complete.
The solution that I used was to run the job in LOCAL mode on a single node with 32 CPUs, 256G RAM and a wall time that turned out to be much longer than the job actually took (I requested 7 days but the job finished in less than 4 days).
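A minimal sketch of the kind of batch script this workaround implies, not the script that was actually used. The resource values (one node, 32 CPUs, 256G of RAM, 7 days) come from the description above; the file names `script.slurm.sh` and `run.cfg` are placeholders, and the sketch assumes that `job_type = local` is set in the config and that `nextDenovo` is on the PATH (for example after loading a site-specific module). The snippet only writes the script; it does not submit anything.

```shell
# Write a hypothetical single-node batch script (file names are placeholders).
cat > script.slurm.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=256G
# Deliberately generous wall time; the run described above took under 4 days.
#SBATCH --time=7-00:00:00

# Assumes run.cfg sets job_type = local, so every sub-job runs inside
# this single allocation instead of being submitted separately.
nextDenovo run.cfg
EOF
```

It would then be submitted with `sbatch script.slurm.sh`. In local mode the number of parallel jobs multiplied by the threads per job should probably be chosen to fit the 32 CPUs requested.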
If I had wanted to run the job in Grid mode I would have needed to be able to set the wall times for at least some of the individual sub-jobs, but I could not find a way to do that in the submission script.
So I guess the solution to my problem would have been to allow the submission script or the configuration file to feed a user-defined wall-time (and probably memory allocation) to the individual sub-jobs that the parent job spawns onto the grid.
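One generic SLURM mechanism that might achieve this without changes to NextDenovo: `sbatch` reads default options from input environment variables such as `SBATCH_TIMELIMIT` (equivalent to `--time`). The sketch below assumes, without having verified it for this pipeline, that Grid mode submits its sub-jobs by calling `sbatch` from a process that inherits the parent's environment. The 12-hour value and the file name `run.cfg` are placeholders.

```shell
# Default wall time for any sbatch call that does not pass --time itself.
# (12 hours is an illustrative value, not a recommendation.)
export SBATCH_TIMELIMIT=12:00:00

# Launch the pipeline from this shell so child sbatch calls inherit the variable.
# Guarded so the snippet does nothing where nextDenovo or run.cfg is absent.
if command -v nextDenovo >/dev/null 2>&1 && [ -f run.cfg ]; then
    nextDenovo run.cfg
fi
```

Options given on the `sbatch` command line take precedence over these environment variables, so this only helps if the submit command does not already set a time limit, and the requested time still has to fit within the partition's maximum.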
Warren Gallin
On Apr 22, 2024, at 2:41 AM, DaniPaulo ***@***.***> wrote:
Hi @wgallin <https://github.com/wgallin> . I'm still trying to figure out how to run NextDenovo in a HPC environment using SLURM. Would you be able to share your script.slurm.sh with me?

Hi @wgallin, Thanks for your response. Let's see if I understand.
And your
Could you please verify this?
My assembly job is failing because the time limit is exceeded during some of the minimap2-nd jobs.
It appears that when parallel tasks are run, the time allocated to them is shorter than the time they take to complete.
An example log entry for a single job (it appears that 10 of these have failed out of 100 submitted) is shown here:
Error message
hostname
cd /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/03.raw_align.sh.work/raw_align100
( time /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/nextdenovo/2.5.2/bin/minimap2-nd --step 1 -I 3G -t 8 -x ava-ont /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit -o input.seed.004.2bit.99.ovl; )
aw_align/input.seed.004.2bit /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit -o input.seed.004.2bit.99.ovl
[M::mm_idx_gen::64.686*1.84] collected minimizers
[M::mm_idx_gen::75.200*2.64] sorted minimizers
[M::main::75.200*2.64] loaded/built the index for 107322 target sequence(s)
[M::mm_mapopt_update::77.544*2.59] mid_occ = 1212
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 107322
[M::mm_idx_stat::78.658*2.57] distinct minimizers: 95367629 (42.05% are singletons); average occurrences: 8.194; average spacing: 2.931
[M::worker_pipeline::1280.746*7.56] mapped 25749 sequences
[M::worker_pipeline::2627.600*7.78] mapped 20748 sequences
slurmstepd: error: *** JOB 18227135 ON gra1100 CANCELLED AT 2024-03-30T08:38:49 DUE TO TIME LIMIT ***
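To see how many sub-jobs hit the limit, one option is to search the sub-job logs for the message SLURM writes when it cancels a job. This is a minimal sketch: the helper name `count_timeouts` is made up for illustration, and it assumes the sub-job stderr files are kept somewhere under the directory passed in (for example the `03.raw_align.sh.work` directory shown above).

```shell
# Count files under a directory that contain SLURM's time-limit message.
# Usage: count_timeouts <directory>
count_timeouts() {
    # grep -r searches recursively, -l prints each matching file name once;
    # tr strips the padding some wc implementations add.
    grep -rl "DUE TO TIME LIMIT" "$1" 2>/dev/null | wc -l | tr -d ' '
}
```

Calling `count_timeouts 01.raw_align/03.raw_align.sh.work` from the work directory would print the number of matching log files, which can be compared against the number of jobs submitted.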
Genome characteristics
genome size, heterozygous rate, repeat content...
Input data
This is the relevant part of the slurm.out file:
[100999 INFO] 2024-03-30 02:52:07 NextDenovo start...
[100999 INFO] 2024-03-30 02:52:08 version:Unknown logfile:pid100999.log.info
[100999 WARNING] 2024-03-30 02:52:09 Re-write workdir
[100999 INFO] 2024-03-30 02:52:09 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly
[100999 INFO] 2024-03-30 02:52:10 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align
[100999 INFO] 2024-03-30 02:52:10 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/02.cns_align
[100999 INFO] 2024-03-30 02:52:10 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/03.ctg_graph
[100999 INFO] 2024-03-30 02:52:18 Total jobs: 1
[100999 INFO] 2024-03-30 02:52:18 Submitted jobID:[18223332] jobCmd:[/scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/01.db_stat.sh.work/db_stat1/Trial02.sh] in the slurm_cycle.
[100999 INFO] 2024-03-30 02:54:20 db_stat done
[100999 INFO] 2024-03-30 02:54:20 updated options:
rerun: 3
task: all
deltmp: 1
rewrite: 1
read_type: ont
job_type: slurm
input_type: raw
read_cutoff: 1k
pa_correction: 5
seed_cutfiles: 5
parallel_jobs: 32
seed_depth: 38.12
genome_size: 300m
seed_cutoff: 10000
job_prefix: Trial02
blocksize: 983465750
ctg_cns_options: -p 30
nextgraph_options: -a 1
sort_options: -m 50g -t 30 -k 40
minimap2_options_map: -x map-ont
minimap2_options_raw: -t 8 -x ava-ont
input_fofn: /scratch/wgallin/NextDeNovo_Test01/input.fofn
correction_options: -p 30 -max_lq_length 10000 -r ont -min_len_seed 5000
workdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly
minimap2_options_cns: -t 8 -x ava-ont -k 17 -w 17 --minlen 1000 --maxhan1 5000
raw_aligndir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align
cns_aligndir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/02.cns_align
ctg_graphdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/03.ctg_graph
[100999 INFO] 2024-03-30 02:54:20 summary of input data:
file: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.reads.stat
[Read length stat]
Types Count (#) Length (bp)
N10 49686 39610
N20 138374 24804
N30 277076 15991
N40 488598 10686
N50 795459 7571
N60 1219406 5562
N70 1792624 4116
N80 2576448 2961
N90 3705002 1970
Types Count (#) Bases (bp) Depth (X)
Raw 7575648 28638422273 95.46
Filtered 1971087 1286477110 4.29
Clean 5604561 27351945163 91.17
*Suggested seed_cutoff (genome size: 300.00Mb, expected seed depth: 45, real seed depth: 38.12): 10000 bp
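The Depth (X) column above appears to be the total number of bases divided by the configured genome size. The check below is a sketch under that assumption, taking `300m` to mean 300,000,000 bp; the helper name `depth` is made up for illustration.

```shell
# depth = bases / genome_size, rounded to two decimals
depth() {
    awk -v bases="$1" -v gsize="$2" 'BEGIN { printf "%.2f\n", bases / gsize }'
}

depth 28638422273 300000000   # Raw:      prints 95.46
depth 1286477110 300000000    # Filtered: prints 4.29
depth 27351945163 300000000   # Clean:    prints 91.17
```

These match the reported values, so roughly 91X of clean data is going into the raw alignment step.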
Config file
[General]
job_type = slurm
job_prefix = Trial02
task = all
rewrite = yes
deltmp = yes
parallel_jobs = 32
input_type = raw
read_type = ont # clr, ont, hifi
input_fofn = input.fofn
workdir = Trial_02_Ppen_NextDenovo_Assembly
[correct_option]
read_cutoff = 1k
genome_size = 300m # estimated genome size
sort_options = -m 50g -t 30
minimap2_options_raw = -t 8
pa_correction = 5
correction_options = -p 30
[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1
Operating system
LSB Version: n/a
Distributor ID: Gentoo
Description: Gentoo Base System release 2.6
Release: 2.6
Codename: n/a
GCC
gcc version 9.3.0 (GCC)
Python
3.11
NextDenovo
What version of NextDenovo are you using?
2.5.2