rewrite = no # yes/no means NextDenovo cannot overwrite the existing work directory, so it has to back up all the previous folders and start the assembly again from the beginning.
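A quick way to see which value a run config actually carries is to parse it before relaunching. The following is only a sketch: it assumes the config is INI-like, as in the report, and that a missing `rewrite` key behaves like `no`. The helper name `rewrite_enabled` and the trimmed `example_cfg` are hypothetical. Whether `rewrite = yes` lets a rerun continue from finished steps should be confirmed against the NextDenovo documentation.

```python
import configparser


def rewrite_enabled(cfg_text: str) -> bool:
    """Return True if the [General] section sets rewrite to a truthy value."""
    # NextDenovo-style configs carry trailing "# ..." comments on value lines,
    # so inline comments have to be enabled explicitly.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(cfg_text)
    # Assumption: an absent key is treated the same as "no".
    return parser.getboolean("General", "rewrite", fallback=False)


# Trimmed, hypothetical copy of the relevant config lines from the report.
example_cfg = """
[General]
job_type = local # local, slurm, sge, pbs, lsf
rewrite = no # yes/no
workdir = nexdenovo_assembly
"""

print(rewrite_enabled(example_cfg))
```

With the value shown in the report this prints `False`, which matches the backup-and-restart behaviour described above.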
Describe the bug
I have been running NextDenovo (v2.4.0) for the last couple of months on a de novo genome assembly (total genome size of ~2.5 Gb). The program took two months to finish the jobs within 02.cns_align.sh.work, and everything ran without errors. However, once it moved on to 03.ctg_graph, it stopped with a segmentation fault in /01.ctg_graph.sh.work/ctg_graph0.
Error message
#$ tail -n 10 pid7227.log.info
"[INFO] 2023-03-25 08:09:41,174 Submit jobID:[61788] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align23/nextDenovo.sh] in the local_cycle.
[INFO] 2023-03-26 22:35:08,091 Submit jobID:[48347] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align24/nextDenovo.sh] in the local_cycle.
[INFO] 2023-03-28 17:27:35,526 Submit jobID:[52069] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align25/nextDenovo.sh] in the local_cycle.
[INFO] 2023-03-29 15:05:12,344 Submit jobID:[54523] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align26/nextDenovo.sh] in the local_cycle.
[INFO] 2023-04-06 12:48:15,650 Submit jobID:[60690] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align27/nextDenovo.sh] in the local_cycle.
[INFO] 2023-04-15 11:51:34,786 cns_align done
[INFO] 2023-04-15 11:51:39,917 Total jobs: 1
[INFO] 2023-04-15 11:51:39,929 Submit jobID:[15206] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh] in the local_cycle.
[ERROR] 2023-04-15 11:51:59,560 ctg_graph failed: please check the following logs:
[ERROR] 2023-04-15 11:51:59,561 /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh.e"
#$ cat /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh.e
"hostname
cd /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
/NextDenovo/bin/nextgraph -a 1 -f /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.input.seqs /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta;
[INFO] 2023-04-15 11:51:40 Initialize graph and reading...
Segmentation fault (core dumped)"
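For a log as long as this one, the failing step and its stderr file can be pulled out programmatically. This is a minimal sketch that assumes the `[LEVEL] date time message` layout seen in the log above. The function names and the shortened `/path/to/workdir` sample path are hypothetical.

```python
import re

# Matches lines such as: [ERROR] 2023-04-15 11:51:59,560 some message
ERROR_LINE = re.compile(r"^\[ERROR\] (\S+ \S+) (.*)$")


def error_messages(log_text: str) -> list:
    """Return the message part of every [ERROR] line in the log."""
    messages = []
    for line in log_text.splitlines():
        # Drop surrounding whitespace and the quotes used when pasting the log.
        match = ERROR_LINE.match(line.strip().strip('"'))
        if match:
            messages.append(match.group(2))
    return messages


def failed_stderr_paths(log_text: str) -> list:
    """Return the messages that look like paths to a job's .sh.e stderr file."""
    return [msg for msg in error_messages(log_text) if msg.endswith(".sh.e")]


# Shortened sample; the directory prefix is a placeholder, not the real path.
sample_log = """\
[INFO] 2023-04-15 11:51:34,786 cns_align done
[INFO] 2023-04-15 11:51:39,917 Total jobs: 1
[ERROR] 2023-04-15 11:51:59,560 ctg_graph failed: please check the following logs:
[ERROR] 2023-04-15 11:51:59,561 /path/to/workdir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh.e
"""

for path in failed_stderr_paths(sample_log):
    print(path)
```

The printed path is the file to inspect next, as was done with `cat` above.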
Genome characteristics
genome size - ~2.5 Gb
heterozygosity rate - I have not estimated this since I only have PacBio reads. However, from experience with the organism in question, I would say it has moderate to low heterozygosity
repeat content - based on a closely related species, it should be ~50% of the genome assembly
Input data
Types    Count (#)   Length (bp)
N10      320113      32018
N20      795312      25252
N30      1371150     21441
N40      2038774     18706
N50      2799212     16482
N60      3664213     14381
N70      4677464     11970
N80      5939270     9241
N90      7694302     6074

Types      Count (#)   Bases (bp)     Depth (X)
Raw        13506201    134176569075   53.67
Filtered   1746043     777160708      0.31
Clean      11760158    133399408367   53.36
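The depth column can be cross-checked by dividing the base counts by the genome size. This sketch assumes the 2.5 Gb estimate from the report is the value used for the calculation. The names `GENOME_SIZE_BP`, `read_sets`, and `depth` are illustrative only.

```python
# Assumption: depth is computed against the estimated genome size of 2.5 Gb.
GENOME_SIZE_BP = 2.5e9

# (read count, total bases) for each row of the summary table.
read_sets = {
    "Raw": (13506201, 134176569075),
    "Filtered": (1746043, 777160708),
    "Clean": (11760158, 133399408367),
}


def depth(total_bases: int, genome_size: float = GENOME_SIZE_BP) -> float:
    """Sequencing depth as total bases divided by genome size."""
    return total_bases / genome_size


for name, (count, bases) in read_sets.items():
    print(f"{name}: {count} reads, {depth(bases):.2f}X")

# Raw minus Filtered should account for exactly the Clean set.
raw_count, raw_bases = read_sets["Raw"]
filt_count, filt_bases = read_sets["Filtered"]
print(raw_count - filt_count, raw_bases - filt_bases)
```

Under that assumption the computed depths agree with the table (53.67X raw, 0.31X filtered, 53.36X clean), and the raw and filtered rows add up to the clean row, so the input statistics are internally consistent.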
Config file
“[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = no # yes/no
deltmp = yes
parallel_jobs = 7 # number of tasks used to run in parallel - M/64 here, 64 can optimize to 32~64
input_type = raw # raw, corrected
read_type = clr # clr, ont, hifi
input_fofn = input.fofn
workdir = nexdenovo_assembly
[correct_option]
read_cutoff = 1k
genome_size = 2.5g # estimated genome size
sort_options = -m 40g -t 5 # -m TOTAL_INPUT_BASES * 1.2/4g -t P/pa_correction
minimap2_options_raw = -t 4 # -t P/parallel_jobs
pa_correction = 5 # M/(TOTAL_INPUT_BASES * 1.2/4)
correction_options = -p 3 # -p P/pa_correction
[assemble_option]
minimap2_options_cns = -t 4 # -t P/parallel_jobs
nextgraph_options = -a 1"
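The comment on sort_options gives a rule of thumb, TOTAL_INPUT_BASES * 1.2/4, which can be worked through with the raw base count from the input statistics. This is only a sketch of that arithmetic. It assumes the raw (not clean) base count is the intended input and that the result is read in units of 10^9 bytes. The function name is hypothetical.

```python
# Raw base count taken from the input statistics in the report.
TOTAL_INPUT_BASES = 134176569075


def sort_memory_bytes(total_input_bases: int) -> float:
    """Rule of thumb from the config comment: TOTAL_INPUT_BASES * 1.2 / 4."""
    return total_input_bases * 1.2 / 4


memory = sort_memory_bytes(TOTAL_INPUT_BASES)
# Assumption: "g" in the -m option is read here as 10^9 bytes.
print(f"suggested sort memory: {memory / 1e9:.2f}g")
```

Under those assumptions the result is about 40.25, which is in line with the `-m 40g` used in the config.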
Operating system
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core
GCC
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.4.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-5.4.0/configure --enable-languages=c,c++ --disable-multilib
Thread model: posix
gcc version 5.4.0 (GCC)
Python
Python 3.8.5
NextDenovo
nextDenovo v2.4.0
To Reproduce (Optional)
In my past attempts I was able to run the software successfully with both PacBio CLR and PacBio HiFi reads, so I don't think I can recreate the problem with a smaller dataset. Sorry :/
Additional context (Optional)
Note that it did not run out of memory, so I am a little puzzled about what is causing the error.
In the past, I have successfully run NextDenovo (same version) on a similar genome without this problem. The only difference I can see between the two projects is that previously I provided the reads in FASTA format (one file), while this time I used FASTQ (two files). Both were PacBio CLR.
I have found a similar issue on GitHub (#86), but unfortunately no solution was posted there.
Would you kindly let me know if you managed to solve this issue?
Also, I have never been able to restart a stopped job. How can I do this? The FAQ says to “simply run the same command”, but when I tried it, it created a backup of all the previous folders and started the assembly all over again from the beginning.
I hope you can help me,
Cheers,
André