Dear SCHISM users,
I am running the two-way coupled SCHISM-WWM model and trying to enable hotfile writing (LHOTF = T), but the run fails with a segmentation fault.
The &HOTFILE section of my wwminput.nml looks like this:
&HOTFILE
LHOTF = T ! Write hotfile
FILEHOT_OUT = 'wwm_hot_out' !'.nc' suffix will be added
BEGTC = '20170916.120000' !Starting time of hotfile writing. With ihot!=0 in SCHISM,
!this will be whatever the new hotstarted time is (even with ihot=2)
DELTC = 2400 ! time between hotfile writes
UNITC = 'SEC' ! unit used above
ENDTC = '20170917.120000' ! Ending time of hotfile writing (adjust with BEGTC)
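For reference, the write window implied by these settings can be sketched in Python. This is only an illustration: the helper name `hotfile_write_times` is hypothetical, and it assumes that writes happen at BEGTC + k*DELTC up to and including ENDTC, which the post does not confirm for WWM. Only UNITC = 'SEC' is handled, since that is the unit used above.

```python
from datetime import datetime, timedelta

# Matches time strings such as '20170916.120000' used in wwminput.nml
WWM_TIME_FORMAT = "%Y%m%d.%H%M%S"


def hotfile_write_times(begtc, endtc, deltc, unitc="SEC"):
    """Return candidate hotfile write times (hypothetical helper).

    Assumption: one write at BEGTC and then every DELTC seconds,
    up to and including ENDTC. WWM's actual scheduling may differ.
    """
    if unitc != "SEC":
        raise ValueError("this sketch only handles UNITC = 'SEC'")
    start = datetime.strptime(begtc, WWM_TIME_FORMAT)
    end = datetime.strptime(endtc, WWM_TIME_FORMAT)
    if end < start:
        raise ValueError("ENDTC is earlier than BEGTC")
    step = timedelta(seconds=deltc)
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += step
    return times


# Values copied from the &HOTFILE section above
times = hotfile_write_times("20170916.120000", "20170917.120000", 2400)
print(len(times), times[0], times[-1])
```

Under that assumption, the 24-hour window divides evenly into 2400 s steps, so the last candidate write time coincides with ENDTC.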
The error backtrace looks like this:
[c007:1568611:0:1568611] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x154bd7cfdac0)
==== backtrace (tid:1568611) ====
0 0x000000000005f10c ucs_callbackq_cleanup() ???:0
1 0x000000000005f2ca ucs_callbackq_cleanup() ???:0
2 0x0000000000054db0 __GI___sigaction() :0
3 0x00000000000caa0c __memcpy_evex_unaligned_erms() :0
4 0x0000000000057c5e opal_generic_simple_unpack() ???:0
5 0x0000000000007c36 pml_ucx_generic_datatype_unpack() pml_ucx_datatype.c:0
6 0x0000000000070a9a ucp_proto_rndv_handle_data() ???:0
7 0x000000000005cc11 ucs_callbackq_add_safe() ???:0
8 0x00000000000475ea ucp_worker_progress() ???:0
9 0x0000000000003817 mca_pml_ucx_progress() ???:0
10 0x000000000003abbc opal_progress() ???:0
11 0x000000000005201d ompi_request_default_wait_all() ???:0
12 0x000000000009408f MPI_Waitall() ???:0
13 0x000000000005b173 mpi_waitall__() ???:0
14 0x000000000048aa74 __wwm_hotfile_mod_MOD_setup_return_ac_varoned() ???:0
15 0x000000000048dd35 __wwm_hotfile_mod_MOD_output_hotfile() ???:0
16 0x00000000005bc138 general_output_() ???:0
17 0x00000000005f8a0b un_steady_() ???:0
18 0x00000000005fb2ff wwm_ii_() ???:0
19 0x000000000067d9b9 schism_step_() ???:0
20 0x0000000000436140 schism_main_() ???:0
21 0x0000000000436281 MAIN__() schism_driver.F90:0
22 0x0000000000405d2d main() ???:0
23 0x000000000003feb0 __libc_start_call_main() ???:0
24 0x000000000003ff60 __libc_start_main_alias_2() :0
25 0x0000000000405d65 _start() ???:0
=================================
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x155554c258b2 in ???
#1 0x155554c24a45 in ???
#2 0x155554854daf in ???
#3 0x1555548caa0c in ???
#4 0x155553c8ac5d in ???
#5 0x155553428c35 in ???
#6 0x155552e9ea99 in ???
#7 0x155552485c10 in ???
#8 0x155552e755e9 in ???
#9 0x155553424816 in ???
#10 0x155553c6dbbb in ???
#11 0x155554ffb01c in ???
#12 0x15555503d08e in ???
#13 0x155555130172 in ???
#14 0x48aa73 in ???
#15 0x48dd34 in ???
#16 0x5bc137 in ???
#17 0x5f8a0a in ???
#18 0x5fb2fe in ???
#19 0x67d9b8 in ???
#20 0x43613f in ???
#21 0x436280 in ???
#22 0x405d2c in ???
#23 0x15555483feaf in ???
#24 0x15555483ff5f in ???
#25 0x405d64 in ???
#26 0xffffffffffffffff in ???
srun: error: c007: task 0: Segmentation fault (core dumped)
slurmstepd: error: mpi/pmix_v4: _errhandler: c007 [0]: pmixp_client_v2.c:211: Error handler invoked: status = -61, source = [slurm.pmix.35173.0:0]
slurmstepd: error: *** STEP 35173.0 ON c007 CANCELLED AT 2024-12-27T13:05:32 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: c009: tasks 128-191: Killed
srun: error: c008: tasks 64-127: Killed
srun: error: c007: tasks 1-63: Killed
I have had this problem across different machines and compilers.
I would like to resolve this so that I can write a WWM hotstart file at the end of my run.
Could you please advise?
felicio93