Description
What happened?
I was testing some CAM cases with a fixed CTSM version for ne0ARCTICne30x4 and noticed this test:
SMS_D_Ln9_P1280x1.ne0ARCTICne30x4_ne0ARCTICne30x4_mt12.FHIST.derecho_intel.cam-outfrq9s
still fails. However, the cause appears to be too few MPI tasks: the test passes when the task count is increased to 5120 or when the default PE layout for this grid is used.
What are the steps to reproduce the bug?
Use what will be ctsm5.3.059 (ESCOMP/CTSM#2950) in cesm3_0_alpha07a and run the above test.
It looks like a couple of tests in the testlist use 1280 tasks; they just need to be bumped up to a higher task count.
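As a rough illustration of the proposed fix, the sketch below rewrites the `_P<tasks>x<threads>` modifier in a full test name. The helper name `bump_pe_count` is hypothetical and not part of CIME or CAM; the actual change would be made by editing the test entries in the testlist, and 5120 is used only because that count is reported to work in this issue.

```python
import re

# Matches the PE-layout modifier in a test name, e.g. "_P1280x1"
# (tasks x threads), when followed by "_" or ".".
_PE_MODIFIER = re.compile(r"_P(\d+)x(\d+)(?=[_.])")


def bump_pe_count(test_name: str, new_tasks: int) -> str:
    """Return test_name with the task count in its _P modifier replaced.

    Hypothetical helper for illustration only; the thread count is kept.
    """
    def _swap(match: re.Match) -> str:
        return f"_P{new_tasks}x{match.group(2)}"

    new_name, n_subs = _PE_MODIFIER.subn(_swap, test_name, count=1)
    if n_subs == 0:
        raise ValueError(f"no _P<tasks>x<threads> modifier in {test_name!r}")
    return new_name


if __name__ == "__main__":
    failing = (
        "SMS_D_Ln9_P1280x1.ne0ARCTICne30x4_ne0ARCTICne30x4_mt12"
        ".FHIST.derecho_intel.cam-outfrq9s"
    )
    print(bump_pe_count(failing, 5120))
```

Whether 5120 or the grid's default PE layout is the better replacement is a judgment call for whoever updates the testlist.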
What CAM tag were you using?
cam6_4_089
What machine were you running CAM on?
CISL machine (e.g. cheyenne)
What compiler were you using?
Intel
Path to a case directory, if applicable
No response
Will you be addressing this bug yourself?
Any CAM SE can do this
Extra info
These two tests both work:
SMS_D_Ln9.ne0ARCTICne30x4_ne0ARCTICne30x4_mt12.FHIST.derecho_intel.cam-outfrq9s
SMS_D_Ln9_P5120x1.ne0ARCTICne30x4_ne0ARCTICne30x4_mt12.FHIST.derecho_intel.cam-outfrq9s
However, the one with 1280 tasks fails with the following in the CESM log files. The traceback shows it failing in an ESMF regrid operation called from lnd_set_decomp_and_domain.F90 in CTSM, so it likely just ran out of memory.
cesm.log:
dec0225.hsn.de.hpc.ucar.edu 0: (t_initf) profile_single_file= F
dec0225.hsn.de.hpc.ucar.edu 0: (t_initf) profile_global_stats= T
dec0225.hsn.de.hpc.ucar.edu 0: (t_initf) profile_ovhd_measurement= F
dec0225.hsn.de.hpc.ucar.edu 0: (t_initf) profile_add_detail= F
dec0225.hsn.de.hpc.ucar.edu 0: (t_initf) profile_papi_enable= F
dec0233.hsn.de.hpc.ucar.edu 463: forrtl: error (65): floating invalid
dec0233.hsn.de.hpc.ucar.edu 463: Image PC Routine Line Source
dec0233.hsn.de.hpc.ucar.edu 463: libpthread-2.31.s 00001489D4B598C0 Unknown Unknown Unknown
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC2C539D exec_psssDstRra<d 6567 ESMCI_DELayout.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC2AAC5B psssDstRra<double 6539 ESMCI_DELayout.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC2A6271 psssDstRra<double 6503 ESMCI_DELayout.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC2913CE psssDstRra<double 6463 ESMCI_DELayout.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC258D2B psssDstRra<int, i 6423 ESMCI_DELayout.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC247802 exec 4842 ESMCI_DELayout.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC246955 exec 4410 ESMCI_DELayout.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC0F989E sparseMatMulStore 11399 ESMCI_Array.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC0EDEC1 tSparseMatMulStor 9603 ESMCI_Array.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC0EAD83 sparseMatMulStore 8896 ESMCI_Array.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC1B536F c_esmc_arraysmmst 1105 ESMCI_Array_F.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC7E0402 ESMCI_regrid_crea 639 ESMCI_Mesh_Regrid_Glue.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC7280A4 regrid_create 1658 ESMCI_MeshCap.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC84A0E6 c_esmc_regrid_cre 93 ESMCI_Regrid_F.C
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DDA2E211 c_esmc_regrid_cre 0 ESMF_Regrid.F90
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DDA2C69B esmf_regridstore 360 ESMF_Regrid.F90
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DD32A747 esmf_fieldregrids 1238 ESMF_FieldRegrid.F90
dec0233.hsn.de.hpc.ucar.edu 463: cesm.exe 00000000080CE668 lnd_set_decomp_an 497 lnd_set_decomp_and_domain.F90
dec0233.hsn.de.hpc.ucar.edu 463: cesm.exe 00000000080C6E22 lnd_set_decomp_an 128 lnd_set_decomp_and_domain.F90
dec0233.hsn.de.hpc.ucar.edu 463: cesm.exe 0000000008098E13 lnd_comp_nuopc_mp 645 lnd_comp_nuopc.F90
dec0233.hsn.de.hpc.ucar.edu 463: libesmf.so 00001489DC450219 callVFuncPtr 2187 ESMCI_FTable.C
dec0233.hsn.de.hpc.ucar.edu 475: forrtl: error (65): floating invalid
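To see how many ranks hit the error without reading the whole log, a small script can collect the `forrtl` lines by host and rank. This is a minimal sketch that assumes every log line has the `<host> <rank>: <message>` shape seen in the excerpt above; the function name `failing_ranks` is made up for illustration.

```python
import re

# "<host> <rank>: forrtl: <message>", as in the cesm.log excerpt above.
_FORRTL_LINE = re.compile(
    r"^(?P<host>\S+)\s+(?P<rank>\d+):\s+forrtl:\s+(?P<msg>.+?)\s*$"
)


def failing_ranks(log_lines):
    """Map (host, rank) to the first forrtl message that rank reported."""
    failures = {}
    for line in log_lines:
        match = _FORRTL_LINE.match(line)
        if match:
            key = (match["host"], int(match["rank"]))
            # Keep only the first message per rank.
            failures.setdefault(key, match["msg"])
    return failures


if __name__ == "__main__":
    sample = [
        "dec0225.hsn.de.hpc.ucar.edu 0: (t_initf) profile_papi_enable= F",
        "dec0233.hsn.de.hpc.ucar.edu 463: forrtl: error (65): floating invalid",
        "dec0233.hsn.de.hpc.ucar.edu 475: forrtl: error (65): floating invalid",
    ]
    for (host, rank), msg in sorted(failing_ranks(sample).items()):
        print(f"rank {rank} on {host}: {msg}")
```

In practice the lines would come from the open cesm.log file rather than a hard-coded list.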
lnd.log:
Input land mesh file /glade/campaign/cesm/cesmdata/inputdata/share/meshes/ne0ARCTICne30x4_ESMFmesh_c20200727.nc
Input mask mesh file /glade/campaign/cesm/cesmdata/inputdata/share/meshes/tx0.1v2_ESMFmesh_cd5_c20210105.nc
Obtaining land mask and fraction from mask file /glade/campaign/cesm/cesmdata/inputdata/share/meshes/tx0.1v2_ESMFmesh_cd5_c20210105.nc
Attempting to read global dimensions from surface dataset
(GETFIL): attempting to find local file
surfdata_ne0np4.ARCTIC.ne30x4_hist_1979_78pfts_c240908.nc
(GETFIL): using /glade/campaign/cesm/cesmdata/inputdata/lnd/clm2/surfdata_esmf/ctsm5.3.0/surfdata_ne0np4.ARCTIC.ne30x4_hist_1979_78pfts_c240908.nc
global ni,nj = 117398 1
model grid is not 2-dimensional
Computing land fraction and land mask by mapping mask from mesh_mask file
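For a sense of scale, the land grid size reported above (117398 points) can be divided across the two task counts. This is only back-of-the-envelope arithmetic that assumes an even split of grid points over tasks. It says nothing about the mask mesh or ESMF's internal memory use, so it does not confirm the out-of-memory guess.

```python
def max_points_per_task(n_points: int, n_tasks: int) -> int:
    """Largest per-task share under an even split (ceiling division)."""
    return -(-n_points // n_tasks)


# From "global ni,nj = 117398 1" in lnd.log.
LAND_POINTS = 117398

if __name__ == "__main__":
    for tasks in (1280, 5120):
        share = max_points_per_task(LAND_POINTS, tasks)
        print(f"{tasks} tasks: up to {share} land grid points per task")
```

Under that assumption each task holds roughly four times as many land points at 1280 tasks as at 5120.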