
Code crash loading partitioned mesh with some number of mesh partitions #159

@zhangchonglin

When loading a Gmsh mesh file together with a mesh partition file:

  • for some numbers of mesh partitions, the mesh loading crashes;
  • for other numbers of mesh partitions, the mesh loading works fine.
  • This is confirmed with the Comet code loading several different mesh files and partitions: some load fine, some crash.
  • It is also confirmed when loading the same mesh and partition file with the ptn_loading unit test: the crash behavior is the same as above.

I will include a test mesh file separately.

Stack trace from the core dump file produced by ptn_loading, which is very similar to the stack trace generated with the Comet code:

#0  0x000014beee11f6c8 in PMPI_Irecv () from /opt/cray/pe/lib64/libmpi_gnu_123.so.12
#1  0x000014bef05c511a in MPI_Irecv (buf=0x4271e544, count=592923, datatype=-1946157051, source=<optimized out>, tag=<optimized out>, comm=<optimized out>, request=<optimized out>) at darshan-apmpi.c:842
#2  0x0000000000881fdc in pumipic::ParticleBalancer::ParticleBalancer(pumipic::Mesh&) ()
#3  0x000000000083a77e in pumipic::Mesh::constructPICPart(Omega_h::Mesh&, std::shared_ptr<Omega_h::Comm>, Omega_h::Read<int>, Omega_h::Write<int>, Omega_h::Write<int>, bool) ()
#4  0x000000000083cb1c in pumipic::Mesh::Mesh(Omega_h::Mesh&, Omega_h::Read<int>, int, int) ()
#5  0x00000000004279f5 in main ()

Job submission command on Polaris, using 4 mesh partitions:

mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth \
--env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./set_affinity_gpu_polaris.sh \
./ptn_loading 2d_cylinder.msh 2d_cylinder_4.ptn 1 3   
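To record which partition files crash and which load fine, the command above could be wrapped in a small sweep. This is only a minimal sketch, not something that has been run on Polaris: the `sweep` function and the `MESH` and `RUN` variables are hypothetical names, the trailing `1 3` arguments are copied unchanged from the command above, and it assumes a crash shows up as a non-zero exit status from the launcher.

```shell
#!/bin/bash
# Hypothetical helper: run the loader once per partition file and report PASS/FAIL.
# MESH and RUN are illustrative defaults; set RUN to the full mpiexec line used on the system.
MESH=${MESH:-2d_cylinder.msh}
RUN=${RUN:-./ptn_loading}

sweep() {
  local ptn
  for ptn in "$@"; do
    # $RUN is intentionally unquoted so a multi-word launcher command is split into words.
    if $RUN "$MESH" "$ptn" 1 3 >/dev/null 2>&1; then
      echo "PASS $ptn"
    else
      echo "FAIL $ptn"
    fi
  done
}
```

For example, `sweep 2d_cylinder_4.ptn` would print one PASS or FAIL line for that file; further partition files can be passed as extra arguments. The number of MPI ranks has to match each partition file, so `RUN` would need to be adjusted per file.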
