
Defect: no assumption can be made about MPI_Win opaque handler #801

@ggouaillardet

Description

! syncall test
!
! Copyright (c) 2012-2014, Sourcery, Inc.
! All rights reserved.
!
! Redistribution and use in source and binary forms, with or without
! modification, are permitted provided that the following conditions are met:
!     * Redistributions of source code must retain the above copyright
!       notice, this list of conditions and the following disclaimer.
!     * Redistributions in binary form must reproduce the above copyright
!       notice, this list of conditions and the following disclaimer in the
!       documentation and/or other materials provided with the distribution.
!     * Neither the name of the Sourcery, Inc., nor the
!       names of its contributors may be used to endorse or promote products
!       derived from this software without specific prior written permission.
!
! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
! ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
! WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
! DISCLAIMED. IN NO EVENT SHALL SOURCERY, INC., BE LIABLE FOR ANY
! DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
! (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
! LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
! ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
! (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
!
program syncall
  implicit none

  integer :: me,np,i

  me = this_image()
  np = num_images()

  call mysyncall()

end program syncall

subroutine mysyncall()
  use mpi_f08
  use, intrinsic :: iso_c_binding, only: c_int  ! c_int is used by tally below
  implicit none

  integer :: me,np,i
  integer, allocatable, dimension(:), codimension[:]  :: scalar2
  logical :: success = .true.

  integer(c_int), allocatable :: tally(:)
  integer :: rank, size
  integer :: base
  integer(kind=mpi_address_kind) :: sz
  type(MPI_Win) :: win = mpi_win_null
  call mpi_comm_rank(mpi_comm_world, rank)
  call mpi_comm_size(mpi_comm_world, size)
  print *,"hello ", rank, " / ", size
  base = 100
  sz = 4096
  ! comment the line below to hide the issue
  if (rank.eq.1) call mpi_win_create(base, sz, 4, mpi_info_null, mpi_comm_self, win)

  me = this_image()
  np = num_images()
  allocate(scalar2(1)[*])
  scalar2(1) = -1

  if(me /= 1) call sleep(1)

  scalar2(1) = 1

  sync all

  if(me == 1) then
     do i=1,np
        if(scalar2(1)[i] /= 1) then
           success = .false.
        endif
     end do
  end if

  if(me == 1) then
    if (success) then
      print *,'Test passed.'
    else
      print *,'Test failed.'
    endif
  endif

  if (win.ne.mpi_win_null) call mpi_win_free(win)
end

The program above fails when run on two nodes with gfortran 15, the latest OpenCoarrays, and MPICH.
The root cause is that MPICH does not guarantee that all ranks of the same window get the same MPI_Win opaque handle value, even though the values happen to match most of the time.
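
The following is a minimal sketch, not part of the original reproducer, of one way to observe this directly with plain MPI and no coarrays. It assumes the mpi_f08 bindings, where the standard gives type(MPI_Win) a single integer component MPI_VAL. The program name win_handle_check and the variables extra_win, shared_win, and handles are hypothetical names chosen for illustration. Whether the printed values actually differ depends on the MPI implementation and on what each rank allocated earlier.

```fortran
program win_handle_check
  use mpi_f08
  implicit none

  integer :: rank, nranks, i
  integer :: extra_buf(1), shared_buf(1)
  integer, allocatable :: handles(:)
  integer(kind=MPI_ADDRESS_KIND) :: nbytes
  type(MPI_Win) :: extra_win, shared_win

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks)

  ! Size in bytes of one default integer
  nbytes = int(storage_size(extra_buf) / 8, MPI_ADDRESS_KIND)
  extra_win = MPI_WIN_NULL

  ! As in the reproducer: only rank 1 creates an extra window on MPI_COMM_SELF,
  ! so the ranks no longer have the same number of windows allocated.
  if (rank == 1) then
    call MPI_Win_create(extra_buf, nbytes, 1, MPI_INFO_NULL, MPI_COMM_SELF, extra_win)
  end if

  ! Collective window creation: every rank takes part in the same window.
  call MPI_Win_create(shared_buf, nbytes, 1, MPI_INFO_NULL, MPI_COMM_WORLD, shared_win)

  ! Gather the local integer handle value of the shared window from every rank.
  allocate(handles(nranks))
  call MPI_Allgather(shared_win%MPI_VAL, 1, MPI_INTEGER, &
                     handles, 1, MPI_INTEGER, MPI_COMM_WORLD)

  if (rank == 0) then
    do i = 1, nranks
      print *, 'rank ', i - 1, ' handle value ', handles(i)
    end do
    if (any(handles /= handles(1))) then
      print *, 'Handle values differ across ranks.'
    else
      print *, 'Handle values happen to match on this run.'
    end if
  end if

  call MPI_Win_free(shared_win)
  if (extra_win /= MPI_WIN_NULL) call MPI_Win_free(extra_win)
  deallocate(handles)
  call MPI_Finalize()
end program win_handle_check
```

Built with the MPI Fortran wrapper and launched on two ranks (for example with mpiexec -n 2), this prints the handle value each rank holds for the same window. Matching values on one MPI library or one run do not show that they are guaranteed to match, which is the assumption this issue argues against.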

Refs #800

The title of the issue should start with Defect: followed by a
succinct title.

  • [x] I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.

System information including:

  • OpenCoarrays Version: 2.10.2-32-g3d0fa68
  • Fortran Compiler: gfortran 15.1.0
  • C compiler used for building lib: gcc 15.1.0
  • Installation method: cmake && make && make install
  • All flags & options passed to the installer
  • Output of uname -a: 4.18.0-553.22.1.el8_10.aarch64
  • MPI library being used: MPICH 4.3.1
  • Machine architecture and number of physical cores: A64fx 48+2 cores
  • Version of CMake: 3.26.5

To help us debug your issue please explain:

What you were trying to do (and why)

What happened (include command output, screenshots, logs, etc.)

$ cafrun -n 2 ./a.out
 Test failed.

What you expected to happen

$ cafrun -n 2 ./a.out
 Test passed.

Step-by-step reproduction instructions to reproduce the error/bug

$ caf -I/opt/mpich-4.3.1/include ns.f90
$ cafrun -n 2 ./a.out
