
Conversation

@danieljvickers
Member

@danieljvickers danieljvickers commented Dec 15, 2025

User description


Description

This PR contains changes to resolve MPI mode on the HiPerGator system.

Fixes #1056

Type of change


  • Bug fix (non-breaking change which fixes an issue)

Scope

How Has This Been Tested?

I ran various examples on the HiPerGator system with NVHPC/25.9 and OpenMPI/5.0.7 loaded.

Checklist


PR Type

Bug fix


Description

  • Fix MPI I/O operations by specifying array element indices

  • Remove debug print statement from serial data reading

  • Update hipergator module configuration for NVHPC/25.9

  • Simplify MPI I/O data size parameter in one code path


Diagram Walkthrough

flowchart LR
  A["MPI I/O Operations"] -->|"Add array indices"| B["MPI_FILE_READ/WRITE calls"]
  C["Debug Code"] -->|"Remove"| D["Print statement"]
  E["Hipergator Config"] -->|"Update modules"| F["NVHPC/25.9 & OpenMPI/5.0.7"]
  B --> G["Fixed MPI I/O"]

File Walkthrough

Relevant files
Bug fix
m_data_input.f90
Remove debug print statement                                                         

src/post_process/m_data_input.f90

  • Removed debug print statement that output q_cons_vf(i)%sf(:, 0, 0)
    after reading data
+0/-1     
m_data_output.fpp
Fix MPI I/O array indexing in write operations                     

src/pre_process/m_data_output.fpp

  • Changed MPI_FILE_WRITE_ALL calls to specify array element (1, 1, 1)
    instead of entire array
  • Applied fix to three separate MPI write operations for different
    variable groups
  • Simplified data size parameter in one write call from
    data_size*mpi_io_type to data_size
+4/-3     
m_start_up.fpp
Fix MPI I/O array indexing in read operations                       

src/simulation/m_start_up.fpp

  • Changed MPI_FILE_READ and MPI_FILE_READ_ALL calls to specify array
    element (1, 1, 1) instead of entire array
  • Applied fix to four separate MPI read operations for different
    variable groups
  • Simplified data size parameter in one read call from
    data_size*mpi_io_type to data_size
+6/-5     
Configuration changes
modules
Update hipergator module configuration                                     

toolchain/modules

  • Replaced manual MPI path configuration with module load openmpi/5.0.7
  • Removed hardcoded CUDA and OpenMPI paths for NVHPC/25.3
  • Updated LD_LIBRARY_PATH to use NVHPC/25.9 math libraries
  • Added UCX network device configuration for InfiniBand optimization
+4/-7     


CodeAnt-AI Description

Fix MPI file I/O alignment, remove stray debug print, and update Hipergator modules

What Changed

  • MPI file read/write now target the first element of each variable array when performing parallel I/O so files are read/written with correct offsets (fixes misaligned or corrupted MPI I/O).
  • Some MPI read/write calls now pass the corrected data-size argument so collective I/O uses the intended element count.
  • Removed a debug print that previously produced extra console output when reading serial data files.
  • Updated the Hipergator module configuration to load NVHPC 25.9 and OpenMPI 5.0.7 and set library and UCX device environment entries for GPU runs.

Impact

✅ Fewer MPI I/O data corruption errors during checkpoint/load
✅ Clearer console output when loading serial data (no stray debug prints)
✅ Works with Hipergator NVHPC 25.9 + OpenMPI 5.0.7 GPU runs

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

Summary by CodeRabbit

  • Chores
    • Removed a runtime debug print to reduce console noise during startup and I/O.
    • Adjusted MPI file read/write buffer handling to make parallel I/O more explicit and consistent.
    • Updated GPU/module environment and library path settings to align the runtime/toolchain configuration.


@codeant-ai

codeant-ai bot commented Dec 15, 2025

CodeAnt AI is reviewing your PR.



@coderabbitai
Contributor

coderabbitai bot commented Dec 15, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Removed a debug print in post-process input; changed MPI I/O buffer arguments to use explicit subarray indexing (...%sf(1,1,1)) for multiple reads/writes in startup and output; and updated the toolchain modules file to prefer openmpi/5.0.7, adjust CUDA/LD_LIBRARY_PATH/UCX environment variables and Python to 3.12. No public interfaces changed.

Changes

Cohort / File(s) Summary
Fortran I/O changes
src/post_process/m_data_input.f90, src/pre_process/m_data_output.fpp, src/simulation/m_start_up.fpp
Removed a single runtime debug print in m_data_input.f90. Replaced MPI read/write buffer arguments from MPI_IO_DATA%var(i)%sf to MPI_IO_DATA%var(i)%sf(1,1,1) across serial and parallel branches; data sizes and control flow unchanged.
Toolchain / modules
toolchain/modules
Replaced CUDA/HPC-OMPI–centric environment with openmpi/5.0.7 and python/3.12, removed CUDA_HOME/HPC_OMPI vars, set MFC_CUDA_CC=100, updated LD_LIBRARY_PATH to NVHPC 25.9 math libs, and added UCX_NET_DEVICES. No API changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify MPI buffer addresses/subarray usage matches Fortran array memory layout and any MPI derived datatype requirements.
  • Confirm removed debug print is not relied on by tests or scripts.
  • Validate module changes on target systems (HiPerGator/CI) for MPI+GPU runs.

Possibly related PRs

Suggested labels

Review effort 3/5, size:M

Suggested reviewers

  • sbryngelson

Poem

🐰
I hopped through buffers, careful and neat,
trimmed a print, aligned reads so they meet.
OpenMPI wakes, UCX on the way,
the little rabbit cheered — code saves the day! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name | Status | Explanation | Resolution
Title check | ❓ Inconclusive | The title 'Debug hipergator mpi' is vague and doesn't clearly summarize the actual changes made to the codebase. | Use a more descriptive title that reflects the main changes, such as 'Fix MPI I/O alignment and update Hipergator modules for NVHPC/25.9' or 'Resolve MPI issues on Hipergator by fixing I/O operations and updating module configuration'.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description check | ✅ Passed | The PR description is comprehensive, includes issue reference (#1056), identifies bug fix type, explains testing approach, and provides detailed explanations of changes across all modified files.
Linked Issues check | ✅ Passed | The PR directly addresses issue #1056 by fixing MPI I/O operations (array indexing in read/write calls), removing debug output, and updating Hipergator module configuration for NVHPC/25.9 and OpenMPI/5.0.7 compatibility.
Out of Scope Changes check | ✅ Passed | All changes are directly related to resolving MPI on Hipergator: MPI I/O fixes in data read/write operations, debug cleanup in data input, and module configuration updates for the target system.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa1a187 and a0b3f00.

📒 Files selected for processing (1)
  • toolchain/modules (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • toolchain/modules
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1056 - Partially compliant

Compliant requirements:

  • Resolve via changes to the toolchain modules file.

Non-compliant requirements:

  • Fix HiPerGator GPU runs so they work with MPI enabled.
  • Definition of done: an example can be run successfully with MPI enabled on HiPerGator.

Requires further human verification:

  • Re-run the exact reproduction steps on HiPerGator and confirm GPU+MPI works end-to-end.
  • Confirm multiple examples (and multi-rank runs) work, not just a single case/config.
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The new MPI-IO calls pass MPI_IO_DATA%var(i)%sf(1, 1, 1) as the buffer start. Elsewhere arrays appear to be read with 0:m,0:n,0:p bounds, so hardcoding (1,1,1) may skip the first plane or even be out-of-bounds depending on the array lower bounds. Consider using lbound-based indexing (e.g., sf(lbound(sf,1), lbound(sf,2), lbound(sf,3))) or ensuring the array is definitively 1-based in all builds.

    call MPI_FILE_WRITE_ALL(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size*mpi_io_type, &
                            mpi_io_p, status, ierr)
end do
!Additional variables pb and mv for non-polytropic qbmm
if (qbmm .and. .not. polytropic) then
    do i = sys_size + 1, sys_size + 2*nb*nnode
        var_MOK = int(i, MPI_OFFSET_KIND)

        ! Initial displacement to skip at beginning of file
        disp = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)*WP_MOK*(var_MOK - 1)

        call MPI_FILE_SET_VIEW(ifile, disp, mpi_io_p, MPI_IO_DATA%view(i), &
                               'native', mpi_info_int, ierr)
        call MPI_FILE_WRITE_ALL(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size*mpi_io_type, &
                                mpi_io_p, status, ierr)
    end do
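
A minimal sketch of the lbound-based alternative mentioned above (hypothetical, not part of this PR; it reuses the names from the quoted diff, keeps the existing count expression, and assumes sf is a contiguous 3-D array):

    associate (sf => MPI_IO_DATA%var(i)%sf)
      ! Pass the buffer starting at the array's actual lower bounds so a
      ! 0- or -1-based allocation is not skipped by a hard-coded (1, 1, 1).
      call MPI_FILE_WRITE_ALL(ifile, sf(lbound(sf, 1), lbound(sf, 2), lbound(sf, 3)), &
                              data_size*mpi_io_type, mpi_io_p, status, ierr)
    end associate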
Possible Issue

In one path the count argument changed from data_size*mpi_io_type to data_size for MPI_FILE_READ_ALL. Ensure data_size is in units of the datatype passed (elements of mpi_io_p) and not bytes; otherwise this can under/over-read and cause MPI-IO hangs/corruption on some MPI implementations.

    call MPI_FILE_READ_ALL(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size, &
                           mpi_io_p, status, ierr)
end do
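
One way to make the unit assumption explicit is a guard before the call. This is a sketch only, assuming data_size is intended to be an element count and that sf holds exactly the data being read (neither is verified here):

    ! The MPI-IO count is in elements of the datatype mpi_io_p, so it should
    ! match the number of elements in the buffer; abort loudly otherwise.
    if (data_size /= size(MPI_IO_DATA%var(i)%sf)) then
      call s_mpi_abort('MPI-IO count does not match restart buffer size')
    end if
    call MPI_FILE_READ_ALL(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size, &
                           mpi_io_p, status, ierr)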
Config Risk

The module config now hardcodes UCX_NET_DEVICES and injects an NVHPC math-lib LD_LIBRARY_PATH path. This may be site-specific and brittle (device names can vary per node/partition), and can override user environment unexpectedly. Consider making UCX selection conditional/optional, or documenting why these exact devices are required on HiPerGator; also verify OpenMPI module load already sets the needed runtime library paths to avoid conflicts.

h-gpu openmpi/5.0.7
h-gpu MFC_CUDA_CC=100 
h-gpu LD_LIBRARY_PATH=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/math_libs/12.9/lib64:$LD_LIBRARY_PATH
h-gpu UCX_NET_DEVICES="mlx5_4:1,mlx5_7:1,mlx5_8:1,mlx5_9:1,mlx5_10:1,mlx5_13:1,mlx5_14:1,mlx5_15:1"

@codeant-ai codeant-ai bot added the size:S This PR changes 10-29 lines, ignoring generated files label Dec 15, 2025
Comment on lines +768 to 769
call MPI_FILE_WRITE_ALL(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size*mpi_io_type, &
mpi_io_p, status, ierr)
Contributor


Suggestion: In the MPI_FILE_WRITE_ALL call, correct the count parameter from data_size*mpi_io_type to data_size to prevent writing an incorrect amount of data and ensure consistency with other changes in the PR. [possible issue, importance: 9]

Suggested change
call MPI_FILE_WRITE_ALL(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size*mpi_io_type, &
mpi_io_p, status, ierr)
call MPI_FILE_WRITE_ALL(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size, &
mpi_io_p, status, ierr)

Comment on lines +682 to 683
call MPI_FILE_READ(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size*mpi_io_type, &
mpi_io_p, status, ierr)
Contributor


Suggestion: In the MPI_FILE_READ call, correct the count parameter from data_size*mpi_io_type to data_size to prevent reading an incorrect amount of data and ensure consistency with other changes in the PR. [possible issue, importance: 9]

Suggested change
call MPI_FILE_READ(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size*mpi_io_type, &
mpi_io_p, status, ierr)
call MPI_FILE_READ(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size, &
mpi_io_p, status, ierr)

@codeant-ai

codeant-ai bot commented Dec 15, 2025

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Array lower-bound assumption
    New MPI read calls pass the buffer as MPI_IO_DATA%var(i)%sf(1, 1, 1). Many arrays in this module are allocated with non-1 lower bounds (e.g. -1:...), so using (1,1,1) will skip or misplace data (or write out-of-bounds), causing incorrect initial conditions or memory corruption. Verify and use the actual lower bounds or pass the whole array pointer consistently.

  • Hard-coded buffer start
    The new MPI write calls use the hard-coded array element (...%sf(1, 1, 1)) as the buffer start. If the array sf has non-1 lower bounds (e.g. -1 or 0) or different allocations for downsampled buffers, indexing (1,1,1) can be out-of-bounds or point to the wrong memory location. This must be validated for every allocation pattern used by MPI_IO_DATA%var(i)%sf.

  • Inconsistent MPI count units
    The code uses different forms for the MPI write count argument: some calls pass data_size*mpi_io_type while one call passes data_size. It's unclear whether mpi_io_type is a bytes-per-element multiplier or an MPI datatype handle. If the count is in "number of elements" but is multiplied by bytes-per-element, writes will be incorrect or fail. Verify expected units for the count parameter and make them consistent across all MPI_FILE_WRITE_ALL calls.

  • Buffer contiguity / file view alignment
    The code relies on passing a starting element together with MPI_FILE_SET_VIEW to position the file pointer. Ensure the memory layout and MPI file view match (contiguity, element order, datatype) so that MPI writes contiguous data from memory into the correct file offsets. If arrays are non-contiguous (slices or derived types), consider creating an MPI derived datatype or packing into a contiguous buffer first (see the sketch after this list).

  • MPI count inconsistency
    Most MPI calls use a count of data_size*mpi_io_type, but one new MPI_FILE_READ_ALL call uses data_size (no multiplication by mpi_io_type). This mismatch between count and datatype likely causes too-small/too-large reads and corrupts data. Ensure the count unit matches the MPI datatype semantics across all branches.
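
For the contiguity and alignment point above, one option is to describe the in-memory block with an MPI subarray datatype instead of relying on base-element addressing. The following is a hypothetical sketch, again reusing the diff's variable names and assuming mpi_io_p is the per-element datatype and the full local block is written:

    integer :: memtype, sizes(3), subsizes(3), starts(3)  ! declarations belong in the routine's specification part

    sizes = shape(MPI_IO_DATA%var(i)%sf)  ! full local allocation
    subsizes = sizes                      ! here the whole block is written
    starts = 0                            ! subarray offsets are zero-based

    call MPI_TYPE_CREATE_SUBARRAY(3, sizes, subsizes, starts, MPI_ORDER_FORTRAN, &
                                  mpi_io_p, memtype, ierr)
    call MPI_TYPE_COMMIT(memtype, ierr)
    call MPI_FILE_WRITE_ALL(ifile, MPI_IO_DATA%var(i)%sf, 1, memtype, status, ierr)
    call MPI_TYPE_FREE(memtype, ierr)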

@codeant-ai

codeant-ai bot commented Dec 15, 2025

CodeAnt AI finished reviewing your PR.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
src/pre_process/m_data_output.fpp (1)

768-783: MPI write buffers: sf(1,1,1) change looks correct; please re‑check count usage.

Switching the MPI-IO buffer from MPI_IO_DATA%var(i)%sf to MPI_IO_DATA%var(i)%sf(1, 1, 1) is a good fix here: it passes the base element of the contiguous array and is consistent with the corresponding MPI read changes in m_start_up, which should help with NVHPC/OpenMPI 5 interfaces. The rest of the call signature (including data_size*mpi_io_type in the bubbles_euler branches) remains coherent.

In the non‑bubbles_euler parallel-IO branch (Line 796), you still use data_size (without *mpi_io_type) as the count, whereas other MPI-IO calls with mpi_io_p use data_size*mpi_io_type. That asymmetry predates this PR, but since you’re already touching these sites, it may be worth confirming that mpi_io_type == 1 is guaranteed in this path; otherwise, aligning this call with the others would make the intent clearer.

Also applies to: 796-797

toolchain/modules (1)

87-90: HiPerGator GPU module lines match the tested stack; consider minor robustness tweaks.

The new h-gpu stanza (NVHPC 25.9 + openmpi/5.0.7, MFC_CUDA_CC=100, NVHPC math_libs in LD_LIBRARY_PATH, and UCX device selection) matches the environment described in the PR and is a reasonable way to encode the working setup into mfc.sh.

Two optional improvements you might consider:

  • Instead of hard-coding /apps/compilers/nvhpc/25.9/.../math_libs/12.9/lib64, derive this from the NVHPC module (for example via $NVHPC) so that future NVHPC minor updates don’t require editing this file.
  • Periodically re-check that the UCX_NET_DEVICES setting matches current HiPerGator RC recommendations, since network topology and best-practice UCX configs can change over time.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ae42aa and 9d8edcc.

📒 Files selected for processing (4)
  • src/post_process/m_data_input.f90 (0 hunks)
  • src/pre_process/m_data_output.fpp (3 hunks)
  • src/simulation/m_start_up.fpp (4 hunks)
  • toolchain/modules (1 hunks)
💤 Files with no reviewable changes (1)
  • src/post_process/m_data_input.f90
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{fpp,f90}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{fpp,f90}: Use 2-space indentation; continuation lines align beneath &
Use lower-case keywords and intrinsics (do, end subroutine, etc.)
Name modules with m_ pattern (e.g., m_transport)
Name public subroutines with s_ pattern (e.g., s_compute_flux)
Name public functions with f_ pattern
Keep subroutine size ≤ 500 lines, helper subroutines ≤ 150 lines, functions ≤ 100 lines, files ≤ 1000 lines
Limit routine arguments to ≤ 6; use derived-type params struct if more are needed
Forbid goto statements (except in legacy code), COMMON blocks, and save globals
Every argument must have explicit intent; use dimension/allocatable/pointer as appropriate
Call s_mpi_abort() for errors, never use stop or error stop

**/*.{fpp,f90}: Indent 2 spaces; continuation lines align under &
Use lower-case keywords and intrinsics (do, end subroutine, etc.)
Name modules with m_<feature> prefix (e.g., m_transport)
Name public subroutines as s_<verb>_<noun> (e.g., s_compute_flux) and functions as f_<verb>_<noun>
Keep private helpers in the module; avoid nested procedures
Enforce size limits: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100, module/file ≤ 1000
Limit subroutines to ≤ 6 arguments; otherwise pass a derived-type 'params' struct
Avoid goto statements (except unavoidable legacy); avoid global state (COMMON, save)
Every variable must have intent(in|out|inout) specification and appropriate dimension / allocatable / pointer
Use s_mpi_abort(<msg>) for error termination instead of stop
Use !> style documentation for header comments; follow Doxygen Fortran format with !! @param and !! @return tags
Use implicit none statement in all modules
Use private declaration followed by explicit public exports in modules
Use derived types with pointers for encapsulation (e.g., pointer, dimension(:,:,:) => null())
Use pure and elemental attributes for side-effect-free functions; combine them for array ...

Files:

  • src/simulation/m_start_up.fpp
  • src/pre_process/m_data_output.fpp
src/simulation/**/*.{fpp,f90}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

src/simulation/**/*.{fpp,f90}: Wrap tight GPU loops with !$acc parallel loop gang vector default(present) reduction(...); add collapse(n) when safe; declare loop-local variables with private(...)
Allocate large GPU arrays with managed memory or move them into persistent !$acc enter data regions at start-up
Avoid stop/error stop inside GPU device code
Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers

src/simulation/**/*.{fpp,f90}: Mark GPU-callable helpers with $:GPU_ROUTINE(function_name='...', parallelism='[seq]') immediately after declaration
Do not use OpenACC or OpenMP directives directly; use Fypp macros from src/common/include/parallel_macros.fpp instead
Wrap tight loops with $:GPU_PARALLEL_FOR(private='[...]', copy='[...]') macro; add collapse=n for safe nested loop merging
Declare loop-local variables with private='[...]' in GPU parallel loop macros
Allocate large arrays with managed or move them into a persistent $:GPU_ENTER_DATA(...) region at start-up
Do not place stop or error stop inside device code

Files:

  • src/simulation/m_start_up.fpp
src/**/*.fpp

📄 CodeRabbit inference engine (.cursor/rules/mfc-agent-rules.mdc)

src/**/*.fpp: Use .fpp file extension for Fypp preprocessed files; CMake transpiles them to .f90
Start module files with Fypp include for macros: #:include 'macros.fpp'
Use the fypp ASSERT macro for validating conditions: @:ASSERT(predicate, message)
Use fypp macro @:ALLOCATE(var1, var2) for device-aware allocation instead of standard Fortran allocate
Use fypp macro @:DEALLOCATE(var1, var2) for device-aware deallocation instead of standard Fortran deallocate

Files:

  • src/simulation/m_start_up.fpp
  • src/pre_process/m_data_output.fpp
🧠 Learnings (15)
📓 Common learnings
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Avoid stop/error stop inside GPU device code
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Wrap tight GPU loops with !$acc parallel loop gang vector default(present) reduction(...); add collapse(n) when safe; declare loop-local variables with private(...)
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Do not use OpenACC or OpenMP directives directly; use Fypp macros from `src/common/include/parallel_macros.fpp` instead
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Wrap tight loops with `$:GPU_PARALLEL_FOR(private='[...]', copy='[...]')` macro; add `collapse=n` for safe nested loop merging
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with GNU `gfortran` and Intel `ifx`/`ifort` for portability
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Mark GPU-callable helpers with `$:GPU_ROUTINE(function_name='...', parallelism='[seq]')` immediately after declaration
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers

Applied to files:

  • toolchain/modules
  • src/simulation/m_start_up.fpp
  • src/pre_process/m_data_output.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Allocate large arrays with `managed` or move them into a persistent `$:GPU_ENTER_DATA(...)` region at start-up

Applied to files:

  • src/simulation/m_start_up.fpp
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Allocate large GPU arrays with managed memory or move them into persistent !$acc enter data regions at start-up

Applied to files:

  • src/simulation/m_start_up.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Do not use OpenACC or OpenMP directives directly; use Fypp macros from `src/common/include/parallel_macros.fpp` instead

Applied to files:

  • src/simulation/m_start_up.fpp
  • src/pre_process/m_data_output.fpp
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Wrap tight GPU loops with !$acc parallel loop gang vector default(present) reduction(...); add collapse(n) when safe; declare loop-local variables with private(...)

Applied to files:

  • src/simulation/m_start_up.fpp
  • src/pre_process/m_data_output.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to **/*.{fpp,f90} : Use `s_mpi_abort(<msg>)` for error termination instead of `stop`

Applied to files:

  • src/simulation/m_start_up.fpp
  • src/pre_process/m_data_output.fpp
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Avoid stop/error stop inside GPU device code

Applied to files:

  • src/simulation/m_start_up.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Wrap tight loops with `$:GPU_PARALLEL_FOR(private='[...]', copy='[...]')` macro; add `collapse=n` for safe nested loop merging

Applied to files:

  • src/simulation/m_start_up.fpp
  • src/pre_process/m_data_output.fpp
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to **/*.{fpp,f90} : Call s_mpi_abort(<msg>) for errors, never use stop or error stop

Applied to files:

  • src/simulation/m_start_up.fpp
  • src/pre_process/m_data_output.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Declare loop-local variables with `private='[...]'` in GPU parallel loop macros

Applied to files:

  • src/simulation/m_start_up.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to **/*.{fpp,f90} : Use `wp` (working precision) parameter from `m_precision_select` instead of hardcoded precision like `real*8`

Applied to files:

  • src/pre_process/m_data_output.fpp
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to **/*.{fpp,f90} : Every argument must have explicit intent; use dimension/allocatable/pointer as appropriate

Applied to files:

  • src/pre_process/m_data_output.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to **/*.{fpp,f90} : Each variable should have one clear purpose; do not use the same variable for multiple purposes

Applied to files:

  • src/pre_process/m_data_output.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to **/*.{fpp,f90} : Every variable must have `intent(in|out|inout)` specification and appropriate `dimension` / `allocatable` / `pointer`

Applied to files:

  • src/pre_process/m_data_output.fpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Detect File Changes
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Build & Publish
🔇 Additional comments (1)
src/simulation/m_start_up.fpp (1)

682-692: MPI read buffers: sf(1,1,1) usage is consistent with writes; verify mpi_io_type assumptions.

Updating the MPI-IO reads to use MPI_IO_DATA%var(i)%sf(1, 1, 1) (both in the file-per-process branch and the global restart branch) is aligned with the corresponding write-side changes and should address the Fortran/MPI buffer descriptor issues seen with NVHPC + OpenMPI 5.0.7. Read/write symmetry for all MPI_IO_DATA variables (including qbmm pb/mv) now looks correct.

As in m_data_output, note that:

  • bubbles_euler/elasticity branches use data_size*mpi_io_type as the count;
  • the non‑bubbles branch with MPI_FILE_READ_ALL uses data_size only.

That pattern matches the existing write code, so it’s not a new behavioral change, but given these MPI-IO fixes are specifically targeting a portability bug, it may be worth confirming that mpi_io_type is always 1 in the non‑bubbles case, or otherwise harmonizing the count for clarity.

Also applies to: 828-841, 853-855
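
A minimal way to encode that assumption in the non-bubbles read path is a guard using the repo's Fypp ASSERT macro. This is a sketch; it presumes mpi_io_type is an integer count factor, which has not been verified here:

    ! Fail loudly if the count convention ever changes, instead of silently
    ! under- or over-reading the restart data.
    @:ASSERT(mpi_io_type == 1, 'non-bubbles MPI-IO read assumes mpi_io_type == 1')
    call MPI_FILE_READ_ALL(ifile, MPI_IO_DATA%var(i)%sf(1, 1, 1), data_size, &
                           mpi_io_p, status, ierr)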

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 4 files

Prompt for AI agents (all 2 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/pre_process/m_data_output.fpp">

<violation number="1" location="src/pre_process/m_data_output.fpp:796">
P2: Inconsistent MPI write count parameter: this call uses `data_size` while other MPI_FILE_WRITE_ALL calls in this file use `data_size*mpi_io_type`. This inconsistency could cause incorrect write sizes or runtime MPI errors. Ensure all MPI write operations use the same count/datatype convention.</violation>
</file>

<file name="src/simulation/m_start_up.fpp">

<violation number="1" location="src/simulation/m_start_up.fpp:853">
P2: Inconsistent MPI read count parameter: this call uses `data_size` while other MPI_FILE_READ/MPI_FILE_READ_ALL calls in this file use `data_size*mpi_io_type`. This mismatch could cause incorrect read sizes and data corruption. Ensure all MPI read operations use a consistent count/datatype convention.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

@codecov

codecov bot commented Dec 15, 2025

Codecov Report

❌ Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.29%. Comparing base (8ae42aa) to head (a0b3f00).

Files with missing lines Patch % Lines
src/simulation/m_start_up.fpp 0.00% 5 Missing ⚠️
src/pre_process/m_data_output.fpp 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1089      +/-   ##
==========================================
- Coverage   44.08%   42.29%   -1.79%     
==========================================
  Files          71       71              
  Lines       20332    20423      +91     
  Branches     1981     1982       +1     
==========================================
- Hits         8963     8638     -325     
- Misses      10236    10702     +466     
+ Partials     1133     1083      -50     

☔ View full report in Codecov by Sentry.

@danieljvickers
Member Author

Every single benchmark failed, which doesn't seem like spurious CI to me... Really odd to pass the test suite and have this minor change result in every benchmark failing. Is there any way I can get access to the specific logs for one of these benchmarks? The raw logs point to some other files that aren't available through here.

@danieljvickers
Member Author

@sbryngelson All of the benchmarks are failing with code 143 for a seg fault. I can't see the Phoenix logs directly on here, but the Frontier logs show a seg fault when building pre_process. I just hopped on Frontier and built the cases fine with --case-optimization, GPU, MPI, etc., to match the bench test.

Since it was every single benchmark on both machines, I don't think it was a random fluke, but I am running them again just to be safe. Do you have any thoughts on this in the meantime?

@sbryngelson
Member

I'm not sure what's going on here. Have you tried debugging this? For example, you could revert your changes and see if you still experience failures.

@danieljvickers
Member Author

I manually got on Frontier earlier today and was not able to recreate the compiler seg fault that we are seeing in the logs. However, the code did not work with the Cray compilers, while the issue appears to be gone with NVHPC on Phoenix.

That makes sense, because the test suite doesn't exercise any of the changes (the test suite runs without parallel_io). So it seems this fix won't work with the Cray compilers, sadly. This all ties into the conversation with you, me, and Mat earlier on Slack.


Labels

Review effort 4/5 size:S This PR changes 10-29 lines, ignoring generated files

Development

Successfully merging this pull request may close these issues.

MPI Broken on HiPerGator

2 participants