Commit

v2.3.0
Update release notes and README
soumagne committed Jun 6, 2023
1 parent b289d53 commit 7fe6422
Showing 3 changed files with 84 additions and 110 deletions.
Documentation/CHANGES.md: 146 changes (64 additions, 82 deletions)
@@ -4,39 +4,12 @@ This version brings bug fixes and updates to our v2.0.0 release.

## New features

<span style="color:lightblue">Added in rc5</span>

- __[HG/NA]__
- Add `HG_Init_opt2()` / `HG_Core_init_opt2()` / `NA_Initialize_opt2()` to
safely pass updated init info while maintaining ABI compatibility between
versions
- __[CMake]__
- Add `NA_INSTALL_PLUGIN_DIR` variable to control plugin install path

---
<span style="color:lightblue">Added in rc4</span>

- __[HG]__
- Add `HG_Context_unpost()` / `HG_Core_context_unpost()` for optional
2-step context shutdown
---
<span style="color:lightblue">Added in rc2</span>

- __[HG Test]__
- Perf test now supports multi-client / multi-server workloads
- Add `BUILD_TESTING_UNIT` and `BUILD_TESTING_PERF` CMake options
- __[NA OFI]__
- Add support for libfabric log redirection
- Requires libfabric >= 1.16.0, disabled if FI_LOG_LEVEL is set
- Add `libfabric` log subsys (off by default)
- Bump FI_VERSION to 1.13 when log redirection is supported
- __[HG util]__
- Add HG_LOG_WRITE_FUNC() macro to pass func/line info
- Add also `module` / `no_return` parameters to hg_log_write()
- Remove `HG_ATOMIC_VAR_INIT` (deprecated)
---
<span style="color:lightblue">Added in rc1</span>

- Add `HG_Get_na_protocol_info()` / `HG_Free_na_protocol_info()` and add
`hg_info` utility for basic listing of protocols
- __[HG]__
- Add support for multi-recv operations (OFI plugin only)
- Currently disable multi-recv when auto SM is on
@@ -50,6 +23,8 @@ This version brings bug fixes and updates to our v2.0.0 release.
- Make use of subsys logs (`cls`, `ctx`, `addr`, `rpc`, `poll`) to control
log output
- Add init info struct versioning
- Add `HG_Context_unpost()` / `HG_Core_context_unpost()` for optional
2-step context shutdown
- __[HG bulk]__
- Update to new logging system through `bulk` subsys log.
- __[HG proc]__
@@ -58,23 +33,28 @@ This version brings bug fixes and updates to our v2.0.0 release.
- Refactor tests to separate perf tests from unit tests
- Add NA/HG test common library
- Add `hg_rate` / `hg_bw_write` and `hg_bw_read` perf tests
- Install perf tests if `BUILD_TESTING` is `ON`
- Perf test now supports multi-client / multi-server workloads
- Add `BUILD_TESTING_UNIT` and `BUILD_TESTING_PERF` CMake options
- __[NA]__
- Add support for multi-recv operations
- Add `NA_Msg_multi_recv_unexpected()` and
`na_cb_info_multi_recv_unexpected` cb info
- Add `flags` parameter to `NA_Op_create()` and `NA_Msg_buf_alloc()`
- Add `NA_Has_opt_feature()` to query multi recv capability
- Remove int return type from NA callbacks and return void
- Remove `int` return type from NA callbacks and return `void`
- Remove unused `timeout` parameter from `NA_Trigger()`
- `NA_Addr_free()` / `NA_Mem_handle_free()` and `NA_Op_destroy()` now
return void
- `na_mem_handle_t` and `na_addr_t` types to no longer include pointer type
- Add `NA_PLUGIN_PATH` env variable to optionally control plugin loading
path
- Add `NA_DEFAULT_PLUGIN_PATH` CMake option to control default plugin path
(default is lib install path)
- Add `NA_USE_DYNAMIC_PLUGINS` CMake option (OFF by default)
return `void`
- `na_mem_handle_t` and `na_addr_t` types no longer include pointer type
- Add support for dynamically loaded plugins
- Add `NA_PLUGIN_PATH` env variable to optionally control plugin loading
path (default is `NA_INSTALL_PLUGIN_DIR`)
- Add `NA_INSTALL_PLUGIN_DIR` variable to control plugin install path
(default is lib install path)
- Add `NA_USE_DYNAMIC_PLUGINS` CMake option (OFF by default)
- Add ability to query protocol info from plugins
- Add `NA_Get_protocol_info()`/`NA_Free_protocol_info()` API routines
- Add `na_protocol_info` struct to na_types
- Bump NA library version to 4.0.0
- __[NA OFI]__
- Add support for multi-recv operations and use `FI_MSG`
@@ -94,6 +74,11 @@ This version brings bug fixes and updates to our v2.0.0 release.
- Add support for `tcp` with and without `ofi_rxm`
- `tcp` defaults to `tcp;ofi_rxm` for libfabric < 1.18
- Enable plugin to be built as a dynamic plugin
- Add support for `get_protocol_info` to query list of protocols
- Add support for libfabric log redirection
- Requires libfabric >= 1.16.0, disabled if FI_LOG_LEVEL is set
- Add `libfabric` log subsys (off by default)
- Bump FI_VERSION to 1.13 when log redirection is supported
- __[NA UCX]__
- Attempt to disable UCX backtrace if `UCX_HANDLE_ERRORS` is not set
- Add support for `UCP_EP_PARAM_FIELD_LOCAL_SOCK_ADDR`
@@ -104,101 +89,98 @@ This version brings bug fixes and updates to our v2.0.0 release.
- Attempt to reconnect EP if disconnected
- This concerns cases where a peer would have reappeared after a
previous disconnection
- Add support for `get_protocol_info` to query list of protocols
- Enable plugin to be built as a dynamic plugin
- __[NA Test]__
- Update NA test perf to use multi-recv feature
- Update perf test to use hugepages
- Add support for multi-targets and add lookup test
- Install perf tests if `BUILD_TESTING` is `ON`
- Install perf tests if `BUILD_TESTING_PERF` is `ON`
- __[HG util]__
- Change return type of `hg_time_less()` to be `bool`
- Change return type of `hg_time_less()` to `bool`
- Add `HG_LOG_WRITE_FUNC()` macro to pass func/line info
- Also add `module` / `no_return` parameters to `hg_log_write()`
- Add support for hugepage allocations
- Use `isb` for `cpu_spinwait` on `aarch64`
- Add `mercury_dl` to support dynamically loaded modules
- Bump HG util version to 4.0.0
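The dynamic plugin entries above (`NA_USE_DYNAMIC_PLUGINS`, `NA_INSTALL_PLUGIN_DIR`, `NA_PLUGIN_PATH`) and the `hg_info` utility can be exercised roughly as follows. This is a minimal sketch, not taken from the Mercury documentation: it assumes `NA_INSTALL_PLUGIN_DIR` can be overridden on the CMake command line like any other variable, and `/path/to/mercury`, `/path/to/plugins`, `/path/to/other/plugins` and `./my_app` are placeholders.

```sh
# Build NA plugins as separately loaded modules (NA_USE_DYNAMIC_PLUGINS is OFF by default)
cmake -DNA_USE_DYNAMIC_PLUGINS=ON \
      -DNA_INSTALL_PLUGIN_DIR=/path/to/plugins \
      /path/to/mercury
make && make install

# List the protocols reported by the available plugins
hg_info

# Optionally load plugins from another directory at run time
# (./my_app stands for a hypothetical Mercury application)
NA_PLUGIN_PATH=/path/to/other/plugins ./my_app
```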

## Bug fixes

<span style="color:lightblue">Added in rc5</span>

- __[Examples]__
- Allow examples to build without Boost support
- __[CMake]__
- Fix internal/external dependencies that were not correctly set
- Fix pkg-config entries wrongly set as public/private

<span style="color:lightblue">Added in rc4</span>

- __[NA OFI]__
- Add runtime version check
- Ensure that runtime version is greater than min version
- Replace prov/tcp compile check by runtime check
- __[NA SM]__
- Fix issue where an expected msg that is no longer posted arrives
- In that particular case just drop the incoming msg
---
<span style="color:lightblue">Added in rc3</span>

- __[NA OFI]__
- Log redirection requires libfabric >= 1.16.0
---
<span style="color:lightblue">Added in rc2</span>

- __[HG/NA]__
- Ensure init info version is compatible
- __[NA OFI]__
- Fix handling of extra caps to not always follow advertised caps
- Pass `FI_COMPLETION` to RMA ops as flag is currently not ignored
(`prov/opx` tmp fix)
- __[CMake]__
- Ensure `VERSION`/`SOVERSION` is not set on `MODULE` libraries
- Allow for in-source builds (RPM support)
- Add missing `DL` lib dependency
- Fix object target linking on CMake < 3.12
- Ensure we build with PIC and PIE when available
---
<span style="color:lightblue">Added in rc1</span>

- __[HG]__
- Ensure init info version is compatible with previous versions of the struct
- Clean up and refactoring fixes
- Fix race condition in `hg_core_forward` with debug enabled
- Simplify RPC map and fix hashing for RPC IDs larger than 32-bit integer
- Refactor context pools and cleanup
- Fix potential leak on ack buffer
- Ensure list of created RPC handles is empty before closing context
- Bump pre-allocated requests to 512 to make use of 2M hugepages
- Bump default number of pre-allocated requests from 256 to 512 to make use
of 2M hugepages by default
- Add extra error checking to prevent class mismatch
- Fix potential race when sending one-way RPCs to ourselves
- __[HG Bulk]__
- Add extra error checking to prevent class mismatch
- __[HG Test]__
- Refactor `test_rpc` to correctly handle timeout return values
- Fix overflow of number of target / classes
- Number of targets was limited to `UINT8_MAX`
- __[NA OFI]__
- Fix handling of extra caps to not always follow advertised caps
- Ensure also that extra caps passed are honored by provider
- Force `sockets` provider to use shared domains
- This prevents a performance regression when multiple classes are
being used (`FI_THREAD_DOMAIN` is therefore disabled for this provider)
- Refactor unexpected and expected sends, retry of OFI operations, handling
of RMA operations
- Always include `FI_DIRECTED_RECV` in primary caps
- Disable use of `FI_SOURCE` for most providers to reduce lookup overhead
- Separate code paths for providers that do not support `FI_SOURCE`
- Remove insert of FI addr into secondary table if `FI_SOURCE` is
not used
- Remove `NA_OFI_SOURCE_MSG` flag that was matching `FI_SOURCE_ERR`
- Fix potential refcount race when sharing domains
- Check domain's optimal MR count if non-zero
- Fix potential double free of src_addr info
- Refactor auth key parsing code to build without extension headers
- Merge latest changes required for `opx` provider enablement
- Pass `FI_COMPLETION` to RMA ops as flag is currently not ignored
(`prov/opx` tmp fix)
- Add runtime version check
- Ensure that runtime version is greater than min version
- __[NA SM]__
- Fix handling of 0-size messages when no receive has been posted
- Fix handling of 0-size messages when no receive has been posted
- Fix issue where an expected msg that is no longer posted arrives
- In that particular case just drop the incoming msg
- Add perf warning message for unexpected messages without recv posted
- __[NA UCX]__
- Fix handling of UCS return types to match NA types
- Enforce src_addr port used for connections to be 0
- This fixes a port conflict between listener and connection ports
- Fix handling of unexpected messages without pre-posted recv
- __[NA BMI]__
- Clean up and fix some coverity warnings
- __[NA MPI]__
- Clean up and fix some coverity warnings
- __[NA Test]__
- Fix NA latency test to ensure recvs are always pre-posted
- Do not use MPI_Init_thread() if not needed
- Fix missing return check of na_test_mpi_init()
- __[HG util]__
- Clean up logging and set log root to `hg_all`
- `hg_all` subsys can now be set to turn on logging in all subsystems
- Set log subsys to `hg_all` if log level env is set
- Fixes to support WIN32 builds
- __[CMake]__
- Fix internal/external dependencies that were not correctly set
- Fix pkg-config entries wrongly set as public/private
- Ensure `VERSION`/`SOVERSION` is not set on `MODULE` libraries
- Allow for in-source builds (RPM support)
- Add `DL` lib dependency
- Fix object target linking on CMake < 3.12
- Ensure we build with PIC and PIE when available
- __[Examples]__
- Allow examples to build without Boost support

## :warning: Known Issues

README.md: 46 changes (19 additions, 27 deletions)
@@ -26,17 +26,16 @@ Architectures supported
Architectures supported by MPI implementations are generally supported by the
network abstraction layer.

The OFI libfabric plugin as well as the SM plugin
are stable and provide the best performance in most workloads. Libfabric
providers currently supported are: `tcp`, `verbs`, `gni`, `cxi`.
The OFI libfabric plugin as well as the shared-memory (SM) plugin
are stable and provide the best performance in most workloads.

The UCX plugin is also available as an alternative transport on platforms
for which libfabric is either not available or not recommended to use,
currently supported protocols are tcp and verbs.
for which libfabric is either not available or not recommended.

MPI and BMI (tcp) plugins are still supported but gradually being moved as
deprecated, therefore should only be used as fallback methods.
The CCI plugin is deprecated and no longer supported.
For both OFI and UCX plugins, please run the `hg_info` command for a list of
available transports on the system.

MPI, CCI and BMI plugins are deprecated and no longer supported.

See the [plugin requirements](#plugin-requirements) section for
plugin requirement details.
@@ -63,26 +62,13 @@ instructions available on this [page][libfabric].
To make use of the UCX plugin, please refer to the UCX build
instructions available on this [page][ucx].

To make use of the native NA SM (shared-memory) plugin on Linux,
To make use of the native NA shared-memory (SM) plugin on Linux,
the cross-memory attach (CMA) feature introduced in kernel v3.2 is required.
The yama security module must also be configured to allow remote process memory
to be accessed (see this [page][yama]). On macOS, code signing with inclusion of
the na_sm.plist file into the binary is currently required to allow process
memory to be accessed.
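On Linux systems where the Yama module is active, its current setting can be inspected and, if necessary, relaxed as sketched below. These are standard Linux sysctl commands rather than anything Mercury-specific; the assumption here is that the classic mode `0` is sufficient for the SM plugin, and note that it loosens ptrace restrictions for the whole system.

```sh
# Only present when the Yama LSM is enabled in the kernel
cat /proc/sys/kernel/yama/ptrace_scope
# 0: classic ptrace permissions (processes of the same user may access each other)
# 1: restricted (by default, only descendants), 2: admin-only, 3: no attach

# Switch to classic permissions until the next reboot (requires root)
sudo sysctl -w kernel.yama.ptrace_scope=0
```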

To make use of the BMI plugin, the most convenient way is to install it through
spack or one can also do:

git clone https://github.com/radix-io/bmi.git && cd bmi
./prepare && ./configure --enable-shared --enable-bmi-only
make && make install

To make use of the MPI plugin, Mercury requires a _well-configured_ MPI
implementation (MPICH2 v1.4.1 or higher / OpenMPI v1.6 or higher) with
`MPI_THREAD_MULTIPLE` available on targets that will accept remote
connections. Processes that are _not_ accepting incoming connections are
_not_ required to have a multithreaded level of execution.

Optional requirements
---------------------

@@ -124,15 +110,18 @@ Type `'c'` multiple times and choose suitable options. Recommended options are:
BUILD_SHARED_LIBS ON (or OFF if the library you link
against requires static libraries)
BUILD_TESTING ON/OFF
BUILD_TESTING_PERF ON/OFF
BUILD_TESTING_UNIT ON/OFF
Boost_INCLUDE_DIR /path/to/include/directory
CMAKE_INSTALL_PREFIX /path/to/install/directory
MERCURY_ENABLE_DEBUG ON/OFF
MERCURY_TESTING_ENABLE_PARALLEL ON/OFF
MERCURY_USE_BOOST_PP ON
MERCURY_USE_BOOST_PP ON/OFF
MERCURY_USE_CHECKSUMS ON/OFF
MERCURY_USE_SYSTEM_BOOST ON/OFF
MERCURY_USE_SYSTEM_MCHECKSUM ON/OFF
MERCURY_USE_XDR OFF
MERCURY_USE_XDR ON/OFF
NA_USE_DYNAMIC_PLUGINS ON/OFF
NA_USE_BMI ON/OFF
NA_USE_MPI ON/OFF
NA_USE_OFI ON/OFF
@@ -163,12 +152,15 @@ from the build directory:

make install

If an install `RPATH` is not wanted, ensure that `CMAKE_SKIP_INSTALL_RPATH` was
set to `ON` when the project was configured with CMake.
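A minimal sketch of such a configuration, using the standard CMake variable `CMAKE_SKIP_INSTALL_RPATH`; `/path/to/mercury` and `/path/to/install/directory` are placeholders:

```sh
# Configure without embedding an install RPATH, then build and install
cmake -DCMAKE_SKIP_INSTALL_RPATH=ON \
      -DCMAKE_INSTALL_PREFIX=/path/to/install/directory \
      /path/to/mercury
make
make install
```

Without an RPATH, the installed libraries must be found at run time by other means, for example by adding the install library directory to `LD_LIBRARY_PATH` (the exact `lib` subdirectory name depends on the platform and is assumed here).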

Testing
=======

Tests can be run to check that basic RPC functionality (requests and bulk
data transfers) is properly working. CTest is used to run the tests,
simply run from the build directory:
data transfers) is properly working. With `BUILD_TESTING_UNIT` set to `ON`,
CTest is used to run the tests; simply run from the build directory:

ctest .
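Put together, enabling and running the unit tests might look like the sketch below. It assumes that `BUILD_TESTING_UNIT` only takes effect when `BUILD_TESTING` is also `ON`; `/path/to/mercury` is a placeholder for the source directory.

```sh
# Enable the unit tests at configure time, then build them
cmake -DBUILD_TESTING=ON -DBUILD_TESTING_UNIT=ON /path/to/mercury
make

# Run the tests from the build directory, then again with extra verbose output
ctest
ctest -VV
```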

@@ -184,7 +176,7 @@ Extra verbose information can be displayed by inserting `-VV`. E.g.:

Some tests run with one server process and X client processes. To change the
number of client processes that are being used, the `MPIEXEC_MAX_NUMPROCS`
variable needs to be modified (toggle to advanced mode if you do not see
variable may need to be modified (toggle to advanced mode if you do not see
it). The default value is automatically detected by CMake based on the number
of cores that are available.
Note that you need to run `make` again after the makefile generation
2 changes: 1 addition & 1 deletion version.txt
@@ -1 +1 @@
2.3.0rc5
2.3.0
