@juntyr

@juntyr juntyr commented Apr 24, 2025

I'm opening this draft early so that I can get some help with building libpressio.

The main idea is to update this repo in preparation for publishing to crates.io. The most important step is to ensure that libpressio can be built automatically by Rust. To start, we should aim to support a minimal build with a few statically linked compressors. Once that works, we can add Rust features corresponding to the other compressors we want to support building.
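As a rough sketch of the feature idea above: Cargo exposes every enabled feature to a build script as a `CARGO_FEATURE_<NAME>` environment variable, so the build script could translate features into CMake options. The feature names and the CMake option names below are assumptions for illustration and have not been checked against LibPressio's CMakeLists.txt.

```rust
use std::collections::BTreeMap;

/// Hypothetical mapping from Cargo feature names to CMake options.
/// Both columns are assumptions made for this sketch.
const FEATURE_TO_CMAKE: &[(&str, &str)] = &[
    ("sz3", "LIBPRESSIO_HAS_SZ3"),
    ("zfp", "LIBPRESSIO_HAS_ZFP"),
];

/// Cargo passes an enabled feature `foo-bar` to build scripts as the
/// environment variable `CARGO_FEATURE_FOO_BAR`.
fn feature_env_var(feature: &str) -> String {
    format!("CARGO_FEATURE_{}", feature.to_uppercase().replace('-', "_"))
}

/// Returns an ON/OFF value for every known compressor option.
/// `is_set` reports whether an environment variable is present, which
/// keeps the function testable without touching the real environment.
fn cmake_defines<F>(is_set: F) -> BTreeMap<&'static str, &'static str>
where
    F: Fn(&str) -> bool,
{
    FEATURE_TO_CMAKE
        .iter()
        .map(|(feature, option)| {
            let enabled = is_set(&feature_env_var(feature));
            (*option, if enabled { "ON" } else { "OFF" })
        })
        .collect()
}
```

In a build script this could be called as `cmake_defines(|var| std::env::var_os(var).is_some())`, with each resulting pair forwarded to the CMake configure step.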

Currently the build doesn't work yet because:

  • std_compat is not available - we might need to vendor and build it as well
  • bindgen needs to be run after cmake so that the version file is configured

@robertu94 Perhaps you could help with including std_compat in this build and configuring cmake for a minimal build?

Once we've got libpressio-sys working, I'd also like to make a quick pass over the higher-level wrapper.

  • Add a CI step once it builds locally

@robertu94 robertu94 self-assigned this Apr 24, 2025
@juntyr
Author

juntyr commented Apr 24, 2025

@robertu94 Fantastic, thank you!

@juntyr
Author

juntyr commented Apr 24, 2025

I can either take over again, or you can complete the PR (whatever you prefer). For CI, we could use something similar to my setup for tthresh-rs: https://github.com/juntyr/tthresh-rs/tree/main/.github/workflows

@robertu94
Owner

I just built a minimum viable LibPressio. This has a few known limitations:

  1. It requires a C++17 compiler. We could push backwards compatibility back to C++11, but this requires Boost for boost::optional and boost::variant to polyfill std::optional and std::variant. Getting to C++11 is probably unnecessary, and we can likely stop at C++14, which covers all major supported distros.
  2. I'm not sure this will compile with macOS's clang, known in the C++ community as AppleClang, because it doesn't support OpenMP. Disabling OpenMP has significant performance implications for several compressors.
  3. This has just what LibPressio calls the "core" libraries and doesn't include support for SZ, ZFP, etc., which we would obviously want. We can continue to build this up by vendoring more and more things, but I want this to be 1) maintainable for me and 2) possible to override with a local copy of LibPressio for development.
    1. Regarding maintainability, with 50+ packages in the compression ecosystem, I don't want to maintain compatibility tables for 30+ packages, and some of these packages are gnarly to compile (e.g. PETSc, a distributed matrix math library).
    2. Regarding local development, I suspect there is a recommended way to handle this, but I don't know the Rust side as well.
  4. I suspect we're going to run into some problems with CMake as we do this, specifically around finding the appropriate prefix paths for other libraries on all platforms. For example, right now I hardcode "lib64" as the library path for libpressio-sys because that is what it will be on most machines. Maybe this gets better with CMake 4.1, which will include the Common Package Specification.
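For the hardcoded "lib64" path in point 4, one possible workaround is to probe the install prefix for whichever library directory actually exists and emit a link-search directive for each. This is a minimal sketch, not the build script in this PR. The candidate directory list and the library name passed in are assumptions.

```rust
use std::path::{Path, PathBuf};

/// Candidate library directory names relative to the CMake install prefix.
/// This list is an assumption; some platforms use other names.
const LIB_DIR_CANDIDATES: &[&str] = &["lib", "lib64"];

/// Returns the candidate library directories that exist under `prefix`.
fn existing_lib_dirs(prefix: &Path) -> Vec<PathBuf> {
    LIB_DIR_CANDIDATES
        .iter()
        .map(|name| prefix.join(name))
        .filter(|dir| dir.is_dir())
        .collect()
}

/// Builds the lines a build script would print so that Cargo passes the
/// matching search paths and the static library to the linker.
fn link_directives(prefix: &Path, lib_name: &str) -> Vec<String> {
    let mut lines: Vec<String> = existing_lib_dirs(prefix)
        .iter()
        .map(|dir| format!("cargo:rustc-link-search=native={}", dir.display()))
        .collect();
    lines.push(format!("cargo:rustc-link-lib=static={}", lib_name));
    lines
}
```

A build script would print each returned line with `println!`. If the project installs through CMake's GNUInstallDirs module, passing `-DCMAKE_INSTALL_LIBDIR=lib` at configure time may be a simpler way to pin the directory name.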

@robertu94
Owner

> I can either take over again, or you can complete the PR (whatever you prefer). For CI, we could use something similar to my setup for tthresh-rs: https://github.com/juntyr/tthresh-rs/tree/main/.github/workflows

I can add the CI part; just breaking up the work into separate commits :)

I would also appreciate your thoughts on the scoping limitations and which ones are important to solve.

@juntyr
Author

juntyr commented Apr 24, 2025

> I also would appreciate your thoughts on the scoping limitations and which ones are important to solve

I'll quickly draft a minimal PR in numcodecs-rs that doesn't wrap anything yet but checks whether we can build with this PR. That should tell us whether there are any problems we have to fix. Bundling only the core compressors makes sense. We can add a README line saying that dynamic libraries for the other compressors can be built manually using a normal libpressio installation.

@juntyr
Author

juntyr commented Apr 24, 2025

Currently I'm getting the following error on juntyr/numcodecs-rs#23:

error: could not find native static library `libpressio`, perhaps an -L flag is missing?

error: could not compile `libpressio-sys` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
error[E0308]: arguments to this function are incorrect
    --> /home/codespace/.cargo/git/checkouts/libpressio-rs-77241dfd574264ad/ddceba6/src/lib.rs:204:13
     |
204  |             libpressio_sys::pressio_data_new_empty(dtype, dim_arr.len() as u64, dim_arr.as_ptr())
     |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^        -------------------- expected `usize`, found `u64`
     |
note: expected `*const usize`, found `*const u64`
    --> /home/codespace/.cargo/git/checkouts/libpressio-rs-77241dfd574264ad/ddceba6/src/lib.rs:204:81
     |
204  |             libpressio_sys::pressio_data_new_empty(dtype, dim_arr.len() as u64, dim_arr.as_ptr())
     |                                                                                 ^^^^^^^^^^^^^^^^
     = note: expected raw pointer `*const usize`
                found raw pointer `*const u64`
note: function defined here
    --> /workspaces/numcodecs-rs/target/debug/build/libpressio-sys-b909809dad052c49/out/bindings.rs:1351:12
     |
1351 |     pub fn pressio_data_new_empty(
     |            ^^^^^^^^^^^^^^^^^^^^^^
help: you can convert a `u64` to a `usize` and panic if the converted value doesn't fit
     |
204  |             libpressio_sys::pressio_data_new_empty(dtype, (dim_arr.len() as u64).try_into().unwrap(), dim_arr.as_ptr())
     |                                                           +                    +++++++++++++++++++++

error[E0308]: arguments to this function are incorrect
    --> /home/codespace/.cargo/git/checkouts/libpressio-rs-77241dfd574264ad/ddceba6/src/lib.rs:219:13
     |
219  |             libpressio_sys::pressio_data_new_copy(
     |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
222  |                 input_array.ndim() as u64,
     |                 ------------------------- expected `usize`, found `u64`
     |
note: expected `*const usize`, found `*const u64`
    --> /home/codespace/.cargo/git/checkouts/libpressio-rs-77241dfd574264ad/ddceba6/src/lib.rs:223:17
     |
223  |                 input_array.shape().as_ptr() as *const u64,
     |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     = note: expected raw pointer `*const usize`
                found raw pointer `*const u64`
note: function defined here
    --> /workspaces/numcodecs-rs/target/debug/build/libpressio-sys-b909809dad052c49/out/bindings.rs:1334:12
     |
1334 |     pub fn pressio_data_new_copy(
     |            ^^^^^^^^^^^^^^^^^^^^^
help: you can convert a `u64` to a `usize` and panic if the converted value doesn't fit
     |
222  |                 (input_array.ndim() as u64).try_into().unwrap(),
     |                 +                         +++++++++++++++++++++

error[E0308]: mismatched types
   --> /home/codespace/.cargo/git/checkouts/libpressio-rs-77241dfd574264ad/ddceba6/src/lib.rs:341:25
    |
338 |                     libpressio_sys::pressio_options_set_strings(
    |                     ------------------------------------------- arguments to this function are incorrect
...
341 |                         option_value_cptr.len() as u64,
    |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `usize`, found `u64`
    |
note: function defined here
   --> /workspaces/numcodecs-rs/target/debug/build/libpressio-sys-b909809dad052c49/out/bindings.rs:945:12
    |
945 |     pub fn pressio_options_set_strings(
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: you can convert a `u64` to a `usize` and panic if the converted value doesn't fit
    |
341 |                         (option_value_cptr.len() as u64).try_into().unwrap(),
    |                         +                              +++++++++++++++++++++

For more information about this error, try `rustc --explain E0308`.
error: could not compile `libpressio` (lib) due to 3 previous errors

So presumably you updated libpressio to size_t in the meantime, and somehow we're not emitting the correct link path in the build script yet.
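The type errors above come from casting lengths and pointers to `u64` where the generated bindings now expect `usize`. A minimal sketch of one way to handle this follows, using a mock function in place of the real binding. `mock_data_new_empty` is hypothetical and only mirrors the parameter types reported in the compiler output.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;
use std::slice;

/// Hypothetical stand-in for a bindgen-generated declaration. It only
/// mirrors the `usize` / `*const usize` parameters from the error above
/// and returns the product of the dimensions so the call is observable.
unsafe extern "C" fn mock_data_new_empty(
    _dtype: u32,
    num_dims: usize,
    dims: *const usize,
) -> usize {
    let dims = unsafe { slice::from_raw_parts(dims, num_dims) };
    dims.iter().product()
}

/// Converts dimensions stored as `u64` into `usize`, returning an error
/// on overflow instead of silently truncating on 32-bit targets.
fn dims_to_usize(dims: &[u64]) -> Result<Vec<usize>, TryFromIntError> {
    dims.iter().map(|&d| usize::try_from(d)).collect()
}

/// Calls the mock binding with correctly typed arguments.
fn new_empty(dtype: u32, dims: &[u64]) -> Result<usize, TryFromIntError> {
    let dims = dims_to_usize(dims)?;
    // `len()` already returns `usize`, so no `as u64` cast is needed.
    Ok(unsafe { mock_data_new_empty(dtype, dims.len(), dims.as_ptr()) })
}
```

Keeping the dimensions as `usize` from the start avoids the conversion entirely. The explicit `try_from` is only useful where the dimensions arrive as `u64`.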

@juntyr
Author

juntyr commented Apr 25, 2025

@robertu94 I made some progress and got the latest ref of this PR to build locally and on CI.

Next, I tried to build it for WASM with numcodecs-wasm-builder (check out juntyr/numcodecs-rs#23):

cargo run --bin numcodecs-wasm-builder -- --crate numcodecs-pressio --version 0.1.0 --codec IdentityCodec --output pressio.wasm --path MY-LOCAl-PATH/numcodecs-rs/codecs/pressio

Now we get an error when building std_compat. I think this is because its build doesn't recognize WASI, enables STDCOMPAT_BOOST_REQUIRED=TRUE, and then fails with:

Platform/wasi to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
  CMake Error at /nix/store/35h5x40slmq4vp8glkg3b2mi309b12c9-cmake-3.30.5/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
    Could NOT find Boost (missing: Boost_INCLUDE_DIR thread)
  Call Stack (most recent call first):
    /nix/store/35h5x40slmq4vp8glkg3b2mi309b12c9-cmake-3.30.5/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
    /nix/store/35h5x40slmq4vp8glkg3b2mi309b12c9-cmake-3.30.5/share/cmake-3.30/Modules/FindBoost.cmake:2409 (find_package_handle_standard_args)
    CMakeLists.txt:158 (find_package)

@juntyr
Author

juntyr commented Apr 25, 2025

The cmake output includes:

  -- The CXX compiler identification is Clang 19.1.7
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /nix/store/l62rpwmzknjqkfc75pkrhijinlnc2fkg-clang-19.1.7/bin/clang++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Performing Test HAVE_NO_OMIT_FRAME_POINTER
  -- Performing Test HAVE_NO_OMIT_FRAME_POINTER - Success
  -- Checking for lerp: FALSE
  -- Checking for transform_reduce: FALSE
  -- Checking for exclusive_scan: FALSE
  -- Checking for exchange: TRUE
  -- Checking for rbeginend: FALSE
  -- Checking for optional: TRUE
  -- Checking for variant: FALSE
  -- Checking for make_unique: TRUE
  -- Checking for conjunction: TRUE
  -- Checking for multiplies: TRUE
  -- Checking for midpoint: FALSE
  -- Checking for nth_element: FALSE
  -- Checking for span: FALSE
  -- Checking for void_t: TRUE
  -- Checking for negation: TRUE
  -- Checking for clamp: TRUE
  -- Checking for string_view: FALSE
  -- Checking for is_null_pointer: TRUE
  -- Checking for four_arg_equals: FALSE
  -- Checking for endian: FALSE
  -- Checking for shared_mutex: FALSE
  -- Checking for mutex: FALSE
  -- Checking for cbeginend: FALSE
  -- Checking for byteswap: FALSE
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  -- Looking for pthread_create in pthreads
  -- Looking for pthread_create in pthreads - not found
  -- Looking for pthread_create in pthread
  -- Looking for pthread_create in pthread - not found
  -- Check if compiler accepts -pthread
  -- Check if compiler accepts -pthread - no
  -- Could NOT find Threads (missing: Threads_FOUND)
  -- Configuring incomplete, errors occurred!

so my guess is that you require some newer C++ features that the wasi-sdk doesn't include.

@juntyr
Author

juntyr commented Apr 25, 2025

If there's no way around using Boost, Pyodide has a working build for it at https://github.com/pyodide/pyodide/blob/main/packages/boost-cpp/meta.yaml, so we could try to build on that.

+ PERF: modified libpressio-sys/build.rs to avoid spurious rebuilds caused by
  generated files (e.g. pressio_version.h) that are always considered
  fresh
+ FIX: handle u64 vs usize differences better on macOS, where these are
  different types, by ensuring that the correct type (usize) is used
+ FIX: silence spurious warnings about non-standard naming conventions
  in the low-level bindgen-generated Rust code that calls libpressio
+ BUG: the mechanism that libpressio uses for dynamic module linkage does
  not work in static Rust builds because the symbols are stripped from the
  resulting binary. This needs to be resolved by using an explicit
  registration pattern
@robertu94
Owner

The problems are std::shared_mutex and std::variant. The former makes sense, since threading on WASI is limited. It is also easier to work around, as it is only used in 3 optional places that we are not using right now. I could add a disable-threading configuration option to libstdcompat that removes this dependency, and then check for it in LibPressio's CMake to disable the 4 files that need it.

The latter does not make sense and is much harder to fix. Do you know which libstdc++ or libc++ is being used, or can you give me a command to inspect the sysroot so I can check myself? I believe std::variant is part of freestanding and should have been included since clang/libc++ 4, so it should compile on WASI.
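If a disable-threading option like the one proposed above were added, the Rust build script could switch it on based on the target. Cargo passes the target OS to build scripts as `CARGO_CFG_TARGET_OS`. The sketch below assumes a hypothetical option name, `STDCOMPAT_DISABLE_THREADS`, which does not exist yet, and it treats every `wasi` target as thread-less, which is a simplification.

```rust
/// Hypothetical CMake option name. The comment above only proposes
/// adding such an option, so this name is an assumption.
const DISABLE_THREADS_OPTION: &str = "STDCOMPAT_DISABLE_THREADS";

/// Returns the CMake define to pass for targets assumed to lack threads.
/// `target_os` is the value Cargo provides in `CARGO_CFG_TARGET_OS`.
fn threading_define(target_os: &str) -> Option<(&'static str, &'static str)> {
    if target_os == "wasi" {
        Some((DISABLE_THREADS_OPTION, "ON"))
    } else {
        None
    }
}

/// Formats the define as a `-D` argument for the CMake command line.
fn threading_flag(target_os: &str) -> Option<String> {
    threading_define(target_os).map(|(key, value)| format!("-D{}={}", key, value))
}
```

A build script could read the target with `std::env::var("CARGO_CFG_TARGET_OS")` and append the returned flag to its CMake arguments when it is `Some`.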

@juntyr
Author

juntyr commented Apr 25, 2025

WASM is usually quite good at hiding its lack of threads until you try to spawn one, so even the std::shared_mutex should be fine. I think it's a case of an outdated libc++.

Ok, it's weird ...

When you run the numcodecs-wasm-builder, it spits out all the env variables it passes to the eventual cargo build. In it, I get

"-DCMAKE_CXX_FLAGS= --target=wasm32-wasip1 -nodefaultlibs -resource-dir /nix/store/gj6ghd75ki5461spdpqy5vkr10wjdw7m-clang-19.1.7-lib/lib/clang/19 --sysroot=/nix/store/ki1m25h4iqsqv1vgm874br77hwfz13wf-wasi-sysroot-25.0 -isystem /nix/store/ki1m25h4iqsqv1vgm874br77hwfz13wf-wasi-sysroot-25.0/include/wasm32-wasip1/c++/v1 -isystem /nix/store/ki1m25h4iqsqv1vgm874br77hwfz13wf-wasi-sysroot-25.0/include/c++/v1 -isystem /nix/store/gj6ghd75ki5461spdpqy5vkr10wjdw7m-clang-19.1.7-lib/lib/clang/19/include -isystem /nix/store/ki1m25h4iqsqv1vgm874br77hwfz13wf-wasi-sysroot-25.0/include/wasm32-wasip1 -isystem /nix/store/ki1m25h4iqsqv1vgm874br77hwfz13wf-wasi-sysroot-25.0/include -B /nix/store/i0axrg7hv4dwpf3k8r010y6yzvsh96w2-lld-19.1.7/bin -D_WASI_EMULATED_PROCESS_CLOCKS -include /Users/junityre/numcodecs-rs/target/debug/build/scratch-c87039d66226dffc/out/numcodecs-wasm-builder-0.1.0+wasi0.2.3/numcodecs-pressio-wasm-0.1.0/include.hpp"

When I then inspect /nix/store/ki1m25h4iqsqv1vgm874br77hwfz13wf-wasi-sysroot-25.0/include/wasm32-wasip1/c++/v1, it contains the files named variant and shared_mutex so they seem to be there ...
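The manual check described above can be scripted. The following sketch extracts the `-isystem` directories from a flag string like the one printed by the builder and reports where a given header lives. It splits on whitespace, so it assumes paths without spaces. Finding the header file does not prove that the feature check compiles, since the header's contents may still be disabled by configuration macros.

```rust
use std::path::PathBuf;

/// Extracts the directories that follow `-isystem` in a
/// whitespace-separated compiler flag string.
fn isystem_dirs(flags: &str) -> Vec<PathBuf> {
    let mut dirs = Vec::new();
    let mut tokens = flags.split_whitespace();
    while let Some(token) = tokens.next() {
        if token == "-isystem" {
            // The directory is the token right after the flag.
            if let Some(dir) = tokens.next() {
                dirs.push(PathBuf::from(dir));
            }
        }
    }
    dirs
}

/// Returns the path of `header` in the first include directory that
/// contains it, searching in the order the compiler would.
fn find_header(dirs: &[PathBuf], header: &str) -> Option<PathBuf> {
    dirs.iter()
        .map(|dir| dir.join(header))
        .find(|path| path.is_file())
}
```

Calling `find_header(&isystem_dirs(flags), "variant")` and the same for `"shared_mutex"` would show which of the listed include directories provides each header, if any.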

@robertu94
Owner

OK, I'll look further into why it failed in the libstdcompat check and try a fix there.

Robert

@robertu94
Owner

I wanted to provide an update on this. I've been working on a series of modifications to the libpressio cmake code to dynamically generate a "statics export" function to ensure that the linker does not eliminate the code used to register the compressors. I'll be pushing an update to robertu94/libpressio when that is completed.
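To illustrate the general idea of explicit registration, here is a small Rust sketch. It is not the CMake-generated "statics export" function itself, which lives on the C++ side. Every name in it is hypothetical. The point is that a single function references each constructor directly, so the linker has a reason to keep them, unlike registration that relies on static initializers in otherwise unreferenced objects.

```rust
use std::collections::BTreeMap;

/// Minimal compressor interface for the sketch (hypothetical).
trait Compressor {
    fn name(&self) -> &'static str;
}

struct Noop;
impl Compressor for Noop {
    fn name(&self) -> &'static str { "noop" }
}

struct Identity;
impl Compressor for Identity {
    fn name(&self) -> &'static str { "identity" }
}

type Constructor = fn() -> Box<dyn Compressor>;

/// Maps compressor names to constructor functions.
struct Registry {
    constructors: BTreeMap<&'static str, Constructor>,
}

impl Registry {
    fn new() -> Self {
        Registry { constructors: BTreeMap::new() }
    }

    fn register(&mut self, name: &'static str, ctor: Constructor) {
        self.constructors.insert(name, ctor);
    }

    fn create(&self, name: &str) -> Option<Box<dyn Compressor>> {
        self.constructors.get(name).map(|ctor| ctor())
    }
}

fn new_noop() -> Box<dyn Compressor> { Box::new(Noop) }
fn new_identity() -> Box<dyn Compressor> { Box::new(Identity) }

/// Explicit registration: each constructor is named here, so it stays
/// reachable from code that the final binary actually calls.
fn register_all(registry: &mut Registry) {
    registry.register("noop", new_noop);
    registry.register("identity", new_identity);
}
```

The caller runs `register_all` once at startup and then looks compressors up by name with `create`.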

@juntyr
Author

juntyr commented May 5, 2025

Thank you for the update!
