Modernize the build system #2428

@ihincks

This issue proposes a series of steps to modernize the build system of qiskit-aer. I am not an expert at C++ build systems, so this proposal needs review and criticism from people who are, both on the plan itself and on the resulting PRs. I am volunteering to do the legwork and coordinate the effort via Claude 4.7, with the understanding that my time is limited and several other things are higher priority for me, so progress on this issue will be deliberate, perhaps slow, for as long as I am assigned.

Motivation

The recent windows-latest → windows-2022 pin (#2426) is a workaround for VS 2025 incompatibility, not a fix. The runner image will eventually be retired (as windows-2019 was), and the underlying problems are the actual cause:

  • Conan v1 is no longer maintained. Conan's own docs label it "legacy, no longer recommended", and Conan Center stopped accepting v1 packages in late 2024. It does not know about apple-clang 17 (hence the sed ~/.conan/settings.yml hack duplicated in three CI locations) and is a poor fit for newer MSVC toolchains. The vendored cmake/conan.cmake is unmaintained (JFrog 2018 vintage).
  • scikit-build (legacy) is no longer the recommended Python+CMake backend; scikit-build-core is. The current setup.py + pyproject.toml split is internally inconsistent (python_requires=">=3.7" vs classifiers 3.10+).
  • pybind11 patterns in aer_circuit_binding.hpp (py::init([]{ return new T(...); }) and py::enum_ with arithmetic/export_values) trigger ambiguous template instantiation on gcc 14, MSVC from VS 2025, and CUDA 12.9. Addressed by @wshanks's PRs #2416 (Fix pybind11 init template errors) and #2417 (Switch from pybind11 enum_ to native_enum).
  • nlohmann_json is pinned to 3.1.1 (2018). Newer versions (3.10.3+) need explicit JSON conversion changes; see @wshanks's PR #2418 (Add explicit json conversions).
  • CMake minimum is 3.8; scikit-build-core requires 3.15+ and FetchContent FIND_PACKAGE_ARGS needs 3.24.
  • CI inconsistencies: windows-latest in deploy.yml vs windows-2022 elsewhere; mixed cibuildwheel versions (v3.2.1 vs v2.19.2); stale actions/checkout@v3; vestigial actions-rs/toolchain (no Rust here); pip cache keys hash setup.py not pyproject.toml; docs.yml has a Windows pip cache path on a Linux runner.

The intended end state is a build that:

  1. Works on windows-latest (VS 2025) and macos-latest (Apple Silicon, clang 17+) without runner pins or sed hacks.
  2. Compiles against modern toolchains (gcc 14, MSVC 19.40+, CUDA 12.9+, current ROCm).
  3. Uses maintained tooling (scikit-build-core, CMake FetchContent).
  4. Has internally consistent CI.

Step 0 — Merge @wshanks's prereq PRs

Each is one commit; the steps below assume them merged. Order is independent (they touch disjoint files); listed smallest-scope-first.

  • PR #2416, Fix pybind11 init template errors (42217e8). Replaces py::init([](args){ return new T(args); }) with py::init<args...>() for the 8 constructors in qiskit_aer/backends/wrappers/aer_circuit_binding.hpp. Required before any toolchain bump that triggers the ambiguous-template error (gcc 14, CUDA 12.9, recent MSVC). As a side effect, should also resolve #2405 (Build fails for CUDA 12.9).
  • PR #2417, Switch from pybind11 enum_ to native_enum (4ca3f2c). Replaces py::enum_ with py::native_enum for AerUnaryOp and AerBinaryOp. Bumps the minimum pybind11 to 3.0, which the rest of this work inherits.
  • PR #2418, Add explicit json conversions (13b01f8). Adds explicit std::to_json calls in pybind_json.hpp and the noise-model load path. Required before bumping nlohmann_json past 3.10.3.

Step 1 — Drop Conan; vendor deps via CMake FetchContent

Replace Conan v1 with FetchContent for the small set of deps Aer actually consumes. Keep a system-libs path for downstream packagers (conda-forge etc.).

Why FetchContent rather than upgrading to Conan 2?

The same migration cost applies either way (rewriting recipes/profiles for Conan 2, or FetchContent_Declare calls), but FetchContent is the better fit here:

  • Aer's dependency set is small and mostly header-only. nlohmann_json, spdlog, and Thrust are header-only; Catch2 is test-only. The value a package manager adds — binary caching, transitive dep resolution, multi-project version pinning — doesn't apply.
  • FetchContent ships with CMake. No extra build dependency, no extra tool for contributors to install or learn, no profile files to maintain, no ~/.conan/settings.yml for CI to monkey-patch. Toolchain compatibility (apple-clang 17, new MSVC) becomes CMake's problem, which is actively maintained, rather than ours via Conan profiles.
  • FIND_PACKAGE_ARGS gives packagers what they want. With CMake 3.24+, FetchContent_Declare(... FIND_PACKAGE_ARGS ...) calls find_package first and only fetches if the system version is missing or too old. This is strictly simpler for conda-forge and distros than the current DISABLE_CONAN flag, and it's the same UX as AER_USE_SYSTEM_LIBS=ON in this proposal.
  • It composes cleanly with scikit-build-core (Step 2). Conan inside a PEP 517 build backend is doable but awkward; FetchContent is just CMake.
  • Conan 2 doesn't fix the underlying problem. The apple-clang 17 issue isn't that Conan v1 is buggy — it's that Conan needs per-toolchain knowledge (settings/profiles) that someone has to maintain. Switching to Conan 2 moves the problem; switching to FetchContent removes it.

The tradeoff is no binary caching: a clean source build re-downloads (and for Catch2 re-compiles) the deps. CI already pip-caches the build environment.

Files modified

  • Rewrite cmake/dependency_utils.cmake. Keep the _use_system_libraries branch, gated by a renamed AER_USE_SYSTEM_LIBS (was DISABLE_CONAN). Replace setup_conan() with FetchContent_Declare / FetchContent_MakeAvailable, using FIND_PACKAGE_ARGS so a system install short-circuits the fetch.
  • Delete cmake/conan_utils.cmake and cmake/conan.cmake.
  • CMakeLists.txt: bump cmake_minimum_required to 3.24; drop include(conan_utils).
  • pyproject.toml: drop conan<2.0.0; delete the [tool.cibuildwheel.macos] before-build sed hack.
  • Delete the duplicate macOS conan hacks in tests.yml and deploy.yml (CIBW_BEFORE_BUILD).

Dependency version bumps

| Dep | Current | Target |
| --- | --- | --- |
| nlohmann_json | 3.1.1 (2018) | 3.11.3 minimum (requires #2418 merged); 3.12.0 is current |
| spdlog | 1.9.2 | 1.14.x minimum; 1.17.0 is current |
| Thrust | 1.9.5 | 1.17.2 from the legacy NVIDIA/thrust repo (header-only use for OMP/TBB Thrust backends); CUDA backend keeps using the toolkit-shipped Thrust. Switching to NVIDIA/cccl is also reasonable but a larger lift; happy to revisit. |
| Catch2 | 2.13.6 | 3.x; header changes (<catch2/catch_test_macros.hpp> etc.), ~10–20 mechanical edits in BUILD_TESTS=True paths |

OpenMP on macOS

Today the Conan path fetches llvm-openmp/12.0.1 and copies libomp.dylib (cmake/conan_utils.cmake:82–91). Replace with find_package(OpenMP) + brew install libomp in CI (tests.yml, pyproject.toml before-all). This matches what conda-forge already does; delocate-wheel bundles libomp.dylib into the macOS wheel.

On Apple Silicon, CMake's FindOpenMP doesn't search Homebrew prefixes by default (libomp lives under /opt/homebrew/opt/libomp, not the default search path). CMakeLists.txt should probe brew --prefix libomp and feed the result through OpenMP_ROOT so source builds from a vanilla pip install . find it without per-user environment setup.
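The probe described above is meant to live in CMakeLists.txt; purely to illustrate the logic, here is a minimal Python sketch of the same lookup as it might appear in a CI helper. The function names (brew_prefix, openmp_cmake_defines) are hypothetical and not part of the repository. The only external facts assumed are that `brew --prefix libomp` prints the formula's install prefix and that CMake's find_package honours a `<Package>_ROOT` variable.

```python
import shutil
import subprocess
import sys
from typing import List, Optional


def brew_prefix(formula: str) -> Optional[str]:
    """Return the Homebrew prefix for `formula`, or None if unavailable."""
    if shutil.which("brew") is None:
        return None  # Homebrew is not installed on this machine
    result = subprocess.run(
        ["brew", "--prefix", formula],
        capture_output=True,
        text=True,
        check=False,
    )
    prefix = result.stdout.strip()
    if result.returncode != 0 or not prefix:
        return None
    return prefix


def openmp_cmake_defines(platform: str = sys.platform,
                         prefix_lookup=brew_prefix) -> List[str]:
    """Build the -D argument that points FindOpenMP at Homebrew's libomp.

    Only macOS needs the hint; elsewhere an empty list is returned so the
    caller can splice the result into a cmake command unconditionally.
    """
    if platform != "darwin":
        return []
    prefix = prefix_lookup("libomp")
    return [f"-DOpenMP_ROOT={prefix}"] if prefix else []
```

The lookup is injected as a parameter so the behaviour can be checked without Homebrew installed; when the probe finds nothing, the helper returns no hint and leaves FindOpenMP to its default search.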

Step 2 — scikit-build → scikit-build-core; delete setup.py

  • Rewrite [build-system]:
    [build-system]
    requires = ["scikit-build-core>=0.10", "pybind11>=3.0"]
    build-backend = "scikit_build_core.build"
  • Add a real [project] table (name/version/license/classifiers/dependencies/requires-python = ">=3.10") — moved from setup.py.
  • Add [tool.scikit-build]: cmake.version = ">=3.24", cmake.build-type = "Release", wheel.packages = ["qiskit_aer"], sdist include/exclude.
  • Pull version from qiskit_aer/VERSION.txt via [tool.scikit-build.metadata.version] regex provider (preserves current behavior).
  • Delete setup.py.
  • tox.ini: drop py38, py39 from envlist; replace python -I -m build --wheel -C=--build-option=… with scikit-build-core's -C cmake.define.… config syntax; drop setup.py references.
  • deploy.yml: replace python setup.py sdist with python -m build --sdist; drop the explicit pip install -U scikit-build wheel.
  • All workflows: pip cache keys hash pyproject.toml (not setup.py).
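To make the tox.ini change concrete, the sketch below builds the scikit-build-core style `-C cmake.define.NAME=VALUE` arguments from a dictionary of CMake options. The helper name cmake_define_args is hypothetical, and AER_THRUST_BACKEND is used only as an illustrative option name; the actual options passed in tox.ini are elided in this proposal and are not reproduced here.

```python
from typing import List, Mapping


def cmake_define_args(defines: Mapping[str, object]) -> List[str]:
    """Translate {NAME: value} into repeated `-C cmake.define.NAME=value`.

    Booleans are rendered as ON/OFF, which is how CMake cache options are
    conventionally spelled; every other value is passed through str().
    """
    args: List[str] = []
    for name, value in defines.items():
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        args.extend(["-C", f"cmake.define.{name}={value}"])
    return args


# Example command line for a wheel build (illustrative option values).
command = [
    "python", "-m", "build", "--wheel",
    *cmake_define_args({"AER_USE_SYSTEM_LIBS": True,
                        "AER_THRUST_BACKEND": "OMP"}),
]
```

Keeping the translation in one place means the same option set could be reused for `pip install .`, which accepts the same `-C key=value` config settings.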

CUDA wheel-name override (QISKIT_AER_CUDA_MAJOR / QISKIT_AER_PACKAGE_NAME, used by deploy.yml to publish qiskit-aer-gpu, qiskit-aer-gpu-cu11, …): scikit-build-core ships four built-in dynamic-metadata providers (setuptools_scm, regex, fancy_pypi_readme, template); none of them is a clean drop-in for "rename the wheel based on an env var". The likely path is one of:

  • (a) keep the existing tools/configure_package_name.py-style pre-build script that rewrites a small portion of pyproject.toml before each CUDA wheel build;
  • (b) use scikit-build-core's template metadata provider to derive the name from a regex'd input file;
  • (c) write a tiny custom plugin.

One caveat for (b) and (c): PEP 621 requires project.name to be static (it may not be listed under dynamic), so a metadata provider or plugin may not be able to rename the distribution; this needs checking. Open to alternatives. This is one of the spots I'd most like reviewer input on.
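As a starting point for discussing option (a), here is a minimal sketch of what a pre-build rewrite could look like. It is not the existing tools/configure_package_name.py; the function name rewrite_project_name and the line-based approach are assumptions chosen for illustration. It reads QISKIT_AER_PACKAGE_NAME (the variable named above) and replaces only the name key inside the [project] table.

```python
import os
import re

# A TOML table header such as [project] or [[tool.x]], optionally commented.
_TABLE_HEADER = re.compile(r"^\[{1,2}[^\]]+\]{1,2}\s*(#.*)?$")
# A line of the form: name = "..." (group 1 is the key part, group 2 the tail).
_NAME_LINE = re.compile(r'^(\s*name\s*=\s*)"[^"]*"(.*)$')


def rewrite_project_name(pyproject_text: str, new_name: str) -> str:
    """Replace the first `name` key of the [project] table, nothing else."""
    out = []
    in_project = False
    replaced = False
    for line in pyproject_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        stripped = body.strip()
        if _TABLE_HEADER.match(stripped):
            in_project = stripped.startswith("[project]")
        elif in_project and not replaced:
            match = _NAME_LINE.match(body)
            if match:
                eol = line[len(body):]  # preserve the original line ending
                line = f'{match.group(1)}"{new_name}"{match.group(2)}{eol}'
                replaced = True
        out.append(line)
    if not replaced:
        raise ValueError("no name key found in the [project] table")
    return "".join(out)


def main(path: str = "pyproject.toml") -> None:
    new_name = os.environ.get("QISKIT_AER_PACKAGE_NAME")
    if not new_name:
        return  # default build: leave pyproject.toml untouched
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(rewrite_project_name(text, new_name))


if __name__ == "__main__":
    main()
```

A line-based rewrite keeps comments and formatting intact, at the cost of not being a full TOML parser; an unusual layout (for example a multi-line array whose last element is itself a bracketed list) could confuse the header detection, so a real script would want tests against the actual pyproject.toml.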

Step 3 — Make Windows VS-version-agnostic

Once the toolchain compiles cleanly on VS 2025 (Step 0 + Step 1 dep updates), drop the version pins. (Note: line numbers below assume the windows-2022 pin from PR #2426 has landed on main. If it's reverted instead of merged, the runner-pin items become no-ops.)

  • build.yml: "windows-2022" → "windows-latest".
  • tests.yml: runs-on: windows-2022 → windows-latest; drop CMAKE_GENERATOR: "Visual Studio 17 2022" from the tests_windows job env.
  • pyproject.toml [tool.cibuildwheel.windows]: drop the CMAKE_GENERATOR = "Visual Studio 17 2022" override.
  • Drop microsoft/setup-msbuild@v2 from build.yml and tests.yml — no longer needed once we're not pinning the multi-config VS generator.

The goal is to be generator-agnostic, not to standardize on Ninja. Removing the pin lets scikit-build-core pick whatever's available on the runner, and CI tells us whether that's good enough on windows-latest (which has both VS 2022/2025 and pip-installable Ninja). If CI shows it falls back to a multi-config VS generator that breaks on the runner image of the day, the follow-up is small and obvious: install Ninja explicitly (before-build = ["pip install ninja"]) and pin environment = { CMAKE_GENERATOR = "Ninja" } for cibuildwheel + the test job. scikit-build-core's docs say "ninja/make or MSVC used by default, respects CMAKE_GENERATOR", so an explicit Ninja opt-in is the worst case we expect, not a default we have to plan for.

Step 4 — CI consistency cleanup

Standalone; can land in parallel with Step 3.

  • deploy.yml: already windows-latest for the wheel matrix; keep consistent with Step 3.
  • deploy.yml: remove the three calls to actions-rs/toolchain@v1 (no Rust in the repo).
  • deploy.yml: bump the ppc64le pypa/cibuildwheel@v2.19.2 → @v3.2.1 to match the other cibuildwheel jobs. Optionally bump all three to the current latest (@v3.4.1 at time of writing).
  • deploy.yml: bump the stragglers from actions/checkout@v3 → @v4 and actions/setup-python@v4 → @v5 in the sdist, gpu-build-cuda11, and gpu-build-cuda12 jobs (the rest of the file is already on v4/v5). Optional: bump everything to checkout@v6 / setup-python@v6 (current latest).
  • docs.yml: in the tutorials job (Linux runner), fix the pip cache path from ~\AppData\Local\pip\Cache to ~/.cache/pip — copy-paste bug that silently disabled the cache.
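Inconsistencies like the ones listed above tend to creep back in, so a small lint over the workflow files might be worth considering. The sketch below is one possible shape for it, not something that exists in the repository: the names find_stale_actions, MINIMUM_MAJOR, and UNWANTED are hypothetical, and the version floors simply mirror the targets in the bullets above.

```python
import re

# Matches e.g. "uses: actions/checkout@v3" and captures the action name
# and the major version number.
USES = re.compile(r"uses:\s*([\w.-]+/[\w./-]+)@v(\d+)")

# Assumed floors, taken from the bullets above; adjust as targets change.
MINIMUM_MAJOR = {
    "actions/checkout": 4,
    "actions/setup-python": 5,
    "pypa/cibuildwheel": 3,
}
# Actions that should not appear at all (no Rust in the repo).
UNWANTED = {"actions-rs/toolchain"}


def find_stale_actions(workflow_text: str):
    """Return (line number, action, suggestion) for each flagged `uses:`."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES.search(line)
        if match is None:
            continue
        action, major = match.group(1), int(match.group(2))
        if action in UNWANTED:
            findings.append((lineno, action, "remove"))
        elif major < MINIMUM_MAJOR.get(action, 0):
            findings.append(
                (lineno, action, f"bump to v{MINIMUM_MAJOR[action]}")
            )
    return findings
```

Run over each file under .github/workflows, a non-empty result could fail a lightweight CI job. The check is deliberately text-based so it needs no YAML dependency, which also means it will not see actions pinned by commit SHA.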

Out of scope (intentionally)

  • The bigger pybind11 / JSON-roundtrip overhaul (src/framework/pybind_json.hpp:221–313 — every Python noise model passes through Python dict → JSON → C++ struct). Performance-relevant but orthogonal to "make the build work"; track separately.
  • Replacing the numpy-based 3D-loop serialization in pybind_json.hpp:240–245 with the buffer protocol.
  • Dropping the legacy QasmSimulator / StatevectorSimulator / UnitarySimulator shims.

Related issues and PRs

Would directly close

Likely fixes / materially helps

Overlapping PRs — coordination needed

Closed issues that would have been prevented

A look through closed issues shows the same handful of failure modes repeating year after year. The categories below are exactly the things this overhaul targets at the root.

Conan-related build failures (Step 1 makes these unreachable):

Windows / MSVC toolchain breakage (Step 3 dropping the VS-version pin + scikit-build-core's modern build flow make these toolchain-version-agnostic):

macOS toolchain breakage (Step 1's find_package(OpenMP) + Homebrew libomp + dropping the ~/.conan/settings.yml sed hack handles these):

setup.py / scikit-build legacy issues (Step 2 removes setup.py and adopts PEP 621 metadata via scikit-build-core):

CMake-version / generic build-from-source breakage (Step 1's CMake floor bump to 3.24 + cleaner FetchContent flow):

Delivery plan

Each step is its own PR. I plan to develop them as a local stack (each branch based on the previous) and rebase as parents merge.

Verification per step (locally on darwin/arm64 before pushing): clean venv pip install ., python -m build --wheel/--sdist, tools/verify_wheels.py. Already exercised end-to-end through Step 2; the local pass found one real bug (Apple Silicon OpenMP_ROOT) that's already folded into the Step 1 commit. The CI matrix (Linux / macOS / Windows VS 2025) lights up properly starting at Step 3.

Asks of reviewers:

  1. Does the staging make sense, or should anything be split / merged / reordered?
  2. Better suggestions for the CUDA wheel-name mechanism in Step 2?
