[SYCL] Optimize getUrEvents #20895
Closed
+123
−176
Conversation
Contributor
PatKamin
commented
Dec 15, 2025
- Avoid increasing the reference count of a `shared_ptr` by retrieving the handle directly.
- Reserve memory for the whole vector of handles beforehand, avoiding possible reallocations.

This reverts commit 666693d.
Force-pushed 78a3a26 to a4b8002 (Compare)
Comment on lines 175 to 276:
```yaml
    run: |
      # Build and run benchmarks
      echo "::group::install_python_deps"
      echo "Installing python dependencies..."
      # Using --break-system-packages because:
      # - venv is not installed
      # - unable to install anything via pip, as python packages in the docker
      #   container are managed by apt
      # - apt is unable to install anything due to unresolved dpkg dependencies,
      #   as a result of how the sycl nightly images are created
      pip install --user --break-system-packages -r ./devops/scripts/benchmarks/requirements.txt
      echo "::endgroup::"
  - name: Run sycl-ls
    shell: bash
    run: |
      # Run sycl-ls
      echo "::group::establish_parameters_and_vars"
      export CMPLR_ROOT=./toolchain
      # By default, the benchmark scripts forceload level_zero
      FORCELOAD_ADAPTER="${ONEAPI_DEVICE_SELECTOR%%:*}"
      echo "Adapter: $FORCELOAD_ADAPTER"
      case "$ONEAPI_DEVICE_SELECTOR" in
        level_zero:*) SAVE_SUFFIX="L0" ;;
        level_zero_v2:*)
          SAVE_SUFFIX="L0v2"
          export ONEAPI_DEVICE_SELECTOR="level_zero:gpu" # "level_zero_v2:gpu" not supported anymore
          export SYCL_UR_USE_LEVEL_ZERO_V2=1
          ;;
        opencl:*) SAVE_SUFFIX="OCL" ;;
        *) SAVE_SUFFIX="${ONEAPI_DEVICE_SELECTOR%%:*}" ;;
      esac
      case "$RUNNER_TAG" in
        '["PVC_PERF"]') MACHINE_TYPE="PVC" ;;
        '["BMG_PERF"]') MACHINE_TYPE="BMG" ;;
        # Best effort at matching
        *)
          MACHINE_TYPE="${RUNNER_TAG#[\"}"
          MACHINE_TYPE="${MACHINE_TYPE%_PERF\"]}"
          ;;
      esac
      SAVE_NAME="${SAVE_PREFIX}_${MACHINE_TYPE}_${SAVE_SUFFIX}"
      echo "SAVE_NAME=$SAVE_NAME" >> $GITHUB_ENV
      SAVE_TIMESTAMP="$(date -u +'%Y%m%d_%H%M%S')" # Timestamps are in UTC time
      # Cache the compute_runtime version from dependencies.json, but perform a
      # check with L0 version before using it: This value is not guaranteed to
      # accurately reflect the current compute_runtime version used, as the
      # docker images are built nightly.
      export COMPUTE_RUNTIME_TAG_CACHE="$(cat ./devops/dependencies.json | jq -r .linux.compute_runtime.github_tag)"

      echo "::endgroup::"
      echo "::group::sycl_ls"
      sycl-ls --verbose
  - name: Build and run benchmarks
    shell: bash
    env:
      BENCH_WORKDIR: ${{ steps.establish_outputs.outputs.BENCH_WORKDIR }}
      BENCHMARK_RESULTS_REPO_PATH: ${{ steps.establish_outputs.outputs.BENCHMARK_RESULTS_REPO_PATH }}
    run: |
      # Build and run benchmarks
      echo "::endgroup::"
      echo "::group::run_benchmarks"

      echo "::group::setup_workdir"
      if [ -n "$BENCH_WORKDIR" ] && [ -d "$BENCH_WORKDIR" ] && [[ "$BENCH_WORKDIR" == *llvm_test_workdir* ]]; then rm -rf "$BENCH_WORKDIR" ; fi
      WORKDIR="$(realpath ./llvm_test_workdir)"
      if [ -n "$WORKDIR" ] && [ -d "$WORKDIR" ] && [[ "$WORKDIR" == *llvm_test_workdir* ]]; then rm -rf "$WORKDIR" ; fi

      # Clean up potentially existing, old summary files
      [ -f "github_summary_exe.md" ] && rm github_summary_exe.md
      [ -f "github_summary_reg.md" ] && rm github_summary_reg.md

      echo "::endgroup::"
      echo "::group::run_benchmarks"
      numactl --cpunodebind "$NUMA_NODE" --membind "$NUMA_NODE" \
        ./devops/scripts/benchmarks/main.py "$BENCH_WORKDIR" \
        --sycl "$(realpath $CMPLR_ROOT)" \
        ./devops/scripts/benchmarks/main.py "$WORKDIR" \
        --sycl "$(realpath ./toolchain)" \
        --ur "$(realpath ./ur/install)" \
        --adapter "$FORCELOAD_ADAPTER" \
        --save "$SAVE_NAME" \
        --output-html remote \
        --results-dir "${BENCHMARK_RESULTS_REPO_PATH}/" \
        --output-dir "${BENCHMARK_RESULTS_REPO_PATH}/" \
        --results-dir "./llvm-ci-perf-results/" \
        --output-dir "./llvm-ci-perf-results/" \
        --preset "$PRESET" \
        --timestamp-override "$SAVE_TIMESTAMP" \
        --detect-version sycl,compute_runtime \
        --produce-github-summary \
        ${{ inputs.exit_on_failure == 'true' && '--exit-on-failure --iterations 1' || '' }}
      # TODO: add back: "--flamegraph inclusive" once works properly

      echo "::endgroup::"
      echo "::group::compare_results"
      python3 ./devops/scripts/benchmarks/compare.py to_hist \
        --avg-type EWMA \
        --cutoff "$(date -u -d '7 days ago' +'%Y%m%d_%H%M%S')" \
        --name "$SAVE_NAME" \
        --compare-file "${BENCHMARK_RESULTS_REPO_PATH}/results/${SAVE_NAME}_${SAVE_TIMESTAMP}.json" \
        --results-dir "${BENCHMARK_RESULTS_REPO_PATH}/results/" \
        --compare-file "./llvm-ci-perf-results/results/${SAVE_NAME}_${SAVE_TIMESTAMP}.json" \
        --results-dir "./llvm-ci-perf-results/results/" \
        --regression-filter '^[a-z_]+_sycl .* CPU count' \
        --regression-filter-type 'SYCL benchmark (measured using CPU cycle count)' \
        --verbose \
        --produce-github-summary \
        ${{ inputs.dry_run == 'true' && '--dry-run' || '' }} \

      echo "::endgroup::"
  - name: Run benchmarks integration tests
    shell: bash
    if: ${{ github.event_name == 'pull_request' }}
    env:
      BENCH_WORKDIR: ${{ steps.establish_outputs.outputs.BENCH_WORKDIR }}
      LLVM_BENCHMARKS_UNIT_TESTING: 1
      COMPUTE_BENCHMARKS_BUILD_PATH: ${{ steps.establish_outputs.outputs.BENCH_WORKDIR }}/compute-benchmarks-build
    run: |
      # Run benchmarks' integration tests

      # Run benchmarks' integration tests
      # NOTE: Each integration test prints its own group name as part of test script
      python3 ./devops/scripts/benchmarks/tests/test_integration.py
  - name: Upload github summaries and cache changes
      if [ '${{ github.event_name == 'pull_request' }}' = 'true' ]; then
        export LLVM_BENCHMARKS_UNIT_TESTING=1
        export COMPUTE_BENCHMARKS_BUILD_PATH=$WORKDIR/compute-benchmarks-build
        python3 ./devops/scripts/benchmarks/tests/test_integration.py
      fi
  - name: Cache changes and upload github summary
    if: always()
    shell: bash
    env:
      BENCHMARK_RESULTS_REPO_PATH: ${{ steps.establish_outputs.outputs.BENCHMARK_RESULTS_REPO_PATH }}
```
Check failure (Code scanning / zizmor): dangerous use of environment file
Comment on lines +164 to +168:
```yaml
- name: Checkout results repo
  uses: actions/checkout@v5
  with:
    ref: ${{ env.BENCHMARK_RESULTS_BRANCH }}
    path: llvm-ci-perf-results
```
Check warning (Code scanning / zizmor): credential persistence through GitHub Actions artifacts
Contributor (Author)
No perf improvement gained.