
Conversation


@HudsonGraeme HudsonGraeme commented Jan 9, 2026

Summary by CodeRabbit

  • New Features

    • Per-conv tile-size CLI, full Autotiler (spatial tiling + channel-splitting), tiled slice outputs, tensor-graph driven slicing, lazy shape tracing, rich structured slice/run metadata schema, new Prover and Verifier orchestrators, parallel compile/run/prove/verify, run-dir/resume support, tile-range prove/verify.
  • Bug Fixes

    • More deterministic ONNX execution, dtype-aware input casting, robust multi-input handling, stronger backend fallbacks, consistent path/metadata normalization, and improved output processing.
  • Tests

    • Major expansion of unit and integration tests for autotiling, tiling/channel-splitting, slicing, compilation, run, proving, verification, and full pipeline.


@HudsonGraeme HudsonGraeme self-assigned this Jan 9, 2026

coderabbitai bot commented Jan 9, 2026

Walkthrough

Refactors slicing/metadata to a typed schema, adds an Autotiler for spatial tiling and channel-splitting, introduces Tile/Channel executors, parallel compile/run/prove/verify pipelines, lazy ONNX tracing, many backend utilities (JSTprove/EZKL/ONNX), CLI flags for tiling/parallelism, and extensive tests across unit and integration suites.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Autotiler & slicer**<br>`dsperse/src/slice/autotiler.py`, `dsperse/src/cli/slice.py`, `dsperse/src/slice/slicer.py`, `dsperse/src/slice/utils/onnx_utils.py` | Adds the Autotiler, integrates tiling/channel-splitting into the slicing flow, adds the `--tile-size` CLI flag, lazy ONNX tracing, a new extraction pipeline, extensive helpers for tile/channel generation, and metadata updates. |
| **Schema & analyzers**<br>`dsperse/src/analyzers/schema.py`, `dsperse/src/analyzers/onnx_analyzer.py`, `dsperse/src/analyzers/runner_analyzer.py` | Introduces a comprehensive dataclass-based metadata schema (Model/Slice/Run/Tiling/etc.) and updates analyzers to produce/consume typed schema objects, with signature changes. |
| **Tensor analysis**<br>`dsperse/src/slice/tensor_graph.py` | New TensorGraph/TensorInfo for tensor producer/consumer mapping, tensor lifetime, and slice dependency computation used by the slicer and analyzers. |
| **Tiling & channel executors**<br>`dsperse/src/run/tile_executor.py`, `dsperse/src/run/channel_split_executor.py` | New TileExecutor and ChannelSplitExecutor for tile splitting/reconstruction, per-tile tasks, backend invocation, output aggregation, and in-memory/on-disk task modes. |
| **Runner & runtime utils**<br>`dsperse/src/run/runner.py`, `dsperse/src/run/utils/runner_utils.py` | Runner redesigned to dispatch tiled/channel flows and support run_dir/threads/batch/resume and per-slice callbacks; RunnerUtils split into stateless helpers for resume, persistence, and execution orchestration. |
| **Compiler & compiler utils**<br>`dsperse/src/compile/compiler.py`, `dsperse/src/compile/utils/compiler_utils.py`, `dsperse/src/cli/compile.py` | Parallel, worker-based per-slice compilation with backend selection/fallback, slice-dir-aware path resolution, tiling/channel compilation support, and the `--parallel`/`--resume` CLI flags. |
| **Prover & Verifier + CLI**<br>`dsperse/src/prove/prover.py`, `dsperse/src/verify/verifier.py`, `dsperse/src/cli/prove.py`, `dsperse/src/cli/verify.py`, `dsperse/src/prove/utils/prover_utils.py`, `dsperse/src/verify/utils/verifier_utils.py` | Adds new Prover/Verifier orchestrators and utilities supporting tiled/per-tile proving and verification, parallelism, and tile-range parsing; replaces the legacy top-level modules. |
| **Backends & utils**<br>`dsperse/src/backends/onnx_models.py`, `dsperse/src/backends/jstprove.py`, `dsperse/src/backends/ezkl.py`, `dsperse/src/backends/utils/jstprove_utils.py`, `dsperse/src/backends/utils/ezkl_utils.py` | ONNX Runtime session helpers, multi-input inference, JSTprove/EZKL utilities (compatibility, SRS checks, artifact initialization), and API/type refinements for backend calls. |
| **Compile/runtime helpers & packaging**<br>`dsperse/src/compile/utils/compiler_utils.py`, `dsperse/src/run/channel_split_executor.py`, `dsperse/src/run/tile_executor.py`, `dsperse/src/utils/utils.py` | Backend parsing, slice-dir resolution, compilation block builders, tiling path relativization, an ONNX save helper, and a dsperse-version helper. |
| **Schema-driven I/O & metadata**<br>`dsperse/src/analyzers/schema.py`, `dsperse/src/analyzers/onnx_analyzer.py`, `dsperse/src/slice/utils/onnx_utils.py` | Moves metadata generation to ModelMetadata/SliceMetadata classes; updates reads/writes and consumers to use typed objects. |
| **Tests & CI**<br>`tests/unit/test_autotiler.py`, `tests/unit/test_compiler_utils.py`, `tests/integration/*`, `tests/conftest.py`, `tests/integration/test_pipeline.py` | Extensive new/updated unit and integration tests covering the autotiler, tiling parity, channel-splitting, compile/run/prove/verify tiled flows, parallelism, and test cleanup. |
| **Removals / replacements**<br>`dsperse/src/prover.py`, `dsperse/src/verifier.py` | Legacy Prover/Verifier modules removed and replaced by new implementations under `dsperse/src/prove/` and `dsperse/src/verify/`; watch for stale imports. |
| **Project config**<br>`pyproject.toml`, `.gitignore` | Bumps Python to 3.13, loosens the onnxruntime pin, adds onnxslim and jstprove dependencies, and updates ignore patterns for slice/test outputs. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant CLI as CLI
  participant Slicer as Slicer/OnnxSlicer
  participant Autotiler as Autotiler
  participant Compiler as Compiler
  participant Runner as Runner
  participant TileExec as TileExecutor
  participant Prover as Prover
  participant Verifier as Verifier

  CLI->>Slicer: slice_model(tile_size?)
  Slicer->>Autotiler: detect_tiling / apply_tiling
  Autotiler-->>Slicer: tiled_slices + tiled_info
  Slicer->>Compiler: compile(slices, targets)
  Compiler-->>Runner: compiled artifacts metadata
  CLI->>Runner: run(run_dir, threads)
  Runner->>Runner: dispatch_slice (select backend, tiled/channel)
  alt tiled
    Runner->>TileExec: split_into_tiles
    loop per tile (parallel)
      Runner->>TileExec: execute_worker(tile)
      TileExec-->>Runner: tile result
    end
    Runner->>TileExec: reconstruct_from_tiles
  else channel-split
    Runner->>ChannelSplitExecutor: execute_group(s)
  else single
    Runner->>Runner: run_onnx_single / run_jstprove_inference / run_ezkl_inference
  end
  Runner->>Prover: prove(run_dir, tiles_range?, parallel)
  Prover-->>Verifier: proof metadata
  CLI->>Verifier: verify(run_dir, tiles_range?, parallel)
  Verifier-->>CLI: verification results
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


Suggested reviewers

  • 0pendansor
  • shirin-shahabi

Poem

🐇
I hop through graphs and split each thread,
Tiles and channels snugly spread,
Halos trimmed and proofs take flight,
Parallel carrots gleam tonight—
A joyous hop for code well-fed!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 63.11%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'V3' is vague and generic, providing no meaningful context about the changeset's primary purpose or scope. | Use a descriptive title that summarizes the main change, such as 'Add autotiling and tiling orchestration for sliced model execution' or 'Implement multi-backend tiling, prove/verify refactoring, and runner enhancements'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @dsperse/src/slice/autotiler.py:
- Around lines 91-106: Non-conv nodes are re-created with helper.make_node using only op_type, inputs, and outputs, which drops node-specific attributes. When iterating non_conv_ops (orig_node) and appending new nodes to nodes, preserve orig_node.attribute: either pass those attributes into helper.make_node, or clone orig_node and replace only its input/output fields, so attributes like LeakyRelu.alpha or Clip.min/max are retained. Update the node creation in the non_conv_ops loop (where new_inputs/new_outputs are computed and helper.make_node(...) is called) to include the original attributes from orig_node.attribute; a sketch follows below.
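For illustration, a minimal sketch of that fix, assuming the `orig_node`, `new_inputs`, `new_outputs`, and `nodes` names from the loop described above (not the exact project code):

```python
from onnx import helper

new_node = helper.make_node(
    orig_node.op_type,
    inputs=new_inputs,
    outputs=new_outputs,
    name=orig_node.name,
)
# Carry over the original attribute protos verbatim so values such as
# LeakyRelu.alpha or Clip.min/max survive the rewrite.
new_node.attribute.extend(orig_node.attribute)
nodes.append(new_node)
```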
🧹 Nitpick comments (5)
dsperse/src/slice/autotiler.py (4)

10-16: Minimum tile size constraint is implicit.

The function searches for tile sizes down to 8 (since range(..., 7, -1) excludes 7). Consider adding a comment explaining the minimum tile size constraint, or making it a named constant for clarity.

Suggested improvement

```diff
+MIN_TILE_SIZE = 8
+
 def find_tile_size(spatial_dim: int, target: int) -> int | None:
+    """Find a tile size that evenly divides spatial_dim, searching from target down to MIN_TILE_SIZE."""
     if spatial_dim <= target:
         return None
-    for tile in range(min(target, spatial_dim), 7, -1):
+    for tile in range(min(target, spatial_dim), MIN_TILE_SIZE - 1, -1):
         if spatial_dim % tile == 0:
             return tile
     return None
```
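For concreteness, a quick usage check of the suggested function (self-contained copy; the input values are illustrative):

```python
MIN_TILE_SIZE = 8

def find_tile_size(spatial_dim: int, target: int) -> int | None:
    """Largest divisor of spatial_dim in [MIN_TILE_SIZE, target], or None."""
    if spatial_dim <= target:
        return None
    for tile in range(min(target, spatial_dim), MIN_TILE_SIZE - 1, -1):
        if spatial_dim % tile == 0:
            return tile
    return None

assert find_tile_size(224, 64) == 56   # 224 = 4 * 56
assert find_tile_size(28, 64) is None  # already at or below target: no tiling
assert find_tile_size(97, 64) is None  # 97 is prime: no divisor in [8, 64]
```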

44-44: Prefix unused variables with underscore.

As flagged by static analysis, h_in and w_in are unpacked but never used.

Suggested fix

```diff
-    _, c_in, h_in, w_in = orig_dims
+    _, c_in, _h_in, _w_in = orig_dims
```

139-142: Hard-coded slice limit may be insufficient.

The loop iterates up to 500 slices. Very large models could exceed this. Consider using a more dynamic approach or making this configurable.

Suggested improvement

```diff
-    for i in range(500):
-        path = os.path.join(slices_dir, f"slice_{i}", "payload", f"slice_{i}.onnx")
-        if not os.path.exists(path):
-            continue
+    i = 0
+    while True:
+        path = os.path.join(slices_dir, f"slice_{i}", "payload", f"slice_{i}.onnx")
+        if not os.path.exists(path):
+            break
+        i += 1
+        # ... rest of loop body moved here
```

Alternatively, use glob or os.listdir to discover slice directories dynamically.
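One possible shape for that alternative, as a sketch assuming the `slice_<i>/payload/slice_<i>.onnx` layout described above (helper name is hypothetical):

```python
from pathlib import Path

def iter_slice_models(slices_dir: str):
    """Yield existing slice ONNX paths in numeric order, with no fixed upper bound."""
    dirs = sorted(
        (d for d in Path(slices_dir).glob("slice_*") if d.name.split("_")[-1].isdigit()),
        key=lambda d: int(d.name.split("_")[-1]),
    )
    for d in dirs:
        path = d / "payload" / f"{d.name}.onnx"
        if path.exists():
            yield path
```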


153-155: Prefix unused variables and consider tiling based on both dimensions.

Variables c and w are unused. Additionally, only h is checked against tile_size, but w might also need tiling for non-square spatial dimensions.

Suggested fix

```diff
-        _, c, h, w = dims
+        _, _c, h, w = dims
         if h <= tile_size:
             continue
+        # Note: Currently only tiles based on height; width is assumed to match or be smaller
```
dsperse/src/slice/onnx_slicer.py (1)

500-500: Use explicit Optional or union type for tile_size.

As flagged by static analysis (PEP 484), tile_size: int = None should use explicit optional typing.

Suggested fix

```diff
-    def slice_model(self, output_path=None, tile_size: int = None):
+    def slice_model(self, output_path=None, tile_size: int | None = None):
```

Or using Optional:

```diff
-    def slice_model(self, output_path=None, tile_size: int = None):
+    def slice_model(self, output_path=None, tile_size: Optional[int] = None):
```

Note: Optional would need to be imported from typing if not already available.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d6bf96 and c4523b6.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • dsperse/src/analyzers/onnx_analyzer.py
  • dsperse/src/cli/slice.py
  • dsperse/src/slice/autotiler.py
  • dsperse/src/slice/onnx_slicer.py
  • dsperse/src/slice/slicer.py
🧰 Additional context used
🧬 Code graph analysis (3)
dsperse/src/cli/slice.py (3)
dsperse/src/slice/onnx_slicer.py (1)
  • slice_model (500-523)
dsperse/src/slice/slicer.py (1)
  • slice_model (58-79)
dsperse/src/slice/utils/torch_utils.py (1)
  • slice_model (558-578)
dsperse/src/slice/onnx_slicer.py (4)
dsperse/src/slice/autotiler.py (1)
  • autotile_slices (130-171)
dsperse/src/cli/slice.py (1)
  • slice_model (111-208)
dsperse/src/slice/slicer.py (1)
  • slice_model (58-79)
dsperse/src/analyzers/onnx_analyzer.py (1)
  • generate_slices_metadata (337-389)
dsperse/src/slice/slicer.py (3)
dsperse/src/cli/slice.py (1)
  • slice_model (111-208)
dsperse/src/slice/onnx_slicer.py (1)
  • slice_model (500-523)
dsperse/src/slice/utils/torch_utils.py (1)
  • slice_model (558-578)
🪛 Ruff (0.14.10)
dsperse/src/slice/onnx_slicer.py

500-500: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

dsperse/src/slice/autotiler.py

44-44: Unpacked variable h_in is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


44-44: Unpacked variable w_in is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


153-153: Unpacked variable c is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


153-153: Unpacked variable w is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🔇 Additional comments (11)
dsperse/src/slice/autotiler.py (4)

1-8: LGTM on imports and constant definition.

The imports and ELEMENTWISE_OPS constant are well-defined. The set of elementwise operations appears comprehensive for typical ONNX models with convolutions.


19-23: LGTM!

The tileability check correctly ensures a model has a Conv operation and all other operations are elementwise.


26-37: LGTM!

The function correctly extracts convolution parameters with appropriate ONNX defaults.


67-69: Halo calculation assumes symmetric padding.

The halo is computed from pads[0] and pads[1] only (top/left padding). If the original Conv has asymmetric padding (where pads[0] != pads[2] or pads[1] != pads[3]), the tiled output may be incorrect.

Consider validating symmetric padding or documenting this assumption:

```diff
+    # Assumes symmetric padding; asymmetric padding not supported
+    if pads[0] != pads[2] or pads[1] != pads[3]:
+        logger.warning(f"Asymmetric padding detected in {slice_path}, tiling may produce incorrect results")
     halo_h, halo_w = pads[0], pads[1]
```
dsperse/src/cli/slice.py (2)

40-43: LGTM on CLI argument additions.

The --tile-size and --no-tile arguments are well-defined with appropriate types and help text. The --no-tile flag correctly acts as an override to disable tiling.


188-193: LGTM on tiling integration in slice command.

The logic correctly derives tile_size (respecting --no-tile as an override), logs it, and passes it through to the slicer. The integration is clean.

dsperse/src/slice/slicer.py (1)

58-79: LGTM on Slicer orchestrator changes.

The tile_size parameter is correctly added to the signature, documented in the docstring, and propagated to the underlying slicer implementation. The existing behavior for output_type conversion is preserved.

dsperse/src/analyzers/onnx_analyzer.py (2)

337-352: LGTM on tiled_info parameter addition.

The optional tiled_info parameter with None default and normalization to empty dict is a clean approach that avoids None checks throughout the method.


378-380: LGTM on tiling metadata injection.

Correctly checks if segment_idx exists in tiled_info and adds tiling metadata to the segment. This integrates seamlessly with the existing metadata flow.

dsperse/src/slice/onnx_slicer.py (2)

9-9: LGTM on autotiler import.

The import correctly brings in the autotile_slices function from the new autotiler module.


514-521: LGTM on autotiling integration.

The autotiling logic is correctly implemented:

  • Initializes tiled_info as empty dict
  • Only calls autotile_slices when tile_size is provided
  • Logs the number of tiled slices for visibility
  • Passes tiled_info to metadata generation

The fallback for slices_dir when output_path is None is consistent with existing patterns in the codebase.


@millioner millioner left a comment


Looks good 🔥
But it's mostly black magic to me; some comments with explanations would be nice.
It would also be nice to use pathlib.Path consistently, but that applies to the whole repo. Otherwise looks great.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @dsperse/src/backends/onnx_models.py:
- Around lines 103-106: run_inference and run_inference_multi have inconsistent failure return values (run_inference returns (False, None) while run_inference_multi returns (False, str(e))). Pick one convention and apply it to both: either have run_inference_multi return (False, None) on exceptions, or have run_inference catch the exception and return (False, str(e)). In both functions, include the exception (e) in the error logging to preserve diagnostics, and update callers to expect the chosen tuple shape.
🧹 Nitpick comments (5)
dsperse/src/backends/onnx_models.py (2)

54-56: Session options inconsistency with run_inference.

The multi-input path uses minimal session options while run_inference (lines 23-29) sets additional flags like enable_profiling=False, enable_mem_pattern=False, etc. This inconsistency could lead to different behavior between single and multi-input inference paths.

♻️ Suggested fix: Align session options or extract to shared helper

```diff
     @staticmethod
     def run_inference_multi(model_path: str, primary_input_file: str, extra_tensors: dict, output_file: str):
         """Run inference with multiple named inputs."""
         try:
             session_options = ort.SessionOptions()
             session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
+            session_options.enable_profiling = False
+            session_options.enable_mem_pattern = False
+            session_options.enable_cpu_mem_arena = False
+            session_options.enable_mem_reuse = False
+            session_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
             session = ort.InferenceSession(model_path, session_options)
```

77-90: Silent data truncation and zero-padding may mask input mismatches.

When arr.shape != tuple(shape), the code truncates to fit (line 78). When there's insufficient data, it zero-pads (lines 87-89). Both operations silently alter input data without warning, which could hide upstream bugs or misconfigurations.

Consider logging a warning when truncation or padding occurs to aid debugging.

♻️ Suggested improvement: Add logging for data adjustments

```diff
                 if arr.shape != tuple(shape):
+                    logger.warning(f"Input '{name}' shape {arr.shape} reshaped to {shape}")
                     arr = arr.flatten()[:np.prod(shape)].reshape(shape)
                 input_dict[name] = arr
             else:
                 needed = np.prod(shape)
                 flat = primary_np.flatten()
                 if flat.size >= needed:
                     input_dict[name] = flat[:needed].reshape(shape)
                     primary_np = flat[needed:].reshape(-1)
                 else:
+                    logger.warning(f"Input '{name}' zero-padded: had {flat.size}, needed {needed}")
                     padded = np.zeros(needed, dtype=np.float32)
                     padded[:flat.size] = flat
                     input_dict[name] = padded.reshape(shape)
                     primary_np = np.array([], dtype=np.float32)
```
dsperse/src/run/runner.py (1)

349-354: Prefer next(iter(...)) over list(...)[0] for single element access.

As flagged by static analysis, using next(iter(result['output_tensors'].values())) is more efficient than constructing a full list just to access the first element.

♻️ Suggested fix

```diff
             if ok and result:
                 if isinstance(result, dict) and 'output_tensors' in result:
                     for oname, tensor in result['output_tensors'].items():
                         tensor_cache[oname] = tensor
-                    first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+                    first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
                     final_tensor = first_output if first_output is not None else result.get('logits')
```
dsperse/src/slice/autotiler.py (2)

182-183: Hardcoded opset version 13 may cause compatibility issues.

If the original model uses operators from newer opsets (14+), the tiled model may fail to load or produce incorrect results.

♻️ Suggested fix: Preserve original model's opset

```diff
     graph = helper.make_graph(nodes, "tiled_slice", [X], [Y], initializers)
-    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
-    model.ir_version = 8
+    # Preserve original opset versions
+    model = helper.make_model(graph, opset_imports=m.opset_import)
+    model.ir_version = m.ir_version
```

240-242: Unused variables c and w in unpacking.

Per static analysis, prefix unused variables with underscore.

♻️ Suggested fix

```diff
-        _, c, h, w = dims
+        _, _c, h, _w = dims
         if h <= tile_size:
             continue
```
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4523b6 and 0fa8fba.

📒 Files selected for processing (4)
  • dsperse/src/backends/onnx_models.py
  • dsperse/src/cli/slice.py
  • dsperse/src/run/runner.py
  • dsperse/src/slice/autotiler.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/backends/onnx_models.py (1)
dsperse/src/run/utils/runner_utils.py (4)
  • RunnerUtils (16-411)
  • preprocess_input (183-219)
  • process_final_output (223-239)
  • save_to_file_flattened (272-287)
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py

353-353: Prefer next(iter(result['output_tensors'].values())) over single element slice

Replace with next(iter(result['output_tensors'].values()))

(RUF015)


364-364: Create your own exception

(TRY002)


364-364: Avoid specifying long messages outside the exception class

(TRY003)


404-404: Do not catch blind exception: Exception

(BLE001)

dsperse/src/backends/onnx_models.py

45-45: Consider moving this statement to an else block

(TRY300)


46-46: Do not catch blind exception: Exception

(BLE001)


103-103: Consider moving this statement to an else block

(TRY300)


104-104: Do not catch blind exception: Exception

(BLE001)

dsperse/src/slice/autotiler.py

240-240: Unpacked variable c is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


240-240: Unpacked variable w is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🔇 Additional comments (10)
dsperse/src/cli/slice.py (2)

40-41: LGTM! Clean CLI option for tile size.

The option is well-defined with appropriate type constraint, sensible default, and clear help text explaining the purpose.


186-192: LGTM! Proper propagation of tile_size to the slicer API.

The tile_size is correctly retrieved and passed through to the slicer, with appropriate logging for debugging.

dsperse/src/backends/onnx_models.py (1)

31-43: LGTM! Output tensor collection for downstream propagation.

The addition of output_tensors dictionary enables proper tensor propagation through multi-input slice chains.

dsperse/src/run/runner.py (3)

323-326: LGTM! Good tensor cache initialization for multi-input propagation.

The tensor cache keyed by input name enables proper routing of intermediate outputs to downstream slices that need them.


367-368: Fallback to input_tensor when final_tensor is None may be unexpected.

If no slice successfully produced output, falling back to the original input for probability calculation could yield misleading results. Consider raising an error or logging a warning if no slice produced valid output.


387-410: LGTM! Clean multi-input slice execution method.

The method properly handles path resolution, delegates to OnnxModels.run_inference_multi, and returns a consistent exec_info structure.

dsperse/src/slice/autotiler.py (4)

1-30: LGTM! Excellent documentation.

The module docstring clearly explains the tiling strategy with a concrete example, making the algorithm easy to understand.


33-42: Edge case: Returns None when target >= spatial_dim.

The condition 7 <= target < spatial_dim means if target >= spatial_dim, no tile size is returned. This is likely intentional (no tiling needed if target is larger than input), but could be surprising if spatial_dim itself is a valid tile size (when target == spatial_dim).

Is the behavior of returning None when target >= spatial_dim intentional? If tiling is still desired when target == spatial_dim, the condition should be 7 <= target <= spatial_dim.


117-119: Assumes symmetric padding (top=bottom, left=right).

The halo calculation uses only pads[0] and pads[1] (top and left padding). If the original Conv has asymmetric padding (e.g., [1, 1, 2, 2]), this will compute incorrect tile boundaries.

Consider whether asymmetric padding needs to be supported, or add a check to reject/warn on asymmetric padding:

```python
if pads[0] != pads[2] or pads[1] != pads[3]:
    logger.warning("Asymmetric padding not supported for tiling")
    return None
```

203-260: LGTM! Well-structured batch tiling orchestration.

The function properly scans directories, filters tileable slices, and persists tiling metadata. The early-exit guards prevent unnecessary processing.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @dsperse/src/run/runner.py:
- Around lines 387-400: _run_slice_multi_input may proceed with an invalid onnx_path after all resolution attempts. Make it fail fast: after checking slice_info["path"] and slice_info["relative_path"], if onnx_path is still missing or the resolved path does not exist, raise a clear exception (e.g., ValueError with context) or return early so run_inference_multi is never called with an invalid path. Reference onnx_path, slice_info, and slice_dir in the error message to aid debugging; see the sketch after this block.
🧹 Nitpick comments (4)
dsperse/src/run/runner.py (3)

349-361: Suggest safer access pattern for first output.

The logic for extracting the first output tensor could raise an IndexError if output_tensors is unexpectedly empty. Consider using next(iter(...), None) for safer access.

♻️ Proposed refactor

```diff
-                first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+                first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
```

364-364: Consider a custom exception for slice execution failures.

Using a generic Exception makes it harder for callers to catch and handle specific failure modes. A custom exception class (e.g., SliceExecutionError) would improve error handling granularity.
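One possible shape for such an exception (the name is the reviewer's suggestion; the usage line is illustrative):

```python
class SliceExecutionError(RuntimeError):
    """Raised when executing a model slice fails."""

# Callers can then catch it narrowly, e.g.:
# raise SliceExecutionError(f"Slice {slice_id} failed: {result}")
```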


401-415: Improve exception handling specificity.

Catching all exceptions without differentiation makes debugging harder and could mask unexpected errors. Consider catching specific exceptions (e.g., FileNotFoundError, RuntimeError) or at least logging the full traceback.

♻️ Proposed refactor

```diff
     start_time = time.time()
     try:
         success, result = OnnxModels.run_inference_multi(
             model_path=onnx_path,
             primary_input_file=primary_input_file,
             extra_tensors=extra_tensors,
             output_file=output_file
         )
     except Exception as e:
+        logger.exception(f"Multi-input inference failed for {onnx_path}")
         success, result = False, str(e)
```
dsperse/src/backends/onnx_models.py (1)

104-106: Consider more specific exception handling.

Catching all exceptions broadly makes debugging harder. Consider catching specific exceptions that session.run or helper methods might raise, or at least log the full traceback for operational visibility.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0fa8fba and 7e09afb.

📒 Files selected for processing (2)
  • dsperse/src/backends/onnx_models.py
  • dsperse/src/run/runner.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/backends/onnx_models.py (2)
dsperse/src/run/utils/runner_utils.py (4)
  • RunnerUtils (16-411)
  • preprocess_input (183-219)
  • process_final_output (223-239)
  • save_to_file_flattened (272-287)
dsperse/src/run/runner.py (1)
  • run (46-88)
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py

353-353: Prefer next(iter(result['output_tensors'].values())) over single element slice

Replace with next(iter(result['output_tensors'].values()))

(RUF015)


364-364: Create your own exception

(TRY002)


364-364: Avoid specifying long messages outside the exception class

(TRY003)


409-409: Do not catch blind exception: Exception

(BLE001)

dsperse/src/backends/onnx_models.py

45-45: Consider moving this statement to an else block

(TRY300)


46-46: Do not catch blind exception: Exception

(BLE001)


103-103: Consider moving this statement to an else block

(TRY300)


104-104: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (6)
dsperse/src/run/runner.py (4)

323-326: LGTM: Tensor caching enables multi-input support.

The tensor cache initialization correctly maps the model's primary input name to the input tensor, enabling downstream slices to reference intermediate outputs by name.


336-342: LGTM: Input source selection correctly handles cached and fresh inputs.

The logic appropriately selects the input tensor from the cache when dependencies specify filtered_inputs, otherwise falling back to the model's primary input. The fallback chain (cache → input_tensor) ensures robustness.


343-348: LGTM: Clean separation of single-input and multi-input execution paths.

The branching logic correctly identifies multi-input slices and prepares the extra_tensors dictionary while maintaining backward compatibility for single-input workflows.


370-374: No changes needed — shape handling is correct and covers all tensor formats from all backends.

All backends (ONNX, EZKL, JSTprove) and fallback paths consistently return 2D tensors [batch_size, num_classes]. The else clause (lines 372–373) correctly handles this by flattening and unsqueezing before applying softmax, which works for 1D, 2D, and higher-dimensional tensors. The 3D condition on line 370 is unreachable in practice but does not affect correctness.

dsperse/src/backends/onnx_models.py (2)

36-43: LGTM: Output collection enables multi-output support.

Collecting all model outputs into a named dictionary enables downstream slices to reference specific outputs, which is essential for multi-input scenarios. The backward compatibility is maintained by continuing to use the first output as the primary result.


50-66: LGTM: Multi-input inference setup is correct.

The method properly initializes the ONNX session with optimizations disabled (matching single-input behavior) and prepares the primary input tensor for distribution across model inputs.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
dsperse/src/slice/onnx_slicer.py (1)

524-531: Use explicit Optional or union type for tile_size parameter.

Per PEP 484, int = None implies Optional[int] which should be explicit. This improves type clarity and satisfies static analysis.

Suggested fix

```diff
-    def slice_model(self, output_path=None, tile_size: int = None):
+    def slice_model(self, output_path=None, tile_size: int | None = None):
```
dsperse/src/run/runner.py (1)

343-361: Good branching logic for multi-input slices.

The conditional execution path for multi-input slices is well-structured. The tensor cache update logic correctly handles both multi-output and single-output cases.

One minor style improvement per static analysis: prefer next(iter(...)) over list(...)[0] on line 353.

Optional style fix for line 353

```diff
-                    first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+                    first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
```
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e09afb and e5ffd4b.

📒 Files selected for processing (3)
  • dsperse/src/backends/onnx_models.py
  • dsperse/src/run/runner.py
  • dsperse/src/slice/onnx_slicer.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/slice/onnx_slicer.py (2)
dsperse/src/slice/autotiler.py (1)
  • autotile_slices (203-262)
dsperse/src/analyzers/onnx_analyzer.py (1)
  • generate_slices_metadata (337-389)
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py

353-353: Prefer next(iter(result['output_tensors'].values())) over single element slice

Replace with next(iter(result['output_tensors'].values()))

(RUF015)


364-364: Create your own exception

(TRY002)


364-364: Avoid specifying long messages outside the exception class

(TRY003)


406-406: Do not catch blind exception: Exception

(BLE001)

dsperse/src/slice/onnx_slicer.py

524-524: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

dsperse/src/backends/onnx_models.py

45-45: Consider moving this statement to an else block

(TRY300)


46-46: Do not catch blind exception: Exception

(BLE001)


66-66: Abstract raise to an inner function

(TRY301)


66-66: Avoid specifying long messages outside the exception class

(TRY003)


75-75: Abstract raise to an inner function

(TRY301)


75-75: Avoid specifying long messages outside the exception class

(TRY003)


90-90: Consider moving this statement to an else block

(TRY300)


91-91: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (12)
dsperse/src/slice/onnx_slicer.py (6)

9-9: LGTM!

Import of the new autotile_slices function aligns with the autotiler feature being added in this PR.


166-179: LGTM!

The addition of future_inputs parameter enables proper handling of cross-segment tensor dependencies. The default to an empty set is appropriate.


238-257: LGTM!

The revised output determination logic correctly handles three cases: tensors not consumed internally, model outputs, and tensors needed by future segments. This ensures proper cross-segment data flow.


364-382: LGTM!

The pre-computation of segment_inputs_map correctly handles segment boundary calculation with the if i > 0 else 0 guard. This enables accurate identification of cross-segment dependencies before the main slicing loop.


405-413: LGTM!

The computation of future_inputs by aggregating inputs from all downstream segments is correct and properly passed to _get_segment_details.


538-545: LGTM!

The autotiling integration is clean and backward-compatible. When tile_size is not provided, the behavior remains unchanged. The tiled_info is properly propagated to metadata generation.

dsperse/src/backends/onnx_models.py (3)

36-43: LGTM!

The addition of output_tensors collection enables proper tensor propagation for multi-output models, which is essential for the new multi-slice chaining feature.


46-48: Improved error handling.

Returning str(e) instead of None on failure provides actionable debug information to callers. This is a good improvement.


79-93: LGTM!

The execution and output collection follows the same pattern as run_inference, maintaining consistency. The output_tensors map enables downstream tensor propagation.

dsperse/src/run/runner.py (3)

323-326: LGTM!

The tensor cache initialization with the model's primary input enables cross-slice tensor propagation. The default fallback to "images" is reasonable for common image model workflows.


367-378: LGTM!

The probability computation correctly handles various tensor shapes. The fallback to input_tensor when final_tensor is None provides robustness. The 3D case handles batch=1 classification outputs, while the else branch correctly handles 2D and other formats.


387-412: LGTM!

The _run_slice_multi_input method is well-structured with proper path validation and error handling. The exec_info structure is consistent with other slice execution methods, enabling uniform result processing.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @dsperse/src/backends/onnx_models.py:
- Around lines 46-48: In the except Exception as e blocks that return (False, str(e)), replace logger.warning(...) with logger.exception(...) so the traceback is preserved, and restore the previous error-return contract by returning (False, None). If you intentionally want to change the contract instead, return a documented structured error object and update all call sites. Apply this to both except blocks; see the sketch after this list.
- Around lines 31-45: The code double-converts raw_output[0] and will crash if the model has no outputs. In the ONNX run handling in OnnxModels (around session.run / output_tensors), first validate that raw_output (or session.get_outputs()) is non-empty and raise or return a clear error if not; then derive output_tensor from the already-built output_tensors (e.g., its first value) instead of reconverting raw_output[0], and pass that tensor into RunnerUtils.process_final_output; see the sketch after this list.
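A combined sketch of both fixes above; the helper names are hypothetical, and `session`/`raw_output` stand in for the reviewed code's variables:

```python
import logging

import torch

logger = logging.getLogger(__name__)

def collect_outputs(session, raw_output):
    """Validate outputs once and reuse the converted tensors (no double conversion)."""
    model_outputs = session.get_outputs()
    if not model_outputs:
        raise ValueError("Model declares no outputs")
    tensors = {
        out.name: torch.tensor(raw_output[i], dtype=torch.float32)
        for i, out in enumerate(model_outputs)
    }
    # Derive the primary tensor from the dict instead of reconverting raw_output[0].
    return tensors, tensors[model_outputs[0].name]

def run_safely(fn, *args, **kwargs):
    """Keep the (False, None) failure contract while preserving the traceback."""
    try:
        return True, fn(*args, **kwargs)
    except Exception:
        logger.exception("Error during inference")
        return False, None
```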
🧹 Nitpick comments (1)
dsperse/src/backends/onnx_models.py (1)

54-57: Align SessionOptions between single- and multi-input inference (behavior/perf drift).

run_inference disables several ORT options, but run_inference_multi only disables graph optimizations. If this is unintentional, reuse the same options builder in both paths.

Proposed patch

```diff
         try:
             session_options = ort.SessionOptions()
             session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
+            session_options.enable_profiling = False
+            session_options.enable_mem_pattern = False
+            session_options.enable_cpu_mem_arena = False
+            session_options.enable_mem_reuse = False
+            session_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
             session = ort.InferenceSession(model_path, session_options)
```
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5ffd4b and 19e03cb.

📒 Files selected for processing (1)
  • dsperse/src/backends/onnx_models.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/backends/onnx_models.py (1)
dsperse/src/run/utils/runner_utils.py (4)
  • RunnerUtils (16-411)
  • preprocess_input (183-219)
  • process_final_output (223-239)
  • save_to_file_flattened (272-287)
🪛 Ruff (0.14.10)
dsperse/src/backends/onnx_models.py

45-45: Consider moving this statement to an else block

(TRY300)


46-46: Do not catch blind exception: Exception

(BLE001)


65-65: Abstract raise to an inner function

(TRY301)


65-65: Avoid specifying long messages outside the exception class

(TRY003)


75-75: Abstract raise to an inner function

(TRY301)


75-75: Avoid specifying long messages outside the exception class

(TRY003)


77-77: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


79-79: Abstract raise to an inner function

(TRY301)


79-79: Avoid specifying long messages outside the exception class

(TRY003)


94-94: Consider moving this statement to an else block

(TRY300)


95-95: Do not catch blind exception: Exception

(BLE001)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @dsperse/src/backends/onnx_models.py:
- Around lines 50-108: model_input.type is a string (e.g. "tensor(float)"), so the code in run_inference_multi that reads model_input.type.tensor_type.elem_type will raise AttributeError. Parse the type string instead (e.g. with the regex tensor\((\w+)\)) and map the name (float, double, int32, int64, uint8, int8, uint16, int16, bool, float16, uint32, uint64) to the appropriate numpy dtype; update the onnx_dtype_map lookup to accept these string names and convert with arr.astype accordingly. Also make session_options in run_inference_multi match the flags used in run_inference and apply_onnx_shape (enable_profiling, enable_cpu_mem_arena, execution_mode, graph_optimization_level) so behavior is consistent.
- Around lines 31-45: Preserve each ONNX output's original dtype when populating output_tensors instead of forcing dtype=torch.float32. In both run_inference and run_inference_multi, convert in a way that respects the numpy dtype (e.g., torch.from_numpy, or infer the dtype from raw_output[i]) so int64/bool outputs stay intact, and coerce only the primary logits/output consumed by RunnerUtils.process_final_output to float32. Also drop the redundant exception interpolation in logger.exception calls (call logger.exception(...) without f"...{e}"). A combined sketch follows this list.
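A sketch covering both items; the helper names and the exact mapping table are assumptions, though ONNX Runtime's NodeArg.type really is a string like "tensor(float)":

```python
import re

import numpy as np
import torch

_ORT_TYPE_MAP = {
    "float": np.float32, "double": np.float64, "float16": np.float16,
    "int8": np.int8, "int16": np.int16, "int32": np.int32, "int64": np.int64,
    "uint8": np.uint8, "uint16": np.uint16, "uint32": np.uint32, "uint64": np.uint64,
    "bool": np.bool_,
}

def parse_ort_type(type_str: str):
    """Map an ORT type string like 'tensor(float)' to a numpy dtype (float32 fallback)."""
    m = re.match(r"tensor\((\w+)\)", type_str)
    return _ORT_TYPE_MAP.get(m.group(1), np.float32) if m else np.float32

def outputs_with_native_dtypes(session, raw_output):
    """Keep outputs in their native dtypes; coerce only the primary tensor to float32."""
    outs = session.get_outputs()
    # Assumes dtypes torch supports (float32/int64/bool etc.); uint16 would need special casing.
    tensors = {o.name: torch.from_numpy(np.asarray(r)) for o, r in zip(outs, raw_output)}
    primary = tensors[outs[0].name].to(torch.float32)
    return tensors, primary
```

An input cast would then look like `arr = arr.astype(parse_ort_type(model_input.type))`.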
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19e03cb and c9f8104.

📒 Files selected for processing (1)
  • dsperse/src/backends/onnx_models.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/backends/onnx_models.py (1)
dsperse/src/run/utils/runner_utils.py (4)
  • RunnerUtils (16-411)
  • preprocess_input (183-219)
  • process_final_output (223-239)
  • save_to_file_flattened (272-287)
🪛 Ruff (0.14.10)
dsperse/src/backends/onnx_models.py

45-45: Consider moving this statement to an else block

(TRY300)


47-47: Redundant exception object included in logging.exception call

(TRY401)


71-71: Abstract raise to an inner function

(TRY301)


71-71: Avoid specifying long messages outside the exception class

(TRY003)


85-85: Abstract raise to an inner function

(TRY301)


85-85: Avoid specifying long messages outside the exception class

(TRY003)


87-87: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


89-89: Abstract raise to an inner function

(TRY301)


89-89: Avoid specifying long messages outside the exception class

(TRY003)


104-104: Consider moving this statement to an else block

(TRY300)


106-106: Redundant exception object included in logging.exception call

(TRY401)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @dsperse/src/backends/onnx_models.py:
- Around lines 58-60: The session options here only set graph_optimization_level, so this path behaves differently from run_inference. Before creating the ort.InferenceSession, also set enable_profiling, enable_mem_pattern, enable_cpu_mem_arena, enable_mem_reuse, and execution_mode to the same values used in run_inference so both inference paths use identical ort.SessionOptions (keep graph_optimization_level as currently set); a shared-builder sketch follows.
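A shared-builder sketch (the helper name is hypothetical) so the two paths cannot drift apart:

```python
import onnxruntime as ort

def make_deterministic_session(model_path: str) -> ort.InferenceSession:
    """Build an InferenceSession with the same flags run_inference uses."""
    opts = ort.SessionOptions()
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    opts.enable_profiling = False
    opts.enable_mem_pattern = False
    opts.enable_cpu_mem_arena = False
    opts.enable_mem_reuse = False
    opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
    return ort.InferenceSession(model_path, opts)
```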
🧹 Nitpick comments (3)
dsperse/src/backends/onnx_models.py (3)

36-52: Output extraction and error handling look correct.

The logic to extract named outputs and map them to tensors is sound. Minor note: logger.exception() automatically includes the exception traceback, so passing e in the f-string is redundant.

🔧 Optional cleanup

```diff
         except Exception as e:
-            logger.exception(f"Error during inference: {e}")
+            logger.exception("Error during inference")
             return False, str(e)
```

91-93: Add strict=True to zip() for defensive validation.

Although the rank is validated at line 88-89, adding strict=True would make the intent explicit and catch any edge cases where dimensions differ unexpectedly.

```diff
-                for i, (got, expected) in enumerate(zip(arr.shape, model_input.shape)):
+                for i, (got, expected) in enumerate(zip(arr.shape, model_input.shape, strict=True)):
```

99-115: Consider extracting shared output processing logic.

Lines 99-112 are nearly identical to lines 36-49 in run_inference. Extracting this into a helper method would reduce duplication and ensure consistent behavior.

♻️ Example helper extraction

```python
@staticmethod
def _process_outputs(session, raw_output, output_file):
    """Process model outputs and save results."""
    model_outputs = session.get_outputs()
    if not model_outputs:
        raise ValueError("Model has no outputs")

    output_tensors = {}
    for i, out in enumerate(model_outputs):
        output_tensors[out.name] = torch.tensor(raw_output[i], dtype=torch.float32)

    first_output_name = model_outputs[0].name
    output_tensor = output_tensors[first_output_name]
    result = RunnerUtils.process_final_output(output_tensor)
    result['output_tensors'] = output_tensors
    RunnerUtils.save_to_file_flattened(result['logits'], output_file)
    return result
```

Then both methods can call:

```python
result = OnnxModels._process_outputs(session, raw_output, output_file)
return True, result
```

Also, same minor note on line 114: logger.exception() already includes the exception, so the f-string interpolation is redundant.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9f8104 and 6f41cfb.

📒 Files selected for processing (1)
  • dsperse/src/backends/onnx_models.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/backends/onnx_models.py (1)
dsperse/src/run/utils/runner_utils.py (4)
  • RunnerUtils (16-411)
  • preprocess_input (183-219)
  • process_final_output (223-239)
  • save_to_file_flattened (272-287)
🪛 Ruff (0.14.10)
dsperse/src/backends/onnx_models.py

38-38: Abstract raise to an inner function

(TRY301)


38-38: Avoid specifying long messages outside the exception class

(TRY003)


49-49: Consider moving this statement to an else block

(TRY300)


51-51: Redundant exception object included in logging.exception call

(TRY401)


75-75: Abstract raise to an inner function

(TRY301)


75-75: Avoid specifying long messages outside the exception class

(TRY003)


89-89: Abstract raise to an inner function

(TRY301)


89-89: Avoid specifying long messages outside the exception class

(TRY003)


91-91: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


93-93: Abstract raise to an inner function

(TRY301)


93-93: Avoid specifying long messages outside the exception class

(TRY003)


101-101: Abstract raise to an inner function

(TRY301)


101-101: Avoid specifying long messages outside the exception class

(TRY003)


112-112: Consider moving this statement to an else block

(TRY300)


114-114: Redundant exception object included in logging.exception call

(TRY401)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
dsperse/src/backends/onnx_models.py (1)

75-105: Use strict=True in zip for safer dimension validation.

The multi-input validation logic is thorough and well-structured. However, consider adding strict=True to the zip() call at line 95 for more defensive programming, even though the rank check at lines 91-93 should already ensure matching lengths.

♻️ Proposed fix

```diff
-                for i, (got, expected) in enumerate(zip(arr.shape, model_input.shape)):
+                for i, (got, expected) in enumerate(zip(arr.shape, model_input.shape, strict=True)):
```
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f41cfb and 87b2892.

📒 Files selected for processing (1)
  • dsperse/src/backends/onnx_models.py
🧰 Additional context used
🪛 Ruff (0.14.10)
dsperse/src/backends/onnx_models.py

35-35: Avoid specifying long messages outside the exception class

(TRY003)


85-85: Abstract raise to an inner function

(TRY301)


85-85: Avoid specifying long messages outside the exception class

(TRY003)


93-93: Abstract raise to an inner function

(TRY301)


93-93: Avoid specifying long messages outside the exception class

(TRY003)


95-95: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


97-97: Abstract raise to an inner function

(TRY301)


97-97: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (6)
dsperse/src/backends/onnx_models.py (6)

3-3: LGTM!

The re import is necessary for the new _parse_onnx_type method.


30-46: LGTM!

The output processing logic correctly converts raw ONNX outputs to structured tensors, derives the primary output from the first model output, and persists results appropriately.


48-59: LGTM!

The refactoring to use helper methods improves code organization and maintainability. The consistent tuple return type (bool, result/error) and proper exception logging are good practices.


121-121: LGTM!

The update to use _create_session maintains consistency with the refactored session creation pattern.


18-28: The deterministic session configuration is correct and all specified SessionOptions (enable_profiling, enable_mem_pattern, enable_cpu_mem_arena, enable_mem_reuse, ORT_DISABLE_ALL, and ORT_SEQUENTIAL) are supported in onnxruntime 1.21.0. The setup appropriately disables optimizations and uses sequential execution to ensure deterministic behavior for ZK proofs.


61-73: Verifying complete ONNX type coverage is unnecessary for this codebase.

The type mapping in _parse_onnx_type already covers all types actually used in this project. The repository contains only vision inference models (age, doom, net, resnet, version) which standardly use float32, float16, and integer types—all present in the type map. Advanced types like bfloat16, string, and complex are not used anywhere in the codebase. The regex pattern tensor\((\w+)\) correctly matches the standard ONNX type format with no detected variations. The fallback to np.float32 for unmapped types provides safe handling of edge cases and aligns with the codebase's universal float32 convention.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (9)
dsperse/src/run/runner.py (5)

331-341: Unused variable output_name on line 340.

The variable output_name is assigned but never used in _run_tiled_slice. This was flagged by static analysis.

Suggested fix

```diff
     input_name = tiling["input_name"]
-    output_name = tiling["output_name"]
+    # output_name available in tiling["output_name"] if needed
     input_tensor = tensor_cache.get(input_name)
```

365-396: Shared tensor_cache dict: safe as written, plus an unused variable.

run_single_tile executes in parallel worker threads, but the writes to tensor_cache (lines 394-395) happen in the main thread inside the as_completed loop, after future.result(), so the shared dict is not mutated concurrently and this is safe as written. However, the variable tile_output_name is unpacked but never used (line 390).

Prefix unused variable with underscore

```diff
             for future in as_completed(futures):
-                tile_idx, tile_output_name, ok, result = future.result()
+                tile_idx, _tile_output_name, ok, result = future.result()
                 if not ok:
                     raise Exception(f"Tile {tile_idx} failed: {result}")
```

387-387: Consider capping max_workers based on system resources.

Using min(32, num_tiles) is reasonable, but for very large tile counts or resource-constrained environments, you may want to make this configurable or use os.cpu_count() as a factor.
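For example, one hedged way to bound the pool, assuming the `num_tiles` name from the reviewed code (the constants are illustrative):

```python
import os

# Bound by tile count, a hard cap, and a CPU-derived factor.
max_workers = min(32, num_tiles, 4 * (os.cpu_count() or 1))
```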


356-357: Consider defining custom exception classes for tiling errors.

Multiple places raise generic Exception with long messages. Per static analysis hints (TRY002, TRY003), consider creating domain-specific exceptions (e.g., TiledSliceError) for better error handling downstream.

Example custom exception

```python
class TiledSliceError(Exception):
    """Raised when a tiled slice operation fails."""
    pass

# Usage:
if not success:
    raise TiledSliceError(f"Split slice failed: {split_result}")
```

Also applies to: 391-392, 413-414, 493-493


482-483: Use next(iter(...)) instead of list(...)[0].

Static analysis suggests using next(iter(result['output_tensors'].values())) for efficiency when only the first element is needed.

Suggested fix

```diff
                     for oname, tensor in result['output_tensors'].items():
                         tensor_cache[oname] = tensor
-                    first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+                    first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
                     final_tensor = first_output if first_output is not None else result.get('logits')
```
dsperse/src/slice/autotiler.py (4)

166-168: Remove unused variable orig_output.

Static analysis flagged that orig_output is assigned but never used.

Suggested fix

```diff
     m = onnx.load(slice_path)
     orig_input = m.graph.input[0]
-    orig_output = m.graph.output[0]
     orig_dims = [d.dim_value for d in orig_input.type.tensor_type.shape.dim]
```

336-429: Code duplication with create_tile_slice.

tile_conv_slice shares significant logic with create_tile_slice (weight extraction, graph building, non-Conv op handling). Consider refactoring to share common logic if this legacy function needs long-term maintenance.


453-455: Square-only constraint may be limiting.

The check h <= tile_size or h != w restricts tiling to square spatial dimensions only. This may be intentional for simplicity, but it excludes non-square inputs (e.g., 1920x1080). Consider documenting this limitation or making it configurable in the future.


560-563: Silent failure on malformed slice directory names.

When int(entry.name.split("_")[1]) fails, the slice is silently skipped. This is fine for robustness, but consider logging at debug level for troubleshooting.

Add debug logging

```diff
+import logging
+logger = logging.getLogger(__name__)

 # In autotile_slices:
         try:
             i = int(entry.name.split("_")[1])
         except (IndexError, ValueError):
+            logger.debug(f"Skipping malformed slice directory: {entry.name}")
             continue
```
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87b2892 and 8adebd0.

📒 Files selected for processing (2)
  • dsperse/src/run/runner.py
  • dsperse/src/slice/autotiler.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/run/runner.py (2)
dsperse/src/utils/utils.py (3)
  • Utils (12-362)
  • write_input (127-131)
  • read_input (134-171)
dsperse/src/backends/onnx_models.py (3)
  • OnnxModels (14-255)
  • run_inference (49-59)
  • run_inference_multi (76-105)
🪛 Ruff (0.14.10)
dsperse/src/slice/autotiler.py

168-168: Local variable orig_output is assigned to but never used

Remove assignment to unused variable orig_output

(F841)

dsperse/src/run/runner.py

340-340: Local variable output_name is assigned to but never used

Remove assignment to unused variable output_name

(F841)


357-357: Create your own exception

(TRY002)


357-357: Avoid specifying long messages outside the exception class

(TRY003)


390-390: Unpacked variable tile_output_name is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


392-392: Create your own exception

(TRY002)


392-392: Avoid specifying long messages outside the exception class

(TRY003)


414-414: Create your own exception

(TRY002)


414-414: Avoid specifying long messages outside the exception class

(TRY003)


482-482: Prefer next(iter(result['output_tensors'].values())) over single element slice

Replace with next(iter(result['output_tensors'].values()))

(RUF015)


493-493: Create your own exception

(TRY002)


493-493: Avoid specifying long messages outside the exception class

(TRY003)


535-535: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (17)
dsperse/src/run/runner.py (6)

9-9: LGTM!

Import of ThreadPoolExecutor and as_completed is appropriate for the parallel tile execution.


46-47: LGTM!

Initialization of tiled_info to None in __init__ is correct.


322-329: LGTM!

The _load_tiled_info method correctly loads tiled metadata from the expected path and handles the case where the file doesn't exist.


439-441: LGTM!

Properly initializing tensor cache with the model's input name and reading the input tensor.


472-476: LGTM!

Good handling of multi-input slices by checking filtered_inputs length and routing appropriately.


516-541: LGTM with minor note.

The _run_slice_multi_input method correctly handles path resolution and error cases. The broad except Exception (line 535) is flagged by static analysis (BLE001), but given this is inference code where various exceptions can occur, it's acceptable to catch broadly here while logging the exception upstream in run_inference_multi.

dsperse/src/slice/autotiler.py (11)

1-27: LGTM!

Clear docstring explaining the tiling architecture and good constant definition for elementwise ops.


30-35: Edge case: find_tile_size may return None unexpectedly for valid inputs.

When target >= spatial_dim the function returns None, which is correct; it also returns None when the loop range(target, 7, -1) finds no divisor. Consider documenting that returning None is the expected behavior whenever no suitable tile size exists.
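
A docstring capturing that contract would make the None cases explicit; the following is a sketch of the documented behavior, not the actual implementation:

def find_tile_size(spatial_dim: int, target: int) -> int | None:
    """Return the largest divisor of spatial_dim in [8, target].

    Returns None when target >= spatial_dim, or when no divisor exists in
    the range; both are expected outcomes, not errors.
    """
    if target >= spatial_dim:
        return None
    for tile in range(target, 7, -1):  # candidates: target, target-1, ..., 8
        if spatial_dim % tile == 0:
            return tile
    return None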


38-44: LGTM!

is_tileable correctly checks for single-input models with Conv and only elementwise ops.


47-58: LGTM!

get_conv_params extracts Conv node attributes with sensible defaults.


61-150: LGTM!

create_split_slice correctly creates a Pad + Slice graph for extracting tiles with halo regions. The ONNX graph construction is well-structured.


177-180: LGTM - good guard against division by zero.

Checking sh == 0 or sw == 0 before division is correct defensive programming.


191-192: LGTM!

Correctly returns None when weights are not found in initializers.


267-333: LGTM!

create_concat_slice correctly builds a hierarchical Concat (rows first, then columns) to reassemble tiles.


432-543: LGTM!

autotile_slice orchestrates the split/tile/concat creation correctly and returns comprehensive metadata.


546-579: LGTM!

autotile_slices correctly scans directories, skips already-tiled slices, and saves aggregated metadata. The use of sorted() ensures deterministic ordering.


224-244: Logic for handling inter-dependent non-Conv ops is sound, but naming collision edge case exists.

When copying non-Conv ops (lines 230–242), input remapping preserves intermediate tensor names from the original graph, enabling correct dependencies between chained ops. For example, if the graph contains Conv → Relu → Add, the Relu's output name is preserved and the Add can reference it through the else clause on line 238.

However, there is a potential edge case: if the original model contains tensors named "conv_out" or matching the pattern f"tile_{slice_idx}_{tile_idx}_*" (used for newly generated input/output names), a naming collision could occur and break the graph structure. While unlikely in practice (models typically use descriptive names), this should be documented or guarded against.

Initializer handling at lines 226–228 correctly includes all non-Conv initializers, which are referenced by name in the remapped inputs.
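
A cheap pre-flight guard could reject colliding names up front; this sketch assumes the model proto m and the slice_idx/tile_idx naming scheme from the surrounding code:

# Hypothetical collision check before remapping tensor names
reserved = {"conv_out", f"tile_{slice_idx}_{tile_idx}_in", f"tile_{slice_idx}_{tile_idx}_out"}
existing = {t for n in m.graph.node for t in list(n.input) + list(n.output)}
clashes = reserved & existing
if clashes:
    raise ValueError(f"Generated tile tensor names collide with model tensors: {sorted(clashes)}")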

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @dsperse/src/run/runner.py:
- Around line 374-376: The code calls tensor_cache.get(tile_input_name) and
passes the result to Utils.write_input without checking for None; add a
validation after retrieving tile_tensor (from tensor_cache.get(tile_input_name))
to detect missing tensors and raise or log a clear error referencing
tile_input_name and tile_in (or otherwise handle the missing case) before
calling Utils.write_input so downstream failures are avoided and the error
message clearly identifies the missing tile input.
- Line 405: The line building concat_tensors uses direct dict access and can
raise a raw KeyError if a name in concat_info["input_names"] is missing from
tensor_cache; before constructing concat_tensors (or while doing so), validate
that every name in concat_info["input_names"] exists in tensor_cache and if any
are missing raise a clear exception or log an informative error that includes
the missing names and context (e.g., which concat operation or concat_info id),
then build concat_tensors from the validated names; reference the symbols
concat_tensors, concat_info["input_names"], and tensor_cache when implementing
the check.
- Around line 339-341: The code uses input_tensor = tensor_cache.get(input_name)
without validating it; update the runner so after retrieving input_tensor (from
tensor_cache.get(input_name)) you check for None and handle it explicitly (e.g.,
raise a clear exception or log an error and abort) before calling
Utils.write_input; reference the tiling dict (input_name/output_name) and ensure
any error message includes the missing input_name and context (tiling) to make
debugging straightforward.
- Around line 498-499: The current fallback that sets final_tensor =
input_tensor when final_tensor is None can mask failures; instead, detect
whether the inference chain or slice-processing loop actually ran (e.g., check
the slice iteration count or chain length) and if it did but final_tensor is
still None, either log a warning with context (including any error/exception
info) or raise an explicit RuntimeError; only allow the silent fallback to
input_tensor when the chain was intentionally empty. Update the block around
final_tensor/input_tensor to perform this check and use the module logger or
raise the error so silent masking is avoided.
🧹 Nitpick comments (5)
dsperse/src/run/runner.py (5)

322-329: Consider handling malformed JSON gracefully.

If tiled_info.json exists but contains invalid JSON, json.load() will raise an exception that propagates uncaught. For robustness, consider wrapping in a try-except to log a warning and fall back to None.

💡 Suggested improvement
     def _load_tiled_info(self):
         tiled_info_path = self.slices_path / "tiled" / "tiled_info.json"
         if tiled_info_path.exists():
-            with open(tiled_info_path, 'r') as f:
-                self.tiled_info = json.load(f)
-            logger.info(f"Loaded tiled info for slices: {self.tiled_info.get('tiled_indices', [])}")
+            try:
+                with open(tiled_info_path, 'r') as f:
+                    self.tiled_info = json.load(f)
+                logger.info(f"Loaded tiled info for slices: {self.tiled_info.get('tiled_indices', [])}")
+            except json.JSONDecodeError as e:
+                logger.warning(f"Failed to parse tiled_info.json: {e}")
+                self.tiled_info = None
         else:
             self.tiled_info = None

388-396: Prefix unused variable with underscore.

The tile_output_name is unpacked but never used, as flagged by static analysis (RUF059).

♻️ Proposed fix
             for future in as_completed(futures):
-                tile_idx, tile_output_name, ok, result = future.result()
+                tile_idx, _tile_output_name, ok, result = future.result()
                 if not ok:
                     raise Exception(f"Tile {tile_idx} failed: {result}")

484-485: Prefer next(iter(...)) over list slice.

Static analysis (RUF015) suggests using next(iter(result['output_tensors'].values())) for better efficiency.

♻️ Proposed fix
-                        first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+                        first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None

537-538: Broad exception catch is acceptable here but could be narrowed.

Static analysis flags the blind Exception catch (BLE001). While acceptable for wrapping external ONNX calls, consider catching more specific exceptions (e.g., RuntimeError, onnxruntime.OrtException) if the failure modes are known.


387-396: Consider adding a timeout to parallel tile execution.

The ThreadPoolExecutor runs tiles without a timeout. If a tile hangs indefinitely (e.g., due to model issues), the entire run will block forever. For production robustness, consider passing a timeout to as_completed; a timeout on future.result() would be ineffective here, since as_completed only yields futures that have already finished.

💡 Suggested improvement
+TILE_TIMEOUT_SECONDS = 300  # overall budget for the tile batch
+
         with ThreadPoolExecutor(max_workers=min(32, num_tiles)) as executor:
             futures = {executor.submit(run_single_tile, ti): ti for ti in tile_infos}
-            for future in as_completed(futures):
+            # as_completed raises concurrent.futures.TimeoutError once the
+            # deadline passes with futures still pending.
+            for future in as_completed(futures, timeout=TILE_TIMEOUT_SECONDS):
                 tile_idx, _tile_output_name, ok, result = future.result()
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8adebd0 and c9a8895.

📒 Files selected for processing (1)
  • dsperse/src/run/runner.py
🧰 Additional context used
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py

340-340: Local variable output_name is assigned to but never used

Remove assignment to unused variable output_name

(F841)


357-357: Create your own exception

(TRY002)


357-357: Avoid specifying long messages outside the exception class

(TRY003)


390-390: Unpacked variable tile_output_name is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


392-392: Create your own exception

(TRY002)


392-392: Avoid specifying long messages outside the exception class

(TRY003)


414-414: Create your own exception

(TRY002)


414-414: Avoid specifying long messages outside the exception class

(TRY003)


457-457: Avoid specifying long messages outside the exception class

(TRY003)


484-484: Prefer next(iter(result['output_tensors'].values())) over single element slice

Replace with next(iter(result['output_tensors'].values()))

(RUF015)


495-495: Create your own exception

(TRY002)


495-495: Avoid specifying long messages outside the exception class

(TRY003)


537-537: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (1)
dsperse/src/run/runner.py (1)

46-47: LGTM!

Clean initialization of tiled_info attribute for later population.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @dsperse/src/run/runner.py:
- Around line 322-330: The _load_tiled_info method should handle malformed JSON:
wrap the json.load call for tiled_info_path in a try/except that catches
json.JSONDecodeError (and optionally ValueError), log a warning via logger
including the file path (tiled_info_path) and the exception message, and set
self.tiled_info = None instead of letting the exception propagate; keep the
existing exists() check and the info log only when parsing succeeds.
- Around line 389-401: The tile worker loop writes to the shared tensor_cache
from multiple threads causing a race condition; wrap all modifications to
tensor_cache (the block that iterates result['output_tensors'] in the loop that
handles future.result() from run_single_tile) with a threading.Lock (create a
lock near where ThreadPoolExecutor is created and acquire/release around the
write) to ensure atomic updates, and rename the unused tile_output_name variable
to _tile_output_name to satisfy static analysis; leave tile_results assignment
as-is but do it while holding the lock if it depends on tensor_cache
consistency.
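
For reference, the locking pattern the prompt describes looks roughly like this (names taken from the surrounding review; whether the lock is strictly required depends on where the writes actually happen):

import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

cache_lock = threading.Lock()  # guards concurrent writes to tensor_cache

with ThreadPoolExecutor(max_workers=min(32, num_tiles)) as executor:
    futures = {executor.submit(run_single_tile, ti): ti for ti in tile_infos}
    for future in as_completed(futures):
        tile_idx, _tile_output_name, ok, result = future.result()
        if not ok:
            raise Exception(f"Tile {tile_idx} failed: {result}")
        with cache_lock:  # atomic update of the shared cache
            for oname, tensor in result["output_tensors"].items():
                tensor_cache[oname] = tensor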
🧹 Nitpick comments (2)
dsperse/src/run/runner.py (2)

339-344: Remove or use the unused output_name variable.

output_name is assigned on line 340 but never used, as flagged by static analysis.

♻️ Suggested fix
         input_name = tiling["input_name"]
-        output_name = tiling["output_name"]
         input_tensor = tensor_cache.get(input_name)

491-492: Prefer next(iter(...)) over single-element slice.

Per static analysis (RUF015), using next(iter(...)) is more efficient than creating a list for a single element.

♻️ Suggested fix
-                        first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+                        first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9a8895 and c27c844.

📒 Files selected for processing (1)
  • dsperse/src/run/runner.py
🧰 Additional context used
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py

340-340: Local variable output_name is assigned to but never used

Remove assignment to unused variable output_name

(F841)


343-343: Avoid specifying long messages outside the exception class

(TRY003)


359-359: Create your own exception

(TRY002)


359-359: Avoid specifying long messages outside the exception class

(TRY003)


379-379: Avoid specifying long messages outside the exception class

(TRY003)


394-394: Unpacked variable tile_output_name is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


396-396: Create your own exception

(TRY002)


396-396: Avoid specifying long messages outside the exception class

(TRY003)


411-411: Avoid specifying long messages outside the exception class

(TRY003)


421-421: Create your own exception

(TRY002)


421-421: Avoid specifying long messages outside the exception class

(TRY003)


464-464: Avoid specifying long messages outside the exception class

(TRY003)


491-491: Prefer next(iter(result['output_tensors'].values())) over single element slice

Replace with next(iter(result['output_tensors'].values()))

(RUF015)


502-502: Create your own exception

(TRY002)


502-502: Avoid specifying long messages outside the exception class

(TRY003)


546-546: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (5)
dsperse/src/run/runner.py (5)

9-9: LGTM!

The ThreadPoolExecutor import and tiled_info attribute initialization are appropriate for the new parallel tiled execution feature.

Also applies to: 46-47


405-438: LGTM!

The concat stage logic is well-structured with proper input validation and timing metrics. The error handling follows the pattern established elsewhere in this method.


446-451: LGTM!

The tensor cache initialization and tiled indices extraction properly handle the case when tiled_info is None.


505-518: LGTM!

The final tensor handling provides a reasonable fallback when no output tensor is produced, and the softmax logic handles the expected output dimensions.


527-552: LGTM!

The multi-input slice execution follows the established error handling pattern used in _run_ezkl_slice and _run_jstprove_slice, where broad exception catching ensures graceful degradation with structured error info.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @dsperse/src/run/runner.py:
- Around line 491-493: The current construction of extra_tensors = {name:
tensor_cache.get(name) for name in filtered_inputs} can introduce None values if
tensor_cache lacks a key, which will then be forwarded into
_run_slice_multi_input / run_inference_multi and trigger confusing ONNX errors;
update the call-site so you first detect missing = [n for n in filtered_inputs
if n not in tensor_cache] and either raise/log a clear error (including the
missing names) or omit missing entries and fail-fast, then build extra_tensors
only from present keys (e.g., {n: tensor_cache[n] for n in filtered_inputs if n
in tensor_cache}) before calling _run_slice_multi_input to avoid passing None
values.
🧹 Nitpick comments (2)
dsperse/src/run/runner.py (2)

344-348: Remove unused variable output_name.

The variable output_name on line 345 is assigned but never used. Based on static analysis hint.

Suggested fix
         input_name = tiling["input_name"]
-        output_name = tiling["output_name"]
         input_tensor = tensor_cache.get(input_name)

500-502: Minor: prefer next(iter(...)) over list slice.

Per static analysis, using next(iter(...)) is more efficient than creating a list just to access the first element.

Suggested fix
-                        first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+                        first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c27c844 and 9f66688.

📒 Files selected for processing (1)
  • dsperse/src/run/runner.py
🧰 Additional context used
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py

345-345: Local variable output_name is assigned to but never used

Remove assignment to unused variable output_name

(F841)


348-348: Avoid specifying long messages outside the exception class

(TRY003)


364-364: Create your own exception

(TRY002)


364-364: Avoid specifying long messages outside the exception class

(TRY003)


384-384: Avoid specifying long messages outside the exception class

(TRY003)


402-402: Create your own exception

(TRY002)


402-402: Avoid specifying long messages outside the exception class

(TRY003)


418-418: Avoid specifying long messages outside the exception class

(TRY003)


428-428: Create your own exception

(TRY002)


428-428: Avoid specifying long messages outside the exception class

(TRY003)


474-474: Avoid specifying long messages outside the exception class

(TRY003)


501-501: Prefer next(iter(result['output_tensors'].values())) over single element slice

Replace with next(iter(result['output_tensors'].values()))

(RUF015)


512-512: Create your own exception

(TRY002)


512-512: Avoid specifying long messages outside the exception class

(TRY003)


556-556: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (5)
dsperse/src/run/runner.py (5)

8-10: LGTM!

The threading and concurrent.futures imports are appropriate for the parallel tile execution feature.


47-48: LGTM!

Proper initialization of the tiled_info attribute.


323-334: LGTM!

Good defensive handling with graceful fallback on JSON parse errors.


372-407: Thread-safe cache access pattern looks correct.

The implementation correctly:

  1. Populates tensor_cache with split outputs before spawning tile threads
  2. Tile workers only read from cache (via .get())
  3. Uses cache_lock for concurrent writes to tensor_cache

The min(32, num_tiles) worker limit is a reasonable cap. Consider making this configurable if needed for different hardware profiles.


537-562: LGTM with note on exception handling.

The method follows the same pattern as other slice runners. Catching broad Exception (line 556) is pragmatic here since ONNX runtime can raise various exception types. The error is properly captured and returned.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
dsperse/src/run/runner.py (6)

323-334: Consider catching broader file I/O errors.

The method catches json.JSONDecodeError but other file errors (e.g., PermissionError, IOError) would propagate and potentially crash the runner. Since tiling is optional, these should also gracefully degrade.

Suggested improvement
         if tiled_info_path.exists():
             try:
                 with open(tiled_info_path, 'r') as f:
                     self.tiled_info = json.load(f)
                 logger.info(f"Loaded tiled info for slices: {self.tiled_info.get('tiled_indices', [])}")
-            except json.JSONDecodeError as e:
-                logger.warning(f"Failed to parse tiled_info.json: {e}. Tiling disabled.")
+            except (json.JSONDecodeError, OSError) as e:
+                logger.warning(f"Failed to load tiled_info.json: {e}. Tiling disabled.")
                 self.tiled_info = None

344-346: Remove unused variable output_name.

The variable output_name is assigned but never used, as flagged by static analysis.

Suggested fix
         input_name = tiling["input_name"]
-        output_name = tiling["output_name"]
         input_tensor = tensor_cache.get(input_name)

394-407: Thread-safety pattern looks correct but is fragile.

The current concurrency pattern works because:

  1. Split writes all tile inputs before ThreadPoolExecutor starts
  2. Each tile reads only its pre-populated input (no concurrent writes to those keys)
  3. Tile outputs are written with cache_lock protection
  4. Concat runs after all tiles complete

However, the read on line 382 (tensor_cache.get(tile_input_name)) occurs without the lock. While safe given the current execution order, this implicit dependency could break if the code evolves. Consider adding a comment documenting this invariant.
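
A comment along these lines next to the unlocked read would pin the invariant down (wording illustrative):

# INVARIANT: the split stage writes every tile-input key into tensor_cache
# before the ThreadPoolExecutor starts; workers only read those keys, and
# tile-output writes happen under cache_lock after future.result(). Concat
# runs only after all futures complete.
tile_tensor = tensor_cache.get(tile_input_name)  # safe without the lock (see above)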


504-505: Use next(iter(...)) for single element access.

Per static analysis, prefer next(iter(...)) over creating a list for single-element access.

Suggested fix
-                        first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+                        first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None

364-364: Consider custom exception types for tiled execution errors.

Multiple places raise generic Exception with descriptive messages. Creating specific exception classes (e.g., TiledExecutionError, SplitError, ConcatError) would improve error handling and allow callers to catch specific failure modes.

Example custom exception
class TiledExecutionError(Exception):
    """Raised when tiled slice execution fails."""
    def __init__(self, slice_idx: int, stage: str, message: str):
        self.slice_idx = slice_idx
        self.stage = stage  # 'split', 'tile', 'concat'
        super().__init__(f"Tiled execution failed at {stage} for slice {slice_idx}: {message}")

Also applies to: 402-402, 428-428


472-480: Edge case: empty output_names leaves final_tensor unchanged.

If output_names is empty (line 478), final_tensor won't be updated from the tiled execution. While the metadata should always include outputs, this could cause unexpected behavior if metadata is malformed. The non-tiled path (lines 500-511) has more robust handling.

Suggested defensive check
             if ok and output_names:
                 final_tensor = tensor_cache.get(output_names[0])
+            elif ok:
+                logger.warning(f"Tiled slice {slice_idx} has no output_names in metadata")
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f66688 and c016f6d.

📒 Files selected for processing (1)
  • dsperse/src/run/runner.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/run/runner.py (4)
dsperse/src/utils/utils.py (3)
  • Utils (12-362)
  • write_input (127-131)
  • read_input (134-171)
dsperse/src/backends/onnx_models.py (3)
  • OnnxModels (14-255)
  • run_inference (49-59)
  • run_inference_multi (76-105)
dsperse/src/backends/pth_models.py (1)
  • run_inference (37-75)
dsperse/src/run/utils/runner_utils.py (1)
  • RunnerUtils (16-411)
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py

345-345: Local variable output_name is assigned to but never used

Remove assignment to unused variable output_name

(F841)


348-348: Avoid specifying long messages outside the exception class

(TRY003)


364-364: Create your own exception

(TRY002)


364-364: Avoid specifying long messages outside the exception class

(TRY003)


384-384: Avoid specifying long messages outside the exception class

(TRY003)


402-402: Create your own exception

(TRY002)


402-402: Avoid specifying long messages outside the exception class

(TRY003)


418-418: Avoid specifying long messages outside the exception class

(TRY003)


428-428: Create your own exception

(TRY002)


428-428: Avoid specifying long messages outside the exception class

(TRY003)


474-474: Avoid specifying long messages outside the exception class

(TRY003)


494-494: Avoid specifying long messages outside the exception class

(TRY003)


504-504: Prefer next(iter(result['output_tensors'].values())) over single element slice

Replace with next(iter(result['output_tensors'].values()))

(RUF015)


515-515: Create your own exception

(TRY002)


515-515: Avoid specifying long messages outside the exception class

(TRY003)


559-559: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (3)
dsperse/src/run/runner.py (3)

8-10: LGTM!

The new imports for threading and concurrent futures are appropriate for the parallel tile execution pattern, and the instance variable initialization is correctly placed.

Also applies to: 47-48


518-521: Verify fallback behavior: using input tensor as output may mask errors.

When no output tensor is produced, the code falls back to using the input tensor and only logs a warning. This could silently produce incorrect predictions if the execution chain fails to propagate outputs correctly. Consider whether this should raise an error instead, or at least include this in the results metadata so callers can detect it.
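
One hedged shape for this, assuming a flag such as executed_any_slice is tracked while iterating the chain:

if final_tensor is None:
    if executed_any_slice:  # the chain ran but produced no output; surface it
        raise RuntimeError("Execution chain completed without producing an output tensor")
    logger.warning("Empty execution chain; falling back to input tensor")
    final_tensor = input_tensor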


540-565: LGTM!

The multi-input slice runner follows the established pattern in this file. The broad exception catch (line 559) is appropriate here for inference robustness, returning structured error info rather than crashing the pipeline.

Removed the `metadata` module and replaced its references with direct imports from `analyzers.schema`. Enhanced ONNX slicing by introducing shape inference, slicing point refinements, and runtime tracing utilities to improve slicing artifact accuracy and workflow efficiency.
…nce slice point refinement, and update tests for channel splitting and tiling workflows
Optimized channel splitting logic by removing redundant spatial tiling checks and improving constraint handling. Enhanced ONNX slicing with temporary model tracing, cleanup utilities, and refined error handling during slicing.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dsperse/src/prover.py (1)

245-281: Add guard for EZKL runner availability.

At line 269, self.ezkl_runner can be None if the EZKL CLI failed to initialize (set at line 152). Calling .prove() without checking will raise AttributeError. The JSTprove path already has this guard at line 258; apply the same pattern here.

Proposed fix
+        if self.ezkl_runner is None:
+            return False, ExecutionMethod.EZKL_PROVE, "EZKL CLI not available"
         if not pk_path or not os.path.exists(pk_path):
             return False, ExecutionMethod.EZKL_PROVE, f"Proving key not found at {pk_path}"
🤖 Fix all issues with AI agents
In `@dsperse/src/analyzers/onnx_analyzer.py`:
- Around line 543-571: The _get_tensor_shape function is incorrectly appending
graph.initializer dims into TensorShape.input (in the loop over
graph.initializer); remove that behavior so initializers are not added to
inputs, and instead, if you need to represent weights, collect initializer
shapes into a separate WeightShape structure or list (e.g., weight_shapes) and
return/use it where expected; update references to TensorShape(input=...,
output=...) so only graph.input shapes and graph.output shapes are included and
initializers are not mixed into inputs.

In `@dsperse/src/run/utils/runner_utils.py`:
- Around line 180-188: The current availability checks (has_jst / has_ezkl) only
consider meta.jstprove_circuit_path and meta.ezkl_circuit_path but the preparers
(_prepare_jstprove_meta and _prepare_ezkl_meta) accept meta.circuit_path as a
fallback; update the gating logic so backends are marked available when either
their specific circuit path OR meta.circuit_path is set and the corresponding
runner exists (e.g., include meta.circuit_path in the has_jst and has_ezkl
predicates alongside runner.jstprove_runner / runner.ezkl_runner), then preserve
adding Backend.JSTPROVE, Backend.EZKL, and Backend.ONNX to available as before.

In `@dsperse/src/slice/onnx_slicer.py`:
- Around line 33-41: The cleanup currently deletes the temporary traced file but
leaves self.onnx_path pointing to that deleted path; update the cleanup logic in
OnnxSlicer (e.g., in slice_model's teardown or a dedicated cleanup/reset method)
to restore the original path by saving the original value before replacing it
(store e.g., self._original_onnx_path when you set self._traced_model_path) and
then after removing the traced file set self.onnx_path =
self._original_onnx_path and clear/reset any traced analysis state
(self.traced_shapes, self.traced_dtypes, self._traced_model_path, and
self._original_onnx_path), or alternatively avoid deleting the traced file and
keep it for the instance lifetime—pick one consistent approach and implement it
in the cleanup code that currently removes self._traced_model_path so subsequent
calls to slice_model can reuse the object without file-not-found errors.

In `@dsperse/src/slice/utils/onnx_utils.py`:
- Around line 213-230: The isolate_conv function currently returns an unordered
set (updated_points) which leads to nondeterministic slice ordering; change it
to return a deterministically ordered collection by converting updated_points to
a sorted list of integers (e.g., sorted(updated_points)) before returning so
downstream slicing receives stable, ascending slice indices; ensure the return
type matches callers expecting a list and that the sorting is numeric (use
key=int if needed) while keeping the logic around nodes_dict, max_idx and index
additions intact.
- Around line 66-75: Move the onnxruntime.tools import into the try block inside
apply_symbolic_shape_inference so ImportError is caught; explicitly catch
ImportError and call logger.warning with a clear message that the optional ORT
shape inference tool is unavailable, and for other exceptions use
logger.exception to preserve stack traces when
symbolic_shape_infer.SymbolicShapeInference.infer_shapes(onnx_model) fails; keep
returning the original onnx_model on failure.
♻️ Duplicate comments (25)
dsperse/src/analyzers/runner_analyzer.py (1)

66-73: Per-slice routing heuristic is still too broad.

The heuristic (slice_metadata present and layers absent) can still misroute model-level entries that include slice_metadata. Consider a stricter discriminator (e.g., require per-slice-only keys and exclude model-level keys).
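
A stricter discriminator might require at least one per-slice-only key and the absence of any model-level key; the key sets below are placeholders to adapt to the actual schema:

SLICE_ONLY_KEYS = {"slice_metadata", "slice_id"}       # hypothetical per-slice markers
MODEL_LEVEL_KEYS = {"layers", "model_path", "slices"}  # hypothetical model-level markers

def looks_like_per_slice_entry(entry: dict) -> bool:
    keys = set(entry)
    return bool(keys & SLICE_ONLY_KEYS) and not (keys & MODEL_LEVEL_KEYS)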

dsperse/src/utils/utils.py (1)

18-26: Guard against unconditional IR/opset overrides.

This still force-writes IR 9 + opset versions. If a model targets higher IR/opset, this silently downgrades and can break compatibility. Consider a force_compat=False flag and only upgrade versions unless explicitly forced.

🐛 Suggested adjustment
-    def save_onnx_model(model: onnx.ModelProto, path: str | Path, opset_version: int = 18):
+    def save_onnx_model(model: onnx.ModelProto, path: str | Path, opset_version: int = 18, force_compat: bool = False):
         """Save ONNX model with compatible IR version (9) and specified opset."""
-        model.ir_version = 9
+        if force_compat:
+            model.ir_version = 9
         if model.opset_import:
             for opset in model.opset_import:
                 if opset.domain == "" or opset.domain == "ai.onnx":
-                    opset.version = opset_version
+                    if force_compat or opset.version < opset_version:
+                        opset.version = opset_version
dsperse/src/verifier.py (2)

431-433: Split one‑line conditional returns (lint).
Line 431–433 still uses single-line if statements, which triggers lint failures in common setups.

🛠️ Proposed fix
-            if detected == "dslice": return self.verify_dslice(run_path, model_path, backend=backend)
-            if detected == "dsperse": return self.verify_dsperse(run_path, model_path, backend=backend)
-            if detected == "dirs": return self.verify_dirs(run_path, model_path, backend=backend)
+            if detected == "dslice":
+                return self.verify_dslice(run_path, model_path, backend=backend)
+            if detected == "dsperse":
+                return self.verify_dsperse(run_path, model_path, backend=backend)
+            if detected == "dirs":
+                return self.verify_dirs(run_path, model_path, backend=backend)

490-506: Guard missing backend runners in single‑slice verification.
Lines 490–506 call self.jstprove_runner.verify / self.ezkl_runner.verify without checking for None, which can raise AttributeError when the CLI isn’t available.

🛠️ Proposed fix (note: the existing statements must be re-indented one level under the new else branches)
             if preferred == Backend.JSTPROVE:
+                if self.jstprove_runner is None:
+                    error_msg = "JSTprove CLI not available"
+                else:
                 circuit_path = Utils.resolve_under_slice(slice_dir, meta.jstprove_circuit_path or meta.circuit_path)
                 input_path, output_path = Path(run_path) / "input.json", Path(run_path) / "output.json"
                 wf = self._get_witness_file_from_run(run_path, slice_id)
                 witness_path = Path(wf) if wf else (Path(run_path) / "output_witness.bin")
                 try:
                     success = self.jstprove_runner.verify(str(proof_path), str(circuit_path), str(input_path),
                                                           str(output_path), str(witness_path))
                 except Exception as e:
                     error_msg = str(e)
             else:
+                if self.ezkl_runner is None:
+                    error_msg = "EZKL CLI not available"
+                else:
                 vk_path = Utils.resolve_under_slice(slice_dir, meta.vk_path)
                 settings_path = Utils.resolve_under_slice(slice_dir, meta.settings_path)
                 try:
                     success = self.ezkl_runner.verify(str(proof_path), settings_path=settings_path, vk_path=vk_path)
                 except Exception as e:
                     error_msg = str(e)
dsperse/src/prover.py (1)

19-52: Honor output_path for tiled proofs in worker path.
Line 51 hardcodes run_path/.../proof.json, ignoring output_path. This breaks CLI output redirection when using parallel workers.

🛠️ Proposed fix
-    (slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling_info, run_path, slice_dir) = args
+    (slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling_info, output_path, run_path, slice_dir) = args
@@
-            tile_proof_path = Path(run_path) / slice_id / tile_name / "proof.json"
-            os.makedirs(tile_proof_path.parent, exist_ok=True)
+            proof_rel = os.path.join(slice_id, tile_name)
+            tile_proof_path = Utils.proof_output_path(Path(run_path), proof_rel, output_path)
+            os.makedirs(tile_proof_path.parent, exist_ok=True)
@@
-            work_items.append((slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling.to_dict() if tiling else None, str(run_path), str(slice_dir)))
+            work_items.append((slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling.to_dict() if tiling else None, output_path, str(run_path), str(slice_dir)))

Also applies to: 423-423

dsperse/src/compile/compiler.py (3)

135-139: Use payload/ for non‑channel‑split output dirs.
Lines 135–139 write outputs to slice_dir/<backend>, which diverges from the payload layout used elsewhere (and expected by tests/consumers).

🛠️ Proposed fix
-            if tiling_info:
-                output_dir = os.path.join(slice_dir, be, "tiles")
-            else:
-                output_dir = os.path.join(slice_dir, be)
+            if tiling_info:
+                output_dir = os.path.join(slice_dir, "payload", be, "tiles")
+            else:
+                output_dir = os.path.join(slice_dir, "payload", be)

629-650: Don’t silently skip tiled slices when tile path is missing.
Lines 629–650 currently continue, which drops the slice from compilation and metadata updates. This should be a hard failure (or explicit fallback record) so the error isn’t masked.

🛠️ Proposed fix (fail fast)
-                        logger.warning(f"Slice {idx}: Tiled but tile path not found at {tile_path}, skipping")
-                        print(f"[compile] Slice {idx}: tile.onnx not found at {tile_path}, skipping tiled slice")
-                        continue
+                        msg = f"Slice {idx}: tiled but tile path not found at {tile_path}"
+                        logger.error(msg)
+                        raise FileNotFoundError(msg)
@@
-                    logger.warning(f"Slice {idx}: Tiled but tile path missing in metadata, skipping")
-                    print(f"[compile] Slice {idx}: tiled but tile path missing, skipping")
-                    continue
+                    msg = f"Slice {idx}: tiled but tile path missing in metadata"
+                    logger.error(msg)
+                    raise FileNotFoundError(msg)

708-720: compiled_count should increment once per slice, not per backend.
Line 720 increments inside the per‑backend loop, so it over‑counts when multiple backends compile a single slice.

🛠️ Proposed fix
-            for be, comp_block in compilation_blocks.items():
+            slice_compiled = False
+            for be, comp_block in compilation_blocks.items():
                 ...
-                compiled_count += 1
+                slice_compiled = True
+
+            if slice_compiled:
+                compiled_count += 1
dsperse/src/run/utils/runner_utils.py (2)

120-120: Use explicit Optional in run_onnx_slice signature.
Line 120 uses slice_dir: Path = None, which violates PEP 484. Prefer Path | None.

🛠️ Proposed fix
-    def run_onnx_slice(meta: RunSliceMetadata, input_tensor_path, output_tensor_path, slice_dir: Path = None):
+    def run_onnx_slice(meta: RunSliceMetadata, input_tensor_path, output_tensor_path, slice_dir: Path | None = None):

404-418: Avoid direct tensor["output"] access.
Line 405 raises KeyError if the result dict lacks "output". Use extract_output_tensor (already added) or guard explicitly.

🛠️ Proposed fix
-        output = tensor["output"]
+        output = RunnerUtils.extract_output_tensor(tensor)
+        if output is None:
+            raise ValueError("Missing output tensor in result")
dsperse/src/compile/utils/compiler_utils.py (3)

271-279: Drop unused success parameter.

It’s unused and the compiled state already lives in compilation_info. Removing it reduces API noise.

♻️ Suggested fix
-    def update_slice_metadata(idx: int, filepath: str | Path, success: bool, compilation_info: Dict[str, Any], backend_name: str = Backend.EZKL):
+    def update_slice_metadata(idx: int, filepath: str | Path, compilation_info: Dict[str, Any], backend_name: str = Backend.EZKL):

136-149: JSTprove success detection still keys off circuit.

JSTprove blocks use compiled, so this path can misclassify success. Please switch the detection to compiled (and keep the no‑vk/pk check).

🔧 Suggested fix
-        if compilation_data.get('circuit') and not compilation_data.get('vk_key'):
+        if compilation_data.get('compiled') and not compilation_data.get('vk_key'):
             return _ok('compiled') and _ok('settings')

470-477: Tiled compilation block still only emits tile_0.

When num_tiles > 1, tile_count doesn’t match the files map. Either emit one entry per tile or set tile_count to 1.

🔧 Suggested fix (populate all tile entries)
         if tiling_info:
             num_tiles = tiling_info.get("num_tiles", 1)
-            block.update({
+            tiles = {f"tile_{i}": pref_files for i in range(num_tiles)}
+            block.update({
                 "tiled": True,
                 "tile_size": tiling_info.get("tile_size"),
                 "tile_count": num_tiles,
-                "files": {"tile_0": pref_files}
+                "files": tiles
             })
dsperse/src/analyzers/schema.py (2)

107-115: Guard against short conv_out lists.

conv_out[1] will throw if the list has fewer than 2 entries. Add a defensive pad/default.

🔧 Suggested fix
         if not d:
             return None
         conv_out = d.get("conv_out", [0, 0])
+        if isinstance(conv_out, list) and len(conv_out) < 2:
+            conv_out = [0, 0]
         return cls(
             path=d.get("path", ""),
             conv_out=(conv_out[0], conv_out[1]) if isinstance(conv_out, list) else conv_out,
         )

243-265: Defensive checks for halo/out_tile/stride list lengths.

Short lists will raise IndexError during tuple construction. Pad/normalize first.

🔧 Suggested fix
         halo = d.get("halo", [0, 0])
         out_tile = d.get("out_tile", [0, 0])
         stride = d.get("stride", [1, 1])
+        if isinstance(halo, list) and len(halo) < 2:
+            halo = [0, 0]
+        if isinstance(out_tile, list) and len(out_tile) < 2:
+            out_tile = [0, 0]
+        if isinstance(stride, list) and len(stride) < 2:
+            stride = [1, 1]
dsperse/src/slice/autotiler.py (3)

63-93: Tile size search still ignores asymmetric stride.

find_optimal_tile_size only checks one stride, which can mismatch when sh != sw. Please enforce divisibility against both strides and pass a (sh, sw) tuple.

🔧 Suggested fix
-def find_optimal_tile_size(spatial_dim: int, target: int, min_tile: int = 7, stride: int = 1) -> int | None:
+def find_optimal_tile_size(
+    spatial_dim: int,
+    target: int,
+    min_tile: int = 7,
+    stride: tuple[int, int] = (1, 1),
+) -> int | None:
     """Find the largest divisor of spatial_dim that is <= target and >= min_tile."""
+    sh, sw = stride
     if min_tile <= target < spatial_dim:
         for tile in range(target, min_tile - 1, -1):
-            if spatial_dim % tile == 0 and tile % stride == 0:
+            if spatial_dim % tile == 0 and tile % sh == 0 and tile % sw == 0:
                 return tile
     return None
-        actual_spatial_tile_size = Autotiler.find_optimal_tile_size(spatial_h, target_tile, min_tile, stride=stride)
+        actual_spatial_tile_size = Autotiler.find_optimal_tile_size(spatial_h, target_tile, min_tile, stride=(sh, sw))

180-187: Channel splitting still allows elementwise ops.

Channel‑split paths sum group outputs, which breaks semantics for non‑linear ops. Restrict to pure‑Conv slices or apply elementwise ops after summing.

🔧 Minimal guard
     if conv_params['group'] != 1:
         return False
+    ops = {n.op_type for n in model.graph.node}
+    if ops != {"Conv"}:
+        return False
     return True

566-595: Don’t assume the last non‑Conv op produces the output.

Mapping tile_out to the last elementwise node breaks branching graphs or when the output is produced earlier. Wire the output based on the original graph’s output tensor instead.

🔧 Suggested fix
-        for i, orig_node in enumerate(non_conv_ops):
+        orig_output_name = m.graph.output[0].name if m.graph.output else "tile_out"
+        for i, orig_node in enumerate(non_conv_ops):
             new_inputs = []
             for inp in orig_node.input:
                 if any(inp == n.output[0] for n in m.graph.node if n.op_type == "Conv"):
                     new_inputs.append("conv_out")
                 elif inp == orig_input_name:
                     new_inputs.append("tile_in")
                 else:
                     new_inputs.append(inp)
-            
-            is_last = (i == len(non_conv_ops) - 1)
-            new_outputs = ["tile_out"] if is_last else list(orig_node.output)
+            new_outputs = [("tile_out" if o == orig_output_name else o) for o in orig_node.output]
             attr_kwargs = {a.name: helper.get_attribute_value(a) for a in orig_node.attribute}
             nodes.append(helper.make_node(orig_node.op_type, new_inputs, new_outputs, **attr_kwargs))
dsperse/src/run/runner.py (7)

81-82: Use explicit optional types in __init__ parameters.
These default to None, so make them str | None for clarity and PEP 484 compliance.

♻️ Proposed update
-    def __init__(self, run_metadata_path: str = None, save_metadata_path: str = None, parallel_tiles: int = 1,
-                 circuit_cache_dir: str = None, run_dir: str = None, resume: bool = False, resume_run_dir: str = None):
+    def __init__(self, run_metadata_path: str | None = None, save_metadata_path: str | None = None, parallel_tiles: int = 1,
+                 circuit_cache_dir: str | None = None, run_dir: str | None = None, resume: bool = False, resume_run_dir: str | None = None):

163-163: Make slice_dir explicitly optional in slice helpers.

♻️ Proposed update
-    def _run_ezkl_slice(self, meta: RunSliceMetadata, input_tensor_path, output_witness_path, slice_dir: Path = None):
+    def _run_ezkl_slice(self, meta: RunSliceMetadata, input_tensor_path, output_witness_path, slice_dir: Path | None = None):

-    def _run_jstprove_slice(self, meta: RunSliceMetadata, input_tensor_path, output_witness_path, slice_dir: Path = None):
+    def _run_jstprove_slice(self, meta: RunSliceMetadata, input_tensor_path, output_witness_path, slice_dir: Path | None = None):

Also applies to: 202-202


860-865: Avoid list allocation for first output tensor.

♻️ Proposed tweak
-                        first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+                        first_output = next(iter(result['output_tensors'].values()), None)

26-75: Add JST→EZKL→ONNX fallback in the parallel tile worker.
When has_jst is true, the worker returns failure immediately on JST errors and never tries EZKL or ONNX, even though sequential execution does. This makes parallel runs strictly less robust and ignores the tile_onnx_path argument.

Please mirror the sequential fallback chain (JST → EZKL → ONNX) and use tile_onnx_path when both circuit backends fail.
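
A sketch of the mirrored chain, with try_jstprove_tile / try_ezkl_tile / run_onnx_tile standing in for whatever per-backend helpers the worker actually uses:

def run_tile_with_fallback(task, has_jst: bool, has_ezkl: bool, tile_onnx_path: str):
    """Hypothetical worker mirroring the sequential JST -> EZKL -> ONNX fallback."""
    errors = {}
    if has_jst:
        ok, result = try_jstprove_tile(task)  # assumed helper
        if ok:
            return "jstprove", result
        errors["jstprove"] = result
    if has_ezkl:
        ok, result = try_ezkl_tile(task)  # assumed helper
        if ok:
            return "ezkl", result
        errors["ezkl"] = result
    ok, result = run_onnx_tile(tile_onnx_path, task)  # assumed helper
    if ok:
        return "onnx", result
    raise RuntimeError(f"All backends failed for tile: {errors}")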


247-248: Guard against None in tile method counting.
TileResult.method is optional; calling .startswith(...) on None will raise.

🐛 Proposed fix
-        def _count_tile_method(tiles, method_prefix):
-            return sum(1 for t in tiles if (t.method if isinstance(t, TileResult) else t.get("method", "")).startswith(method_prefix))
+        def _count_tile_method(tiles, method_prefix):
+            prefix = str(method_prefix)
+            return sum(
+                1
+                for t in tiles
+                if str((t.method if isinstance(t, TileResult) else t.get("method", "")) or "").startswith(prefix)
+            )

645-685: Respect force_backend in channel-split execution.
When force_backend is JSTPROVE, the current flow can still fall back to EZKL. Align this logic with the tiled path by gating backend availability on force_backend.

🐛 Suggested adjustment
-        forced = self.force_backend
-        has_jst = bool(group.jstprove_circuit_path) and self.jstprove_runner
-        has_ezkl = bool(group.ezkl_circuit_path) and group.vk_path
+        forced = self.force_backend or Backend.AUTO
+        has_jst = bool(group.jstprove_circuit_path) and self.jstprove_runner and forced in (Backend.JSTPROVE, Backend.AUTO)
+        has_ezkl = bool(group.ezkl_circuit_path) and group.vk_path and forced in (Backend.EZKL, Backend.AUTO)
-        if (forced == Backend.JSTPROVE and has_jst) or (has_jst and forced != Backend.EZKL):
+        if has_jst:
             ...
-        if has_ezkl:
+        if has_ezkl:
             ...

951-954: Demo still uses dict access on ExecutionInfo.
slice_results values are ExecutionInfo objects, so info['method'] will raise.

🐛 Proposed fix
-    for slice_id, info in results["slice_results"].items():
-        print(f"  {slice_id}: {info['method']}")
+    for slice_id, info in results["slice_results"].items():
+        method = info.method if hasattr(info, "method") else info.get("method", "unknown")
+        print(f"  {slice_id}: {method}")
🧹 Nitpick comments (5)
tests/unit/test_autotiler.py (2)

329-336: Silence unused locals in tiled runner.

These locals aren’t used; prefixing with underscores keeps lint noise down.

♻️ Proposed tweak
-        sh, sw = stride
+        _sh, _sw = stride
@@
-        _, c, h, w = input_tensor.shape
+        _, _c, _h, _w = input_tensor.shape

585-588: Prefer next(...) for single-element selection.

♻️ Proposed tweak
-        conv_node = [n for n in tile_model.graph.node if n.op_type == "Conv"][0]
+        conv_node = next(n for n in tile_model.graph.node if n.op_type == "Conv")
tests/integration/test_slice.py (1)

256-285: Drop unused capfd parameter.

♻️ Proposed tweak
-    def test_slice_with_channel_splitting(self, tmp_path, capfd):
+    def test_slice_with_channel_splitting(self, tmp_path):
dsperse/src/slice/utils/onnx_utils.py (1)

695-722: Rename unused parameter to make intent explicit.

root isn’t used in _normalize_slice_dirs; prefixing with _ (or removing it) avoids ARG004 noise.

♻️ Suggested tweak
-    def _normalize_slice_dirs(idx: int, seg: dict, root: Path) -> Tuple[Path | None, Path | None]:
+    def _normalize_slice_dirs(idx: int, seg: dict, _root: Path) -> Tuple[Path | None, Path | None]:
dsperse/src/slice/autotiler.py (1)

99-123: Remove/underscore unused parameters to silence lint.

c_out/min_tile in calculate_channel_split_config and weights in _calculate_tile_slice_dims are unused. Prefix with _ (or drop them) to make intent explicit.

♻️ Suggested tweak
-    def calculate_channel_split_config(
-        c_in: int,
-        c_out: int,
-        spatial_h: int,
-        spatial_w: int,
-        tile_size: int,
-        min_tile: int = 7,
-    ) -> tuple[int | None, int | None, str | None]:
+    def calculate_channel_split_config(
+        c_in: int,
+        _c_out: int,
+        spatial_h: int,
+        spatial_w: int,
+        tile_size: int,
+        _min_tile: int = 7,
+    ) -> tuple[int | None, int | None, str | None]:
-    def _calculate_tile_slice_dims(tile_size: int, conv_params: dict, weights: np.ndarray) -> dict:
+    def _calculate_tile_slice_dims(tile_size: int, conv_params: dict, _weights: np.ndarray) -> dict:

Also applies to: 521-539

Comment on lines +543 to 571

    @staticmethod
    def _get_tensor_shape(slice_path) -> TensorShape:
        inputs = []
        outputs = []

        if slice_path:
            onnx_model = onnx.load(slice_path)
            graph = onnx_model.graph
            for init in graph.initializer:
                inputs.append(list(init.dims))
            for inp in graph.input:
                # Convert Dimension objects to simple values (string or number)
                dimensions = []
                for dim in inp.type.tensor_type.shape.dim:
                    if dim.HasField('dim_param'):
                        dimensions.append(dim.dim_param)
                    else:
                        dimensions.append(dim.dim_value)
                inputs.append(dimensions)
            for out in graph.output:
                # Convert Dimension objects to simple values (string or number)
                dimensions = []
                for dim in out.type.tensor_type.shape.dim:
                    if dim.HasField('dim_param'):
                        dimensions.append(dim.dim_param)
                    else:
                        dimensions.append(dim.dim_value)
                outputs.append(dimensions)

        return TensorShape(input=inputs, output=outputs)


⚠️ Potential issue | 🟠 Major

Do not include initializers in TensorShape inputs.
Lines 551‑553 append initializer dims to TensorShape.input. Per schema, TensorShape represents graph I/O shapes; weights belong in WeightShape. Including initializers can corrupt downstream input_shape checks.

🛠️ Proposed fix
-            for init in graph.initializer:
-                inputs.append(list(init.dims))
             for inp in graph.input:
                 dimensions = []
🤖 Prompt for AI Agents
In `@dsperse/src/analyzers/onnx_analyzer.py` around lines 543 - 571, The
_get_tensor_shape function is incorrectly appending graph.initializer dims into
TensorShape.input (in the loop over graph.initializer); remove that behavior so
initializers are not added to inputs, and instead, if you need to represent
weights, collect initializer shapes into a separate WeightShape structure or
list (e.g., weight_shapes) and return/use it where expected; update references
to TensorShape(input=..., output=...) so only graph.input shapes and
graph.output shapes are included and initializers are not mixed into inputs.

Comment on lines 180 to 188
has_jst = bool(meta.jstprove_circuit_path) and getattr(runner, "jstprove_runner", None)
has_ezkl = bool(meta.ezkl_circuit_path) and bool(meta.vk_path)

available = []
if has_jst:
    available.append(Backend.JSTPROVE)
if has_ezkl:
    available.append(Backend.EZKL)
available.append(Backend.ONNX)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's look at the actual code in runner_utils.py at lines 180-188
wc -l dsperse/src/run/utils/runner_utils.py



🏁 Script executed:

#!/bin/bash
# Get the exact code at the specified lines
sed -n '175,195p' dsperse/src/run/utils/runner_utils.py



🏁 Script executed:

#!/bin/bash
# Search for RunSliceMetadata class definition to understand the structure
rg -n "class RunSliceMetadata" --type=py



🏁 Script executed:

#!/bin/bash
# Search for circuit path population and usage
rg -n "jstprove_circuit_path|ezkl_circuit_path|circuit_path" --type=py -B2 -A2 | head -100

Repository: inference-labs-inc/dsperse

Length of output: 7486


🏁 Script executed:

#!/bin/bash
# Get RunSliceMetadata class definition
sed -n '429,500p' dsperse/src/analyzers/schema.py

Repository: inference-labs-inc/dsperse

Length of output: 3113


🏁 Script executed:

#!/bin/bash
# Search for circuit_path usage in runner context
rg -n "circuit_path" dsperse/src/run/ --type=py -B2 -A2

Repository: inference-labs-inc/dsperse

Length of output: 9698


🏁 Script executed:

#!/bin/bash
# Check if there's any documentation or comments about circuit_path fields
rg -n "jstprove_circuit_path|ezkl_circuit_path" dsperse/src/analyzers/schema.py --type=py -B2 -A2

Repository: inference-labs-inc/dsperse

Length of output: 1606


🏁 Script executed:

#!/bin/bash
# Look at the full context around lines 180-188 in runner_utils.py
sed -n '170,220p' dsperse/src/run/utils/runner_utils.py

Repository: inference-labs-inc/dsperse

Length of output: 2562


🏁 Script executed:

#!/bin/bash
# Check where prepare_jstprove_metadata and prepare_ezkl_metadata are called
rg -n "prepare_jstprove_metadata|prepare_ezkl_metadata" dsperse/src/run/ --type=py -B2 -A2

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

#!/bin/bash
# Search for _prepare_jstprove_meta and _prepare_ezkl_meta calls in entire codebase
rg -n "_prepare_jstprove_meta|_prepare_ezkl_meta" --type=py

Repository: inference-labs-inc/dsperse

Length of output: 813


🏁 Script executed:

#!/bin/bash
# Let's see the full implementation of those prepare methods
sed -n '240,260p' dsperse/src/run/utils/runner_utils.py

Repository: inference-labs-inc/dsperse

Length of output: 1020


🏁 Script executed:

#!/bin/bash
# Check the run_onnx_slice, _run_jstprove_slice, and _run_ezkl_slice methods
rg -n "def run_onnx_slice|def _run_jstprove_slice|def _run_ezkl_slice" dsperse/src/run/ --type=py -A5

Repository: inference-labs-inc/dsperse

Length of output: 1904


Backend availability gating skips valid circuits with fallback paths.

Lines 180–181 check only meta.jstprove_circuit_path and meta.ezkl_circuit_path, but the backend preparation methods (_prepare_jstprove_meta and _prepare_ezkl_meta) explicitly fall back to meta.circuit_path when the specific paths are absent. This creates an inconsistency: backends won't be marked as available if only circuit_path is populated, causing the code to skip to ONNX even when ZK backends could be used. The verifier.py code already uses this fallback pattern.

🛠️ Proposed fix
-        has_jst = bool(meta.jstprove_circuit_path) and getattr(runner, "jstprove_runner", None)
-        has_ezkl = bool(meta.ezkl_circuit_path) and bool(meta.vk_path)
+        has_jst = bool(meta.jstprove_circuit_path or meta.circuit_path) and getattr(runner, "jstprove_runner", None)
+        has_ezkl = bool(meta.ezkl_circuit_path or meta.circuit_path) and bool(meta.vk_path)
🤖 Prompt for AI Agents
In `@dsperse/src/run/utils/runner_utils.py` around lines 180 - 188, The current
availability checks (has_jst / has_ezkl) only consider
meta.jstprove_circuit_path and meta.ezkl_circuit_path but the preparers
(_prepare_jstprove_meta and _prepare_ezkl_meta) accept meta.circuit_path as a
fallback; update the gating logic so backends are marked available when either
their specific circuit path OR meta.circuit_path is set and the corresponding
runner exists (e.g., include meta.circuit_path in the has_jst and has_ezkl
predicates alongside runner.jstprove_runner / runner.ezkl_runner), then preserve
adding Backend.JSTPROVE, Backend.EZKL, and Backend.ONNX to available as before.

Comment on lines +33 to 41
logger.info("Initializing analysis and tracing shapes...")
self.traced_shapes, self.traced_dtypes = OnnxUtils.trace_shapes(self.onnx_model)
OnnxUtils.apply_traced_shapes(self.onnx_model, self.traced_shapes, self.traced_dtypes)

# Save a temporary traced model for the analyzer and tensor graph
self._traced_model_path = self.onnx_path.replace(".onnx", "_traced.onnx")
onnx.save(self.onnx_model, self._traced_model_path)
self.onnx_path = self._traced_model_path

⚠️ Potential issue | 🟠 Major

Cleanup leaves onnx_path pointing at a deleted file.

After slice_model finishes, the traced model is removed but self.onnx_path still points to it. Reusing the same OnnxSlicer instance will fail when it tries to read the deleted path. Restore the original path (and reset analysis) on cleanup, or keep the traced file for the object’s lifetime.

🛠️ Proposed fix
 class OnnxSlicer:
     def __init__(self, onnx_path, save_path=None):
         self.onnx_path = onnx_path
+        self._original_onnx_path = onnx_path
         self.save_path = save_path
@@
     def _ensure_analysis(self):
@@
-        self._traced_model_path = self.onnx_path.replace(".onnx", "_traced.onnx")
+        self._traced_model_path = self._original_onnx_path.replace(".onnx", "_traced.onnx")
         onnx.save(self.onnx_model, self._traced_model_path)
         self.onnx_path = self._traced_model_path
@@
         finally:
             if self._traced_model_path and os.path.exists(self._traced_model_path):
                 logger.info(f"Cleaning up temporary traced model: {self._traced_model_path}")
                 os.remove(self._traced_model_path)
+            # Restore original path and reset analysis for safe reuse
+            self.onnx_path = self._original_onnx_path
+            self.onnx_model = onnx.load(self._original_onnx_path)
+            self.analysis = None
+            self.onnx_analyzer = None

Also applies to: 189-201

🤖 Prompt for AI Agents
In `@dsperse/src/slice/onnx_slicer.py` around lines 33 - 41, The cleanup currently
deletes the temporary traced file but leaves self.onnx_path pointing to that
deleted path; update the cleanup logic in OnnxSlicer (e.g., in slice_model's
teardown or a dedicated cleanup/reset method) to restore the original path by
saving the original value before replacing it (store e.g.,
self._original_onnx_path when you set self._traced_model_path) and then after
removing the traced file set self.onnx_path = self._original_onnx_path and
clear/reset any traced analysis state (self.traced_shapes, self.traced_dtypes,
self._traced_model_path, and self._original_onnx_path), or alternatively avoid
deleting the traced file and keep it for the instance lifetime—pick one
consistent approach and implement it in the cleanup code that currently removes
self._traced_model_path so subsequent calls to slice_model can reuse the object
without file-not-found errors.

Comment on lines +66 to +75
    def apply_symbolic_shape_inference(onnx_model: onnx.ModelProto) -> onnx.ModelProto:
        """Apply symbolic shape inference to the model."""
        from onnxruntime.tools import symbolic_shape_infer
        logger.info("Applying shape inference to model...")
        try:
            onnx_model = symbolic_shape_infer.SymbolicShapeInference.infer_shapes(onnx_model)
            logger.info("Shape inference applied successfully")
        except Exception as e:
            logger.warning(f"Shape inference failed: {e}, continuing with original model")
        return onnx_model

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n dsperse/src/slice/utils/onnx_utils.py | sed -n '60,80p'

Repository: inference-labs-inc/dsperse

Length of output: 1229


🏁 Script executed:

rg "from onnxruntime" dsperse/src --type py -n

Repository: inference-labs-inc/dsperse

Length of output: 169


🏁 Script executed:

find dsperse -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" | head -5

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

find . -maxdepth 2 -type f \( -name "*.txt" -o -name "*.toml" -o -name "*.cfg" -o -name "*.ini" \) | grep -E "(require|setup|pyproject|poetry)" | head -10

Repository: inference-labs-inc/dsperse

Length of output: 105


🏁 Script executed:

cat requirements.txt && echo "---" && cat pyproject.toml | grep -A 20 -B 5 "onnx"

Repository: inference-labs-inc/dsperse

Length of output: 50382


🏁 Script executed:

rg "logger\.(exception|error|warning)" dsperse/src --type py -B 2 -A 2 | head -80

Repository: inference-labs-inc/dsperse

Length of output: 4695


🏁 Script executed:

rg "logger\.exception" dsperse/src --type py

Repository: inference-labs-inc/dsperse

Length of output: 427


🏁 Script executed:

rg "logger\.exception" dsperse/src/backends -B 3 -A 1 --type py

Repository: inference-labs-inc/dsperse

Length of output: 1850


Handle missing ORT shape inference module and preserve exception stack traces.

If onnxruntime.tools.symbolic_shape_infer is unavailable, the import will fail before the try block and won't be caught. Additionally, using logger.warning for unexpected inference failures discards the stack trace. Consider moving the import inside the try block, handling ImportError explicitly with logger.warning (matching the pattern for optional tools), and using logger.exception to preserve stack context for actual inference failures.

Suggested fix
-        from onnxruntime.tools import symbolic_shape_infer
-        logger.info("Applying shape inference to model...")
-        try:
-            onnx_model = symbolic_shape_infer.SymbolicShapeInference.infer_shapes(onnx_model)
-            logger.info("Shape inference applied successfully")
-        except Exception as e:
-            logger.warning(f"Shape inference failed: {e}, continuing with original model")
+        try:
+            from onnxruntime.tools import symbolic_shape_infer
+            logger.info("Applying shape inference to model...")
+            onnx_model = symbolic_shape_infer.SymbolicShapeInference.infer_shapes(onnx_model)
+            logger.info("Shape inference applied successfully")
+        except ImportError as e:
+            logger.warning(f"Symbolic shape inference unavailable: {e}; continuing with original model")
+        except Exception:
+            logger.exception("Shape inference failed; continuing with original model")
🧰 Tools
🪛 Ruff (0.14.13)

73-73: Do not catch blind exception: Exception (BLE001)

🤖 Prompt for AI Agents
In `@dsperse/src/slice/utils/onnx_utils.py` around lines 66 - 75, Move the
onnxruntime.tools import into the try block inside
apply_symbolic_shape_inference so ImportError is caught; explicitly catch
ImportError and call logger.warning with a clear message that the optional ORT
shape inference tool is unavailable, and for other exceptions use
logger.exception to preserve stack traces when
symbolic_shape_infer.SymbolicShapeInference.infer_shapes(onnx_model) fails; keep
returning the original onnx_model on failure.

Comment on lines +213 to +230
    def isolate_conv(slice_points, model_metadata):
        """
        Ensure each Conv layer gets its own isolated slice by adding slice points
        immediately after each Conv node.
        """
        updated_points = set(slice_points)
        nodes_dict = model_metadata.get("nodes", {})
        max_idx = max((n.get("index", 0) for n in nodes_dict.values()), default=0)

        for _node_name, node_info in nodes_dict.items():
            if node_info.get("node_type") == "Conv":
                idx = node_info.get("index")
                if idx is None:
                    continue
                updated_points.add(idx)
                if idx + 1 <= max_idx:
                    updated_points.add(idx + 1)
        return updated_points

⚠️ Potential issue | 🟠 Major

Return deterministic, sorted slice points.

set iteration order is nondeterministic; downstream slicing expects ordered indices. Return a sorted list to avoid unstable slice boundaries.

🔧 Suggested fix
-        return updated_points
+        return sorted(updated_points)
🤖 Prompt for AI Agents
In `@dsperse/src/slice/utils/onnx_utils.py` around lines 213 - 230, The
isolate_conv function currently returns an unordered set (updated_points) which
leads to nondeterministic slice ordering; change it to return a
deterministically ordered collection by converting updated_points to a sorted
list of integers (e.g., sorted(updated_points)) before returning so downstream
slicing receives stable, ascending slice indices; ensure the return type matches
callers expecting a list and that the sorting is numeric (use key=int if needed)
while keeping the logic around nodes_dict, max_idx and index additions intact.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/integration/test_compile.py (1)

13-23: Silence unused fixture args (model_name, capfd).
These are currently unused and will trigger Ruff ARG002.

🧹 Proposed fix
         if not jstprove_available or not ezkl_available:
             pytest.skip("Skipping compile happy path: required backends unavailable")
+        _ = (model_name, capfd)
🤖 Fix all issues with AI agents
In `@dsperse/src/analyzers/runner_analyzer.py`:
- Around line 559-607: The function initialize_run_metadata has implicit
Optional types for parameters run_dir and output_path; update the signature to
use explicit optional types (e.g., run_dir: Optional[Path] or run_dir: Path |
None and output_path: Optional[str] or output_path: str | None depending on
project Python typing style), add the corresponding import from typing if using
Optional, and ensure all internal references to run_dir and output_path remain
unchanged; keep default values None and adjust any other annotated usages
(return type remains tuple[Path, bool, RunMetadata]) to eliminate implicit
Optional annotations.

In `@dsperse/src/backends/ezkl.py`:
- Around line 78-90: The os.makedirs call should be guarded against an empty
dirname to avoid calling makedirs("") when output_file is just a filename;
replace the direct os.makedirs(os.path.dirname(output_file), exist_ok=True) with
logic that computes dirname = os.path.dirname(output_file) and only calls
os.makedirs(dirname, exist_ok=True) if dirname is non-empty (apply the same fix
to the other occurrence around lines 126-128), referencing the output_file
variable and the existing os.path.dirname usage in dsperse/src/backends/ezkl.py.

In `@dsperse/src/backends/utils/jstprove_utils.py`:
- Around line 45-61: The compatibility check in is_compatible currently calls
onnx.load(...) which attempts to load external tensor sidecars and can raise a
LOAD_ERROR; update the onnx.load call in is_compatible (where model is assigned)
to disable external data loading (pass the parameter to turn off
external/sidecar loading, e.g., load_external_data=False or the equivalent
supported by the onnx API) so the function only inspects
model.graph.node/op_type against JSTPROVE_SUPPORTED_OPS and avoids false
negatives when sidecar files are missing; keep the existing exception handling
and return values intact.

In `@dsperse/src/cli/compile.py`:
- Around line 80-82: Remove the unused local variable by deleting the assignment
to layers (the line using getattr(args, 'layers', None)) in compile.py; keep the
backend assignment (backend = getattr(args, 'backend', None)) as-is or only
remove layers if you intend to use it elsewhere—ensure no other code relies on
the local variable layers (search for 'layers' in the same module) before
removal to avoid breaking references.
- Around line 89-99: The current logic allows compilation to proceed with "all
layers" when the user supplied layers but validation returned None; change the
flow so that if args.layers was provided (getattr(args, 'layers', None) is
truthy), target_path is a directory, and _check_layers(...) returned no valid
layers (validated_layers is falsy/None), the command aborts with a clear error
(raise/exit/log) instead of falling back to all layers; update the block that
sets validated_layers/effective_layers to check for this condition and surface a
descriptive error before attempting to combine backend into effective_layers.

In `@dsperse/src/compile/compiler.py`:
- Around line 34-52: The tuple parameter named slice_data in the static method
_compile_slice_worker is unused and triggers a RUF059 lint warning; rename it to
_slice_data (or _ ) in the function signature and update the local unpacking of
args (idx, _slice_data, base_path, slice_dir, backends_to_build, tiling_info,
channel_split_info, compilation_slice_data) so the rest of the logic (calls to
Compiler.compile_channel_split_groups and Compiler.compile_slice and the results
dict) remains unchanged.

In `@dsperse/src/run/runner.py`:
- Around line 286-291: The local binding input_name returned from
RunnerUtils.prepare_channel_split_config in run_channel_split_inference is
unused and triggers Ruff RUF059; rename it to a throwaway variable (e.g.,
_input_name or _) in the tuple assignment to silence the linter while keeping
the call and returned values intact, i.e., update the unpacking of
RunnerUtils.prepare_channel_split_config in run_channel_split_inference
accordingly.

In `@dsperse/src/run/utils/runner_utils.py`:
- Around line 785-786: The parameter annotation for backend in
get_tile_execution_config should use the explicit union syntax instead of an
implicit Optional; update the function signature (get_tile_execution_config) to
annotate backend as str | None (or typing.Union[str, None] if older typing is
required) and ensure any references or overloads using the same parameter (e.g.,
has_jst_runner usage) are updated accordingly to match the explicit union type.
- Around line 632-633: The signature of execute_channel_group uses an implicit
Optional by declaring backend: str = None; update the annotation to be explicit
by changing the backend parameter to either backend: str | None = None (Python
3.10+ union syntax) or backend: Optional[str] = None with a corresponding import
from typing, ensuring the function execute_channel_group and any callers remain
type-correct.

In `@tests/integration/test_compile.py`:
- Around line 468-471: The loop in tests/integration/test_compile.py uses an
unused loop variable g_idx which triggers Ruff B007; change the loop to not bind
the unused index by either iterating values directly (replace "for g_idx, files
in group_files.items():" with "for files in group_files.values():") or use a
throwaway name (e.g., "for _, files in group_files.items():") so only the used
symbol files remains; update the loop that asserts "compiled" in files
accordingly.
- Around line 565-577: The test currently defines a pytest parameter
"model_name" but the function test_compile_resume_tiled does not use it, causing
Ruff ARG002; update both the parametrize call and the test signature to use an
unused-name convention (e.g., change "model_name" to "_model_name" in the
pytest.mark.parametrize decorator and rename the function parameter to
_model_name) so the linter recognizes the parameter is intentionally unused
without changing test behavior.
- Around line 475-485: Rename the unused test parameters to start with an
underscore to silence Ruff ARG002: change the function signature in
test_compile_parallel from (self, model_name, model_dir, slices_output_dir,
jstprove_available, capfd) to (self, _model_name, model_dir, slices_output_dir,
_jstprove_available, capfd) and update the pytest.mark.parametrize decorator to
parametrize("_model_name", ["doom"]) so the param name matches; no other logic
changes are required in test_compile_parallel.

In `@tests/integration/test_pipeline.py`:
- Line 24: Assign or otherwise silence the unused test parameter `model_name` in
each affected test function (e.g., `test_sliced_output_matches_original`, and
the other tests at the listed locations) to avoid Ruff ARG002; for example add a
single statement like `_ = model_name` (or an inline `# noqa: ARG002` comment)
at the start of each test body so the parameter is referenced and the linter
warning is suppressed.
🧹 Nitpick comments (12)
dsperse/src/backends/utils/jstprove_utils.py (1)

102-108: Potential issue with non-numeric input data.

torch.tensor(data) will fail if data contains non-numeric types or is a nested structure that PyTorch cannot infer. Consider adding basic validation or wrapping in a try-except with a meaningful error message.

💡 Suggested defensive handling
     @staticmethod
     def convert_to_logits(data: Any) -> torch.Tensor:
         """Helper to convert data to logits tensor with batch dimension."""
-        logits = torch.tensor(data)
+        if not isinstance(data, (list, tuple, torch.Tensor)) and not hasattr(data, '__iter__'):
+            raise TypeError(f"Cannot convert {type(data).__name__} to logits tensor")
+        logits = torch.tensor(data) if not isinstance(data, torch.Tensor) else data
         if logits.dim() == 1:
             logits = logits.unsqueeze(0)
         return logits
dsperse/src/backends/jstprove.py (1)

133-133: Minor style: Multiple statements on one line.

Consider separating the raise statement for readability.

📝 Suggested fix
-            if not ok: raise RuntimeError(f"Circuit compilation failed: {err}")
+            if not ok:
+                raise RuntimeError(f"Circuit compilation failed: {err}")
dsperse/src/run/utils/runner_utils.py (2)

820-820: Multiple statements on one line.

For consistency and readability, separate the raise statement.

📝 Suggested fix
-            if tile_tensor is None: raise ValueError(f"Missing tile input tensor '{cache_input_name}'")
+            if tile_tensor is None:
+                raise ValueError(f"Missing tile input tensor '{cache_input_name}'")

736-767: Remove unused exec_info parameter.

The exec_info parameter is declared but never used in the method body. It can be safely removed since the caller (runner.py line 389) already retains access to exec_info from its own context and uses it after the method returns.

🔧 Proposed fix
     @staticmethod
-    def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, exec_info, tensor_cache: dict) -> torch.Tensor | None:
+    def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, tensor_cache: dict) -> torch.Tensor | None:

Update call site in dsperse/src/run/runner.py:389-390:

-            final_tensor = RunnerUtils.process_inference_result(current_slice_id, info, ok, result, exec_info,
+            final_tensor = RunnerUtils.process_inference_result(current_slice_id, info, ok, result,
                                                                 tensor_cache)
dsperse/src/analyzers/schema.py (1)

556-568: Consider adding error handling for JSON operations.

The load and save methods perform file I/O without error handling. Consider whether callers expect exceptions to propagate or if wrapping with meaningful error messages would improve debugging.
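
As a hedged example, such wrapping could look like the following sketch (load_metadata is a hypothetical helper, not the actual schema.py API):

import json
from pathlib import Path

def load_metadata(path: Path) -> dict:
    # Hypothetical helper: re-raise file/JSON errors with the offending path for context.
    try:
        with open(path, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        raise FileNotFoundError(f"Metadata file not found: {path}") from None
    except json.JSONDecodeError as e:
        raise ValueError(f"Malformed metadata JSON in {path}: {e}") from e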

dsperse/src/analyzers/runner_analyzer.py (2)

94-98: Closure does not bind loop variable slice_key.

The _norm function captures slice_key by reference. If this function were to be called after the loop iteration changes slice_key, it would use the wrong value. In the current code, _norm is only called within the same iteration, so this is safe, but it's fragile.

💡 Suggested fix using default argument binding
-            def _norm(rel: Optional[str]) -> Optional[str]:
+            def _norm(rel: Optional[str], _slice_key: str = slice_key) -> Optional[str]:
                 if not rel:
                     return None
-                return RunnerAnalyzer.with_slice_prefix(RunnerAnalyzer.rel_from_payload(rel) or rel, slice_key)
+                return RunnerAnalyzer.with_slice_prefix(RunnerAnalyzer.rel_from_payload(rel) or rel, _slice_key)
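
The underlying pitfall, shown as a standalone demonstration rather than project code: Python closures bind variables late, and a default argument is the usual way to freeze the current value.

late = [lambda: i for i in range(3)]
print([f() for f in late])    # [2, 2, 2] - every closure sees the final i
bound = [lambda _i=i: _i for i in range(3)]
print([f() for f in bound])   # [0, 1, 2] - defaults capture i at definition time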

206-208: Same closure binding issue with parent_dir.

The _join function captures parent_dir by reference. Apply the same fix pattern.

💡 Suggested fix
-            def _join(rel: Optional[str]) -> Optional[str]:
-                return os.path.join(parent_dir, rel) if rel else None
+            def _join(rel: Optional[str], _parent_dir: str = parent_dir) -> Optional[str]:
+                return os.path.join(_parent_dir, rel) if rel else None
tests/integration/test_full_run.py (1)

152-157: Suppress unused fixture parameters to satisfy Ruff ARG002.

The model_name, run_output_dir, and capfd parameters are provided via pytest fixtures/parametrize but unused in the test body. Add a no-op assignment to acknowledge intentional binding.

🧹 Suggested fix
     def test_tiled_full_run(self, model_name: str, model_dir: Path, slices_output_dir: Path, run_output_dir: Path, capfd, jstprove_available):
         """Verify end-to-end full cycle with tiling: slice -> compile -> run -> prove -> verify."""
         if not jstprove_available:
             pytest.skip("JSTprove unavailable")
+        _ = (model_name, run_output_dir, capfd)  # Unused but required by fixtures
dsperse/src/backends/utils/ezkl_utils.py (3)

35-59: Improve exception chaining and type hints in SRS check wrapper.

A few improvements would address static analysis hints and improve debuggability:

  1. Line 37: Use explicit None union type per PEP 604
  2. Line 58: Chain the exception with from e for proper traceback preservation
🧹 Suggested fix
     @staticmethod
     def run_ezkl_command_with_srs_check(
-        cmd_list: list, env: dict = None, check: bool = True, capture_output: bool = True, text: bool = True, **kwargs
+        cmd_list: list, env: dict | None = None, check: bool = True, capture_output: bool = True, text: bool = True, **kwargs
     ) -> subprocess.CompletedProcess:
         """Wrapper for subprocess.run that detects SRS errors and bubbles them up."""
         try:
             process = subprocess.run(
                 cmd_list, env=env, check=check, capture_output=capture_output, text=text, **kwargs
             )

             if process.stderr:
                 srs_error = EZKLUtils.detect_srs_error(process.stderr)
                 if srs_error:
                     logger.error(f"EZKL SRS Error in command '{' '.join(cmd_list)}': {srs_error}")
                     raise RuntimeError(f"DSperse detected SRS error: {srs_error}")

             return process

         except subprocess.CalledProcessError as e:
             if e.stderr:
                 srs_error = EZKLUtils.detect_srs_error(e.stderr)
                 if srs_error:
-                    logger.error(f"EZKL SRS Error in command '{' '.join(cmd_list)}': {srs_error}")
-                    raise RuntimeError(f"DSperse detected SRS error: {srs_error}")
+                    logger.exception(f"EZKL SRS Error in command '{' '.join(cmd_list)}': {srs_error}")
+                    raise RuntimeError(f"DSperse detected SRS error: {srs_error}") from e
             raise

61-85: Unused model_path parameter.

The model_path parameter is declared but never used in initialize_compilation_artifacts. If it's reserved for future use or API consistency, consider prefixing with underscore or adding a brief comment.

🧹 Suggested fix
     @staticmethod
-    def initialize_compilation_artifacts(model_path: Path, output_path: Path, input_file_path: Optional[str]) -> Dict[str, Any]:
+    def initialize_compilation_artifacts(_model_path: Path, output_path: Path, input_file_path: Optional[str]) -> Dict[str, Any]:
         """Initialize paths and metadata for EZKL compilation."""

87-97: Consider using logging.exception for automatic traceback.

When logging errors within an exception handler, logging.exception automatically includes the traceback, which aids debugging.

🧹 Suggested fix
         except (KeyError, TypeError, IndexError) as e:
-            logger.error(f"Could not process witness data: {e}")
+            logger.exception(f"Could not process witness data: {e}")
             return None
tests/integration/test_run.py (1)

399-462: Suppress unused model_name in multiple tiling tests.

Several parametrized tests (test_run_tiling_ezkl, test_run_tiling_onnx_fallback, test_run_resume_complex) have unused model_name parameters. Consider adding _ = model_name at the start of each test body to satisfy Ruff ARG002.

🧹 Batch fix for all affected tests
     def test_run_tiling_ezkl(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, ezkl_available):
         """12. Verify that tiling works with EZKL backend."""
         if not ezkl_available:
             pytest.skip("EZKL unavailable")
+        _ = model_name

...

     def test_run_tiling_onnx_fallback(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
         """13. Verify that tiling falls back to ONNX if no circuits are available."""
+        _ = model_name

...

     def test_run_resume_complex(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
         """14. Verify that resuming works for tiled/channel-split runs."""
+        _ = model_name

Comment on lines +559 to +607
    @staticmethod
    def initialize_run_metadata(slices_path: Path, run_dir: Path = None, output_path: str = None, format: str = "dirs") -> tuple[Path, bool, RunMetadata]:
        """Set up the run directory and prepare the run metadata, handling resumes if applicable."""
        import time
        from dsperse.src.run.utils.runner_utils import RunnerUtils

        # 1. Determine base run directory logic
        requested_dir = Path(output_path) if output_path else run_dir
        resume = False

        if requested_dir:
            if (requested_dir / "metadata.json").exists():
                actual_run_dir = requested_dir
                resume = True
            elif requested_dir.exists() and not (requested_dir / "metadata.json").exists():
                latest = RunnerUtils.find_most_recent_run(requested_dir)
                if latest:
                    actual_run_dir = latest
                    resume = True
                else:
                    ts = time.strftime('%Y%m%d_%H%M%S')
                    actual_run_dir = requested_dir / f"run_{ts}"
                    resume = False
            else:
                actual_run_dir = requested_dir
                resume = actual_run_dir.exists() and (actual_run_dir / "metadata.json").exists()
        else:
            ts = time.strftime('%Y%m%d_%H%M%S')
            base_dir = slices_path.parent
            if base_dir.name == "slices":
                base_dir = base_dir.parent
            actual_run_dir = base_dir / "run" / f"run_{ts}"
            resume = False

        # 2. Load or generate metadata
        if resume:
            metadata_path = actual_run_dir / "metadata.json"
            with open(metadata_path, 'r') as f:
                run_metadata = RunMetadata.from_dict(json.load(f))
            logger.info(f"Resuming from existing run directory: {actual_run_dir}")
        else:
            actual_run_dir.mkdir(parents=True, exist_ok=True)
            save_path = actual_run_dir / "metadata.json"
            run_metadata = RunMetadata.from_dict(
                RunnerAnalyzer.generate_run_metadata(slices_path, save_path, format)
            )
            logger.info(f"Started new run in directory: {actual_run_dir}")

        return actual_run_dir, resume, run_metadata

⚠️ Potential issue | 🟡 Minor

Fix implicit Optional type annotations.

PEP 484 prohibits implicit Optional. Both run_dir and output_path parameters should use explicit union syntax.

🔧 Proposed fix
     @staticmethod
-    def initialize_run_metadata(slices_path: Path, run_dir: Path = None, output_path: str = None, format: str = "dirs") -> tuple[Path, bool, RunMetadata]:
+    def initialize_run_metadata(slices_path: Path, run_dir: Path | None = None, output_path: str | None = None, format: str = "dirs") -> tuple[Path, bool, RunMetadata]:
🧰 Tools
🪛 Ruff (0.14.14)

560-560: PEP 484 prohibits implicit Optional: Convert to T | None (RUF013)
560-560: PEP 484 prohibits implicit Optional: Convert to T | None (RUF013)

🤖 Prompt for AI Agents
In `@dsperse/src/analyzers/runner_analyzer.py` around lines 559 - 607, The
function initialize_run_metadata has implicit Optional types for parameters
run_dir and output_path; update the signature to use explicit optional types
(e.g., run_dir: Optional[Path] or run_dir: Path | None and output_path:
Optional[str] or output_path: str | None depending on project Python typing
style), add the corresponding import from typing if using Optional, and ensure
all internal references to run_dir and output_path remain unchanged; keep
default values None and adjust any other annotated usages (return type remains
tuple[Path, bool, RunMetadata]) to eliminate implicit Optional annotations.

Comment on lines +78 to 90
         input_file, model_path, output_file, vk_path = str(input_file), str(model_path), str(output_file), str(vk_path)
         if not os.path.exists(input_file): raise FileNotFoundError(f"Input file not found: {input_file}")
         if not os.path.exists(model_path): raise FileNotFoundError(f"Model file not found: {model_path}")
         if not os.path.exists(vk_path): raise FileNotFoundError(f"Verification key file not found: {vk_path}")

         # --- SRS Setup ---
         if settings_path and os.path.exists(settings_path):
-            logrows = get_logrows_from_settings(settings_path)
-            if logrows:
-                if not ensure_srs(logrows):
-                    return False, f"Failed to ensure SRS for logrows={logrows}"
+            logrows = get_logrows_from_settings(str(settings_path))
+            if logrows and not ensure_srs(logrows): return False, f"Failed to ensure SRS for logrows={logrows}"

         # --- Execution ---
         os.makedirs(os.path.dirname(output_file), exist_ok=True)
         try:

⚠️ Potential issue | 🟡 Minor

Guard os.makedirs for empty dirname.
If the output path is just a filename, os.path.dirname(...) is empty and makedirs("") will fail.

🧹 Proposed fix
-        os.makedirs(os.path.dirname(output_file), exist_ok=True)
+        os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)
...
-        os.makedirs(os.path.dirname(proof_path), exist_ok=True)
+        os.makedirs(os.path.dirname(proof_path) or ".", exist_ok=True)

Also applies to: 126-128

🧰 Tools
🪛 Ruff (0.14.14)

79-79: Multiple statements on one line (colon) (E701)
79-79: Avoid specifying long messages outside the exception class (TRY003)
80-80: Multiple statements on one line (colon) (E701)
80-80: Avoid specifying long messages outside the exception class (TRY003)
81-81: Multiple statements on one line (colon) (E701)
81-81: Avoid specifying long messages outside the exception class (TRY003)
86-86: Multiple statements on one line (colon) (E701)

🤖 Prompt for AI Agents
In `@dsperse/src/backends/ezkl.py` around lines 78 - 90, The os.makedirs call
should be guarded against an empty dirname to avoid calling makedirs("") when
output_file is just a filename; replace the direct
os.makedirs(os.path.dirname(output_file), exist_ok=True) with logic that
computes dirname = os.path.dirname(output_file) and only calls
os.makedirs(dirname, exist_ok=True) if dirname is non-empty (apply the same fix
to the other occurrence around lines 126-128), referencing the output_file
variable and the existing os.path.dirname usage in dsperse/src/backends/ezkl.py.
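
A reusable form of that guard, as a sketch (ensure_parent_dir is a hypothetical helper, not part of dsperse):

import os

def ensure_parent_dir(path: str) -> None:
    # os.path.dirname("out.json") is "", and os.makedirs("") raises FileNotFoundError,
    # so only create a directory when the path actually has a parent component.
    dirname = os.path.dirname(path)
    if dirname:
        os.makedirs(dirname, exist_ok=True)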

Comment on lines +45 to +61
    @staticmethod
    def is_compatible(model_path: Union[str, Path]) -> Tuple[bool, set]:
        """Check if an ONNX model contains only JSTprove-supported operations."""
        model_path = Path(model_path)
        if not model_path.exists():
            return False, {"FILE_NOT_FOUND"}

        try:
            model = onnx.load(str(model_path))
            ops = {node.op_type for node in model.graph.node}
            unsupported = ops - JSTPROVE_SUPPORTED_OPS
            if unsupported:
                return False, unsupported
            return True, set()
        except Exception as e:
            logger.warning(f"Failed to check JSTprove compatibility: {e}")
            return False, {"LOAD_ERROR"}

⚠️ Potential issue | 🟡 Minor

Avoid false negatives when external data sidecars are missing.

The onnx.load() call attempts to load external tensor data by default. Since this compatibility check only examines graph structure (operation types), missing external data files cause unnecessary LOAD_ERROR returns. Disable external data loading to prevent false negatives.

🛠️ Suggested fix
         try:
-            model = onnx.load(str(model_path))
+            model = onnx.load(str(model_path), load_external_data=False)
             ops = {node.op_type for node in model.graph.node}
🧰 Tools
🪛 Ruff (0.14.14)

58-58: Consider moving this statement to an else block (TRY300)
59-59: Do not catch blind exception: Exception (BLE001)

🤖 Prompt for AI Agents
In `@dsperse/src/backends/utils/jstprove_utils.py` around lines 45 - 61, The
compatibility check in is_compatible currently calls onnx.load(...) which
attempts to load external tensor sidecars and can raise a LOAD_ERROR; update the
onnx.load call in is_compatible (where model is assigned) to disable external
data loading (pass the parameter to turn off external/sidecar loading, e.g.,
load_external_data=False or the equivalent supported by the onnx API) so the
function only inspects model.graph.node/op_type against JSTPROVE_SUPPORTED_OPS
and avoids false negatives when sidecar files are missing; keep the existing
exception handling and return values intact.

Comment on lines 80 to +82
    backend = getattr(args, 'backend', None)
    layers = getattr(args, 'layers', None)

    if not layers:
        print(f"{Fore.CYAN}No layers specified. Will compile all layers with default fallback (jstprove -> ezkl -> onnx)...{Style.RESET_ALL}")
        logger.info("No layers specified - compiling all layers with default fallback")
    elif backend:
        if ':' in backend:
            print(f"{Fore.CYAN}Compiling specified layers with mixed backends...{Style.RESET_ALL}")
        else:
            backend_name = 'JSTprove' if backend == 'jstprove' else 'EZKL'
            print(f"{Fore.CYAN}Compiling specified layers with {backend_name}...{Style.RESET_ALL}")
    else:
        print(f"{Fore.CYAN}Compiling specified layers (trying jstprove & ezkl, fallback to onnx)...{Style.RESET_ALL}")
    logger.info(f"Starting slices compilation")

    # Resolve path (slices dir or .dsperse/.dslice file)

⚠️ Potential issue | 🟡 Minor

Remove the unused layers local.
This triggers Ruff F841 and adds noise.

🧹 Proposed fix
-    layers = getattr(args, 'layers', None)
🧰 Tools
🪛 Ruff (0.14.14)

81-81: Local variable layers is assigned to but never used: Remove assignment to unused variable layers (F841)

🤖 Prompt for AI Agents
In `@dsperse/src/cli/compile.py` around lines 80 - 82, Remove the unused local
variable by deleting the assignment to layers (the line using getattr(args,
'layers', None)) in compile.py; keep the backend assignment (backend =
getattr(args, 'backend', None)) as-is or only remove layers if you intend to use
it elsewhere—ensure no other code relies on the local variable layers (search
for 'layers' in the same module) before removal to avoid breaking references.

Comment on lines +89 to +99
    # --- Layer Validation ---
    validated_layers = _check_layers(target_path, args.layers) if (getattr(args, 'layers', None) and os.path.isdir(target_path)) else getattr(args, 'layers', None)

    # Combine layers and backend into a single spec for Compiler.compile
    effective_layers = validated_layers
    if backend:
        if not effective_layers:
            effective_layers = backend
        elif ':' not in str(effective_layers) and effective_layers not in ['ezkl', 'jstprove', 'onnx']:
            effective_layers = f"{effective_layers}:{backend}"

⚠️ Potential issue | 🟠 Major

Abort when validation yields no valid layers.
If a user supplies only missing layers, validated_layers becomes None and the compile falls back to “all layers,” which is surprising and potentially expensive.

🐛 Proposed fix
-    validated_layers = _check_layers(target_path, args.layers) if (getattr(args, 'layers', None) and os.path.isdir(target_path)) else getattr(args, 'layers', None)
+    validated_layers = _check_layers(target_path, args.layers) if (getattr(args, 'layers', None) and os.path.isdir(target_path)) else getattr(args, 'layers', None)
+    if getattr(args, 'layers', None) and validated_layers is None:
+        print(f"{Fore.RED}No valid layers found in --layers. Aborting.{Style.RESET_ALL}")
+        return
🤖 Prompt for AI Agents
In `@dsperse/src/cli/compile.py` around lines 89 - 99, The current logic allows
compilation to proceed with "all layers" when the user supplied layers but
validation returned None; change the flow so that if args.layers was provided
(getattr(args, 'layers', None) is truthy), target_path is a directory, and
_check_layers(...) returned no valid layers (validated_layers is falsy/None),
the command aborts with a clear error (raise/exit/log) instead of falling back
to all layers; update the block that sets validated_layers/effective_layers to
check for this condition and surface a descriptive error before attempting to
combine backend into effective_layers.

Comment on lines +468 to +471
        group_files = comp.get("group_files", {})
        assert len(group_files) > 0
        for g_idx, files in group_files.items():
            assert "compiled" in files

⚠️ Potential issue | 🟡 Minor

Rename unused loop var g_idx.
Ruff B007 will flag this.

🧹 Proposed fix
-        for g_idx, files in group_files.items():
+        for _g_idx, files in group_files.items():
🧰 Tools
🪛 Ruff (0.14.14)

470-470: Loop control variable g_idx not used within loop body: Rename unused g_idx to _g_idx (B007)

🤖 Prompt for AI Agents
In `@tests/integration/test_compile.py` around lines 468 - 471, The loop in
tests/integration/test_compile.py uses an unused loop variable g_idx which
triggers Ruff B007; change the loop to not bind the unused index by either
iterating values directly (replace "for g_idx, files in group_files.items():"
with "for files in group_files.values():") or use a throwaway name (e.g., "for
_, files in group_files.items():") so only the used symbol files remains; update
the loop that asserts "compiled" in files accordingly.

Comment on lines +475 to +485
    @pytest.mark.parametrize("model_name", ["doom"])
    def test_compile_parallel(self, model_name, model_dir, slices_output_dir, jstprove_available, capfd):
        """Verify that parallel compilation works without errors."""
        slice_model(SimpleNamespace(
            model_dir=str(model_dir), output_dir=str(slices_output_dir), save_file=None, output_type="dirs"
        ))

        # Compile with 2 parallel processes
        compile_model(SimpleNamespace(
            path=str(slices_output_dir), input_file=None, layers=None, backend=None, parallel=2
        ))

⚠️ Potential issue | 🟡 Minor

Silence unused args (model_name, jstprove_available).
These are unused and will trigger Ruff ARG002.

🧹 Proposed fix
         slice_model(SimpleNamespace(
             model_dir=str(model_dir), output_dir=str(slices_output_dir), save_file=None, output_type="dirs"
         ))
+        _ = (model_name, jstprove_available)
🧰 Tools
🪛 Ruff (0.14.14)

476-476: Unused method argument: model_name (ARG002)
476-476: Unused method argument: jstprove_available (ARG002)

🤖 Prompt for AI Agents
In `@tests/integration/test_compile.py` around lines 475 - 485, Rename the unused
test parameters to start with an underscore to silence Ruff ARG002: change the
function signature in test_compile_parallel from (self, model_name, model_dir,
slices_output_dir, jstprove_available, capfd) to (self, _model_name, model_dir,
slices_output_dir, _jstprove_available, capfd) and update the
pytest.mark.parametrize decorator to parametrize("_model_name", ["doom"]) so the
param name matches; no other logic changes are required in
test_compile_parallel.

Comment on lines +565 to +577
    @pytest.mark.parametrize("model_name", ["doom"])
    def test_compile_resume_tiled(self, model_name, model_dir, slices_output_dir, jstprove_available, capfd):
        """Verify that resume mode works for tiled slices."""
        if not jstprove_available:
            pytest.skip("JSTprove unavailable")

        slice_model(SimpleNamespace(
            model_dir=str(model_dir), output_dir=str(slices_output_dir), save_file=None, output_type="dirs", tile_size=1000
        ))

        # 1. Compile tiled slice 0
        compile_model(SimpleNamespace(
            path=str(slices_output_dir), input_file=None, layers="0", backend="jstprove"

⚠️ Potential issue | 🟡 Minor

Silence unused model_name.
This triggers Ruff ARG002.

🧹 Proposed fix
         slice_model(SimpleNamespace(
             model_dir=str(model_dir), output_dir=str(slices_output_dir), save_file=None, output_type="dirs", tile_size=1000
         ))
+        _ = model_name
🧰 Tools
🪛 Ruff (0.14.14)

566-566: Unused method argument: model_name (ARG002)

🤖 Prompt for AI Agents
In `@tests/integration/test_compile.py` around lines 565 - 577, The test currently
defines a pytest parameter "model_name" but the function
test_compile_resume_tiled does not use it, causing Ruff ARG002; update both the
parametrize call and the test signature to use an unused-name convention (e.g.,
change "model_name" to "_model_name" in the pytest.mark.parametrize decorator
and rename the function parameter to _model_name) so the linter recognizes the
parameter is intentionally unused without changing test behavior.

"""Verify that slicing preserves model semantics."""

@pytest.mark.parametrize("model_name", ["net", "doom"])
def test_sliced_output_matches_original(self, model_name: str, model_dir: Path, slices_output_dir: Path):

⚠️ Potential issue | 🟡 Minor

Silence unused model_name parameters.
These are unused in the bodies and will trigger Ruff ARG002. Consider _ = model_name (or # noqa: ARG002) inside each test.

🧹 Example fix pattern
     def test_sliced_output_matches_original(self, model_name: str, model_dir: Path, slices_output_dir: Path):
         """Sliced model chain should produce same output as original model."""
+        _ = model_name

Also applies to: 60-60, 100-100, 130-130, 172-172, 228-228, 288-288

🧰 Tools
🪛 Ruff (0.14.14)

24-24: Unused method argument: model_name (ARG002)

🤖 Prompt for AI Agents
In `@tests/integration/test_pipeline.py` at line 24, Assign or otherwise silence
the unused test parameter `model_name` in each affected test function (e.g.,
`test_sliced_output_matches_original`, and the other tests at the listed
locations) to avoid Ruff ARG002; for example add a single statement like `_ =
model_name` (or an inline `# noqa: ARG002` comment) at the start of each test
body so the parameter is referenced and the linter warning is suppressed.

Updated test cases to improve channel splitting and tiling workflows by adjusting spatial dimensions, tile sizes, and test parameters. Enhanced logging in parallel tiling tests and refined integration with inference methods to improve coverage and maintainability.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@tests/integration/test_compile.py`:
- Around line 491-492: The test parameter model_name in test_compile_resume is
unused causing Ruff ARG002; rename it to _model_name (or prefix with underscore)
in the test signature and any fixture references (i.e., change def
test_compile_resume(self, model_name, ...) to def test_compile_resume(self,
_model_name, ...)) so the `@pytest.mark.parametrize`("model_name", ["doom"]) still
supplies values but the linter recognizes it as intentionally unused.

In `@tests/integration/test_run.py`:
- Around line 440-442: The test function test_run_tiling_onnx_fallback declares
a pytest parameter model_name that is never used; rename the parameter to
_model_name (e.g., def test_run_tiling_onnx_fallback(self, _model_name: str,
model_dir: Path, slices_output_dir: Path, capfd):) so Ruff ARG002 is suppressed
and the pytest.mark.parametrize("model_name", ["doom"]) can remain unchanged;
update only the function signature (and any internal references if you later use
it) to avoid the unused-argument warning.
- Around line 468-470: The test function test_run_resume_complex declares an
unused parameter model_name which triggers Ruff ARG002; to fix it either remove
the parameter from the function signature and the
`@pytest.mark.parametrize`("model_name", ["doom"]) decorator, or keep the
parametrize but rename the parameter to _model_name (prefix with underscore) so
linters treat it as intentionally unused; update references in the local scope
if any and ensure the decorator and function signature remain consistent (look
for test_run_resume_complex and the `@pytest.mark.parametrize` entry).
- Around line 403-408: The test function test_run_tiling_ezkl currently declares
an unused parameter model_name (triggering Ruff ARG002); fix by renaming the
parameter to _model_name (or prefixing with an underscore) in the test signature
and in the pytest.mark.parametrize decorator to match (e.g.,
`@pytest.mark.parametrize`("_model_name", ["doom"])) so the parameter is
intentionally marked unused and the linter warning is suppressed.
- Around line 361-366: The test function test_run_parallel_tiling declares an
unused parameter model_name which triggers Ruff ARG002; fix it by renaming the
function parameter to _model_name (or prefix it with an underscore) so the
pytest.mark.parametrize decorator can stay intact while signaling the parameter
is intentionally unused—update the function signature for
test_run_parallel_tiling accordingly and ensure no other references to
model_name remain inside the function.
🧹 Nitpick comments (1)
tests/integration/test_run.py (1)

14-17: Consider consolidating _get_run_dir helper across test files.

This helper method is duplicated in tests/integration/test_prove.py and tests/integration/test_verify.py (as shown in relevant code snippets). The version here lacks the fallback regex pattern that those files have.

♻️ Suggested improvement
     def _get_run_dir(self, output: str) -> Path:
         match = re.search(r"Run data saved to (.+)", output)
+        if not match:
+            match = re.search(r"within the run directory (.+)", output)
         assert match, f"Could not find run directory in output: {output}"
-        return Path(match.group(1).strip())
+        path_str = match.group(1).strip()
+        path_str = re.sub(r'\x1b\[[0-9;]*m', '', path_str).split()[0]
+        return Path(path_str)
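
One way to consolidate, sketched as a shared pytest fixture (the conftest.py placement and fixture name are assumptions; the regexes are copied from the suggestion above):

# tests/integration/conftest.py (hypothetical placement)
import re
from pathlib import Path

import pytest

@pytest.fixture
def get_run_dir():
    """Return a callable that extracts the run directory path from CLI output."""
    def _get(output: str) -> Path:
        match = re.search(r"Run data saved to (.+)", output)
        if not match:
            match = re.search(r"within the run directory (.+)", output)
        assert match, f"Could not find run directory in output: {output}"
        path_str = match.group(1).strip()
        path_str = re.sub(r"\x1b\[[0-9;]*m", "", path_str).split()[0]  # drop ANSI color codes
        return Path(path_str)
    return _get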

Comment on lines +491 to +492
    @pytest.mark.parametrize("model_name", ["doom"])
    def test_compile_resume(self, model_name, model_dir, slices_output_dir, jstprove_available, capfd):

⚠️ Potential issue | 🟡 Minor

Suppress unused model_name parameter (Ruff ARG002).

🧹 Proposed fix
     def test_compile_resume(self, model_name, model_dir, slices_output_dir, jstprove_available, capfd):
         """Verify that resume mode skips already compiled slices."""
+        _ = model_name
         slice_model(SimpleNamespace(
🧰 Tools
🪛 Ruff (0.14.14)

492-492: Unused method argument: model_name (ARG002)

🤖 Prompt for AI Agents
In `@tests/integration/test_compile.py` around lines 491 - 492, The test parameter
model_name in test_compile_resume is unused causing Ruff ARG002; rename it to
_model_name (or prefix with underscore) in the test signature and any fixture
references (i.e., change def test_compile_resume(self, model_name, ...) to def
test_compile_resume(self, _model_name, ...)) so the
`@pytest.mark.parametrize`("model_name", ["doom"]) still supplies values but the
linter recognizes it as intentionally unused.

Comment on lines +361 to +366
    @pytest.mark.parametrize("model_name", ["doom"])
    def test_run_parallel_tiling(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, caplog, jstprove_available):
        """11. Verify that parallel execution (threads > 1) works for tiling."""
        if not jstprove_available:
            pytest.skip("JSTprove unavailable")

⚠️ Potential issue | 🟡 Minor

Suppress unused model_name parameter (Ruff ARG002).

Same issue as test_run_with_tiling.

🧹 Proposed fix
     def test_run_parallel_tiling(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, caplog, jstprove_available):
         """11. Verify that parallel execution (threads > 1) works for tiling."""
         if not jstprove_available:
             pytest.skip("JSTprove unavailable")
+        _ = model_name
🧰 Tools
🪛 Ruff (0.14.14)

362-362: Unused method argument: model_name

(ARG002)

🤖 Prompt for AI Agents
In `@tests/integration/test_run.py` around lines 361 - 366, the test function
test_run_parallel_tiling declares an unused parameter model_name which triggers
Ruff ARG002; fix it by renaming the parameter to _model_name in both the
function signature and the `@pytest.mark.parametrize` argnames (pytest requires
them to match), or by keeping the decorator unchanged and adding
`_ = model_name` at the top of the body; update test_run_parallel_tiling
accordingly and ensure no other references to model_name remain inside the
function.

Comment on lines +403 to +408
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_tiling_ezkl(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, ezkl_available):
"""12. Verify that tiling works with EZKL backend."""
if not ezkl_available:
pytest.skip("EZKL unavailable")


⚠️ Potential issue | 🟡 Minor

Suppress unused model_name parameter (Ruff ARG002).

🧹 Proposed fix
     def test_run_tiling_ezkl(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, ezkl_available):
         """12. Verify that tiling works with EZKL backend."""
         if not ezkl_available:
             pytest.skip("EZKL unavailable")
+        _ = model_name
🧰 Tools
🪛 Ruff (0.14.14)

404-404: Unused method argument: model_name

(ARG002)

🤖 Prompt for AI Agents
In `@tests/integration/test_run.py` around lines 403 - 408, the test function
test_run_tiling_ezkl currently declares an unused parameter model_name
(triggering Ruff ARG002); fix by renaming the parameter to _model_name in both
the test signature and the pytest.mark.parametrize decorator so the two match
(e.g., `@pytest.mark.parametrize("_model_name", ["doom"])`), marking the
parameter as intentionally unused and suppressing the linter warning.
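
For reference, a sketch of the rename variant this prompt describes; it is functionally equivalent to the `_ = model_name` fix above, since pytest only requires the parametrize argnames and the parameter name to agree:

@pytest.mark.parametrize("_model_name", ["doom"])
def test_run_tiling_ezkl(self, _model_name: str, model_dir: Path, slices_output_dir: Path, capfd, ezkl_available):
    """12. Verify that tiling works with EZKL backend."""
    if not ezkl_available:
        pytest.skip("EZKL unavailable")

Ruff's default dummy-variable pattern treats the underscore-prefixed argument as intentionally unused, so ARG002 is silenced without a noqa.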

Comment on lines +440 to +442
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_tiling_onnx_fallback(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
"""13. Verify that tiling falls back to ONNX if no circuits are available."""

⚠️ Potential issue | 🟡 Minor

Suppress unused model_name parameter (Ruff ARG002).

🧹 Proposed fix
     def test_run_tiling_onnx_fallback(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
         """13. Verify that tiling falls back to ONNX if no circuits are available."""
+        _ = model_name
         slice_model(SimpleNamespace(
🧰 Tools
🪛 Ruff (0.14.14)

441-441: Unused method argument: model_name

(ARG002)

🤖 Prompt for AI Agents
In `@tests/integration/test_run.py` around lines 440 - 442, the test function
test_run_tiling_onnx_fallback declares a pytest parameter model_name that is
never used; either rename it to _model_name in both the signature and the
parametrize argnames (e.g., `@pytest.mark.parametrize("_model_name", ["doom"])`
with def test_run_tiling_onnx_fallback(self, _model_name: str, model_dir: Path,
slices_output_dir: Path, capfd):), or keep the decorator unchanged and add
`_ = model_name` in the body; renaming only the function signature would break
pytest collection, since the argnames must match the parameter name.

Comment on lines +468 to +470
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_resume_complex(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
"""14. Verify that resuming works for tiled/channel-split runs."""

⚠️ Potential issue | 🟡 Minor

Suppress unused model_name parameter (Ruff ARG002).

🧹 Proposed fix
     def test_run_resume_complex(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
         """14. Verify that resuming works for tiled/channel-split runs."""
+        _ = model_name
         slice_model(SimpleNamespace(
🧰 Tools
🪛 Ruff (0.14.14)

469-469: Unused method argument: model_name

(ARG002)

🤖 Prompt for AI Agents
In `@tests/integration/test_run.py` around lines 468 - 470, the test function
test_run_resume_complex declares an unused parameter model_name which triggers
Ruff ARG002; to fix it either remove the parameter from the function signature
and the `@pytest.mark.parametrize("model_name", ["doom"])` decorator together,
rename the parameter to _model_name in both the signature and the decorator's
argnames (pytest requires them to match), or keep both unchanged and add
`_ = model_name` in the body; update references in the local scope if any and
ensure the decorator and function signature remain consistent.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@dsperse/src/run/channel_split_executor.py`:
- Around line 72-76: The code calls ort.InferenceSession with
str(group_onnx_path) without checking that _resolve_path(group.path) returned a
valid path; add a guard after calling self._resolve_path(group.path) in both
ONNX branches (the block that sets group_onnx_path and the other ONNX branch
around outputs/session creation) to verify group_onnx_path is not None/empty,
and if it is, raise a clear exception or log and return an appropriate
error/ExecutionMethod instead of proceeding to ort.InferenceSession; update the
blocks that reference group_onnx_path and session to use this validation and
avoid constructing a session with "None".

In `@dsperse/src/run/runner.py`:
- Around line 68-77: The tiled and channel-split branches return None as the
result tensor, causing downstream functions (process_inference_result /
finalize_run_results) to use the wrong tensor; change the return values in both
run_channel_split_inference and run_tiled_inference call sites to return the
actual output tensor carried by exec_info (e.g., exec_info.output_tensor)
instead of None, i.e., return exec_info.success, exec_info.output_tensor,
exec_info so process_inference_result receives the real slice output.

In `@dsperse/src/run/utils/runner_utils.py`:
- Around line 350-352: _count_tile_method currently calls .startswith() on
tile.method which may be None; update the function to coalesce the method value
to an empty string before calling startswith for both TileResult instances and
dict tiles (i.e., use (t.method or "") for TileResult and t.get("method", "") or
"" for dicts) so the check is safe and save_run_results won't abort on None.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@dsperse/src/run/channel_split_executor.py`:
- Around line 108-114: In apply_bias, guard against _resolve_path returning None
before calling Path(...).exists(): ensure you check that channel_split.bias_path
is truthy and that bias_path returned from _resolve_path is not None (or
non-empty) before constructing Path(bias_path) and calling exists(); if the
resolved path is None or empty, skip loading the bias. This references the
apply_bias method, ChannelSplitInfo.bias_path, and the _resolve_path helper—add
a simple None/empty check around the Path(...).exists() usage and early-return
or skip the bias load when absent.
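
A minimal sketch of the guarded method, assuming _resolve_path may return None and that silently skipping an absent bias is acceptable (imports shown for completeness; the device/dtype alignment anticipates a nitpick raised later in this review):

from pathlib import Path

import numpy as np
import torch

def apply_bias(self, summed: torch.Tensor, channel_split) -> torch.Tensor:
    if not channel_split.bias_path:
        return summed
    bias_path = self._resolve_path(channel_split.bias_path)
    if not bias_path or not Path(bias_path).exists():
        # Unresolvable or missing bias file: skip the bias instead of crashing
        return summed
    bias = torch.from_numpy(np.load(bias_path)).reshape(1, -1, 1, 1)
    return summed + bias.to(device=summed.device, dtype=summed.dtype)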
🧹 Nitpick comments (8)
dsperse/src/run/channel_split_executor.py (2)

44-47: Use explicit Optional type for backend parameter.

The backend: str = None parameter uses implicit Optional which violates PEP 484.

♻️ Proposed fix
     def execute_group(
         self, group: ChannelGroupInfo, group_input: torch.Tensor, run_dir: Path,
-        output_shape: tuple, backend: str = None
+        output_shape: tuple, backend: str | None = None
     ) -> tuple[torch.Tensor, str]:

124-133: Remove duplicate _extract_output_tensor - delegate to RunnerUtils.

This method is identical to RunnerUtils.extract_output_tensor (lines 20-33 in runner_utils.py) and also duplicated in tile_executor.py. Consider delegating to the centralized utility.

♻️ Proposed fix
-    @staticmethod
-    def _extract_output_tensor(result):
-        if result is None:
-            return None
-        if isinstance(result, dict):
-            for key in ['output', 'logits', 'output_tensor']:
-                if key in result and result[key] is not None:
-                    return result[key]
-            return None
-        return result
+    @staticmethod
+    def _extract_output_tensor(result):
+        from dsperse.src.run.utils.runner_utils import RunnerUtils
+        return RunnerUtils.extract_output_tensor(result)
dsperse/src/run/runner.py (4)

29-46: Use explicit Optional type for run_dir parameter.

The run_dir: str = None parameter uses implicit Optional, which violates PEP 484; on_slice_complete on line 33 already uses an explicit Optional and shows the pattern to follow.

♻️ Proposed fix
     def __init__(
         self,
-        run_dir: str = None,
+        run_dir: str | None = None,
         threads: int = 1,
         on_slice_complete: Optional[SliceCompleteCallback] = None
     ):

100-100: Use explicit Optional type for backend parameter.

♻️ Proposed fix
-    def run_circuit_with_fallback(self, meta: RunSliceMetadata, in_file: Path, out_file: Path, slice_dir: Path, backend: str = None):
+    def run_circuit_with_fallback(self, meta: RunSliceMetadata, in_file: Path, out_file: Path, slice_dir: Path, backend: str | None = None):

164-164: Use explicit Optional types for slice_dir parameters.

Both run_ezkl_inference (line 164) and run_jstprove_inference (line 194) use implicit Optional for slice_dir.

♻️ Proposed fix
-    def run_ezkl_inference(self, meta: RunSliceMetadata, input_tensor_path: Path, output_witness_path: Path, slice_dir: Path = None):
+    def run_ezkl_inference(self, meta: RunSliceMetadata, input_tensor_path: Path, output_witness_path: Path, slice_dir: Path | None = None):
...
-    def run_jstprove_inference(self, meta: RunSliceMetadata, input_tensor_path: Path, output_witness_path: Path, slice_dir: Path = None):
+    def run_jstprove_inference(self, meta: RunSliceMetadata, input_tensor_path: Path, output_witness_path: Path, slice_dir: Path | None = None):

Also applies to: 194-194


393-393: Split multiple statements on one line for readability.

Line 393 has multiple statements on one line which reduces readability.

♻️ Proposed fix
-                    for oname in info.dependencies.output: tensor_cache[oname] = cached_tensor
+                    for oname in info.dependencies.output:
+                        tensor_cache[oname] = cached_tensor
dsperse/src/run/utils/runner_utils.py (2)

406-418: Log exceptions in check_slice_completion_status instead of silently passing.

The try-except-pass pattern at lines 416-417 silently swallows all exceptions, which can hide bugs. At minimum, log the exception.

♻️ Proposed fix
             try:
                 with open(output_file, 'r') as f:
                     data = json.load(f)
                 return True, data
-            except Exception:
-                pass
+            except (json.JSONDecodeError, KeyError) as e:
+                logger.debug(f"Failed to load slice completion data for {slice_id}: {e}")
         return False, None

452-465: Address unused parameter and use efficient iterator access.

  1. The exec_info parameter (line 453) is unused
  2. Line 465 uses list(...)[0] which is less efficient than next(iter(...))
♻️ Proposed fix
     @staticmethod
-    def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, exec_info, tensor_cache: dict) -> torch.Tensor | None:
+    def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, _exec_info, tensor_cache: dict) -> torch.Tensor | None:
         """Handles reshaping, tensor cache updates, and final tensor extraction from inference results."""
         ...
         if isinstance(result, dict) and 'output_tensors' in result:
             for oname, tensor in result['output_tensors'].items():
                 tensor_cache[oname] = tensor
-            first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+            first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Fix all issues with AI agents
In `@dsperse/src/prove/prover.py`:
- Around line 71-104: The tiled-proof branch in _prepare_work_items drops the
output_path so tiled proofs always write under run_path; update
_prepare_work_items to include the provided output_path in the work_items tuple
(alongside the existing entries) so downstream callers know the override, and
then update ProverUtils.execute_slice_proving and ProverUtils.prove_tile_batch
to accept that output_path parameter and use it when constructing tile proof
output locations (instead of defaulting to run_path). Refer to the symbols
_prepare_work_items, work_items (the tuple appended),
ProverUtils.execute_slice_proving, and ProverUtils.prove_tile_batch when making
the changes.
- Around line 34-44: The JSTprove initialization currently catches all Exception
and swallows unexpected errors; change the except block in the prover
initialization to only catch RuntimeError (matching the EZKL pattern) and call
logger.exception(...) when catching it so the stack trace is preserved;
specifically update the try/except around creating self.jstprove_runner =
JSTprove() to except RuntimeError as e, set self.jstprove_runner = None, and log
the failure with logger.exception("JSTprove CLI not available. JSTprove backend
will be disabled.", exc_info=True) or equivalent to retain error details.
- Around line 199-213: The temporary dirs created when converting model_dir in
_handle_single_slice_proving aren't guaranteed to be cleaned up if
_prove_single_slice raises; wrap the call to self._prove_single_slice in a
try/finally so that if detected != "dirs" you always call
Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))),
output_type=detected, cleanup=True) in the finally block; ensure you still
return the result when no exception and re-raise any exception after cleanup so
behavior of _handle_single_slice_proving, _prove_single_slice, and variables
dirs_model_path/detected remain consistent.

In `@dsperse/src/prove/utils/prover_utils.py`:
- Around line 14-43: In resolve_prove_artifacts, backend-specific PK/settings
fields are not being used; update the selection logic so when b == Backend.EZKL
you use meta.ezkl_pk_path (fallback to meta.pk_path) and meta.ezkl_settings_path
(fallback to meta.settings_path), and when b == Backend.JSTPROVE use
meta.jstprove_pk_path and meta.jstprove_settings_path similarly; then call
Utils.resolve_under_slice for pk_path and settings_path as already done and keep
the existing existence check for settings_path. Reference:
resolve_prove_artifacts, Backend.EZKL, Backend.JSTPROVE, meta.ezkl_pk_path,
meta.ezkl_settings_path, meta.jstprove_pk_path, meta.jstprove_settings_path,
Utils.resolve_under_slice.

In `@dsperse/src/run/channel_split_executor.py`:
- Around line 51-52: The current has_ezkl check (has_ezkl =
bool(group.ezkl_circuit_path) and (group.vk_path or group.ezkl_vk_path)) doesn’t
verify that an EZKL runner exists, so calls to
self.ezkl_runner.generate_witness() can hit None; update the condition to also
require self.ezkl_runner (e.g., include "and self.ezkl_runner") or add an
explicit guard before invoking generate_witness() so groups with EZKL files but
no runner fall back safely; locate uses of has_ezkl and the call to
generate_witness() to apply the fix (symbols: has_ezkl, self.ezkl_runner,
generate_witness).
- Around line 108-113: In apply_bias, the bias tensor loaded from numpy is
created on CPU with default dtype which can cause device/dtype mismatches when
adding to summed; modify the creation of bias_tensor in apply_bias
(ChannelSplitInfo.bias_path handling) to construct it on the same device and
dtype as summed (e.g., use torch.as_tensor or torch.from_numpy followed by
to(device=summed.device, dtype=summed.dtype)) and then reshape to (1, -1, 1, 1)
before adding so the addition uses matching device and dtype.

In `@dsperse/src/verify/utils/verifier_utils.py`:
- Around line 237-260: In verify_slice_logic, the tiled branch reconstructs
RunSliceMetadata.from_dict({'tiling': tiling_info}) losing resolved artifact
paths so downstream verify_tile/verify_tile_batch get None; fix by constructing
RunSliceMetadata with the full metadata (include circuit_path, settings_path,
vk_path, input_path, output_path, witness_path, proof_path or their resolved
filenames) or pass the already-resolved Path values directly into
VerifierUtils.verify_tile_batch/verify_tile so tile verification receives the
correct artifact paths; update the call site in verify_slice_logic (and the
tiling variable usage) to supply the complete metadata or direct paths rather
than only tiling_info.

In `@dsperse/src/verify/verifier.py`:
- Around line 198-213: The call to _verify_single_slice in
_handle_single_slice_verifying can raise and currently skips the cleanup
Converter.convert(..., cleanup=True); wrap the _verify_single_slice invocation
in a try/finally block and move the Converter.convert cleanup call into the
finally so temporary "dirs" output is always removed when detected != "dirs";
keep the initial conversion that sets dirs_model_path as-is and use
Utils.dirs_root_from(Path(dirs_model_path)) in the finally to compute the path
for Converter.convert.
- Around line 27-45: The JSTprove initialization in Verifier.__init__ is
catching a broad Exception and swallowing tracebacks; change the second
try/except to catch RuntimeError (like the EZKL block) by using "except
RuntimeError as e" for JSTprove(), set self.jstprove_runner = None, and call
logger.exception("JSTprove CLI not available. JSTprove backend will be
disabled.") so the stack trace is preserved and unexpected errors aren't hidden
(refer to jstprove_runner and Verifier.__init__).
- Around line 244-255: The single-slice verifier (_verify_single_slice) resolves
settings_path and vk_path but doesn't validate they exist before calling
VerifierUtils.verify_with_backend; add the same pre-checks used in
_prepare_work_items: after computing settings_path and vk_path (via
Utils.resolve_under_slice using slice_dir and meta.settings_path/meta.vk_path),
if preferred indicates the EZKL backend (check the preferred variable or backend
string used elsewhere) then verify settings_path.exists() and vk_path.exists()
and raise FileNotFoundError with clear messages (or fail fast) if missing so
verification errors are surfaced early before calling
VerifierUtils.verify_with_backend.
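
For the last item, a short sketch of the fail-fast check, assuming settings_path and vk_path are already-resolved Path objects and preferred carries the backend constant (names follow the prompt and are not verified against the repo):

if preferred == Backend.EZKL:
    if not (settings_path and settings_path.exists()):
        raise FileNotFoundError(f"EZKL settings not found at {settings_path}")
    if not (vk_path and vk_path.exists()):
        raise FileNotFoundError(f"EZKL verification key not found at {vk_path}")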
🧹 Nitpick comments (8)
dsperse/src/prove/prover.py (3)

107-124: Align sequential error handling with the parallel path and keep stack traces.

Sequential mode currently aborts on the first failure, while parallel mode records failures and continues. Also, logger.error drops the stack trace.

🛠️ Suggested change
-                for future in as_completed(futures):
+                for future in as_completed(futures):
                     slice_id = futures[future]
                     try:
                         results.append(future.result())
                     except Exception as e:
-                        logger.error(f"Slice {slice_id} proving failed: {e}")
+                        logger.exception(f"Slice {slice_id} proving failed: {e}")
                         results.append({'slice_id': slice_id, 'success': False, 'method': None, 'error': str(e)})
             return results
         else:
-            return [ProverUtils.prove_slice_logic(item) for item in work_items]
+            results = []
+            for item in work_items:
+                slice_id = item[0]
+                try:
+                    results.append(ProverUtils.prove_slice_logic(item))
+                except Exception as e:
+                    logger.exception(f"Slice {slice_id} proving failed: {e}")
+                    results.append({'slice_id': slice_id, 'success': False, 'method': None, 'error': str(e)})
+            return results

215-217: Drop or underscore the unused detected parameter.

🛠️ Suggested change
-    def _prove_single_slice(run_path: Path, model_dir: str | Path, detected: str, backend: str,
+    def _prove_single_slice(run_path: Path, model_dir: str | Path, _detected: str, backend: str,
                             tiles_range: range | list[int] | None = None) -> dict:

256-306: Consider moving the ad-hoc test harness to a dedicated script.

Keeping the demo block in a separate script makes the module slimmer and avoids shipping debug paths.

dsperse/src/verify/utils/verifier_utils.py (2)

16-36: Don't silently swallow run_results failures.

Returning None on any exception makes root-cause debugging hard; log the exception at least once.

🛠️ Suggested change
-        except Exception:
-            return None
+        except Exception:
+            logger.exception("Failed to load run_results for %s", run_path)
+            return None

90-100: Log failures when proof-path extraction fails.

The empty except ...: pass hides malformed run_results.json.

🛠️ Suggested change
-        except Exception:
-            pass
+        except Exception:
+            logger.exception("Failed to extract proof paths from run results")
dsperse/src/verify/verifier.py (2)

114-131: Align sequential error handling with the parallel path and keep stack traces.

Sequential mode currently aborts on the first failure, while parallel mode records failures and continues. Also, logger.error drops the stack trace.

🛠️ Suggested change
-                for future in as_completed(futures):
+                for future in as_completed(futures):
                     slice_id = futures[future]
                     try:
                         results.append(future.result())
                     except Exception as e:
-                        logger.error(f"Slice {slice_id} verification failed: {e}")
+                        logger.exception(f"Slice {slice_id} verification failed: {e}")
                         results.append(SliceResult(slice_id=slice_id, success=False, error=str(e)).to_dict())
             return results
         else:
-            return [VerifierUtils.verify_slice_logic(item) for item in work_items]
+            results = []
+            for item in work_items:
+                slice_id = item[0]
+                try:
+                    results.append(VerifierUtils.verify_slice_logic(item))
+                except Exception as e:
+                    logger.exception(f"Slice {slice_id} verification failed: {e}")
+                    results.append(SliceResult(slice_id=slice_id, success=False, error=str(e)).to_dict())
+            return results

272-320: Consider moving the ad-hoc test harness to a dedicated script.

dsperse/src/prove/utils/prover_utils.py (1)

282-287: Prefix the unused slice_dir binding.

🛠️ Suggested change
-        (slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling_info, run_path, slice_dir) = args
+        (slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling_info, run_path, _slice_dir) = args

Comment on lines +34 to +44
try:
self.ezkl_runner = EZKL()
except RuntimeError:
self.ezkl_runner = None
logger.warning("EZKL CLI not available. EZKL backend will be disabled.")

try:
self.jstprove_runner = JSTprove()
except Exception:
self.jstprove_runner = None
logger.warning("JSTprove CLI not available. JSTprove backend will be disabled.")

⚠️ Potential issue | 🟡 Minor


Avoid swallowing unexpected JSTprove initialization failures.

Catching Exception here hides real setup bugs. JSTprove's __init__ documents it raises RuntimeError for missing CLI. Catch only RuntimeError to align with the EZKL pattern above, and use logger.exception() to preserve the stack trace for debugging.

🛠️ Suggested change
-        try:
-            self.jstprove_runner = JSTprove()
-        except Exception:
-            self.jstprove_runner = None
-            logger.warning("JSTprove CLI not available. JSTprove backend will be disabled.")
+        try:
+            self.jstprove_runner = JSTprove()
+        except RuntimeError:
+            self.jstprove_runner = None
+            logger.exception("JSTprove CLI not available. JSTprove backend will be disabled.")
🧰 Tools
🪛 Ruff (0.14.14)

42-42: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@dsperse/src/prove/prover.py` around lines 34 - 44, the JSTprove
initialization currently catches all Exception and swallows unexpected errors;
change the except block to catch only RuntimeError (matching the EZKL pattern)
and call logger.exception(...) so the stack trace is preserved; specifically,
update the try/except around self.jstprove_runner = JSTprove() to use except
RuntimeError, set self.jstprove_runner = None, and log the failure with
logger.exception("JSTprove CLI not available. JSTprove backend will be
disabled.") or equivalent to retain error details.

Comment on lines +71 to +104
@staticmethod
def _prepare_work_items(slices_iter: list, dirs_path: Path, run_path: Path, output_path: str | Path | None) -> list:
"""Prepare proving tasks for each slice."""
work_items = []
for slice_id, meta in slices_iter:
slice_dir = Utils.slice_dirs_path(dirs_path, slice_id)
preferred = ProverUtils.select_backend(run_path, slice_id, meta)
circuit_path, pk_path, settings_path = ProverUtils.resolve_prove_artifacts(slice_dir, meta, preferred)

tiling = meta.tiling
if tiling:
witness_path = None
proof_path = None
else:
if preferred == Backend.JSTPROVE:
wf = ProverUtils.get_witness_file(run_path, slice_id)
witness_path = Path(wf) if wf else (run_path / slice_id / "output_witness.bin")
else:
witness_path = Utils.witness_path_for(run_path, slice_id)

if not witness_path.exists():
logger.warning(f"Skipping {slice_id}: witness not found at {witness_path}")
continue

proof_path = Utils.proof_output_path(run_path, slice_id, output_path)

if not circuit_path or not os.path.exists(circuit_path):
logger.warning(f"Skipping {slice_id}: compiled circuit not found ({circuit_path})")
continue

work_items.append((
slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path,
tiling.to_dict() if tiling else None, str(run_path), str(slice_dir)
))

⚠️ Potential issue | 🟠 Major

Tiled proofs ignore the output_path override.

For tiled slices, output_path is dropped, so tile proofs always land under run_path. Pass output_path through the work-item tuple and into ProverUtils.execute_slice_proving/prove_tile_batch so custom output locations are honored.

🛠️ Suggested change (plus propagate to ProverUtils)
-            work_items.append((
-                slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path,
-                tiling.to_dict() if tiling else None, str(run_path), str(slice_dir)
-            ))
+            work_items.append((
+                slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path,
+                tiling.to_dict() if tiling else None, str(run_path), str(slice_dir), output_path
+            ))
🤖 Prompt for AI Agents
In `@dsperse/src/prove/prover.py` around lines 71 - 104, The tiled-proof branch in
_prepare_work_items drops the output_path so tiled proofs always write under
run_path; update _prepare_work_items to include the provided output_path in the
work_items tuple (alongside the existing entries) so downstream callers know the
override, and then update ProverUtils.execute_slice_proving and
ProverUtils.prove_tile_batch to accept that output_path parameter and use it
when constructing tile proof output locations (instead of defaulting to
run_path). Refer to the symbols _prepare_work_items, work_items (the tuple
appended), ProverUtils.execute_slice_proving, and ProverUtils.prove_tile_batch
when making the changes.
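
A hedged sketch of the propagation, assuming execute_slice_proving unpacks the work-item tuple positionally and that tile proofs currently land under run_path/<slice_id> (the tuple layout matches the suggested change above; the per-tile filename is illustrative):

# In ProverUtils.execute_slice_proving: accept the extra field
(slice_id, preferred, witness_path, circuit_path, proof_path, pk_path,
 settings_path, tiling_info, run_path, slice_dir, output_path) = args

# In ProverUtils.prove_tile_batch: honor the override when present
from pathlib import Path

def tile_proof_path(run_path, slice_id: str, tile_idx: int, output_path=None) -> Path:
    base = Path(output_path) if output_path else Path(run_path)
    return base / slice_id / f"tile_{tile_idx}" / "proof.json"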

Comment on lines +199 to +213
def _handle_single_slice_proving(self, run_path: Path, model_dir: str | Path, detected: str,
backend: str | None, tiles_range: range | list[int] | None) -> dict:
"""Internal: Handle proving for a single slice run directory."""
if backend not in (Backend.JSTPROVE, Backend.EZKL):
raise ValueError("Single-slice proving requires explicit backend: 'jstprove' or 'ezkl'.")

dirs_model_path = model_dir
if detected != "dirs":
dirs_model_path = Converter.convert(str(model_dir), output_type="dirs", cleanup=False)

result = self._prove_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)

if detected != "dirs":
Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
return result

⚠️ Potential issue | 🟠 Major

Wrap conversion cleanup in try/finally to prevent resource leaks.

If _prove_single_slice raises an exception during execution (possible at line 220-230), the converted temporary directories will not be cleaned up. Use try/finally to ensure cleanup always runs, matching the pattern already used in _prove_packaged at lines 158-166.

🛠️ Suggested change
-        result = self._prove_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)
-        
-        if detected != "dirs":
-            Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
-        return result
+        try:
+            return self._prove_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)
+        finally:
+            if detected != "dirs":
+                Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
🧰 Tools
🪛 Ruff (0.14.14)

203-203: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@dsperse/src/prove/prover.py` around lines 199 - 213, The temporary dirs
created when converting model_dir in _handle_single_slice_proving aren't
guaranteed to be cleaned up if _prove_single_slice raises; wrap the call to
self._prove_single_slice in a try/finally so that if detected != "dirs" you
always call Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))),
output_type=detected, cleanup=True) in the finally block; ensure you still
return the result when no exception and re-raise any exception after cleanup so
behavior of _handle_single_slice_proving, _prove_single_slice, and variables
dirs_model_path/detected remain consistent.

Comment on lines +14 to +43
@staticmethod
def resolve_prove_artifacts(
slice_dir: Path,
meta: RunSliceMetadata,
backend: str | None = None,
) -> tuple[str | None, str | None, str | None]:
"""Resolve circuit, pk, and settings paths under a given slice directory."""
b = (backend or "").lower()
if b == Backend.EZKL:
circuit_rel = meta.ezkl_circuit_path or meta.circuit_path
pk_rel = meta.pk_path
settings_rel = meta.settings_path
elif b == Backend.JSTPROVE:
circuit_rel = meta.jstprove_circuit_path or meta.circuit_path
pk_rel = meta.pk_path
settings_rel = meta.settings_path
else:
circuit_rel = meta.circuit_path
pk_rel = meta.pk_path
settings_rel = meta.settings_path

circuit_path = Utils.resolve_under_slice(slice_dir, circuit_rel)
pk_path = Utils.resolve_under_slice(slice_dir, pk_rel)
settings_path = Utils.resolve_under_slice(slice_dir, settings_rel)

if settings_path and not os.path.exists(settings_path):
logger.warning(f"Settings file not found at {settings_path}; proceeding without it.")
settings_path = None

return circuit_path, pk_path, settings_path

⚠️ Potential issue | 🟠 Major

Use backend-specific settings/PK paths when available.

resolve_prove_artifacts ignores ezkl_pk_path / ezkl_settings_path (and jstprove_settings_path), which can force pk_path=None even when backend-specific artifacts exist.

🛠️ Suggested change
         if b == Backend.EZKL:
             circuit_rel = meta.ezkl_circuit_path or meta.circuit_path
-            pk_rel = meta.pk_path
-            settings_rel = meta.settings_path
+            pk_rel = meta.ezkl_pk_path or meta.pk_path
+            settings_rel = meta.ezkl_settings_path or meta.settings_path
         elif b == Backend.JSTPROVE:
             circuit_rel = meta.jstprove_circuit_path or meta.circuit_path
             pk_rel = meta.pk_path
-            settings_rel = meta.settings_path
+            settings_rel = meta.jstprove_settings_path or meta.settings_path
🤖 Prompt for AI Agents
In `@dsperse/src/prove/utils/prover_utils.py` around lines 14 - 43, In
resolve_prove_artifacts, backend-specific PK/settings fields are not being used;
update the selection logic so when b == Backend.EZKL you use meta.ezkl_pk_path
(fallback to meta.pk_path) and meta.ezkl_settings_path (fallback to
meta.settings_path), and when b == Backend.JSTPROVE use meta.jstprove_pk_path
and meta.jstprove_settings_path similarly; then call Utils.resolve_under_slice
for pk_path and settings_path as already done and keep the existing existence
check for settings_path. Reference: resolve_prove_artifacts, Backend.EZKL,
Backend.JSTPROVE, meta.ezkl_pk_path, meta.ezkl_settings_path,
meta.jstprove_pk_path, meta.jstprove_settings_path, Utils.resolve_under_slice.

Comment on lines +51 to +52
has_ezkl = bool(group.ezkl_circuit_path) and (group.vk_path or group.ezkl_vk_path)


⚠️ Potential issue | 🟠 Major

Guard EZKL execution when no runner is configured.

has_ezkl does not check self.ezkl_runner, so a group with EZKL paths but no runner will hit None.generate_witness() and crash instead of falling back.

🛠️ Proposed fix
-        has_ezkl = bool(group.ezkl_circuit_path) and (group.vk_path or group.ezkl_vk_path)
+        has_ezkl = (
+            bool(group.ezkl_circuit_path)
+            and (group.vk_path or group.ezkl_vk_path)
+            and self.ezkl_runner is not None
+        )
🤖 Prompt for AI Agents
In `@dsperse/src/run/channel_split_executor.py` around lines 51 - 52, The current
has_ezkl check (has_ezkl = bool(group.ezkl_circuit_path) and (group.vk_path or
group.ezkl_vk_path)) doesn’t verify that an EZKL runner exists, so calls to
self.ezkl_runner.generate_witness() can hit None; update the condition to also
require self.ezkl_runner (e.g., include "and self.ezkl_runner") or add an
explicit guard before invoking generate_witness() so groups with EZKL files but
no runner fall back safely; locate uses of has_ezkl and the call to
generate_witness() to apply the fix (symbols: has_ezkl, self.ezkl_runner,
generate_witness).

Comment on lines +108 to +113
def apply_bias(self, summed: torch.Tensor, channel_split: ChannelSplitInfo) -> torch.Tensor:
if channel_split.bias_path:
bias_path = self._resolve_path(channel_split.bias_path)
if bias_path and Path(bias_path).exists():
bias_tensor = torch.from_numpy(np.load(bias_path)).reshape(1, -1, 1, 1)
summed = summed + bias_tensor

⚠️ Potential issue | 🟡 Minor

Align bias tensor device/dtype with summed.

If summed is on GPU or uses a non-default dtype, adding a CPU float tensor will error or upcast unexpectedly.

🛠️ Proposed fix
             if bias_path and Path(bias_path).exists():
                 bias_tensor = torch.from_numpy(np.load(bias_path)).reshape(1, -1, 1, 1)
-                summed = summed + bias_tensor
+                bias_tensor = bias_tensor.to(device=summed.device, dtype=summed.dtype)
+                summed = summed + bias_tensor
🤖 Prompt for AI Agents
In `@dsperse/src/run/channel_split_executor.py` around lines 108 - 113, In
apply_bias, the bias tensor loaded from numpy is created on CPU with default
dtype which can cause device/dtype mismatches when adding to summed; modify the
creation of bias_tensor in apply_bias (ChannelSplitInfo.bias_path handling) to
construct it on the same device and dtype as summed (e.g., use torch.as_tensor
or torch.from_numpy followed by to(device=summed.device, dtype=summed.dtype))
and then reshape to (1, -1, 1, 1) before adding so the addition uses matching
device and dtype.

Comment on lines +237 to +260
def verify_slice_logic(args: tuple) -> dict:
"""Core logic for verifying a single slice (handles tuple for pool compatibility)."""
(slice_id, preferred, proof_path, circuit_path, settings_path, vk_path, input_path, output_path, witness_path, tiling_info, run_path, slice_dir) = args

result = SliceResult(slice_id=slice_id, success=False)
start_time = time.time()

if tiling_info:
tiling = TilingInfo.from_dict(tiling_info)
success, tile_verifs = VerifierUtils.verify_tile_batch(slice_id, Path(run_path), tiling.num_tiles, preferred, Path(slice_dir), RunSliceMetadata.from_dict({'tiling': tiling_info}))
result.success = success
result.method = ExecutionMethod.JSTPROVE_VERIFY if preferred == Backend.JSTPROVE else ExecutionMethod.EZKL_VERIFY
result.tiles = [TileResult.from_dict(v) for v in tile_verifs]
result.error = None if success else "One or more tiles failed verification"
else:
try:
ok = VerifierUtils.verify_with_backend(preferred, str(proof_path), circuit_path, input_path, output_path, witness_path, settings_path, vk_path)
result.success = ok
result.method = ExecutionMethod.JSTPROVE_VERIFY if preferred == Backend.JSTPROVE else ExecutionMethod.EZKL_VERIFY
result.error = None if ok else "verification_failed"
except Exception as e:
result.error = str(e)
result.method = ExecutionMethod.JSTPROVE_VERIFY if preferred == Backend.JSTPROVE else ExecutionMethod.EZKL_VERIFY


⚠️ Potential issue | 🟠 Major

Tiled verification drops resolved artifact paths.

In the tiled branch, RunSliceMetadata is rebuilt with only tiling_info, so circuit/settings/vk paths are lost and verify_tile sees None. Use the already-resolved paths from args or pass full metadata.

🛠️ Suggested change
-            success, tile_verifs = VerifierUtils.verify_tile_batch(slice_id, Path(run_path), tiling.num_tiles, preferred, Path(slice_dir), RunSliceMetadata.from_dict({'tiling': tiling_info}))
+            meta_dict = {
+                "tiling": tiling_info,
+                "circuit_path": circuit_path,
+                "jstprove_circuit_path": circuit_path,
+                "settings_path": settings_path,
+                "vk_path": vk_path,
+            }
+            success, tile_verifs = VerifierUtils.verify_tile_batch(
+                slice_id,
+                Path(run_path),
+                tiling.num_tiles,
+                preferred,
+                Path(slice_dir),
+                RunSliceMetadata.from_dict(meta_dict),
+            )
🧰 Tools
🪛 Ruff (0.14.14)

257-257: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@dsperse/src/verify/utils/verifier_utils.py` around lines 237 - 260, In
verify_slice_logic, the tiled branch reconstructs
RunSliceMetadata.from_dict({'tiling': tiling_info}) losing resolved artifact
paths so downstream verify_tile/verify_tile_batch get None; fix by constructing
RunSliceMetadata with the full metadata (include circuit_path, settings_path,
vk_path, input_path, output_path, witness_path, proof_path or their resolved
filenames) or pass the already-resolved Path values directly into
VerifierUtils.verify_tile_batch/verify_tile so tile verification receives the
correct artifact paths; update the call site in verify_slice_logic (and the
tiling variable usage) to supply the complete metadata or direct paths rather
than only tiling_info.

Comment on lines +27 to +45
def __init__(self, parallel: int = 1):
"""
Initialize the verifier.

Args:
parallel: Number of parallel processes for verification (default: 1)
"""
self.parallel = max(1, parallel)
try:
self.ezkl_runner = EZKL()
except RuntimeError:
self.ezkl_runner = None
logger.warning("EZKL CLI not available. EZKL backend will be disabled.")

try:
self.jstprove_runner = JSTprove()
except Exception:
self.jstprove_runner = None
logger.warning("JSTprove CLI not available. JSTprove backend will be disabled.")

⚠️ Potential issue | 🟡 Minor

Avoid swallowing unexpected JSTprove init failures.

Catching broad Exception hides real setup bugs. JSTprove only documents RuntimeError, so catch specific exception types to surface unexpected failures, and use logger.exception() to preserve the stack trace.

🛠️ Suggested change
-        try:
-            self.jstprove_runner = JSTprove()
-        except Exception:
-            self.jstprove_runner = None
-            logger.warning("JSTprove CLI not available. JSTprove backend will be disabled.")
+        try:
+            self.jstprove_runner = JSTprove()
+        except (RuntimeError, OSError):
+            self.jstprove_runner = None
+            logger.exception("JSTprove CLI not available. JSTprove backend will be disabled.")
🧰 Tools
🪛 Ruff (0.14.14)

43-43: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@dsperse/src/verify/verifier.py` around lines 27 - 45, the JSTprove
initialization in Verifier.__init__ catches a broad Exception and swallows
tracebacks; change the second try/except to catch (RuntimeError, OSError) as in
the suggested change above, set self.jstprove_runner = None, and call
logger.exception("JSTprove CLI not available. JSTprove backend will be
disabled.") so the stack trace is preserved and unexpected errors aren't hidden
(refer to jstprove_runner and Verifier.__init__).

Comment on lines +198 to +213
def _handle_single_slice_verifying(self, run_path: Path, model_path: str | Path, detected: str,
backend: str | None, tiles_range: range | list[int] | None) -> dict:
"""Internal: Handle verification for a single slice run directory."""
if backend not in (Backend.JSTPROVE, Backend.EZKL):
raise ValueError("Single-slice verification requires explicit backend.")

dirs_model_path = model_path
if detected != "dirs":
dirs_model_path = Converter.convert(str(model_path), output_type="dirs", cleanup=False)

result = self._verify_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)

if detected != "dirs":
Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
return result


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual code in verifier.py
wc -l dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 104


🏁 Script executed:

# Read the specific lines mentioned in the review
sed -n '195,220p' dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 1577


🏁 Script executed:

# Let's also check the _verify_single_slice method to understand if it raises exceptions
rg -A 10 "def _verify_single_slice" dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 673


🏁 Script executed:

# Check for similar try/finally patterns in the same file for error handling
rg -B 2 -A 5 "try:" dsperse/src/verify/verifier.py | head -50

Repository: inference-labs-inc/dsperse

Length of output: 1347


Wrap _verify_single_slice call in try/finally to ensure temp dirs are cleaned up on failures.

The conversion creates temporary directories with cleanup=False, but if _verify_single_slice() raises an exception, the cleanup code never executes, leaving temporary files behind. The codebase already uses this try/finally pattern in a similar verification method.

🛠️ Suggested change
-        result = self._verify_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)
-        
-        if detected != "dirs":
-            Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
-        return result
+        try:
+            return self._verify_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)
+        finally:
+            if detected != "dirs":
+                Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
🧰 Tools
🪛 Ruff (0.14.14)

202-202: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@dsperse/src/verify/verifier.py` around lines 198 - 213, The call to
_verify_single_slice in _handle_single_slice_verifying can raise and currently
skips the cleanup Converter.convert(..., cleanup=True); wrap the
_verify_single_slice invocation in a try/finally block and move the
Converter.convert cleanup call into the finally so temporary "dirs" output is
always removed when detected != "dirs"; keep the initial conversion that sets
dirs_model_path as-is and use Utils.dirs_root_from(Path(dirs_model_path)) in the
finally to compute the path for Converter.convert.

Comment on lines +244 to +255
proof_path = run_path / "proof.json"
if not proof_path.exists(): raise FileNotFoundError(f"Proof file not found at {proof_path}")

input_path, output_path = run_path / "input.json", run_path / "output.json"
wf = VerifierUtils.get_witness_file(run_path, slice_id)
witness_path = Path(wf) if wf else (run_path / "output_witness.bin")

circuit_path = Utils.resolve_under_slice(slice_dir, meta.jstprove_circuit_path or meta.circuit_path)
settings_path = Utils.resolve_under_slice(slice_dir, meta.settings_path)
vk_path = Utils.resolve_under_slice(slice_dir, meta.vk_path)

ok = VerifierUtils.verify_with_backend(preferred, str(proof_path), str(circuit_path), str(input_path), str(output_path), str(witness_path), settings_path, vk_path)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "verifier.py" -type f

Repository: inference-labs-inc/dsperse

Length of output: 102


🏁 Script executed:

find . -path "*/verify/*" -name "*.py" -type f | head -20

Repository: inference-labs-inc/dsperse

Length of output: 219


🏁 Script executed:

wc -l dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 104


🏁 Script executed:

sed -n '240,260p' dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 1336


🏁 Script executed:

# Search for run-root verification pattern
rg -n "verify_run_root" dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Search for settings_path and vk_path validation patterns
rg -n "settings_path\|vk_path" dsperse/src/verify/verifier.py -A 2

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Get the full context of the verifier.py file structure
head -100 dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 4311


🏁 Script executed:

# Search for run-root or root verification
rg -n "run.root\|run_root\|root_path" dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Search for Backend.EZKL references
rg -n "Backend\|EZKL" dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Get the full context around lines 244-255
sed -n '200,280p' dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 4164


🏁 Script executed:

# Find all method definitions to understand code structure
rg -n "^\s*def " dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 1268


🏁 Script executed:

# Look at the full _prepare_work_items to see the validation
sed -n '75,112p' dsperse/src/verify/verifier.py

Repository: inference-labs-inc/dsperse

Length of output: 2198


🏁 Script executed:

# Find verify_with_backend definition
rg -n "def verify_with_backend" dsperse/src/verify/

Repository: inference-labs-inc/dsperse

Length of output: 145


🏁 Script executed:

# Check verify_with_backend implementation
sed -n '115,160p' dsperse/src/verify/utils/verifier_utils.py

Repository: inference-labs-inc/dsperse

Length of output: 1736


🏁 Script executed:

# Check if there are any other calls to verify_with_backend
rg -n "verify_with_backend" dsperse/src/verify/

Repository: inference-labs-inc/dsperse

Length of output: 979


🏁 Script executed:

# Check EZKL backend verify method to see how it uses settings_path and vk_path
find . -path "*/backends/ezkl*" -name "*.py" -type f

Repository: inference-labs-inc/dsperse

Length of output: 100


🏁 Script executed:

# Get more details on EZKL verify method
rg -n "def verify" dsperse/src/backends/ezkl.py -A 15

Repository: inference-labs-inc/dsperse

Length of output: 1047


Add explicit settings/vk file validation to single-slice EZKL verification.

The run-root verification path (_prepare_work_items) validates settings and vk files for EZKL before verification, skipping gracefully with warnings. The single-slice path (_verify_single_slice) skips these checks, relying on the backend to fail later. Add explicit validation to match the run-root pattern and fail fast with clear errors.

🛠️ Suggested change
            circuit_path = Utils.resolve_under_slice(slice_dir, meta.jstprove_circuit_path or meta.circuit_path)
            settings_path = Utils.resolve_under_slice(slice_dir, meta.settings_path)
            vk_path = Utils.resolve_under_slice(slice_dir, meta.vk_path)

+            if preferred == Backend.EZKL:
+                if not settings_path or not Path(settings_path).exists():
+                    raise FileNotFoundError(f"Settings file not found at {settings_path}")
+                if not vk_path or not Path(vk_path).exists():
+                    raise FileNotFoundError(f"Verification key not found at {vk_path}")
🧰 Tools
🪛 Ruff (0.14.14)

245-245: Multiple statements on one line (colon)

(E701)


245-245: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@dsperse/src/verify/verifier.py` around lines 244 - 255, The single-slice
verifier (_verify_single_slice) resolves settings_path and vk_path but doesn't
validate they exist before calling VerifierUtils.verify_with_backend; add the
same pre-checks used in _prepare_work_items: after computing settings_path and
vk_path (via Utils.resolve_under_slice using slice_dir and
meta.settings_path/meta.vk_path), if preferred indicates the EZKL backend (check
the preferred variable or backend string used elsewhere) then verify
settings_path.exists() and vk_path.exists() and raise FileNotFoundError with
clear messages (or fail fast) if missing so verification errors are surfaced
early before calling VerifierUtils.verify_with_backend.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@dsperse/src/backends/jstprove.py`:
- Around line 383-389: The code redundantly calls .long() when building scaled
and fails to validate untrusted scale parameters; remove the extra conversion by
changing the final conversion to use flat.tolist() (since flat is already long
after rounding) and add validation checks for scale_base and scale_exponent
before computing scale = scale_base ** scale_exponent (e.g., ensure scale_base
is within a safe positive range like [0.1, 1e6] and scale_exponent is an integer
within a bounded range like [-100, 100], reject or clamp values outside these
bounds), then only perform the scaling/rounding when parameters pass validation
to prevent overflow/underflow/precision loss (references: variables/functions
raw_output, flat, scale_base, scale_exponent, scaled).
🧹 Nitpick comments (19)
dsperse/src/backends/jstprove.py (5)

112-114: Unused parameters should be prefixed with underscore or removed.

The vk_path and settings_path parameters are declared but never used in generate_witness. If kept for API compatibility, prefix them with underscore to indicate intentional non-use.

♻️ Suggested fix
     def generate_witness(
         self,
         input_file: Union[str, Path],
         model_path: Union[str, Path],  # This is the circuit path in JSTprove context
         output_file: Union[str, Path],
-        vk_path: Optional[Union[str, Path]] = None,
-        settings_path: Optional[Union[str, Path]] = None
+        _vk_path: Optional[Union[str, Path]] = None,
+        _settings_path: Optional[Union[str, Path]] = None
     ) -> Tuple[bool, Any]:

133-133: Multiple statements on one line reduce readability.

Split the conditional and raise onto separate lines per PEP 8.

♻️ Suggested fix
-            if not ok: raise RuntimeError(f"Circuit compilation failed: {err}")
+            if not ok:
+                raise RuntimeError(f"Circuit compilation failed: {err}")

147-148: Use logger.exception instead of logger.error when logging in exception handlers.

logger.exception automatically includes the traceback, providing better diagnostics for debugging failures.

♻️ Suggested fix
         except RuntimeError as e:
-            logger.error(f"Witness generation failed: {e}")
+            logger.exception(f"Witness generation failed: {e}")
             return False, str(e)
@@
         except Exception as e:
-            logger.error(f"Failed to process witness output: {e}")
+            logger.exception(f"Failed to process witness output: {e}")
             return False, str(e)

Also applies to: 156-158


221-222: Silently swallowing exceptions in batch processing loses error context.

The inner try-except in the batch result processing loop catches all exceptions and converts them to (False, str(e)). Consider logging these errors to aid debugging when batch jobs fail partially.

♻️ Suggested improvement
             except Exception as e:
+                logger.warning(f"Failed to process batch output for job: {e}")
                 results.append((False, str(e)))

333-334: Unused parameters in verify and compile_circuit should be marked.

The settings_path and vk_path parameters in verify, and settings_path in compile_circuit are unused. Prefix with underscore for clarity.

Also applies to: 401-401

dsperse/src/run/tile_executor.py (3)

92-94: Use explicit Optional type annotation for backend parameter.

PEP 484 prohibits implicit Optional. The parameter defaults to None but is typed as str.

♻️ Suggested fix
     def get_execution_config(
-        self, tiling: TilingInfo, meta: RunSliceMetadata, backend: str = None, has_jst_runner: bool = False
+        self, tiling: TilingInfo, meta: RunSliceMetadata, backend: str | None = None, has_jst_runner: bool = False
     ) -> dict:

239-251: collect_outputs raises on any tile failure, preventing partial result collection.

When a tile fails, RuntimeError is raised immediately, losing information about tiles that succeeded. For debugging purposes, consider collecting all results first, then raising with a summary of failures.

💡 Alternative approach
def collect_outputs(self, results_map: dict, num_tiles: int, slice_idx: int, tiling: TilingInfo) -> list[TileResult]:
    tile_exec_infos = []
    failures = []
    for tile_idx in range(num_tiles):
        result = results_map[tile_idx]
        if result['success']:
            output = self.process_result(result, tiling, slice_idx, tile_idx)
            if output is not None:
                tile_exec_infos.append(TileResult(tile_idx=tile_idx, success=True, method=result.get('method', 'unknown')))
            else:
                failures.append(f"Tile {tile_idx}: no output tensor")
        else:
            failures.append(f"Tile {tile_idx}: {result.get('error', 'unknown')}")
    
    if failures:
        raise RuntimeError(f"Tile execution failed:\n" + "\n".join(failures))
    return tile_exec_infos

253-256: Lazy import inside method may cause repeated import overhead.

The import statement inside _resolve_path executes on every call. Python caches the module after the first import, so the cost is a sys.modules lookup rather than a reload, but it still adds per-call overhead and obscures the dependency. Move the import to the top of the file, or cache it at class level if a circular import forces the lazy form.

♻️ Suggested fix if no circular import

Move to top-level imports:

from dsperse.src.run.utils.runner_utils import RunnerUtils
dsperse/src/run/runner.py (6)

29-35: Use explicit Optional type annotations for constructor parameters.

Parameters that default to None should use explicit union syntax for PEP 484 compliance.

♻️ Suggested fix
     def __init__(
         self,
-        run_dir: str = None,
+        run_dir: str | None = None,
         threads: int = 1,
         on_slice_complete: Optional[SliceCompleteCallback] = None,
         batch: bool = False,
     ):

351-351: Add strict=True to zip() for length mismatch detection.

When zipping tile_args_list and results, a length mismatch would silently truncate. Using strict=True ensures both iterables have equal length.

♻️ Suggested fix
-        for args, (success, output) in zip(tile_args_list, results):
+        for args, (success, output) in zip(tile_args_list, results, strict=True):
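
A quick illustration of the failure mode strict=True surfaces (available since Python 3.10):

list(zip([1, 2, 3], ["a", "b"]))               # silently truncates: [(1, 'a'), (2, 'b')]
list(zip([1, 2, 3], ["a", "b"], strict=True))  # ValueError: zip() argument 2 is shorter than argument 1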

413-413: Rename unused input_name variable to satisfy linter.

The unpacked variable is never used. Prefix with underscore.

♻️ Suggested fix
-        input_tensor, output_shape, input_name = cs_executor.prepare_config(meta)
+        input_tensor, output_shape, _input_name = cs_executor.prepare_config(meta)

497-497: Multiple statements on one line reduce readability.

Split the for-loop body onto separate lines.

♻️ Suggested fix
-                    for oname in info.dependencies.output: tensor_cache[oname] = cached_tensor
+                    for oname in info.dependencies.output:
+                        tensor_cache[oname] = cached_tensor

543-544: Use a custom exception class instead of generic Exception.

Raising bare Exception makes it harder for callers to catch specific errors. Consider using ValueError or a custom SlicePathError.

♻️ Suggested fix
         if slice_path is None or not Path(slice_path).exists():
-            raise Exception("A valid path must be provided for slices")
+            raise ValueError("A valid path must be provided for slices")

167-167: Multiple methods have implicit Optional parameters.

Several methods use = None defaults without explicit | None type annotations. Consider updating for PEP 484 compliance: run_ezkl_inference, run_jstprove_inference, run_tiled_inference, run_tiles, run_channel_split_inference, run_onnx_single, _run, and run.

Also applies to: 197-197, 227-227, 243-243, 405-405, 459-459, 476-476, 530-530

dsperse/src/run/utils/runner_utils.py (5)

175-175: Remove unused parameters from preprocess_input.

The model_directory and save_reshape parameters are declared but never used.

♻️ Suggested fix
     `@staticmethod`
-    def preprocess_input(input_path: str, model_directory: str = None, save_reshape: bool = False) -> torch.Tensor:
+    def preprocess_input(input_path: str) -> torch.Tensor:
         """Preprocess input data from JSON."""

411-418: Silent exception swallowing hides errors.

The try-except-pass pattern at lines 416-417 silently ignores JSON parsing errors or other issues when checking slice completion. At minimum, log the exception.

♻️ Suggested fix
             try:
                 with open(output_file, 'r') as f:
                     data = json.load(f)
                 return True, data
-            except Exception:
-                pass
+            except Exception as e:
+                logger.debug(f"Failed to load output for {slice_id}: {e}")
         return False, None

453-453: Remove unused exec_info parameter from process_inference_result.

The exec_info parameter is passed but never used in the method body.

♻️ Suggested fix
     `@staticmethod`
-    def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, exec_info, tensor_cache: dict) -> torch.Tensor | None:
+    def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, tensor_cache: dict) -> torch.Tensor | None:

Note: This requires updating the call site in runner.py line 511.


465-465: Use next(iter(...)) instead of list(...)[0] for efficiency.

Creating a full list just to access the first element is wasteful.

♻️ Suggested fix
-            first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+            first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None

420-441: Consider using structured return type for try_resume_slice.

The method returns a 3-tuple (bool, Tensor | None, ExecutionInfo | None). A named tuple or dataclass would improve readability and prevent positional errors at call sites.
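
A minimal sketch of what that could look like, keeping the current (bool, Tensor | None, ExecutionInfo | None) contract; ResumeResult is a hypothetical name, and ExecutionInfo is typed loosely here since its import path is not shown:

from typing import NamedTuple, Optional

import torch

class ResumeResult(NamedTuple):
    resumed: bool
    output: Optional[torch.Tensor]
    exec_info: Optional[object]  # ExecutionInfo in the real codebase

# Call sites then read by field name instead of position:
# result = RunnerUtils.try_resume_slice(...)
# if result.resumed:
#     tensor_cache[slice_id] = result.output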

Comment on lines +383 to +389
flat = torch.tensor(raw_output).flatten()

if flat.is_floating_point():
scale = scale_base ** scale_exponent
flat = torch.round(flat * scale).long()

scaled = flat.long().tolist()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's check the context around the flagged lines
cat -n dsperse/src/backends/jstprove.py | sed -n '370,400p'

Repository: inference-labs-inc/dsperse

Length of output: 1190


🏁 Script executed:

# Check how scale_base and scale_exponent are set in metadata
rg -n "scale_base|scale_exponent" dsperse/ --type py -B2 -A2

Repository: inference-labs-inc/dsperse

Length of output: 886


🏁 Script executed:

# Search for where these values come from and how they're validated
rg -n "scale_base\s*=" dsperse/ --type py -B1 -A1

Repository: inference-labs-inc/dsperse

Length of output: 285


🏁 Script executed:

# Look for any test files related to this module
fd -e py ".*test.*jstprove" dsperse/

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Let's see the full function to understand the context better
cat -n dsperse/src/backends/jstprove.py | sed -n '360,395p'

Repository: inference-labs-inc/dsperse

Length of output: 1472


🏁 Script executed:

# Search for where this metadata is created or where prepare_veri_output is called
rg -n "prepare_veri_output" dsperse/ --type py -B2 -A2

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Search for any documentation or constants related to scale_base/scale_exponent
rg -n "scale_base|scale_exponent" . --type py --type md --type txt -B3 -A3

Repository: inference-labs-inc/dsperse

Length of output: 1240


🏁 Script executed:

# Search for where the metadata file is created or where scale_base/scale_exponent are set
rg -n "_metadata.json|scale_base|scale_exponent" dsperse/ --type py -B3 -A3 | head -60

Repository: inference-labs-inc/dsperse

Length of output: 4411


🏁 Script executed:

# Look for any examples, tests, or config files that show typical values
fd -e json . dsperse/ | head -20

Repository: inference-labs-inc/dsperse

Length of output: 227


🏁 Script executed:

# Search for documentation or comments about the scaling operation
rg -n "scale" dsperse/src/backends/jstprove.py -B2 -A2

Repository: inference-labs-inc/dsperse

Length of output: 1570


🏁 Script executed:

# Search for where scale_base and scale_exponent are set/generated in the codebase
rg -n "scale_base.*=" dsperse/ --type py -B2 -A2

Repository: inference-labs-inc/dsperse

Length of output: 452


🏁 Script executed:

# Look for any circuit metadata generation
rg -n "circuit.*metadata|_metadata.json" dsperse/ --type py -B1 -A1 | head -40

Repository: inference-labs-inc/dsperse

Length of output: 1906


🏁 Script executed:

# Check if there are example metadata files or any documentation
find dsperse/ -name "*metadata*" -type f

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Search for where circuit metadata is generated with scale parameters
rg -n "scale_base|scale_exponent" . --type py -B5 -A5 | grep -A10 -B10 "scale_base.*:"

Repository: inference-labs-inc/dsperse

Length of output: 1206


🏁 Script executed:

# Look for any generation or compilation logic that might create these metadata files
rg -n "compile_circuit|generate.*metadata|_metadata" dsperse/src/backends/ --type py -A5 | head -50

Repository: inference-labs-inc/dsperse

Length of output: 3559


🏁 Script executed:

# Check if there's documentation or README about the metadata format
find dsperse/ -name "README*" -o -name "*.md" | xargs grep -l "scale_base\|scale_exponent" 2>/dev/null || echo "No matches in docs"

Repository: inference-labs-inc/dsperse

Length of output: 88


🏁 Script executed:

# Look at the compile_circuit function mentioned in the file to understand metadata better
cat -n dsperse/src/backends/jstprove.py | sed -n '397,430p'

Repository: inference-labs-inc/dsperse

Length of output: 1477


🏁 Script executed:

# Continue reading the compile_circuit function to see if it generates metadata
cat -n dsperse/src/backends/jstprove.py | sed -n '425,480p'

Repository: inference-labs-inc/dsperse

Length of output: 2903


🏁 Script executed:

# Search for where _metadata.json files are created or written
rg -n "_metadata\.json" dsperse/ --type py -B3 -A3

Repository: inference-labs-inc/dsperse

Length of output: 3272


🏁 Script executed:

# Look for the _run_command method to see what external commands are called
rg -n "_run_command" dsperse/src/backends/jstprove.py -B2 -A8 | head -60

Repository: inference-labs-inc/dsperse

Length of output: 2027


🏁 Script executed:

# Look for test files or examples that might show metadata structure
find dsperse/ -type f -name "*.json" | xargs grep -l "scale_base\|scale_exponent" 2>/dev/null | head -5

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Search for JSTproveUtils which might have more context
rg -n "class JSTproveUtils|def.*scale" dsperse/src/backends/jstprove.py -A5

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Check if there are any validation or bounds checking for the scale parameters elsewhere
rg -n "scale.*base|scale.*exponent|scale_base|scale_exponent" dsperse/ --type py | grep -v "metadata.get"

Repository: inference-labs-inc/dsperse

Length of output: 249


🏁 Script executed:

# Look at the JSTproveUtils module to understand the scaling context better
cat -n dsperse/src/backends/jstprove.py | sed -n '1,60p'

Repository: inference-labs-inc/dsperse

Length of output: 2446


🏁 Script executed:

# Search for any usage of the _prepare_verification_output function
rg -n "_prepare_verification_output" dsperse/ --type py -B2 -A2

Repository: inference-labs-inc/dsperse

Length of output: 854


🏁 Script executed:

# Look for the JSTproveUtils class to understand what convert_to_logits and other utilities do
rg -n "class JSTproveUtils" dsperse/src/backends/utils/ --type py -A20

Repository: inference-labs-inc/dsperse

Length of output: 2125


🏁 Script executed:

# Check if there are any tests for the jstprove backend
find dsperse/ -type f -name "*test*jstprove*" -o -name "*jstprove*test*"

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Look at lines 385-389 more carefully - check if there's redundancy
cat -n dsperse/src/backends/jstprove.py | sed -n '383,395p'

Repository: inference-labs-inc/dsperse

Length of output: 560


🏁 Script executed:

# Search for where circuit metadata files are actually created or populated
rg -n "circuit.*metadata\|_metadata.*write\|json.dump.*scale" dsperse/ --type py -B2 -A2

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Check if there are any example metadata files or documentation
find dsperse/ -name "*metadata*" -type f -exec file {} \;

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Look for any external JSTprove tool integration that might create these metadata files
rg -n "jstprove|JSTprove" dsperse/src/backends/jstprove.py | grep -i "metadata\|scale" | head -10

Repository: inference-labs-inc/dsperse

Length of output: 254


🏁 Script executed:

# Verify the line 389 behavior - is .long() on an already-long tensor problematic?
cat -n dsperse/src/backends/jstprove.py | sed -n '383,395p'

Repository: inference-labs-inc/dsperse

Length of output: 560


🏁 Script executed:

# Search for any constants or defaults related to scale parameters in the entire codebase
rg -n "scale_base|scale_exponent" dsperse/ --type py --type json

Repository: inference-labs-inc/dsperse

Length of output: 427


🏁 Script executed:

# Look for any documentation or comments about the metadata format in the jstprove_utils
rg -n "metadata\|scale" dsperse/src/backends/utils/jstprove_utils.py -B2 -A2 | head -50

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Check if the metadata file is created by the external JSTprove command
cat -n dsperse/src/backends/jstprove.py | sed -n '428,445p'

Repository: inference-labs-inc/dsperse

Length of output: 812


🏁 Script executed:

# Search the entire repo for any README or documentation mentioning scale or metadata
find dsperse/ -type f \( -name "*.md" -o -name "README*" -o -name "*.txt" \) -exec grep -l "scale\|metadata\|JSTprove" {} \;

Repository: inference-labs-inc/dsperse

Length of output: 52


🏁 Script executed:

# Check if there's a git history or comments about why .long() is called twice
rg -n "\.long()" dsperse/src/backends/jstprove.py -B3 -A1

Repository: inference-labs-inc/dsperse

Length of output: 276


Remove redundant .long() call and add validation for untrusted scale parameters.

Line 389 redundantly calls .long() on a tensor that was already converted to long on line 387. Additionally, the scale_base and scale_exponent parameters from metadata lack validation—they are untrusted external input from the JSTprove CLI with no bounds checking. Large exponents or extreme base values could cause numeric overflow/underflow or precision loss. Add parameter validation with reasonable bounds before the scaling operation.
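
A sketch of the combined fix; the bounds here are illustrative assumptions, not values mandated by JSTprove, and should be chosen to match the circuit's field size:

import torch

def scale_raw_output(raw_output, scale_base, scale_exponent) -> list:
    # Validate untrusted metadata before using it.
    if not (0.1 <= float(scale_base) <= 1e6):
        raise ValueError(f"scale_base out of range: {scale_base}")
    if not float(scale_exponent).is_integer() or not (-100 <= scale_exponent <= 100):
        raise ValueError(f"scale_exponent out of range: {scale_exponent}")

    flat = torch.tensor(raw_output).flatten()
    if flat.is_floating_point():
        flat = torch.round(flat * (scale_base ** scale_exponent)).long()
    return flat.tolist()  # flat is already integral here, so no second .long()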

🤖 Prompt for AI Agents
In `@dsperse/src/backends/jstprove.py` around lines 383 - 389, The code
redundantly calls .long() when building scaled and fails to validate untrusted
scale parameters; remove the extra conversion by changing the final conversion
to use flat.tolist() (since flat is already long after rounding) and add
validation checks for scale_base and scale_exponent before computing scale =
scale_base ** scale_exponent (e.g., ensure scale_base is within a safe positive
range like [0.1, 1e6] and scale_exponent is an integer within a bounded range
like [-100, 100], reject or clamp values outside these bounds), then only
perform the scaling/rounding when parameters pass validation to prevent
overflow/underflow/precision loss (references: variables/functions raw_output,
flat, scale_base, scale_exponent, scaled).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dsperse/src/backends/onnx_models.py (1)

281-287: ⚠️ Potential issue | 🟠 Major

Error handling may raise NameError and mask original exceptions.

If _create_session fails at line 154, session is never assigned, causing NameError at line 284. Similarly, input_numpy may be undefined if tensor conversion fails. This masks the original exception.

🐛 Proposed fix with defensive checks
         except Exception as e:
             logger.error(f"Error in apply_onnx_shape: {e}")
             # In case of error, return the original tensor with the first input name
-            if len(session.get_inputs()) > 0:
-                return {session.get_inputs()[0].name: input_numpy}
-            else:
-                return {"input": input_numpy}
+            try:
+                # Attempt to get input name from session if available
+                if 'session' in locals() and session is not None:
+                    inputs = session.get_inputs()
+                    if inputs:
+                        input_name = inputs[0].name
+                    else:
+                        input_name = "input"
+                else:
+                    input_name = "input"
+                
+                # Use input_numpy if available, otherwise convert input_tensor
+                if 'input_numpy' not in locals() or input_numpy is None:
+                    if isinstance(input_tensor, torch.Tensor):
+                        input_numpy = input_tensor.detach().cpu().numpy()
+                    else:
+                        input_numpy = np.asarray(input_tensor)
+                
+                return {input_name: input_numpy}
+            except Exception:
+                # Last resort fallback
+                logger.exception("Error in fallback handling")
+                raise e
🤖 Fix all issues with AI agents
In `@dsperse/src/backends/onnx_models.py`:
- Around line 112-113: The code silently truncates the flattened array when
flat.size > elements_needed (flat = flat[:elements_needed]), which can hide
input mismatches; update the branch to emit a warning before truncation using
the module/logger used elsewhere in this file (match the logger pattern used
around line 204), including the array name/shape and elements_needed, then
perform the slice (keep the same variable names: flat and elements_needed) so
callers and logs show the discrepancy.
🧹 Nitpick comments (2)
dsperse/src/backends/onnx_models.py (2)

31-31: Use explicit str | None type hints for optional parameters.

Static analysis flags implicit Optional on lines 31, 50, 63, and 119. PEP 484 recommends explicit union types.

♻️ Suggested fix
     `@staticmethod`
-    def _process_outputs(session: ort.InferenceSession, raw_output: list, output_file: str = None) -> dict:
+    def _process_outputs(session: ort.InferenceSession, raw_output: list, output_file: str | None = None) -> dict:

Apply the same pattern to run_inference, run_inference_tensor, and run_inference_multi.


53-57: Double session creation per inference call is inefficient.

run_inference creates a session at line 53, then apply_onnx_shape (line 55) creates another at line 154. The same pattern exists in run_inference_tensor. For large models, this doubles the model loading overhead.

Consider either passing the session to apply_onnx_shape or extracting input metadata without full session creation.
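
One possible shape for the first option, sketched under the assumption that apply_onnx_shape only needs the session's input metadata (the signature and reshape logic here are illustrative, not the module's actual API):

import numpy as np
import onnxruntime as ort

def apply_onnx_shape(model_path: str, input_array: np.ndarray,
                     session: ort.InferenceSession | None = None) -> dict:
    # Reuse the caller's session when provided; fall back to creating one.
    if session is None:
        session = ort.InferenceSession(model_path)
    info = session.get_inputs()[0]
    # Resolve dynamic dimensions (e.g. batch) to 1 for reshaping.
    shape = [d if isinstance(d, int) else 1 for d in info.shape]
    return {info.name: input_array.reshape(shape)}

# run_inference then creates a single session and threads it through:
# session = ort.InferenceSession(model_path)
# feeds = apply_onnx_shape(model_path, arr, session=session)
# raw_output = session.run(None, feeds)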

Comment on lines +112 to +113
elif flat.size > elements_needed:
flat = flat[:elements_needed]

⚠️ Potential issue | 🟡 Minor

Silent data truncation can mask input errors.

When the input array has more elements than expected, data is silently discarded. Consider logging a warning to aid debugging, similar to the pattern used elsewhere in this file (e.g., line 204).

🐛 Suggested fix
             elif flat.size > elements_needed:
+                logger.warning(
+                    f"Truncating input: expected {elements_needed} elements, got {flat.size}"
+                )
                 flat = flat[:elements_needed]
🤖 Prompt for AI Agents
In `@dsperse/src/backends/onnx_models.py` around lines 112 - 113, The code
silently truncates the flattened array when flat.size > elements_needed (flat =
flat[:elements_needed]), which can hide input mismatches; update the branch to
emit a warning before truncation using the module/logger used elsewhere in this
file (match the logger pattern used around line 204), including the array
name/shape and elements_needed, then perform the slice (keep the same variable
names: flat and elements_needed) so callers and logs show the discrepancy.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@dsperse/src/backends/jstprove.py`:
- Around line 197-201: The fixed manifest filename (manifest_path =
manifest_parent / "batch_witness_manifest.json") can be clobbered by concurrent
runs; update the code that computes manifest_path (located around
manifest_parent, manifest_dir, circuit_path, and where json.dump({"jobs": jobs},
f) is called) to generate a unique filename per run (e.g., append a UUID,
timestamp+PID, or use tempfile to create a unique temp file) and write the
manifest to that unique path, ensuring manifest_parent.mkdir(...) remains and
the rest of the code that consumes the manifest uses the new unique
manifest_path.

Comment on lines +197 to +201
manifest_parent = Path(manifest_dir) if manifest_dir else circuit_path.parent
manifest_parent.mkdir(parents=True, exist_ok=True)
manifest_path = manifest_parent / "batch_witness_manifest.json"
with open(manifest_path, "w") as f:
json.dump({"jobs": jobs}, f)

⚠️ Potential issue | 🟡 Minor

Batch manifest filename can collide across concurrent runs.

Using a fixed name (batch_witness_manifest.json) risks clobbering when multiple batches run in the same directory. Consider a unique filename (UUID / tempfile) to avoid races.

💡 Example tweak
-        manifest_path = manifest_parent / "batch_witness_manifest.json"
+        manifest_path = manifest_parent / f"batch_witness_manifest_{os.getpid()}.json"
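
An alternative using tempfile, which guarantees a unique name even across concurrent processes and avoids relying on an os import (jobs and manifest_parent as in the surrounding code):

import json
import tempfile
from pathlib import Path

manifest_parent.mkdir(parents=True, exist_ok=True)
with tempfile.NamedTemporaryFile(
    mode="w",
    dir=manifest_parent,
    prefix="batch_witness_manifest_",
    suffix=".json",
    delete=False,
) as f:
    json.dump({"jobs": jobs}, f)
manifest_path = Path(f.name)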
🤖 Prompt for AI Agents
In `@dsperse/src/backends/jstprove.py` around lines 197 - 201, The fixed manifest
filename (manifest_path = manifest_parent / "batch_witness_manifest.json") can
be clobbered by concurrent runs; update the code that computes manifest_path
(located around manifest_parent, manifest_dir, circuit_path, and where
json.dump({"jobs": jobs}, f) is called) to generate a unique filename per run
(e.g., append a UUID, timestamp+PID, or use tempfile to create a unique temp
file) and write the manifest to that unique path, ensuring
manifest_parent.mkdir(...) remains and the rest of the code that consumes the
manifest uses the new unique manifest_path.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
install.sh (1)

213-214: ⚠️ Potential issue | 🟡 Minor

Update or remove stale JSTprove references in Open MPI installation.

The install_openmpi function (line 213) and its call site (line 436) still reference JSTprove, but JSTprove installation has been removed from this script.

If Open MPI is still needed for other parallel execution features in the new backend, update the comments to reflect the actual purpose. If it's no longer needed, consider removing this installation step.

Proposed fix if Open MPI is still needed
-# Install Open MPI (required for JSTprove)
+# Install Open MPI (for parallel execution backends)
 install_openmpi() {

and at the call site:

-  # Install Open MPI (required for JSTprove)
+  # Install Open MPI (for parallel execution)
   install_openmpi

Also applies to: 436-437

🤖 Fix all issues with AI agents
In `@install.sh`:
- Line 7: Update the stale comment "Open MPI (required for JSTprove)" in
install.sh: either remove that line entirely if Open MPI is no longer required,
or replace it with an accurate dependency note (e.g., "Open MPI" or the actual
reason it's needed). Locate the exact comment string "Open MPI (required for
JSTprove)" and edit or delete it so the header no longer references JSTprove.

# - Dsperse CLI (python package, console script: dsperse)
# - EZKL CLI (if missing)
# - EZKL lookup tables (if missing)
# - Open MPI (required for JSTprove)

⚠️ Potential issue | 🟡 Minor

Stale reference to JSTprove in header comment.

The script description on line 2 was updated to remove JSTprove, but line 7 still mentions "Open MPI (required for JSTprove)". Since JSTprove installation was removed from this script, this comment is now misleading.

Proposed fix
-#   - Open MPI (required for JSTprove)
+#   - Open MPI (optional, for parallel execution)

Or remove the line entirely if Open MPI is no longer a dependency.

🤖 Prompt for AI Agents
In `@install.sh` at line 7, Update the stale comment "Open MPI (required for
JSTprove)" in install.sh: either remove that line entirely if Open MPI is no
longer required, or replace it with an accurate dependency note (e.g., "Open
MPI" or the actual reason it's needed). Locate the exact comment string "Open
MPI (required for JSTprove)" and edit or delete it so the header no longer
references JSTprove.
