V3 #47
Conversation
Walkthrough

Refactors slicing/metadata to a typed schema, adds an Autotiler for spatial tiling and channel-splitting, introduces Tile/Channel executors, parallel compile/run/prove/verify pipelines, lazy ONNX tracing, many backend utilities (JSTprove/EZKL/ONNX), CLI flags for tiling/parallelism, and extensive tests across unit and integration suites.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI
    participant Slicer as Slicer/OnnxSlicer
    participant Autotiler as Autotiler
    participant Compiler as Compiler
    participant Runner as Runner
    participant TileExec as TileExecutor
    participant Prover as Prover
    participant Verifier as Verifier
    CLI->>Slicer: slice_model(tile_size?)
    Slicer->>Autotiler: detect_tiling / apply_tiling
    Autotiler-->>Slicer: tiled_slices + tiled_info
    Slicer->>Compiler: compile(slices, targets)
    Compiler-->>Runner: compiled artifacts metadata
    CLI->>Runner: run(run_dir, threads)
    Runner->>Runner: dispatch_slice (select backend, tiled/channel)
    alt tiled
        Runner->>TileExec: split_into_tiles
        loop per tile (parallel)
            Runner->>TileExec: execute_worker(tile)
            TileExec-->>Runner: tile result
        end
        Runner->>TileExec: reconstruct_from_tiles
    else channel-split
        Runner->>ChannelSplitExecutor: execute_group(s)
    else single
        Runner->>Runner: run_onnx_single / run_jstprove_inference / run_ezkl_inference
    end
    Runner->>Prover: prove(run_dir, tiles_range?, parallel)
    Prover-->>Verifier: proof metadata
    CLI->>Verifier: verify(run_dir, tiles_range?, parallel)
    Verifier-->>CLI: verification results
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @dsperse/src/slice/autotiler.py:
- Around line 91-106: Non-conv nodes are being re-created with helper.make_node
using only op_type, inputs and outputs, which drops node-specific attributes;
when iterating non_conv_ops (orig_node) and building new nodes (appended to
nodes), preserve orig_node.attribute by either passing those attributes into
helper.make_node or by cloning orig_node and replacing its input/output fields
(so attributes like LeakyRelu.alpha or Clip.min/max are retained); update the
creation of the new node in the non_conv_ops loop (where new_inputs/new_outputs
are computed and helper.make_node(...) is called) to include the original
attributes from orig_node.attribute.
🧹 Nitpick comments (5)
dsperse/src/slice/autotiler.py (4)
10-16: Minimum tile size constraint is implicit. The function searches for tile sizes down to 8 (since `range(..., 7, -1)` excludes 7). Consider adding a comment explaining the minimum tile size constraint, or making it a named constant for clarity.

Suggested improvement:

```diff
+MIN_TILE_SIZE = 8
+
 def find_tile_size(spatial_dim: int, target: int) -> int | None:
+    """Find a tile size that evenly divides spatial_dim, searching from target down to MIN_TILE_SIZE."""
     if spatial_dim <= target:
         return None
-    for tile in range(min(target, spatial_dim), 7, -1):
+    for tile in range(min(target, spatial_dim), MIN_TILE_SIZE - 1, -1):
         if spatial_dim % tile == 0:
             return tile
     return None
```
44-44: Prefix unused variables with underscore. As flagged by static analysis, `h_in` and `w_in` are unpacked but never used.

Suggested fix:

```diff
-    _, c_in, h_in, w_in = orig_dims
+    _, c_in, _h_in, _w_in = orig_dims
```
139-142: Hard-coded slice limit may be insufficient. The loop iterates up to 500 slices; very large models could exceed this. Consider a more dynamic approach or making the limit configurable.

Suggested improvement:

```diff
-    for i in range(500):
-        path = os.path.join(slices_dir, f"slice_{i}", "payload", f"slice_{i}.onnx")
-        if not os.path.exists(path):
-            continue
+    i = 0
+    while True:
+        path = os.path.join(slices_dir, f"slice_{i}", "payload", f"slice_{i}.onnx")
+        if not os.path.exists(path):
+            break
+        i += 1
+        # ... rest of loop body moved here
```

Alternatively, use `glob` or `os.listdir` to discover slice directories dynamically.
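The `os.listdir` alternative removes any fixed upper bound. A sketch, assuming the `slice_<i>/payload/slice_<i>.onnx` layout described in the paths above (`discover_slice_paths` is a hypothetical helper name):

```python
import os
import re
import tempfile

def discover_slice_paths(slices_dir: str) -> list[str]:
    """Find slice_<i>/payload/slice_<i>.onnx files without a hard-coded limit."""
    found = []
    for entry in os.listdir(slices_dir):
        m = re.fullmatch(r"slice_(\d+)", entry)
        if not m:
            continue
        path = os.path.join(slices_dir, entry, "payload", f"{entry}.onnx")
        if os.path.exists(path):
            found.append((int(m.group(1)), path))
    # sort numerically, not lexicographically, so slice_10 follows slice_2
    return [p for _, p in sorted(found)]

# demo on a throwaway directory layout
with tempfile.TemporaryDirectory() as d:
    for i in (0, 2, 10):
        os.makedirs(os.path.join(d, f"slice_{i}", "payload"))
        open(os.path.join(d, f"slice_{i}", "payload", f"slice_{i}.onnx"), "w").close()
    print([os.path.basename(p) for p in discover_slice_paths(d)])
    # ['slice_0.onnx', 'slice_2.onnx', 'slice_10.onnx']
```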
153-155: Prefix unused variables and consider tiling based on both dimensions. Variables `c` and `w` are unused. Additionally, only `h` is checked against `tile_size`, but `w` might also need tiling for non-square spatial dimensions.

Suggested fix:

```diff
-    _, c, h, w = dims
+    _, _c, h, w = dims
     if h <= tile_size:
         continue
+    # Note: Currently only tiles based on height; width is assumed to match or be smaller
```

dsperse/src/slice/onnx_slicer.py (1)
500-500: Use explicit `Optional` or union type for `tile_size`. As flagged by static analysis (PEP 484), `tile_size: int = None` should use explicit optional typing.

Suggested fix:

```diff
-    def slice_model(self, output_path=None, tile_size: int = None):
+    def slice_model(self, output_path=None, tile_size: int | None = None):
```

Or using `Optional`:

```diff
-    def slice_model(self, output_path=None, tile_size: int = None):
+    def slice_model(self, output_path=None, tile_size: Optional[int] = None):
```

Note: `Optional` would need to be imported from `typing` if not already available.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (5)
- dsperse/src/analyzers/onnx_analyzer.py
- dsperse/src/cli/slice.py
- dsperse/src/slice/autotiler.py
- dsperse/src/slice/onnx_slicer.py
- dsperse/src/slice/slicer.py
🧰 Additional context used
🧬 Code graph analysis (3)
dsperse/src/cli/slice.py (3)
- dsperse/src/slice/onnx_slicer.py (1): slice_model (500-523)
- dsperse/src/slice/slicer.py (1): slice_model (58-79)
- dsperse/src/slice/utils/torch_utils.py (1): slice_model (558-578)

dsperse/src/slice/onnx_slicer.py (4)
- dsperse/src/slice/autotiler.py (1): autotile_slices (130-171)
- dsperse/src/cli/slice.py (1): slice_model (111-208)
- dsperse/src/slice/slicer.py (1): slice_model (58-79)
- dsperse/src/analyzers/onnx_analyzer.py (1): generate_slices_metadata (337-389)

dsperse/src/slice/slicer.py (3)
- dsperse/src/cli/slice.py (1): slice_model (111-208)
- dsperse/src/slice/onnx_slicer.py (1): slice_model (500-523)
- dsperse/src/slice/utils/torch_utils.py (1): slice_model (558-578)
🪛 Ruff (0.14.10)
dsperse/src/slice/onnx_slicer.py
500-500: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
dsperse/src/slice/autotiler.py
44-44: Unpacked variable h_in is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
44-44: Unpacked variable w_in is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
153-153: Unpacked variable c is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
153-153: Unpacked variable w is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🔇 Additional comments (11)
dsperse/src/slice/autotiler.py (4)
1-8: LGTM on imports and constant definition. The imports and `ELEMENTWISE_OPS` constant are well-defined. The set of elementwise operations appears comprehensive for typical ONNX models with convolutions.

19-23: LGTM! The tileability check correctly ensures a model has a Conv operation and all other operations are elementwise.

26-37: LGTM! The function correctly extracts convolution parameters with appropriate ONNX defaults.

67-69: Halo calculation assumes symmetric padding. The halo is computed from `pads[0]` and `pads[1]` only (top/left padding). If the original Conv has asymmetric padding (where `pads[0] != pads[2]` or `pads[1] != pads[3]`), the tiled output may be incorrect. Consider validating symmetric padding or documenting this assumption:

```diff
+    # Assumes symmetric padding; asymmetric padding not supported
+    if pads[0] != pads[2] or pads[1] != pads[3]:
+        logger.warning(f"Asymmetric padding detected in {slice_path}, tiling may produce incorrect results")
     halo_h, halo_w = pads[0], pads[1]
```

dsperse/src/cli/slice.py (2)
40-43: LGTM on CLI argument additions. The `--tile-size` and `--no-tile` arguments are well-defined with appropriate types and help text. The `--no-tile` flag correctly acts as an override to disable tiling.

188-193: LGTM on tiling integration in slice command. The logic correctly derives `tile_size` (respecting `--no-tile` as an override), logs it, and passes it through to the slicer. The integration is clean.

dsperse/src/slice/slicer.py (1)

58-79: LGTM on Slicer orchestrator changes. The `tile_size` parameter is correctly added to the signature, documented in the docstring, and propagated to the underlying slicer implementation. The existing behavior for `output_type` conversion is preserved.

dsperse/src/analyzers/onnx_analyzer.py (2)

337-352: LGTM on tiled_info parameter addition. The optional `tiled_info` parameter with `None` default and normalization to empty dict is a clean approach that avoids None checks throughout the method.

378-380: LGTM on tiling metadata injection. Correctly checks if `segment_idx` exists in `tiled_info` and adds tiling metadata to the segment. This integrates seamlessly with the existing metadata flow.

dsperse/src/slice/onnx_slicer.py (2)

9-9: LGTM on autotiler import. The import correctly brings in the `autotile_slices` function from the new autotiler module.

514-521: LGTM on autotiling integration. The autotiling logic is correctly implemented:
- Initializes `tiled_info` as empty dict
- Only calls `autotile_slices` when `tile_size` is provided
- Logs the number of tiled slices for visibility
- Passes `tiled_info` to metadata generation

The fallback for `slices_dir` when `output_path` is None is consistent with existing patterns in the codebase.
millioner
left a comment
Looks good 🔥
But it's mostly black magic to me. It would be nice to have some comments with explanations.
It would also be nice to use pathlib.Path consistently, but that applies to the whole repo. Otherwise looks great
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @dsperse/src/backends/onnx_models.py:
- Around line 103-106: The two methods run_inference and run_inference_multi
have inconsistent failure return values (run_inference returns (False, None)
while run_inference_multi returns (False, str(e))); make them consistent by
choosing one convention and applying it to both: either return (False, None) on
exceptions in run_inference_multi (replace return False, str(e) with return
False, None) or change run_inference to return (False, str(e)) by catching the
exception and returning its string; update the error logging to include the
exception variable (e) in both functions to preserve diagnostics and ensure
callers are updated to expect the chosen tuple shape.
🧹 Nitpick comments (5)
dsperse/src/backends/onnx_models.py (2)
54-56: Session options inconsistency with `run_inference`. The multi-input path uses minimal session options while `run_inference` (lines 23-29) sets additional flags like `enable_profiling=False`, `enable_mem_pattern=False`, etc. This inconsistency could lead to different behavior between single- and multi-input inference paths.

♻️ Suggested fix: align session options or extract a shared helper:

```diff
 @staticmethod
 def run_inference_multi(model_path: str, primary_input_file: str, extra_tensors: dict, output_file: str):
     """Run inference with multiple named inputs."""
     try:
         session_options = ort.SessionOptions()
         session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
+        session_options.enable_profiling = False
+        session_options.enable_mem_pattern = False
+        session_options.enable_cpu_mem_arena = False
+        session_options.enable_mem_reuse = False
+        session_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
         session = ort.InferenceSession(model_path, session_options)
```
77-90: Silent data truncation and zero-padding may mask input mismatches. When `arr.shape != tuple(shape)`, the code truncates to fit (line 78). When there's insufficient data, it zero-pads (lines 87-89). Both operations silently alter input data without warning, which could hide upstream bugs or misconfigurations. Consider logging a warning when truncation or padding occurs to aid debugging.

♻️ Suggested improvement: add logging for data adjustments:

```diff
 if arr.shape != tuple(shape):
+    logger.warning(f"Input '{name}' shape {arr.shape} reshaped to {shape}")
     arr = arr.flatten()[:np.prod(shape)].reshape(shape)
     input_dict[name] = arr
 else:
     needed = np.prod(shape)
     flat = primary_np.flatten()
     if flat.size >= needed:
         input_dict[name] = flat[:needed].reshape(shape)
         primary_np = flat[needed:].reshape(-1)
     else:
+        logger.warning(f"Input '{name}' zero-padded: had {flat.size}, needed {needed}")
         padded = np.zeros(needed, dtype=np.float32)
         padded[:flat.size] = flat
         input_dict[name] = padded.reshape(shape)
         primary_np = np.array([], dtype=np.float32)
```

dsperse/src/run/runner.py (1)
349-354: Prefer `next(iter(...))` over `list(...)[0]` for single element access. As flagged by static analysis, `next(iter(result['output_tensors'].values()))` is more efficient than constructing a full list just to access the first element.

♻️ Suggested fix:

```diff
 if ok and result:
     if isinstance(result, dict) and 'output_tensors' in result:
         for oname, tensor in result['output_tensors'].items():
             tensor_cache[oname] = tensor
-        first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+        first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
         final_tensor = first_output if first_output is not None else result.get('logits')
```

dsperse/src/slice/autotiler.py (2)
182-183: Hardcoded opset version 13 may cause compatibility issues. If the original model uses operators from newer opsets (14+), the tiled model may fail to load or produce incorrect results.

♻️ Suggested fix: preserve the original model's opset:

```diff
 graph = helper.make_graph(nodes, "tiled_slice", [X], [Y], initializers)
-model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
-model.ir_version = 8
+# Preserve original opset versions
+model = helper.make_model(graph, opset_imports=m.opset_import)
+model.ir_version = m.ir_version
```
240-242: Unused variables `c` and `w` in unpacking. Per static analysis, prefix unused variables with underscore.

♻️ Suggested fix:

```diff
-    _, c, h, w = dims
+    _, _c, h, _w = dims
     if h <= tile_size:
         continue
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- dsperse/src/backends/onnx_models.py
- dsperse/src/cli/slice.py
- dsperse/src/run/runner.py
- dsperse/src/slice/autotiler.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/backends/onnx_models.py (1)
dsperse/src/run/utils/runner_utils.py (4)
- RunnerUtils (16-411), preprocess_input (183-219), process_final_output (223-239), save_to_file_flattened (272-287)
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py
353-353: Prefer next(iter(result['output_tensors'].values())) over single element slice
Replace with next(iter(result['output_tensors'].values()))
(RUF015)
364-364: Create your own exception
(TRY002)
364-364: Avoid specifying long messages outside the exception class
(TRY003)
404-404: Do not catch blind exception: Exception
(BLE001)
dsperse/src/backends/onnx_models.py
45-45: Consider moving this statement to an else block
(TRY300)
46-46: Do not catch blind exception: Exception
(BLE001)
103-103: Consider moving this statement to an else block
(TRY300)
104-104: Do not catch blind exception: Exception
(BLE001)
dsperse/src/slice/autotiler.py
240-240: Unpacked variable c is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
240-240: Unpacked variable w is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🔇 Additional comments (10)
dsperse/src/cli/slice.py (2)
40-41: LGTM! Clean CLI option for tile size. The option is well-defined with appropriate type constraint, sensible default, and clear help text explaining the purpose.

186-192: LGTM! Proper propagation of tile_size to the slicer API. The tile_size is correctly retrieved and passed through to the slicer, with appropriate logging for debugging.

dsperse/src/backends/onnx_models.py (1)

31-43: LGTM! Output tensor collection for downstream propagation. The addition of the `output_tensors` dictionary enables proper tensor propagation through multi-input slice chains.

dsperse/src/run/runner.py (3)

323-326: LGTM! Good tensor cache initialization for multi-input propagation. The tensor cache keyed by input name enables proper routing of intermediate outputs to downstream slices that need them.

367-368: Fallback to `input_tensor` when `final_tensor` is None may be unexpected. If no slice successfully produced output, falling back to the original input for probability calculation could yield misleading results. Consider raising an error or logging a warning if no slice produced valid output.

387-410: LGTM! Clean multi-input slice execution method. The method properly handles path resolution, delegates to `OnnxModels.run_inference_multi`, and returns a consistent exec_info structure.

dsperse/src/slice/autotiler.py (4)
1-30: LGTM! Excellent documentation. The module docstring clearly explains the tiling strategy with a concrete example, making the algorithm easy to understand.

33-42: Edge case: returns None when `target >= spatial_dim`. The condition `7 <= target < spatial_dim` means that if `target >= spatial_dim`, no tile size is returned. This is likely intentional (no tiling needed if the target is larger than the input), but could be surprising since `spatial_dim` itself is a valid tile size when `target == spatial_dim`. Is returning `None` when `target >= spatial_dim` intentional? If tiling is still desired when `target == spatial_dim`, the condition should be `7 <= target <= spatial_dim`.
117-119: Assumes symmetric padding (top=bottom, left=right). The halo calculation uses only `pads[0]` and `pads[1]` (top and left padding). If the original Conv has asymmetric padding (e.g., `[1, 1, 2, 2]`), this will compute incorrect tile boundaries. Consider whether asymmetric padding needs to be supported, or add a check to reject/warn on asymmetric padding:

```python
if pads[0] != pads[2] or pads[1] != pads[3]:
    logger.warning("Asymmetric padding not supported for tiling")
    return None
```
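The guard can be verified in isolation. A minimal sketch using ONNX's 2D Conv pad ordering `[top, left, bottom, right]` (the helper name is an illustration, not the repo's API):

```python
# ONNX Conv pads for 2D inputs are ordered [x1_begin, x2_begin, x1_end, x2_end],
# i.e. [top, left, bottom, right]. Symmetric means top == bottom and left == right.
def padding_is_symmetric(pads: list[int]) -> bool:
    """True when the padding the suggested check would accept is symmetric per axis."""
    return pads[0] == pads[2] and pads[1] == pads[3]

print(padding_is_symmetric([1, 1, 1, 1]))  # True: uniform padding
print(padding_is_symmetric([1, 1, 2, 2]))  # False: top != bottom and left != right
print(padding_is_symmetric([0, 2, 0, 2]))  # True: symmetric per axis, different axes
```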
203-260: LGTM! Well-structured batch tiling orchestration. The function properly scans directories, filters tileable slices, and persists tiling metadata. The early-exit guards prevent unnecessary processing.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @dsperse/src/run/runner.py:
- Around line 387-400: The _run_slice_multi_input function currently may
continue with an invalid onnx_path after all resolution attempts; update it to
fail fast: after the existing resolution attempts (checking slice_info["path"]
and slice_info["relative_path"]), if onnx_path is still missing or the resolved
path does not exist, either raise a clear exception (e.g., ValueError with
context) or return early so run_inference_multi is not called with an invalid
path; reference the onnx_path variable, slice_info and slice_dir in the error
message to aid debugging.
🧹 Nitpick comments (4)
dsperse/src/run/runner.py (3)
349-361: Suggest safer access pattern for first output. The logic for extracting the first output tensor could raise an `IndexError` if `output_tensors` is unexpectedly empty. Consider using `next(iter(...), None)` for safer access.

♻️ Proposed refactor:

```diff
-first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
```
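The difference shows up on an empty dict, where the list-based form raises and the iterator form returns the default:

```python
# Comparing the two access patterns from the refactor above.
tensors = {"conv_out": [0.1, 0.9]}
empty: dict[str, list] = {}

print(next(iter(tensors.values()), None))  # [0.1, 0.9]
print(next(iter(empty.values()), None))    # None instead of an exception

try:
    list(empty.values())[0]  # materializes all values, then indexes
except IndexError:
    print("IndexError from the list-based access")
```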
364-364: Consider a custom exception for slice execution failures. Using a generic `Exception` makes it harder for callers to catch and handle specific failure modes. A custom exception class (e.g., `SliceExecutionError`) would improve error handling granularity.
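A minimal sketch of such a class (the name `SliceExecutionError` and its fields are only the reviewer's suggestion, not part of the codebase):

```python
class SliceExecutionError(Exception):
    """Raised when a slice fails to execute; carries context for callers."""

    def __init__(self, slice_id: str, backend: str, reason: str):
        self.slice_id = slice_id
        self.backend = backend
        self.reason = reason
        super().__init__(f"slice {slice_id} failed on {backend}: {reason}")

# Callers can now catch slice failures without swallowing unrelated errors.
try:
    raise SliceExecutionError("slice_3", "onnx", "missing payload file")
except SliceExecutionError as e:
    print(e.slice_id, "->", e.reason)  # slice_3 -> missing payload file
```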
401-415: Improve exception handling specificity. Catching all exceptions without differentiation makes debugging harder and could mask unexpected errors. Consider catching specific exceptions (e.g., `FileNotFoundError`, `RuntimeError`) or at least logging the full traceback.

♻️ Proposed refactor:

```diff
 start_time = time.time()
 try:
     success, result = OnnxModels.run_inference_multi(
         model_path=onnx_path,
         primary_input_file=primary_input_file,
         extra_tensors=extra_tensors,
         output_file=output_file
     )
 except Exception as e:
+    logger.exception(f"Multi-input inference failed for {onnx_path}")
     success, result = False, str(e)
```

dsperse/src/backends/onnx_models.py (1)
104-106: Consider more specific exception handling. Catching all exceptions broadly makes debugging harder. Consider catching specific exceptions that `session.run` or helper methods might raise, or at least log the full traceback for operational visibility.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- dsperse/src/backends/onnx_models.py
- dsperse/src/run/runner.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/backends/onnx_models.py (2)
dsperse/src/run/utils/runner_utils.py (4)
- RunnerUtils (16-411), preprocess_input (183-219), process_final_output (223-239), save_to_file_flattened (272-287)

dsperse/src/run/runner.py (1)
run(46-88)
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py
353-353: Prefer next(iter(result['output_tensors'].values())) over single element slice
Replace with next(iter(result['output_tensors'].values()))
(RUF015)
364-364: Create your own exception
(TRY002)
364-364: Avoid specifying long messages outside the exception class
(TRY003)
409-409: Do not catch blind exception: Exception
(BLE001)
dsperse/src/backends/onnx_models.py
45-45: Consider moving this statement to an else block
(TRY300)
46-46: Do not catch blind exception: Exception
(BLE001)
103-103: Consider moving this statement to an else block
(TRY300)
104-104: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (6)
dsperse/src/run/runner.py (4)
323-326: LGTM: Tensor caching enables multi-input support. The tensor cache initialization correctly maps the model's primary input name to the input tensor, enabling downstream slices to reference intermediate outputs by name.

336-342: LGTM: Input source selection correctly handles cached and fresh inputs. The logic appropriately selects the input tensor from the cache when dependencies specify filtered_inputs, otherwise falling back to the model's primary input. The fallback chain (cache → input_tensor) ensures robustness.

343-348: LGTM: Clean separation of single-input and multi-input execution paths. The branching logic correctly identifies multi-input slices and prepares the extra_tensors dictionary while maintaining backward compatibility for single-input workflows.

370-374: No changes needed — shape handling is correct and covers all tensor formats from all backends. All backends (ONNX, EZKL, JSTprove) and fallback paths consistently return 2D tensors `[batch_size, num_classes]`. The else clause (lines 372-373) correctly handles this by flattening and unsqueezing before applying softmax, which works for 1D, 2D, and higher-dimensional tensors. The 3D condition on line 370 is unreachable in practice but does not affect correctness.

dsperse/src/backends/onnx_models.py (2)

36-43: LGTM: Output collection enables multi-output support. Collecting all model outputs into a named dictionary enables downstream slices to reference specific outputs, which is essential for multi-input scenarios. Backward compatibility is maintained by continuing to use the first output as the primary result.

50-66: LGTM: Multi-input inference setup is correct. The method properly initializes the ONNX session with optimizations disabled (matching single-input behavior) and prepares the primary input tensor for distribution across model inputs.
Actionable comments posted: 1
🧹 Nitpick comments (2)
dsperse/src/slice/onnx_slicer.py (1)
524-531: Use explicit `Optional` or union type for the `tile_size` parameter. Per PEP 484, `int = None` implies `Optional[int]`, which should be explicit. This improves type clarity and satisfies static analysis.

Suggested fix:

```diff
-    def slice_model(self, output_path=None, tile_size: int = None):
+    def slice_model(self, output_path=None, tile_size: int | None = None):
```

dsperse/src/run/runner.py (1)
343-361: Good branching logic for multi-input slices. The conditional execution path for multi-input slices is well-structured. The tensor cache update logic correctly handles both multi-output and single-output cases. One minor style improvement per static analysis: prefer `next(iter(...))` over `list(...)[0]` on line 353.

Optional style fix for line 353:

```diff
-first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- dsperse/src/backends/onnx_models.py
- dsperse/src/run/runner.py
- dsperse/src/slice/onnx_slicer.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/slice/onnx_slicer.py (2)
dsperse/src/slice/autotiler.py (1)
- autotile_slices (203-262)

dsperse/src/analyzers/onnx_analyzer.py (1)
- generate_slices_metadata (337-389)
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py
353-353: Prefer next(iter(result['output_tensors'].values())) over single element slice
Replace with next(iter(result['output_tensors'].values()))
(RUF015)
364-364: Create your own exception
(TRY002)
364-364: Avoid specifying long messages outside the exception class
(TRY003)
406-406: Do not catch blind exception: Exception
(BLE001)
dsperse/src/slice/onnx_slicer.py
524-524: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
dsperse/src/backends/onnx_models.py
45-45: Consider moving this statement to an else block
(TRY300)
46-46: Do not catch blind exception: Exception
(BLE001)
66-66: Abstract raise to an inner function
(TRY301)
66-66: Avoid specifying long messages outside the exception class
(TRY003)
75-75: Abstract raise to an inner function
(TRY301)
75-75: Avoid specifying long messages outside the exception class
(TRY003)
90-90: Consider moving this statement to an else block
(TRY300)
91-91: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (12)
dsperse/src/slice/onnx_slicer.py (6)
9-9: LGTM! Import of the new `autotile_slices` function aligns with the autotiler feature being added in this PR.

166-179: LGTM! The addition of the `future_inputs` parameter enables proper handling of cross-segment tensor dependencies. The default to an empty set is appropriate.

238-257: LGTM! The revised output determination logic correctly handles three cases: tensors not consumed internally, model outputs, and tensors needed by future segments. This ensures proper cross-segment data flow.

364-382: LGTM! The pre-computation of `segment_inputs_map` correctly handles segment boundary calculation with the `if i > 0 else 0` guard. This enables accurate identification of cross-segment dependencies before the main slicing loop.

405-413: LGTM! The computation of `future_inputs` by aggregating inputs from all downstream segments is correct and properly passed to `_get_segment_details`.

538-545: LGTM! The autotiling integration is clean and backward-compatible. When `tile_size` is not provided, the behavior remains unchanged. The `tiled_info` is properly propagated to metadata generation.

dsperse/src/backends/onnx_models.py (3)
36-43: LGTM! The addition of `output_tensors` collection enables proper tensor propagation for multi-output models, which is essential for the new multi-slice chaining feature.

46-48: Improved error handling. Returning `str(e)` instead of `None` on failure provides actionable debug information to callers. This is a good improvement.

79-93: LGTM! The execution and output collection follows the same pattern as `run_inference`, maintaining consistency. The `output_tensors` map enables downstream tensor propagation.

dsperse/src/run/runner.py (3)
323-326: LGTM! The tensor cache initialization with the model's primary input enables cross-slice tensor propagation. The default fallback to "images" is reasonable for common image model workflows.

367-378: LGTM! The probability computation correctly handles various tensor shapes. The fallback to `input_tensor` when `final_tensor` is None provides robustness. The 3D case handles batch=1 classification outputs, while the else branch correctly handles 2D and other formats.

387-412: LGTM! The `_run_slice_multi_input` method is well-structured with proper path validation and error handling. The `exec_info` structure is consistent with other slice execution methods, enabling uniform result processing.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @dsperse/src/backends/onnx_models.py:
- Around line 46-48: Replace the current logger.warning(...) in the except
Exception as e blocks that return (False, str(e)) with logger.exception(...) so
the traceback is preserved; also restore the previous error-return contract by
returning (False, None) (or, if you intentionally want to change the contract,
return a documented structured error object and update all callers accordingly);
apply this change to both except blocks that currently return (False, str(e)) so
callers continue to receive the original (False, None) behavior unless you
intentionally migrate them to the new structured error format and update call
sites.
- Around line 31-45: The code double-converts raw_output[0] and will crash if
the model has no outputs; update the ONNX run handling in OnnxModels (around
session.run / output_tensors) to first validate that raw_output (or
session.get_outputs()) is non-empty and return/raise a clear error if no outputs
exist, then derive output_tensor from the already-built output_tensors (e.g.,
take the first value from output_tensors) instead of reconverting raw_output[0],
and pass that tensor into RunnerUtils.process_final_output and result population
to avoid the redundant conversion.
🧹 Nitpick comments (1)
dsperse/src/backends/onnx_models.py (1)
54-57: Align `SessionOptions` between single- and multi-input inference (behavior/perf drift). `run_inference` disables several ORT options, but `run_inference_multi` only disables graph optimizations. If this is unintentional, reuse the same options builder in both paths.

Proposed patch:

```diff
 try:
     session_options = ort.SessionOptions()
     session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
+    session_options.enable_profiling = False
+    session_options.enable_mem_pattern = False
+    session_options.enable_cpu_mem_arena = False
+    session_options.enable_mem_reuse = False
+    session_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
     session = ort.InferenceSession(model_path, session_options)
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dsperse/src/backends/onnx_models.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/backends/onnx_models.py (1)
dsperse/src/run/utils/runner_utils.py (4)
RunnerUtils (16-411), preprocess_input (183-219), process_final_output (223-239), save_to_file_flattened (272-287)
🪛 Ruff (0.14.10)
dsperse/src/backends/onnx_models.py
45-45: Consider moving this statement to an else block
(TRY300)
46-46: Do not catch blind exception: Exception
(BLE001)
65-65: Abstract raise to an inner function
(TRY301)
65-65: Avoid specifying long messages outside the exception class
(TRY003)
75-75: Abstract raise to an inner function
(TRY301)
75-75: Avoid specifying long messages outside the exception class
(TRY003)
77-77: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
79-79: Abstract raise to an inner function
(TRY301)
79-79: Avoid specifying long messages outside the exception class
(TRY003)
94-94: Consider moving this statement to an else block
(TRY300)
95-95: Do not catch blind exception: Exception
(BLE001)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @dsperse/src/backends/onnx_models.py:
- Around line 50-108: model_input.type is a string (e.g. "tensor(float)"), so
the code at run_inference_multi that reads
model_input.type.tensor_type.elem_type will raise AttributeError; instead parse
the type string to extract the element type (e.g. regex "tensor(NAME)") and map
NAME (float, double, int32, int64, uint8, int8, uint16, int16, bool, float16,
uint32, uint64) to the appropriate numpy dtype rather than using
tensor_type.elem_type; update the onnx_dtype_map lookup to accept these string
names and convert arr.astype accordingly; also make session_options in
run_inference_multi match the flags used in run_inference and apply_onnx_shape
(enable_profiling, enable_cpu_mem_arena, set
execution_mode/graph_optimization_level as in those functions) so behavior is
consistent.
- Around line 31-45: The fix: preserve each ONNX output's original dtype when
populating output_tensors instead of forcing dtype=torch.float32; in both
run_inference and run_inference_multi use a conversion that respects the numpy
dtype (e.g., torch.from_numpy or infer dtype from raw_output[i]) so int64/bool
outputs stay intact, and only coerce the primary logits/output used by
RunnerUtils.process_final_output to float32 before calling that helper; also
remove the redundant explicit exception interpolation in logger.exception calls
(call logger.exception(...) without f"...{e}").
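The string-to-dtype mapping described in this prompt can be sketched as follows. This is a hypothetical `parse_onnx_type` helper, not the project's actual code; the dtype table mirrors the type names listed above, with `np.float32` as the fallback.

```python
import re
import numpy as np

# Mapping from ONNX type-string names (as in "tensor(float)") to numpy dtypes.
ONNX_DTYPE_MAP = {
    "float": np.float32, "double": np.float64,
    "int32": np.int32, "int64": np.int64,
    "uint8": np.uint8, "int8": np.int8,
    "uint16": np.uint16, "int16": np.int16,
    "bool": np.bool_, "float16": np.float16,
    "uint32": np.uint32, "uint64": np.uint64,
}

def parse_onnx_type(type_str):
    """Extract NAME from a "tensor(NAME)" string and return its numpy dtype.

    Unknown or non-tensor type strings fall back to float32.
    """
    m = re.match(r"tensor\((\w+)\)", type_str)
    if m and m.group(1) in ONNX_DTYPE_MAP:
        return ONNX_DTYPE_MAP[m.group(1)]
    return np.float32
```

An input validated this way would then be converted with `arr.astype(parse_onnx_type(model_input.type))` before being fed to the session.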
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dsperse/src/backends/onnx_models.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/backends/onnx_models.py (1)
dsperse/src/run/utils/runner_utils.py (4)
RunnerUtils (16-411), preprocess_input (183-219), process_final_output (223-239), save_to_file_flattened (272-287)
🪛 Ruff (0.14.10)
dsperse/src/backends/onnx_models.py
45-45: Consider moving this statement to an else block
(TRY300)
47-47: Redundant exception object included in logging.exception call
(TRY401)
71-71: Abstract raise to an inner function
(TRY301)
71-71: Avoid specifying long messages outside the exception class
(TRY003)
85-85: Abstract raise to an inner function
(TRY301)
85-85: Avoid specifying long messages outside the exception class
(TRY003)
87-87: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
89-89: Abstract raise to an inner function
(TRY301)
89-89: Avoid specifying long messages outside the exception class
(TRY003)
104-104: Consider moving this statement to an else block
(TRY300)
106-106: Redundant exception object included in logging.exception call
(TRY401)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @dsperse/src/backends/onnx_models.py:
- Around line 58-60: The session options here only set graph_optimization_level,
causing inconsistent behavior with run_inference; update the session_options
before creating the ort.InferenceSession to also set enable_profiling,
enable_mem_pattern, enable_cpu_mem_arena, enable_mem_reuse and execution_mode to
the same values used in run_inference so both inference paths use identical
ort.SessionOptions (keep graph_optimization_level as currently set).
🧹 Nitpick comments (3)
dsperse/src/backends/onnx_models.py (3)
36-52: Output extraction and error handling look correct.

The logic to extract named outputs and map them to tensors is sound. Minor note: `logger.exception()` automatically includes the exception traceback, so passing `e` in the f-string is redundant.

🔧 Optional cleanup

```diff
 except Exception as e:
-    logger.exception(f"Error during inference: {e}")
+    logger.exception("Error during inference")
     return False, str(e)
```

91-93: Add `strict=True` to `zip()` for defensive validation.

Although the rank is validated at lines 88-89, adding `strict=True` would make the intent explicit and catch any edge cases where dimensions differ unexpectedly.

```diff
-for i, (got, expected) in enumerate(zip(arr.shape, model_input.shape)):
+for i, (got, expected) in enumerate(zip(arr.shape, model_input.shape, strict=True)):
```

99-115: Consider extracting shared output processing logic.

Lines 99-112 are nearly identical to lines 36-49 in `run_inference`. Extracting this into a helper method would reduce duplication and ensure consistent behavior.

♻️ Example helper extraction

```python
@staticmethod
def _process_outputs(session, raw_output, output_file):
    """Process model outputs and save results."""
    model_outputs = session.get_outputs()
    if not model_outputs:
        raise ValueError("Model has no outputs")
    output_tensors = {}
    for i, out in enumerate(model_outputs):
        output_tensors[out.name] = torch.tensor(raw_output[i], dtype=torch.float32)
    first_output_name = model_outputs[0].name
    output_tensor = output_tensors[first_output_name]
    result = RunnerUtils.process_final_output(output_tensor)
    result['output_tensors'] = output_tensors
    RunnerUtils.save_to_file_flattened(result['logits'], output_file)
    return result
```

Then both methods can call:

```python
result = OnnxModels._process_outputs(session, raw_output, output_file)
return True, result
```

Also, same minor note on line 114: `logger.exception()` already includes the exception, so the f-string interpolation is redundant.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dsperse/src/backends/onnx_models.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/backends/onnx_models.py (1)
dsperse/src/run/utils/runner_utils.py (4)
RunnerUtils (16-411), preprocess_input (183-219), process_final_output (223-239), save_to_file_flattened (272-287)
🪛 Ruff (0.14.10)
dsperse/src/backends/onnx_models.py
38-38: Abstract raise to an inner function
(TRY301)
38-38: Avoid specifying long messages outside the exception class
(TRY003)
49-49: Consider moving this statement to an else block
(TRY300)
51-51: Redundant exception object included in logging.exception call
(TRY401)
75-75: Abstract raise to an inner function
(TRY301)
75-75: Avoid specifying long messages outside the exception class
(TRY003)
89-89: Abstract raise to an inner function
(TRY301)
89-89: Avoid specifying long messages outside the exception class
(TRY003)
91-91: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
93-93: Abstract raise to an inner function
(TRY301)
93-93: Avoid specifying long messages outside the exception class
(TRY003)
101-101: Abstract raise to an inner function
(TRY301)
101-101: Avoid specifying long messages outside the exception class
(TRY003)
112-112: Consider moving this statement to an else block
(TRY300)
114-114: Redundant exception object included in logging.exception call
(TRY401)
Actionable comments posted: 0
🧹 Nitpick comments (1)
dsperse/src/backends/onnx_models.py (1)
75-105: Use `strict=True` in zip for safer dimension validation.

The multi-input validation logic is thorough and well-structured. However, consider adding `strict=True` to the `zip()` call at line 95 for more defensive programming, even though the rank check at lines 91-93 should already ensure matching lengths.

♻️ Proposed fix

```diff
-for i, (got, expected) in enumerate(zip(arr.shape, model_input.shape)):
+for i, (got, expected) in enumerate(zip(arr.shape, model_input.shape, strict=True)):
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dsperse/src/backends/onnx_models.py
🧰 Additional context used
🪛 Ruff (0.14.10)
dsperse/src/backends/onnx_models.py
35-35: Avoid specifying long messages outside the exception class
(TRY003)
85-85: Abstract raise to an inner function
(TRY301)
85-85: Avoid specifying long messages outside the exception class
(TRY003)
93-93: Abstract raise to an inner function
(TRY301)
93-93: Avoid specifying long messages outside the exception class
(TRY003)
95-95: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
97-97: Abstract raise to an inner function
(TRY301)
97-97: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (6)
dsperse/src/backends/onnx_models.py (6)
3-3: LGTM! The `re` import is necessary for the new `_parse_onnx_type` method.

30-46: LGTM! The output processing logic correctly converts raw ONNX outputs to structured tensors, derives the primary output from the first model output, and persists results appropriately.

48-59: LGTM! The refactoring to use helper methods improves code organization and maintainability. The consistent tuple return type `(bool, result/error)` and proper exception logging are good practices.

121-121: LGTM! The update to use `_create_session` maintains consistency with the refactored session creation pattern.

18-28: The deterministic session configuration is correct, and all specified SessionOptions (`enable_profiling`, `enable_mem_pattern`, `enable_cpu_mem_arena`, `enable_mem_reuse`, `ORT_DISABLE_ALL`, and `ORT_SEQUENTIAL`) are supported in onnxruntime 1.21.0. The setup appropriately disables optimizations and uses sequential execution to ensure deterministic behavior for ZK proofs.

61-73: Verifying complete ONNX type coverage is not necessary for this codebase. The type mapping in `_parse_onnx_type` already covers all types actually used in this project. The repository contains only vision inference models (age, doom, net, resnet, version), which standardly use float32, float16, and integer types, all present in the type map. Advanced types like bfloat16, string, and complex are not used anywhere in the codebase. The regex pattern `tensor\((\w+)\)` correctly matches the standard ONNX type format with no detected variations. The fallback to `np.float32` for unmapped types provides safe handling of edge cases and aligns with the codebase's universal float32 convention.
Actionable comments posted: 1
🧹 Nitpick comments (9)
dsperse/src/run/runner.py (5)
331-341: Unused variable `output_name` on line 340.

The variable `output_name` is assigned but never used in `_run_tiled_slice`. This was flagged by static analysis.

Suggested fix

```diff
 input_name = tiling["input_name"]
-output_name = tiling["output_name"]
+# output_name available in tiling["output_name"] if needed
 input_tensor = tensor_cache.get(input_name)
```

365-396: Race condition: concurrent writes to shared `tensor_cache` dict.

`run_single_tile` is executed in parallel threads, and each thread writes to `tensor_cache` (lines 394-395) from within the `as_completed` loop. While the writes happen in the main thread after `future.result()`, the closure captures `tensor_cache` by reference. This is safe as written, but the variable `tile_output_name` is unpacked but never used (line 390).

Prefix unused variable with underscore

```diff
 for future in as_completed(futures):
-    tile_idx, tile_output_name, ok, result = future.result()
+    tile_idx, _tile_output_name, ok, result = future.result()
     if not ok:
         raise Exception(f"Tile {tile_idx} failed: {result}")
```

387-387: Consider capping `max_workers` based on system resources.

Using `min(32, num_tiles)` is reasonable, but for very large tile counts or resource-constrained environments, you may want to make this configurable or use `os.cpu_count()` as a factor.

356-357: Consider defining custom exception classes for tiling errors.

Multiple places raise generic `Exception` with long messages. Per static analysis hints (TRY002, TRY003), consider creating domain-specific exceptions (e.g., `TiledSliceError`) for better error handling downstream.

Example custom exception

```python
class TiledSliceError(Exception):
    """Raised when a tiled slice operation fails."""
    pass

# Usage:
if not success:
    raise TiledSliceError(f"Split slice failed: {split_result}")
```

Also applies to: 391-392, 413-414, 493-493

482-483: Use `next(iter(...))` instead of `list(...)[0]`.

Static analysis suggests using `next(iter(result['output_tensors'].values()))` for efficiency when only the first element is needed.

Suggested fix

```diff
 for oname, tensor in result['output_tensors'].items():
     tensor_cache[oname] = tensor
-first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
 final_tensor = first_output if first_output is not None else result.get('logits')
```

dsperse/src/slice/autotiler.py (4)

166-168: Remove unused variable `orig_output`.

Static analysis flagged that `orig_output` is assigned but never used.

Suggested fix

```diff
 m = onnx.load(slice_path)
 orig_input = m.graph.input[0]
-orig_output = m.graph.output[0]
 orig_dims = [d.dim_value for d in orig_input.type.tensor_type.shape.dim]
```

336-429: Code duplication with `create_tile_slice`.

`tile_conv_slice` shares significant logic with `create_tile_slice` (weight extraction, graph building, non-Conv op handling). Consider refactoring to share common logic if this legacy function needs long-term maintenance.

453-455: Square-only constraint may be limiting.

The check `h <= tile_size or h != w` restricts tiling to square spatial dimensions only. This may be intentional for simplicity, but it excludes non-square inputs (e.g., 1920x1080). Consider documenting this limitation or making it configurable in the future.

560-563: Silent failure on malformed slice directory names.

When `int(entry.name.split("_")[1])` fails, the slice is silently skipped. This is fine for robustness, but consider logging at debug level for troubleshooting.

Add debug logging

```diff
+import logging
+logger = logging.getLogger(__name__)

 # In autotile_slices:
 try:
     i = int(entry.name.split("_")[1])
 except (IndexError, ValueError):
+    logger.debug(f"Skipping malformed slice directory: {entry.name}")
     continue
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dsperse/src/run/runner.py
dsperse/src/slice/autotiler.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/run/runner.py (2)
dsperse/src/utils/utils.py (3)
Utils (12-362), write_input (127-131), read_input (134-171)
dsperse/src/backends/onnx_models.py (3)
OnnxModels (14-255), run_inference (49-59), run_inference_multi (76-105)
🪛 Ruff (0.14.10)
dsperse/src/slice/autotiler.py
168-168: Local variable orig_output is assigned to but never used
Remove assignment to unused variable orig_output
(F841)
dsperse/src/run/runner.py
340-340: Local variable output_name is assigned to but never used
Remove assignment to unused variable output_name
(F841)
357-357: Create your own exception
(TRY002)
357-357: Avoid specifying long messages outside the exception class
(TRY003)
390-390: Unpacked variable tile_output_name is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
392-392: Create your own exception
(TRY002)
392-392: Avoid specifying long messages outside the exception class
(TRY003)
414-414: Create your own exception
(TRY002)
414-414: Avoid specifying long messages outside the exception class
(TRY003)
482-482: Prefer next(iter(result['output_tensors'].values())) over single element slice
Replace with next(iter(result['output_tensors'].values()))
(RUF015)
493-493: Create your own exception
(TRY002)
493-493: Avoid specifying long messages outside the exception class
(TRY003)
535-535: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (17)
dsperse/src/run/runner.py (6)
9-9: LGTM! Import of `ThreadPoolExecutor` and `as_completed` is appropriate for the parallel tile execution.

46-47: LGTM! Initialization of `tiled_info` to `None` in `__init__` is correct.

322-329: LGTM! The `_load_tiled_info` method correctly loads tiled metadata from the expected path and handles the case where the file doesn't exist.

439-441: LGTM! Properly initializing the tensor cache with the model's input name and reading the input tensor.

472-476: LGTM! Good handling of multi-input slices by checking `filtered_inputs` length and routing appropriately.

516-541: LGTM with minor note. The `_run_slice_multi_input` method correctly handles path resolution and error cases. The broad `except Exception` (line 535) is flagged by static analysis (BLE001), but given this is inference code where various exceptions can occur, it's acceptable to catch broadly here while logging the exception upstream in `run_inference_multi`.

dsperse/src/slice/autotiler.py (11)

1-27: LGTM! Clear docstring explaining the tiling architecture and good constant definition for elementwise ops.

30-35: Edge case: `find_tile_size` may return `None` unexpectedly for valid inputs.

When `target >= spatial_dim`, the function returns `None`. This is correct, but the loop `range(target, 7, -1)` will also return `None` if no divisor is found. Consider documenting that this is expected behavior when no suitable tile size exists.
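The divisor search described in this comment can be sketched as follows. This is an assumed reconstruction from the bounds mentioned above (`target >= spatial_dim` and `range(target, 7, -1)`), not the project's actual implementation:

```python
def find_tile_size(spatial_dim, target):
    """Return the largest tile size <= target that evenly divides spatial_dim.

    Returns None both when the input already fits in one tile and when
    no divisor >= 8 exists (range stops before 7).
    """
    if target >= spatial_dim:
        return None  # no tiling needed
    for size in range(target, 7, -1):
        if spatial_dim % size == 0:
            return size
    return None  # no suitable divisor found
```

Both `None` cases look identical to the caller, which is why documenting the distinction (or returning a sentinel) would help.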
38-44: LGTM! `is_tileable` correctly checks for single-input models with Conv and only elementwise ops.

47-58: LGTM! `get_conv_params` extracts Conv node attributes with sensible defaults.

61-150: LGTM! `create_split_slice` correctly creates a Pad + Slice graph for extracting tiles with halo regions. The ONNX graph construction is well-structured.

177-180: LGTM - good guard against division by zero. Checking `sh == 0 or sw == 0` before division is correct defensive programming.

191-192: LGTM! Correctly returns `None` when weights are not found in initializers.

267-333: LGTM! `create_concat_slice` correctly builds a hierarchical Concat (rows first, then columns) to reassemble tiles.
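The rows-first, then-columns reassembly order can be illustrated with plain nested lists. This is a sketch of the ordering only; the real `create_concat_slice` builds an ONNX Concat graph rather than Python loops:

```python
def reconstruct(tiles, grid):
    """Reassemble a row-major list of 2-D tiles into one 2-D array.

    First concat each row of tiles along the width (joining matching
    scanlines), then stack the resulting row strips along the height.
    """
    rows, cols = grid
    strips = []
    for r in range(rows):
        row_tiles = tiles[r * cols:(r + 1) * cols]
        # width-wise concat: join scanline y of every tile in the row
        strip = [sum((t[y] for t in row_tiles), []) for y in range(len(row_tiles[0]))]
        strips.append(strip)
    # height-wise concat: append the strips top to bottom
    return [line for strip in strips for line in strip]

tiles = [[[1]], [[2]], [[3]], [[4]]]  # four 1x1 tiles in a 2x2 grid
full = reconstruct(tiles, (2, 2))
```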
432-543: LGTM! `autotile_slice` orchestrates the split/tile/concat creation correctly and returns comprehensive metadata.

546-579: LGTM! `autotile_slices` correctly scans directories, skips already-tiled slices, and saves aggregated metadata. The use of `sorted()` ensures deterministic ordering.

224-244: Logic for handling inter-dependent non-Conv ops is sound, but a naming-collision edge case exists.

When copying non-Conv ops (lines 230-242), input remapping preserves intermediate tensor names from the original graph, enabling correct dependencies between chained ops. For example, if the graph contains Conv → Relu → Add, the Relu's output name is preserved and the Add can reference it through the `else` clause on line 238.

However, there is a potential edge case: if the original model contains tensors named `"conv_out"` or matching the pattern `f"tile_{slice_idx}_{tile_idx}_*"` (used for newly generated input/output names), a naming collision could occur and break the graph structure. While unlikely in practice (models typically use descriptive names), this should be documented or guarded against.

Initializer handling at lines 226-228 correctly includes all non-Conv initializers, which are referenced by name in the remapped inputs.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @dsperse/src/run/runner.py:
- Around line 374-376: The code calls tensor_cache.get(tile_input_name) and
passes the result to Utils.write_input without checking for None; add a
validation after retrieving tile_tensor (from tensor_cache.get(tile_input_name))
to detect missing tensors and raise or log a clear error referencing
tile_input_name and tile_in (or otherwise handle the missing case) before
calling Utils.write_input so downstream failures are avoided and the error
message clearly identifies the missing tile input.
- Line 405: The line building concat_tensors uses direct dict access and can
raise a raw KeyError if a name in concat_info["input_names"] is missing from
tensor_cache; before constructing concat_tensors (or while doing so), validate
that every name in concat_info["input_names"] exists in tensor_cache and if any
are missing raise a clear exception or log an informative error that includes
the missing names and context (e.g., which concat operation or concat_info id),
then build concat_tensors from the validated names; reference the symbols
concat_tensors, concat_info["input_names"], and tensor_cache when implementing
the check.
- Around line 339-341: The code uses input_tensor = tensor_cache.get(input_name)
without validating it; update the runner so after retrieving input_tensor (from
tensor_cache.get(input_name)) you check for None and handle it explicitly (e.g.,
raise a clear exception or log an error and abort) before calling
Utils.write_input; reference the tiling dict (input_name/output_name) and ensure
any error message includes the missing input_name and context (tiling) to make
debugging straightforward.
- Around line 498-499: The current fallback that sets final_tensor =
input_tensor when final_tensor is None can mask failures; instead, detect
whether the inference chain or slice-processing loop actually ran (e.g., check
the slice iteration count or chain length) and if it did but final_tensor is
still None, either log a warning with context (including any error/exception
info) or raise an explicit RuntimeError; only allow the silent fallback to
input_tensor when the chain was intentionally empty. Update the block around
final_tensor/input_tensor to perform this check and use the module logger or
raise the error so silent masking is avoided.
🧹 Nitpick comments (5)
dsperse/src/run/runner.py (5)
322-329: Consider handling malformed JSON gracefully.

If `tiled_info.json` exists but contains invalid JSON, `json.load()` will raise an exception that propagates uncaught. For robustness, consider wrapping in a try-except to log a warning and fall back to `None`.

💡 Suggested improvement

```diff
 def _load_tiled_info(self):
     tiled_info_path = self.slices_path / "tiled" / "tiled_info.json"
     if tiled_info_path.exists():
-        with open(tiled_info_path, 'r') as f:
-            self.tiled_info = json.load(f)
-        logger.info(f"Loaded tiled info for slices: {self.tiled_info.get('tiled_indices', [])}")
+        try:
+            with open(tiled_info_path, 'r') as f:
+                self.tiled_info = json.load(f)
+            logger.info(f"Loaded tiled info for slices: {self.tiled_info.get('tiled_indices', [])}")
+        except json.JSONDecodeError as e:
+            logger.warning(f"Failed to parse tiled_info.json: {e}")
+            self.tiled_info = None
     else:
         self.tiled_info = None
```

388-396: Prefix unused variable with underscore.

The `tile_output_name` is unpacked but never used, as flagged by static analysis (RUF059).

♻️ Proposed fix

```diff
 for future in as_completed(futures):
-    tile_idx, tile_output_name, ok, result = future.result()
+    tile_idx, _tile_output_name, ok, result = future.result()
     if not ok:
         raise Exception(f"Tile {tile_idx} failed: {result}")
```

484-485: Prefer `next(iter(...))` over list slice.

Static analysis (RUF015) suggests using `next(iter(result['output_tensors'].values()))` for better efficiency.

♻️ Proposed fix

```diff
-first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
```

537-538: Broad exception catch is acceptable here but could be narrowed.

Static analysis flags the blind `Exception` catch (BLE001). While acceptable for wrapping external ONNX calls, consider catching more specific exceptions (e.g., `RuntimeError`, `onnxruntime.OrtException`) if the failure modes are known.

387-396: Consider adding a timeout to parallel tile execution.

The `ThreadPoolExecutor` runs tiles without a timeout. If a tile hangs indefinitely (e.g., due to model issues), the entire run will block forever. For production robustness, consider adding a timeout to `future.result()`.

💡 Suggested improvement

```diff
+TILE_TIMEOUT_SECONDS = 300  # 5 minutes per tile
+
 with ThreadPoolExecutor(max_workers=min(32, num_tiles)) as executor:
     futures = {executor.submit(run_single_tile, ti): ti for ti in tile_infos}
     for future in as_completed(futures):
-        tile_idx, _tile_output_name, ok, result = future.result()
+        try:
+            tile_idx, _tile_output_name, ok, result = future.result(timeout=TILE_TIMEOUT_SECONDS)
+        except TimeoutError:
+            ti = futures[future]
+            raise Exception(f"Tile {ti['tile_idx']} timed out after {TILE_TIMEOUT_SECONDS}s")
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dsperse/src/run/runner.py
🧰 Additional context used
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py
340-340: Local variable output_name is assigned to but never used
Remove assignment to unused variable output_name
(F841)
357-357: Create your own exception
(TRY002)
357-357: Avoid specifying long messages outside the exception class
(TRY003)
390-390: Unpacked variable tile_output_name is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
392-392: Create your own exception
(TRY002)
392-392: Avoid specifying long messages outside the exception class
(TRY003)
414-414: Create your own exception
(TRY002)
414-414: Avoid specifying long messages outside the exception class
(TRY003)
457-457: Avoid specifying long messages outside the exception class
(TRY003)
484-484: Prefer next(iter(result['output_tensors'].values())) over single element slice
Replace with next(iter(result['output_tensors'].values()))
(RUF015)
495-495: Create your own exception
(TRY002)
495-495: Avoid specifying long messages outside the exception class
(TRY003)
537-537: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (1)
dsperse/src/run/runner.py (1)
46-47: LGTM! Clean initialization of the `tiled_info` attribute for later population.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @dsperse/src/run/runner.py:
- Around line 322-330: The _load_tiled_info method should handle malformed JSON:
wrap the json.load call for tiled_info_path in a try/except that catches
json.JSONDecodeError (and optionally ValueError), log a warning via logger
including the file path (tiled_info_path) and the exception message, and set
self.tiled_info = None instead of letting the exception propagate; keep the
existing exists() check and the info log only when parsing succeeds.
- Around line 389-401: The tile worker loop writes to the shared tensor_cache
from multiple threads causing a race condition; wrap all modifications to
tensor_cache (the block that iterates result['output_tensors'] in the loop that
handles future.result() from run_single_tile) with a threading.Lock (create a
lock near where ThreadPoolExecutor is created and acquire/release around the
write) to ensure atomic updates, and rename the unused tile_output_name variable
to _tile_output_name to satisfy static analysis; leave tile_results assignment
as-is but do it while holding the lock if it depends on tensor_cache
consistency.
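The locking pattern suggested in this prompt can be sketched as follows. The `run_single_tile` body and its payload shape here are stand-ins, not the project's actual signature; the point is creating the lock alongside the executor and holding it for every write to the shared cache:

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

tensor_cache = {}
cache_lock = threading.Lock()  # created near the executor, as suggested

def run_single_tile(tile_idx):
    # Stand-in for the real tile inference worker:
    # returns (index, success flag, named output tensors).
    return tile_idx, True, {f"tile_out_{tile_idx}": [float(tile_idx)]}

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(run_single_tile, i): i for i in range(8)}
    for future in as_completed(futures):
        tile_idx, ok, output_tensors = future.result()
        if not ok:
            raise RuntimeError(f"Tile {tile_idx} failed")
        with cache_lock:  # atomic update of the shared cache
            tensor_cache.update(output_tensors)
```

Here the writes already happen on the main thread, so the lock mainly future-proofs the code against workers writing to the cache directly.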
🧹 Nitpick comments (2)
dsperse/src/run/runner.py (2)
339-344: Remove or use the unused `output_name` variable.

`output_name` is assigned on line 340 but never used, as flagged by static analysis.

♻️ Suggested fix

```diff
 input_name = tiling["input_name"]
-output_name = tiling["output_name"]
 input_tensor = tensor_cache.get(input_name)
```

491-492: Prefer `next(iter(...))` over single-element slice.

Per static analysis (RUF015), using `next(iter(...))` is more efficient than creating a list for a single element.

♻️ Suggested fix

```diff
-first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dsperse/src/run/runner.py
🧰 Additional context used
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py
340-340: Local variable output_name is assigned to but never used
Remove assignment to unused variable output_name
(F841)
343-343: Avoid specifying long messages outside the exception class
(TRY003)
359-359: Create your own exception
(TRY002)
359-359: Avoid specifying long messages outside the exception class
(TRY003)
379-379: Avoid specifying long messages outside the exception class
(TRY003)
394-394: Unpacked variable tile_output_name is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
396-396: Create your own exception
(TRY002)
396-396: Avoid specifying long messages outside the exception class
(TRY003)
411-411: Avoid specifying long messages outside the exception class
(TRY003)
421-421: Create your own exception
(TRY002)
421-421: Avoid specifying long messages outside the exception class
(TRY003)
464-464: Avoid specifying long messages outside the exception class
(TRY003)
491-491: Prefer next(iter(result['output_tensors'].values())) over single element slice
Replace with next(iter(result['output_tensors'].values()))
(RUF015)
502-502: Create your own exception
(TRY002)
502-502: Avoid specifying long messages outside the exception class
(TRY003)
546-546: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (5)
dsperse/src/run/runner.py (5)
9-9: LGTM! The `ThreadPoolExecutor` import and `tiled_info` attribute initialization are appropriate for the new parallel tiled execution feature.

Also applies to: 46-47

405-438: LGTM! The concat stage logic is well-structured with proper input validation and timing metrics. The error handling follows the pattern established elsewhere in this method.

446-451: LGTM! The tensor cache initialization and tiled indices extraction properly handle the case when `tiled_info` is `None`.

505-518: LGTM! The final tensor handling provides a reasonable fallback when no output tensor is produced, and the softmax logic handles the expected output dimensions.

527-552: LGTM! The multi-input slice execution follows the established error handling pattern used in `_run_ezkl_slice` and `_run_jstprove_slice`, where broad exception catching ensures graceful degradation with structured error info.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @dsperse/src/run/runner.py:
- Around line 491-493: The current construction of extra_tensors = {name:
tensor_cache.get(name) for name in filtered_inputs} can introduce None values if
tensor_cache lacks a key, which will then be forwarded into
_run_slice_multi_input / run_inference_multi and trigger confusing ONNX errors;
update the call-site so you first detect missing = [n for n in filtered_inputs
if n not in tensor_cache] and either raise/log a clear error (including the
missing names) or omit missing entries and fail-fast, then build extra_tensors
only from present keys (e.g., {n: tensor_cache[n] for n in filtered_inputs if n
in tensor_cache}) before calling _run_slice_multi_input to avoid passing None
values.
🧹 Nitpick comments (2)
dsperse/src/run/runner.py (2)
344-348: Remove unused variable `output_name`. The variable `output_name` on line 345 is assigned but never used, per the static analysis hint.
Suggested fix

```diff
  input_name = tiling["input_name"]
- output_name = tiling["output_name"]
  input_tensor = tensor_cache.get(input_name)
```
500-502: Minor: prefer `next(iter(...))` over list slice. Per static analysis, using `next(iter(...))` is more efficient than creating a list just to access the first element.
Suggested fix

```diff
- first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+ first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
```
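A quick standalone demo of the pattern behind this suggestion (the dict here is illustrative, not the runner's actual data): `next(iter(...))` reads the first value without building a throwaway list, and accepts a default for the empty case.

```python
# next(iter(...)) stops after the first element; list(...)[0] copies everything.
output_tensors = {"conv_out": [1.0, 2.0], "relu_out": [3.0]}

first_via_list = list(output_tensors.values())[0]     # allocates a full list
first_via_iter = next(iter(output_tensors.values()))  # stops after one item

assert first_via_list == first_via_iter == [1.0, 2.0]

# With a default, the iterator form also handles the empty dict cleanly:
assert next(iter({}.values()), None) is None
```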
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dsperse/src/run/runner.py
🧰 Additional context used
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py
345-345: Local variable output_name is assigned to but never used
Remove assignment to unused variable output_name
(F841)
348-348: Avoid specifying long messages outside the exception class
(TRY003)
364-364: Create your own exception
(TRY002)
364-364: Avoid specifying long messages outside the exception class
(TRY003)
384-384: Avoid specifying long messages outside the exception class
(TRY003)
402-402: Create your own exception
(TRY002)
402-402: Avoid specifying long messages outside the exception class
(TRY003)
418-418: Avoid specifying long messages outside the exception class
(TRY003)
428-428: Create your own exception
(TRY002)
428-428: Avoid specifying long messages outside the exception class
(TRY003)
474-474: Avoid specifying long messages outside the exception class
(TRY003)
501-501: Prefer next(iter(result['output_tensors'].values())) over single element slice
Replace with next(iter(result['output_tensors'].values()))
(RUF015)
512-512: Create your own exception
(TRY002)
512-512: Avoid specifying long messages outside the exception class
(TRY003)
556-556: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (5)
dsperse/src/run/runner.py (5)
8-10: LGTM! The threading and concurrent.futures imports are appropriate for the parallel tile execution feature.
47-48: LGTM! Proper initialization of the `tiled_info` attribute.
323-334: LGTM! Good defensive handling with graceful fallback on JSON parse errors.
372-407: Thread-safe cache access pattern looks correct. The implementation correctly:
- Populates `tensor_cache` with split outputs before spawning tile threads
- Ensures tile workers only read from the cache (via `.get()`)
- Uses `cache_lock` for concurrent writes to `tensor_cache`

The `min(32, num_tiles)` worker limit is a reasonable cap. Consider making this configurable if needed for different hardware profiles.
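The invariant described above can be sketched in a few lines. This is an illustrative reduction, not the runner's actual API: `tensor_cache`, `run_tile`, and the per-tile "inference" are stand-ins.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

num_tiles = 8
# The split stage writes every tile input BEFORE the pool starts.
tensor_cache = {f"tile_{i}_input": i for i in range(num_tiles)}
cache_lock = threading.Lock()

def run_tile(i: int) -> None:
    # Unlocked read is safe: this key existed before any worker was spawned
    # and no worker writes to input keys.
    value = tensor_cache.get(f"tile_{i}_input")
    result = value * 2  # stand-in for per-tile inference
    with cache_lock:  # concurrent writes to the shared dict hold the lock
        tensor_cache[f"tile_{i}_output"] = result

with ThreadPoolExecutor(max_workers=min(32, num_tiles)) as pool:
    list(pool.map(run_tile, range(num_tiles)))

# The concat stage runs only after the pool exits, so all outputs are present.
assert [tensor_cache[f"tile_{i}_output"] for i in range(num_tiles)] == [0, 2, 4, 6, 8, 10, 12, 14]
```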
537-562: LGTM with a note on exception handling. The method follows the same pattern as other slice runners. Catching broad `Exception` (line 556) is pragmatic here since ONNX Runtime can raise various exception types. The error is properly captured and returned.
Actionable comments posted: 0
🧹 Nitpick comments (6)
dsperse/src/run/runner.py (6)
323-334: Consider catching broader file I/O errors. The method catches `json.JSONDecodeError`, but other file errors (e.g., `PermissionError`, `IOError`) would propagate and potentially crash the runner. Since tiling is optional, these should also gracefully degrade.
Suggested improvement

```diff
 if tiled_info_path.exists():
     try:
         with open(tiled_info_path, 'r') as f:
             self.tiled_info = json.load(f)
         logger.info(f"Loaded tiled info for slices: {self.tiled_info.get('tiled_indices', [])}")
-    except json.JSONDecodeError as e:
-        logger.warning(f"Failed to parse tiled_info.json: {e}. Tiling disabled.")
+    except (json.JSONDecodeError, OSError) as e:
+        logger.warning(f"Failed to load tiled_info.json: {e}. Tiling disabled.")
         self.tiled_info = None
```
344-346: Remove unused variable `output_name`. The variable `output_name` is assigned but never used, as flagged by static analysis.
Suggested fix

```diff
  input_name = tiling["input_name"]
- output_name = tiling["output_name"]
  input_tensor = tensor_cache.get(input_name)
```
394-407: Thread-safety pattern looks correct but is fragile. The current concurrency pattern works because:
- Split writes all tile inputs before `ThreadPoolExecutor` starts
- Each tile reads only its pre-populated input (no concurrent writes to those keys)
- Tile outputs are written with `cache_lock` protection
- Concat runs after all tiles complete

However, the read on line 382 (`tensor_cache.get(tile_input_name)`) occurs without the lock. While safe given the current execution order, this implicit dependency could break if the code evolves. Consider adding a comment documenting this invariant.
504-505: Use `next(iter(...))` for single-element access. Per static analysis, prefer `next(iter(...))` over creating a list for single-element access.
Suggested fix

```diff
- first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+ first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
```
364-364: Consider custom exception types for tiled execution errors. Multiple places raise generic `Exception` with descriptive messages. Creating specific exception classes (e.g., `TiledExecutionError`, `SplitError`, `ConcatError`) would improve error handling and allow callers to catch specific failure modes.
Example custom exception

```python
class TiledExecutionError(Exception):
    """Raised when tiled slice execution fails."""
    def __init__(self, slice_idx: int, stage: str, message: str):
        self.slice_idx = slice_idx
        self.stage = stage  # 'split', 'tile', 'concat'
        super().__init__(f"Tiled execution failed at {stage} for slice {slice_idx}: {message}")
```

Also applies to: 402-402, 428-428
472-480: Edge case: empty `output_names` leaves `final_tensor` unchanged. If `output_names` is empty (line 478), `final_tensor` won't be updated from the tiled execution. While the metadata should always include outputs, this could cause unexpected behavior if metadata is malformed. The non-tiled path (lines 500-511) has more robust handling.
Suggested defensive check

```diff
 if ok and output_names:
     final_tensor = tensor_cache.get(output_names[0])
+elif ok:
+    logger.warning(f"Tiled slice {slice_idx} has no output_names in metadata")
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dsperse/src/run/runner.py
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/run/runner.py (4)
- dsperse/src/utils/utils.py (3): `Utils` (12-362), `write_input` (127-131), `read_input` (134-171)
- dsperse/src/backends/onnx_models.py (3): `OnnxModels` (14-255), `run_inference` (49-59), `run_inference_multi` (76-105)
- dsperse/src/backends/pth_models.py (1): `run_inference` (37-75)
- dsperse/src/run/utils/runner_utils.py (1): `RunnerUtils` (16-411)
🪛 Ruff (0.14.10)
dsperse/src/run/runner.py
345-345: Local variable output_name is assigned to but never used
Remove assignment to unused variable output_name
(F841)
348-348: Avoid specifying long messages outside the exception class
(TRY003)
364-364: Create your own exception
(TRY002)
364-364: Avoid specifying long messages outside the exception class
(TRY003)
384-384: Avoid specifying long messages outside the exception class
(TRY003)
402-402: Create your own exception
(TRY002)
402-402: Avoid specifying long messages outside the exception class
(TRY003)
418-418: Avoid specifying long messages outside the exception class
(TRY003)
428-428: Create your own exception
(TRY002)
428-428: Avoid specifying long messages outside the exception class
(TRY003)
474-474: Avoid specifying long messages outside the exception class
(TRY003)
494-494: Avoid specifying long messages outside the exception class
(TRY003)
504-504: Prefer next(iter(result['output_tensors'].values())) over single element slice
Replace with next(iter(result['output_tensors'].values()))
(RUF015)
515-515: Create your own exception
(TRY002)
515-515: Avoid specifying long messages outside the exception class
(TRY003)
559-559: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (3)
dsperse/src/run/runner.py (3)
8-10: LGTM! The new imports for threading and concurrent futures are appropriate for the parallel tile execution pattern, and the instance variable initialization is correctly placed.
Also applies to: 47-48
518-521: Verify fallback behavior: using the input tensor as output may mask errors. When no output tensor is produced, the code falls back to using the input tensor and only logs a warning. This could silently produce incorrect predictions if the execution chain fails to propagate outputs correctly. Consider whether this should raise an error instead, or at least include this in the results metadata so callers can detect it.
540-565: LGTM! The multi-input slice runner follows the established pattern in this file. The broad exception catch (line 559) is appropriate here for inference robustness, returning structured error info rather than crashing the pipeline.
Removed the `metadata` module and replaced its references with direct imports from `analyzers.schema`. Enhanced ONNX slicing by introducing shape inference, slicing point refinements, and runtime tracing utilities to improve slicing artifact accuracy and workflow efficiency.
…nce slice point refinement, and update tests for channel splitting and tiling workflows
Optimized channel splitting logic by removing redundant spatial tiling checks and improving constraint handling. Enhanced ONNX slicing with temporary model tracing, cleanup utilities, and refined error handling during slicing.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dsperse/src/prover.py (1)
245-281: Add guard for EZKL runner availability. At line 269, `self.ezkl_runner` can be `None` if the EZKL CLI failed to initialize (set at line 152). Calling `.prove()` without checking will raise `AttributeError`. The JSTprove path already has this guard at line 258; apply the same pattern here.
Proposed fix

```diff
+ if self.ezkl_runner is None:
+     return False, ExecutionMethod.EZKL_PROVE, "EZKL CLI not available"
  if not pk_path or not os.path.exists(pk_path):
      return False, ExecutionMethod.EZKL_PROVE, f"Proving key not found at {pk_path}"
```
🤖 Fix all issues with AI agents
In `@dsperse/src/analyzers/onnx_analyzer.py`:
- Around line 543-571: The _get_tensor_shape function is incorrectly appending
graph.initializer dims into TensorShape.input (in the loop over
graph.initializer); remove that behavior so initializers are not added to
inputs, and instead, if you need to represent weights, collect initializer
shapes into a separate WeightShape structure or list (e.g., weight_shapes) and
return/use it where expected; update references to TensorShape(input=...,
output=...) so only graph.input shapes and graph.output shapes are included and
initializers are not mixed into inputs.
In `@dsperse/src/run/utils/runner_utils.py`:
- Around line 180-188: The current availability checks (has_jst / has_ezkl) only
consider meta.jstprove_circuit_path and meta.ezkl_circuit_path but the preparers
(_prepare_jstprove_meta and _prepare_ezkl_meta) accept meta.circuit_path as a
fallback; update the gating logic so backends are marked available when either
their specific circuit path OR meta.circuit_path is set and the corresponding
runner exists (e.g., include meta.circuit_path in the has_jst and has_ezkl
predicates alongside runner.jstprove_runner / runner.ezkl_runner), then preserve
adding Backend.JSTPROVE, Backend.EZKL, and Backend.ONNX to available as before.
In `@dsperse/src/slice/onnx_slicer.py`:
- Around line 33-41: The cleanup currently deletes the temporary traced file but
leaves self.onnx_path pointing to that deleted path; update the cleanup logic in
OnnxSlicer (e.g., in slice_model's teardown or a dedicated cleanup/reset method)
to restore the original path by saving the original value before replacing it
(store e.g., self._original_onnx_path when you set self._traced_model_path) and
then after removing the traced file set self.onnx_path =
self._original_onnx_path and clear/reset any traced analysis state
(self.traced_shapes, self.traced_dtypes, self._traced_model_path, and
self._original_onnx_path), or alternatively avoid deleting the traced file and
keep it for the instance lifetime—pick one consistent approach and implement it
in the cleanup code that currently removes self._traced_model_path so subsequent
calls to slice_model can reuse the object without file-not-found errors.
In `@dsperse/src/slice/utils/onnx_utils.py`:
- Around line 213-230: The isolate_conv function currently returns an unordered
set (updated_points) which leads to nondeterministic slice ordering; change it
to return a deterministically ordered collection by converting updated_points to
a sorted list of integers (e.g., sorted(updated_points)) before returning so
downstream slicing receives stable, ascending slice indices; ensure the return
type matches callers expecting a list and that the sorting is numeric (use
key=int if needed) while keeping the logic around nodes_dict, max_idx and index
additions intact.
- Around line 66-75: Move the onnxruntime.tools import into the try block inside
apply_symbolic_shape_inference so ImportError is caught; explicitly catch
ImportError and call logger.warning with a clear message that the optional ORT
shape inference tool is unavailable, and for other exceptions use
logger.exception to preserve stack traces when
symbolic_shape_infer.SymbolicShapeInference.infer_shapes(onnx_model) fails; keep
returning the original onnx_model on failure.
♻️ Duplicate comments (25)
dsperse/src/analyzers/runner_analyzer.py (1)
66-73: Per-slice routing heuristic is still too broad. The heuristic (`slice_metadata` present and `layers` absent) can still misroute model-level entries that include `slice_metadata`. Consider a stricter discriminator (e.g., require per-slice-only keys and exclude model-level keys).
dsperse/src/utils/utils.py (1)
18-26: Guard against unconditional IR/opset overrides. This still force-writes IR 9 + opset versions. If a model targets a higher IR/opset, this silently downgrades and can break compatibility. Consider a `force_compat=False` flag and only upgrade versions unless explicitly forced.
🐛 Suggested adjustment

```diff
-    def save_onnx_model(model: onnx.ModelProto, path: str | Path, opset_version: int = 18):
+    def save_onnx_model(model: onnx.ModelProto, path: str | Path, opset_version: int = 18, force_compat: bool = False):
         """Save ONNX model with compatible IR version (9) and specified opset."""
-        model.ir_version = 9
+        if force_compat:
+            model.ir_version = 9
         if model.opset_import:
             for opset in model.opset_import:
                 if opset.domain == "" or opset.domain == "ai.onnx":
-                    opset.version = opset_version
+                    if force_compat or opset.version < opset_version:
+                        opset.version = opset_version
```

ONNX IR version 9 opset 18 compatibility matrix and guidance on upgrading/downgrading opset_import
dsperse/src/verifier.py (2)
431-433: Split one-line conditional returns (lint). Lines 431–433 still use single-line `if` statements, which triggers lint failures in common setups.
🛠️ Proposed fix

```diff
- if detected == "dslice": return self.verify_dslice(run_path, model_path, backend=backend)
- if detected == "dsperse": return self.verify_dsperse(run_path, model_path, backend=backend)
- if detected == "dirs": return self.verify_dirs(run_path, model_path, backend=backend)
+ if detected == "dslice":
+     return self.verify_dslice(run_path, model_path, backend=backend)
+ if detected == "dsperse":
+     return self.verify_dsperse(run_path, model_path, backend=backend)
+ if detected == "dirs":
+     return self.verify_dirs(run_path, model_path, backend=backend)
```
490-506: Guard missing backend runners in single-slice verification. Lines 490–506 call `self.jstprove_runner.verify` / `self.ezkl_runner.verify` without checking for `None`, which can raise `AttributeError` when the CLI isn't available.
🛠️ Proposed fix

```diff
  if preferred == Backend.JSTPROVE:
+     if self.jstprove_runner is None:
+         error_msg = "JSTprove CLI not available"
+     else:
          circuit_path = Utils.resolve_under_slice(slice_dir, meta.jstprove_circuit_path or meta.circuit_path)
          input_path, output_path = Path(run_path) / "input.json", Path(run_path) / "output.json"
          wf = self._get_witness_file_from_run(run_path, slice_id)
          witness_path = Path(wf) if wf else (Path(run_path) / "output_witness.bin")
          try:
              success = self.jstprove_runner.verify(str(proof_path), str(circuit_path), str(input_path), str(output_path), str(witness_path))
          except Exception as e:
              error_msg = str(e)
  else:
+     if self.ezkl_runner is None:
+         error_msg = "EZKL CLI not available"
+     else:
          vk_path = Utils.resolve_under_slice(slice_dir, meta.vk_path)
          settings_path = Utils.resolve_under_slice(slice_dir, meta.settings_path)
          try:
              success = self.ezkl_runner.verify(str(proof_path), settings_path=settings_path, vk_path=vk_path)
          except Exception as e:
              error_msg = str(e)
```

dsperse/src/prover.py (1)
19-52: Honor `output_path` for tiled proofs in the worker path. Line 51 hardcodes `run_path/.../proof.json`, ignoring `output_path`. This breaks CLI output redirection when using parallel workers.
🛠️ Proposed fix

```diff
- (slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling_info, run_path, slice_dir) = args
+ (slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling_info, output_path, run_path, slice_dir) = args
@@
- tile_proof_path = Path(run_path) / slice_id / tile_name / "proof.json"
- os.makedirs(tile_proof_path.parent, exist_ok=True)
+ proof_rel = os.path.join(slice_id, tile_name)
+ tile_proof_path = Utils.proof_output_path(Path(run_path), proof_rel, output_path)
+ os.makedirs(tile_proof_path.parent, exist_ok=True)
@@
- work_items.append((slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling.to_dict() if tiling else None, str(run_path), str(slice_dir)))
+ work_items.append((slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling.to_dict() if tiling else None, output_path, str(run_path), str(slice_dir)))
```

Also applies to: 423-423
dsperse/src/compile/compiler.py (3)
135-139: Use `payload/` for non-channel-split output dirs. Lines 135–139 write outputs to `slice_dir/<backend>`, which diverges from the payload layout used elsewhere (and expected by tests/consumers).
🛠️ Proposed fix

```diff
- if tiling_info:
-     output_dir = os.path.join(slice_dir, be, "tiles")
- else:
-     output_dir = os.path.join(slice_dir, be)
+ if tiling_info:
+     output_dir = os.path.join(slice_dir, "payload", be, "tiles")
+ else:
+     output_dir = os.path.join(slice_dir, "payload", be)
```
629-650: Don't silently skip tiled slices when the tile path is missing. Lines 629–650 currently `continue`, which drops the slice from compilation and metadata updates. This should be a hard failure (or an explicit fallback record) so the error isn't masked.
🛠️ Proposed fix (fail fast)

```diff
- logger.warning(f"Slice {idx}: Tiled but tile path not found at {tile_path}, skipping")
- print(f"[compile] Slice {idx}: tile.onnx not found at {tile_path}, skipping tiled slice")
- continue
+ msg = f"Slice {idx}: tiled but tile path not found at {tile_path}"
+ logger.error(msg)
+ raise FileNotFoundError(msg)
@@
- logger.warning(f"Slice {idx}: Tiled but tile path missing in metadata, skipping")
- print(f"[compile] Slice {idx}: tiled but tile path missing, skipping")
- continue
+ msg = f"Slice {idx}: tiled but tile path missing in metadata"
+ logger.error(msg)
+ raise FileNotFoundError(msg)
```
708-720: `compiled_count` should increment once per slice, not per backend. Line 720 increments inside the per-backend loop, so it over-counts when multiple backends compile a single slice.
🛠️ Proposed fix

```diff
- for be, comp_block in compilation_blocks.items():
+ slice_compiled = False
+ for be, comp_block in compilation_blocks.items():
      ...
-     compiled_count += 1
+     slice_compiled = True
+
+ if slice_compiled:
+     compiled_count += 1
```

dsperse/src/run/utils/runner_utils.py (2)
120-120: Use an explicit Optional in the `run_onnx_slice` signature. Line 120 uses `slice_dir: Path = None`, which violates PEP 484. Prefer `Path | None`.
🛠️ Proposed fix

```diff
- def run_onnx_slice(meta: RunSliceMetadata, input_tensor_path, output_tensor_path, slice_dir: Path = None):
+ def run_onnx_slice(meta: RunSliceMetadata, input_tensor_path, output_tensor_path, slice_dir: Path | None = None):
```
404-418: Avoid direct `tensor["output"]` access. Line 405 raises `KeyError` if the result dict lacks `"output"`. Use `extract_output_tensor` (already added) or guard explicitly.
🛠️ Proposed fix

```diff
- output = tensor["output"]
+ output = RunnerUtils.extract_output_tensor(tensor)
+ if output is None:
+     raise ValueError("Missing output tensor in result")
```

dsperse/src/compile/utils/compiler_utils.py (3)
271-279: Drop the unused `success` parameter. It's unused, and the compiled state already lives in `compilation_info`. Removing it reduces API noise.
♻️ Suggested fix

```diff
- def update_slice_metadata(idx: int, filepath: str | Path, success: bool, compilation_info: Dict[str, Any], backend_name: str = Backend.EZKL):
+ def update_slice_metadata(idx: int, filepath: str | Path, compilation_info: Dict[str, Any], backend_name: str = Backend.EZKL):
```
136-149: JSTprove success detection still keys off `circuit`. JSTprove blocks use `compiled`, so this path can misclassify success. Please switch the detection to `compiled` (and keep the no-vk/pk check).
🔧 Suggested fix

```diff
- if compilation_data.get('circuit') and not compilation_data.get('vk_key'):
+ if compilation_data.get('compiled') and not compilation_data.get('vk_key'):
      return _ok('compiled') and _ok('settings')
```
470-477: Tiled compilation block still only emits `tile_0`. When `num_tiles > 1`, `tile_count` doesn't match the `files` map. Either emit one entry per tile or set `tile_count` to 1.
🔧 Suggested fix (populate all tile entries)

```diff
 if tiling_info:
     num_tiles = tiling_info.get("num_tiles", 1)
-    block.update({
+    tiles = {f"tile_{i}": pref_files for i in range(num_tiles)}
+    block.update({
         "tiled": True,
         "tile_size": tiling_info.get("tile_size"),
         "tile_count": num_tiles,
-        "files": {"tile_0": pref_files}
+        "files": tiles
     })
```

dsperse/src/analyzers/schema.py (2)
107-115: Guard against short `conv_out` lists. `conv_out[1]` will throw if the list has fewer than 2 entries. Add a defensive pad/default.
🔧 Suggested fix

```diff
 if not d:
     return None
 conv_out = d.get("conv_out", [0, 0])
+if isinstance(conv_out, list) and len(conv_out) < 2:
+    conv_out = [0, 0]
 return cls(
     path=d.get("path", ""),
     conv_out=(conv_out[0], conv_out[1]) if isinstance(conv_out, list) else conv_out,
 )
```
243-265: Defensive checks for `halo`/`out_tile`/`stride` list lengths. Short lists will raise `IndexError` during tuple construction. Pad/normalize first.
🔧 Suggested fix

```diff
 halo = d.get("halo", [0, 0])
 out_tile = d.get("out_tile", [0, 0])
 stride = d.get("stride", [1, 1])
+if isinstance(halo, list) and len(halo) < 2:
+    halo = [0, 0]
+if isinstance(out_tile, list) and len(out_tile) < 2:
+    out_tile = [0, 0]
+if isinstance(stride, list) and len(stride) < 2:
+    stride = [1, 1]
```

dsperse/src/slice/autotiler.py (3)
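A minimal illustration of the normalization suggested for these fields: a too-short list is replaced by its default before any `[0]`/`[1]` access, so `IndexError` cannot occur. The `pad2` helper below is hypothetical, not part of the schema module.

```python
def pad2(value, default):
    """Return value unchanged unless it's a list shorter than 2 elements,
    in which case substitute the default (mirroring the suggested fix)."""
    if isinstance(value, list) and len(value) < 2:
        return list(default)
    return value

halo = pad2([], (0, 0))       # empty list -> default
stride = pad2([2], (1, 1))    # one element is still too short -> default
out_tile = pad2([3, 4], (0, 0))  # well-formed input passes through

assert (halo[0], halo[1]) == (0, 0)
assert (stride[0], stride[1]) == (1, 1)
assert (out_tile[0], out_tile[1]) == (3, 4)
```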
63-93: Tile size search still ignores asymmetric stride. `find_optimal_tile_size` only checks one stride, which can mismatch when `sh != sw`. Please enforce divisibility against both strides and pass a `(sh, sw)` tuple.
🔧 Suggested fix

```diff
-def find_optimal_tile_size(spatial_dim: int, target: int, min_tile: int = 7, stride: int = 1) -> int | None:
+def find_optimal_tile_size(
+    spatial_dim: int,
+    target: int,
+    min_tile: int = 7,
+    stride: tuple[int, int] = (1, 1),
+) -> int | None:
     """Find the largest divisor of spatial_dim that is <= target and >= min_tile."""
+    sh, sw = stride
     if min_tile <= target < spatial_dim:
         for tile in range(target, min_tile - 1, -1):
-            if spatial_dim % tile == 0 and tile % stride == 0:
+            if spatial_dim % tile == 0 and tile % sh == 0 and tile % sw == 0:
                 return tile
     return None
```

```diff
-    actual_spatial_tile_size = Autotiler.find_optimal_tile_size(spatial_h, target_tile, min_tile, stride=stride)
+    actual_spatial_tile_size = Autotiler.find_optimal_tile_size(spatial_h, target_tile, min_tile, stride=(sh, sw))
```
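The suggested search is easy to exercise standalone. This sketch follows the fix above (divisibility against both stride components); the signature is assumed for illustration, not verified against the actual Autotiler API.

```python
def find_optimal_tile_size(spatial_dim, target, min_tile=7, stride=(1, 1)):
    """Largest divisor of spatial_dim in [min_tile, target] divisible by both strides."""
    sh, sw = stride
    if min_tile <= target < spatial_dim:
        for tile in range(target, min_tile - 1, -1):  # search downward from target
            if spatial_dim % tile == 0 and tile % sh == 0 and tile % sw == 0:
                return tile
    return None

assert find_optimal_tile_size(224, 64) == 56                   # 224 = 4 * 56
assert find_optimal_tile_size(224, 64, stride=(2, 2)) == 56    # 56 is even, so it survives
assert find_optimal_tile_size(224, 64, stride=(3, 3)) is None  # 224 = 2^5 * 7: no divisor is a multiple of 3
```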
180-187: Channel splitting still allows elementwise ops. Channel-split paths sum group outputs, which breaks semantics for non-linear ops. Restrict to pure-Conv slices or apply elementwise ops after summing.
🔧 Minimal guard

```diff
 if conv_params['group'] != 1:
     return False
+ops = {n.op_type for n in model.graph.node}
+if ops != {"Conv"}:
+    return False
 return True
```
566-595: Don't assume the last non-Conv op produces the output. Mapping `tile_out` to the last elementwise node breaks on branching graphs, or when the output is produced earlier. Wire the output based on the original graph's output tensor instead.
🔧 Suggested fix

```diff
-for i, orig_node in enumerate(non_conv_ops):
+orig_output_name = m.graph.output[0].name if m.graph.output else "tile_out"
+for i, orig_node in enumerate(non_conv_ops):
     new_inputs = []
     for inp in orig_node.input:
         if any(inp == n.output[0] for n in m.graph.node if n.op_type == "Conv"):
             new_inputs.append("conv_out")
         elif inp == orig_input_name:
             new_inputs.append("tile_in")
         else:
             new_inputs.append(inp)
-
-    is_last = (i == len(non_conv_ops) - 1)
-    new_outputs = ["tile_out"] if is_last else list(orig_node.output)
+    new_outputs = [("tile_out" if o == orig_output_name else o) for o in orig_node.output]
     attr_kwargs = {a.name: helper.get_attribute_value(a) for a in orig_node.attribute}
     nodes.append(helper.make_node(orig_node.op_type, new_inputs, new_outputs, **attr_kwargs))
```

dsperse/src/run/runner.py (7)
81-82: Use explicit optional types in `__init__` parameters. These default to `None`, so make them `str | None` for clarity and PEP 484 compliance.
♻️ Proposed update

```diff
-    def __init__(self, run_metadata_path: str = None, save_metadata_path: str = None, parallel_tiles: int = 1,
-                 circuit_cache_dir: str = None, run_dir: str = None, resume: bool = False, resume_run_dir: str = None):
+    def __init__(self, run_metadata_path: str | None = None, save_metadata_path: str | None = None, parallel_tiles: int = 1,
+                 circuit_cache_dir: str | None = None, run_dir: str | None = None, resume: bool = False, resume_run_dir: str | None = None):
```
163-163: Make `slice_dir` explicitly optional in slice helpers.
♻️ Proposed update

```diff
- def _run_ezkl_slice(self, meta: RunSliceMetadata, input_tensor_path, output_witness_path, slice_dir: Path = None):
+ def _run_ezkl_slice(self, meta: RunSliceMetadata, input_tensor_path, output_witness_path, slice_dir: Path | None = None):
- def _run_jstprove_slice(self, meta: RunSliceMetadata, input_tensor_path, output_witness_path, slice_dir: Path = None):
+ def _run_jstprove_slice(self, meta: RunSliceMetadata, input_tensor_path, output_witness_path, slice_dir: Path | None = None):
```

Also applies to: 202-202
860-865: Avoid list allocation for the first output tensor.
♻️ Proposed tweak

```diff
- first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+ first_output = next(iter(result['output_tensors'].values()), None)
```
26-75: Add JST→EZKL→ONNX fallback in the parallel tile worker. When `has_jst` is true, the worker returns failure immediately on JST errors and never tries EZKL or ONNX, even though sequential execution does. This makes parallel runs strictly less robust and ignores the `tile_onnx_path` argument.
Please mirror the sequential fallback chain (JST → EZKL → ONNX) and use `tile_onnx_path` when both circuit backends fail.
247-248: Guard against `None` in tile method counting. `TileResult.method` is optional; calling `.startswith(...)` on `None` will raise.
🐛 Proposed fix

```diff
-def _count_tile_method(tiles, method_prefix):
-    return sum(1 for t in tiles if (t.method if isinstance(t, TileResult) else t.get("method", "")).startswith(method_prefix))
+def _count_tile_method(tiles, method_prefix):
+    prefix = str(method_prefix)
+    return sum(
+        1
+        for t in tiles
+        if str((t.method if isinstance(t, TileResult) else t.get("method", "")) or "").startswith(prefix)
+    )
```
645-685: Respect `force_backend` in channel-split execution. When `force_backend` is `JSTPROVE`, the current flow can still fall back to EZKL. Align this logic with the tiled path by gating backend availability on `force_backend`.
🐛 Suggested adjustment

```diff
-forced = self.force_backend
-has_jst = bool(group.jstprove_circuit_path) and self.jstprove_runner
-has_ezkl = bool(group.ezkl_circuit_path) and group.vk_path
+forced = self.force_backend or Backend.AUTO
+has_jst = bool(group.jstprove_circuit_path) and self.jstprove_runner and forced in (Backend.JSTPROVE, Backend.AUTO)
+has_ezkl = bool(group.ezkl_circuit_path) and group.vk_path and forced in (Backend.EZKL, Backend.AUTO)
@@
-if (forced == Backend.JSTPROVE and has_jst) or (has_jst and forced != Backend.EZKL):
+if has_jst:
     ...
 if has_ezkl:
     ...
```
951-954: Demo still uses dict access on `ExecutionInfo`. The `slice_results` values are `ExecutionInfo` objects, so `info['method']` will raise.
🐛 Proposed fix

```diff
-    for slice_id, info in results["slice_results"].items():
-        print(f"  {slice_id}: {info['method']}")
+    for slice_id, info in results["slice_results"].items():
+        method = info.method if hasattr(info, "method") else info.get("method", "unknown")
+        print(f"  {slice_id}: {method}")
```
🧹 Nitpick comments (5)
tests/unit/test_autotiler.py (2)
329-336: Silence unused locals in the tiled runner. These locals aren't used; prefixing them with underscores keeps lint noise down.
♻️ Proposed tweak

```diff
- sh, sw = stride
+ _sh, _sw = stride
@@
- _, c, h, w = input_tensor.shape
+ _, _c, _h, _w = input_tensor.shape
```
585-588: Prefer `next(...)` for single-element selection.
♻️ Proposed tweak

```diff
- conv_node = [n for n in tile_model.graph.node if n.op_type == "Conv"][0]
+ conv_node = next(n for n in tile_model.graph.node if n.op_type == "Conv")
```

tests/integration/test_slice.py (1)
256-285: Drop the unused `capfd` parameter.
♻️ Proposed tweak

```diff
- def test_slice_with_channel_splitting(self, tmp_path, capfd):
+ def test_slice_with_channel_splitting(self, tmp_path):
```

dsperse/src/slice/utils/onnx_utils.py (1)
695-722: Rename the unused parameter to make intent explicit. `root` isn't used in `_normalize_slice_dirs`; prefixing it with `_` (or removing it) avoids ARG004 noise.
♻️ Suggested tweak

```diff
- def _normalize_slice_dirs(idx: int, seg: dict, root: Path) -> Tuple[Path | None, Path | None]:
+ def _normalize_slice_dirs(idx: int, seg: dict, _root: Path) -> Tuple[Path | None, Path | None]:
```

dsperse/src/slice/autotiler.py (1)
99-123: Remove/underscore unused parameters to silence lint. `c_out`/`min_tile` in `calculate_channel_split_config` and `weights` in `_calculate_tile_slice_dims` are unused. Prefix with `_` (or drop them) to make intent explicit.
♻️ Suggested tweak

```diff
-def calculate_channel_split_config(
-    c_in: int,
-    c_out: int,
-    spatial_h: int,
-    spatial_w: int,
-    tile_size: int,
-    min_tile: int = 7,
-) -> tuple[int | None, int | None, str | None]:
+def calculate_channel_split_config(
+    c_in: int,
+    _c_out: int,
+    spatial_h: int,
+    spatial_w: int,
+    tile_size: int,
+    _min_tile: int = 7,
+) -> tuple[int | None, int | None, str | None]:
@@
-def _calculate_tile_slice_dims(tile_size: int, conv_params: dict, weights: np.ndarray) -> dict:
+def _calculate_tile_slice_dims(tile_size: int, conv_params: dict, _weights: np.ndarray) -> dict:
```

Also applies to: 521-539
```diff
 @staticmethod
 def _get_tensor_shape(slice_path) -> TensorShape:
     inputs = []
     outputs = []

     if slice_path:
         onnx_model = onnx.load(slice_path)
         graph = onnx_model.graph
         for init in graph.initializer:
-            tensor_shape["input"].append(list(init.dims))
+            inputs.append(list(init.dims))
         for inp in graph.input:
             # Convert Dimension objects to simple values (string or number)
             dimensions = []
             for dim in inp.type.tensor_type.shape.dim:
                 if dim.HasField('dim_param'):
-                    dimensions.append(dim.dim_param)  # Use the string parameter directly
+                    dimensions.append(dim.dim_param)
                 else:
-                    dimensions.append(dim.dim_value)  # Use the numeric value directly
-            tensor_shape["input"].append(dimensions)
+                    dimensions.append(dim.dim_value)
+            inputs.append(dimensions)
         for out in graph.output:
             # Convert Dimension objects to simple values (string or number)
             dimensions = []
             for dim in out.type.tensor_type.shape.dim:
                 if dim.HasField('dim_param'):
-                    dimensions.append(dim.dim_param)  # Use the string parameter directly
+                    dimensions.append(dim.dim_param)
                 else:
-                    dimensions.append(dim.dim_value)  # Use the numeric value directly
-            tensor_shape["output"].append(dimensions)
+                    dimensions.append(dim.dim_value)
+            outputs.append(dimensions)

-    return tensor_shape
+    return TensorShape(input=inputs, output=outputs)
```
Do not include initializers in TensorShape inputs.
Lines 551‑553 append initializer dims to TensorShape.input. Per schema, TensorShape represents graph I/O shapes; weights belong in WeightShape. Including initializers can corrupt downstream input_shape checks.
🛠️ Proposed fix
- for init in graph.initializer:
- inputs.append(list(init.dims))
for inp in graph.input:
dimensions = []🤖 Prompt for AI Agents
In `@dsperse/src/analyzers/onnx_analyzer.py` around lines 543 - 571, The
_get_tensor_shape function is incorrectly appending graph.initializer dims into
TensorShape.input (in the loop over graph.initializer); remove that behavior so
initializers are not added to inputs, and instead, if you need to represent
weights, collect initializer shapes into a separate WeightShape structure or
list (e.g., weight_shapes) and return/use it where expected; update references
to TensorShape(input=..., output=...) so only graph.input shapes and
graph.output shapes are included and initializers are not mixed into inputs.
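The separation the fix asks for can be reduced to a small standalone sketch. The `TensorShape` dataclass and `split_shapes` helper below are illustrative stand-ins for the project's schema types, not the real definitions — the point is only that initializer (weight) dims land in a separate container and never enter `input`:

```python
from dataclasses import dataclass, field


@dataclass
class TensorShape:
    # Stand-in for the project's schema class: graph I/O shapes only.
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def split_shapes(graph_inputs, graph_outputs, initializers):
    """Return (TensorShape, weight_shapes); initializer dims never enter inputs."""
    shape = TensorShape(
        input=[list(dims) for dims in graph_inputs],
        output=[list(dims) for dims in graph_outputs],
    )
    # Weights are collected separately, mirroring a WeightShape structure.
    weight_shapes = [list(dims) for dims in initializers]
    return shape, weight_shapes
```

With this split, a downstream `input_shape` check only ever sees true graph inputs.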
 has_jst = bool(meta.jstprove_circuit_path) and getattr(runner, "jstprove_runner", None)
 has_ezkl = bool(meta.ezkl_circuit_path) and bool(meta.vk_path)

 available = []
 if has_jst:
     available.append(Backend.JSTPROVE)
 if has_ezkl:
     available.append(Backend.EZKL)
 available.append(Backend.ONNX)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's look at the actual code in runner_utils.py at lines 180-188
wc -l dsperse/src/run/utils/runner_utils.py
Repository: inference-labs-inc/dsperse
Length of output: 111

🏁 Script executed:
#!/bin/bash
# Get the exact code at the specified lines
sed -n '175,195p' dsperse/src/run/utils/runner_utils.py
Repository: inference-labs-inc/dsperse
Length of output: 1048

🏁 Script executed:
#!/bin/bash
# Search for RunSliceMetadata class definition to understand the structure
rg -n "class RunSliceMetadata" --type=py
Repository: inference-labs-inc/dsperse
Length of output: 129

🏁 Script executed:
#!/bin/bash
# Search for circuit path population and usage
rg -n "jstprove_circuit_path|ezkl_circuit_path|circuit_path" --type=py -B2 -A2 | head -100
Repository: inference-labs-inc/dsperse
Length of output: 7486

🏁 Script executed:
#!/bin/bash
# Get RunSliceMetadata class definition
sed -n '429,500p' dsperse/src/analyzers/schema.py
Repository: inference-labs-inc/dsperse
Length of output: 3113

🏁 Script executed:
#!/bin/bash
# Search for circuit_path usage in runner context
rg -n "circuit_path" dsperse/src/run/ --type=py -B2 -A2
Repository: inference-labs-inc/dsperse
Length of output: 9698

🏁 Script executed:
#!/bin/bash
# Check if there's any documentation or comments about circuit_path fields
rg -n "jstprove_circuit_path|ezkl_circuit_path" dsperse/src/analyzers/schema.py --type=py -B2 -A2
Repository: inference-labs-inc/dsperse
Length of output: 1606

🏁 Script executed:
#!/bin/bash
# Look at the full context around lines 180-188 in runner_utils.py
sed -n '170,220p' dsperse/src/run/utils/runner_utils.py
Repository: inference-labs-inc/dsperse
Length of output: 2562

🏁 Script executed:
#!/bin/bash
# Check where prepare_jstprove_metadata and prepare_ezkl_metadata are called
rg -n "prepare_jstprove_metadata|prepare_ezkl_metadata" dsperse/src/run/ --type=py -B2 -A2
Repository: inference-labs-inc/dsperse
Length of output: 52

🏁 Script executed:
#!/bin/bash
# Search for _prepare_jstprove_meta and _prepare_ezkl_meta calls in entire codebase
rg -n "_prepare_jstprove_meta|_prepare_ezkl_meta" --type=py
Repository: inference-labs-inc/dsperse
Length of output: 813

🏁 Script executed:
#!/bin/bash
# Let's see the full implementation of those prepare methods
sed -n '240,260p' dsperse/src/run/utils/runner_utils.py
Repository: inference-labs-inc/dsperse
Length of output: 1020

🏁 Script executed:
#!/bin/bash
# Check the run_onnx_slice, _run_jstprove_slice, and _run_ezkl_slice methods
rg -n "def run_onnx_slice|def _run_jstprove_slice|def _run_ezkl_slice" dsperse/src/run/ --type=py -A5
Repository: inference-labs-inc/dsperse
Length of output: 1904
Backend availability gating skips valid circuits with fallback paths.
Lines 180–181 check only meta.jstprove_circuit_path and meta.ezkl_circuit_path, but the backend preparation methods (_prepare_jstprove_meta and _prepare_ezkl_meta) explicitly fall back to meta.circuit_path when the specific paths are absent. This creates an inconsistency: backends won't be marked as available if only circuit_path is populated, causing the code to skip to ONNX even when ZK backends could be used. The verifier.py code already uses this fallback pattern.
🛠️ Proposed fix
- has_jst = bool(meta.jstprove_circuit_path) and getattr(runner, "jstprove_runner", None)
- has_ezkl = bool(meta.ezkl_circuit_path) and bool(meta.vk_path)
+ has_jst = bool(meta.jstprove_circuit_path or meta.circuit_path) and getattr(runner, "jstprove_runner", None)
+ has_ezkl = bool(meta.ezkl_circuit_path or meta.circuit_path) and bool(meta.vk_path)

🤖 Prompt for AI Agents
+ has_ezkl = bool(meta.ezkl_circuit_path or meta.circuit_path) and bool(meta.vk_path)🤖 Prompt for AI Agents
In `@dsperse/src/run/utils/runner_utils.py` around lines 180 - 188, The current
availability checks (has_jst / has_ezkl) only consider
meta.jstprove_circuit_path and meta.ezkl_circuit_path but the preparers
(_prepare_jstprove_meta and _prepare_ezkl_meta) accept meta.circuit_path as a
fallback; update the gating logic so backends are marked available when either
their specific circuit path OR meta.circuit_path is set and the corresponding
runner exists (e.g., include meta.circuit_path in the has_jst and has_ezkl
predicates alongside runner.jstprove_runner / runner.ezkl_runner), then preserve
adding Backend.JSTPROVE, Backend.EZKL, and Backend.ONNX to available as before.
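The fallback-aware gating the review proposes can be sketched in isolation. The metadata is modeled as a plain dict here and the backend names as strings; the real code uses a `RunSliceMetadata` object and a `Backend` enum, so these are illustrative stand-ins:

```python
def available_backends(meta: dict, has_jst_runner: bool) -> list[str]:
    """Gate backends on their specific circuit path OR the shared fallback path."""
    circuit_fallback = meta.get("circuit_path")
    has_jst = bool(meta.get("jstprove_circuit_path") or circuit_fallback) and has_jst_runner
    has_ezkl = bool(meta.get("ezkl_circuit_path") or circuit_fallback) and bool(meta.get("vk_path"))

    available = []
    if has_jst:
        available.append("jstprove")
    if has_ezkl:
        available.append("ezkl")
    available.append("onnx")  # plain ONNX inference is always the final fallback
    return available
```

With only `circuit_path` populated, the ZK backends now stay in the candidate list instead of the runner skipping straight to ONNX.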
 logger.info("Initializing analysis and tracing shapes...")
 self.traced_shapes, self.traced_dtypes = OnnxUtils.trace_shapes(self.onnx_model)
 OnnxUtils.apply_traced_shapes(self.onnx_model, self.traced_shapes, self.traced_dtypes)

 # Save a temporary traced model for the analyzer and tensor graph
 self._traced_model_path = self.onnx_path.replace(".onnx", "_traced.onnx")
 onnx.save(self.onnx_model, self._traced_model_path)
 self.onnx_path = self._traced_model_path
Cleanup leaves onnx_path pointing at a deleted file.
After slice_model finishes, the traced model is removed but self.onnx_path still points to it. Reusing the same OnnxSlicer instance will fail when it tries to read the deleted path. Restore the original path (and reset analysis) on cleanup, or keep the traced file for the object’s lifetime.
🛠️ Proposed fix
class OnnxSlicer:
def __init__(self, onnx_path, save_path=None):
self.onnx_path = onnx_path
+ self._original_onnx_path = onnx_path
self.save_path = save_path
@@
def _ensure_analysis(self):
@@
- self._traced_model_path = self.onnx_path.replace(".onnx", "_traced.onnx")
+ self._traced_model_path = self._original_onnx_path.replace(".onnx", "_traced.onnx")
onnx.save(self.onnx_model, self._traced_model_path)
self.onnx_path = self._traced_model_path
@@
finally:
if self._traced_model_path and os.path.exists(self._traced_model_path):
logger.info(f"Cleaning up temporary traced model: {self._traced_model_path}")
os.remove(self._traced_model_path)
+ # Restore original path and reset analysis for safe reuse
+ self.onnx_path = self._original_onnx_path
+ self.onnx_model = onnx.load(self._original_onnx_path)
+ self.analysis = None
+ self.onnx_analyzer = None

Also applies to: 189-201
🤖 Prompt for AI Agents
In `@dsperse/src/slice/onnx_slicer.py` around lines 33 - 41, The cleanup currently
deletes the temporary traced file but leaves self.onnx_path pointing to that
deleted path; update the cleanup logic in OnnxSlicer (e.g., in slice_model's
teardown or a dedicated cleanup/reset method) to restore the original path by
saving the original value before replacing it (store e.g.,
self._original_onnx_path when you set self._traced_model_path) and then after
removing the traced file set self.onnx_path = self._original_onnx_path and
clear/reset any traced analysis state (self.traced_shapes, self.traced_dtypes,
self._traced_model_path, and self._original_onnx_path), or alternatively avoid
deleting the traced file and keep it for the instance lifetime—pick one
consistent approach and implement it in the cleanup code that currently removes
self._traced_model_path so subsequent calls to slice_model can reuse the object
without file-not-found errors.
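The save/restore-on-cleanup pattern being requested can be reduced to a tiny standalone sketch. `PathSwapper` and its file handling are illustrative, not the slicer's real API — the point is that the original path is remembered before the swap and reinstated in `finally`, so the object stays reusable after the temp artifact is deleted:

```python
import os
import tempfile


class PathSwapper:
    """Toy stand-in for the slicer: swap in a temp artifact, then restore."""

    def __init__(self, path: str):
        self.path = path

    def run_with_temp(self) -> str:
        original = self.path
        fd, temp = tempfile.mkstemp(suffix=".onnx")
        os.close(fd)
        try:
            self.path = temp  # work against the traced copy
            return self.path
        finally:
            if os.path.exists(temp):
                os.remove(temp)  # delete the temp artifact...
            self.path = original  # ...but leave the object pointing at the original
```

A second call to `run_with_temp` works because `self.path` never dangles at a deleted file.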
 def apply_symbolic_shape_inference(onnx_model: onnx.ModelProto) -> onnx.ModelProto:
     """Apply symbolic shape inference to the model."""
     from onnxruntime.tools import symbolic_shape_infer
     logger.info("Applying shape inference to model...")
     try:
         onnx_model = symbolic_shape_infer.SymbolicShapeInference.infer_shapes(onnx_model)
         logger.info("Shape inference applied successfully")
     except Exception as e:
         logger.warning(f"Shape inference failed: {e}, continuing with original model")
     return onnx_model
🧩 Analysis chain
🏁 Script executed:
cat -n dsperse/src/slice/utils/onnx_utils.py | sed -n '60,80p'
Repository: inference-labs-inc/dsperse
Length of output: 1229

🏁 Script executed:
rg "from onnxruntime" dsperse/src --type py -n
Repository: inference-labs-inc/dsperse
Length of output: 169

🏁 Script executed:
find dsperse -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" | head -5
Repository: inference-labs-inc/dsperse
Length of output: 52

🏁 Script executed:
find . -maxdepth 2 -type f \( -name "*.txt" -o -name "*.toml" -o -name "*.cfg" -o -name "*.ini" \) | grep -E "(require|setup|pyproject|poetry)" | head -10
Repository: inference-labs-inc/dsperse
Length of output: 105

🏁 Script executed:
cat requirements.txt && echo "---" && cat pyproject.toml | grep -A 20 -B 5 "onnx"
Repository: inference-labs-inc/dsperse
Length of output: 50382

🏁 Script executed:
rg "logger\.(exception|error|warning)" dsperse/src --type py -B 2 -A 2 | head -80
Repository: inference-labs-inc/dsperse
Length of output: 4695

🏁 Script executed:
rg "logger\.exception" dsperse/src --type py
Repository: inference-labs-inc/dsperse
Length of output: 427

🏁 Script executed:
rg "logger\.exception" dsperse/src/backends -B 3 -A 1 --type py
Repository: inference-labs-inc/dsperse
Length of output: 1850
Handle missing ORT shape inference module and preserve exception stack traces.
If onnxruntime.tools.symbolic_shape_infer is unavailable, the import will fail before the try block and won't be caught. Additionally, using logger.warning for unexpected inference failures discards the stack trace. Consider moving the import inside the try block, handling ImportError explicitly with logger.warning (matching the pattern for optional tools), and using logger.exception to preserve stack context for actual inference failures.
Suggested fix
- from onnxruntime.tools import symbolic_shape_infer
- logger.info("Applying shape inference to model...")
- try:
- onnx_model = symbolic_shape_infer.SymbolicShapeInference.infer_shapes(onnx_model)
- logger.info("Shape inference applied successfully")
- except Exception as e:
- logger.warning(f"Shape inference failed: {e}, continuing with original model")
+ try:
+ from onnxruntime.tools import symbolic_shape_infer
+ logger.info("Applying shape inference to model...")
+ onnx_model = symbolic_shape_infer.SymbolicShapeInference.infer_shapes(onnx_model)
+ logger.info("Shape inference applied successfully")
+ except ImportError as e:
+ logger.warning(f"Symbolic shape inference unavailable: {e}; continuing with original model")
+ except Exception:
+     logger.exception("Shape inference failed; continuing with original model")

🧰 Tools
+ logger.exception("Shape inference failed; continuing with original model")🧰 Tools
🪛 Ruff (0.14.13)
73-73: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@dsperse/src/slice/utils/onnx_utils.py` around lines 66 - 75, Move the
onnxruntime.tools import into the try block inside
apply_symbolic_shape_inference so ImportError is caught; explicitly catch
ImportError and call logger.warning with a clear message that the optional ORT
shape inference tool is unavailable, and for other exceptions use
logger.exception to preserve stack traces when
symbolic_shape_infer.SymbolicShapeInference.infer_shapes(onnx_model) fails; keep
returning the original onnx_model on failure.
 def isolate_conv(slice_points, model_metadata):
     """
     Ensure each Conv layer gets its own isolated slice by adding slice points
     immediately after each Conv node.
     """
     updated_points = set(slice_points)
     nodes_dict = model_metadata.get("nodes", {})
     max_idx = max((n.get("index", 0) for n in nodes_dict.values()), default=0)

     for _node_name, node_info in nodes_dict.items():
         if node_info.get("node_type") == "Conv":
             idx = node_info.get("index")
             if idx is None:
                 continue
             updated_points.add(idx)
             if idx + 1 <= max_idx:
                 updated_points.add(idx + 1)
     return updated_points
Return deterministic, sorted slice points.
`set` iteration order is nondeterministic; downstream slicing expects ordered indices. Return a sorted list to avoid unstable slice boundaries.
🔧 Suggested fix
- return updated_points
+ return sorted(updated_points)

🤖 Prompt for AI Agents
+ return sorted(updated_points)🤖 Prompt for AI Agents
In `@dsperse/src/slice/utils/onnx_utils.py` around lines 213 - 230, The
isolate_conv function currently returns an unordered set (updated_points) which
leads to nondeterministic slice ordering; change it to return a
deterministically ordered collection by converting updated_points to a sorted
list of integers (e.g., sorted(updated_points)) before returning so downstream
slicing receives stable, ascending slice indices; ensure the return type matches
callers expecting a list and that the sorting is numeric (use key=int if needed)
while keeping the logic around nodes_dict, max_idx and index additions intact.
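Folding the `sorted()` return into the function quoted above gives a deterministic version. The metadata layout here mirrors what the hunk shows (`nodes` mapping names to `node_type`/`index` dicts); nothing else is assumed:

```python
def isolate_conv(slice_points, model_metadata):
    """Add slice points around each Conv node; return them in ascending order."""
    updated = set(slice_points)
    nodes = model_metadata.get("nodes", {})
    max_idx = max((n.get("index", 0) for n in nodes.values()), default=0)
    for info in nodes.values():
        if info.get("node_type") == "Conv":
            idx = info.get("index")
            if idx is None:
                continue
            updated.add(idx)
            if idx + 1 <= max_idx:
                updated.add(idx + 1)
    return sorted(updated)  # deterministic, ascending slice boundaries
```

Callers now always see the same ascending boundary list regardless of set iteration order.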
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration/test_compile.py (1)
13-23: Silence unused fixture args (`model_name`, `capfd`).
These are currently unused and will trigger Ruff ARG002.
🧹 Proposed fix
  if not jstprove_available or not ezkl_available:
      pytest.skip("Skipping compile happy path: required backends unavailable")
+ _ = (model_name, capfd)
🤖 Fix all issues with AI agents
In `@dsperse/src/analyzers/runner_analyzer.py`:
- Around line 559-607: The function initialize_run_metadata has implicit
Optional types for parameters run_dir and output_path; update the signature to
use explicit optional types (e.g., run_dir: Optional[Path] or run_dir: Path |
None and output_path: Optional[str] or output_path: str | None depending on
project Python typing style), add the corresponding import from typing if using
Optional, and ensure all internal references to run_dir and output_path remain
unchanged; keep default values None and adjust any other annotated usages
(return type remains tuple[Path, bool, RunMetadata]) to eliminate implicit
Optional annotations.
In `@dsperse/src/backends/ezkl.py`:
- Around line 78-90: The os.makedirs call should be guarded against an empty
dirname to avoid calling makedirs("") when output_file is just a filename;
replace the direct os.makedirs(os.path.dirname(output_file), exist_ok=True) with
logic that computes dirname = os.path.dirname(output_file) and only calls
os.makedirs(dirname, exist_ok=True) if dirname is non-empty (apply the same fix
to the other occurrence around lines 126-128), referencing the output_file
variable and the existing os.path.dirname usage in dsperse/src/backends/ezkl.py.
In `@dsperse/src/backends/utils/jstprove_utils.py`:
- Around line 45-61: The compatibility check in is_compatible currently calls
onnx.load(...) which attempts to load external tensor sidecars and can raise a
LOAD_ERROR; update the onnx.load call in is_compatible (where model is assigned)
to disable external data loading (pass the parameter to turn off
external/sidecar loading, e.g., load_external_data=False or the equivalent
supported by the onnx API) so the function only inspects
model.graph.node/op_type against JSTPROVE_SUPPORTED_OPS and avoids false
negatives when sidecar files are missing; keep the existing exception handling
and return values intact.
In `@dsperse/src/cli/compile.py`:
- Around line 80-82: Remove the unused local variable by deleting the assignment
to layers (the line using getattr(args, 'layers', None)) in compile.py; keep the
backend assignment (backend = getattr(args, 'backend', None)) as-is or only
remove layers if you intend to use it elsewhere—ensure no other code relies on
the local variable layers (search for 'layers' in the same module) before
removal to avoid breaking references.
- Around line 89-99: The current logic allows compilation to proceed with "all
layers" when the user supplied layers but validation returned None; change the
flow so that if args.layers was provided (getattr(args, 'layers', None) is
truthy), target_path is a directory, and _check_layers(...) returned no valid
layers (validated_layers is falsy/None), the command aborts with a clear error
(raise/exit/log) instead of falling back to all layers; update the block that
sets validated_layers/effective_layers to check for this condition and surface a
descriptive error before attempting to combine backend into effective_layers.
In `@dsperse/src/compile/compiler.py`:
- Around line 34-52: The tuple parameter named slice_data in the static method
_compile_slice_worker is unused and triggers a RUF059 lint warning; rename it to
_slice_data (or _ ) in the function signature and update the local unpacking of
args (idx, _slice_data, base_path, slice_dir, backends_to_build, tiling_info,
channel_split_info, compilation_slice_data) so the rest of the logic (calls to
Compiler.compile_channel_split_groups and Compiler.compile_slice and the results
dict) remains unchanged.
In `@dsperse/src/run/runner.py`:
- Around line 286-291: The local binding input_name returned from
RunnerUtils.prepare_channel_split_config in run_channel_split_inference is
unused and triggers Ruff RUF059; rename it to a throwaway variable (e.g.,
_input_name or _) in the tuple assignment to silence the linter while keeping
the call and returned values intact, i.e., update the unpacking of
RunnerUtils.prepare_channel_split_config in run_channel_split_inference
accordingly.
In `@dsperse/src/run/utils/runner_utils.py`:
- Around line 785-786: The parameter annotation for backend in
get_tile_execution_config should use the explicit union syntax instead of an
implicit Optional; update the function signature (get_tile_execution_config) to
annotate backend as str | None (or typing.Union[str, None] if older typing is
required) and ensure any references or overloads using the same parameter (e.g.,
has_jst_runner usage) are updated accordingly to match the explicit union type.
- Around line 632-633: The signature of execute_channel_group uses an implicit
Optional by declaring backend: str = None; update the annotation to be explicit
by changing the backend parameter to either backend: str | None = None (Python
3.10+ union syntax) or backend: Optional[str] = None with a corresponding import
from typing, ensuring the function execute_channel_group and any callers remain
type-correct.
In `@tests/integration/test_compile.py`:
- Around line 468-471: The loop in tests/integration/test_compile.py uses an
unused loop variable g_idx which triggers Ruff B007; change the loop to not bind
the unused index by either iterating values directly (replace "for g_idx, files
in group_files.items():" with "for files in group_files.values():") or use a
throwaway name (e.g., "for _, files in group_files.items():") so only the used
symbol files remains; update the loop that asserts "compiled" in files
accordingly.
- Around line 565-577: The test currently defines a pytest parameter
"model_name" but the function test_compile_resume_tiled does not use it, causing
Ruff ARG002; update both the parametrize call and the test signature to use an
unused-name convention (e.g., change "model_name" to "_model_name" in the
pytest.mark.parametrize decorator and rename the function parameter to
_model_name) so the linter recognizes the parameter is intentionally unused
without changing test behavior.
- Around line 475-485: Rename the unused test parameters to start with an
underscore to silence Ruff ARG002: change the function signature in
test_compile_parallel from (self, model_name, model_dir, slices_output_dir,
jstprove_available, capfd) to (self, _model_name, model_dir, slices_output_dir,
_jstprove_available, capfd) and update the pytest.mark.parametrize decorator to
parametrize("_model_name", ["doom"]) so the param name matches; no other logic
changes are required in test_compile_parallel.
In `@tests/integration/test_pipeline.py`:
- Line 24: Assign or otherwise silence the unused test parameter `model_name` in
each affected test function (e.g., `test_sliced_output_matches_original`, and
the other tests at the listed locations) to avoid Ruff ARG002; for example add a
single statement like `_ = model_name` (or an inline `# noqa: ARG002` comment)
at the start of each test body so the parameter is referenced and the linter
warning is suppressed.
🧹 Nitpick comments (12)

dsperse/src/backends/utils/jstprove_utils.py (1)
102-108: Potential issue with non-numeric input data.
`torch.tensor(data)` will fail if `data` contains non-numeric types or is a nested structure that PyTorch cannot infer. Consider adding basic validation or wrapping in a try-except with a meaningful error message.
💡 Suggested defensive handling
  @staticmethod
  def convert_to_logits(data: Any) -> torch.Tensor:
      """Helper to convert data to logits tensor with batch dimension."""
-     logits = torch.tensor(data)
+     if not isinstance(data, (list, tuple, torch.Tensor)) and not hasattr(data, '__iter__'):
+         raise TypeError(f"Cannot convert {type(data).__name__} to logits tensor")
+     logits = torch.tensor(data) if not isinstance(data, torch.Tensor) else data
      if logits.dim() == 1:
          logits = logits.unsqueeze(0)
      return logits

dsperse/src/backends/jstprove.py (1)
133-133: Minor style: multiple statements on one line. Consider separating the raise statement for readability.
📝 Suggested fix
- if not ok: raise RuntimeError(f"Circuit compilation failed: {err}")
+ if not ok:
+     raise RuntimeError(f"Circuit compilation failed: {err}")

dsperse/src/run/utils/runner_utils.py (2)
820-820: Multiple statements on one line. For consistency and readability, separate the raise statement.
📝 Suggested fix
- if tile_tensor is None: raise ValueError(f"Missing tile input tensor '{cache_input_name}'")
+ if tile_tensor is None:
+     raise ValueError(f"Missing tile input tensor '{cache_input_name}'")

736-767: Remove unused `exec_info` parameter.
The `exec_info` parameter is declared but never used in the method body. It can be safely removed since the caller (`runner.py` line 389) already retains access to `exec_info` from its own context and uses it after the method returns.
🔧 Proposed fix
  @staticmethod
- def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, exec_info, tensor_cache: dict) -> torch.Tensor | None:
+ def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, tensor_cache: dict) -> torch.Tensor | None:

Update call site in dsperse/src/run/runner.py:389-390:
- final_tensor = RunnerUtils.process_inference_result(current_slice_id, info, ok, result, exec_info,
+ final_tensor = RunnerUtils.process_inference_result(current_slice_id, info, ok, result,
                                                      tensor_cache)

dsperse/src/analyzers/schema.py (1)
556-568: Consider adding error handling for JSON operations.
The `load` and `save` methods perform file I/O without error handling. Consider whether callers expect exceptions to propagate or if wrapping with meaningful error messages would improve debugging.

dsperse/src/analyzers/runner_analyzer.py (2)
94-98: Closure does not bind loop variable `slice_key`.
The `_norm` function captures `slice_key` by reference. If this function were to be called after the loop iteration changes `slice_key`, it would use the wrong value. In the current code, `_norm` is only called within the same iteration, so this is safe, but it's fragile.
💡 Suggested fix using default argument binding
- def _norm(rel: Optional[str]) -> Optional[str]:
+ def _norm(rel: Optional[str], _slice_key: str = slice_key) -> Optional[str]:
      if not rel:
          return None
-     return RunnerAnalyzer.with_slice_prefix(RunnerAnalyzer.rel_from_payload(rel) or rel, slice_key)
+     return RunnerAnalyzer.with_slice_prefix(RunnerAnalyzer.rel_from_payload(rel) or rel, _slice_key)
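The late-binding pitfall flagged here can be demonstrated in a few self-contained lines (the `make_joiners_*` names are illustrative, not from the codebase): closures capture variables by reference, so every lambda created in a loop sees the loop variable's final value unless it is frozen via a default argument.

```python
def make_joiners_buggy(prefixes):
    # All lambdas share one `prefix` binding; they all see its final value.
    return [lambda rel: f"{prefix}/{rel}" for prefix in prefixes]


def make_joiners_fixed(prefixes):
    # Default arguments are evaluated at definition time, freezing each value.
    return [lambda rel, _p=prefix: f"{_p}/{rel}" for prefix in prefixes]
```

The same default-argument trick is what the suggested `_slice_key: str = slice_key` fix applies.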
206-208: Same closure binding issue with `parent_dir`.
The `_join` function captures `parent_dir` by reference. Apply the same fix pattern.
💡 Suggested fix
- def _join(rel: Optional[str]) -> Optional[str]:
-     return os.path.join(parent_dir, rel) if rel else None
+ def _join(rel: Optional[str], _parent_dir: str = parent_dir) -> Optional[str]:
+     return os.path.join(_parent_dir, rel) if rel else None

tests/integration/test_full_run.py (1)
152-157: Suppress unused fixture parameters to satisfy Ruff ARG002.
The `model_name`, `run_output_dir`, and `capfd` parameters are provided via pytest fixtures/parametrize but unused in the test body. Add a no-op assignment to acknowledge intentional binding.
🧹 Suggested fix
  def test_tiled_full_run(self, model_name: str, model_dir: Path, slices_output_dir: Path, run_output_dir: Path, capfd, jstprove_available):
      """Verify end-to-end full cycle with tiling: slice -> compile -> run -> prove -> verify."""
      if not jstprove_available:
          pytest.skip("JSTprove unavailable")
+     _ = (model_name, run_output_dir, capfd)  # Unused but required by fixtures

dsperse/src/backends/utils/ezkl_utils.py (3)
35-59: Improve exception chaining and type hints in SRS check wrapper.
A few improvements would address static analysis hints and improve debuggability:
- Line 37: Use explicit `None` union type per PEP 604
- Line 58: Chain the exception with `from e` for proper traceback preservation
🧹 Suggested fix
  @staticmethod
  def run_ezkl_command_with_srs_check(
-     cmd_list: list, env: dict = None, check: bool = True, capture_output: bool = True, text: bool = True, **kwargs
+     cmd_list: list, env: dict | None = None, check: bool = True, capture_output: bool = True, text: bool = True, **kwargs
  ) -> subprocess.CompletedProcess:
      """Wrapper for subprocess.run that detects SRS errors and bubbles them up."""
      try:
          process = subprocess.run(
              cmd_list, env=env, check=check, capture_output=capture_output, text=text, **kwargs
          )
          if process.stderr:
              srs_error = EZKLUtils.detect_srs_error(process.stderr)
              if srs_error:
                  logger.error(f"EZKL SRS Error in command '{' '.join(cmd_list)}': {srs_error}")
                  raise RuntimeError(f"DSperse detected SRS error: {srs_error}")
          return process
      except subprocess.CalledProcessError as e:
          if e.stderr:
              srs_error = EZKLUtils.detect_srs_error(e.stderr)
              if srs_error:
-                 logger.error(f"EZKL SRS Error in command '{' '.join(cmd_list)}': {srs_error}")
-                 raise RuntimeError(f"DSperse detected SRS error: {srs_error}")
+                 logger.exception(f"EZKL SRS Error in command '{' '.join(cmd_list)}': {srs_error}")
+                 raise RuntimeError(f"DSperse detected SRS error: {srs_error}") from e
          raise

61-85: Unused `model_path` parameter.
The `model_path` parameter is declared but never used in `initialize_compilation_artifacts`. If it's reserved for future use or API consistency, consider prefixing with underscore or adding a brief comment.
🧹 Suggested fix
  @staticmethod
- def initialize_compilation_artifacts(model_path: Path, output_path: Path, input_file_path: Optional[str]) -> Dict[str, Any]:
+ def initialize_compilation_artifacts(_model_path: Path, output_path: Path, input_file_path: Optional[str]) -> Dict[str, Any]:
      """Initialize paths and metadata for EZKL compilation."""

87-97: Consider using `logging.exception` for automatic traceback.
When logging errors within an exception handler, `logging.exception` automatically includes the traceback, which aids debugging.
🧹 Suggested fix
  except (KeyError, TypeError, IndexError) as e:
-     logger.error(f"Could not process witness data: {e}")
+     logger.exception(f"Could not process witness data: {e}")
      return None

tests/integration/test_run.py (1)
399-462: Suppress unused `model_name` in multiple tiling tests.
Several parametrized tests (`test_run_tiling_ezkl`, `test_run_tiling_onnx_fallback`, `test_run_resume_complex`) have unused `model_name` parameters. Consider adding `_ = model_name` at the start of each test body to satisfy Ruff ARG002.
🧹 Batch fix for all affected tests
  def test_run_tiling_ezkl(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, ezkl_available):
      """12. Verify that tiling works with EZKL backend."""
      if not ezkl_available:
          pytest.skip("EZKL unavailable")
+     _ = model_name
  ...
  def test_run_tiling_onnx_fallback(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
      """13. Verify that tiling falls back to ONNX if no circuits are available."""
+     _ = model_name
  ...
  def test_run_resume_complex(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
      """14. Verify that resuming works for tiled/channel-split runs."""
+     _ = model_name
 @staticmethod
 def initialize_run_metadata(slices_path: Path, run_dir: Path = None, output_path: str = None, format: str = "dirs") -> tuple[Path, bool, RunMetadata]:
     """Set up the run directory and prepare the run metadata, handling resumes if applicable."""
     import time
     from dsperse.src.run.utils.runner_utils import RunnerUtils

     # 1. Determine base run directory logic
     requested_dir = Path(output_path) if output_path else run_dir
     resume = False

     if requested_dir:
         if (requested_dir / "metadata.json").exists():
             actual_run_dir = requested_dir
             resume = True
         elif requested_dir.exists() and not (requested_dir / "metadata.json").exists():
             latest = RunnerUtils.find_most_recent_run(requested_dir)
             if latest:
                 actual_run_dir = latest
                 resume = True
             else:
                 ts = time.strftime('%Y%m%d_%H%M%S')
                 actual_run_dir = requested_dir / f"run_{ts}"
                 resume = False
         else:
             actual_run_dir = requested_dir
             resume = actual_run_dir.exists() and (actual_run_dir / "metadata.json").exists()
     else:
         ts = time.strftime('%Y%m%d_%H%M%S')
         base_dir = slices_path.parent
         if base_dir.name == "slices":
             base_dir = base_dir.parent
         actual_run_dir = base_dir / "run" / f"run_{ts}"
         resume = False

     # 2. Load or generate metadata
     if resume:
         metadata_path = actual_run_dir / "metadata.json"
         with open(metadata_path, 'r') as f:
             run_metadata = RunMetadata.from_dict(json.load(f))
         logger.info(f"Resuming from existing run directory: {actual_run_dir}")
     else:
         actual_run_dir.mkdir(parents=True, exist_ok=True)
         save_path = actual_run_dir / "metadata.json"
         run_metadata = RunMetadata.from_dict(
             RunnerAnalyzer.generate_run_metadata(slices_path, save_path, format)
         )
         logger.info(f"Started new run in directory: {actual_run_dir}")

     return actual_run_dir, resume, run_metadata
Fix implicit Optional type annotations.
PEP 484 prohibits implicit Optional. Both run_dir and output_path parameters should use explicit union syntax.
🔧 Proposed fix

```diff
 @staticmethod
-def initialize_run_metadata(slices_path: Path, run_dir: Path = None, output_path: str = None, format: str = "dirs") -> tuple[Path, bool, RunMetadata]:
+def initialize_run_metadata(slices_path: Path, run_dir: Path | None = None, output_path: str | None = None, format: str = "dirs") -> tuple[Path, bool, RunMetadata]:
```

🧰 Tools
🪛 Ruff (0.14.14)
560-560: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
560-560: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
In `@dsperse/src/analyzers/runner_analyzer.py` around lines 559 - 607, The
function initialize_run_metadata has implicit Optional types for parameters
run_dir and output_path; update the signature to use explicit optional types
(e.g., run_dir: Optional[Path] or run_dir: Path | None and output_path:
Optional[str] or output_path: str | None depending on project Python typing
style), add the corresponding import from typing if using Optional, and ensure
all internal references to run_dir and output_path remain unchanged; keep
default values None and adjust any other annotated usages (return type remains
tuple[Path, bool, RunMetadata]) to eliminate implicit Optional annotations.
```diff
 input_file, model_path, output_file, vk_path = str(input_file), str(model_path), str(output_file), str(vk_path)
 if not os.path.exists(input_file): raise FileNotFoundError(f"Input file not found: {input_file}")
 if not os.path.exists(model_path): raise FileNotFoundError(f"Model file not found: {model_path}")
 if not os.path.exists(vk_path): raise FileNotFoundError(f"Verification key file not found: {vk_path}")

 # --- SRS Setup ---
 if settings_path and os.path.exists(settings_path):
-    logrows = get_logrows_from_settings(settings_path)
-    if logrows:
-        if not ensure_srs(logrows):
-            return False, f"Failed to ensure SRS for logrows={logrows}"
+    logrows = get_logrows_from_settings(str(settings_path))
+    if logrows and not ensure_srs(logrows): return False, f"Failed to ensure SRS for logrows={logrows}"

 # --- Execution ---
 os.makedirs(os.path.dirname(output_file), exist_ok=True)
 try:
```
Guard os.makedirs for empty dirname.
If the output path is just a filename, os.path.dirname(...) is empty and makedirs("") will fail.
🧹 Proposed fix

```diff
-os.makedirs(os.path.dirname(output_file), exist_ok=True)
+os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)
 ...
-os.makedirs(os.path.dirname(proof_path), exist_ok=True)
+os.makedirs(os.path.dirname(proof_path) or ".", exist_ok=True)
```

Also applies to: 126-128
🧰 Tools
🪛 Ruff (0.14.14)
79-79: Multiple statements on one line (colon)
(E701)
79-79: Avoid specifying long messages outside the exception class
(TRY003)
80-80: Multiple statements on one line (colon)
(E701)
80-80: Avoid specifying long messages outside the exception class
(TRY003)
81-81: Multiple statements on one line (colon)
(E701)
81-81: Avoid specifying long messages outside the exception class
(TRY003)
86-86: Multiple statements on one line (colon)
(E701)
🤖 Prompt for AI Agents
In `@dsperse/src/backends/ezkl.py` around lines 78 - 90, The os.makedirs call
should be guarded against an empty dirname to avoid calling makedirs("") when
output_file is just a filename; replace the direct
os.makedirs(os.path.dirname(output_file), exist_ok=True) with logic that
computes dirname = os.path.dirname(output_file) and only calls
os.makedirs(dirname, exist_ok=True) if dirname is non-empty (apply the same fix
to the other occurrence around lines 126-128), referencing the output_file
variable and the existing os.path.dirname usage in dsperse/src/backends/ezkl.py.
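To illustrate the failure mode outside the PR code: `os.path.dirname` on a bare filename returns an empty string, and `os.makedirs("")` raises `FileNotFoundError`, so the `or "."` guard is the conventional fix. A self-contained sketch:

```python
import os
import tempfile


def safe_makedirs_for(file_path: str) -> str:
    """Create the parent directory of file_path, tolerating bare filenames."""
    dirname = os.path.dirname(file_path) or "."  # "" would make makedirs raise
    os.makedirs(dirname, exist_ok=True)
    return dirname


# A bare filename has no directory component:
assert os.path.dirname("proof.json") == ""

# The guarded call falls back to the current directory instead of raising:
assert safe_makedirs_for("proof.json") == "."

# With a real parent, the directory is created as usual:
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "run", "proof.json")
    assert safe_makedirs_for(target) == os.path.join(tmp, "run")
    assert os.path.isdir(os.path.join(tmp, "run"))
```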
```python
@staticmethod
def is_compatible(model_path: Union[str, Path]) -> Tuple[bool, set]:
    """Check if an ONNX model contains only JSTprove-supported operations."""
    model_path = Path(model_path)
    if not model_path.exists():
        return False, {"FILE_NOT_FOUND"}

    try:
        model = onnx.load(str(model_path))
        ops = {node.op_type for node in model.graph.node}
        unsupported = ops - JSTPROVE_SUPPORTED_OPS
        if unsupported:
            return False, unsupported
        return True, set()
    except Exception as e:
        logger.warning(f"Failed to check JSTprove compatibility: {e}")
        return False, {"LOAD_ERROR"}
```
Avoid false negatives when external data sidecars are missing.
The onnx.load() call attempts to load external tensor data by default. Since this compatibility check only examines graph structure (operation types), missing external data files cause unnecessary LOAD_ERROR returns. Disable external data loading to prevent false negatives.
🛠️ Suggested fix

```diff
 try:
-    model = onnx.load(str(model_path))
+    model = onnx.load(str(model_path), load_external_data=False)
     ops = {node.op_type for node in model.graph.node}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
@staticmethod
def is_compatible(model_path: Union[str, Path]) -> Tuple[bool, set]:
    """Check if an ONNX model contains only JSTprove-supported operations."""
    model_path = Path(model_path)
    if not model_path.exists():
        return False, {"FILE_NOT_FOUND"}
    try:
        model = onnx.load(str(model_path), load_external_data=False)
        ops = {node.op_type for node in model.graph.node}
        unsupported = ops - JSTPROVE_SUPPORTED_OPS
        if unsupported:
            return False, unsupported
        return True, set()
    except Exception as e:
        logger.warning(f"Failed to check JSTprove compatibility: {e}")
        return False, {"LOAD_ERROR"}
```
🧰 Tools
🪛 Ruff (0.14.14)
58-58: Consider moving this statement to an else block
(TRY300)
59-59: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@dsperse/src/backends/utils/jstprove_utils.py` around lines 45 - 61, The
compatibility check in is_compatible currently calls onnx.load(...) which
attempts to load external tensor sidecars and can raise a LOAD_ERROR; update the
onnx.load call in is_compatible (where model is assigned) to disable external
data loading (pass the parameter to turn off external/sidecar loading, e.g.,
load_external_data=False or the equivalent supported by the onnx API) so the
function only inspects model.graph.node/op_type against JSTPROVE_SUPPORTED_OPS
and avoids false negatives when sidecar files are missing; keep the existing
exception handling and return values intact.
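The core of the check is a set difference over node op types. A backend-agnostic sketch (the supported-op set and node list below are stand-ins, not the project's real `JSTPROVE_SUPPORTED_OPS`):

```python
# Hypothetical supported-op set; the real JSTPROVE_SUPPORTED_OPS lives in the project.
SUPPORTED_OPS = {"Conv", "Relu", "MatMul", "Add", "Reshape"}


def check_ops(op_types: list[str]) -> tuple[bool, set[str]]:
    """Return (compatible, unsupported_ops) for a graph's node op types."""
    unsupported = set(op_types) - SUPPORTED_OPS
    return (not unsupported, unsupported)


assert check_ops(["Conv", "Relu", "MatMul"]) == (True, set())
ok, bad = check_ops(["Conv", "Softmax"])
assert not ok and bad == {"Softmax"}
```

Because this only inspects op names, nothing in the check needs the external tensor payloads, which is why skipping sidecar loading is safe here.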
```python
backend = getattr(args, 'backend', None)
layers = getattr(args, 'layers', None)

if not layers:
    print(f"{Fore.CYAN}No layers specified. Will compile all layers with default fallback (jstprove -> ezkl -> onnx)...{Style.RESET_ALL}")
    logger.info("No layers specified - compiling all layers with default fallback")
elif backend:
    if ':' in backend:
        print(f"{Fore.CYAN}Compiling specified layers with mixed backends...{Style.RESET_ALL}")
    else:
        backend_name = 'JSTprove' if backend == 'jstprove' else 'EZKL'
        print(f"{Fore.CYAN}Compiling specified layers with {backend_name}...{Style.RESET_ALL}")
else:
    print(f"{Fore.CYAN}Compiling specified layers (trying jstprove & ezkl, fallback to onnx)...{Style.RESET_ALL}")
logger.info(f"Starting slices compilation")

# Resolve path (slices dir or .dsperse/.dslice file)
```
Remove the unused layers local.
This triggers Ruff F841 and adds noise.
🧹 Proposed fix

```diff
-layers = getattr(args, 'layers', None)
```

📝 Committable suggestion
```python
backend = getattr(args, 'backend', None)

if not layers:
    print(f"{Fore.CYAN}No layers specified. Will compile all layers with default fallback (jstprove -> ezkl -> onnx)...{Style.RESET_ALL}")
    logger.info("No layers specified - compiling all layers with default fallback")
elif backend:
    if ':' in backend:
        print(f"{Fore.CYAN}Compiling specified layers with mixed backends...{Style.RESET_ALL}")
    else:
        backend_name = 'JSTprove' if backend == 'jstprove' else 'EZKL'
        print(f"{Fore.CYAN}Compiling specified layers with {backend_name}...{Style.RESET_ALL}")
else:
    print(f"{Fore.CYAN}Compiling specified layers (trying jstprove & ezkl, fallback to onnx)...{Style.RESET_ALL}")
logger.info(f"Starting slices compilation")

# Resolve path (slices dir or .dsperse/.dslice file)
```
🧰 Tools
🪛 Ruff (0.14.14)
81-81: Local variable layers is assigned to but never used
Remove assignment to unused variable layers
(F841)
🤖 Prompt for AI Agents
In `@dsperse/src/cli/compile.py` around lines 80 - 82, Remove the unused local
variable by deleting the assignment to layers (the line using getattr(args,
'layers', None)) in compile.py; keep the backend assignment (backend =
getattr(args, 'backend', None)) as-is or only remove layers if you intend to use
it elsewhere—ensure no other code relies on the local variable layers (search
for 'layers' in the same module) before removal to avoid breaking references.
```python
# --- Layer Validation ---
validated_layers = _check_layers(target_path, args.layers) if (getattr(args, 'layers', None) and os.path.isdir(target_path)) else getattr(args, 'layers', None)

# Combine layers and backend into a single spec for Compiler.compile
effective_layers = validated_layers
if backend:
    if not effective_layers:
        effective_layers = backend
    elif ':' not in str(effective_layers) and effective_layers not in ['ezkl', 'jstprove', 'onnx']:
        effective_layers = f"{effective_layers}:{backend}"
```
Abort when validation yields no valid layers.
If a user supplies only missing layers, validated_layers becomes None and the compile falls back to “all layers,” which is surprising and potentially expensive.
🐛 Proposed fix

```diff
 validated_layers = _check_layers(target_path, args.layers) if (getattr(args, 'layers', None) and os.path.isdir(target_path)) else getattr(args, 'layers', None)
+if getattr(args, 'layers', None) and validated_layers is None:
+    print(f"{Fore.RED}No valid layers found in --layers. Aborting.{Style.RESET_ALL}")
+    return
```
In `@dsperse/src/cli/compile.py` around lines 89 - 99, The current logic allows
compilation to proceed with "all layers" when the user supplied layers but
validation returned None; change the flow so that if args.layers was provided
(getattr(args, 'layers', None) is truthy), target_path is a directory, and
_check_layers(...) returned no valid layers (validated_layers is falsy/None),
the command aborts with a clear error (raise/exit/log) instead of falling back
to all layers; update the block that sets validated_layers/effective_layers to
check for this condition and surface a descriptive error before attempting to
combine backend into effective_layers.
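The review's point generalizes: when user-supplied selectors validate to nothing, fail loudly instead of silently widening scope. A hedged sketch of that pattern (names are illustrative, not the CLI's real API):

```python
def resolve_layers(requested: list[str], available: set[str]) -> list[str]:
    """Keep only requested layers that exist; abort if the user asked for
    specific layers but none are valid, rather than compiling everything."""
    if not requested:
        # "All layers" is only the explicit default when nothing was requested.
        return sorted(available)
    valid = [layer for layer in requested if layer in available]
    if not valid:
        raise SystemExit(f"No valid layers in {requested!r}; aborting.")
    return valid


assert resolve_layers([], {"0", "1"}) == ["0", "1"]
assert resolve_layers(["1"], {"0", "1"}) == ["1"]
try:
    resolve_layers(["9"], {"0", "1"})
except SystemExit:
    pass
else:
    raise AssertionError("expected SystemExit for all-invalid layers")
```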
```python
group_files = comp.get("group_files", {})
assert len(group_files) > 0
for g_idx, files in group_files.items():
    assert "compiled" in files
```
Rename unused loop var g_idx.
Ruff B007 will flag this.
🧹 Proposed fix

```diff
-for g_idx, files in group_files.items():
+for _g_idx, files in group_files.items():
```

📝 Committable suggestion
```python
group_files = comp.get("group_files", {})
assert len(group_files) > 0
for _g_idx, files in group_files.items():
    assert "compiled" in files
```
🧰 Tools
🪛 Ruff (0.14.14)
470-470: Loop control variable g_idx not used within loop body
Rename unused g_idx to _g_idx
(B007)
🤖 Prompt for AI Agents
In `@tests/integration/test_compile.py` around lines 468 - 471, The loop in
tests/integration/test_compile.py uses an unused loop variable g_idx which
triggers Ruff B007; change the loop to not bind the unused index by either
iterating values directly (replace "for g_idx, files in group_files.items():"
with "for files in group_files.values():") or use a throwaway name (e.g., "for
_, files in group_files.items():") so only the used symbol files remains; update
the loop that asserts "compiled" in files accordingly.
```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_compile_parallel(self, model_name, model_dir, slices_output_dir, jstprove_available, capfd):
    """Verify that parallel compilation works without errors."""
    slice_model(SimpleNamespace(
        model_dir=str(model_dir), output_dir=str(slices_output_dir), save_file=None, output_type="dirs"
    ))

    # Compile with 2 parallel processes
    compile_model(SimpleNamespace(
        path=str(slices_output_dir), input_file=None, layers=None, backend=None, parallel=2
    ))
```
Silence unused args (model_name, jstprove_available).
These are unused and will trigger Ruff ARG002.
🧹 Proposed fix

```diff
 slice_model(SimpleNamespace(
     model_dir=str(model_dir), output_dir=str(slices_output_dir), save_file=None, output_type="dirs"
 ))
+_ = (model_name, jstprove_available)
```

📝 Committable suggestion
```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_compile_parallel(self, model_name, model_dir, slices_output_dir, jstprove_available, capfd):
    """Verify that parallel compilation works without errors."""
    slice_model(SimpleNamespace(
        model_dir=str(model_dir), output_dir=str(slices_output_dir), save_file=None, output_type="dirs"
    ))
    _ = (model_name, jstprove_available)

    # Compile with 2 parallel processes
    compile_model(SimpleNamespace(
        path=str(slices_output_dir), input_file=None, layers=None, backend=None, parallel=2
    ))
```
🧰 Tools
🪛 Ruff (0.14.14)
476-476: Unused method argument: model_name
(ARG002)
476-476: Unused method argument: jstprove_available
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integration/test_compile.py` around lines 475 - 485, Rename the unused
test parameters to start with an underscore to silence Ruff ARG002: change the
function signature in test_compile_parallel from (self, model_name, model_dir,
slices_output_dir, jstprove_available, capfd) to (self, _model_name, model_dir,
slices_output_dir, _jstprove_available, capfd) and update the
pytest.mark.parametrize decorator to parametrize("_model_name", ["doom"]) so the
param name matches; no other logic changes are required in
test_compile_parallel.
```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_compile_resume_tiled(self, model_name, model_dir, slices_output_dir, jstprove_available, capfd):
    """Verify that resume mode works for tiled slices."""
    if not jstprove_available:
        pytest.skip("JSTprove unavailable")

    slice_model(SimpleNamespace(
        model_dir=str(model_dir), output_dir=str(slices_output_dir), save_file=None, output_type="dirs", tile_size=1000
    ))

    # 1. Compile tiled slice 0
    compile_model(SimpleNamespace(
        path=str(slices_output_dir), input_file=None, layers="0", backend="jstprove"
```
Silence unused model_name.
This triggers Ruff ARG002.
🧹 Proposed fix

```diff
 slice_model(SimpleNamespace(
     model_dir=str(model_dir), output_dir=str(slices_output_dir), save_file=None, output_type="dirs", tile_size=1000
 ))
+_ = model_name
```

🧰 Tools
🪛 Ruff (0.14.14)
566-566: Unused method argument: model_name
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integration/test_compile.py` around lines 565 - 577, The test currently
defines a pytest parameter "model_name" but the function
test_compile_resume_tiled does not use it, causing Ruff ARG002; update both the
parametrize call and the test signature to use an unused-name convention (e.g.,
change "model_name" to "_model_name" in the pytest.mark.parametrize decorator
and rename the function parameter to _model_name) so the linter recognizes the
parameter is intentionally unused without changing test behavior.
```python
"""Verify that slicing preserves model semantics."""

@pytest.mark.parametrize("model_name", ["net", "doom"])
def test_sliced_output_matches_original(self, model_name: str, model_dir: Path, slices_output_dir: Path):
```
Silence unused model_name parameters.
These are unused in the bodies and will trigger Ruff ARG002. Consider _ = model_name (or # noqa: ARG002) inside each test.
🧹 Example fix pattern

```diff
 def test_sliced_output_matches_original(self, model_name: str, model_dir: Path, slices_output_dir: Path):
     """Sliced model chain should produce same output as original model."""
+    _ = model_name
```

Also applies to: 60-60, 100-100, 130-130, 172-172, 228-228, 288-288
🧰 Tools
🪛 Ruff (0.14.14)
24-24: Unused method argument: model_name
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integration/test_pipeline.py` at line 24, Assign or otherwise silence
the unused test parameter `model_name` in each affected test function (e.g.,
`test_sliced_output_matches_original`, and the other tests at the listed
locations) to avoid Ruff ARG002; for example add a single statement like `_ =
model_name` (or an inline `# noqa: ARG002` comment) at the start of each test
body so the parameter is referenced and the linter warning is suppressed.
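For reference, both silencing patterns the review mentions are valid; a minimal illustration (not project code) of how each keeps Ruff ARG002 quiet:

```python
# Pattern 1: an underscore-prefixed name tells linters the argument is intentional.
def check_with_prefix(_model_name: str) -> bool:
    return True


# Pattern 2: keep the name (so pytest parametrize still matches) and discard it.
def check_with_assignment(model_name: str) -> bool:
    _ = model_name  # explicit throwaway satisfies Ruff ARG002
    return True


assert check_with_prefix("doom")
assert check_with_assignment("doom")
```

Pattern 2 is the less invasive choice for parametrized tests, since the `@pytest.mark.parametrize` decorator and the parameter name stay in sync.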
Updated test cases to improve channel splitting and tiling workflows by adjusting spatial dimensions, tile sizes, and test parameters. Enhanced logging in parallel tiling tests and refined integration with inference methods to improve coverage and maintainability.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@tests/integration/test_compile.py`:
- Around line 491-492: The test parameter model_name in test_compile_resume is
unused causing Ruff ARG002; rename it to _model_name (or prefix with underscore)
in the test signature and any fixture references (i.e., change def
test_compile_resume(self, model_name, ...) to def test_compile_resume(self,
_model_name, ...)) so the `@pytest.mark.parametrize`("model_name", ["doom"]) still
supplies values but the linter recognizes it as intentionally unused.
In `@tests/integration/test_run.py`:
- Around line 440-442: The test function test_run_tiling_onnx_fallback declares
a pytest parameter model_name that is never used; rename the parameter to
_model_name (e.g., def test_run_tiling_onnx_fallback(self, _model_name: str,
model_dir: Path, slices_output_dir: Path, capfd):) so Ruff ARG002 is suppressed
and the pytest.mark.parametrize("model_name", ["doom"]) can remain unchanged;
update only the function signature (and any internal references if you later use
it) to avoid the unused-argument warning.
- Around line 468-470: The test function test_run_resume_complex declares an
unused parameter model_name which triggers Ruff ARG002; to fix it either remove
the parameter from the function signature and the
`@pytest.mark.parametrize`("model_name", ["doom"]) decorator, or keep the
parametrize but rename the parameter to _model_name (prefix with underscore) so
linters treat it as intentionally unused; update references in the local scope
if any and ensure the decorator and function signature remain consistent (look
for test_run_resume_complex and the `@pytest.mark.parametrize` entry).
- Around line 403-408: The test function test_run_tiling_ezkl currently declares
an unused parameter model_name (triggering Ruff ARG002); fix by renaming the
parameter to _model_name (or prefixing with an underscore) in the test signature
and in the pytest.mark.parametrize decorator to match (e.g.,
`@pytest.mark.parametrize`("_model_name", ["doom"])) so the parameter is
intentionally marked unused and the linter warning is suppressed.
- Around line 361-366: The test function test_run_parallel_tiling declares an
unused parameter model_name which triggers Ruff ARG002; fix it by renaming the
function parameter to _model_name (or prefix it with an underscore) so the
pytest.mark.parametrize decorator can stay intact while signaling the parameter
is intentionally unused—update the function signature for
test_run_parallel_tiling accordingly and ensure no other references to
model_name remain inside the function.
🧹 Nitpick comments (1)
tests/integration/test_run.py (1)
14-17: Consider consolidating `_get_run_dir` helper across test files.

This helper method is duplicated in `tests/integration/test_prove.py` and `tests/integration/test_verify.py` (as shown in relevant code snippets). The version here lacks the fallback regex pattern that those files have.

♻️ Suggested improvement

```diff
 def _get_run_dir(self, output: str) -> Path:
     match = re.search(r"Run data saved to (.+)", output)
+    if not match:
+        match = re.search(r"within the run directory (.+)", output)
     assert match, f"Could not find run directory in output: {output}"
-    return Path(match.group(1).strip())
+    path_str = match.group(1).strip()
+    path_str = re.sub(r'\x1b\[[0-9;]*m', '', path_str).split()[0]
+    return Path(path_str)
```
```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_compile_resume(self, model_name, model_dir, slices_output_dir, jstprove_available, capfd):
```
Suppress unused model_name parameter (Ruff ARG002).
🧹 Proposed fix

```diff
 def test_compile_resume(self, model_name, model_dir, slices_output_dir, jstprove_available, capfd):
     """Verify that resume mode skips already compiled slices."""
+    _ = model_name
     slice_model(SimpleNamespace(
```

🧰 Tools
🪛 Ruff (0.14.14)
492-492: Unused method argument: model_name
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integration/test_compile.py` around lines 491 - 492, The test parameter
model_name in test_compile_resume is unused causing Ruff ARG002; rename it to
_model_name (or prefix with underscore) in the test signature and any fixture
references (i.e., change def test_compile_resume(self, model_name, ...) to def
test_compile_resume(self, _model_name, ...)) so the
`@pytest.mark.parametrize`("model_name", ["doom"]) still supplies values but the
linter recognizes it as intentionally unused.
```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_parallel_tiling(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, caplog, jstprove_available):
    """11. Verify that parallel execution (threads > 1) works for tiling."""
    if not jstprove_available:
        pytest.skip("JSTprove unavailable")
```
Suppress unused model_name parameter (Ruff ARG002).
Same issue as test_run_with_tiling.
🧹 Proposed fix

```diff
 def test_run_parallel_tiling(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, caplog, jstprove_available):
     """11. Verify that parallel execution (threads > 1) works for tiling."""
     if not jstprove_available:
         pytest.skip("JSTprove unavailable")
+    _ = model_name
```

📝 Committable suggestion
```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_parallel_tiling(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, caplog, jstprove_available):
    """11. Verify that parallel execution (threads > 1) works for tiling."""
    if not jstprove_available:
        pytest.skip("JSTprove unavailable")
    _ = model_name
```
🧰 Tools
🪛 Ruff (0.14.14)
362-362: Unused method argument: model_name
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integration/test_run.py` around lines 361 - 366, The test function
test_run_parallel_tiling declares an unused parameter model_name which triggers
Ruff ARG002; fix it by renaming the function parameter to _model_name (or prefix
it with an underscore) so the pytest.mark.parametrize decorator can stay intact
while signaling the parameter is intentionally unused—update the function
signature for test_run_parallel_tiling accordingly and ensure no other
references to model_name remain inside the function.
```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_tiling_ezkl(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, ezkl_available):
    """12. Verify that tiling works with EZKL backend."""
    if not ezkl_available:
        pytest.skip("EZKL unavailable")
```
Suppress unused model_name parameter (Ruff ARG002).
🧹 Proposed fix

```diff
 def test_run_tiling_ezkl(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, ezkl_available):
     """12. Verify that tiling works with EZKL backend."""
     if not ezkl_available:
         pytest.skip("EZKL unavailable")
+    _ = model_name
```

📝 Committable suggestion
```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_tiling_ezkl(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd, ezkl_available):
    """12. Verify that tiling works with EZKL backend."""
    if not ezkl_available:
        pytest.skip("EZKL unavailable")
    _ = model_name
```
🧰 Tools
🪛 Ruff (0.14.14)
404-404: Unused method argument: model_name
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integration/test_run.py` around lines 403 - 408, The test function
test_run_tiling_ezkl currently declares an unused parameter model_name
(triggering Ruff ARG002); fix by renaming the parameter to _model_name (or
prefixing with an underscore) in the test signature and in the
pytest.mark.parametrize decorator to match (e.g.,
`@pytest.mark.parametrize`("_model_name", ["doom"])) so the parameter is
intentionally marked unused and the linter warning is suppressed.
```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_tiling_onnx_fallback(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
    """13. Verify that tiling falls back to ONNX if no circuits are available."""
```
Suppress unused model_name parameter (Ruff ARG002).
🧹 Proposed fix
```diff
     def test_run_tiling_onnx_fallback(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
         """13. Verify that tiling falls back to ONNX if no circuits are available."""
+        _ = model_name
         slice_model(SimpleNamespace(
```

📝 Committable suggestion
Original:

```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_tiling_onnx_fallback(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
    """13. Verify that tiling falls back to ONNX if no circuits are available."""
```

Suggested:

```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_tiling_onnx_fallback(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
    """13. Verify that tiling falls back to ONNX if no circuits are available."""
    _ = model_name
```
🧰 Tools
🪛 Ruff (0.14.14)
441-441: Unused method argument: model_name
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integration/test_run.py` around lines 440-442: the test function
test_run_tiling_onnx_fallback declares a pytest parameter model_name that is
never used. Rename the parameter to _model_name in both the function signature
and the @pytest.mark.parametrize decorator (pytest requires the parametrize
name to match the argument name), so Ruff ARG002 is suppressed without
breaking test collection. Update only the signature and decorator; no other
references exist.
```python
@pytest.mark.parametrize("model_name", ["doom"])
def test_run_resume_complex(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
    """14. Verify that resuming works for tiled/channel-split runs."""
```
Suppress unused model_name parameter (Ruff ARG002).
🧹 Proposed fix
```diff
     def test_run_resume_complex(self, model_name: str, model_dir: Path, slices_output_dir: Path, capfd):
         """14. Verify that resuming works for tiled/channel-split runs."""
+        _ = model_name
         slice_model(SimpleNamespace(
```

🧰 Tools
🪛 Ruff (0.14.14)
469-469: Unused method argument: model_name
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integration/test_run.py` around lines 468-470: the test function
test_run_resume_complex declares an unused parameter model_name, which
triggers Ruff ARG002. Either remove the parameter from the function signature
and drop the @pytest.mark.parametrize("model_name", ["doom"]) decorator, or
keep the parametrization and rename the parameter to _model_name in both the
decorator and the signature so linters treat it as intentionally unused.
Update any local references and ensure the decorator and function signature
remain consistent.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@dsperse/src/run/channel_split_executor.py`:
- Around line 72-76: The code calls ort.InferenceSession with
str(group_onnx_path) without checking that _resolve_path(group.path) returned a
valid path; add a guard after calling self._resolve_path(group.path) in both
ONNX branches (the block that sets group_onnx_path and the other ONNX branch
around outputs/session creation) to verify group_onnx_path is not None/empty,
and if it is, raise a clear exception or log and return an appropriate
error/ExecutionMethod instead of proceeding to ort.InferenceSession; update the
blocks that reference group_onnx_path and session to use this validation and
avoid constructing a session with "None".
In `@dsperse/src/run/runner.py`:
- Around line 68-77: The tiled and channel-split branches return None as the
result tensor, causing downstream functions (process_inference_result /
finalize_run_results) to use the wrong tensor; change the return values in both
run_channel_split_inference and run_tiled_inference call sites to return the
actual output tensor carried by exec_info (e.g., exec_info.output_tensor)
instead of None, i.e., return exec_info.success, exec_info.output_tensor,
exec_info so process_inference_result receives the real slice output.
In `@dsperse/src/run/utils/runner_utils.py`:
- Around line 350-352: _count_tile_method currently calls .startswith() on
tile.method which may be None; update the function to coalesce the method value
to an empty string before calling startswith for both TileResult instances and
dict tiles (i.e., use (t.method or "") for TileResult and t.get("method", "") or
"" for dicts) so the check is safe and save_run_results won't abort on None.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@dsperse/src/run/channel_split_executor.py`:
- Around line 108-114: In apply_bias, guard against _resolve_path returning None
before calling Path(...).exists(): ensure you check that channel_split.bias_path
is truthy and that bias_path returned from _resolve_path is not None (or
non-empty) before constructing Path(bias_path) and calling exists(); if the
resolved path is None or empty, skip loading the bias. This references the
apply_bias method, ChannelSplitInfo.bias_path, and the _resolve_path helper—add
a simple None/empty check around the Path(...).exists() usage and early-return
or skip the bias load when absent.
🧹 Nitpick comments (8)
dsperse/src/run/channel_split_executor.py (2)
44-47: Use explicit `Optional` type for the `backend` parameter.

The `backend: str = None` parameter uses implicit Optional, which violates PEP 484.

♻️ Proposed fix

```diff
 def execute_group(
     self, group: ChannelGroupInfo, group_input: torch.Tensor, run_dir: Path,
-    output_shape: tuple, backend: str = None
+    output_shape: tuple, backend: str | None = None
 ) -> tuple[torch.Tensor, str]:
```
124-133: Remove duplicate `_extract_output_tensor`; delegate to `RunnerUtils`.

This method is identical to `RunnerUtils.extract_output_tensor` (lines 20-33 in runner_utils.py) and is also duplicated in tile_executor.py. Consider delegating to the centralized utility.

♻️ Proposed fix

```diff
-@staticmethod
-def _extract_output_tensor(result):
-    if result is None:
-        return None
-    if isinstance(result, dict):
-        for key in ['output', 'logits', 'output_tensor']:
-            if key in result and result[key] is not None:
-                return result[key]
-        return None
-    return result
+@staticmethod
+def _extract_output_tensor(result):
+    from dsperse.src.run.utils.runner_utils import RunnerUtils
+    return RunnerUtils.extract_output_tensor(result)
```

dsperse/src/run/runner.py (4)
29-46: Use explicit `Optional` type for the `run_dir` parameter.

The `run_dir: str = None` parameter uses implicit Optional, which violates PEP 484. The same applies to line 33's `on_slice_complete`, which already uses `Optional`.

♻️ Proposed fix

```diff
 def __init__(
     self,
-    run_dir: str = None,
+    run_dir: str | None = None,
     threads: int = 1,
     on_slice_complete: Optional[SliceCompleteCallback] = None
 ):
```
100-100: Use explicit `Optional` type for the `backend` parameter.

♻️ Proposed fix

```diff
-    def run_circuit_with_fallback(self, meta: RunSliceMetadata, in_file: Path, out_file: Path, slice_dir: Path, backend: str = None):
+    def run_circuit_with_fallback(self, meta: RunSliceMetadata, in_file: Path, out_file: Path, slice_dir: Path, backend: str | None = None):
```
164-164: Use explicit `Optional` types for the `slice_dir` parameters.

Both `run_ezkl_inference` (line 164) and `run_jstprove_inference` (line 194) use implicit Optional for `slice_dir`.

♻️ Proposed fix

```diff
-    def run_ezkl_inference(self, meta: RunSliceMetadata, input_tensor_path: Path, output_witness_path: Path, slice_dir: Path = None):
+    def run_ezkl_inference(self, meta: RunSliceMetadata, input_tensor_path: Path, output_witness_path: Path, slice_dir: Path | None = None):
     ...
-    def run_jstprove_inference(self, meta: RunSliceMetadata, input_tensor_path: Path, output_witness_path: Path, slice_dir: Path = None):
+    def run_jstprove_inference(self, meta: RunSliceMetadata, input_tensor_path: Path, output_witness_path: Path, slice_dir: Path | None = None):
```

Also applies to: 194-194
393-393: Split multiple statements on one line for readability.

Line 393 has multiple statements on one line, which reduces readability.

♻️ Proposed fix

```diff
-        for oname in info.dependencies.output: tensor_cache[oname] = cached_tensor
+        for oname in info.dependencies.output:
+            tensor_cache[oname] = cached_tensor
```

dsperse/src/run/utils/runner_utils.py (2)
406-418: Log exceptions in `check_slice_completion_status` instead of silently passing.

The `try-except-pass` pattern at lines 416-417 silently swallows all exceptions, which can hide bugs. At minimum, log the exception.

♻️ Proposed fix

```diff
     try:
         with open(output_file, 'r') as f:
             data = json.load(f)
         return True, data
-    except Exception:
-        pass
+    except (json.JSONDecodeError, KeyError) as e:
+        logger.debug(f"Failed to load slice completion data for {slice_id}: {e}")
     return False, None
```
452-465: Address the unused parameter and use efficient iterator access.

- The `exec_info` parameter (line 453) is unused.
- Line 465 uses `list(...)[0]`, which is less efficient than `next(iter(...))`.

♻️ Proposed fix

```diff
 @staticmethod
-def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, exec_info, tensor_cache: dict) -> torch.Tensor | None:
+def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, _exec_info, tensor_cache: dict) -> torch.Tensor | None:
     """Handles reshaping, tensor cache updates, and final tensor extraction from inference results."""
     ...
     if isinstance(result, dict) and 'output_tensors' in result:
         for oname, tensor in result['output_tensors'].items():
             tensor_cache[oname] = tensor
-        first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+        first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
```
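The `next(iter(...))` idiom recommended above is worth isolating, since it both avoids materializing a throwaway list and folds the empty-dict case into a default. A one-line sketch (the helper name is illustrative):

```python
def first_output(output_tensors: dict):
    """Return the first value of a mapping without building an intermediate list.

    next(iter(d.values()), None) walks the view lazily and yields None for an
    empty dict, replacing the list(d.values())[0] pattern plus its guard.
    """
    return next(iter(output_tensors.values()), None)
```

Since Python 3.7 dicts preserve insertion order, so "first value" is well defined here.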
Actionable comments posted: 10
🤖 Fix all issues with AI agents
In `@dsperse/src/prove/prover.py`:
- Around line 71-104: The tiled-proof branch in _prepare_work_items drops the
output_path so tiled proofs always write under run_path; update
_prepare_work_items to include the provided output_path in the work_items tuple
(alongside the existing entries) so downstream callers know the override, and
then update ProverUtils.execute_slice_proving and ProverUtils.prove_tile_batch
to accept that output_path parameter and use it when constructing tile proof
output locations (instead of defaulting to run_path). Refer to the symbols
_prepare_work_items, work_items (the tuple appended),
ProverUtils.execute_slice_proving, and ProverUtils.prove_tile_batch when making
the changes.
- Around line 34-44: The JSTprove initialization currently catches all Exception
and swallows unexpected errors; change the except block in the prover
initialization to only catch RuntimeError (matching the EZKL pattern) and call
logger.exception(...) when catching it so the stack trace is preserved;
specifically update the try/except around creating self.jstprove_runner =
JSTprove() to except RuntimeError as e, set self.jstprove_runner = None, and log
the failure with logger.exception("JSTprove CLI not available. JSTprove backend
will be disabled.", exc_info=True) or equivalent to retain error details.
- Around line 199-213: The temporary dirs created when converting model_dir in
_handle_single_slice_proving aren't guaranteed to be cleaned up if
_prove_single_slice raises; wrap the call to self._prove_single_slice in a
try/finally so that if detected != "dirs" you always call
Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))),
output_type=detected, cleanup=True) in the finally block; ensure you still
return the result when no exception and re-raise any exception after cleanup so
behavior of _handle_single_slice_proving, _prove_single_slice, and variables
dirs_model_path/detected remain consistent.
In `@dsperse/src/prove/utils/prover_utils.py`:
- Around line 14-43: In resolve_prove_artifacts, backend-specific PK/settings
fields are not being used; update the selection logic so when b == Backend.EZKL
you use meta.ezkl_pk_path (fallback to meta.pk_path) and meta.ezkl_settings_path
(fallback to meta.settings_path), and when b == Backend.JSTPROVE use
meta.jstprove_pk_path and meta.jstprove_settings_path similarly; then call
Utils.resolve_under_slice for pk_path and settings_path as already done and keep
the existing existence check for settings_path. Reference:
resolve_prove_artifacts, Backend.EZKL, Backend.JSTPROVE, meta.ezkl_pk_path,
meta.ezkl_settings_path, meta.jstprove_pk_path, meta.jstprove_settings_path,
Utils.resolve_under_slice.
In `@dsperse/src/run/channel_split_executor.py`:
- Around line 51-52: The current has_ezkl check (has_ezkl =
bool(group.ezkl_circuit_path) and (group.vk_path or group.ezkl_vk_path)) doesn’t
verify that an EZKL runner exists, so calls to
self.ezkl_runner.generate_witness() can hit None; update the condition to also
require self.ezkl_runner (e.g., include "and self.ezkl_runner") or add an
explicit guard before invoking generate_witness() so groups with EZKL files but
no runner fall back safely; locate uses of has_ezkl and the call to
generate_witness() to apply the fix (symbols: has_ezkl, self.ezkl_runner,
generate_witness).
- Around line 108-113: In apply_bias, the bias tensor loaded from numpy is
created on CPU with default dtype which can cause device/dtype mismatches when
adding to summed; modify the creation of bias_tensor in apply_bias
(ChannelSplitInfo.bias_path handling) to construct it on the same device and
dtype as summed (e.g., use torch.as_tensor or torch.from_numpy followed by
to(device=summed.device, dtype=summed.dtype)) and then reshape to (1, -1, 1, 1)
before adding so the addition uses matching device and dtype.
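The device/dtype alignment that prompt describes is a torch concern, but the dtype half of it can be sketched framework-agnostically. Below is a NumPy analogue of the suggested fix (names are illustrative): cast the bias to the activation's dtype and reshape it to `(1, C, 1, 1)` before broadcasting, which mirrors `torch.as_tensor(bias, device=summed.device, dtype=summed.dtype)` in the real code.

```python
import numpy as np


def add_bias(summed: np.ndarray, bias) -> np.ndarray:
    """Add a per-channel bias to an NCHW activation with matching dtype.

    Casting via np.asarray(..., dtype=summed.dtype) prevents the silent
    float64 upcast that a raw list/float64 bias would otherwise cause;
    the (1, -1, 1, 1) reshape broadcasts the bias across N, H, and W.
    """
    bias_arr = np.asarray(bias, dtype=summed.dtype).reshape(1, -1, 1, 1)
    return summed + bias_arr
```

In torch the same shape logic applies; the extra step is pinning `device=summed.device` so a CUDA activation never meets a CPU bias.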
In `@dsperse/src/verify/utils/verifier_utils.py`:
- Around line 237-260: In verify_slice_logic, the tiled branch reconstructs
RunSliceMetadata.from_dict({'tiling': tiling_info}) losing resolved artifact
paths so downstream verify_tile/verify_tile_batch get None; fix by constructing
RunSliceMetadata with the full metadata (include circuit_path, settings_path,
vk_path, input_path, output_path, witness_path, proof_path or their resolved
filenames) or pass the already-resolved Path values directly into
VerifierUtils.verify_tile_batch/verify_tile so tile verification receives the
correct artifact paths; update the call site in verify_slice_logic (and the
tiling variable usage) to supply the complete metadata or direct paths rather
than only tiling_info.
In `@dsperse/src/verify/verifier.py`:
- Around line 198-213: The call to _verify_single_slice in
_handle_single_slice_verifying can raise and currently skips the cleanup
Converter.convert(..., cleanup=True); wrap the _verify_single_slice invocation
in a try/finally block and move the Converter.convert cleanup call into the
finally so temporary "dirs" output is always removed when detected != "dirs";
keep the initial conversion that sets dirs_model_path as-is and use
Utils.dirs_root_from(Path(dirs_model_path)) in the finally to compute the path
for Converter.convert.
- Around line 27-45: The JSTprove initialization in Verifier.__init__ is
catching a broad Exception and swallowing tracebacks; change the second
try/except to catch RuntimeError (like the EZKL block) by using "except
RuntimeError as e" for JSTprove(), set self.jstprove_runner = None, and call
logger.exception("JSTprove CLI not available. JSTprove backend will be
disabled.") so the stack trace is preserved and unexpected errors aren't hidden
(refer to jstprove_runner and Verifier.__init__).
- Around line 244-255: The single-slice verifier (_verify_single_slice) resolves
settings_path and vk_path but doesn't validate they exist before calling
VerifierUtils.verify_with_backend; add the same pre-checks used in
_prepare_work_items: after computing settings_path and vk_path (via
Utils.resolve_under_slice using slice_dir and meta.settings_path/meta.vk_path),
if preferred indicates the EZKL backend (check the preferred variable or backend
string used elsewhere) then verify settings_path.exists() and vk_path.exists()
and raise FileNotFoundError with clear messages (or fail fast) if missing so
verification errors are surfaced early before calling
VerifierUtils.verify_with_backend.
🧹 Nitpick comments (8)
dsperse/src/prove/prover.py (3)
107-124: Align sequential error handling with the parallel path and keep stack traces.

Sequential mode currently aborts on the first failure, while parallel mode records failures and continues. Also, `logger.error` drops the stack trace.

🛠️ Suggested change

```diff
         for future in as_completed(futures):
             slice_id = futures[future]
             try:
                 results.append(future.result())
             except Exception as e:
-                logger.error(f"Slice {slice_id} proving failed: {e}")
+                logger.exception(f"Slice {slice_id} proving failed: {e}")
                 results.append({'slice_id': slice_id, 'success': False, 'method': None, 'error': str(e)})
         return results
     else:
-        return [ProverUtils.prove_slice_logic(item) for item in work_items]
+        results = []
+        for item in work_items:
+            slice_id = item[0]
+            try:
+                results.append(ProverUtils.prove_slice_logic(item))
+            except Exception as e:
+                logger.exception(f"Slice {slice_id} proving failed: {e}")
+                results.append({'slice_id': slice_id, 'success': False, 'method': None, 'error': str(e)})
+        return results
```
215-217: Drop or underscore the unused `detected` parameter.

🛠️ Suggested change

```diff
-    def _prove_single_slice(run_path: Path, model_dir: str | Path, detected: str, backend: str,
+    def _prove_single_slice(run_path: Path, model_dir: str | Path, _detected: str, backend: str,
                             tiles_range: range | list[int] | None = None) -> dict:
```
256-306: Consider moving the ad-hoc test harness to a dedicated script.

Keeping the demo block in a separate script makes the module slimmer and avoids shipping debug paths.
dsperse/src/verify/utils/verifier_utils.py (2)
16-36: Don't silently swallow run_results failures.

Returning `None` on any exception makes root-cause debugging hard; log the exception at least once.

🛠️ Suggested change

```diff
-    except Exception:
-        return None
+    except Exception:
+        logger.exception("Failed to load run_results for %s", run_path)
+        return None
```
90-100: Log failures when proof-path extraction fails.

The empty `except ...: pass` hides malformed run_results.json.

🛠️ Suggested change

```diff
-    except Exception:
-        pass
+    except Exception:
+        logger.exception("Failed to extract proof paths from run results")
```

dsperse/src/verify/verifier.py (2)
114-131: Align sequential error handling with the parallel path and keep stack traces.

Sequential mode currently aborts on the first failure, while parallel mode records failures and continues. Also, `logger.error` drops the stack trace.

🛠️ Suggested change

```diff
         for future in as_completed(futures):
             slice_id = futures[future]
             try:
                 results.append(future.result())
             except Exception as e:
-                logger.error(f"Slice {slice_id} verification failed: {e}")
+                logger.exception(f"Slice {slice_id} verification failed: {e}")
                 results.append(SliceResult(slice_id=slice_id, success=False, error=str(e)).to_dict())
         return results
     else:
-        return [VerifierUtils.verify_slice_logic(item) for item in work_items]
+        results = []
+        for item in work_items:
+            slice_id = item[0]
+            try:
+                results.append(VerifierUtils.verify_slice_logic(item))
+            except Exception as e:
+                logger.exception(f"Slice {slice_id} verification failed: {e}")
+                results.append(SliceResult(slice_id=slice_id, success=False, error=str(e)).to_dict())
+        return results
```
272-320: Consider moving the ad-hoc test harness to a dedicated script.

dsperse/src/prove/utils/prover_utils.py (1)
282-287: Prefix the unused `slice_dir` binding.

🛠️ Suggested change

```diff
-    (slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling_info, run_path, slice_dir) = args
+    (slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path, tiling_info, run_path, _slice_dir) = args
```
```python
        try:
            self.ezkl_runner = EZKL()
        except RuntimeError:
            self.ezkl_runner = None
            logger.warning("EZKL CLI not available. EZKL backend will be disabled.")

        try:
            self.jstprove_runner = JSTprove()
        except Exception:
            self.jstprove_runner = None
            logger.warning("JSTprove CLI not available. JSTprove backend will be disabled.")
```
🧩 Analysis chain
Repository: inference-labs-inc/dsperse

🏁 Script executed:

```shell
# Find and examine the prover.py file
find . -name "prover.py" -type f | head -20
```

🏁 Script executed:

```shell
# Look for JSTprove import and definition
rg "JSTprove" -A 5 -B 5 | head -100
```

🏁 Script executed:

```shell
# Check the full __init__ method context
fd "prover.py" --exec cat -n {} \;
```

🏁 Script executed:

```shell
# Find the JSTprove implementation
find . -path "*/backends/jstprove*" -type f
```

🏁 Script executed:

```shell
# Search for JSTprove class definition
rg "class JSTprove" -A 30
```

🏁 Script executed:

```shell
# Look for what exceptions could be raised during initialization
rg "def __init__" -A 20 | grep -A 20 "class JSTprove" | head -50
```

🏁 Script executed:

```shell
# Get the full JSTprove.__init__ method
cat -n ./dsperse/src/backends/jstprove.py | head -80
```

🏁 Script executed:

```shell
# Look for the exception handling in JSTprove.__init__
rg "subprocess.run" -A 15 dsperse/src/backends/jstprove.py | head -40
```
Avoid swallowing unexpected JSTprove initialization failures.
Catching Exception here hides real setup bugs. JSTprove's __init__ documents it raises RuntimeError for missing CLI. Catch only RuntimeError to align with the EZKL pattern above, and use logger.exception() to preserve the stack trace for debugging.
🛠️ Suggested change
```diff
         try:
             self.jstprove_runner = JSTprove()
-        except Exception:
+        except RuntimeError:
             self.jstprove_runner = None
-            logger.warning("JSTprove CLI not available. JSTprove backend will be disabled.")
+            logger.exception("JSTprove CLI not available. JSTprove backend will be disabled.")
```

🧰 Tools
🪛 Ruff (0.14.14)
42-42: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@dsperse/src/prove/prover.py` around lines 34 - 44, The JSTprove
initialization currently catches all Exception and swallows unexpected errors;
change the except block in the prover initialization to only catch RuntimeError
(matching the EZKL pattern) and call logger.exception(...) when catching it so
the stack trace is preserved; specifically update the try/except around creating
self.jstprove_runner = JSTprove() to except RuntimeError as e, set
self.jstprove_runner = None, and log the failure with logger.exception("JSTprove
CLI not available. JSTprove backend will be disabled.", exc_info=True) or
equivalent to retain error details.
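The pattern the review converges on, catching only the documented error and logging with a traceback, generalizes to any optional backend. A self-contained sketch, with `factory` standing in for `JSTprove()` or `EZKL()` (the helper name is hypothetical, not project API):

```python
import logging

logger = logging.getLogger(__name__)


def make_runner(factory):
    """Initialize an optional backend runner.

    Only the documented RuntimeError (missing CLI) disables the backend;
    logger.exception preserves the traceback in the log. Any other error
    propagates instead of being silently swallowed.
    """
    try:
        return factory()
    except RuntimeError:
        logger.exception("Backend CLI not available; backend disabled.")
        return None
```

The key property is that an unexpected bug (say, a `TypeError` during init) still surfaces, while the known "CLI not installed" case degrades gracefully.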
```python
@staticmethod
def _prepare_work_items(slices_iter: list, dirs_path: Path, run_path: Path, output_path: str | Path | None) -> list:
    """Prepare proving tasks for each slice."""
    work_items = []
    for slice_id, meta in slices_iter:
        slice_dir = Utils.slice_dirs_path(dirs_path, slice_id)
        preferred = ProverUtils.select_backend(run_path, slice_id, meta)
        circuit_path, pk_path, settings_path = ProverUtils.resolve_prove_artifacts(slice_dir, meta, preferred)

        tiling = meta.tiling
        if tiling:
            witness_path = None
            proof_path = None
        else:
            if preferred == Backend.JSTPROVE:
                wf = ProverUtils.get_witness_file(run_path, slice_id)
                witness_path = Path(wf) if wf else (run_path / slice_id / "output_witness.bin")
            else:
                witness_path = Utils.witness_path_for(run_path, slice_id)

            if not witness_path.exists():
                logger.warning(f"Skipping {slice_id}: witness not found at {witness_path}")
                continue

            proof_path = Utils.proof_output_path(run_path, slice_id, output_path)

        if not circuit_path or not os.path.exists(circuit_path):
            logger.warning(f"Skipping {slice_id}: compiled circuit not found ({circuit_path})")
            continue

        work_items.append((
            slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path,
            tiling.to_dict() if tiling else None, str(run_path), str(slice_dir)
        ))
```
Tiled proofs ignore the output_path override.
For tiled slices, output_path is dropped, so tile proofs always land under run_path. Pass output_path through the work-item tuple and into ProverUtils.execute_slice_proving/prove_tile_batch so custom output locations are honored.
🛠️ Suggested change (plus propagate to ProverUtils)
```diff
-        work_items.append((
-            slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path,
-            tiling.to_dict() if tiling else None, str(run_path), str(slice_dir)
-        ))
+        work_items.append((
+            slice_id, preferred, witness_path, circuit_path, proof_path, pk_path, settings_path,
+            tiling.to_dict() if tiling else None, str(run_path), str(slice_dir), output_path
+        ))
```
In `@dsperse/src/prove/prover.py` around lines 71 - 104, The tiled-proof branch in
_prepare_work_items drops the output_path so tiled proofs always write under
run_path; update _prepare_work_items to include the provided output_path in the
work_items tuple (alongside the existing entries) so downstream callers know the
override, and then update ProverUtils.execute_slice_proving and
ProverUtils.prove_tile_batch to accept that output_path parameter and use it
when constructing tile proof output locations (instead of defaulting to
run_path). Refer to the symbols _prepare_work_items, work_items (the tuple
appended), ProverUtils.execute_slice_proving, and ProverUtils.prove_tile_batch
when making the changes.
```python
def _handle_single_slice_proving(self, run_path: Path, model_dir: str | Path, detected: str,
                                 backend: str | None, tiles_range: range | list[int] | None) -> dict:
    """Internal: Handle proving for a single slice run directory."""
    if backend not in (Backend.JSTPROVE, Backend.EZKL):
        raise ValueError("Single-slice proving requires explicit backend: 'jstprove' or 'ezkl'.")

    dirs_model_path = model_dir
    if detected != "dirs":
        dirs_model_path = Converter.convert(str(model_dir), output_type="dirs", cleanup=False)

    result = self._prove_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)

    if detected != "dirs":
        Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
    return result
```
🧩 Analysis chain
🏁 Script executed:

```shell
fd -t f "prover.py" | head -20
```

🏁 Script executed:

```shell
cat -n dsperse/src/prove/prover.py | sed -n '199,213p'
```

🏁 Script executed:

```shell
cat -n dsperse/src/prove/prover.py | grep -A 30 "def _prove_single_slice"
```

🏁 Script executed:

```shell
cat -n dsperse/src/prove/prover.py | sed -n '216,260p'
```

🏁 Script executed:

```shell
rg "raise|except" dsperse/src/prove/prover.py -B 2 -A 2 | head -60
```

🏁 Script executed:

```shell
rg "ProverUtils\.|execute_slice_proving|initialize_prove_metadata" dsperse/src/prove/prover.py -A 3 | head -40
```

🏁 Script executed:

```shell
rg "def initialize_prove_metadata|def execute_slice_proving" dsperse -A 10 | head -80
```

🏁 Script executed:

```shell
cat -n dsperse/src/prove/utils/prover_utils.py | grep -A 60 "def execute_slice_proving"
```

🏁 Script executed:

```shell
rg "Converter.convert" dsperse -B 2 -A 2 | head -50
```

🏁 Script executed:

```shell
cat -n dsperse/src/verify/verifier.py | grep -B 5 -A 15 "_verify_packaged"
```

🏁 Script executed:

```shell
cat -n dsperse/src/verify/verifier.py | grep -B 5 -A 15 "_handle_single_slice_verifying"
```

🏁 Script executed:

```shell
cat -n dsperse/src/prove/prover.py | sed -n '216,254p'
```

🏁 Script executed:

```shell
cat -n dsperse/src/prove/prover.py | sed -n '183,215p'
```
Wrap conversion cleanup in try/finally to prevent resource leaks.
If _prove_single_slice raises an exception during execution (possible at line 220-230), the converted temporary directories will not be cleaned up. Use try/finally to ensure cleanup always runs, matching the pattern already used in _prove_packaged at lines 158-166.
🛠️ Suggested change
```diff
-    result = self._prove_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)
-
-    if detected != "dirs":
-        Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
-    return result
+    try:
+        return self._prove_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)
+    finally:
+        if detected != "dirs":
+            Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
```

📝 Committable suggestion
Original:

```python
def _handle_single_slice_proving(self, run_path: Path, model_dir: str | Path, detected: str,
                                 backend: str | None, tiles_range: range | list[int] | None) -> dict:
    """Internal: Handle proving for a single slice run directory."""
    if backend not in (Backend.JSTPROVE, Backend.EZKL):
        raise ValueError("Single-slice proving requires explicit backend: 'jstprove' or 'ezkl'.")
    dirs_model_path = model_dir
    if detected != "dirs":
        dirs_model_path = Converter.convert(str(model_dir), output_type="dirs", cleanup=False)
    result = self._prove_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)
    if detected != "dirs":
        Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
    return result
```

Suggested:

```python
def _handle_single_slice_proving(self, run_path: Path, model_dir: str | Path, detected: str,
                                 backend: str | None, tiles_range: range | list[int] | None) -> dict:
    """Internal: Handle proving for a single slice run directory."""
    if backend not in (Backend.JSTPROVE, Backend.EZKL):
        raise ValueError("Single-slice proving requires explicit backend: 'jstprove' or 'ezkl'.")
    dirs_model_path = model_dir
    if detected != "dirs":
        dirs_model_path = Converter.convert(str(model_dir), output_type="dirs", cleanup=False)
    try:
        return self._prove_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)
    finally:
        if detected != "dirs":
            Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
```
🧰 Tools
🪛 Ruff (0.14.14)
203-203: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@dsperse/src/prove/prover.py` around lines 199 - 213, The temporary dirs
created when converting model_dir in _handle_single_slice_proving aren't
guaranteed to be cleaned up if _prove_single_slice raises; wrap the call to
self._prove_single_slice in a try/finally so that if detected != "dirs" you
always call Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))),
output_type=detected, cleanup=True) in the finally block; ensure you still
return the result when no exception and re-raise any exception after cleanup so
behavior of _handle_single_slice_proving, _prove_single_slice, and variables
dirs_model_path/detected remain consistent.
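The cleanup-on-exception pattern this fix relies on can be sketched in isolation (all names below are hypothetical, not the dsperse API): the try/finally guarantees the temporary directory is removed whether the prove step returns or raises.

```python
from pathlib import Path
import shutil
import tempfile

def prove_with_temp_conversion(prove, needs_conversion: bool) -> str:
    """Run `prove` against a temporary converted directory, always cleaning up."""
    tmp_dir = Path(tempfile.mkdtemp()) if needs_conversion else None
    try:
        # The prove step may raise; the finally block below still runs.
        return prove(tmp_dir)
    finally:
        if tmp_dir is not None:
            shutil.rmtree(tmp_dir, ignore_errors=True)
```

The return inside the try still executes the finally block, so no separate success path is needed.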
| @staticmethod | ||
| def resolve_prove_artifacts( | ||
| slice_dir: Path, | ||
| meta: RunSliceMetadata, | ||
| backend: str | None = None, | ||
| ) -> tuple[str | None, str | None, str | None]: | ||
| """Resolve circuit, pk, and settings paths under a given slice directory.""" | ||
| b = (backend or "").lower() | ||
| if b == Backend.EZKL: | ||
| circuit_rel = meta.ezkl_circuit_path or meta.circuit_path | ||
| pk_rel = meta.pk_path | ||
| settings_rel = meta.settings_path | ||
| elif b == Backend.JSTPROVE: | ||
| circuit_rel = meta.jstprove_circuit_path or meta.circuit_path | ||
| pk_rel = meta.pk_path | ||
| settings_rel = meta.settings_path | ||
| else: | ||
| circuit_rel = meta.circuit_path | ||
| pk_rel = meta.pk_path | ||
| settings_rel = meta.settings_path | ||
| circuit_path = Utils.resolve_under_slice(slice_dir, circuit_rel) | ||
| pk_path = Utils.resolve_under_slice(slice_dir, pk_rel) | ||
| settings_path = Utils.resolve_under_slice(slice_dir, settings_rel) | ||
| if settings_path and not os.path.exists(settings_path): | ||
| logger.warning(f"Settings file not found at {settings_path}; proceeding without it.") | ||
| settings_path = None | ||
| return circuit_path, pk_path, settings_path |
Use backend-specific settings/PK paths when available.
resolve_prove_artifacts ignores ezkl_pk_path / ezkl_settings_path (and jstprove_settings_path), which can force pk_path=None even when backend-specific artifacts exist.
🛠️ Suggested change
if b == Backend.EZKL:
circuit_rel = meta.ezkl_circuit_path or meta.circuit_path
- pk_rel = meta.pk_path
- settings_rel = meta.settings_path
+ pk_rel = meta.ezkl_pk_path or meta.pk_path
+ settings_rel = meta.ezkl_settings_path or meta.settings_path
elif b == Backend.JSTPROVE:
circuit_rel = meta.jstprove_circuit_path or meta.circuit_path
pk_rel = meta.pk_path
- settings_rel = meta.settings_path
+ settings_rel = meta.jstprove_settings_path or meta.settings_path
📝 Committable suggestion
| @staticmethod | |
| def resolve_prove_artifacts( | |
| slice_dir: Path, | |
| meta: RunSliceMetadata, | |
| backend: str | None = None, | |
| ) -> tuple[str | None, str | None, str | None]: | |
| """Resolve circuit, pk, and settings paths under a given slice directory.""" | |
| b = (backend or "").lower() | |
| if b == Backend.EZKL: | |
| circuit_rel = meta.ezkl_circuit_path or meta.circuit_path | |
| pk_rel = meta.pk_path | |
| settings_rel = meta.settings_path | |
| elif b == Backend.JSTPROVE: | |
| circuit_rel = meta.jstprove_circuit_path or meta.circuit_path | |
| pk_rel = meta.pk_path | |
| settings_rel = meta.settings_path | |
| else: | |
| circuit_rel = meta.circuit_path | |
| pk_rel = meta.pk_path | |
| settings_rel = meta.settings_path | |
| circuit_path = Utils.resolve_under_slice(slice_dir, circuit_rel) | |
| pk_path = Utils.resolve_under_slice(slice_dir, pk_rel) | |
| settings_path = Utils.resolve_under_slice(slice_dir, settings_rel) | |
| if settings_path and not os.path.exists(settings_path): | |
| logger.warning(f"Settings file not found at {settings_path}; proceeding without it.") | |
| settings_path = None | |
| return circuit_path, pk_path, settings_path | |
| `@staticmethod` | |
| def resolve_prove_artifacts( | |
| slice_dir: Path, | |
| meta: RunSliceMetadata, | |
| backend: str | None = None, | |
| ) -> tuple[str | None, str | None, str | None]: | |
| """Resolve circuit, pk, and settings paths under a given slice directory.""" | |
| b = (backend or "").lower() | |
| if b == Backend.EZKL: | |
| circuit_rel = meta.ezkl_circuit_path or meta.circuit_path | |
| pk_rel = meta.ezkl_pk_path or meta.pk_path | |
| settings_rel = meta.ezkl_settings_path or meta.settings_path | |
| elif b == Backend.JSTPROVE: | |
| circuit_rel = meta.jstprove_circuit_path or meta.circuit_path | |
| pk_rel = meta.pk_path | |
| settings_rel = meta.jstprove_settings_path or meta.settings_path | |
| else: | |
| circuit_rel = meta.circuit_path | |
| pk_rel = meta.pk_path | |
| settings_rel = meta.settings_path | |
| circuit_path = Utils.resolve_under_slice(slice_dir, circuit_rel) | |
| pk_path = Utils.resolve_under_slice(slice_dir, pk_rel) | |
| settings_path = Utils.resolve_under_slice(slice_dir, settings_rel) | |
| if settings_path and not os.path.exists(settings_path): | |
| logger.warning(f"Settings file not found at {settings_path}; proceeding without it.") | |
| settings_path = None | |
| return circuit_path, pk_path, settings_path |
🤖 Prompt for AI Agents
In `@dsperse/src/prove/utils/prover_utils.py` around lines 14 - 43, In
resolve_prove_artifacts, backend-specific PK/settings fields are not being used;
update the selection logic so when b == Backend.EZKL you use meta.ezkl_pk_path
(fallback to meta.pk_path) and meta.ezkl_settings_path (fallback to
meta.settings_path), and when b == Backend.JSTPROVE use meta.jstprove_pk_path
and meta.jstprove_settings_path similarly; then call Utils.resolve_under_slice
for pk_path and settings_path as already done and keep the existing existence
check for settings_path. Reference: resolve_prove_artifacts, Backend.EZKL,
Backend.JSTPROVE, meta.ezkl_pk_path, meta.ezkl_settings_path,
meta.jstprove_pk_path, meta.jstprove_settings_path, Utils.resolve_under_slice.
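The `or`-chaining fallback this suggestion uses can be shown on a stripped-down metadata type (a sketch; `Meta` and `resolve_pk` are illustrative names, not the real schema): the backend-specific field wins when set, and the generic field covers the rest.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Meta:
    pk_path: Optional[str] = None
    ezkl_pk_path: Optional[str] = None

def resolve_pk(meta: Meta, backend: str) -> Optional[str]:
    # Prefer the backend-specific artifact, fall back to the generic one.
    if backend == "ezkl":
        return meta.ezkl_pk_path or meta.pk_path
    return meta.pk_path
```

Because `or` treats `None` and empty strings as falsy, an unset backend-specific path falls through to the generic path without extra branching.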
| has_ezkl = bool(group.ezkl_circuit_path) and (group.vk_path or group.ezkl_vk_path) | ||
Guard EZKL execution when no runner is configured.
has_ezkl does not check self.ezkl_runner, so a group with EZKL paths but no runner will call generate_witness() on None and crash instead of falling back.
🛠️ Proposed fix
- has_ezkl = bool(group.ezkl_circuit_path) and (group.vk_path or group.ezkl_vk_path)
+ has_ezkl = (
+ bool(group.ezkl_circuit_path)
+ and (group.vk_path or group.ezkl_vk_path)
+ and self.ezkl_runner is not None
+ )
🤖 Prompt for AI Agents
In `@dsperse/src/run/channel_split_executor.py` around lines 51 - 52, The current
has_ezkl check (has_ezkl = bool(group.ezkl_circuit_path) and (group.vk_path or
group.ezkl_vk_path)) doesn’t verify that an EZKL runner exists, so calls to
self.ezkl_runner.generate_witness() can hit None; update the condition to also
require self.ezkl_runner (e.g., include "and self.ezkl_runner") or add an
explicit guard before invoking generate_witness() so groups with EZKL files but
no runner fall back safely; locate uses of has_ezkl and the call to
generate_witness() to apply the fix (symbols: has_ezkl, self.ezkl_runner,
generate_witness).
| def apply_bias(self, summed: torch.Tensor, channel_split: ChannelSplitInfo) -> torch.Tensor: | ||
| if channel_split.bias_path: | ||
| bias_path = self._resolve_path(channel_split.bias_path) | ||
| if bias_path and Path(bias_path).exists(): | ||
| bias_tensor = torch.from_numpy(np.load(bias_path)).reshape(1, -1, 1, 1) | ||
| summed = summed + bias_tensor |
Align bias tensor device/dtype with summed.
If summed is on GPU or uses a non-default dtype, adding a CPU float tensor will error or upcast unexpectedly.
🛠️ Proposed fix
if bias_path and Path(bias_path).exists():
bias_tensor = torch.from_numpy(np.load(bias_path)).reshape(1, -1, 1, 1)
- summed = summed + bias_tensor
+ bias_tensor = bias_tensor.to(device=summed.device, dtype=summed.dtype)
+ summed = summed + bias_tensor
🤖 Prompt for AI Agents
In `@dsperse/src/run/channel_split_executor.py` around lines 108 - 113, In
apply_bias, the bias tensor loaded from numpy is created on CPU with default
dtype which can cause device/dtype mismatches when adding to summed; modify the
creation of bias_tensor in apply_bias (ChannelSplitInfo.bias_path handling) to
construct it on the same device and dtype as summed (e.g., use torch.as_tensor
or torch.from_numpy followed by to(device=summed.device, dtype=summed.dtype))
and then reshape to (1, -1, 1, 1) before adding so the addition uses matching
device and dtype.
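A NumPy analogue of the coercion (dtype only, since NumPy has no device concept; `apply_bias` here is a sketch, not the dsperse method) makes the point: convert the bias to the accumulator's dtype before adding, the same way the torch fix calls `.to(device=summed.device, dtype=summed.dtype)`.

```python
import numpy as np

def apply_bias(summed: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # Coerce the bias to the accumulator's dtype before adding,
    # mirroring torch's bias_tensor.to(device=..., dtype=...).
    bias = bias.reshape(1, -1, 1, 1).astype(summed.dtype)
    return summed + bias
```

Without the `astype`, NumPy would silently upcast the float16 accumulator to float64; torch, stricter, raises on a device mismatch.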
| def verify_slice_logic(args: tuple) -> dict: | ||
| """Core logic for verifying a single slice (handles tuple for pool compatibility).""" | ||
| (slice_id, preferred, proof_path, circuit_path, settings_path, vk_path, input_path, output_path, witness_path, tiling_info, run_path, slice_dir) = args | ||
| result = SliceResult(slice_id=slice_id, success=False) | ||
| start_time = time.time() | ||
| if tiling_info: | ||
| tiling = TilingInfo.from_dict(tiling_info) | ||
| success, tile_verifs = VerifierUtils.verify_tile_batch(slice_id, Path(run_path), tiling.num_tiles, preferred, Path(slice_dir), RunSliceMetadata.from_dict({'tiling': tiling_info})) | ||
| result.success = success | ||
| result.method = ExecutionMethod.JSTPROVE_VERIFY if preferred == Backend.JSTPROVE else ExecutionMethod.EZKL_VERIFY | ||
| result.tiles = [TileResult.from_dict(v) for v in tile_verifs] | ||
| result.error = None if success else "One or more tiles failed verification" | ||
| else: | ||
| try: | ||
| ok = VerifierUtils.verify_with_backend(preferred, str(proof_path), circuit_path, input_path, output_path, witness_path, settings_path, vk_path) | ||
| result.success = ok | ||
| result.method = ExecutionMethod.JSTPROVE_VERIFY if preferred == Backend.JSTPROVE else ExecutionMethod.EZKL_VERIFY | ||
| result.error = None if ok else "verification_failed" | ||
| except Exception as e: | ||
| result.error = str(e) | ||
| result.method = ExecutionMethod.JSTPROVE_VERIFY if preferred == Backend.JSTPROVE else ExecutionMethod.EZKL_VERIFY | ||
Tiled verification drops resolved artifact paths.
In the tiled branch, RunSliceMetadata is rebuilt with only tiling_info, so circuit/settings/vk paths are lost and verify_tile sees None. Use the already-resolved paths from args or pass full metadata.
🛠️ Suggested change
- success, tile_verifs = VerifierUtils.verify_tile_batch(slice_id, Path(run_path), tiling.num_tiles, preferred, Path(slice_dir), RunSliceMetadata.from_dict({'tiling': tiling_info}))
+ meta_dict = {
+ "tiling": tiling_info,
+ "circuit_path": circuit_path,
+ "jstprove_circuit_path": circuit_path,
+ "settings_path": settings_path,
+ "vk_path": vk_path,
+ }
+ success, tile_verifs = VerifierUtils.verify_tile_batch(
+ slice_id,
+ Path(run_path),
+ tiling.num_tiles,
+ preferred,
+ Path(slice_dir),
+ RunSliceMetadata.from_dict(meta_dict),
+ )
🧰 Tools
🪛 Ruff (0.14.14)
257-257: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@dsperse/src/verify/utils/verifier_utils.py` around lines 237 - 260, In
verify_slice_logic, the tiled branch reconstructs
RunSliceMetadata.from_dict({'tiling': tiling_info}) losing resolved artifact
paths so downstream verify_tile/verify_tile_batch get None; fix by constructing
RunSliceMetadata with the full metadata (include circuit_path, settings_path,
vk_path, input_path, output_path, witness_path, proof_path or their resolved
filenames) or pass the already-resolved Path values directly into
VerifierUtils.verify_tile_batch/verify_tile so tile verification receives the
correct artifact paths; update the call site in verify_slice_logic (and the
tiling variable usage) to supply the complete metadata or direct paths rather
than only tiling_info.
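The shape of the fix — carrying the already-resolved paths into the rebuilt metadata instead of dropping them — can be sketched minimally (the keys mirror the suggestion above; the helper name is hypothetical):

```python
def build_tile_metadata(tiling: dict, circuit_path: str,
                        settings_path: str, vk_path: str) -> dict:
    # Carry the resolved artifact paths alongside the tiling info,
    # rather than rebuilding metadata from tiling alone and losing them.
    return {
        "tiling": tiling,
        "circuit_path": circuit_path,
        "settings_path": settings_path,
        "vk_path": vk_path,
    }
```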
| def __init__(self, parallel: int = 1): | ||
| """ | ||
| Initialize the verifier. | ||
| Args: | ||
| parallel: Number of parallel processes for verification (default: 1) | ||
| """ | ||
| self.parallel = max(1, parallel) | ||
| try: | ||
| self.ezkl_runner = EZKL() | ||
| except RuntimeError: | ||
| self.ezkl_runner = None | ||
| logger.warning("EZKL CLI not available. EZKL backend will be disabled.") | ||
| try: | ||
| self.jstprove_runner = JSTprove() | ||
| except Exception: | ||
| self.jstprove_runner = None | ||
| logger.warning("JSTprove CLI not available. JSTprove backend will be disabled.") |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the JSTprove class definition
find . -type f -name "*.py" | xargs grep -l "class JSTprove"
Length of output: 151
🏁 Script executed:
#!/bin/bash
# Search for imports and usage of JSTprove to understand the exception types
rg "JSTprove" --type py -A 3 -B 3
Length of output: 50382
🏁 Script executed:
#!/bin/bash
# Check the verifier.py file to see the full context
cat -n dsperse/src/verify/verifier.py | head -50
Length of output: 2052
🏁 Script executed:
cat -n dsperse/src/backends/jstprove.py | sed -n '35,55p'
Length of output: 946
🏁 Script executed:
# Check what exceptions JSTprove.__init__ can raise
rg "class JSTprove" -A 40 dsperse/src/backends/jstprove.py | head -50
Length of output: 1602
🏁 Script executed:
# Compare exception handling in Prover (similar pattern)
cat -n dsperse/src/prove/prover.py | sed -n '27,46p'
Length of output: 935
🏁 Script executed:
# Check Runner to see if it has different exception handling for JSTprove
cat -n dsperse/src/run/runner.py | sed -n '27,46p'
Length of output: 1011
🏁 Script executed:
# Verify what exceptions can actually be raised by JSTprove.__init__
cat -n dsperse/src/backends/jstprove.py | sed -n '36,65p'
Length of output: 1385
Avoid swallowing unexpected JSTprove init failures.
Catching broad Exception hides real setup bugs. JSTprove only documents RuntimeError, so catch specific exception types to surface unexpected failures, and use logger.exception() to preserve the stack trace.
🛠️ Suggested change
- try:
- self.jstprove_runner = JSTprove()
- except Exception:
- self.jstprove_runner = None
- logger.warning("JSTprove CLI not available. JSTprove backend will be disabled.")
+ try:
+ self.jstprove_runner = JSTprove()
+ except (RuntimeError, OSError):
+ self.jstprove_runner = None
+ logger.exception("JSTprove CLI not available. JSTprove backend will be disabled.")
📝 Committable suggestion
| def __init__(self, parallel: int = 1): | |
| """ | |
| Initialize the verifier. | |
| Args: | |
| parallel: Number of parallel processes for verification (default: 1) | |
| """ | |
| self.parallel = max(1, parallel) | |
| try: | |
| self.ezkl_runner = EZKL() | |
| except RuntimeError: | |
| self.ezkl_runner = None | |
| logger.warning("EZKL CLI not available. EZKL backend will be disabled.") | |
| try: | |
| self.jstprove_runner = JSTprove() | |
| except Exception: | |
| self.jstprove_runner = None | |
| logger.warning("JSTprove CLI not available. JSTprove backend will be disabled.") | |
| def __init__(self, parallel: int = 1): | |
| """ | |
| Initialize the verifier. | |
| Args: | |
| parallel: Number of parallel processes for verification (default: 1) | |
| """ | |
| self.parallel = max(1, parallel) | |
| try: | |
| self.ezkl_runner = EZKL() | |
| except RuntimeError: | |
| self.ezkl_runner = None | |
| logger.warning("EZKL CLI not available. EZKL backend will be disabled.") | |
| try: | |
| self.jstprove_runner = JSTprove() | |
| except (RuntimeError, OSError): | |
| self.jstprove_runner = None | |
| logger.exception("JSTprove CLI not available. JSTprove backend will be disabled.") |
🧰 Tools
🪛 Ruff (0.14.14)
43-43: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@dsperse/src/verify/verifier.py` around lines 27 - 45, The JSTprove
initialization in Verifier.__init__ is catching a broad Exception and swallowing
tracebacks; change the second try/except to catch RuntimeError (like the EZKL
block) by using "except RuntimeError as e" for JSTprove(), set
self.jstprove_runner = None, and call logger.exception("JSTprove CLI not
available. JSTprove backend will be disabled.") so the stack trace is preserved
and unexpected errors aren't hidden (refer to jstprove_runner and
Verifier.__init__).
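A self-contained sketch of the recommended handling (the `FakeCli` stand-in is hypothetical): catch only the documented error types, log with the traceback preserved, and still disable the backend gracefully.

```python
import logging

logger = logging.getLogger(__name__)

class FakeCli:
    """Stand-in for a backend CLI wrapper that raises RuntimeError when missing."""
    def __init__(self, available: bool):
        if not available:
            raise RuntimeError("binary not found on PATH")

def init_runner(available: bool):
    try:
        return FakeCli(available)
    except (RuntimeError, OSError):
        # logger.exception records the traceback, unlike logger.warning.
        logger.exception("CLI not available; backend disabled.")
        return None
```

An unrelated bug in the constructor (say, a `TypeError`) now propagates instead of being silently converted into "CLI not available".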
| def _handle_single_slice_verifying(self, run_path: Path, model_path: str | Path, detected: str, | ||
| backend: str | None, tiles_range: range | list[int] | None) -> dict: | ||
| """Internal: Handle verification for a single slice run directory.""" | ||
| if backend not in (Backend.JSTPROVE, Backend.EZKL): | ||
| raise ValueError("Single-slice verification requires explicit backend.") | ||
| dirs_model_path = model_path | ||
| if detected != "dirs": | ||
| dirs_model_path = Converter.convert(str(model_path), output_type="dirs", cleanup=False) | ||
| result = self._verify_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range) | ||
| if detected != "dirs": | ||
| Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True) | ||
| return result | ||
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the actual code in verifier.py
wc -l dsperse/src/verify/verifier.py
Length of output: 104
🏁 Script executed:
# Read the specific lines mentioned in the review
sed -n '195,220p' dsperse/src/verify/verifier.py
Length of output: 1577
🏁 Script executed:
# Let's also check the _verify_single_slice method to understand if it raises exceptions
rg -A 10 "def _verify_single_slice" dsperse/src/verify/verifier.py
Length of output: 673
🏁 Script executed:
# Check for similar try/finally patterns in the same file for error handling
rg -B 2 -A 5 "try:" dsperse/src/verify/verifier.py | head -50
Length of output: 1347
Wrap _verify_single_slice call in try/finally to ensure temp dirs are cleaned up on failures.
The conversion creates temporary directories with cleanup=False, but if _verify_single_slice() raises an exception, the cleanup code never executes, leaving temporary files behind. The codebase already uses this try/finally pattern in a similar verification method.
🛠️ Suggested change
- result = self._verify_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)
-
- if detected != "dirs":
- Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
- return result
+ try:
+ return self._verify_single_slice(run_path, dirs_model_path, detected, backend, tiles_range=tiles_range)
+ finally:
+ if detected != "dirs":
+ Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
🧰 Tools
🪛 Ruff (0.14.14)
202-202: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@dsperse/src/verify/verifier.py` around lines 198 - 213, The call to
_verify_single_slice in _handle_single_slice_verifying can raise and currently
skips the cleanup Converter.convert(..., cleanup=True); wrap the
_verify_single_slice invocation in a try/finally block and move the
Converter.convert cleanup call into the finally so temporary "dirs" output is
always removed when detected != "dirs"; keep the initial conversion that sets
dirs_model_path as-is and use Utils.dirs_root_from(Path(dirs_model_path)) in the
finally to compute the path for Converter.convert.
| proof_path = run_path / "proof.json" | ||
| if not proof_path.exists(): raise FileNotFoundError(f"Proof file not found at {proof_path}") | ||
| input_path, output_path = run_path / "input.json", run_path / "output.json" | ||
| wf = VerifierUtils.get_witness_file(run_path, slice_id) | ||
| witness_path = Path(wf) if wf else (run_path / "output_witness.bin") | ||
| circuit_path = Utils.resolve_under_slice(slice_dir, meta.jstprove_circuit_path or meta.circuit_path) | ||
| settings_path = Utils.resolve_under_slice(slice_dir, meta.settings_path) | ||
| vk_path = Utils.resolve_under_slice(slice_dir, meta.vk_path) | ||
| ok = VerifierUtils.verify_with_backend(preferred, str(proof_path), str(circuit_path), str(input_path), str(output_path), str(witness_path), settings_path, vk_path) |
🧩 Analysis chain
🏁 Script executed:
find . -name "verifier.py" -type f
Length of output: 102
🏁 Script executed:
find . -path "*/verify/*" -name "*.py" -type f | head -20
Length of output: 219
🏁 Script executed:
wc -l dsperse/src/verify/verifier.py
Length of output: 104
🏁 Script executed:
sed -n '240,260p' dsperse/src/verify/verifier.py
Length of output: 1336
🏁 Script executed:
# Search for run-root verification pattern
rg -n "verify_run_root" dsperse/src/verify/verifier.py
Length of output: 52
🏁 Script executed:
# Search for settings_path and vk_path validation patterns
rg -n "settings_path\|vk_path" dsperse/src/verify/verifier.py -A 2
Length of output: 52
🏁 Script executed:
# Get the full context of the verifier.py file structure
head -100 dsperse/src/verify/verifier.py
Length of output: 4311
🏁 Script executed:
# Search for run-root or root verification
rg -n "run.root\|run_root\|root_path" dsperse/src/verify/verifier.py
Length of output: 52
🏁 Script executed:
# Search for Backend.EZKL references
rg -n "Backend\|EZKL" dsperse/src/verify/verifier.py
Length of output: 52
🏁 Script executed:
# Get the full context around lines 244-255
sed -n '200,280p' dsperse/src/verify/verifier.py
Length of output: 4164
🏁 Script executed:
# Find all method definitions to understand code structure
rg -n "^\s*def " dsperse/src/verify/verifier.py
Length of output: 1268
🏁 Script executed:
# Look at the full _prepare_work_items to see the validation
sed -n '75,112p' dsperse/src/verify/verifier.py
Length of output: 2198
🏁 Script executed:
# Find verify_with_backend definition
rg -n "def verify_with_backend" dsperse/src/verify/
Length of output: 145
🏁 Script executed:
# Check verify_with_backend implementation
sed -n '115,160p' dsperse/src/verify/utils/verifier_utils.py
Length of output: 1736
🏁 Script executed:
# Check if there are any other calls to verify_with_backend
rg -n "verify_with_backend" dsperse/src/verify/
Length of output: 979
🏁 Script executed:
# Check EZKL backend verify method to see how it uses settings_path and vk_path
find . -path "*/backends/ezkl*" -name "*.py" -type f
Length of output: 100
🏁 Script executed:
# Get more details on EZKL verify method
rg -n "def verify" dsperse/src/backends/ezkl.py -A 15
Length of output: 1047
Add explicit settings/vk file validation to single-slice EZKL verification.
The run-root verification path (_prepare_work_items) validates settings and vk files for EZKL before verification, skipping gracefully with warnings. The single-slice path (_verify_single_slice) skips these checks, relying on the backend to fail later. Add explicit validation to match the run-root pattern and fail fast with clear errors.
🛠️ Suggested change
circuit_path = Utils.resolve_under_slice(slice_dir, meta.jstprove_circuit_path or meta.circuit_path)
settings_path = Utils.resolve_under_slice(slice_dir, meta.settings_path)
vk_path = Utils.resolve_under_slice(slice_dir, meta.vk_path)
+ if preferred == Backend.EZKL:
+ if not settings_path or not os.path.exists(settings_path):
+ raise FileNotFoundError(f"Settings file not found at {settings_path}")
+ if not vk_path or not os.path.exists(vk_path):
+ raise FileNotFoundError(f"Verification key not found at {vk_path}")
📝 Committable suggestion
| proof_path = run_path / "proof.json" | |
| if not proof_path.exists(): raise FileNotFoundError(f"Proof file not found at {proof_path}") | |
| input_path, output_path = run_path / "input.json", run_path / "output.json" | |
| wf = VerifierUtils.get_witness_file(run_path, slice_id) | |
| witness_path = Path(wf) if wf else (run_path / "output_witness.bin") | |
| circuit_path = Utils.resolve_under_slice(slice_dir, meta.jstprove_circuit_path or meta.circuit_path) | |
| settings_path = Utils.resolve_under_slice(slice_dir, meta.settings_path) | |
| vk_path = Utils.resolve_under_slice(slice_dir, meta.vk_path) | |
| ok = VerifierUtils.verify_with_backend(preferred, str(proof_path), str(circuit_path), str(input_path), str(output_path), str(witness_path), settings_path, vk_path) | |
| proof_path = run_path / "proof.json" | |
| if not proof_path.exists(): raise FileNotFoundError(f"Proof file not found at {proof_path}") | |
| input_path, output_path = run_path / "input.json", run_path / "output.json" | |
| wf = VerifierUtils.get_witness_file(run_path, slice_id) | |
| witness_path = Path(wf) if wf else (run_path / "output_witness.bin") | |
| circuit_path = Utils.resolve_under_slice(slice_dir, meta.jstprove_circuit_path or meta.circuit_path) | |
| settings_path = Utils.resolve_under_slice(slice_dir, meta.settings_path) | |
| vk_path = Utils.resolve_under_slice(slice_dir, meta.vk_path) | |
| if preferred == Backend.EZKL: | |
| if not settings_path or not os.path.exists(settings_path): | |
| raise FileNotFoundError(f"Settings file not found at {settings_path}") | |
| if not vk_path or not os.path.exists(vk_path): | |
| raise FileNotFoundError(f"Verification key not found at {vk_path}") | |
| ok = VerifierUtils.verify_with_backend(preferred, str(proof_path), str(circuit_path), str(input_path), str(output_path), str(witness_path), settings_path, vk_path) |
🧰 Tools
🪛 Ruff (0.14.14)
245-245: Multiple statements on one line (colon)
(E701)
245-245: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@dsperse/src/verify/verifier.py` around lines 244 - 255, The single-slice
verifier (_verify_single_slice) resolves settings_path and vk_path but doesn't
validate they exist before calling VerifierUtils.verify_with_backend; add the
same pre-checks used in _prepare_work_items: after computing settings_path and
vk_path (via Utils.resolve_under_slice using slice_dir and
meta.settings_path/meta.vk_path), if preferred indicates the EZKL backend (check
the preferred variable or backend string used elsewhere) then verify
settings_path.exists() and vk_path.exists() and raise FileNotFoundError with
clear messages (or fail fast) if missing so verification errors are surfaced
early before calling VerifierUtils.verify_with_backend.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@dsperse/src/backends/jstprove.py`:
- Around line 383-389: The code redundantly calls .long() when building scaled
and fails to validate untrusted scale parameters; remove the extra conversion by
changing the final conversion to use flat.tolist() (since flat is already long
after rounding) and add validation checks for scale_base and scale_exponent
before computing scale = scale_base ** scale_exponent (e.g., ensure scale_base
is within a safe positive range like [0.1, 1e6] and scale_exponent is an integer
within a bounded range like [-100, 100], reject or clamp values outside these
bounds), then only perform the scaling/rounding when parameters pass validation
to prevent overflow/underflow/precision loss (references: variables/functions
raw_output, flat, scale_base, scale_exponent, scaled).
🧹 Nitpick comments (19)
dsperse/src/backends/jstprove.py (5)
112-114: Unused parameters should be prefixed with underscore or removed.The
vk_pathandsettings_pathparameters are declared but never used ingenerate_witness. If kept for API compatibility, prefix them with underscore to indicate intentional non-use.♻️ Suggested fix
def generate_witness( self, input_file: Union[str, Path], model_path: Union[str, Path], # This is the circuit path in JSTprove context output_file: Union[str, Path], - vk_path: Optional[Union[str, Path]] = None, - settings_path: Optional[Union[str, Path]] = None + _vk_path: Optional[Union[str, Path]] = None, + _settings_path: Optional[Union[str, Path]] = None ) -> Tuple[bool, Any]:
133-133: Multiple statements on one line reduce readability.

Split the conditional and the raise onto separate lines per PEP 8.

♻️ Suggested fix

-        if not ok: raise RuntimeError(f"Circuit compilation failed: {err}")
+        if not ok:
+            raise RuntimeError(f"Circuit compilation failed: {err}")
147-148: Use `logger.exception` instead of `logger.error` when logging in exception handlers.

`logger.exception` automatically includes the traceback, providing better diagnostics for debugging failures.

♻️ Suggested fix

     except RuntimeError as e:
-        logger.error(f"Witness generation failed: {e}")
+        logger.exception(f"Witness generation failed: {e}")
         return False, str(e)
@@
     except Exception as e:
-        logger.error(f"Failed to process witness output: {e}")
+        logger.exception(f"Failed to process witness output: {e}")
         return False, str(e)

Also applies to: 156-158
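As a minimal illustration of the difference (the logger name here is hypothetical; only the standard `logging` module is used): `logger.exception` attaches the active traceback to the record, while `logger.error` logs only the message.

```python
import io
import logging

# Route log records to an in-memory buffer so the difference is visible.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("witness_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

try:
    raise RuntimeError("witness generation failed")
except RuntimeError as e:
    logger.error(f"error only: {e}")          # message, no traceback
    logger.exception(f"with traceback: {e}")  # message plus full traceback

log_text = stream.getvalue()
```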
221-222: Silently swallowing exceptions in batch processing loses error context.

The inner try-except in the batch result processing loop catches all exceptions and converts them to `(False, str(e))`. Consider logging these errors to aid debugging when batch jobs fail partially.

♻️ Suggested improvement

     except Exception as e:
+        logger.warning(f"Failed to process batch output for job: {e}")
         results.append((False, str(e)))
333-334: Unused parameters in `verify` and `compile_circuit` should be marked.

The `settings_path` and `vk_path` parameters in `verify`, and `settings_path` in `compile_circuit`, are unused. Prefix with underscore for clarity.

Also applies to: 401-401
dsperse/src/run/tile_executor.py (3)
92-94: Use an explicit `Optional` type annotation for the `backend` parameter.

PEP 484 prohibits implicit Optional. The parameter defaults to `None` but is typed as `str`.

♻️ Suggested fix

     def get_execution_config(
-        self, tiling: TilingInfo, meta: RunSliceMetadata, backend: str = None, has_jst_runner: bool = False
+        self, tiling: TilingInfo, meta: RunSliceMetadata, backend: str | None = None, has_jst_runner: bool = False
     ) -> dict:
239-251: `collect_outputs` raises on any tile failure, preventing partial result collection.

When a tile fails, `RuntimeError` is raised immediately, losing information about tiles that succeeded. For debugging purposes, consider collecting all results first, then raising with a summary of failures.

💡 Alternative approach

    def collect_outputs(self, results_map: dict, num_tiles: int, slice_idx: int, tiling: TilingInfo) -> list[TileResult]:
        tile_exec_infos = []
        failures = []
        for tile_idx in range(num_tiles):
            result = results_map[tile_idx]
            if result['success']:
                output = self.process_result(result, tiling, slice_idx, tile_idx)
                if output is not None:
                    tile_exec_infos.append(TileResult(tile_idx=tile_idx, success=True, method=result.get('method', 'unknown')))
                else:
                    failures.append(f"Tile {tile_idx}: no output tensor")
            else:
                failures.append(f"Tile {tile_idx}: {result.get('error', 'unknown')}")
        if failures:
            raise RuntimeError("Tile execution failed:\n" + "\n".join(failures))
        return tile_exec_infos
253-256: Lazy import inside method may cause repeated import overhead.

The import of `RunnerUtils` inside `_resolve_path` is executed on every call. Consider moving it to the top of the file or caching at class level if there's a circular import concern.

♻️ Suggested fix if no circular import

Move to top-level imports:

from dsperse.src.run.utils.runner_utils import RunnerUtils

dsperse/src/run/runner.py (6)
29-35: Use explicit `Optional` type annotations for constructor parameters.

Parameters that default to `None` should use explicit union syntax for PEP 484 compliance.

♻️ Suggested fix

     def __init__(
         self,
-        run_dir: str = None,
+        run_dir: str | None = None,
         threads: int = 1,
         on_slice_complete: Optional[SliceCompleteCallback] = None,
         batch: bool = False,
     ):
351-351: Add `strict=True` to `zip()` for length mismatch detection.

When zipping `tile_args_list` and `results`, a length mismatch would silently truncate. Using `strict=True` ensures both iterables have equal length.

♻️ Suggested fix

-        for args, (success, output) in zip(tile_args_list, results):
+        for args, (success, output) in zip(tile_args_list, results, strict=True):
413-413: Rename the unused `input_name` variable to satisfy the linter.

The unpacked variable is never used. Prefix with underscore.

♻️ Suggested fix

-        input_tensor, output_shape, input_name = cs_executor.prepare_config(meta)
+        input_tensor, output_shape, _input_name = cs_executor.prepare_config(meta)
497-497: Multiple statements on one line reduce readability.

Split the for-loop body onto a separate line.

♻️ Suggested fix

-        for oname in info.dependencies.output: tensor_cache[oname] = cached_tensor
+        for oname in info.dependencies.output:
+            tensor_cache[oname] = cached_tensor
543-544: Use a more specific exception class instead of the generic `Exception`.

Raising bare `Exception` makes it harder for callers to catch specific errors. Consider using `ValueError` or a custom `SlicePathError`.

♻️ Suggested fix

         if slice_path is None or not Path(slice_path).exists():
-            raise Exception("A valid path must be provided for slices")
+            raise ValueError("A valid path must be provided for slices")
167-167: Multiple methods have implicit Optional parameters.

Several methods use `= None` defaults without explicit `| None` type annotations. Consider updating for PEP 484 compliance: `run_ezkl_inference`, `run_jstprove_inference`, `run_tiled_inference`, `run_tiles`, `run_channel_split_inference`, `run_onnx_single`, `_run`, and `run`.

Also applies to: 197-197, 227-227, 243-243, 405-405, 459-459, 476-476, 530-530
dsperse/src/run/utils/runner_utils.py (5)
175-175: Remove unused parameters from `preprocess_input`.

The `model_directory` and `save_reshape` parameters are declared but never used.

♻️ Suggested fix

     @staticmethod
-    def preprocess_input(input_path: str, model_directory: str = None, save_reshape: bool = False) -> torch.Tensor:
+    def preprocess_input(input_path: str) -> torch.Tensor:
         """Preprocess input data from JSON."""
411-418: Silent exception swallowing hides errors.

The try-except-pass pattern at lines 416-417 silently ignores JSON parsing errors or other issues when checking slice completion. At minimum, log the exception.

♻️ Suggested fix

         try:
             with open(output_file, 'r') as f:
                 data = json.load(f)
             return True, data
-        except Exception:
-            pass
+        except Exception as e:
+            logger.debug(f"Failed to load output for {slice_id}: {e}")
         return False, None
453-453: Remove the unused `exec_info` parameter from `process_inference_result`.

The `exec_info` parameter is passed but never used in the method body.

♻️ Suggested fix

     @staticmethod
-    def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, exec_info, tensor_cache: dict) -> torch.Tensor | None:
+    def process_inference_result(slice_id: str, info: RunSliceMetadata, ok: bool, result, tensor_cache: dict) -> torch.Tensor | None:

Note: This requires updating the call site in runner.py line 511.
465-465: Use `next(iter(...))` instead of `list(...)[0]` for efficiency.

Creating a full list just to access the first element is wasteful.

♻️ Suggested fix

-        first_output = list(result['output_tensors'].values())[0] if result['output_tensors'] else None
+        first_output = next(iter(result['output_tensors'].values()), None) if result['output_tensors'] else None
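For reference, `next(iter(...), default)` pulls the first value without materializing a list and handles the empty case in one expression; the `output_tensors` dict below is a hypothetical stand-in, not data from the codebase:

```python
output_tensors = {"logits": [0.1, 0.9], "aux": [0.0]}  # hypothetical outputs

# First value without building a full list; dicts preserve insertion order.
first_output = next(iter(output_tensors.values()), None)

# The second argument is returned when the iterator is empty,
# avoiding the IndexError that list(...)[0] would raise.
empty_first = next(iter({}.values()), None)
```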
420-441: Consider using a structured return type for `try_resume_slice`.

The method returns a 3-tuple `(bool, Tensor | None, ExecutionInfo | None)`. A named tuple or dataclass would improve readability and prevent positional errors at call sites.
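One way this could look (a sketch only: `ResumeResult` and its field names are invented for illustration, and the tensor/info fields are typed loosely here to avoid a torch dependency):

```python
from typing import Any, NamedTuple, Optional

class ResumeResult(NamedTuple):
    """Structured replacement for the (bool, Tensor | None, ExecutionInfo | None) tuple."""
    resumed: bool
    output: Optional[Any] = None     # would be torch.Tensor | None in practice
    exec_info: Optional[Any] = None  # would be ExecutionInfo | None in practice

# Call sites read by name instead of by position:
hit = ResumeResult(resumed=True, output=[1.0, 2.0])
miss = ResumeResult(resumed=False)
```

A NamedTuple keeps tuple compatibility, so existing unpacking call sites can migrate incrementally.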
    flat = torch.tensor(raw_output).flatten()

    if flat.is_floating_point():
        scale = scale_base ** scale_exponent
        flat = torch.round(flat * scale).long()

    scaled = flat.long().tolist()
🧩 Analysis chain
🏁 Scripts executed (Repository: inference-labs-inc/dsperse):

# First, let's check the context around the flagged lines
cat -n dsperse/src/backends/jstprove.py | sed -n '370,400p'

# Check how scale_base and scale_exponent are set in metadata
rg -n "scale_base|scale_exponent" dsperse/ --type py -B2 -A2

# Search for where these values come from and how they're validated
rg -n "scale_base\s*=" dsperse/ --type py -B1 -A1

# Look for any test files related to this module
fd -e py ".*test.*jstprove" dsperse/

# Let's see the full function to understand the context better
cat -n dsperse/src/backends/jstprove.py | sed -n '360,395p'

# Search for where this metadata is created or where prepare_veri_output is called
rg -n "prepare_veri_output" dsperse/ --type py -B2 -A2

# Search for any documentation or constants related to scale_base/scale_exponent
rg -n "scale_base|scale_exponent" . --type py --type md --type txt -B3 -A3

# Search for where the metadata file is created or where scale_base/scale_exponent are set
rg -n "_metadata.json|scale_base|scale_exponent" dsperse/ --type py -B3 -A3 | head -60

# Look for any examples, tests, or config files that show typical values
fd -e json . dsperse/ | head -20

# Search for documentation or comments about the scaling operation
rg -n "scale" dsperse/src/backends/jstprove.py -B2 -A2

# Search for where scale_base and scale_exponent are set/generated in the codebase
rg -n "scale_base.*=" dsperse/ --type py -B2 -A2

# Look for any circuit metadata generation
rg -n "circuit.*metadata|_metadata.json" dsperse/ --type py -B1 -A1 | head -40

# Check if there are example metadata files or any documentation
find dsperse/ -name "*metadata*" -type f

# Search for where circuit metadata is generated with scale parameters
rg -n "scale_base|scale_exponent" . --type py -B5 -A5 | grep -A10 -B10 "scale_base.*:"

# Look for any generation or compilation logic that might create these metadata files
rg -n "compile_circuit|generate.*metadata|_metadata" dsperse/src/backends/ --type py -A5 | head -50

# Check if there's documentation or README about the metadata format
find dsperse/ -name "README*" -o -name "*.md" | xargs grep -l "scale_base\|scale_exponent" 2>/dev/null || echo "No matches in docs"

# Look at the compile_circuit function mentioned in the file to understand metadata better
cat -n dsperse/src/backends/jstprove.py | sed -n '397,430p'

# Continue reading the compile_circuit function to see if it generates metadata
cat -n dsperse/src/backends/jstprove.py | sed -n '425,480p'

# Search for where _metadata.json files are created or written
rg -n "_metadata\.json" dsperse/ --type py -B3 -A3

# Look for the _run_command method to see what external commands are called
rg -n "_run_command" dsperse/src/backends/jstprove.py -B2 -A8 | head -60

# Look for test files or examples that might show metadata structure
find dsperse/ -type f -name "*.json" | xargs grep -l "scale_base\|scale_exponent" 2>/dev/null | head -5

# Search for JSTproveUtils which might have more context
rg -n "class JSTproveUtils|def.*scale" dsperse/src/backends/jstprove.py -A5

# Check if there are any validation or bounds checking for the scale parameters elsewhere
rg -n "scale.*base|scale.*exponent|scale_base|scale_exponent" dsperse/ --type py | grep -v "metadata.get"

# Look at the JSTproveUtils module to understand the scaling context better
cat -n dsperse/src/backends/jstprove.py | sed -n '1,60p'

# Search for any usage of the _prepare_verification_output function
rg -n "_prepare_verification_output" dsperse/ --type py -B2 -A2

# Look for the JSTproveUtils class to understand what convert_to_logits and other utilities do
rg -n "class JSTproveUtils" dsperse/src/backends/utils/ --type py -A20

# Check if there are any tests for the jstprove backend
find dsperse/ -type f -name "*test*jstprove*" -o -name "*jstprove*test*"

# Look at lines 385-389 more carefully - check if there's redundancy
cat -n dsperse/src/backends/jstprove.py | sed -n '383,395p'

# Search for where circuit metadata files are actually created or populated
rg -n "circuit.*metadata\|_metadata.*write\|json.dump.*scale" dsperse/ --type py -B2 -A2

# Check if there are any example metadata files or documentation
find dsperse/ -name "*metadata*" -type f -exec file {} \;

# Look for any external JSTprove tool integration that might create these metadata files
rg -n "jstprove|JSTprove" dsperse/src/backends/jstprove.py | grep -i "metadata\|scale" | head -10

# Verify the line 389 behavior - is .long() on an already-long tensor problematic?
cat -n dsperse/src/backends/jstprove.py | sed -n '383,395p'

# Search for any constants or defaults related to scale parameters in the entire codebase
rg -n "scale_base|scale_exponent" dsperse/ --type py --type json

# Look for any documentation or comments about the metadata format in the jstprove_utils
rg -n "metadata\|scale" dsperse/src/backends/utils/jstprove_utils.py -B2 -A2 | head -50

# Check if the metadata file is created by the external JSTprove command
cat -n dsperse/src/backends/jstprove.py | sed -n '428,445p'

# Search the entire repo for any README or documentation mentioning scale or metadata
find dsperse/ -type f \( -name "*.md" -o -name "README*" -o -name "*.txt" \) -exec grep -l "scale\|metadata\|JSTprove" {} \;

# Check if there's a git history or comments about why .long() is called twice
rg -n "\.long()" dsperse/src/backends/jstprove.py -B3 -A1
Remove redundant .long() call and add validation for untrusted scale parameters.
Line 389 redundantly calls .long() on a tensor that was already converted to long on line 387. Additionally, the scale_base and scale_exponent parameters from metadata lack validation—they are untrusted external input from the JSTprove CLI with no bounds checking. Large exponents or extreme base values could cause numeric overflow/underflow or precision loss. Add parameter validation with reasonable bounds before the scaling operation.
🤖 Prompt for AI Agents
In `@dsperse/src/backends/jstprove.py` around lines 383 - 389, The code
redundantly calls .long() when building scaled and fails to validate untrusted
scale parameters; remove the extra conversion by changing the final conversion
to use flat.tolist() (since flat is already long after rounding) and add
validation checks for scale_base and scale_exponent before computing scale =
scale_base ** scale_exponent (e.g., ensure scale_base is within a safe positive
range like [0.1, 1e6] and scale_exponent is an integer within a bounded range
like [-100, 100], reject or clamp values outside these bounds), then only
perform the scaling/rounding when parameters pass validation to prevent
overflow/underflow/precision loss (references: variables/functions raw_output,
flat, scale_base, scale_exponent, scaled).
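A hedged sketch of the suggested validation, using plain Python floats instead of torch tensors; the bounds shown are the illustrative ones from the prompt above, not values confirmed by any JSTprove specification:

```python
def validate_scale(scale_base, scale_exponent):
    """Reject out-of-range scale parameters before computing scale_base ** scale_exponent.

    Bounds are illustrative: [0.1, 1e6] for the base, [-100, 100] for the exponent.
    """
    if not isinstance(scale_exponent, int):
        raise ValueError(f"scale_exponent must be an integer, got {type(scale_exponent).__name__}")
    if not (0.1 <= scale_base <= 1e6):
        raise ValueError(f"scale_base {scale_base} outside safe range [0.1, 1e6]")
    if not (-100 <= scale_exponent <= 100):
        raise ValueError(f"scale_exponent {scale_exponent} outside safe range [-100, 100]")
    return scale_base ** scale_exponent

def scale_outputs(raw_output, scale_base, scale_exponent):
    """Mirror torch.round(flat * scale) for a flat list of floats, after validation."""
    scale = validate_scale(scale_base, scale_exponent)
    return [round(x * scale) for x in raw_output]

# Valid parameters pass; an out-of-range exponent is rejected:
scaled = scale_outputs([0.5, -1.25], 2, 3)  # scale = 2 ** 3 = 8
try:
    validate_scale(2, 1000)
    rejected = False
except ValueError:
    rejected = True
```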
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dsperse/src/backends/onnx_models.py (1)
281-287: ⚠️ Potential issue | 🟠 Major

Error handling may raise NameError and mask original exceptions.

If `_create_session` fails at line 154, `session` is never assigned, causing `NameError` at line 284. Similarly, `input_numpy` may be undefined if tensor conversion fails. This masks the original exception.

🐛 Proposed fix with defensive checks

     except Exception as e:
         logger.error(f"Error in apply_onnx_shape: {e}")
         # In case of error, return the original tensor with the first input name
-        if len(session.get_inputs()) > 0:
-            return {session.get_inputs()[0].name: input_numpy}
-        else:
-            return {"input": input_numpy}
+        try:
+            # Attempt to get input name from session if available
+            if 'session' in dir() and session is not None:
+                inputs = session.get_inputs()
+                if inputs:
+                    input_name = inputs[0].name
+                else:
+                    input_name = "input"
+            else:
+                input_name = "input"
+
+            # Use input_numpy if available, otherwise convert input_tensor
+            if 'input_numpy' not in dir() or input_numpy is None:
+                if isinstance(input_tensor, torch.Tensor):
+                    input_numpy = input_tensor.detach().cpu().numpy()
+                else:
+                    input_numpy = np.asarray(input_tensor)
+
+            return {input_name: input_numpy}
+        except Exception:
+            # Last resort fallback
+            logger.exception("Error in fallback handling")
+            raise e
🤖 Fix all issues with AI agents
In `@dsperse/src/backends/onnx_models.py`:
- Around line 112-113: The code silently truncates the flattened array when
flat.size > elements_needed (flat = flat[:elements_needed]), which can hide
input mismatches; update the branch to emit a warning before truncation using
the module/logger used elsewhere in this file (match the logger pattern used
around line 204), including the array name/shape and elements_needed, then
perform the slice (keep the same variable names: flat and elements_needed) so
callers and logs show the discrepancy.
🧹 Nitpick comments (2)
dsperse/src/backends/onnx_models.py (2)
31-31: Use explicit `str | None` type hints for optional parameters.

Static analysis flags implicit Optional on lines 31, 50, 63, and 119. PEP 484 recommends explicit union types.

♻️ Suggested fix

     @staticmethod
-    def _process_outputs(session: ort.InferenceSession, raw_output: list, output_file: str = None) -> dict:
+    def _process_outputs(session: ort.InferenceSession, raw_output: list, output_file: str | None = None) -> dict:

Apply the same pattern to `run_inference`, `run_inference_tensor`, and `run_inference_multi`.
53-57: Double session creation per inference call is inefficient.

`run_inference` creates a session at line 53, then `apply_onnx_shape` (line 55) creates another at line 154. The same pattern exists in `run_inference_tensor`. For large models, this doubles the model loading overhead.

Consider either passing the session to `apply_onnx_shape` or extracting input metadata without full session creation.
    elif flat.size > elements_needed:
        flat = flat[:elements_needed]
Silent data truncation can mask input errors.
When the input array has more elements than expected, data is silently discarded. Consider logging a warning to aid debugging, similar to the pattern used elsewhere in this file (e.g., line 204).
🐛 Suggested fix
elif flat.size > elements_needed:
+ logger.warning(
+ f"Truncating input: expected {elements_needed} elements, got {flat.size}"
+ )
    flat = flat[:elements_needed]

🤖 Prompt for AI Agents
In `@dsperse/src/backends/onnx_models.py` around lines 112 - 113, The code
silently truncates the flattened array when flat.size > elements_needed (flat =
flat[:elements_needed]), which can hide input mismatches; update the branch to
emit a warning before truncation using the module/logger used elsewhere in this
file (match the logger pattern used around line 204), including the array
name/shape and elements_needed, then perform the slice (keep the same variable
names: flat and elements_needed) so callers and logs show the discrepancy.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@dsperse/src/backends/jstprove.py`:
- Around line 197-201: The fixed manifest filename (manifest_path =
manifest_parent / "batch_witness_manifest.json") can be clobbered by concurrent
runs; update the code that computes manifest_path (located around
manifest_parent, manifest_dir, circuit_path, and where json.dump({"jobs": jobs},
f) is called) to generate a unique filename per run (e.g., append a UUID,
timestamp+PID, or use tempfile to create a unique temp file) and write the
manifest to that unique path, ensuring manifest_parent.mkdir(...) remains and
the rest of the code that consumes the manifest uses the new unique
manifest_path.
    manifest_parent = Path(manifest_dir) if manifest_dir else circuit_path.parent
    manifest_parent.mkdir(parents=True, exist_ok=True)
    manifest_path = manifest_parent / "batch_witness_manifest.json"
    with open(manifest_path, "w") as f:
        json.dump({"jobs": jobs}, f)
Batch manifest filename can collide across concurrent runs.
Using a fixed name (batch_witness_manifest.json) risks clobbering when multiple batches run in the same directory. Consider a unique filename (UUID / tempfile) to avoid races.
💡 Example tweak
- manifest_path = manifest_parent / "batch_witness_manifest.json"
+ manifest_path = manifest_parent / f"batch_witness_manifest_{os.getpid()}.json"

🤖 Prompt for AI Agents
In `@dsperse/src/backends/jstprove.py` around lines 197 - 201, The fixed manifest
filename (manifest_path = manifest_parent / "batch_witness_manifest.json") can
be clobbered by concurrent runs; update the code that computes manifest_path
(located around manifest_parent, manifest_dir, circuit_path, and where
json.dump({"jobs": jobs}, f) is called) to generate a unique filename per run
(e.g., append a UUID, timestamp+PID, or use tempfile to create a unique temp
file) and write the manifest to that unique path, ensuring
manifest_parent.mkdir(...) remains and the rest of the code that consumes the
manifest uses the new unique manifest_path.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
install.sh (1)
213-214: ⚠️ Potential issue | 🟡 Minor

Update or remove stale JSTprove references in the Open MPI installation.

The `install_openmpi` function (line 213) and its call site (line 436) still reference JSTprove, but JSTprove installation has been removed from this script.

If Open MPI is still needed for other parallel execution features in the new backend, update the comments to reflect the actual purpose. If it's no longer needed, consider removing this installation step.
Proposed fix if Open MPI is still needed

-# Install Open MPI (required for JSTprove)
+# Install Open MPI (for parallel execution backends)
 install_openmpi() {

-    # Install Open MPI (required for JSTprove)
+    # Install Open MPI (for parallel execution)
     install_openmpi

Also applies to: 436-437
🤖 Fix all issues with AI agents
In `@install.sh`:
- Line 7: Update the stale comment "Open MPI (required for JSTprove)" in
install.sh: either remove that line entirely if Open MPI is no longer required,
or replace it with an accurate dependency note (e.g., "Open MPI" or the actual
reason it's needed). Locate the exact comment string "Open MPI (required for
JSTprove)" and edit or delete it so the header no longer references JSTprove.
# - Dsperse CLI (python package, console script: dsperse)
# - EZKL CLI (if missing)
# - EZKL lookup tables (if missing)
# - Open MPI (required for JSTprove)
Stale reference to JSTprove in header comment.
The script description on line 2 was updated to remove JSTprove, but line 7 still mentions "Open MPI (required for JSTprove)". Since JSTprove installation was removed from this script, this comment is now misleading.
Proposed fix
-# - Open MPI (required for JSTprove)
+# - Open MPI (optional, for parallel execution)

Or remove the line entirely if Open MPI is no longer a dependency.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-# - Open MPI (required for JSTprove)
+# - Open MPI (optional, for parallel execution)
🤖 Prompt for AI Agents
In `@install.sh` at line 7, Update the stale comment "Open MPI (required for
JSTprove)" in install.sh: either remove that line entirely if Open MPI is no
longer required, or replace it with an accurate dependency note (e.g., "Open
MPI" or the actual reason it's needed). Locate the exact comment string "Open
MPI (required for JSTprove)" and edit or delete it so the header no longer
references JSTprove.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.