
Add tuple-pattern, BoxerConfig, CPU batching, and batch tracking#10

Merged
kalidke merged 18 commits into main from feature/tuple-pattern
Feb 7, 2026

Conversation


@kalidke kalidke commented Feb 2, 2026

Summary

Major update adding ecosystem-aligned patterns and memory improvements:

  • Tuple-pattern return: getboxes() now returns a (ROIBatch, BoxesInfo) tuple
  • BoxerConfig struct: Config-based calling convention alongside kwargs
  • CPU memory batching: Memory-aware batching for CPU path (mirrors GPU)
  • Batch tracking: BoxesInfo includes n_rois, batch_size, n_batches, memory_per_batch
  • Timing in seconds: Changed elapsed_ns to elapsed_s::Float64
  • Refactored batching: Extracted _process_with_batching() helper to deduplicate GPU/CPU paths

Breaking Changes

  • getboxes() returns (ROIBatch, BoxesInfo) instead of just ROIBatch
  • BoxesInfo.elapsed_ns renamed to BoxesInfo.elapsed_s (now Float64 seconds)
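A minimal migration sketch for the two breaking changes above (the old call forms are inferred from the PR description):

```julia
# Before this PR, getboxes returned only the ROI batch:
#   rois = getboxes(data, camera; psf_sigma=0.13)

# After: destructure the tuple, or discard the info if unneeded
(rois, info) = getboxes(data, camera; psf_sigma=0.13)
rois, _ = getboxes(data, camera; psf_sigma=0.13)

# Timing moves from integer nanoseconds to Float64 seconds
elapsed_ms = info.elapsed_s * 1_000   # was: info.elapsed_ns / 1e6
```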

New Features

BoxerConfig

config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)
(rois, info) = getboxes(data, camera, config)

Two Calling Conventions

# Config-based (recommended for reusable settings)
(rois, info) = getboxes(data, camera, config)

# Kwargs-based (convenient for one-off calls)
(rois, info) = getboxes(data, camera; psf_sigma=0.13, min_photons=500.0)

BoxesInfo with Batch Tracking

BoxesInfo(2 ROIs, 12.3 ms, cpu, 1 batches × 10, 234.4 KB/batch)
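A plausible shape of the struct behind that display, for orientation only — the field names come from this PR, but the field types and the exact show() formatting are assumptions:

```julia
# Sketch of BoxesInfo as described in the PR; types are assumptions.
struct BoxesInfo
    n_rois::Int              # number of ROIs detected
    elapsed_s::Float64       # wall time in seconds
    backend::Symbol          # :cpu or :gpu
    device_id::Int           # 0-based GPU id, -1 for CPU
    batch_size::Int          # frames per batch
    n_batches::Int           # batches processed
    memory_per_batch::Int    # estimated bytes per batch
end

function Base.show(io::IO, info::BoxesInfo)
    print(io, "BoxesInfo($(info.n_rois) ROIs, ",
          "$(round(info.elapsed_s * 1e3; digits=1)) ms, $(info.backend), ",
          "$(info.n_batches) batches × $(info.batch_size), ",
          "$(round(info.memory_per_batch / 1024; digits=1)) KB/batch)")
end
```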

Test Plan

  • All 80 unit tests pass
  • GPU tests pass on descent (92x-406x speedup)
  • Performance benchmarks pass
  • GPU wait/timeout tests pass

🤖 Generated with Claude Code

kalidke and others added 2 commits February 1, 2026 17:37
BREAKING CHANGE: getboxes now returns a tuple instead of a single value.

Changes:
- Add BoxesInfo struct with backend, elapsed_ns, device_id fields
- Capture wall time via time_ns() during processing
- Track GPU device ID (0-based) or -1 for CPU
- Export BoxesInfo from SMLMBoxer module
- Update all tests, examples, and documentation

Part of JuliaSMLM ecosystem-wide tuple-pattern rollout for
consistent metadata return across packages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The CPU path in getboxes previously processed entire image stacks at once,
causing OOM on large inputs. This mirrors the GPU batching logic:

- Use Sys.free_memory() to determine available RAM
- Calculate batch size based on memory requirements (6x for standard, 10x for sCMOS)
- Process large stacks in batches with proper frame offset tracking
- Call GC.gc(false) between batches to release allocations

Fixes issue #11.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
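The memory-aware batch-size calculation described in this commit might look roughly like the following. Only Sys.free_memory() and the 6x/10x multipliers come from the commit message; the function name, the halving of the budget, and everything else are assumptions:

```julia
# Hypothetical sketch of the CPU batch-size calculation.
function cpu_batch_size(data::AbstractArray{T,3}; scmos::Bool=false) where T
    bytes_per_frame = sizeof(T) * size(data, 1) * size(data, 2)
    multiplier = scmos ? 10 : 6        # working-set overhead factor per commit
    budget = Sys.free_memory() ÷ 2     # leave headroom for the rest of the process
    max(1, Int(budget ÷ (bytes_per_frame * multiplier)))
end
```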
kalidke mentioned this pull request Feb 2, 2026
kalidke and others added 5 commits February 2, 2026 15:42
Aligns with GaussMLE FitInfo struct. BoxesInfo now includes:
- n_rois: Number of ROIs detected
- batch_size: Frames per batch during processing
- n_batches: Number of batches processed
- memory_per_batch: Estimated memory per batch in bytes

Added show() method displaying: BoxesInfo(N ROIs, X ms, backend, B batches × S, M/batch)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Aligns with ecosystem convention requested by Keith.
Updated all references in src, tests, docs, and examples.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…U paths

Reduces ~90 lines of duplicated batching logic to ~50 lines.
Both GPU and CPU paths now call the same helper with different parameters:
- GPU: no batch_cleanup
- CPU: batch_cleanup=() -> GC.gc(false)

Inspired by similar refactoring in GaussMLE.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
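A sketch of what the shared helper could look like — only the name _process_with_batching and the batch_cleanup distinction (GPU: none, CPU: GC.gc(false)) come from the commit; the signature and body are assumptions:

```julia
# Hypothetical shape of the shared GPU/CPU batching helper.
function _process_with_batching(f, data, batch_size; batch_cleanup=nothing)
    nframes = size(data, 3)
    results = []
    for start in 1:batch_size:nframes
        stop = min(start + batch_size - 1, nframes)
        # pass the frame offset so detections map back to stack coordinates
        push!(results, f(view(data, :, :, start:stop), start - 1))
        batch_cleanup === nothing || batch_cleanup()
    end
    results
end

# CPU path frees allocations between batches; GPU path passes no cleanup:
# _process_with_batching(kernel, stack, bs; batch_cleanup=() -> GC.gc(false))
```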
Two calling conventions now supported:
- getboxes(data, camera, config::BoxerConfig)
- getboxes(data, camera; kwargs...)  # kwargs forward to Config

BoxerConfig fields match kwargs:
- PSF-aware: psf_sigma, min_photons
- Advanced: sigma_small, sigma_large, minval
- Box params: boxsize, overlap
- Backend: backend, auto_timeout, gpu_timeout

Added tests for BoxerConfig calling convention.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
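The kwargs-to-config forwarding described above can be done with a one-line method. The field names are from the commit; the default values and Base.@kwdef usage are assumptions:

```julia
# Sketch: kwargs convention forwarding to the config-based method.
Base.@kwdef struct BoxerConfig
    psf_sigma::Float64   = 0.13    # defaults here are illustrative only
    min_photons::Float64 = 500.0
    boxsize::Int         = 11
    backend::Symbol      = :auto
end

# Any keyword call builds a config and dispatches to the config method.
getboxes(data, camera; kwargs...) =
    getboxes(data, camera, BoxerConfig(; kwargs...))
```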
README.md:
- Added BoxerConfig section with usage examples
- Documented both config-based and kwargs-based conventions

api_overview.md:
- Added BoxerConfig to exports list (now 6 exports)
- Added Configuration section with struct definition
- Updated getboxes documentation for both conventions
- Updated workflow examples showing both conventions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
kalidke changed the title from "Implement tuple-pattern: getboxes returns (ROIBatch, BoxesInfo)" to "Add tuple-pattern, BoxerConfig, CPU batching, and batch tracking" on Feb 4, 2026
kalidke and others added 11 commits February 6, 2026 09:44
…oxesInfo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ext switching

find_best_gpu() and recommend_batch_size() previously called CUDA.device!(i)
to iterate over GPUs, which triggers cuDevicePrimaryCtxRetain and can OOM
under multi-process contention (5+ parallel Julia processes). Now uses
CUDA.NVML.memory_info() to query free memory without creating CUDA contexts,
only calling CUDA.device!() once on the selected device.

Fixes #13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
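The context-free device selection described in this commit might be sketched as below. CUDA.NVML.memory_info is named in the commit; the Device constructor usage and everything else are assumptions about the exact API surface:

```julia
# Sketch: pick a GPU by free memory via NVML, creating only one CUDA context.
using CUDA

function best_gpu_by_free_memory(ngpus::Int)
    # NVML queries do not trigger cuDevicePrimaryCtxRetain
    free = [CUDA.NVML.memory_info(CUDA.NVML.Device(i)).free for i in 0:ngpus-1]
    best = argmax(free) - 1    # back to 0-based device id
    CUDA.device!(best)         # only now create a context, on the winner
    best
end
```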
When backend=:auto, catch CUDA errors during GPU processing (dog_filter,
findlocalmax, etc.) and fall back to CPU with a warning. This handles
cases where GPU memory is initially available but lost to competing
processes during execution (CUDA error 999). :gpu mode still hard-errors
as expected since the user explicitly requested GPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
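A minimal sketch of the :auto fallback behavior described above; all names here are assumptions, only the backend semantics (:auto falls back, :gpu rethrows) come from the commit:

```julia
# Sketch: catch GPU errors in :auto mode and fall back to CPU.
function run_with_fallback(gpu_kernel, cpu_kernel, data, backend::Symbol)
    backend === :cpu && return cpu_kernel(data)
    try
        gpu_kernel(data)
    catch err
        backend === :gpu && rethrow()   # explicit :gpu request hard-errors
        @warn "GPU processing failed; falling back to CPU" exception=err
        cpu_kernel(data)
    end
end
```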
Layer 1 (avoidance): New poll_gpu_nvml() scans ALL GPUs via NVML without
creating CUDA contexts. Checks free memory, process count, and compute
utilization per device. Polls through timeout with jittered backoff.
First GPU that clears contention threshold wins. select_backend() now
returns (backend, device_id) tuple and handles device selection.

Layer 2 (recovery): Runtime try/catch in _getboxes_impl for :auto mode
catches CUDA errors during GPU processing and falls back to CPU. Safety
net for the inherent race between NVML check and CUDA allocation.

Also fixes find_best_gpu() and recommend_batch_size() to use NVML for
memory queries instead of iterating CUDA.device!() which triggers
cuDevicePrimaryCtxRetain OOM under multi-process contention.

Fixes #13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add on_wait::Union{Function,Nothing} field to BoxerConfig
- Remove deprecated use_gpu kwarg from getboxes() and recommend_batch_size()
- Config-based getboxes() now reads on_wait from config instead of separate kwarg
- Update all tests: use_gpu=false -> backend=:cpu

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds GC.gc() + CUDA.reclaim() in the :auto mode try/catch to free
dead CuArrays and return memory to the pool, preventing deadlock
when multiple processes contend for GPU resources.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…llback

On GPU OOM in :auto mode, release memory via GC.gc + CUDA.reclaim,
then re-poll NVML with remaining auto_timeout budget. Only fall back
to CPU when timeout expires with no GPU available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace split retry logic (select_backend + Layer 2 try/catch) with
one loop in _getboxes_impl that handles all GPU failure modes:
- No free memory (NVML poll waits)
- TOCTOU context creation race (catch + re-poll)
- Runtime OOM during processing (catch + release + re-poll)

Loop retries until timeout expires. On timeout: :auto falls back
to CPU, :gpu errors. select_backend no longer called.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
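The unified loop replacing select_backend might look roughly like this. poll_gpu_nvml, the timeout semantics, and the release-then-re-poll step follow the commit descriptions, but every name and signature here is an assumption:

```julia
# Sketch of the single retry loop handling all GPU failure modes.
using CUDA

function acquire_or_fallback(process_on_gpu, process_on_cpu;
                             backend=:auto, timeout=60.0)
    deadline = time() + timeout
    while time() < deadline
        dev = poll_gpu_nvml(deadline - time())   # wait for an uncontended GPU
        dev === nothing && break                 # poll gave up within budget
        try
            return process_on_gpu(dev)
        catch err
            GC.gc(); CUDA.reclaim()              # release pool memory, re-poll
        end
    end
    backend === :gpu && error("no GPU available within timeout")
    process_on_cpu()                             # :auto falls back to CPU
end
```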
CUDA.jl's memory pool retains GPU memory after arrays are freed.
Without explicit reclaim, finished processes block others polling
via NVML. Now calls GC.gc + CUDA.reclaim after both success and
failure paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kalidke kalidke merged commit 72a6aeb into main Feb 7, 2026
0 of 3 checks passed