
Add tuple-pattern, BoxerConfig, CPU batching, and batch tracking#10

Merged
kalidke merged 18 commits into main from feature/tuple-pattern
Feb 7, 2026

Conversation


@kalidke kalidke commented Feb 2, 2026

Summary

Major update adding ecosystem-aligned patterns and memory improvements:

  • Tuple-pattern return: getboxes() now returns a (ROIBatch, BoxesInfo) tuple
  • BoxerConfig struct: Config-based calling convention alongside kwargs
  • CPU memory batching: Memory-aware batching for CPU path (mirrors GPU)
  • Batch tracking: BoxesInfo includes n_rois, batch_size, n_batches, memory_per_batch
  • Timing in seconds: Changed elapsed_ns to elapsed_s::Float64
  • Refactored batching: Extracted _process_with_batching() helper to deduplicate GPU/CPU paths

Breaking Changes

  • getboxes() returns (ROIBatch, BoxesInfo) instead of just ROIBatch
  • BoxesInfo.elapsed_ns renamed to BoxesInfo.elapsed_s (now Float64 seconds)
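A minimal migration sketch for the two breaking changes above (the old call forms are inferred from the PR description):

```julia
# Before this PR, getboxes returned only the ROI batch:
#   rois = getboxes(data, camera; psf_sigma=0.13)

# After: destructure the tuple, or discard the info if unneeded
(rois, info) = getboxes(data, camera; psf_sigma=0.13)
rois, _ = getboxes(data, camera; psf_sigma=0.13)

# Timing moves from integer nanoseconds to Float64 seconds
elapsed_ms = info.elapsed_s * 1_000   # was: info.elapsed_ns / 1e6
```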

New Features

BoxerConfig

config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)
(rois, info) = getboxes(data, camera, config)

Two Calling Conventions

# Config-based (recommended for reusable settings)
(rois, info) = getboxes(data, camera, config)

# Kwargs-based (convenient for one-off calls)
(rois, info) = getboxes(data, camera; psf_sigma=0.13, min_photons=500.0)

BoxesInfo with Batch Tracking

BoxesInfo(2 ROIs, 12.3 ms, cpu, 1 batches × 10, 234.4 KB/batch)
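A plausible shape of the struct behind that display, for orientation only — the field names come from this PR, but the field types and the exact show() formatting are assumptions:

```julia
# Sketch of BoxesInfo as described in the PR; types are assumptions.
struct BoxesInfo
    n_rois::Int              # number of ROIs detected
    elapsed_s::Float64       # wall time in seconds
    backend::Symbol          # :cpu or :gpu
    device_id::Int           # 0-based GPU id, -1 for CPU
    batch_size::Int          # frames per batch
    n_batches::Int           # batches processed
    memory_per_batch::Int    # estimated bytes per batch
end

function Base.show(io::IO, info::BoxesInfo)
    print(io, "BoxesInfo($(info.n_rois) ROIs, ",
          "$(round(info.elapsed_s * 1e3; digits=1)) ms, $(info.backend), ",
          "$(info.n_batches) batches × $(info.batch_size), ",
          "$(round(info.memory_per_batch / 1024; digits=1)) KB/batch)")
end
```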

Test Plan

  • All 80 unit tests pass
  • GPU tests pass on descent (92x-406x speedup)
  • Performance benchmarks pass
  • GPU wait/timeout tests pass

🤖 Generated with Claude Code

kalidke and others added 2 commits February 1, 2026 17:37
BREAKING CHANGE: getboxes now returns a tuple instead of a single value.

Changes:
- Add BoxesInfo struct with backend, elapsed_ns, device_id fields
- Capture wall time via time_ns() during processing
- Track GPU device ID (0-based) or -1 for CPU
- Export BoxesInfo from SMLMBoxer module
- Update all tests, examples, and documentation

Part of JuliaSMLM ecosystem-wide tuple-pattern rollout for
consistent metadata return across packages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The CPU path in getboxes previously processed entire image stacks at once,
causing OOM on large inputs. This mirrors the GPU batching logic:

- Use Sys.free_memory() to determine available RAM
- Calculate batch size based on memory requirements (6x for standard, 10x for sCMOS)
- Process large stacks in batches with proper frame offset tracking
- Call GC.gc(false) between batches to release allocations

Fixes issue #11.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
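The memory-aware batch-size calculation described in this commit might look roughly like the following. Only Sys.free_memory() and the 6x/10x multipliers come from the commit message; the function name, the halving of the budget, and everything else are assumptions:

```julia
# Hypothetical sketch of the CPU batch-size calculation.
function cpu_batch_size(data::AbstractArray{T,3}; scmos::Bool=false) where T
    bytes_per_frame = sizeof(T) * size(data, 1) * size(data, 2)
    multiplier = scmos ? 10 : 6        # working-set overhead factor per commit
    budget = Sys.free_memory() ÷ 2     # leave headroom for the rest of the process
    max(1, Int(budget ÷ (bytes_per_frame * multiplier)))
end
```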
kalidke mentioned this pull request Feb 2, 2026
kalidke and others added 5 commits February 2, 2026 15:42
Aligns with GaussMLE FitInfo struct. BoxesInfo now includes:
- n_rois: Number of ROIs detected
- batch_size: Frames per batch during processing
- n_batches: Number of batches processed
- memory_per_batch: Estimated memory per batch in bytes

Added show() method displaying: BoxesInfo(N ROIs, X ms, backend, B batches × S, M/batch)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Aligns with ecosystem convention requested by Keith.
Updated all references in src, tests, docs, and examples.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…U paths

Reduces ~90 lines of duplicated batching logic to ~50 lines.
Both GPU and CPU paths now call the same helper with different parameters:
- GPU: no batch_cleanup
- CPU: batch_cleanup=() -> GC.gc(false)

Inspired by similar refactoring in GaussMLE.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
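A sketch of what the shared helper could look like — only the name _process_with_batching and the batch_cleanup distinction (GPU: none, CPU: GC.gc(false)) come from the commit; the signature and body are assumptions:

```julia
# Hypothetical shape of the shared GPU/CPU batching helper.
function _process_with_batching(f, data, batch_size; batch_cleanup=nothing)
    nframes = size(data, 3)
    results = []
    for start in 1:batch_size:nframes
        stop = min(start + batch_size - 1, nframes)
        # pass the frame offset so detections map back to stack coordinates
        push!(results, f(view(data, :, :, start:stop), start - 1))
        batch_cleanup === nothing || batch_cleanup()
    end
    results
end

# CPU path frees allocations between batches; GPU path passes no cleanup:
# _process_with_batching(kernel, stack, bs; batch_cleanup=() -> GC.gc(false))
```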
Two calling conventions now supported:
- getboxes(data, camera, config::BoxerConfig)
- getboxes(data, camera; kwargs...)  # kwargs forward to Config

BoxerConfig fields match kwargs:
- PSF-aware: psf_sigma, min_photons
- Advanced: sigma_small, sigma_large, minval
- Box params: boxsize, overlap
- Backend: backend, auto_timeout, gpu_timeout

Added tests for BoxerConfig calling convention.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
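The kwargs-to-config forwarding described above can be done with a one-line method. The field names are from the commit; the default values and Base.@kwdef usage are assumptions:

```julia
# Sketch: kwargs convention forwarding to the config-based method.
Base.@kwdef struct BoxerConfig
    psf_sigma::Float64   = 0.13    # defaults here are illustrative only
    min_photons::Float64 = 500.0
    boxsize::Int         = 11
    backend::Symbol      = :auto
end

# Any keyword call builds a config and dispatches to the config method.
getboxes(data, camera; kwargs...) =
    getboxes(data, camera, BoxerConfig(; kwargs...))
```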
README.md:
- Added BoxerConfig section with usage examples
- Documented both config-based and kwargs-based conventions

api_overview.md:
- Added BoxerConfig to exports list (now 6 exports)
- Added Configuration section with struct definition
- Updated getboxes documentation for both conventions
- Updated workflow examples showing both conventions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
kalidke changed the title from "Implement tuple-pattern: getboxes returns (ROIBatch, BoxesInfo)" to "Add tuple-pattern, BoxerConfig, CPU batching, and batch tracking" on Feb 4, 2026
kalidke and others added 11 commits February 6, 2026 09:44
…oxesInfo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ext switching

find_best_gpu() and recommend_batch_size() previously called CUDA.device!(i)
to iterate over GPUs, which triggers cuDevicePrimaryCtxRetain and can OOM
under multi-process contention (5+ parallel Julia processes). Now uses
CUDA.NVML.memory_info() to query free memory without creating CUDA contexts,
only calling CUDA.device!() once on the selected device.

Fixes #13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
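The context-free device selection described in this commit might be sketched as below. CUDA.NVML.memory_info is named in the commit; the Device constructor usage and everything else are assumptions about the exact API surface:

```julia
# Sketch: pick a GPU by free memory via NVML, creating only one CUDA context.
using CUDA

function best_gpu_by_free_memory(ngpus::Int)
    # NVML queries do not trigger cuDevicePrimaryCtxRetain
    free = [CUDA.NVML.memory_info(CUDA.NVML.Device(i)).free for i in 0:ngpus-1]
    best = argmax(free) - 1    # back to 0-based device id
    CUDA.device!(best)         # only now create a context, on the winner
    best
end
```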
When backend=:auto, catch CUDA errors during GPU processing (dog_filter,
findlocalmax, etc.) and fall back to CPU with a warning. This handles
cases where GPU memory is initially available but lost to competing
processes during execution (CUDA error 999). :gpu mode still hard-errors
as expected since the user explicitly requested GPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
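A minimal sketch of the :auto fallback behavior described above; all names here are assumptions, only the backend semantics (:auto falls back, :gpu rethrows) come from the commit:

```julia
# Sketch: catch GPU errors in :auto mode and fall back to CPU.
function run_with_fallback(gpu_kernel, cpu_kernel, data, backend::Symbol)
    backend === :cpu && return cpu_kernel(data)
    try
        gpu_kernel(data)
    catch err
        backend === :gpu && rethrow()   # explicit :gpu request hard-errors
        @warn "GPU processing failed; falling back to CPU" exception=err
        cpu_kernel(data)
    end
end
```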
Layer 1 (avoidance): New poll_gpu_nvml() scans ALL GPUs via NVML without
creating CUDA contexts. Checks free memory, process count, and compute
utilization per device. Polls through timeout with jittered backoff.
First GPU that clears contention threshold wins. select_backend() now
returns (backend, device_id) tuple and handles device selection.

Layer 2 (recovery): Runtime try/catch in _getboxes_impl for :auto mode
catches CUDA errors during GPU processing and falls back to CPU. Safety
net for the inherent race between NVML check and CUDA allocation.

Also fixes find_best_gpu() and recommend_batch_size() to use NVML for
memory queries instead of iterating CUDA.device!() which triggers
cuDevicePrimaryCtxRetain OOM under multi-process contention.

Fixes #13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add on_wait::Union{Function,Nothing} field to BoxerConfig
- Remove deprecated use_gpu kwarg from getboxes() and recommend_batch_size()
- Config-based getboxes() now reads on_wait from config instead of separate kwarg
- Update all tests: use_gpu=false -> backend=:cpu

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds GC.gc() + CUDA.reclaim() in the :auto mode try/catch to free
dead CuArrays and return memory to the pool, preventing deadlock
when multiple processes contend for GPU resources.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…llback

On GPU OOM in :auto mode, release memory via GC.gc + CUDA.reclaim,
then re-poll NVML with remaining auto_timeout budget. Only fall back
to CPU when timeout expires with no GPU available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace split retry logic (select_backend + Layer 2 try/catch) with
one loop in _getboxes_impl that handles all GPU failure modes:
- No free memory (NVML poll waits)
- TOCTOU context creation race (catch + re-poll)
- Runtime OOM during processing (catch + release + re-poll)

Loop retries until timeout expires. On timeout: :auto falls back
to CPU, :gpu errors. select_backend no longer called.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
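The unified loop replacing select_backend might look roughly like this. poll_gpu_nvml, the timeout semantics, and the release-then-re-poll step follow the commit descriptions, but every name and signature here is an assumption:

```julia
# Sketch of the single retry loop handling all GPU failure modes.
using CUDA

function acquire_or_fallback(process_on_gpu, process_on_cpu;
                             backend=:auto, timeout=60.0)
    deadline = time() + timeout
    while time() < deadline
        dev = poll_gpu_nvml(deadline - time())   # wait for an uncontended GPU
        dev === nothing && break                 # poll gave up within budget
        try
            return process_on_gpu(dev)
        catch err
            GC.gc(); CUDA.reclaim()              # release pool memory, re-poll
        end
    end
    backend === :gpu && error("no GPU available within timeout")
    process_on_cpu()                             # :auto falls back to CPU
end
```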
CUDA.jl's memory pool retains GPU memory after arrays are freed.
Without explicit reclaim, finished processes block others polling
via NVML. Now calls GC.gc + CUDA.reclaim after both success and
failure paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kalidke kalidke merged commit 72a6aeb into main Feb 7, 2026
0 of 3 checks passed