Add tuple-pattern, BoxerConfig, CPU batching, and batch tracking#10
Merged
Conversation
BREAKING CHANGE: getboxes now returns a tuple instead of a single value.

Changes:
- Add BoxesInfo struct with backend, elapsed_ns, and device_id fields
- Capture wall time via time_ns() during processing
- Track GPU device ID (0-based) or -1 for CPU
- Export BoxesInfo from the SMLMBoxer module
- Update all tests, examples, and documentation

Part of the JuliaSMLM ecosystem-wide tuple-pattern rollout for consistent metadata returns across packages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The CPU path in getboxes previously processed entire image stacks at once, causing OOM on large inputs. This mirrors the GPU batching logic:
- Use Sys.free_memory() to determine available RAM
- Calculate batch size from memory requirements (6x working set for standard cameras, 10x for sCMOS)
- Process large stacks in batches with proper frame-offset tracking
- Call GC.gc(false) between batches to release allocations

Fixes issue #11.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
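The batch-size heuristic described above can be sketched as follows. This is a minimal illustration, not the package's internals: the function name `cpu_batch_size`, the keyword `scmos`, and the exact arithmetic are assumptions; only the 6x/10x multipliers, `Sys.free_memory()`, and per-batch processing come from the commit.

```julia
# Hypothetical sketch of the CPU batch-size calculation.
function cpu_batch_size(imagestack::AbstractArray{T,3}; scmos::Bool=false) where T
    rows, cols, nframes = size(imagestack)
    bytes_per_frame = rows * cols * sizeof(T)
    multiplier = scmos ? 10 : 6            # 6x working set standard, 10x sCMOS
    free = Sys.free_memory()               # available RAM in bytes
    batch = max(1, (free ÷ multiplier) ÷ bytes_per_frame)
    return Int(min(batch, nframes))        # never exceed the stack length
end
```

Each batch is then processed with its frame offset tracked so ROI coordinates map back to the full stack, with `GC.gc(false)` called between batches.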
3 tasks
Aligns with the GaussMLE FitInfo struct. BoxesInfo now includes:
- n_rois: number of ROIs detected
- batch_size: frames per batch during processing
- n_batches: number of batches processed
- memory_per_batch: estimated memory per batch in bytes

Added a show() method displaying: BoxesInfo(N ROIs, X ms, backend, B batches × S, M/batch)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
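Putting the fields from this commit together with the earlier ones (backend, device_id, elapsed), the struct roughly takes this shape. Field order and the exact show() formatting are assumptions; the elapsed field is written as `elapsed_s` per the later rename described in the summary.

```julia
# Approximate shape of BoxesInfo; a sketch, not the exact definition.
struct BoxesInfo
    n_rois::Int             # number of ROIs detected
    elapsed_s::Float64      # wall-clock processing time in seconds
    backend::Symbol         # :cpu or :gpu
    device_id::Int          # 0-based GPU index, -1 for CPU
    batch_size::Int         # frames per batch during processing
    n_batches::Int          # number of batches processed
    memory_per_batch::Int   # estimated memory per batch in bytes
end

function Base.show(io::IO, info::BoxesInfo)
    ms = round(info.elapsed_s * 1000; digits=1)
    print(io, "BoxesInfo($(info.n_rois) ROIs, $(ms) ms, $(info.backend), ",
          "$(info.n_batches) batches × $(info.batch_size), ",
          "$(info.memory_per_batch) B/batch)")
end
```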
Aligns with the ecosystem convention requested by Keith. Updated all references in src, tests, docs, and examples.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…U paths

Reduces ~90 lines of duplicated batching logic to ~50. Both the GPU and CPU paths now call the same helper with different parameters:
- GPU: no batch_cleanup
- CPU: batch_cleanup = () -> GC.gc(false)

Inspired by a similar refactoring in GaussMLE.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
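The shared helper might look roughly like this. The signature is an assumption; only the helper's name, the `batch_cleanup` parameter, and the two call patterns come from the commit.

```julia
# Minimal sketch of the shared GPU/CPU batching helper.
function _process_with_batching(process_batch::Function, nframes::Int,
                                batch_size::Int; batch_cleanup=nothing)
    results = Any[]
    for start in 1:batch_size:nframes
        stop = min(start + batch_size - 1, nframes)
        push!(results, process_batch(start:stop))   # frame-offset aware
        batch_cleanup === nothing || batch_cleanup()
    end
    return results
end

# GPU path: _process_with_batching(f, n, bs)   # no batch_cleanup
# CPU path: _process_with_batching(f, n, bs; batch_cleanup=() -> GC.gc(false))
```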
Two calling conventions are now supported:
- getboxes(data, camera, config::BoxerConfig)
- getboxes(data, camera; kwargs...)  # kwargs forward to a BoxerConfig

BoxerConfig fields match the kwargs:
- PSF-aware: psf_sigma, min_photons
- Advanced: sigma_small, sigma_large, minval
- Box params: boxsize, overlap
- Backend: backend, auto_timeout, gpu_timeout

Added tests for the BoxerConfig calling convention.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
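Both conventions in use, sketched. The parameter values are illustrative, `data` and `camera` stand in for your image stack and camera model, and a keyword constructor for BoxerConfig is assumed.

```julia
# Hypothetical usage; not self-contained (`data`, `camera` are placeholders).
config = BoxerConfig(psf_sigma=1.3, boxsize=7, backend=:auto)
boxes, info = getboxes(data, camera, config)        # config-based convention

boxes, info = getboxes(data, camera;                # kwargs-based convention;
                       psf_sigma=1.3, boxsize=7)    # kwargs forward to a BoxerConfig
```

Either way the call returns the (ROIBatch, BoxesInfo) tuple introduced earlier in this PR.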
README.md:
- Added a BoxerConfig section with usage examples
- Documented both the config-based and kwargs-based conventions

api_overview.md:
- Added BoxerConfig to the exports list (now 6 exports)
- Added a Configuration section with the struct definition
- Updated the getboxes documentation for both conventions
- Updated workflow examples to show both conventions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…oxesInfo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ext switching

find_best_gpu() and recommend_batch_size() previously called CUDA.device!(i) to iterate over GPUs, which triggers cuDevicePrimaryCtxRetain and can OOM under multi-process contention (5+ parallel Julia processes).

Now uses CUDA.NVML.memory_info() to query free memory without creating CUDA contexts, calling CUDA.device!() only once, on the selected device.

Fixes #13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
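The context-free selection pattern might be sketched as below. The helper name is hypothetical, and the exact CUDA.jl NVML wrapper API (`NVML.Device`, the shape of `memory_info`'s return value) is an assumption based on the commit description.

```julia
using CUDA

# Hypothetical sketch: pick the GPU with the most free memory via NVML,
# creating a CUDA context only on the winner.
function best_gpu_by_free_memory()
    best, best_free = -1, UInt64(0)
    for i in 0:(length(CUDA.devices()) - 1)
        mem = CUDA.NVML.memory_info(CUDA.NVML.Device(i))  # no CUDA context
        if mem.free > best_free
            best, best_free = i, mem.free
        end
    end
    best >= 0 && CUDA.device!(best)   # the only device!() call, on the winner
    return best
end
```

The key point is that NVML queries go through the management library, so they never touch cuDevicePrimaryCtxRetain.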
When backend=:auto, catch CUDA errors during GPU processing (dog_filter, findlocalmax, etc.) and fall back to CPU with a warning. This handles cases where GPU memory is initially available but is lost to competing processes during execution (CUDA error 999). :gpu mode still hard-errors, as expected, since the user explicitly requested the GPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Layer 1 (avoidance): a new poll_gpu_nvml() scans ALL GPUs via NVML without creating CUDA contexts. It checks free memory, process count, and compute utilization per device, polling through the timeout with jittered backoff; the first GPU that clears the contention threshold wins. select_backend() now returns a (backend, device_id) tuple and handles device selection.

Layer 2 (recovery): a runtime try/catch in _getboxes_impl for :auto mode catches CUDA errors during GPU processing and falls back to CPU, a safety net for the inherent race between the NVML check and CUDA allocation.

Also fixes find_best_gpu() and recommend_batch_size() to use NVML for memory queries instead of iterating with CUDA.device!(), which triggers cuDevicePrimaryCtxRetain OOM under multi-process contention.

Fixes #13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
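The two-layer scheme reduces to something like the following. `_getboxes_gpu` and `_getboxes_cpu` are hypothetical names for the two processing paths; `select_backend` and `_getboxes_impl` are named in the commit.

```julia
using CUDA

# Sketch of the two-layer fallback; internals simplified.
function _getboxes_impl(data, config)
    backend, device_id = select_backend(config)       # Layer 1: NVML poll
    if backend == :gpu
        try
            return _getboxes_gpu(data, config, device_id)
        catch err
            # Rescue only CUDA failures in :auto; explicit :gpu hard-errors.
            (config.backend == :gpu || !(err isa CUDA.CuError)) && rethrow()
            @warn "GPU processing failed; falling back to CPU" exception = err
        end
    end
    return _getboxes_cpu(data, config)                # Layer 2: CPU fallback
end
```

Layer 2 exists because the NVML check and the CUDA allocation are not atomic; a competing process can claim the memory in between.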
- Add on_wait::Union{Function,Nothing} field to BoxerConfig
- Remove deprecated use_gpu kwarg from getboxes() and recommend_batch_size()
- Config-based getboxes() now reads on_wait from config instead of separate kwarg
- Update all tests: use_gpu=false -> backend=:cpu
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds GC.gc() + CUDA.reclaim() in the :auto mode try/catch to free dead CuArrays and return memory to the pool, preventing deadlock when multiple processes contend for GPU resources.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…llback

On GPU OOM in :auto mode, release memory via GC.gc + CUDA.reclaim, then re-poll NVML with the remaining auto_timeout budget. Only fall back to CPU when the timeout expires with no GPU available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the split retry logic (select_backend + Layer 2 try/catch) with one loop in _getboxes_impl that handles all GPU failure modes:
- No free memory (NVML poll waits)
- TOCTOU context-creation race (catch + re-poll)
- Runtime OOM during processing (catch + release + re-poll)

The loop retries until the timeout expires. On timeout, :auto falls back to CPU and :gpu errors. select_backend is no longer called.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
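The unified loop can be sketched as below. The function name `try_gpu_with_retries` and the exact signatures of `poll_gpu_nvml` and the two process callbacks are assumptions; the retry-until-timeout structure and the release-then-re-poll step come from the commits above.

```julia
using CUDA

# Hypothetical sketch of the single retry loop for all GPU failure modes.
function try_gpu_with_retries(process_gpu, process_cpu, config)
    deadline = time() + config.auto_timeout
    while time() < deadline
        dev = poll_gpu_nvml(deadline - time())   # blocks until a GPU clears
        dev === nothing && continue              # contention thresholds, or timeout
        try
            return process_gpu(dev)              # may hit the TOCTOU race or OOM
        catch
            GC.gc(); CUDA.reclaim()              # release the pool, then re-poll
        end
    end
    config.backend == :gpu && error("no GPU available within timeout")
    return process_cpu()                         # :auto falls back to CPU
end
```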
CUDA.jl's memory pool retains GPU memory after arrays are freed. Without an explicit reclaim, finished processes block others polling via NVML. Now calls GC.gc + CUDA.reclaim after both the success and failure paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
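The success-and-failure cleanup described above fits naturally in a try/finally; `run_gpu_work` is a stand-in for the GPU processing path.

```julia
using CUDA

# Sketch: reclaim pooled GPU memory on every exit path so other
# processes polling NVML actually see it as free.
function with_gpu_cleanup(run_gpu_work)
    try
        return run_gpu_work()
    finally
        GC.gc()           # drop references to dead CuArrays
        CUDA.reclaim()    # return pooled memory to the driver
    end
end
```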
Summary
Major update adding ecosystem-aligned patterns and memory improvements:
- getboxes() now returns a (ROIBatch, BoxesInfo) tuple
- BoxesInfo includes n_rois, batch_size, n_batches, memory_per_batch
- elapsed_ns renamed to elapsed_s::Float64
- _process_with_batching() helper to deduplicate the GPU/CPU paths

Breaking Changes
- getboxes() returns (ROIBatch, BoxesInfo) instead of just ROIBatch
- BoxesInfo.elapsed_ns renamed to BoxesInfo.elapsed_s (now Float64 seconds)

New Features
BoxerConfig
Two Calling Conventions
BoxesInfo with Batch Tracking
Test Plan
🤖 Generated with Claude Code