Particle/blob detection in SMLM image stacks using difference-of-Gaussians filtering with GPU acceleration and sCMOS variance-weighted filtering support.
Total exports: 6
getboxes- Main detection functionBoxerConfig- Configuration struct for detection parametersBoxesInfo- Metadata struct returned alongside ROIBatchrecommend_batch_size- Memory-aware batch sizing utilityROIBatch- Re-exported from SMLMData.jlSingleROI- Re-exported from SMLMData.jl
Note: SMLMBoxer.api() is available but not exported to avoid conflicts with other JuliaSMLM packages.
Detects blob-like features by subtracting two Gaussian-blurred versions of the image:
- sigma_small: Matches PSF size for optimal blob detection
- sigma_large: Suppresses background (typically 2× sigma_small)
- minval: Intensity threshold after filtering
Specify physical PSF parameters (in microns), automatically converted to optimal filter settings:
- psf_sigma: PSF sigma in microns (e.g., 0.13 for 130nm PSF)
- min_photons: Total photon threshold (automatically converted to DoG intensity threshold)
When SCMOSCamera is provided, implements SMITE-style inverse variance weighting:
- Each pixel weighted by
gaussian_kernel / variancewhere variance = readnoise² - Low-noise pixels: high weight (strong detection influence)
- High-noise pixels: low weight (reduced false positives)
- GPU-accelerated via KernelAbstractions.jl (device-agnostic kernels)
- Standard DoG: NNlib with cuDNN backend (10-100x speedup)
- Variance-weighted: KernelAbstractions custom kernels (same code for CPU/GPU)
- Multi-GPU support: NVML-based polling selects GPU with most free memory across all devices
Unified GPU retry loop handles multi-process contention:
- Poll all GPUs via NVML (no CUDA context creation) for sufficient free memory and low utilization
- Acquire GPU context and run processing
- On any failure (no memory, TOCTOU context race, runtime OOM): release memory via
GC.gc() + CUDA.reclaim(), re-poll NVML with remaining timeout - On timeout:
:autofalls back to CPU,:gpuerrors - On success: reclaim GPU memory pool so finished jobs don't block other processes
Backend modes:
:cpu- Always CPU, no GPU involvement:gpu- Require GPU, retry untilgpu_timeout(default: Inf), error if unavailable:auto- Try GPU, retry untilauto_timeout(default: 30s), fall back to CPU
Wait progress callback:
config = BoxerConfig(
psf_sigma=0.13,
backend=:auto,
on_wait=(elapsed, available, required) -> @info "Waiting for GPU" elapsed available required
)Configuration struct for ROI detection parameters. Supports @kwdef construction with defaults.
@kwdef struct BoxerConfig
# PSF-aware interface (recommended)
psf_sigma::Union{Float64,Nothing} = nothing # PSF sigma in microns
min_photons::Float64 = 500.0 # Minimum photons for detection
# Advanced interface (direct control)
sigma_small::Float64 = 1.0 # Small Gaussian sigma in pixels
sigma_large::Float64 = 2.0 # Large Gaussian sigma in pixels
minval::Float64 = 0.0 # DoG intensity threshold
# Box parameters
boxsize::Int = 7 # ROI size in pixels
overlap::Float64 = 2.0 # Max overlap between detections
# Backend parameters
backend::Symbol = :auto # :cpu, :gpu, or :auto
auto_timeout::Float64 = 30.0 # Max wait for GPU in :auto mode
gpu_timeout::Float64 = Inf # Max wait in :gpu mode
on_wait::Union{Function,Nothing} = nothing # Optional wait progress callback
endUsage:
# PSF-aware (recommended)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)
# Advanced (direct control)
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0)
# GPU-specific
config = BoxerConfig(psf_sigma=0.13, backend=:gpu, gpu_timeout=60.0)Config-based (recommended for reusable settings):
getboxes(imagestack, camera, config::BoxerConfig) -> (ROIBatch, BoxesInfo)Kwargs-based (convenient for one-off calls):
getboxes(imagestack, camera=nothing; kwargs...) -> (ROIBatch, BoxesInfo)Both conventions are equivalent - kwargs are forwarded to a BoxerConfig internally.
Main detection function. Applies DoG filtering, finds local maxima, extracts ROI patches.
Arguments:
imagestack::AbstractArray{<:Real}- Input image stack (2D or 3D)camera::Union{AbstractCamera,Nothing}- Camera object (IdealCamera or SCMOSCamera)config::BoxerConfig- Configuration struct (config-based convention)
Kwargs (kwargs-based convention):
PSF-Aware Interface (Recommended):
psf_sigma::Real- PSF sigma in microns (requires camera for pixel size conversion)min_photons::Real- Minimum total photons for detection (default: 500.0)
Advanced Interface (Direct Control):
sigma_small::Real- Small Gaussian sigma in pixels (default: 1.0)sigma_large::Real- Large Gaussian sigma in pixels (default: 2.0)minval::Real- DoG intensity threshold (default: 0.0)
Other Parameters:
boxsize::Int- ROI size in pixels (default: 7)overlap::Real- Maximum overlap between detections in pixels (default: 2.0)backend::Symbol- Compute backend::cpu,:gpu, or:auto(default::auto):cpu- Always use CPU:gpu- Require GPU, wait for memory if needed (waits forever by default):auto- Try GPU with timeout, fall back to CPU if memory unavailable
auto_timeout::Real- Max seconds to wait for GPU in:automode (default: 30.0)gpu_timeout::Real- Max seconds to wait in:gpumode (default: Inf)on_wait::Function- Optional callback(elapsed, available, required) -> nothingfor wait progress
Returns: Tuple of (ROIBatch, BoxesInfo)
ROIBatch with fields:
data- ROI stack (boxsize × boxsize × n_rois)x_corners- Vector of x (column) corner positionsy_corners- Vector of y (row) corner positionsframe_indices- Vector of frame indices for each ROIcamera- Camera object (provided or default IdealCamera)roi_size- Size of each ROI (square)
BoxesInfo with fields:
backend- Compute backend used (:gpuor:cpu)elapsed_s- Wall time in secondsdevice_id- GPU device ID (0-based), or -1 for CPUn_rois- Number of ROIs detectedbatch_size- Frames per batch during processingn_batches- Number of batches processedmemory_per_batch- Estimated memory per batch in bytes
Returns the recommended maximum number of frames to load at once given memory constraints.
Use this when processing very large datasets to determine optimal chunk size before calling getboxes().
Arguments:
height::Int- Image height in pixelswidth::Int- Image width in pixelsbackend::Symbol- Compute backend::cpu,:gpu, or:auto(default::auto)memory_fraction::Real- Fraction of free memory to use (default: 0.8)
Returns: Maximum recommended number of frames to load at once
Memory Model: The processing pipeline requires approximately 6× the raw image size:
- Input imagestack
- Filtered stack (DoG output)
- Local maxima detection intermediates
- Coordinate arrays
- Box extraction workspace
- Broadcast temporaries
Example:
using SMLMBoxer
# Check how many 512×512 frames to load at once
max_frames = recommend_batch_size(512, 512)
println("Load up to $max_frames frames at a time")
# Load and process in chunks
for chunk_start in 1:max_frames:total_frames
chunk_end = min(chunk_start + max_frames - 1, total_frames)
imagestack = load_frames(chunk_start:chunk_end)
(roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)
# ... process results
endContainer for multiple ROIs from particle detection. Supports iteration and indexing.
# Iteration
for roi in roi_batch
# roi is a SingleROI with .data, .corner, .frame_idx, .camera
process(roi.data)
end
# Indexing
roi = roi_batch[1] # Returns SingleROIIndividual ROI with image data, position, frame index, and camera calibration.
using SMLMBoxer, SMLMData
# Create camera with physical pixel size (100nm pixels)
camera = IdealCamera(1:256, 1:256, 0.1f0)
# Config-based (recommended for reusable settings)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)
(roi_batch, info) = getboxes(imagestack, camera, config)
# OR kwargs-based (convenient for one-off calls)
(roi_batch, info) = getboxes(imagestack, camera;
psf_sigma = 0.13, # 130nm PSF in microns
min_photons = 500.0, # Minimum 500 photons
boxsize = 11)
# Access results
n_detections = length(roi_batch.x_corners)
boxes = roi_batch.data # 11×11×n ROI patches
positions_x = roi_batch.x_corners # Column positions
positions_y = roi_batch.y_corners # Row positions
frames = roi_batch.frame_indices
# Check processing info
println("Backend: ", info.backend)
println("Elapsed: ", info.elapsed_s * 1000, " ms")using SMLMData
# Create sCMOS camera with readnoise map (Float32 for type consistency)
readnoise_map = Float32.(load_readnoise_calibration("camera_calib.mat"))
camera = SCMOSCamera(256, 256, 0.1f0, readnoise_map)
# Variance-weighted detection (automatically enabled)
(roi_batch, info) = getboxes(imagestack, camera;
psf_sigma = 0.13,
min_photons = 300.0, # Lower threshold possible with noise weighting
backend = :auto) # GPU with CPU fallback# Expert mode: bypass PSF-aware interface
# Config-based
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0, boxsize=9, overlap=1.5)
(roi_batch, info) = getboxes(imagestack, nothing, config)
# OR kwargs-based
(roi_batch, info) = getboxes(imagestack;
sigma_small = 1.5, # Custom filter sigma (pixels)
sigma_large = 3.0,
minval = 10.0, # Direct intensity threshold
boxsize = 9,
overlap = 1.5)(roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)
# Iterate over ROIs
for roi in roi_batch
# SingleROI fields:
# - roi.data: Image patch
# - roi.corner: (x, y) corner position
# - roi.frame_idx: Frame index
# - roi.camera: Camera ROI calibration
fit_gaussian(roi.data)
end
# Direct indexing
first_roi = roi_batch[1]These functions are documented but not part of the public API. Use at your own risk as they may change.
Filtering:
dog_filter(imagestack, args)- Apply DoG filter (routes to standard or variance-weighted)dog_filter_variance_weighted(imagestack, σ_small, σ_large, args)- Variance-weighted DoGconvolve(imagestack, kernel; use_gpu)- Standard convolution via NNlibconvolve_variance_weighted(imagestack, variance_map, σ, use_gpu)- Variance-weighted via KAgaussian_2d(sigma, kernelsize)- Create 2D Gaussian kerneldog_kernel(sigma_small, sigma_large)- Create DoG kernel
Local Maxima Detection:
findlocalmax(imagestack, kernelsize; minval, use_gpu)- Find local maximum coordinatesgenlocalmaximage(imagestack, kernelsize; minval, use_gpu)- Generate local max image
Coordinate Processing:
maxima2coords(imagestack)- Convert non-zero pixels to coordinatesremoveoverlap(coords, args)- Remove overlapping detections
Box Extraction:
getboxstack(imagestack, coords, args)- Extract ROI patches at coordinatesfillbox!(box, imagestack, row, col, im, boxsize)- Fill single ROI patch
Helper Functions:
get_pixel_size(camera)- Extract pixel size from cameraphotons_to_dog_threshold(min_photons, psf_sigma)- Convert photon threshold to DoG thresholdpixels_to_microns(pixel_coords, camera)- Coordinate conversionextract_camera_roi(camera, row_range, col_range)- Extract camera calibration for ROIget_variance_map(camera, imagesize)- Compute variance map from camerareshape_for_flux(arr)- Reshape array for NNlib convolution
- Filtering: Apply DoG filter (standard or variance-weighted based on camera type)
- Local Maxima: Find peaks above threshold using max pooling
- Overlap Removal: Eliminate overlapping detections (keep higher intensity)
- Box Extraction: Cut out ROI patches around each detection
- ROIBatch Construction: Package results with camera calibration
When psf_sigma (microns) is provided:
# Convert to pixels
psf_sigma_pixels = psf_sigma / pixel_size
# Automatic filter sizing
sigma_small = 1.0 × psf_sigma_pixels # Match PSF
sigma_large = 2.0 × psf_sigma_pixels # Background suppression
# Photon threshold → DoG intensity threshold
σ_eff = √(psf_sigma² + sigma_small²) # Effective sigma after filtering
peak_filtered = min_photons / (2π × σ_eff²)
minval = 0.65 × peak_filtered # DoG reduction factorFor sCMOS cameras with readnoise map:
# At each pixel (i,j):
weightsum = sum(gaussian_weight / variance[ii,jj] * input[ii,jj])
varsum = sum(gaussian_weight / variance[ii,jj])
output[i,j] = weightsum / varsumImplements optimal inverse variance weighting for spatially-varying noise.
- GPU Memory Management: Automatically batches frames if image stack exceeds GPU memory
- Type Stability: All inputs converted to Float32 at entry point
- Multi-GPU NVML Polling: Scans all GPUs via NVML without creating CUDA contexts. Checks free memory, process contention, and compute utilization. First GPU with sufficient memory and low contention wins.
- Contention-Safe Retry: Unified retry loop handles all GPU failure modes (insufficient memory, TOCTOU context race, runtime OOM). Releases memory and re-polls with remaining timeout budget.
- Memory Pool Reclaim: Calls
GC.gc() + CUDA.reclaim()after both successful and failed GPU processing to return memory to the system, preventing finished jobs from blocking other processes. - Jittered Backoff: NVML polling uses jittered sleep intervals to avoid thundering herd when multiple processes compete for GPUs.
- Backend Abstraction: KernelAbstractions enables same code for CPU/GPU variance weighting
- Typical Speedup: 10-100x with GPU depending on image size and number of frames