❗Note❗
The README was written entirely by Claude Code, so it contains command-line options that do not actually exist. Please verify each option yourself and correct it as needed. Pull requests are welcome.
A lightweight ROI-based hierarchical instance segmentation model for human detection, with knowledge distillation from EfficientNet-based teacher models. The model achieves real-time performance through a two-stage hierarchical architecture and temperature-progression distillation.
- Architecture Overview
- Architecture Details
- Architecture Diagram
- Model Architecture
- Training Pipeline
- Refinement Mechanism
- Dataset Structure
- Environment Setup
- UNet Distillation Commands
- ROI-Based Hierarchical Training
- ONNX Export
- Test Inference
- License
- Citations and Acknowledgments
The Human Instance Segmentation model employs a sophisticated hierarchical segmentation approach that combines:
- Two-Stage Architecture: Coarse binary segmentation followed by ROI-based instance refinement
- Multi-Architecture Support: B0 (lightweight), B1 (balanced), and B7 (high-accuracy) variants
- Knowledge Distillation: Temperature progression (10→1) for efficient knowledge transfer
- Real-time Processing: Optimized for edge devices with ONNX/TensorRT deployment
- Direct RGB input processing without separate feature extraction
- Pre-trained UNet for robust binary foreground/background segmentation
- ROI-based refinement for precise instance separation
- 3-class output system (background, target instance, non-target instances)
- Optional post-processing with dilation and edge smoothing
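The overall flow can be summarized in a short sketch. This is a minimal illustration of the two-stage design described above, not the repository's actual API; `pretrained_unet`, `encoder`, `refine_head`, and `roi_pool` are hypothetical placeholders.

```python
import torch

def forward(images, rois, pretrained_unet, encoder, refine_head, roi_pool):
    """Minimal sketch of the two-stage forward flow (placeholder modules)."""
    binary_mask = pretrained_unet(images)      # [B, 1, H, W] coarse FG/BG (frozen)
    feats = encoder(images)                    # EfficientNet feature maps
    roi_feats = roi_pool(feats, rois)          # [N, C, H_roi, W_roi]
    roi_prior = roi_pool(binary_mask, rois)    # binary prior cropped per ROI
    logits = refine_head(torch.cat([roi_feats, roi_prior], dim=1))
    return logits, binary_mask                 # [N, 3, mask_h, mask_w], [B, 1, H, W]
```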
B0 Variant (Lightweight)
- Encoder: EfficientNet-B0 based (timm-efficientnet-b0)
- Parameters: ~5.3M
- ONNX Size: ~71MB
- ROI Size: 64×48 (standard), 80×60 (enhanced)
- Mask Size: 128×96 (standard), 160×120 (enhanced)
- Use Case: Real-time edge deployment, mobile devices
B1 Variant (Balanced)
- Encoder: EfficientNet-B1 based (timm-efficientnet-b1)
- Parameters: ~7.8M
- ONNX Size: ~81MB
- ROI Size: 64×48 (standard), 80×60 (enhanced)
- Mask Size: 128×96 (standard), 160×120 (enhanced)
- Use Case: Balanced performance/accuracy trade-off
B7 Variant (High Accuracy)
- Encoder: EfficientNet-B7 based (timm-efficientnet-b7)
- Parameters: ~66M
- ONNX Size: ~90MB
- ROI Size: 64×48 (standard), 80×60 (enhanced), 128×96 (ultra)
- Mask Size: 128×96 (standard), 160×120 (enhanced), 256×192 (ultra)
- Use Case: Maximum accuracy, server deployment
Pretrained UNet Module
- Architecture: Enhanced UNet with residual blocks
- Normalization: LayerNorm2D for stable training
- Activation: ReLU/SiLU configurable
- Output: Binary foreground/background mask
- Training: Frozen during instance segmentation training
Dynamic RoI Align Module (see the pooling sketch after this list)
- Input: COCO bounding boxes
- Normalization: Coordinates normalized to [0, 1]
- Pooling: Dynamic RoI Align with configurable output sizes
- Batch Processing: Efficient multi-instance handling
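A minimal sketch of the pooling step using `torchvision.ops.roi_align`. The (H, W) output ordering is assumed from the ONNX output shapes later in this README; the repository's Dynamic RoI Align wrapper may differ.

```python
import torch
from torchvision.ops import roi_align

def pool_rois(features: torch.Tensor, rois_norm: torch.Tensor,
              out_hw=(64, 48)) -> torch.Tensor:
    """Pool [N, 5] normalized ROIs from a [B, C, H, W] feature map.

    Each ROI row is (batch_idx, x1, y1, x2, y2) with coordinates in [0, 1].
    """
    _, _, h, w = features.shape
    rois = rois_norm.clone()
    rois[:, [1, 3]] *= w  # scale x1, x2 to feature-map pixels
    rois[:, [2, 4]] *= h  # scale y1, y2 to feature-map pixels
    return roi_align(features, rois, output_size=out_hw, aligned=True)
```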
Instance Segmentation Head
- Architecture: Hierarchical UNet V2 with attention modules
- Classes: 3-class segmentation (background, target, non-target)
- Features:
- Residual blocks for feature refinement
- Attention gating for focus on person boundaries
- Distance-aware loss for better instance separation
- Contour detection auxiliary task
Loss Design (sketched in code after this list)
- Primary Loss: Weighted CrossEntropy + Dice Loss
- Class Weights:
- Background: 0.538
- Target: 0.750
- Non-target: 1.712 (1.2× boosted)
- Auxiliary Losses:
- Distance transform loss for boundary awareness
- Contour detection loss for edge refinement
- Separation-aware weighting for instance distinction
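A sketch of the primary loss under the class weights above. The auxiliary terms are omitted, and the equal weighting between the CE and Dice terms is an assumption.

```python
import torch
import torch.nn.functional as F

# Class weights from the list above (background, target, non-target).
CLASS_WEIGHTS = torch.tensor([0.538, 0.750, 1.712])

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Soft multi-class Dice loss over [N, 3, H, W] logits and [N, H, W] labels."""
    probs = logits.softmax(dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * one_hot).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3))
    return (1.0 - (2.0 * inter + eps) / (union + eps)).mean()

def primary_loss(logits: torch.Tensor, target: torch.Tensor):
    ce = F.cross_entropy(logits, target, weight=CLASS_WEIGHTS.to(logits.device))
    return ce + dice_loss(logits, target)  # 1:1 term weighting is an assumption
```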
┌─────────────────────────────┐ ┌──────────────────────────────┐
│ Input RGB Image │ │ ROIs │
│ [B, 3, H, W] │ │ [N, 5] │
└──────────────┬──────────────┘ │ [batch_idx, x1, y1, x2, y2] │
│ │ (0-1 normalized coordinates) │
│ └──────────────┬───────────────┘
│ │
┌──────────────▼──────────────┐ │
│ Pretrained UNet Module │ │
│ (Frozen during training) │ │
│ Output: Binary FG/BG │ │
└──────────────┬──────────────┘ │
│ │
┌─────────────┴─────────────┐ │
│ │ │
┌───────────▼───────────┐ ┌───────────▼──────────┐ │
│ Binary Mask Output │ │ Feature Maps │ │
│ [B, 1, H, W] │ │ for ROI Pooling │ │
└───────────┬───────────┘ └───────────┬──────────┘ │
│ │ │
└─────────────┬─────────────┘ │
│◀────────────────────────────────────────┘
┌───────────────▼───────────────┐
│ Dynamic RoI Align │
│ Output: [N, C, H_roi, W_roi] │
└───────────────┬───────────────┘
│
┌─────────────┴─────────────┐
│ │
┌────────────▼───────────┐ ┌───────────▼────────────┐
│ EfficientNet │ │ Pretrained UNet Mask │
│ Encoder │ │ (for each ROI) │
│ (B0/B1/B7) │ │ [N, 1, H_roi, W_roi] │
└────────────┬───────────┘ └───────────┬────────────┘
│ │
└─────────────┬─────────────┘
│
┌─────────────▼─────────────┐
│ Instance Segmentation │
│ Head (UNet V2) │
│ - Attention Modules │
│ - Residual Blocks │
│ - Distance-Aware Loss │
└─────────────┬─────────────┘
│
┌─────────────▼─────────────┐
│ 3-Class Output Logits │
│ [N, 3, mask_h, mask_w] │
│ Classes: │
│ 0: Background │
│ 1: Target Instance │
│ 2: Non-target Instances │
└─────────────┬─────────────┘
│
┌─────────────▼─────────────┐
│ Post-Processing │
│ (Optional) │
│ - Mask Dilation │
│ - Edge Smoothing │
└───────────────────────────┘
1. Teacher Model Training: Train the B7 architecture to high accuracy
2. Temperature Progression: Gradual temperature reduction (10→1)
3. Student Training: Distill to B0/B1 with feature and logit matching
4. Fine-tuning: Optional direct training on the target dataset
Stage 1: UNet Pre-training
- Binary person segmentation on the COCO dataset
- Frozen after pre-training for all subsequent stages

Stage 2: Knowledge Distillation
- Temperature-progression distillation (10→1) from the B7 teacher, following the steps above (see the sketch after this list)

Stage 3: Instance Segmentation Training
- ROI-based training with 3-class outputs
- Distance-aware loss for instance separation
- Auxiliary tasks for boundary refinement
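The soft-target loss for Stage 2 follows Hinton et al. (2015); below is a minimal sketch with a linear 10→1 schedule. The exact schedule shape used by the repository is an assumption.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float):
    """Soft-target distillation loss at temperature T, scaled by T^2."""
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

def temperature(epoch: int, total_epochs: int, t_start=10.0, t_end=1.0) -> float:
    """Linear temperature progression from t_start down to t_end."""
    frac = epoch / max(total_epochs - 1, 1)
    return t_start + (t_end - t_start) * frac
```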
1. Coarse Segmentation: Pretrained UNet provides the initial binary mask
2. ROI Extraction: Extract regions around detected persons
3. Feature Enhancement: Process ROIs through the EfficientNet encoder
4. Instance Refinement:
   - Apply attention-gated refinement
   - Use the binary mask as a prior for background suppression
   - Separate overlapping instances via distance transform

Key techniques (a boundary-weighting sketch follows this list):
- Attention Gating: Focus processing on person boundaries
- Distance Transform: Encode spatial relationships for better separation
- Contour Detection: Auxiliary task for edge preservation
- Separation-Aware Weighting: Boost the non-target class for clearer boundaries
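One way to realize distance-aware weighting is to up-weight pixels near instance boundaries via a Euclidean distance transform. The sketch below is illustrative only; `sigma` and `boost` are hypothetical parameters, not values from the repository.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_weight_map(mask: np.ndarray, sigma: float = 5.0, boost: float = 2.0):
    """Per-pixel loss weights that peak at instance boundaries.

    mask: binary (0/1) instance mask, shape [H, W].
    """
    d_in = distance_transform_edt(mask)       # distance to background, inside
    d_out = distance_transform_edt(1 - mask)  # distance to instance, outside
    d = np.where(mask > 0, d_in, d_out)       # distance to the nearest boundary
    return 1.0 + boost * np.exp(-(d ** 2) / (2.0 * sigma ** 2))
```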
data/
├── annotations/
│ ├── instances_train2017_person_only_no_crowd.json # Full training set
│ ├── instances_val2017_person_only_no_crowd.json # Full validation set
│ ├── instances_train2017_person_only_no_crowd_100imgs.json # Dev subset
│ └── instances_val2017_person_only_no_crowd_100imgs.json # Dev subset
├── images/
│ ├── train2017/ # COCO training images
│ └── val2017/ # COCO validation images
└── pretrained/
├── best_model_b0_*.pth # Pretrained B0 models
├── best_model_b1_*.pth # Pretrained B1 models
└── best_model_b7_*.pth # Pretrained B7 models
- Format: COCO JSON format
- Categories: Person only (no crowd annotations)
- Content: Bounding boxes and segmentation polygons
- Filtering: Crowd instances removed for cleaner training
- Full Dataset: ~64K training, ~2.7K validation images
- Development Subsets: 100, 500 image versions
- Class Distribution:
- Background: ~53.8% pixels
- Target instances: ~33.3% pixels
- Non-target instances: ~12.9% pixels
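The filtered annotation files above can be reproduced from the stock COCO annotations with a script along these lines, assuming the standard COCO schema (person is category_id 1):

```python
import json

def filter_person_no_crowd(src_json: str, dst_json: str) -> None:
    """Keep only non-crowd person annotations from a COCO annotation file."""
    with open(src_json) as f:
        coco = json.load(f)
    coco["annotations"] = [
        a for a in coco["annotations"]
        if a["category_id"] == 1 and a.get("iscrowd", 0) == 0
    ]
    coco["categories"] = [c for c in coco["categories"] if c["id"] == 1]
    # Drop images that no longer have any annotations.
    kept = {a["image_id"] for a in coco["annotations"]}
    coco["images"] = [img for img in coco["images"] if img["id"] in kept]
    with open(dst_json, "w") as f:
        json.dump(coco, f)

filter_person_no_crowd(
    "annotations/instances_train2017.json",
    "data/annotations/instances_train2017_person_only_no_crowd.json",
)
```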
- Python 3.10
- CUDA 11.8+ (for GPU support)
- uv package manager
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment
uv venv
# Activate environment
source .venv/bin/activate # On Linux/Mac
# or
.venv\Scripts\activate # On Windows
# Install dependencies
uv pip install -r pyproject.toml
# Install development dependencies (optional)
uv pip install -e ".[dev]"
# Check PyTorch and CUDA
uv run python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"
# Check ONNX Runtime
uv run python -c "import onnxruntime as ort; print(f'ONNX Runtime: {ort.__version__}')"
- rgb_hierarchical_unet_v2_distillation_b0_from_b7_temp_prog: B7→B0 distillation
- rgb_hierarchical_unet_v2_distillation_b1_from_b7_temp_prog: B7→B1 distillation
- rgb_hierarchical_unet_v2_distillation_b7_from_b7_temp_prog: B7 self-distillation
# B7 to B0 distillation with temperature progression
uv run python train_distillation_staged.py \
--config rgb_hierarchical_unet_v2_distillation_b0_from_b7_temp_prog \
--epochs 100 \
--batch_size 16
# B7 to B1 distillation
uv run python train_distillation_staged.py \
--config rgb_hierarchical_unet_v2_distillation_b1_from_b7_temp_prog \
--epochs 100 \
--batch_size 12
# Resume from checkpoint
uv run python train_distillation_staged.py \
--config rgb_hierarchical_unet_v2_distillation_b7_from_b7_temp_prog \
--resume checkpoints/distillation_epoch_050.pth \
--epochs 100
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r64x48m128x96_disttrans_contdet_baware_from_B0
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r80x60m160x120_disttrans_contdet_baware_from_B0
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r64x48m128x96_disttrans_contdet_baware_from_B1
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r80x60m160x120_disttrans_contdet_baware_from_B1
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r64x48m128x96_disttrans_contdet_baware_from_B7
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r80x60m160x120_disttrans_contdet_baware_from_B7
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r64x48m128x96_disttrans_contdet_baware_from_B0_enhanced
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r80x60m160x120_disttrans_contdet_baware_from_B0_enhanced
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r64x48m128x96_disttrans_contdet_baware_from_B1_enhanced
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r80x60m160x120_disttrans_contdet_baware_from_B1_enhanced
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r64x48m128x96_disttrans_contdet_baware_from_B7_enhanced
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r80x60m160x120_disttrans_contdet_baware_from_B7_enhanced
rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r128x96m256x192_disttrans_contdet_baware_from_B7_enhanced
# Train B0 model with standard ROI size (development dataset)
uv run python train_advanced.py \
--config rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r64x48m128x96_disttrans_contdet_baware_from_B0 \
--epochs 10 \
--batch_size 8
# Train B1 model with enhanced ROI size (full dataset)
uv run python train_advanced.py \
--config rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r80x60m160x120_disttrans_contdet_baware_from_B1_enhanced \
--train_ann data/annotations/instances_train2017_person_only_no_crowd.json \
--val_ann data/annotations/instances_val2017_person_only_no_crowd.json \
--epochs 100 \
--batch_size 6
# Train B7 model with ultra ROI size
uv run python train_advanced.py \
--config rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r128x96m256x192_disttrans_contdet_baware_from_B7_enhanced \
--train_ann data/annotations/instances_train2017_person_only_no_crowd.json \
--val_ann data/annotations/instances_val2017_person_only_no_crowd.json \
--epochs 100 \
--batch_size 4
# Resume training from checkpoint
uv run python train_advanced.py \
--config rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r64x48m128x96_disttrans_contdet_baware_from_B0 \
--resume experiments/*/checkpoints/checkpoint_epoch_0050_640x640_0750.pth \
--epochs 100
# Fine-tuning with smaller learning rate
uv run python train_advanced.py \
--config rgb_hierarchical_unet_v2_fullimage_pretrained_peopleseg_r64x48m128x96_disttrans_contdet_baware_from_B0 \
--pretrained_checkpoint experiments/*/checkpoints/best_model_*.pth \
--learning_rate 1e-5 \
--epochs 20
# Validate single checkpoint
uv run python validate_advanced.py \
experiments/*/checkpoints/best_model_epoch_*_640x640_*.pth \
--val_ann data/annotations/instances_val2017_person_only_no_crowd.json \
--batch_size 16
- export_peopleseg_onnx.py: Export pretrained UNet models
- export_hierarchical_instance_peopleseg_onnx.py: Export full hierarchical models
- export_bilateral_filter.py: Export bilateral filter post-processing
- export_edge_smoothing_onnx.py: Export edge smoothing modules
Pre-trained weights are published at:
https://github.com/PINTO0309/human-instance-segmentation/releases/tag/weights
# Export B0 model to ONNX
uv run python export_hierarchical_instance_peopleseg_onnx.py \
experiments/*/checkpoints/best_model_b0_*.pth \
--output models/b0_model.onnx \
--image_size 640,640
# Export B1 model with 1-pixel dilation
uv run python export_hierarchical_instance_peopleseg_onnx.py \
experiments/*/checkpoints/best_model_b1_*.pth \
--output models/b1_model_dil1.onnx \
--image_size 640,640 \
--dilation_pixels 1
# Export B7 ultra model at higher input resolution
uv run python export_hierarchical_instance_peopleseg_onnx.py \
experiments/*/checkpoints/best_model_b7_*.pth \
--output models/b7_model_ultra.onnx \
--image_size 1024,1024
# Export edge smoothing module
uv run python export_edge_smoothing_onnx.py

# Export bilateral filter
uv run python export_bilateral_filter.py

# Optimize ONNX model with onnxsim
uv run python -m onnxsim models/b0_model.onnx models/b0_model_opt.onnx
# Verify optimized model
uv run python -c "import onnx; model = onnx.load('models/b0_model_opt.onnx'); onnx.checker.check_model(model); print('Model is valid')"
# Test ONNX model on validation images
uv run python test_hierarchical_instance_peopleseg_onnx.py \
--onnx best_model_b1_80x60_0.8551_dil1.onnx \
--annotations data/annotations/instances_val2017_person_only_no_crowd_100imgs.json \
--images_dir data/images/val2017 \
--num_images 5 \
--output_dir test_outputs
# Test with CUDA provider
uv run python test_hierarchical_instance_peopleseg_onnx.py \
--onnx best_model_b1_80x60_0.8551_dil1.onnx \
--annotations data/annotations/instances_val2017_person_only_no_crowd.json \
--provider cuda \
--num_images 10 \
--output_dir test_outputs_cuda
# Test with binary mask visualization (green overlay)
uv run python test_hierarchical_instance_peopleseg_onnx.py \
--onnx best_model_b1_80x60_0.8551_dil1.onnx \
--annotations data/annotations/instances_val2017_person_only_no_crowd.json \
--num_images 20 \
--binary_mode \
--alpha 0.7 \
--output_dir test_binary_masks
# Test with custom score threshold
uv run python test_hierarchical_instance_peopleseg_onnx.py \
--onnx best_model_b1_80x60_0.8551_dil1.onnx \
--annotations data/annotations/instances_val2017_person_only_no_crowd.json \
--num_images 15 \
--score_threshold 0.5 \
--save_masks \
--output_dir test_high_confidence
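For a programmatic check outside the test script, the exported models can also be driven directly with onnxruntime. The input/output names and shapes below follow the sit4onnx logs in the next subsection; the preprocessing (normalization) expected by the model is not shown here and may differ.

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "best_model_b1_80x60_0.8551_dil1.onnx",
    providers=["CPUExecutionProvider"],
)
# Dummy inputs: an RGB image batch and one ROI (batch_idx, x1, y1, x2, y2)
# with 0-1 normalized coordinates, as in the architecture diagram.
images = np.random.rand(1, 3, 640, 640).astype(np.float32)
rois = np.array([[0, 0.25, 0.10, 0.75, 0.95]], dtype=np.float32)
masks, binary_masks = sess.run(None, {"images": images, "rois": rois})
print(masks.shape)         # (1, 3, 160, 120): 3-class logits per ROI
print(binary_masks.shape)  # (1, 1, 640, 640): full-image binary mask
```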
pip install sit4onnx
# CUDA
sit4onnx -if best_model_b0_64x48_0.8545_dil1.onnx -oep cuda
INFO: file: best_model_b0_64x48_0.8545_dil1.onnx
INFO: providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 480, 640] dtype: float32
INFO: input_name.2: rois shape: [1, 5] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time: 177.2298812866211 ms
INFO: avg elapsed time per pred: 17.72298812866211 ms
INFO: output_name.1: masks shape: [1, 3, 128, 96] dtype: float32
INFO: output_name.2: binary_masks shape: [1, 1, 480, 640] dtype: float32
sit4onnx -if best_model_b1_80x60_0.8551_dil1.onnx -oep cuda
INFO: file: best_model_b1_80x60_0.8551_dil1.onnx
INFO: providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 640, 640] dtype: float32
INFO: input_name.2: rois shape: [1, 5] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time: 251.79290771484375 ms
INFO: avg elapsed time per pred: 25.179290771484375 ms
INFO: output_name.1: masks shape: [1, 3, 160, 120] dtype: float32
INFO: output_name.2: binary_masks shape: [1, 1, 640, 640] dtype: float32
# TensorRT
sit4onnx -if best_model_b0_64x48_0.8545_dil1.onnx -oep tensorrt
INFO: file: best_model_b0_64x48_0.8545_dil1.onnx
INFO: providers: ['TensorrtExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 480, 640] dtype: float32
INFO: input_name.2: rois shape: [1, 5] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time: 47.41835594177246 ms
INFO: avg elapsed time per pred: 4.741835594177246 ms
INFO: output_name.1: masks shape: [1, 3, 128, 96] dtype: float32
INFO: output_name.2: binary_masks shape: [1, 1, 480, 640] dtype: float32
sit4onnx -if best_model_b1_80x60_0.8551_dil1.onnx -oep tensorrt
INFO: file: best_model_b1_80x60_0.8551_dil1.onnx
INFO: providers: ['TensorrtExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 640, 640] dtype: float32
INFO: input_name.2: rois shape: [1, 5] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time: 68.60971450805664 ms
INFO: avg elapsed time per pred: 6.860971450805664 ms
INFO: output_name.1: masks shape: [1, 3, 160, 120] dtype: float32
INFO: output_name.2: binary_masks shape: [1, 1, 640, 640] dtype: float32
# TensorRT + Multi-ROI
sit4onnx -if best_model_b0_64x48_0.8545_dil1.onnx -oep tensorrt -fs 1 3 480 640 -fs 3 5
INFO: file: best_model_b0_64x48_0.8545_dil1.onnx
INFO: providers: ['TensorrtExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 480, 640] dtype: float32
INFO: input_name.2: rois shape: [3, 5] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time: 65.09065628051758 ms
INFO: avg elapsed time per pred: 6.509065628051758 ms
INFO: output_name.1: masks shape: [3, 3, 128, 96] dtype: float32
INFO: output_name.2: binary_masks shape: [1, 1, 480, 640] dtype: float32
sit4onnx -if best_model_b1_80x60_0.8551_dil1.onnx -oep tensorrt -fs 1 3 640 640 -fs 3 5
INFO: file: best_model_b1_80x60_0.8551_dil1.onnx
INFO: providers: ['TensorrtExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 640, 640] dtype: float32
INFO: input_name.2: rois shape: [3, 5] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time: 97.52345085144043 ms
INFO: avg elapsed time per pred: 9.752345085144043 ms
INFO: output_name.1: masks shape: [3, 3, 160, 120] dtype: float32
INFO: output_name.2: binary_masks shape: [1, 1, 640, 640] dtype: float32
sit4onnx -if best_model_b0_64x48_0.8545_dil1.onnx -oep tensorrt -fs 1 3 480 640 -fs 10 5
INFO: file: best_model_b0_64x48_0.8545_dil1.onnx
INFO: providers: ['TensorrtExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 480, 640] dtype: float32
INFO: input_name.2: rois shape: [10, 5] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time: 126.00469589233398 ms
INFO: avg elapsed time per pred: 12.600469589233398 ms
INFO: output_name.1: masks shape: [10, 3, 128, 96] dtype: float32
INFO: output_name.2: binary_masks shape: [1, 1, 480, 640] dtype: float32
sit4onnx -if best_model_b1_80x60_0.8551_dil1.onnx -oep tensorrt -fs 1 3 640 640 -fs 10 5
INFO: file: best_model_b1_80x60_0.8551_dil1.onnx
INFO: providers: ['TensorrtExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 640, 640] dtype: float32
INFO: input_name.2: rois shape: [10, 5] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time: 196.87914848327637 ms
INFO: avg elapsed time per pred: 19.687914848327637 ms
INFO: output_name.1: masks shape: [10, 3, 160, 120] dtype: float32
INFO: output_name.2: binary_masks shape: [1, 1, 640, 640] dtype: float32
This project is licensed under the MIT License - see below for details:
MIT License
Copyright (c) 2025 Katsuya Hyodo
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
This project builds upon several excellent works in the computer vision community:
We gratefully acknowledge the work by Vladimir Iglovikov (Ternaus) on people segmentation:
- Repository: https://github.com/ternaus/people_segmentation
- Repository: https://github.com/PINTO0309/people_segmentation
@article{tan2019efficientnet,
title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
author={Tan, Mingxing and Le, Quoc V},
journal={arXiv preprint arXiv:1905.11946},
year={2019}
}
@inproceedings{lin2014microsoft,
title={Microsoft COCO: Common Objects in Context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Dollár, Piotr and Zitnick, C Lawrence},
booktitle={European Conference on Computer Vision},
pages={740--755},
year={2014},
organization={Springer}
}
@inproceedings{ronneberger2015u,
title={U-Net: Convolutional Networks for Biomedical Image Segmentation},
author={Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas},
booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
pages={234--241},
year={2015},
organization={Springer}
}
@article{hinton2015distilling,
title={Distilling the Knowledge in a Neural Network},
author={Hinton, Geoffrey and Vinyals, Oriol and Dean, Jeff},
journal={arXiv preprint arXiv:1503.02531},
year={2015}
}
- The PyTorch team for the excellent deep learning framework
- The ONNX community for cross-platform model deployment tools
- The Albumentations team for powerful augmentation pipelines
- The Segmentation Models PyTorch contributors for pre-trained encoders