Bug Report: Segmentation Mask Uses Instance Index Instead of Class Label and Has Data Type Limitation
Describe the bug
The `segmentationDecode` function in `DetectionParserUtils.cpp` has two main issues:

- Semantic Issue: The function currently uses detection instance indices (`detIdx`) as values in the segmentation mask instead of the actual class labels from `DetectionCandidate.label`. This makes the segmentation mask less meaningful because instance indices are arbitrary and change between runs.
- Data Type Limitation: The segmentation mask uses the `uint8_t` data type, which limits the maximum number of distinguishable classes to 254 (value 255 is reserved for "unassigned" pixels). When class labels exceed 254, they get mapped to the same value, making it impossible to distinguish between different classes (see the sketch below).
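
For illustration, here is a minimal, self-contained C++ sketch (not DepthAI code; the label values are made up) showing how clamping labels into a `uint8_t` collapses everything above 254 onto a single mask value:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical class labels from a model with more than 254 classes.
    const int labels[] = {10, 200, 254, 300, 500};

    for (int label : labels) {
        // Same clamping as the current decode path: everything >= 254 collapses to 254.
        const uint8_t value = static_cast<uint8_t>(std::min(label, 254));
        std::printf("label %d -> mask value %d\n", label, value);
    }
    // Labels 254, 300 and 500 all end up as 254 and can no longer be told apart.
    return 0;
}
```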
Current problematic code
In the `segmentationDecode` function:

```cpp
const int detIdx = static_cast<int>(i); // index in outDetections list
// ...
const uint8_t value = static_cast<uint8_t>(std::min(detIdx, 254));
```

Minimal Reproducible Example
Any YOLO model with instance segmentation enabled whose class labels exceed 254 will demonstrate both issues:
- The segmentation mask uses instance indices rather than semantic class labels (a standalone sketch of this effect follows below)
- Class labels above 254 all get mapped to value 254 in the mask
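
A standalone sketch of the first issue, using a hypothetical `Candidate` struct in place of the real `DetectionCandidate` type: when instance indices are written into the mask, the value assigned to an object depends on detection order and changes between runs, whereas class labels stay tied to the class.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for DetectionCandidate; only the label field matters here.
struct Candidate {
    int label;
};

void printMaskValues(const std::vector<Candidate>& candidates, bool useLabel) {
    for (size_t i = 0; i < candidates.size(); ++i) {
        const int detIdx = static_cast<int>(i);
        const int source = useLabel ? candidates[i].label : detIdx;
        const uint8_t value = static_cast<uint8_t>(std::min(source, 254));
        std::printf("  detection %zu (class %d) -> mask value %d\n", i, candidates[i].label, value);
    }
}

int main() {
    // The same two objects, detected in a different order on a second run.
    const std::vector<Candidate> runA = {{7}, {3}};
    const std::vector<Candidate> runB = {{3}, {7}};

    std::puts("Current behaviour (instance index): mask values swap between runs");
    printMaskValues(runA, /*useLabel=*/false);
    printMaskValues(runB, /*useLabel=*/false);

    std::puts("Proposed behaviour (class label): mask values stay tied to the class");
    printMaskValues(runA, /*useLabel=*/true);
    printMaskValues(runB, /*useLabel=*/true);
    return 0;
}
```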
Expected behavior
The segmentation mask should:
- Use class labels from `DetectionCandidate.label` instead of instance indices:

  ```cpp
  const int detLabel = detectionCandidates[i].label; // Use class label
  const uint8_t value = static_cast<uint8_t>(std::min(detLabel, 254));
  ```

- Either support more classes through a larger data type (e.g., `uint16_t`) or provide a warning when class labels exceed the supported range (a sketch of this option follows below)
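
A minimal sketch of what the `uint16_t` option could look like, assuming the maximum value is reserved for "unassigned" pixels; the function and constant names here are illustrative and not part of the DepthAI API:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <limits>
#include <vector>

// Illustrative decode target: a uint16_t mask with 65535 reserved for "unassigned".
constexpr uint16_t kUnassigned = std::numeric_limits<uint16_t>::max();
constexpr int kMaxLabel = kUnassigned - 1;  // 65534 distinguishable classes

void writeMaskValue(std::vector<uint16_t>& mask, size_t pixel, int classLabel) {
    if (classLabel > kMaxLabel) {
        // Warn instead of silently clamping, so the user knows labels will collide.
        std::fprintf(stderr, "Warning: class label %d exceeds supported range (%d)\n",
                     classLabel, kMaxLabel);
    }
    mask[pixel] = static_cast<uint16_t>(std::min(classLabel, kMaxLabel));
}

int main() {
    std::vector<uint16_t> mask(4, kUnassigned);
    writeMaskValue(mask, 0, 300);    // fits comfortably in uint16_t
    writeMaskValue(mask, 1, 70000);  // triggers the warning and clamps
    std::printf("mask[0]=%d mask[1]=%d\n", mask[0], mask[1]);
    return 0;
}
```

Switching to `uint16_t` doubles the mask's memory footprint, which is presumably why `uint8_t` was chosen; emitting a warning keeps the current format while at least making the collision visible to the user.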
Additional context
These issues affect the usability of the instance segmentation feature in DepthAI. The first issue impacts semantic clarity, while the second imposes a hard limit on the number of classes that can be properly distinguished in segmentation masks.