Skip to content

Update dicom_seg_writer_operator.py #517

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kavmar
Copy link

@kavmar kavmar commented Jan 17, 2025

Hi @CPBridge

Proposing these changes with respect to #512 (comment)

Proposing these changes with respect to Project-MONAI#512 (comment)

Signed-off-by: kavmar <[email protected]>
@CPBridge
Copy link
Collaborator

Hi @kavmar,

Sorry for slow response as I am travelling over these few weeks. Thanks for stepping up and creating this PR. However, there are a few problems that I see here.

Firstly, I feel strongly that a segment description should not be omitted purely because it does not happen to be present in a particular image. It conveys important information to the receiver of the file when a segment description is included when the corresponding segment is not present: it indicates that the creator of the file checked for the presence of that segment and determined that it was not present. This information is lost if you just omit all segments that are not present. Therefore, the proposed changes should be limited to correcting the mapping of input values to segment numbers stored in the file, and should not attempt to remove or change the segment descriptions themselves.

Secondly, the proposed method for specifying this behaviour could be streamlined in my opinion. The force_continguous_labels parameter does not seem necessary. Why not just enable the behaviour when label_mapping_dict is present, and disable it when label_mapping_dict is None? This would save a parameter. Alternatively, in my personal opinion, I think this could be streamlined even further by removing both the force_contiguous_labels and label_mapping_dict parameters, and adding a further optional parameter to the SegmentDescription class (the one defined in monai app sdk not the class of the same name in highdicom) that specifies the pixel value that segment will take in the input segmentation masks. The writer operator class can then use this information to determine the mapping of input pixel values to segment numbers: if the input pixel value is specified for a given segment, it is used as provided, if it was not specified in the segment description it s assumed to the position of the segment description in the list (the current behaviour). To me this feels neater because it groups all the related information about each segment into one place (the SegmentDescription) rather than splitting it up into different places.

Also, the logic to relabel the segmentation mask is currently a little overcomplicated and will be slow for large arrays. There is a simple numpy trick to do this efficiently in a single operation, see here.

Another minor comment is that I would not make class properties "public" unless there is a good reason to do so. Therefore I would suggest renaming self.force_contiguous_labels to self._force_contiguous_labels and self.label_mapping_dict to self._label_mapping_dict (if these are even kept of course). This gives us more flexibility to change how things work under the hood in the future without breaking the public API.

@kavmar
Copy link
Author

kavmar commented Feb 2, 2025

Hi @CPBridge and thanks for response.

Hi @kavmar,

Sorry for slow response as I am travelling over these few weeks. Thanks for stepping up and creating this PR. However, there are a few problems that I see here.

Firstly, I feel strongly that a segment description should not be omitted purely because it does not happen to be present in a particular image. It conveys important information to the receiver of the file when a segment description is included when the corresponding segment is not present: it indicates that the creator of the file checked for the presence of that segment and determined that it was not present. This information is lost if you just omit all segments that are not present. Therefore, the proposed changes should be limited to correcting the mapping of input values to segment numbers stored in the file, and should not attempt to remove or change the segment descriptions themselves.

I get your point. However, based on my experience, I am concerned with downstream usability. Imagine that the SEG is a result of a model, which is used as an initial segmentation of a multilabel segmentation, such as TotalSegmentator or Vista3D, and the consumer would use it to further finetune some of the classes. I case the input volume would be only a subregion (head) of all possible model outputs (wholebody), the resulting SEG file would have in your described situation all the classes and the user would need to manually check if there aren't any voxels miss classified. This would be very cumbersome, to go through all the classes and check them.
I could imagine to leave the empty segments in if they would be somehow clearly marked. In a head scan, something like "Aorta - undetected" or "Liver - empty", so that the user would know right away that he/she doesn't need to check false positives.

Secondly, the proposed method for specifying this behaviour could be streamlined in my opinion. The force_continguous_labels parameter does not seem necessary. Why not just enable the behaviour when label_mapping_dict is present, and disable it when label_mapping_dict is None? This would save a parameter. Alternatively, in my personal opinion, I think this could be streamlined even further by removing both the force_contiguous_labels and label_mapping_dict parameters, and adding a further optional parameter to the SegmentDescription class (the one defined in monai app sdk not the class of the same name in highdicom) that specifies the pixel value that segment will take in the input segmentation masks. The writer operator class can then use this information to determine the mapping of input pixel values to segment numbers: if the input pixel value is specified for a given segment, it is used as provided, if it was not specified in the segment description it s assumed to the position of the segment description in the list (the current behaviour). To me this feels neater because it groups all the related information about each segment into one place (the SegmentDescription) rather than splitting it up into different places.

My proposal would leave the option to the user to have the current behavior and leave undetected classes in the SEG file. I am not trying to push my solution. Just to give a space to have the option for those who need or want it. That is why I introduced the parameter 'force_contiguous_labels = False'.

Also, the logic to relabel the segmentation mask is currently a little overcomplicated and will be slow for large arrays. There is a simple numpy trick to do this efficiently in a single operation, see here.

Another minor comment is that I would not make class properties "public" unless there is a good reason to do so. Therefore I would suggest renaming self.force_contiguous_labels to self._force_contiguous_labels and self.label_mapping_dict to self._label_mapping_dict (if these are even kept of course). This gives us more flexibility to change how things work under the hood in the future without breaking the public API.

Cleaning up the code, as you propose is of course OK, including more clear names, ...

How do we continue?

@kavmar
Copy link
Author

kavmar commented Feb 3, 2025

Hi @kavmar,

Sorry for slow response as I am travelling over these few weeks. Thanks for stepping up and creating this PR. However, there are a few problems that I see here.

Firstly, I feel strongly that a segment description should not be omitted purely because it does not happen to be present in a particular image. It conveys important information to the receiver of the file when a segment description is included when the corresponding segment is not present: it indicates that the creator of the file checked for the presence of that segment and determined that it was not present. This information is lost if you just omit all segments that are not present. Therefore, the proposed changes should be limited to correcting the mapping of input values to segment numbers stored in the file, and should not attempt to remove or change the segment descriptions themselves.

A quick note: TotalSegmentator also exports only non-empty masks: https://github.com/wasserth/TotalSegmentator/blob/0fe651c7680d76f64dad9ae4a5be69290c184617/totalsegmentator/dicom_io.py#L150

@CPBridge
Copy link
Collaborator

CPBridge commented Mar 3, 2025

Hi @kavmar I'm sorry again for my slowness here, and thanks for your response. I will try and monitor this thread more closely so that we can make progress!

My proposal would leave the option to the user to have the current behavior and leave undetected classes in the SEG file. I am not trying to push my solution. Just to give a space to have the option for those who need or want it. That is why I introduced the parameter 'force_contiguous_labels = False'.

Yes I think that it is fine to add an option, but I don't think it should be the default behaviour because the semantics are very different. It also would lead to odd situations like it being impossible to create an empty segmentation mask. How would you deal with that? It's definitely going to arise sometimes. Just error out? Or include all empty segments in that case?

A quick note: TotalSegmentator also exports only non-empty masks: https://github.com/wasserth/TotalSegmentator/blob/0fe651c7680d76f64dad9ae4a5be69290c184617/totalsegmentator/dicom_io.py#L150

Interesting, but I don't think it is especially important what one other tool chooses to do, I'm afraid.

I could imagine to leave the empty segments in if they would be somehow clearly marked. In a head scan, something like "Aorta - undetected" or "Liver - empty", so that the user would know right away that he/she doesn't need to check false positives.

There's just no mechanism to do that in a standardized way in DICOM, so I think we have to rule this out. We are limited by what is allowed by the format.

I think for the purposes of the conversation we need to be clear that there are actually two quite distinct problems that you are trying to address here. They seem similar but they are actually totally separate:

  1. Matching segment descriptions to input array values. Currently the writer assumes that each segment description is matched to pixels in the input array that have a value equal to the one-based index of the description of the item in the segment descriptions. You want the ability to optionally break this connection and specify arbitrary pixel array values for each segment description, and that these values may be non-contiguous. But then due to the constraints of the DICOM segmentation, this requires renumbering the pixels before encoding.
  2. Omission of described but empty segments. You want an option to check for and omit segments that do have a description but that are entirely empty. This then of course also involves renumbering the segments in order to meet the constraints.

Now of course, this creates quite some complexity because the options are combinatorial. The user may want one of these behaviours but not the other. so you need to deal with 4 situations:
a) The input arrays use consecutive segment numbers and any empty segments are retained (the only current case currently supported, requires no renumbering).
b) The input arrays use consecutive segment numbers but any empty segments is omitted. This needs renumbering to account for any missing segments.
c) The input arrays use arbitrary segment numbers but any empty segments are retained. This needs renumbering to account for the non-consecutive input numbers.
d) The input arrays use arbitrary segment numbers and any missing segment is omitted. This needs renumbering for two reasons, and needs to be handled carefully to account for both correctly.

I am okay to try to support both options, but we need to recognize the complexity. I am not entirely sure which of the two questions force_contiguous_labels is supposed to address currently. Can you clarify? It seems like it's sort of a combination of both, which I think conflates two quite different concerns.

Also I mentioned above the new LABELMAP segmentations, which don't have the limitation on consecutive segments. This is now fully supported in highdicom 0.24.0+ which is publicly released. Therefore, if you are more concerned about my question 1 above, we could just add an option for labelmap segmentations which would remove the constraint that segment numbers be consecutive and eliminate question 1 when the labelmap option is used. The downside is that currently only highdicom itself knows how to read and write these files at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants