Describe the bug
Hello, I think I have encountered a bug in sagemaker.local. I'm trying to test a batch transform with images as input, but I get the following error before even reaching the input_fn of my custom inference script:
```
│ 345 │ │ for element in self.splitter.split(file):
│ ❱ 346 │ │ │ if _payload_size_within_limit(buffer + element, size):
│ 347 │ │ │ │ buffer += element
│ 348 │ │ │ else:
│ 349 │ │ │ │ tmp = buffer
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: can only concatenate str (not "bytes") to str
```
I am not using a splitter (splitter type is None), as it's not necessary on images.
I believe the problem is in line 343, in the MultiRecordStrategy class:
sagemaker-python-sdk/src/sagemaker/local/data.py
Lines 326 to 352 in ae3cc1c
```python
class MultiRecordStrategy(BatchStrategy):
    """Feed multiple records at a time for batch inference.
    Will group up as many records as possible within the payload specified.
    """

    def pad(self, file, size=6):
        """Group together as many records as possible to fit in the specified size.
        Args:
            file (str): file path to read the records from.
            size (int): maximum size in MB that each group of records will be
                fitted to. passing 0 means unlimited size.
        Returns:
            generator of records
        """
        buffer = ""
        for element in self.splitter.split(file):
            if _payload_size_within_limit(buffer + element, size):
                buffer += element
            else:
                tmp = buffer
                buffer = element
                yield tmp
        if _validate_payload_size(buffer, size):
            yield buffer
```
We can see that the buffer variable is initialized as a string, which assumes the records read from file are never binary objects, even though that should be possible (e.g. images).
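Concretely, the failure is just Python refusing to concatenate bytes onto a str, which is what happens once the splitter yields raw image bytes while the buffer starts out as an empty string:

```python
# Minimal illustration of the failure mode (not SDK code):
buffer = ""           # pad() starts with a str buffer
element = b"\x89PNG"  # a binary record, e.g. raw image bytes yielded by the splitter
buffer + element      # TypeError: can only concatenate str (not "bytes") to str
```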
To reproduce
Run a local batch transform with a single image as input. The model doesn't really matter, I think; it fails before any prediction or any interaction between the data and the model takes place.
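Roughly the setup I use (a sketch, not my exact code; the paths, role ARN, and entry point script are placeholders):

```python
from sagemaker.local import LocalSession
from sagemaker.pytorch import PyTorchModel

session = LocalSession()
session.config = {"local": {"local_code": True}}

model = PyTorchModel(
    model_data="file:///path/to/model.tar.gz",    # placeholder path to the model artifact
    role="arn:aws:iam::123456789012:role/dummy",  # not actually assumed in local mode
    entry_point="inference.py",                   # custom inference script
    framework_version="2.5.1",
    py_version="py311",
    sagemaker_session=session,
)

transformer = model.transformer(instance_count=1, instance_type="local")
transformer.transform(
    data="file:///path/to/images/",       # directory containing a single .png
    content_type="application/x-image",
    split_type=None,                       # no splitter, as images should not be split
)
transformer.wait()
```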
Expected behavior
I would expect the buffer to be sensitive to whether the records are strings (e.g. JSON or CSV) or a binary type (e.g. PNG).
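For illustration only, a minimal sketch of what I mean, initializing the buffer with the same type as the records (the helper names _payload_size_within_limit and _validate_payload_size are the ones from the snippet above; the actual fix may look different):

```python
def pad(self, file, size=6):
    """Group records to fit the payload size, handling both text and binary records."""
    buffer = None
    for element in self.splitter.split(file):
        if buffer is None:
            # Match the buffer type (str or bytes) to whatever the splitter yields.
            buffer = "" if isinstance(element, str) else b""
        if _payload_size_within_limit(buffer + element, size):
            buffer += element
        else:
            tmp = buffer
            buffer = element
            yield tmp
    if buffer is not None and _validate_payload_size(buffer, size):
        yield buffer
```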
Screenshots or logs
See above.
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.237.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch, custom inference and model
- Framework version: 2.5.1
- Python version: 3.11
- CPU or GPU: Both
- Custom Docker image (Y/N): Y, extending the pytorch-inference:2.5.1-gpu-py311-cu124-ubuntu22.04-sagemaker image