
Commit c87b7ea

Author: Zohar Mizrahi

[RAPTOR-11836] Remove support for Apache Arrow (#1215)

1 parent 7a58779 commit c87b7ea

File tree

31 files changed: +33 -302 lines changed

custom_model_runner/CHANGELOG.md

Lines changed: 4 additions & 0 deletions

@@ -4,6 +4,10 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+#### [1.16.1] - In Progress
+##### Changed
+- Remove support for Apache Arrow.
+
 #### [1.16.0] - 2025-01-08
 ##### Changed
 - Remove 'mlpiper' dependency and replace its functionality with comparable built-in implementations.

custom_model_runner/README.md

Lines changed: 6 additions & 10 deletions

@@ -289,28 +289,24 @@ Example: POST http://localhost:6789/predict/; POST http://localhost:6789/predict
 For these routes data can be posted in two ways:
 * as form data parameter with a <key:value> pair, where:
   key = X
-  value = filename of the `csv/arrow/mtx` format, that contains the inference data.
-* as binary data; in case of `arrow` or `mtx` formats, mimetype `application/x-apache-arrow-stream` or `text/mtx` must be set.
+  value = filename of the `csv/mtx` format, that contains the inference data.
+* as binary data; in case of `mtx` format, mimetype `text/mtx` must be set.
 
 * Structured transform route (for Python predictor only):
   A POST **URL_PREFIX/transform/** route, which returns transformed data.
   Example: POST http://localhost:6789/transform/;
   For this route data can be posted in two ways:
   * as form data parameter with a <key:value> pair, where:
     key = `X`.
-    value = filename of the `csv/arrow/mtx` format, that contains the inference data.
+    value = filename of the `csv/mtx` format, that contains the inference data.
 
   optionally a second key, `y`, can be passed with value = a second filename containing target data.
 
   if `y` is passed, the route will return both `X.transformed` and `y.transformed` keys, along with `out.format`
-  indicating the format of the transformed X output. This will take a value of `csv`,
-  `sparse` or `arrow`. `y.transformed` is never sparse.
-
-  an `arrow_version` key may also be passed if you desire to use `arrow` format for `X.transformed` or `y.transformed`.
-  this is used to ensure that the endpoint returns data that can be opened by the caller's version of arrow. without this
-  key, all dense data returned will default to csv format.
+  indicating the format of the transformed X output. This will take a value of `csv` or `sparse`.
+  `y.transformed` is never sparse.
 
-* as binary data; in case of `arrow` or `mtx` formats, mimetype `application/x-apache-arrow-stream` or `text/mtx` must be set.
+* as binary data; in case of `mtx` format, mimetype `text/mtx` must be set.
 
 
 * Unstructured predictions routes:
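The two posting conventions above can be sketched without a running server by building (not sending) the requests with the standard library. This is an illustrative sketch only: the URL, file name, and both helper names are ours, not part of DRUM.

```python
import urllib.request


def form_data_request(url, csv_bytes, filename="data.csv"):
    """Build a multipart/form-data POST whose single field is keyed `X`."""
    boundary = "drum-example-boundary"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="X"; filename="{filename}"\r\n'
        f"Content-Type: text/csv\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + csv_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def binary_request(url, payload, mimetype="text/csv"):
    """Raw-binary POST; per the README, only `text/mtx` must be set explicitly."""
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": mimetype}, method="POST"
    )


req = form_data_request("http://localhost:6789/predict/", b"a,b\n1,2\n")
```

Passing either request object to `urllib.request.urlopen` would perform the actual POST against a running DRUM server.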

custom_model_runner/datarobot_drum/drum/adapters/model_adapters/python_model_adapter.py

Lines changed: 0 additions & 4 deletions

@@ -28,7 +28,6 @@
 from datarobot_drum.drum.artifact_predictors.onnx_predictor import ONNXPredictor
 
 from datarobot_drum.drum.common import (
-    get_pyarrow_module,
     reroute_stdout_to_stderr,
     SupportedPayloadFormats,
 )

@@ -428,9 +427,6 @@ def supported_payload_formats(self):
        formats = SupportedPayloadFormats()
        formats.add(PayloadFormat.CSV)
        formats.add(PayloadFormat.MTX)
-       pa = get_pyarrow_module()
-       if pa is not None:
-           formats.add(PayloadFormat.ARROW, pa.__version__)
        return formats
 
    def model_info(self):
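With the conditional Arrow entry gone, `supported_payload_formats` always reports the same two entries. A minimal re-sketch of the behavior, assuming a stripped-down stand-in for the real class in `drum/common.py`:

```python
class SupportedPayloadFormats:
    """Stripped-down stand-in for the class in drum/common.py (illustrative)."""

    def __init__(self):
        self._formats = {}

    def add(self, payload_format, format_version=None):
        # Version is optional; after this change no format pins a version.
        self._formats[payload_format] = format_version

    def __iter__(self):
        yield from self._formats.items()


# Mirrors the simplified supported_payload_formats property: csv and mtx only.
formats = SupportedPayloadFormats()
formats.add("csv")
formats.add("mtx")
```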

custom_model_runner/datarobot_drum/drum/common.py

Lines changed: 0 additions & 17 deletions

@@ -68,7 +68,6 @@ def __init__(self):
            PredictionServerMimetypes.TEXT_CSV: PayloadFormat.CSV,
            PredictionServerMimetypes.TEXT_PLAIN: PayloadFormat.CSV,
            PredictionServerMimetypes.TEXT_MTX: PayloadFormat.MTX,
-           PredictionServerMimetypes.APPLICATION_X_APACHE_ARROW_STREAM: PayloadFormat.ARROW,
        }
 
    def add(self, payload_format, format_version=None):

@@ -86,22 +85,6 @@ def __iter__(self):
        yield payload_format, format_version
 
 
-try:
-    import pyarrow
-except ImportError:
-    pyarrow = None
-
-
-def get_pyarrow_module():
-    return pyarrow
-
-
-def verify_pyarrow_module():
-    if pyarrow is None:
-        raise ModuleNotFoundError("Please install pyarrow to support Arrow format")
-    return pyarrow
-
-
 def to_bool(value):
    if value is None:
        return False
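The deleted `get_pyarrow_module`/`verify_pyarrow_module` helpers were an instance of the common optional-dependency pattern. A generic stdlib sketch of that pattern (the `optional_import` name is ours, not DRUM's):

```python
import importlib


def optional_import(name):
    """Return the named module if it is installed, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Callers branch on availability instead of crashing at import time,
# exactly as the removed pyarrow helpers allowed.
json_mod = optional_import("json")                 # stdlib, always present
missing = optional_import("surely_not_installed")  # hypothetical missing pkg
```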

custom_model_runner/datarobot_drum/drum/enum.py

Lines changed: 0 additions & 4 deletions

@@ -152,15 +152,13 @@ class PredictionServerMimetypes:
    APPLICATION_JSON = "application/json"
    APPLICATION_OCTET_STREAM = "application/octet-stream"
    TEXT_PLAIN = "text/plain"
-   APPLICATION_X_APACHE_ARROW_STREAM = "application/x-apache-arrow-stream"
    TEXT_MTX = "text/mtx"
    TEXT_CSV = "text/csv"
    EMPTY = ""
 
 
 class InputFormatExtension:
    MTX = ".mtx"
-   ARROW = ".arrow"
    CSV = ".csv"
 
 

@@ -181,7 +179,6 @@ class ModelInfoKeys:
 
 InputFormatToMimetype = {
    InputFormatExtension.MTX: PredictionServerMimetypes.TEXT_MTX,
-   InputFormatExtension.ARROW: PredictionServerMimetypes.APPLICATION_X_APACHE_ARROW_STREAM,
 }
 
 

@@ -413,7 +410,6 @@ class EnvVarNames:
 
 class PayloadFormat:
    CSV = "csv"
-   ARROW = "arrow"
    MTX = "mtx"
 
 
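After the enum cleanup, `InputFormatToMimetype` carries a single entry. A hedged sketch of how such a mapping might be consulted; the `mimetype_for` helper and its csv fallback are illustrative, not DRUM's actual lookup code:

```python
from pathlib import Path

TEXT_MTX = "text/mtx"
TEXT_CSV = "text/csv"

# Mirrors enum.py after the change: only the .mtx extension maps to a mimetype.
InputFormatToMimetype = {".mtx": TEXT_MTX}


def mimetype_for(filename):
    """Illustrative helper: look up by extension, fall back to text/csv."""
    return InputFormatToMimetype.get(Path(filename).suffix, TEXT_CSV)
```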

custom_model_runner/datarobot_drum/drum/root_predictors/predict_mixin.py

Lines changed: 6 additions & 38 deletions

@@ -30,7 +30,6 @@
 )
 from datarobot_drum.drum.root_predictors.transform_helpers import (
    is_sparse,
-   make_arrow_payload,
    make_csv_payload,
    make_mtx_payload,
 )

@@ -74,7 +73,7 @@ def _fetch_data_from_request(file_key, logger=None):
        else:
            wrong_key_error_message = (
                "Samples should be provided as: "
-               " - a csv, mtx, or arrow file under `{}` form-data param key."
+               " - a csv or mtx under `{}` form-data param key."
                " - binary data".format(file_key)
            )
            if logger is not None:

@@ -126,8 +125,6 @@ def _check_mimetype_support(self, mimetype):
            )
            + "Make DRUM support the format or implement `read_input_data` hook to read the data. "
        )
-       if mimetype == PredictionServerMimetypes.APPLICATION_X_APACHE_ARROW_STREAM:
-           error_message += "pyarrow package may be missing, try to install."
        return {"message": error_message}, HTTP_422_UNPROCESSABLE_ENTITY
    return None
 

@@ -193,16 +190,6 @@ def _build_drum_response_json_str(predict_response):
    def _transform(self, logger=None):
        response_status = HTTP_200_OK
 
-       arrow_key = "arrow_version"
-       arrow_version = request.files.get(arrow_key)
-       # TODO: check implementation of how arrow_version is passed
-       # Currently it is passed as a file content,
-       # so arrow_version is of type werkzeug.datastructures.FileStorage,
-       # that's why io.BytesIO getvalue is called on it.
-       if arrow_version is not None:
-           arrow_version = arrow_version.getvalue().decode("utf-8")
-       use_arrow = arrow_version is not None
-
        try:
            feature_binary_data, feature_mimetype, feature_charset = self._fetch_data_from_request(
                "X", logger=logger

@@ -256,32 +243,13 @@
 
        # make output
        if is_sparse(out_data):
-           if use_arrow:
-               target_payload = (
-                   make_arrow_payload(out_target, arrow_version)
-                   if out_target is not None
-                   else None
-               )
-               target_out_format = "arrow"
-           else:
-               target_payload = make_csv_payload(out_target) if out_target is not None else None
-               target_out_format = "csv"
+           target_payload = make_csv_payload(out_target) if out_target is not None else None
            feature_payload, colnames = make_mtx_payload(out_data)
            out_format = "sparse"
        else:
-           if use_arrow:
-               feature_payload = make_arrow_payload(out_data, arrow_version)
-               target_payload = (
-                   make_arrow_payload(out_target, arrow_version)
-                   if out_target is not None
-                   else None
-               )
-               out_format = "arrow"
-           else:
-               feature_payload = make_csv_payload(out_data)
-               target_payload = make_csv_payload(out_target) if out_target is not None else None
-               out_format = "csv"
-           target_out_format = out_format
+           feature_payload = make_csv_payload(out_data)
+           target_payload = make_csv_payload(out_target) if out_target is not None else None
+           out_format = "csv"
 
        out_fields = {
            "X.format": out_format,

@@ -306,7 +274,7 @@
        if target_payload is not None:
            out_fields.update(
                {
-                   "y.format": target_out_format,
+                   "y.format": "csv",
                    Y_TRANSFORM_KEY: (
                        Y_TRANSFORM_KEY,
                        target_payload,
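The rewritten branch in `_transform` collapses to a simple rule: sparse features go out as mtx (reported as `sparse`) with a csv target, while dense features and targets are always csv. Distilled into a toy decision helper (ours, for illustration, not DRUM code):

```python
def choose_output_formats(x_is_sparse, has_target):
    """Toy stand-in for the simplified format decision in _transform.

    Sparse X is serialized as mtx and reported as "sparse"; everything
    else, including the optional y payload, is always csv now that the
    arrow branch is gone.
    """
    x_format = "sparse" if x_is_sparse else "csv"
    y_format = "csv" if has_target else None
    return x_format, y_format
```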

custom_model_runner/datarobot_drum/drum/root_predictors/transform_helpers.py

Lines changed: 0 additions & 34 deletions

@@ -16,7 +16,6 @@
 from scipy.sparse import issparse
 from scipy.sparse.csr import csr_matrix
 
-from datarobot_drum.drum.common import verify_pyarrow_module
 from datarobot_drum.drum.enum import X_FORMAT_KEY, X_TRANSFORM_KEY
 
 

@@ -73,30 +72,6 @@ def validate_and_convert_column_names_for_serialization(df):
    return df
 
 
-def make_arrow_payload(df, arrow_version):
-    pa = verify_pyarrow_module()
-    df = validate_and_convert_column_names_for_serialization(df)
-
-    pyarrow_available_version = version.parse(pa.__version__)
-    pyarrow_requested_version = version.parse(arrow_version)
-    pyarrow_0_20_version = version.parse("0.20")
-
-    if (
-        pyarrow_requested_version != pyarrow_available_version
-        and pyarrow_requested_version < pyarrow_0_20_version
-    ):
-        batch = pa.RecordBatch.from_pandas(df, nthreads=None, preserve_index=False)
-        sink = pa.BufferOutputStream()
-        options = pa.ipc.IpcWriteOptions(
-            metadata_version=pa.MetadataVersion.V4, use_legacy_format=True
-        )
-        with pa.RecordBatchStreamWriter(sink, batch.schema, options=options) as writer:
-            writer.write_batch(batch)
-        return sink.getvalue().to_pybytes()
-    else:
-        return pa.ipc.serialize_pandas(df, preserve_index=False).to_pybytes()
-
-
 def make_csv_payload(df):
    df = validate_and_convert_column_names_for_serialization(df)
 

@@ -107,14 +82,6 @@ def make_csv_payload(df):
    return s_buf.getvalue()[:-2].encode("utf-8")
 
 
-def read_arrow_payload(response_dict, transform_key):
-    pa = verify_pyarrow_module()
-
-    bytes = response_dict[transform_key]
-    df = pa.ipc.deserialize_pandas(bytes)
-    return df
-
-
 def read_csv_payload(response_dict, transform_key):
    bytes = response_dict[transform_key]
    return pd.read_csv(BytesIO(bytes))

@@ -159,7 +126,6 @@ def _sparse(data, key):
    return pd.DataFrame.sparse.from_spmatrix(read_mtx_payload(data, key))
 
 reader = {
-   "arrow": read_arrow_payload,
    "sparse": _sparse,
    "csv": read_csv_payload,
 }
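With `make_arrow_payload`/`read_arrow_payload` removed, the csv path is the only dense serialization left in the `reader` table. A stdlib round-trip sketch of that path; DRUM's real helpers use pandas `to_csv`/`read_csv` rather than these illustrative functions:

```python
import csv
import io


def make_csv_payload(rows, header):
    """Serialize tabular data to utf-8 csv bytes (stdlib stand-in)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def read_csv_payload(payload):
    """Parse csv bytes back into (header, rows); all cells come back as str."""
    records = list(csv.reader(io.StringIO(payload.decode("utf-8"))))
    return records[0], records[1:]


payload = make_csv_payload([[1, 2]], ["a", "b"])
```

The round trip preserves structure but not dtypes, which is why the deleted code in `structured_input_read_utils.py` had to normalize None/NaN handling between the csv and arrow paths.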

custom_model_runner/datarobot_drum/drum/utils/structured_input_read_utils.py

Lines changed: 0 additions & 17 deletions

@@ -12,7 +12,6 @@
 import pandas as pd
 from scipy.io import mmread
 
-from datarobot_drum.drum.common import get_pyarrow_module
 from datarobot_drum.drum.enum import (
    InputFormatToMimetype,
    PredictionServerMimetypes,

@@ -67,22 +66,6 @@ def read_structured_input_data_as_df(binary_data, mimetype, sparse_colnames=None
        return pd.DataFrame.sparse.from_spmatrix(
            mmread(io.BytesIO(binary_data)), columns=sparse_colnames
        )
-   elif mimetype == PredictionServerMimetypes.APPLICATION_X_APACHE_ARROW_STREAM:
-       df = get_pyarrow_module().ipc.deserialize_pandas(binary_data)
-
-       # After CSV serialization+deserialization,
-       # original dataframe's None and np.nan values
-       # become np.nan values.
-       # After Arrow serialization+deserialization,
-       # original dataframe's None and np.nan values
-       # become np.nan for numeric columns and None for 'object' columns.
-       #
-       # Since we are supporting both CSV and Arrow,
-       # to be consistent with CSV serialization/deserialization,
-       # it is required to replace all None with np.nan for Arrow.
-       df.fillna(value=np.nan, inplace=True)
-
-       return df
    else:  # CSV format
        try:
            df = pd.read_csv(io.BytesIO(binary_data))

custom_model_runner/drum_server_api.yaml

Lines changed: 1 addition & 9 deletions

@@ -218,8 +218,6 @@ paths:
            type: object
            description: If format is supported, property present in the object. Property's value is a package version. If version is not pinned, value is null.
            properties:
-             arrow:
-               type: string
              csv:
                type: string
              mtx:

@@ -232,7 +230,6 @@
            type: boolean
          example:
            supported_payload_formats:
-             arrow: 2.0.0
              csv: null
              mtx: null
  /URL_PREFIX/predict/:

@@ -256,11 +253,6 @@
            description: Scoring data.
            type: string
            format: text
-         application/x-apache-arrow-stream:
-           schema:
-             description: Scoring data.
-             type: string
-             format: binary
          multipart/form-data:
            schema:
              description: Scoring data.

@@ -290,7 +282,7 @@
            type: string
            description: Status message
          example:
-           message: "ERROR: Samples should be provided as: - a csv, mtx, or arrow file under `X` form-data param key. - binary data."
+           message: "ERROR: Samples should be provided as: - a csv or mtx under `X` form-data param key. - binary data."
  /URL_PREFIX/predictions/:
    $ref: "#/paths/~1URL_PREFIX~1predict~1"
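A caller inspecting the capabilities endpoint after this change would see only csv and mtx entries. A minimal sketch of validating a response against the spec's updated example shape (the response literal below is taken from the spec's example, the check itself is ours):

```python
# Example body copied from the spec's updated capabilities example:
# a null version means the format is supported but its version is not pinned.
response = {"supported_payload_formats": {"csv": None, "mtx": None}}

formats = response["supported_payload_formats"]
supported = set(formats)
```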

custom_model_runner/requirements.txt

Lines changed: 0 additions & 2 deletions

@@ -14,8 +14,6 @@ strictyaml==1.4.2
 PyYAML
 texttable
 py4j~=0.10.9.0
-# only constrained by other packages, not DRUM
-pyarrow
 Pillow
 # constrained by Julia env
 julia<=0.5.7
