Open
Labels
- Python: Affects Python cuDF API.
- bug: Something isn't working
- libcudf: Affects libcudf (C++/CUDA) code.
Description
Describe the bug
Series.str.slice_from() fails with a cryptic C++ error when starts and stops parameters have compatible but different integer dtypes (e.g., int32 and int64). The error occurs at the libcudf C++ layer instead of being handled gracefully at the Python API level.
Steps/Code to reproduce bug
import cudf
import cupy as cp
# Enable pandas compatibility mode (where str.len() returns int64)
cudf.set_option("mode.pandas_compatible", True)
# Create a simple string series
s = cudf.Series(["hello", "world", "test"])
# Create starts as int32, stops comes from str.len() as int64
starts = cudf.Series(cp.zeros(len(s), dtype=cp.int32))
stops = s.str.len() # Returns int64 in pandas mode
print(f"starts dtype: {starts.dtype}") # int32
print(f"stops dtype: {stops.dtype}") # int64
# This fails with dtype mismatch
result = s.str.slice_from(starts, stops)
Error:
TypeError: CUDF failure at: /tmp/conda-bld-output/bld/rattler-build_libcudf/work/cpp/src/strings/slice.cu:330:
Parameters starts and stops must be of the same type.
Expected behavior
One of the following:
- Automatically cast both to a common dtype (like numpy/pandas type promotion)
- Raise a clear Python-level TypeError with an actionable message: "starts (dtype int32) and stops (dtype int64) must have the same dtype. Consider casting both to the same type."
- Document the requirement clearly in the API docs
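To make the first two options concrete, here is a minimal sketch of what a Python-level guard could look like. The helper name `align_slice_bounds` is hypothetical and not part of cuDF; it assumes both arguments expose a NumPy-style `.dtype` and `.astype()`, and it uses NumPy's promotion rules to pick the common dtype. It is an illustration of the requested behavior, not the library's implementation.

```python
import numpy as np


def align_slice_bounds(starts, stops):
    """Hypothetical helper: cast starts/stops to one common integer dtype.

    Works on any array-like exposing NumPy-style `.dtype` and `.astype()`.
    """
    # Reject non-integer inputs with a readable Python-level message.
    for name, col in (("starts", starts), ("stops", stops)):
        if not np.issubdtype(col.dtype, np.integer):
            raise TypeError(
                f"{name} must have an integer dtype, got {col.dtype}"
            )

    # NumPy-style promotion, e.g. int32 + int64 -> int64.
    common = np.result_type(starts.dtype, stops.dtype)
    return starts.astype(common), stops.astype(common)
```

With the reproducer above, the call would presumably become `s.str.slice_from(*align_slice_bounds(starts, stops))`. This has not been verified against cuDF here; it assumes the Series dtypes are plain NumPy integer dtypes and that libcudf accepts the promoted type as long as both columns match.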
Environment overview (please complete the following information)
- Environment location: Bare-metal
- Method of cuDF install: conda
Environment details
- cuDF version: 25.12 (latest)
- Python version: 3.13
- CUDA version: 13.0.1
Additional context
This became a breaking change after PR #20368 (merged Oct 28, 2025), which modified str.len() to return int64 in pandas-compatible mode to match pandas behavior. Code that previously worked with int32 slice indices now fails.
Workaround:
result = s.str.slice_from(starts, stops.astype(cp.int32))