Skip to content

Commit 99189f8

Browse files
Fix Python scalar UDF refcount leak (#245)
Fixes #224 Fixes the memory leak reported in [https://github.com/duckdb/duckdb-python/issues/224](https://github.com/duckdb/duckdb-python/issues/224) The issue was caused by Python UDF return values not being released after conversion. While this affected all return types, it was only observable for large VARCHAR/bytes values due to their size. The fix ensures correct reference management using `py::reinterpret_steal<py::object>.` A regression test was added to detect refcount leaks in Python scalar UDF execution.
2 parents 45fb522 + 2eaced6 commit 99189f8

File tree

2 files changed

+41
-1
lines changed

2 files changed

+41
-1
lines changed

src/duckdb_py/python_udf.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -326,7 +326,7 @@ static scalar_function_t CreateNativeFunction(PyObject *function, PythonExceptio
326326
}
327327

328328
// Call the function
329-
auto ret = PyObject_CallObject(function, bundled_parameters.ptr());
329+
auto ret = py::reinterpret_steal<py::object>(PyObject_CallObject(function, bundled_parameters.ptr()));
330330
if (ret == nullptr && PyErr_Occurred()) {
331331
if (exception_handling == PythonExceptionHandling::FORWARD_ERROR) {
332332
auto exception = py::error_already_set();
Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
import gc
2+
import platform
3+
import sys
4+
5+
import pytest
6+
7+
import duckdb
8+
9+
10+
@pytest.mark.parametrize(("rows", "iters"), [(1000, 20)])
11+
def test_python_scalar_udf_return_value_refcount_does_not_leak(rows, iters):
12+
if platform.python_implementation() != "CPython":
13+
pytest.skip("refcount-based test requires CPython")
14+
15+
payload = b"processed_data_" + b"x" * 8192 # large-ish bytes to mimic the reported issue
16+
17+
def udf_bytes(_):
18+
return payload # Always return the exact same object so we can track its refcount.
19+
20+
# Baseline refcount (note: getrefcount adds a temporary ref)
21+
baseline = sys.getrefcount(payload)
22+
23+
con = duckdb.connect()
24+
con.create_function("udf_bytes", udf_bytes, ["BIGINT"], "VARCHAR")
25+
26+
for _ in range(iters):
27+
con.execute(f"SELECT udf_bytes(range) FROM range({rows})")
28+
res = con.fetchall()
29+
# Drop the result ASAP so we don't keep any refs alive in Python
30+
del res
31+
gc.collect()
32+
33+
# Re-check refcount. In the buggy version this grows by rows*iters (huge).
34+
after = sys.getrefcount(payload)
35+
36+
# Allow a tiny tolerance for transient references/caches.
37+
# In the presence of the leak, this will be thousands+ higher.
38+
assert after <= baseline + 10, (baseline, after)
39+
40+
con.close()

0 commit comments

Comments
 (0)