Skip to content

Conversation

@timosachsenberg
Copy link
Collaborator

@timosachsenberg timosachsenberg commented Jan 9, 2026

This change allows conversion providers (like StdStringUnicodeConverter) to work inside containers (like std::vector) by adding a delegation mechanism.

Changes:

  • Add supports_delegation() method to TypeConverterBase (default: False)
  • Enable delegation for StdStringUnicodeConverter and StdStringUnicodeOutputConverter
  • Modify StdVectorConverter to delegate to element converters when supports_delegation() returns True
  • Add test for UTF-8 string vectors

This fixes the limitation where UTF-8 string converters would not be picked up when used inside containers. Now, vectors of libcpp_utf8_string or libcpp_utf8_output_string properly encode/decode UTF-8 strings.

Summary by CodeRabbit

  • New Features

    • Element-wise delegation for container conversions, enabling per-item conversion and cleanup across common containers (single-level nesting).
  • Tests

    • Added UTF‑8 test suite exercising multilingual strings and round‑trip conversions for vectors, sets, maps and unordered containers.
  • Documentation

    • Updated docstrings to clarify delegation behavior for container elements and nesting limitations.

✏️ Tip: You can customize this high-level summary in your review settings.

This change allows conversion providers (like StdStringUnicodeConverter)
to work inside containers (like std::vector) by adding a delegation
mechanism.

Changes:
- Add supports_delegation() method to TypeConverterBase (default: False)
- Enable delegation for StdStringUnicodeConverter and
  StdStringUnicodeOutputConverter
- Modify StdVectorConverter to delegate to element converters when
  supports_delegation() returns True
- Add test for UTF-8 string vectors

This fixes the limitation where UTF-8 string converters would not be
picked up when used inside containers. Now, vectors of libcpp_utf8_string
or libcpp_utf8_output_string properly encode/decode UTF-8 strings.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 9, 2026

📝 Walkthrough

Walkthrough

Adds per-element delegation support for container conversions by introducing supports_delegation() on converters, enabling UTF‑8 string converters to delegate element-wise conversion and cleanup; extends container input/output conversion paths and registry checks; adds C++/Cython UTF‑8 test fixtures and Python tests exercising delegation paths.

Changes

Cohort / File(s) Summary
Core Delegation Feature
autowrap/ConversionProvider.py
Add supports_delegation(self) -> bool to TypeConverterBase (default False); override in StdStringUnicodeConverter and StdStringUnicodeOutputConverter to return True. Add _has_delegating_converter() helper. Extend container input/output conversion branches to delegate per-element when element converters support delegation; add per-container cleanup/back-propagation logic.
UTF‑8 Vector Tests (C++)
tests/test_files/libcpp_utf8_string_vector_test.hpp
New Utf8VectorTest C++ class with get_greetings(), echo(...), and count_strings(...) using std::vector<std::string] containing UTF‑8 examples.
UTF‑8 Vector Declarations (Cython)
tests/test_files/libcpp_utf8_string_vector_test.pxd
New Cython extern declarations mapping Utf8VectorTest methods to libcpp.vector / libcpp.string aliases (language_level=3).
UTF‑8 Containers Tests (C++)
tests/test_files/libcpp_utf8_containers_test.hpp
New Utf8ContainersTest C++ class with getters/echoes for vector, set, map, unordered_set, unordered_map using UTF‑8 std::string.
UTF‑8 Containers Declarations (Cython)
tests/test_files/libcpp_utf8_containers_test.pxd
New Cython extern declarations for Utf8ContainersTest covering multiple container templates and UTF‑8 string input/output types.
Python Tests
tests/test_code_generator.py
Add test_utf8_string_vector_conversion() and test_utf8_containers_delegation() to validate round-trip and delegated per-element conversion for UTF‑8 strings; duplicated test blocks present in diff.

Sequence Diagram(s)

sequenceDiagram
    actor Python
    participant ConversionProvider
    participant ConverterRegistry
    participant ElementConverter
    participant CppAPI

    Python->>ConversionProvider: call function with Python container
    ConversionProvider->>ConverterRegistry: lookup element converter
    ConverterRegistry-->>ConversionProvider: ElementConverter
    ConversionProvider->>ElementConverter: supports_delegation()?
    alt delegating
        ConversionProvider->>ElementConverter: convert element (per-element)
        ElementConverter-->>ConversionProvider: converted element
        ConversionProvider->>CppAPI: assemble container & call C++ API
        CppAPI-->>ConversionProvider: returns C++ container
        ConversionProvider->>ElementConverter: convert output element back to Python
        ElementConverter-->>ConversionProvider: Python element
        ConversionProvider-->>Python: return Python container
    else non-delegating
        ConversionProvider->>CppAPI: use standard container conversion path
        CppAPI-->>ConversionProvider: returns C++ container
        ConversionProvider-->>Python: return Python container
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • jpfeuffer

Poem

🐰 I nibble bytes and strings with gentle care,
I hop through vectors, swapping here and there,
From "Hello" to "你好" I guide each little part,
Delegating conversions straight from my heart,
A tiny rabbit patch — a multilingual art!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Design conversion provider delegation system' directly describes the main change: introducing a delegation mechanism for conversion providers to operate as element converters inside containers.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 9, 2026

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #244

coderabbitai bot added a commit that referenced this pull request Jan 9, 2026
Docstrings generation was requested by @timosachsenberg.

* #243 (comment)

The following files were modified:

* `autowrap/ConversionProvider.py`
@jpfeuffer
Copy link
Contributor

Thanks a lot Timo!!
I'm sorry but I just had another idea! I think what actually happens is that for annotations like libcpp_vector[libcpp_string] or any libcpp standard type, the cython autoconversion abilities are used. And these are currently set to ASCII

|#cython: c_string_encoding=ascii

One could probably try to change that but I think it is a global change so you wouldn't be able to change it for some functions but not others. Probably not so important.

This PR here looks ok I think but I wonder if it works for further nesting and if not if it's worth to support further nesting. Deep nesting in autowrap is in general very unstable and untested. Especially when mixing containers 😅
In any case, nesting limitations if any should be documented better.
Maybe I let the AI do that at some point.

@timosachsenberg
Copy link
Collaborator Author

I would not support further nesting.
What would be the implications if we change this from ascii to utf? Should we work towards that or does this cause other issues?
Ok so we can merge this for now?

Delegation only works for single-level containers (e.g., vector<T>).
Nested containers like vector<vector<T>> fall through to Cython's
automatic conversion, bypassing the delegating converter.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @autowrap/ConversionProvider.py:
- Around line 1915-1986: The delegation input-conversion branch inconsistently
pushes items into $temp_var: when elem_code has content it uses a direct cast
"<$inner>$item" but when elem_code is empty it uses $elem_call_as; change the
first branch to push $elem_call_as instead so both branches use the
element_converter.input_conversion result consistently (inspect variables
element_converter, elem_code, elem_call_as, $temp_var, $inner and $argument_var
and replace the push_back call in the elem_code.present branch to use
$elem_call_as).
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b5ebfb and 7372e01.

📒 Files selected for processing (1)
  • autowrap/ConversionProvider.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test (==3.2.0, 3.12)
  • GitHub Check: test (==3.1.0, 3.13)
  • GitHub Check: test (==3.2.0, 3.13)
  • GitHub Check: test (==3.2.0, 3.10)
  • GitHub Check: test (==3.1.0, 3.11)
  • GitHub Check: test (==3.1.0, 3.10)
  • GitHub Check: test (==3.2.0, 3.11)
  • GitHub Check: test (==3.1.0, 3.12)
🔇 Additional comments (4)
autowrap/ConversionProvider.py (4)

137-151: LGTM! Clear API design with well-documented limitations.

The supports_delegation() method provides a clean opt-in mechanism for converters to enable element-wise delegation inside containers. The docstring clearly documents the single-level nesting limitation, which aligns with the PR objectives.


2549-2550: LGTM! Delegation enabled for UTF-8 string converters.

Both input and output UTF-8 string converters now correctly opt into delegation, enabling proper UTF-8 encoding/decoding for strings inside containers.

Also applies to: 2595-2596


2084-2111: LGTM! Output delegation logic is well-implemented.

The output conversion correctly:

  • Detects delegating converters using _has_delegating_converter()
  • Retrieves element output conversion via element_converter.output_conversion()
  • Handles different return types (None, Code object, or string)
  • Properly iterates through the vector and applies per-element conversion

The implementation is consistent with the pattern used for other wrapped types in the same method.


1687-1695: Good defensive implementation of delegation detection.

The _has_delegating_converter() helper properly:

  • Checks for converter registry availability via hasattr(self, "cr")
  • Handles exceptions gracefully with try/except
  • Returns False as safe default when delegation is not supported

This defensive approach prevents errors when the method is called in contexts where the converter registry might not be available.

Extended delegation support to:
- StdSetConverter (ordered set)
- StdMapConverter (ordered map)
- StdUnorderedSetConverter (hash set)
- StdUnorderedMapConverter (hash map)

For maps, both keys and values can independently use delegating
converters (e.g., libcpp_utf8_output_string for UTF-8 decoding).

Note: Same single-level nesting limitation applies - nested containers
like map<string, vector<libcpp_utf8_string>> won't delegate.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @autowrap/ConversionProvider.py:
- Around line 1286-1294: The delegation check helpers (e.g.,
_has_delegating_converter) are checking self.cr but _set_converter_registry
currently assigns self.converters, causing delegation to silently fail; fix by
making _set_converter_registry assign self.cr = r (in addition to or instead of
self.converters) so methods like _has_delegating_converter, the other delegation
helpers referenced in the comment, and any calls to self.cr.get(...) find the
registry; ensure you update _set_converter_registry (the method that sets the
registry) and verify no other places rely exclusively on self.converters without
having self.cr set.
🧹 Nitpick comments (1)
autowrap/ConversionProvider.py (1)

1417-1484: LGTM! Input delegation logic correctly handles element conversion and cleanup.

The input conversion delegation for sets and vectors properly:

  • Invokes element converters' input_conversion
  • Handles both Code objects and simple conversions
  • Implements cleanup for reference parameters to reflect changes back to Python

The logic for handling elem_code.content (line 2109) correctly distinguishes between converters that need additional variable declarations versus those that don't.

Optional: Consider adding delegation support to StdDequeConverter and StdListConverter.

Currently, only StdSetConverter, StdVectorConverter, StdUnorderedSetConverter, StdMapConverter, and StdUnorderedMapConverter support delegation. For consistency and completeness, you might want to add delegation support to:

  • StdDequeConverter (lines 3548-3709)
  • StdListConverter (lines 3712-3889)

This would allow UTF-8 string vectors to work with all sequence container types, not just vectors and sets.

Also applies to: 2089-2160, 3369-3436

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7372e01 and abf3ca5.

📒 Files selected for processing (1)
  • autowrap/ConversionProvider.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test (==3.1.0, 3.13)
  • GitHub Check: test (==3.2.0, 3.10)
  • GitHub Check: test (==3.1.0, 3.11)
  • GitHub Check: test (==3.2.0, 3.12)
  • GitHub Check: test (==3.2.0, 3.11)
  • GitHub Check: test (==3.2.0, 3.13)
  • GitHub Check: test (==3.1.0, 3.10)
  • GitHub Check: test (==3.1.0, 3.12)
🔇 Additional comments (3)
autowrap/ConversionProvider.py (3)

137-151: LGTM! Well-documented delegation interface.

The supports_delegation() method provides a clean extension point for converters. The documentation clearly explains the single-level nesting limitation, which is important for users to understand.


2723-2724: LGTM! String converters correctly enable delegation.

The overrides in both StdStringUnicodeConverter and StdStringUnicodeOutputConverter properly enable delegation support, allowing UTF-8 string handling inside containers.

Also applies to: 2769-2770


1217-1271: LGTM! Consistent delegation implementation across containers.

The output conversion delegation logic is consistently implemented across all container converters:

  • Checks for delegating converters on element types
  • Properly handles None, Code objects, and string returns from element converters
  • Builds appropriate conversion code for each container type

The implementation correctly supports delegation for both keys and values in maps, which is well thought out.

Also applies to: 1551-1576, 2258-2285, 3163-3218, 3507-3531

Tests cover all container types with UTF-8 string delegation:
- std::vector (input and output)
- std::set (input and output)
- std::map (UTF-8 values and UTF-8 keys)
- std::unordered_set (input and output)
- std::unordered_map (input and output)

Also fixes StdUnorderedMapConverter input_conversion to properly
delegate to element converters for keys and values.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/test_code_generator.py (2)

401-450: Solid test coverage for UTF-8 vector delegation.

The test comprehensively covers output conversion (C++ → Python), input conversion (Python → C++), and multiple input types (str/bytes). However, consider two minor improvements:

  1. Python version consistency: Line 426 uses an f-string (Python 3.6+), while the file includes __future__ imports suggesting potential Python 2 support. The existing test_automatic_output_string_conversion (lines 388-390) explicitly handles str vs unicode based on Python version. If Python 2 support is no longer needed, the __future__ imports could be removed for clarity.

  2. Edge cases: Consider testing empty containers and potentially malformed UTF-8 sequences to ensure delegation handles these gracefully.

Optional: Add Python version handling for consistency

If Python 2 support is needed:

+    expected_type = str
+    if sys.version_info[0] < 3:
+        expected_type = unicode
+
     # All elements should be unicode strings (str in Python 3)
     for s in greetings:
-        assert isinstance(s, str), f"Expected str, got {type(s)}"
+        assert isinstance(s, expected_type), "Expected %s, got %s" % (expected_type.__name__, type(s).__name__)

Apply similar pattern to line 438.


452-534: Excellent comprehensive coverage of container delegation.

This test thoroughly validates UTF-8 delegation across all container types (vector, set, map, unordered variants) with diverse UTF-8 content (Russian, Chinese, Arabic, Portuguese). The test structure clearly demonstrates the PR's core functionality: UTF-8 string converters now properly operate when used as element converters inside containers.

Note: The static analysis RUF001 warnings (lines 516, 527) about "ambiguous" Cyrillic characters are false positives—these are intentional UTF-8 test fixtures and should be ignored.

Optional enhancement: Consider adding tests for empty containers to ensure delegation handles edge cases:

# Empty container tests
empty_vec = h.echo_vector([])
assert empty_vec == []

empty_set = h.echo_set(set())
assert empty_set == set()

empty_map = h.echo_map({})
assert empty_map == {}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf1af41 and e37c2e3.

📒 Files selected for processing (2)
  • tests/test_code_generator.py
  • tests/test_files/libcpp_utf8_containers_test.pxd
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_files/libcpp_utf8_containers_test.pxd
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_code_generator.py (2)
tests/test_files/libcpp_utf8_string_vector_test.hpp (1)
  • Utf8VectorTest (6-6)
tests/test_files/libcpp_utf8_containers_test.hpp (1)
  • Utf8ContainersTest (10-10)
🪛 Ruff (0.14.10)
tests/test_code_generator.py

516-516: String contains ambiguous Н (CYRILLIC CAPITAL LETTER EN). Did you mean H (LATIN CAPITAL LETTER H)?

(RUF001)


516-516: String contains ambiguous а (CYRILLIC SMALL LETTER A). Did you mean a (LATIN SMALL LETTER A)?

(RUF001)


516-516: String contains ambiguous б (CYRILLIC SMALL LETTER BE). Did you mean 6 (DIGIT SIX)?

(RUF001)


516-516: String contains ambiguous о (CYRILLIC SMALL LETTER O). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)


516-516: String contains ambiguous р (CYRILLIC SMALL LETTER ER). Did you mean p (LATIN SMALL LETTER P)?

(RUF001)


527-527: String contains ambiguous З (CYRILLIC CAPITAL LETTER ZE). Did you mean 3 (DIGIT THREE)?

(RUF001)


527-527: String contains ambiguous а (CYRILLIC SMALL LETTER A). Did you mean a (LATIN SMALL LETTER A)?

(RUF001)


527-527: String contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?

(RUF001)


527-527: String contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?

(RUF001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test (==3.1.0, 3.13)
  • GitHub Check: test (==3.1.0, 3.12)
  • GitHub Check: test (==3.2.0, 3.13)
  • GitHub Check: test (==3.1.0, 3.10)
  • GitHub Check: test (==3.2.0, 3.12)
  • GitHub Check: test (==3.2.0, 3.10)
  • GitHub Check: test (==3.2.0, 3.11)
  • GitHub Check: test (==3.1.0, 3.11)

Change libcpp_map declarations to use libcpp_utf8_output_string for keys,
enabling direct string access without bytes prefix, matching unordered_map.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/test_code_generator.py (1)

401-450: Consider Python 2/3 compatibility consistency.

The new test uses f-strings (lines 426, 438) and directly checks for str type, which differs from the existing pattern in this file. The existing test_automatic_output_string_conversion() (lines 389-390) explicitly handles Python 2/3 differences:

expected_type = str
if sys.version_info[0] < 3:
    expected_type = unicode

If Python 2 support has been dropped project-wide, this is fine. However, the file still includes Python 2/3 compatibility imports (from __future__ import print_function), creating inconsistency. If Python 2 is no longer supported, consider removing these imports file-wide for clarity.

♻️ Optional: Align with existing pattern if Python 2 still supported
     # All elements should be unicode strings (str in Python 3)
+    expected_type = str
+    if sys.version_info[0] < 3:
+        expected_type = unicode
     for s in greetings:
-        assert isinstance(s, str), f"Expected str, got {type(s)}"
+        assert isinstance(s, expected_type), "Expected %s, got %s" % (expected_type, type(s))
     assert "Hello" in greetings

Apply similar changes to line 438.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e37c2e3 and e6852c9.

📒 Files selected for processing (2)
  • tests/test_code_generator.py
  • tests/test_files/libcpp_utf8_containers_test.pxd
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_files/libcpp_utf8_containers_test.pxd
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_code_generator.py (2)
tests/test_files/libcpp_utf8_string_vector_test.hpp (1)
  • Utf8VectorTest (6-6)
tests/test_files/libcpp_utf8_containers_test.hpp (1)
  • Utf8ContainersTest (10-10)
🪛 Ruff (0.14.10)
tests/test_code_generator.py

517-517: String contains ambiguous Н (CYRILLIC CAPITAL LETTER EN). Did you mean H (LATIN CAPITAL LETTER H)?

(RUF001)


517-517: String contains ambiguous а (CYRILLIC SMALL LETTER A). Did you mean a (LATIN SMALL LETTER A)?

(RUF001)


517-517: String contains ambiguous б (CYRILLIC SMALL LETTER BE). Did you mean 6 (DIGIT SIX)?

(RUF001)


517-517: String contains ambiguous о (CYRILLIC SMALL LETTER O). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)


517-517: String contains ambiguous р (CYRILLIC SMALL LETTER ER). Did you mean p (LATIN SMALL LETTER P)?

(RUF001)


528-528: String contains ambiguous З (CYRILLIC CAPITAL LETTER ZE). Did you mean 3 (DIGIT THREE)?

(RUF001)


528-528: String contains ambiguous а (CYRILLIC SMALL LETTER A). Did you mean a (LATIN SMALL LETTER A)?

(RUF001)


528-528: String contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?

(RUF001)


528-528: String contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?

(RUF001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test (==3.1.0, 3.10)
  • GitHub Check: test (==3.2.0, 3.13)
  • GitHub Check: test (==3.1.0, 3.12)
  • GitHub Check: test (==3.2.0, 3.12)
  • GitHub Check: test (==3.1.0, 3.13)
  • GitHub Check: test (==3.2.0, 3.11)
  • GitHub Check: test (==3.2.0, 3.10)
  • GitHub Check: test (==3.1.0, 3.11)
🔇 Additional comments (1)
tests/test_code_generator.py (1)

452-535: Comprehensive UTF-8 delegation test coverage.

This test thoroughly validates the delegation mechanism across all container types (vector, set, map, unordered variants) with UTF-8 strings. The test data includes multiple language scripts (Russian, Chinese, Arabic, Portuguese), providing strong validation of the encoding/decoding paths.

Note: The Ruff warnings about ambiguous Cyrillic characters (lines 517, 528) are false positives—these are intentional UTF-8 test strings that should use Cyrillic characters, not Latin look-alikes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants