-
Notifications
You must be signed in to change notification settings - Fork 23
Design conversion provider delegation system #243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Design conversion provider delegation system #243
Conversation
This change allows conversion providers (like StdStringUnicodeConverter) to work inside containers (like std::vector) by adding a delegation mechanism. Changes: - Add supports_delegation() method to TypeConverterBase (default: False) - Enable delegation for StdStringUnicodeConverter and StdStringUnicodeOutputConverter - Modify StdVectorConverter to delegate to element converters when supports_delegation() returns True - Add test for UTF-8 string vectors This fixes the limitation where UTF-8 string converters would not be picked up when used inside containers. Now, vectors of libcpp_utf8_string or libcpp_utf8_output_string properly encode/decode UTF-8 strings.
📝 WalkthroughWalkthroughAdds per-element delegation support for container conversions by introducing Changes
Sequence Diagram(s)sequenceDiagram
actor Python
participant ConversionProvider
participant ConverterRegistry
participant ElementConverter
participant CppAPI
Python->>ConversionProvider: call function with Python container
ConversionProvider->>ConverterRegistry: lookup element converter
ConverterRegistry-->>ConversionProvider: ElementConverter
ConversionProvider->>ElementConverter: supports_delegation()?
alt delegating
ConversionProvider->>ElementConverter: convert element (per-element)
ElementConverter-->>ConversionProvider: converted element
ConversionProvider->>CppAPI: assemble container & call C++ API
CppAPI-->>ConversionProvider: returns C++ container
ConversionProvider->>ElementConverter: convert output element back to Python
ElementConverter-->>ConversionProvider: Python element
ConversionProvider-->>Python: return Python container
else non-delegating
ConversionProvider->>CppAPI: use standard container conversion path
CppAPI-->>ConversionProvider: returns C++ container
ConversionProvider-->>Python: return Python container
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @timosachsenberg. * #243 (comment) The following files were modified: * `autowrap/ConversionProvider.py`
|
Thanks a lot Timo!! autowrap/autowrap/CodeGenerator.py Line 2211 in 6236c3d
One could probably try to change that but I think it is a global change so you wouldn't be able to change it for some functions but not others. Probably not so important. This PR here looks ok I think but I wonder if it works for further nesting and if not if it's worth to support further nesting. Deep nesting in autowrap is in general very unstable and untested. Especially when mixing containers 😅 |
|
I would not support further nesting. |
Delegation only works for single-level containers (e.g., vector<T>). Nested containers like vector<vector<T>> fall through to Cython's automatic conversion, bypassing the delegating converter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @autowrap/ConversionProvider.py:
- Around line 1915-1986: The delegation input-conversion branch inconsistently
pushes items into $temp_var: when elem_code has content it uses a direct cast
"<$inner>$item" but when elem_code is empty it uses $elem_call_as; change the
first branch to push $elem_call_as instead so both branches use the
element_converter.input_conversion result consistently (inspect variables
element_converter, elem_code, elem_call_as, $temp_var, $inner and $argument_var
and replace the push_back call in the elem_code.present branch to use
$elem_call_as).
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
autowrap/ConversionProvider.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: test (==3.2.0, 3.12)
- GitHub Check: test (==3.1.0, 3.13)
- GitHub Check: test (==3.2.0, 3.13)
- GitHub Check: test (==3.2.0, 3.10)
- GitHub Check: test (==3.1.0, 3.11)
- GitHub Check: test (==3.1.0, 3.10)
- GitHub Check: test (==3.2.0, 3.11)
- GitHub Check: test (==3.1.0, 3.12)
🔇 Additional comments (4)
autowrap/ConversionProvider.py (4)
137-151: LGTM! Clear API design with well-documented limitations.The
supports_delegation()method provides a clean opt-in mechanism for converters to enable element-wise delegation inside containers. The docstring clearly documents the single-level nesting limitation, which aligns with the PR objectives.
2549-2550: LGTM! Delegation enabled for UTF-8 string converters.Both input and output UTF-8 string converters now correctly opt into delegation, enabling proper UTF-8 encoding/decoding for strings inside containers.
Also applies to: 2595-2596
2084-2111: LGTM! Output delegation logic is well-implemented.The output conversion correctly:
- Detects delegating converters using
_has_delegating_converter()- Retrieves element output conversion via
element_converter.output_conversion()- Handles different return types (None, Code object, or string)
- Properly iterates through the vector and applies per-element conversion
The implementation is consistent with the pattern used for other wrapped types in the same method.
1687-1695: Good defensive implementation of delegation detection.The
_has_delegating_converter()helper properly:
- Checks for converter registry availability via
hasattr(self, "cr")- Handles exceptions gracefully with try/except
- Returns False as safe default when delegation is not supported
This defensive approach prevents errors when the method is called in contexts where the converter registry might not be available.
Extended delegation support to: - StdSetConverter (ordered set) - StdMapConverter (ordered map) - StdUnorderedSetConverter (hash set) - StdUnorderedMapConverter (hash map) For maps, both keys and values can independently use delegating converters (e.g., libcpp_utf8_output_string for UTF-8 decoding). Note: Same single-level nesting limitation applies - nested containers like map<string, vector<libcpp_utf8_string>> won't delegate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @autowrap/ConversionProvider.py:
- Around line 1286-1294: The delegation check helpers (e.g.,
_has_delegating_converter) are checking self.cr but _set_converter_registry
currently assigns self.converters, causing delegation to silently fail; fix by
making _set_converter_registry assign self.cr = r (in addition to or instead of
self.converters) so methods like _has_delegating_converter, the other delegation
helpers referenced in the comment, and any calls to self.cr.get(...) find the
registry; ensure you update _set_converter_registry (the method that sets the
registry) and verify no other places rely exclusively on self.converters without
having self.cr set.
🧹 Nitpick comments (1)
autowrap/ConversionProvider.py (1)
1417-1484: LGTM! Input delegation logic correctly handles element conversion and cleanup.The input conversion delegation for sets and vectors properly:
- Invokes element converters' input_conversion
- Handles both Code objects and simple conversions
- Implements cleanup for reference parameters to reflect changes back to Python
The logic for handling
elem_code.content(line 2109) correctly distinguishes between converters that need additional variable declarations versus those that don't.Optional: Consider adding delegation support to StdDequeConverter and StdListConverter.
Currently, only StdSetConverter, StdVectorConverter, StdUnorderedSetConverter, StdMapConverter, and StdUnorderedMapConverter support delegation. For consistency and completeness, you might want to add delegation support to:
StdDequeConverter(lines 3548-3709)StdListConverter(lines 3712-3889)This would allow UTF-8 string vectors to work with all sequence container types, not just vectors and sets.
Also applies to: 2089-2160, 3369-3436
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
autowrap/ConversionProvider.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: test (==3.1.0, 3.13)
- GitHub Check: test (==3.2.0, 3.10)
- GitHub Check: test (==3.1.0, 3.11)
- GitHub Check: test (==3.2.0, 3.12)
- GitHub Check: test (==3.2.0, 3.11)
- GitHub Check: test (==3.2.0, 3.13)
- GitHub Check: test (==3.1.0, 3.10)
- GitHub Check: test (==3.1.0, 3.12)
🔇 Additional comments (3)
autowrap/ConversionProvider.py (3)
137-151: LGTM! Well-documented delegation interface.The
supports_delegation()method provides a clean extension point for converters. The documentation clearly explains the single-level nesting limitation, which is important for users to understand.
2723-2724: LGTM! String converters correctly enable delegation.The overrides in both
StdStringUnicodeConverterandStdStringUnicodeOutputConverterproperly enable delegation support, allowing UTF-8 string handling inside containers.Also applies to: 2769-2770
1217-1271: LGTM! Consistent delegation implementation across containers.The output conversion delegation logic is consistently implemented across all container converters:
- Checks for delegating converters on element types
- Properly handles None, Code objects, and string returns from element converters
- Builds appropriate conversion code for each container type
The implementation correctly supports delegation for both keys and values in maps, which is well thought out.
Also applies to: 1551-1576, 2258-2285, 3163-3218, 3507-3531
Tests cover all container types with UTF-8 string delegation: - std::vector (input and output) - std::set (input and output) - std::map (UTF-8 values and UTF-8 keys) - std::unordered_set (input and output) - std::unordered_map (input and output) Also fixes StdUnorderedMapConverter input_conversion to properly delegate to element converters for keys and values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/test_code_generator.py (2)
401-450: Solid test coverage for UTF-8 vector delegation.The test comprehensively covers output conversion (C++ → Python), input conversion (Python → C++), and multiple input types (str/bytes). However, consider two minor improvements:
Python version consistency: Line 426 uses an f-string (Python 3.6+), while the file includes
__future__imports suggesting potential Python 2 support. The existingtest_automatic_output_string_conversion(lines 388-390) explicitly handlesstrvsunicodebased on Python version. If Python 2 support is no longer needed, the__future__imports could be removed for clarity.Edge cases: Consider testing empty containers and potentially malformed UTF-8 sequences to ensure delegation handles these gracefully.
Optional: Add Python version handling for consistency
If Python 2 support is needed:
+ expected_type = str + if sys.version_info[0] < 3: + expected_type = unicode + # All elements should be unicode strings (str in Python 3) for s in greetings: - assert isinstance(s, str), f"Expected str, got {type(s)}" + assert isinstance(s, expected_type), "Expected %s, got %s" % (expected_type.__name__, type(s).__name__)Apply similar pattern to line 438.
452-534: Excellent comprehensive coverage of container delegation.This test thoroughly validates UTF-8 delegation across all container types (vector, set, map, unordered variants) with diverse UTF-8 content (Russian, Chinese, Arabic, Portuguese). The test structure clearly demonstrates the PR's core functionality: UTF-8 string converters now properly operate when used as element converters inside containers.
Note: The static analysis RUF001 warnings (lines 516, 527) about "ambiguous" Cyrillic characters are false positives—these are intentional UTF-8 test fixtures and should be ignored.
Optional enhancement: Consider adding tests for empty containers to ensure delegation handles edge cases:
# Empty container tests empty_vec = h.echo_vector([]) assert empty_vec == [] empty_set = h.echo_set(set()) assert empty_set == set() empty_map = h.echo_map({}) assert empty_map == {}
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tests/test_code_generator.pytests/test_files/libcpp_utf8_containers_test.pxd
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_files/libcpp_utf8_containers_test.pxd
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_code_generator.py (2)
tests/test_files/libcpp_utf8_string_vector_test.hpp (1)
Utf8VectorTest(6-6)tests/test_files/libcpp_utf8_containers_test.hpp (1)
Utf8ContainersTest(10-10)
🪛 Ruff (0.14.10)
tests/test_code_generator.py
516-516: String contains ambiguous Н (CYRILLIC CAPITAL LETTER EN). Did you mean H (LATIN CAPITAL LETTER H)?
(RUF001)
516-516: String contains ambiguous а (CYRILLIC SMALL LETTER A). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
516-516: String contains ambiguous б (CYRILLIC SMALL LETTER BE). Did you mean 6 (DIGIT SIX)?
(RUF001)
516-516: String contains ambiguous о (CYRILLIC SMALL LETTER O). Did you mean o (LATIN SMALL LETTER O)?
(RUF001)
516-516: String contains ambiguous р (CYRILLIC SMALL LETTER ER). Did you mean p (LATIN SMALL LETTER P)?
(RUF001)
527-527: String contains ambiguous З (CYRILLIC CAPITAL LETTER ZE). Did you mean 3 (DIGIT THREE)?
(RUF001)
527-527: String contains ambiguous а (CYRILLIC SMALL LETTER A). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
527-527: String contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?
(RUF001)
527-527: String contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?
(RUF001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: test (==3.1.0, 3.13)
- GitHub Check: test (==3.1.0, 3.12)
- GitHub Check: test (==3.2.0, 3.13)
- GitHub Check: test (==3.1.0, 3.10)
- GitHub Check: test (==3.2.0, 3.12)
- GitHub Check: test (==3.2.0, 3.10)
- GitHub Check: test (==3.2.0, 3.11)
- GitHub Check: test (==3.1.0, 3.11)
Change libcpp_map declarations to use libcpp_utf8_output_string for keys, enabling direct string access without bytes prefix, matching unordered_map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_code_generator.py (1)
401-450: Consider Python 2/3 compatibility consistency.The new test uses f-strings (lines 426, 438) and directly checks for
strtype, which differs from the existing pattern in this file. The existingtest_automatic_output_string_conversion()(lines 389-390) explicitly handles Python 2/3 differences:expected_type = str if sys.version_info[0] < 3: expected_type = unicodeIf Python 2 support has been dropped project-wide, this is fine. However, the file still includes Python 2/3 compatibility imports (
from __future__ import print_function), creating inconsistency. If Python 2 is no longer supported, consider removing these imports file-wide for clarity.♻️ Optional: Align with existing pattern if Python 2 still supported
# All elements should be unicode strings (str in Python 3) + expected_type = str + if sys.version_info[0] < 3: + expected_type = unicode for s in greetings: - assert isinstance(s, str), f"Expected str, got {type(s)}" + assert isinstance(s, expected_type), "Expected %s, got %s" % (expected_type, type(s)) assert "Hello" in greetingsApply similar changes to line 438.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tests/test_code_generator.pytests/test_files/libcpp_utf8_containers_test.pxd
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_files/libcpp_utf8_containers_test.pxd
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_code_generator.py (2)
tests/test_files/libcpp_utf8_string_vector_test.hpp (1)
Utf8VectorTest(6-6)tests/test_files/libcpp_utf8_containers_test.hpp (1)
Utf8ContainersTest(10-10)
🪛 Ruff (0.14.10)
tests/test_code_generator.py
517-517: String contains ambiguous Н (CYRILLIC CAPITAL LETTER EN). Did you mean H (LATIN CAPITAL LETTER H)?
(RUF001)
517-517: String contains ambiguous а (CYRILLIC SMALL LETTER A). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
517-517: String contains ambiguous б (CYRILLIC SMALL LETTER BE). Did you mean 6 (DIGIT SIX)?
(RUF001)
517-517: String contains ambiguous о (CYRILLIC SMALL LETTER O). Did you mean o (LATIN SMALL LETTER O)?
(RUF001)
517-517: String contains ambiguous р (CYRILLIC SMALL LETTER ER). Did you mean p (LATIN SMALL LETTER P)?
(RUF001)
528-528: String contains ambiguous З (CYRILLIC CAPITAL LETTER ZE). Did you mean 3 (DIGIT THREE)?
(RUF001)
528-528: String contains ambiguous а (CYRILLIC SMALL LETTER A). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
528-528: String contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?
(RUF001)
528-528: String contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?
(RUF001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: test (==3.1.0, 3.10)
- GitHub Check: test (==3.2.0, 3.13)
- GitHub Check: test (==3.1.0, 3.12)
- GitHub Check: test (==3.2.0, 3.12)
- GitHub Check: test (==3.1.0, 3.13)
- GitHub Check: test (==3.2.0, 3.11)
- GitHub Check: test (==3.2.0, 3.10)
- GitHub Check: test (==3.1.0, 3.11)
🔇 Additional comments (1)
tests/test_code_generator.py (1)
452-535: Comprehensive UTF-8 delegation test coverage.This test thoroughly validates the delegation mechanism across all container types (vector, set, map, unordered variants) with UTF-8 strings. The test data includes multiple language scripts (Russian, Chinese, Arabic, Portuguese), providing strong validation of the encoding/decoding paths.
Note: The Ruff warnings about ambiguous Cyrillic characters (lines 517, 528) are false positives—these are intentional UTF-8 test strings that should use Cyrillic characters, not Latin look-alikes.
This change allows conversion providers (like StdStringUnicodeConverter) to work inside containers (like std::vector) by adding a delegation mechanism.
Changes:
This fixes the limitation where UTF-8 string converters would not be picked up when used inside containers. Now, vectors of libcpp_utf8_string or libcpp_utf8_output_string properly encode/decode UTF-8 strings.
Summary by CodeRabbit
New Features
Tests
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.