Skip to content

Conversation

@sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Aug 9, 2025

Description

Note

This PR depends on #186.

This PR

  • Adds custom primitive Spider Python types.
  • Add TDL types and their string representations.
  • Adds converting Python types to TDL types.
  • Adds tests for type convertion.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Unit tests for type conversion pass.
  • GitHub workflows pass.

Summary by CodeRabbit

  • New Features

    • Package root exports numeric wrappers (Int8/16/32/64, Float, Double) for simpler imports.
    • New TDL type system and converter producing textual TDL strings for primitives, classes, lists and maps.
    • Utility to derive fully qualified class names for TDL conversion.
  • Bug Fixes / Validation

    • Runtime bounds checking for bounded integers and clearer errors for unsupported map keys/types.
  • Tests

    • Added comprehensive tests for conversions, nesting, formatting and error cases.
  • Chores

    • Updated linting configuration for docstring and typing rules.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First batch of comments. Still need to read sth about Python's type system to review the rest.

Comment on lines 15 to 16
msg = f"Bounded integer value ({value}) must be between {min_val} and {max_val}"
raise ValueError(msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need msg variable, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ruff complains about directly using f-string or even a string literal inside an exception constructor.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I haven't finished my review yet. A few comments before I go off for today.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
python/spider-py/src/spider_py/type/type.py (2)

5-5: Prefer typing.Self for constructors; remove cast noise

Using Self makes the return types precise and eliminates the need for cast(...). This also addresses the prior ask to annotate “as the class itself”.

Proposed changes:

-from typing import cast
+from typing import Self
-    def __new__(cls, value: int, bits: int = 32) -> BoundedInt:
+    def __new__(cls, value: int, bits: int = 32) -> Self:
 class Int8(BoundedInt):
@@
-    def __new__(cls, value: int) -> Int8:
+    def __new__(cls, value: int) -> Self:
         """Creates an int8 integer."""
-        return cast("Int8", super().__new__(cls, value, bits=8))
+        return super().__new__(cls, value, bits=8)
 class Int16(BoundedInt):
@@
-    def __new__(cls, value: int) -> Int16:
+    def __new__(cls, value: int) -> Self:
         """Creates an int16 integer."""
-        return cast("Int16", super().__new__(cls, value, bits=16))
+        return super().__new__(cls, value, bits=16)
 class Int32(BoundedInt):
@@
-    def __new__(cls, value: int) -> Int32:
+    def __new__(cls, value: int) -> Self:
         """Creates an int32 integer."""
-        return cast("Int32", super().__new__(cls, value, bits=32))
+        return super().__new__(cls, value, bits=32)
 class Int64(BoundedInt):
@@
-    def __new__(cls, value: int) -> Int64:
+    def __new__(cls, value: int) -> Self:
         """Creates an int64 integer."""
-        return cast("Int64", super().__new__(cls, value, bits=64))
+        return super().__new__(cls, value, bits=64)
 class Float(float):
@@
-    def __new__(cls, value: float) -> Float:
+    def __new__(cls, value: float) -> Self:
         """Creates a float number."""
         return super().__new__(cls, value)
 class Double(float):
@@
-    def __new__(cls, value: float) -> Double:
+    def __new__(cls, value: float) -> Self:
         """Creates a double number."""
         return super().__new__(cls, value)

Before adopting Self, please confirm the supported Python versions. If Python < 3.11 is still supported, use typing_extensions.Self instead.

Run this to verify declared support in the repo:

#!/bin/bash
# Inspect python version declarations
set -euo pipefail
fd -a -H 'pyproject.toml' -E '.venv' -E 'node_modules' | xargs -r sed -n '1,200p' | rg -n 'requires-python|python_requires|Programming Language :: Python'
fd -a -H 'setup.cfg' 'setup.py' -E '.venv' -E 'node_modules' | xargs -r sed -n '1,200p' | rg -n 'python_requires|Programming Language :: Python'

Also applies to: 11-11, 30-33, 38-41, 46-49, 54-57, 62-65, 70-73


16-23: Acknowledgement: using a local msg variable is fine here

Given the linter rule against passing string literals directly to exception constructors (e.g., flake8-errmsg), the explicit msg variable is appropriate. No change needed.

🧹 Nitpick comments (3)
python/spider-py/src/spider_py/type/type.py (3)

20-22: Use comparison chaining; aligns with Ruff PLR1716 and reads cleaner

Chained boolean comparison can be simplified without changing semantics.

-        if not (min_val <= value and value <= max_val):
+        if not (min_val <= value <= max_val):
             msg = f"Bounded integer value ({value}) must be between {min_val} and {max_val}"
             raise ValueError(msg)

28-29: Docstring wording: use “8-bit/16-bit/32-bit/64-bit”

Minor wording polish for consistency with common usage.

-    """8 bits integer type."""
+    """8-bit integer type."""
-    """16 bits integer type."""
+    """16-bit integer type."""
-    """32 bits integer type."""
+    """32-bit integer type."""
-    """64 bits integer type."""
+    """64-bit integer type."""

Also applies to: 36-37, 44-45, 52-53


8-24: Optional: explicitly reject bool to avoid surprising construction

Because bool is a subclass of int, Int8(True) would silently yield 1. If you want stricter semantics, explicitly disallow bool (or coerce explicitly).

Example change (if stricter behaviour desired):

 def __new__(cls, value: int, bits: int = 32) -> Self:
     """Creates a bounded integer."""
+        if isinstance(value, bool):
+            msg = "Boolean values are not accepted; use an explicit 0 or 1."
+            raise TypeError(msg)

If the current implicit coercion is intentional to ease user ergonomics, feel free to skip this.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f3e41f9 and c9cb4b6.

📒 Files selected for processing (2)
  • python/spider-py/src/spider_py/type/tdl_convert.py (1 hunks)
  • python/spider-py/src/spider_py/type/type.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • python/spider-py/src/spider_py/type/tdl_convert.py
🧰 Additional context used
🪛 Ruff (0.12.2)
python/spider-py/src/spider_py/type/type.py

20-20: Contains chained boolean comparison that can be simplified

Use a single compare expression

(PLR1716)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally lgtm. Most of the comments are just about the style. The comments I put may also be applicable to other PRs, please check. Thx!

Comment on lines 59 to 61
if native_type in (int, float, str, complex, bytes):
msg = f"{native_type} is not a valid TDL type."
raise TypeError(msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for explicitly listing all these types? Seems like we're also missing some primitive types, for example, bool.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool is the only native Python primitive type that we support. For int and float, we force user to use our own sized type. For str, we use List<Int8>.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's not too obvious. I'd say this check isn't necessary, considering we already check all the types we support.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is necessary. If we don't check for them, the code will reach line 91 and returns a primitive type, against the requirement to throw a TypeError for unsupported types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Then I think this part of code should be refactored a little bit otherwise it's confusing. Can we move this check and throw into _to_primitive_tdl_type and make the error message more descriptive, sth like {native_type} is not a TDL type. Please use the corresponding TDL primitive type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and pushed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
python/spider-py/src/spider_py/type/type.py (1)

11-18: Validate bits before computing bounds to prevent negative shift (previously flagged).

min_val/max_val are computed before validating bits. Passing bits=0 (or any unsupported value) will raise a “negative shift count” before your intended ValueError. Move the validation first.

Apply this diff:

     def __new__(cls, value: int, bits: int = 32) -> BoundedInt:
         """Creates a bounded integer."""
-        min_val = -(1 << (bits - 1))
-        max_val = (1 << (bits - 1)) - 1
-
-        if bits not in (8, 16, 32, 64):
+        if bits not in (8, 16, 32, 64):
             msg = f"Unsupported bits size: {bits}. Supported sizes are 8, 16, 32, or 64."
             raise ValueError(msg)
+        min_val = -(1 << (bits - 1))
+        max_val = (1 << (bits - 1)) - 1
🧹 Nitpick comments (5)
python/spider-py/src/spider_py/type/type.py (5)

11-24: Optionally enforce integer-only inputs (reject float truncation and bools).

int.__new__ will silently truncate floats (e.g., 3.7 → 3) and accepts bool since it’s a subclass of int. If you want strictness, validate the type before range checks.

Example patch:

     def __new__(cls, value: int, bits: int = 32) -> BoundedInt:
         """Creates a bounded integer."""
+        # Enforce integer-only values (exclude bool) to avoid silent truncation from float.
+        if not isinstance(value, int) or isinstance(value, bool):
+            msg = f"BoundedInt requires an int (excluding bool); got {type(value).__name__}."
+            raise TypeError(msg)

16-18: Minor: extract supported bit sizes into a constant for reuse/readability.

Keeps the sentinel in one place and self-documents intent.

-from typing import cast
+from typing import cast
+
+SUPPORTED_BIT_SIZES: tuple[int, ...] = (8, 16, 32, 64)
@@
-        if bits not in (8, 16, 32, 64):
+        if bits not in SUPPORTED_BIT_SIZES:
             msg = f"Unsupported bits size: {bits}. Supported sizes are 8, 16, 32, or 64."
             raise ValueError(msg)

27-33: Docstring grammar: use “8-bit/16-bit/32-bit/64-bit” phrasing.

Purely textual polish; improves clarity.

 class Int8(BoundedInt):
-    """8 bits integer type."""
+    """8-bit integer type."""
@@
-        """Creates an int8 integer."""
+        """Creates an 8-bit integer."""
@@
 class Int16(BoundedInt):
-    """16 bits integer type."""
+    """16-bit integer type."""
@@
-        """Creates an int16 integer."""
+        """Creates a 16-bit integer."""
@@
 class Int32(BoundedInt):
-    """32 bits integer type."""
+    """32-bit integer type."""
@@
-        """Creates an int32 integer."""
+        """Creates a 32-bit integer."""
@@
 class Int64(BoundedInt):
-    """64 bits integer type."""
+    """64-bit integer type."""
@@
-        """Creates an int64 integer."""
+        """Creates a 64-bit integer."""

Also applies to: 35-41, 43-49, 51-57


27-57: Optional: drop casts by annotating base new with Self (if Python ≥3.11 or via typing_extensions).

Using Self in BoundedInt.__new__ lets subclass __new__ return the right type without typing.cast. Keep if your current type-checker config prefers the explicit casts.

Illustrative change:

-from typing import cast
+from typing import cast
+try:  # Python 3.11+
+    from typing import Self
+except Exception:  # pragma: no cover
+    from typing_extensions import Self  # type: ignore
@@
-    def __new__(cls, value: int, bits: int = 32) -> BoundedInt:
+    def __new__(cls, value: int, bits: int = 32) -> Self:
         """Creates a bounded integer."""
@@
-        return super().__new__(cls, value)
+        return super().__new__(cls, value)

Then each subclass can simply: return super().__new__(cls, value, bits=8) (casts no longer needed).


59-73: Clarify Float/Double intent in docstrings (optional).

Python’s float is already double-precision; Double exists to model TDL, not a different runtime width. A one-liner note helps future readers.

 class Float(float):
-    """Float type."""
+    """Float type (mapped to TDL Float)."""
@@
 class Double(float):
-    """Double type."""
+    """Double type (aliasing Python float; mapped to TDL Double)."""
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 412a2b5 and bab4806.

📒 Files selected for processing (3)
  • python/spider-py/src/spider_py/type/tdl_convert.py (1 hunks)
  • python/spider-py/src/spider_py/type/tdl_type.py (1 hunks)
  • python/spider-py/src/spider_py/type/type.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • python/spider-py/src/spider_py/type/tdl_convert.py
  • python/spider-py/src/spider_py/type/tdl_type.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: lint
🔇 Additional comments (1)
python/spider-py/src/spider_py/type/type.py (1)

20-22: Bounds check is correct and inclusive.

The condition is explicit and readable; error message is consistent and punctuated.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
python/spider-py/src/spider_py/type/tdl_convert.py (1)

65-86: GenericAlias-only handling misses typing.List/typing.Dict and can raise TypeError downstream

Relying on isinstance(native_type, GenericAlias) excludes typing.List[T]/typing.Dict[K, V] (get_origin(...) works for both PEP 585 and typing.* generics). When a typing.* generic slips past this branch, issubclass(native_type, Collection) below will raise TypeError since first arg is not a class. Also, list arity check accepts extra args silently; enforce exactly one for consistency with dict.

Apply this refactor to handle both forms and harden arity/error cases:

@@
-from collections.abc import Collection
+from collections.abc import Collection
+import inspect
@@
-    if native_type in (int, float, str, complex, bytes):
+    if native_type in (int, float, str, complex, bytes):
         msg = f"{native_type} is not a valid TDL type."
         raise TypeError(msg)
 
-    if isinstance(native_type, GenericAlias):
-        origin = get_origin(native_type)
-        if origin is list:
-            args = get_args(native_type)
-            if len(args) == 0:
-                msg = "List does not have an element type."
-                raise TypeError(msg)
-            arg = args[0]
-            return ListType(to_tdl_type(arg))
-
-        if origin is dict:
-            args = get_args(native_type)
-            msg = "Dict does not have a key/value type."
-            if len(args) != 2:  # noqa: PLR2004
-                raise TypeError(msg)
-            key = args[0]
-            value = args[1]
-            return MapType(to_tdl_type(key), to_tdl_type(value))
-
-        msg = f"{native_type} is not a valid TDL type."
-        raise TypeError(msg)
+    origin = get_origin(native_type)
+    if origin is not None:
+        if origin is list:
+            args = get_args(native_type)
+            if len(args) != 1:
+                raise TypeError("List must have exactly one element type.")
+            return ListType(to_tdl_type(args[0]))
+        if origin is dict:
+            args = get_args(native_type)
+            if len(args) != 2:
+                raise TypeError("Dict must have exactly two type arguments: key and value.")
+            key, value = args
+            return MapType(to_tdl_type(key), to_tdl_type(value))
+        raise TypeError(f"{native_type} is not a valid TDL type.")
 
-    if issubclass(native_type, Collection):
-        msg = f"{native_type} is not a valid TDL type."
-        raise TypeError(msg)
+    # Reject bare container classes like list/dict/set/tuple.
+    if inspect.isclass(native_type) and issubclass(native_type, Collection):
+        raise TypeError(f"{native_type} is not a valid TDL type.")
 
-    return ClassType(get_class_name(native_type))
+    if inspect.isclass(native_type):
+        return ClassType(get_class_name(native_type))
+    raise TypeError(f"{native_type} is not a valid TDL type.")
🧹 Nitpick comments (5)
python/spider-py/src/spider_py/type/tdl_convert.py (4)

50-56: Docstring nit: use “raises” and clarify the contract

Switch to Sphinx-style “raises” and tighten wording.

@@
-    :return: The converted TDL type.
-    :raise: TypeError if `native_type` is not a valid TDL type.
+    :return: The converted TDL type.
+    :raises TypeError: If `native_type` is not a valid TDL type.

69-83: Tighten arity checks and messages for list/dict generics

Make the diagnostics precise and enforce exact arity for list as well.

-            if len(args) == 0:
-                msg = "List does not have an element type."
-                raise TypeError(msg)
+            if len(args) != 1:
+                raise TypeError("List must have exactly one element type.")
@@
-            msg = "Dict does not have a key/value type."
-            if len(args) != 2:  # noqa: PLR2004
-                raise TypeError(msg)
+            if len(args) != 2:  # noqa: PLR2004
+                raise TypeError("Dict must have exactly two type arguments: key and value.")

94-100: Docstring nit: use “raises” here too

Align with project style and earlier docstrings.

@@
-    :return: A string representation of the TDL type.
-    :raise: TypeError if `native_type` is not a valid TDL type.
+    :return: A string representation of the TDL type.
+    :raises TypeError: If `native_type` is not a valid TDL type.

50-100: Add tests for typing.List and typing.Dict to cover typing generics

We ran a search over python/spider-py/tests/type/test_to_tdl.py and found only PEP 585 syntax (list[...] / dict[...]) exercised—no occurrences of typing.List or typing.Dict. To ensure the code path for typing generics (outside the GenericAlias branch) remains correct, please add tests like the following in python/spider-py/tests/type/test_to_tdl.py:

+ from typing import List, Dict
+
+ class TestToTDL:
+     # existing tests...
+
+     def test_to_tdl_typing_list(self) -> None:
+         """Test converting typing.List generics to TDL Types."""
+         assert to_tdl_type_str(List[Int8]) == "List<int8>"
+         with pytest.raises(TypeError):
+             to_tdl_type_str(List[int])
+
+     def test_to_tdl_typing_dict(self) -> None:
+         """Test converting typing.Dict generics to TDL Types."""
+         assert to_tdl_type_str(Dict[Int8, Float]) == "Map<int8,float>"
+         with pytest.raises(TypeError):
+             to_tdl_type_str(Dict[Int8, int])

Let me know if you’d like me to draft these additions in a PR.

python/spider-py/src/spider_py/type/type.py (1)

20-24: Readable range check

Minor readability improvement using chained comparisons.

-        if not (lower_bound <= value and value <= upper_bound):
+        if not (lower_bound <= value <= upper_bound):
             msg = (
                 f"Bounded integer value ({value}) must be between {lower_bound} and {upper_bound}."
             )
             raise ValueError(msg)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bab4806 and bdf1d8f.

📒 Files selected for processing (3)
  • python/spider-py/src/spider_py/type/tdl_convert.py (1 hunks)
  • python/spider-py/src/spider_py/type/tdl_type.py (1 hunks)
  • python/spider-py/src/spider_py/type/type.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • python/spider-py/src/spider_py/type/tdl_type.py
🧰 Additional context used
🧬 Code graph analysis (1)
python/spider-py/src/spider_py/type/tdl_convert.py (3)
python/spider-py/src/spider_py/type/tdl_type.py (22)
  • BoolType (64-69)
  • ClassType (72-84)
  • DoubleType (16-21)
  • FloatType (24-29)
  • Int8Type (32-37)
  • Int16Type (40-45)
  • Int32Type (48-53)
  • Int64Type (56-61)
  • ListType (87-99)
  • MapType (126-144)
  • TdlType (8-13)
  • type_str (12-13)
  • type_str (20-21)
  • type_str (28-29)
  • type_str (36-37)
  • type_str (44-45)
  • type_str (52-53)
  • type_str (60-61)
  • type_str (68-69)
  • type_str (83-84)
  • type_str (98-99)
  • type_str (143-144)
python/spider-py/src/spider_py/type/type.py (6)
  • Double (69-74)
  • Float (61-66)
  • Int8 (29-34)
  • Int16 (37-42)
  • Int32 (45-50)
  • Int64 (53-58)
python/spider-py/src/spider_py/type/utils.py (1)
  • get_class_name (4-9)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: lint
🔇 Additional comments (3)
python/spider-py/src/spider_py/type/tdl_convert.py (1)

33-47: Primitive mappings look correct and intentional

Covers Bool and the custom-sized Int/Float wrappers cleanly; returns None for unsupported built-ins so the caller can decide. No issues here.

python/spider-py/src/spider_py/type/type.py (2)

11-16: Good fix: validate bits before computing bounds

This prevents negative shift issues and matches the prior feedback. Looks solid.


61-75: Float/Double wrappers are fine

Using distinct nominal types over float is acceptable for downstream mapping; no behavioural differences are implied, which aligns with usage in tdl_convert.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/spider-py/src/spider_py/type/tdl_convert.py (1)

50-101: Add missing tests for typing.List/typing.Dict generics and arity errors

The current suite verifies only PEP 585 generics (list[T]/dict[K, V]) but doesn’t cover the typing.List[T] and typing.Dict[K, V] aliases nor arity‐error scenarios. Please add the following tests to prevent regressions:

• Correct‐conversion tests for typing.List and typing.Dict
– typing.List[Int8] → "List<int8>"
– typing.List[typing.List[Int8]] → "List<List<int8>>"
– typing.Dict[Int8, Float] → "Map<int8,float>"
– typing.Dict[typing.List[Int8], Double] → "Map<List<int8>,double>"

• Negative tests for arity mismatches on both PEP 585 and typing aliases:
– No type arguments (e.g. typing.List or list without [T]) → raises TypeError("List must have exactly one element type.")
– Too many/few arguments (e.g. typing.List[Int8, Float], typing.Dict[Int8], dict[K], dict[K, V, X]) → raises TypeError("Dict must have exactly two type arguments.")"

♻️ Duplicate comments (2)
python/spider-py/src/spider_py/type/tdl_convert.py (2)

65-86: Handle typing generics with get_origin for both PEP 585 and typing. (prevents TypeError on typing.List/Dict)*

Current logic only triggers for types.GenericAlias, missing typing.List[...] / typing.Dict[...]. It also won’t validate list arity beyond “empty”. Use get_origin(...) unconditionally and validate arg counts precisely.

Apply this diff:

-    if isinstance(native_type, GenericAlias):
-        origin = get_origin(native_type)
-        if origin is list:
-            args = get_args(native_type)
-            if len(args) == 0:
-                msg = "List does not have an element type."
-                raise TypeError(msg)
-            arg = args[0]
-            return ListType(to_tdl_type(arg))
-
-        if origin is dict:
-            args = get_args(native_type)
-            msg = "Dict does not have a key/value type."
-            if len(args) != 2:  # noqa: PLR2004
-                raise TypeError(msg)
-            key = args[0]
-            value = args[1]
-            return MapType(to_tdl_type(key), to_tdl_type(value))
-
-        msg = f"{native_type} is not a valid TDL type."
-        raise TypeError(msg)
+    origin = get_origin(native_type)
+    if origin is not None:
+        if origin is list:
+            args = get_args(native_type)
+            if len(args) != 1:
+                raise TypeError("List must have exactly one element type.")
+            return ListType(to_tdl_type(args[0]))
+
+        if origin is dict:
+            args = get_args(native_type)
+            if len(args) != 2:  # noqa: PLR2004
+                raise TypeError("Dict must have exactly two type arguments: key and value.")
+            key, value = args
+            return MapType(to_tdl_type(key), to_tdl_type(value))
+
+        raise TypeError(f"{native_type} is not a valid TDL type.")

3-6: Guard issubclass(...) with inspect.isclass(...) to avoid runtime TypeError on typing constructs

issubclass(native_type, Collection) raises when native_type is a typing construct. Guard it and import inspect.

Apply this diff:

-from collections.abc import Collection
+import inspect
+from collections.abc import Collection
@@
-    if issubclass(native_type, Collection):
+    if inspect.isclass(native_type) and issubclass(native_type, Collection):
         msg = f"{native_type} is not a valid TDL type."
         raise TypeError(msg)

Also applies to: 87-89

🧹 Nitpick comments (2)
python/spider-py/src/spider_py/type/tdl_convert.py (2)

50-56: Optional: broaden the type annotation to reflect accepted inputs

to_tdl_type accepts typing constructs that aren’t instances of types.GenericAlias (e.g., typing.List[int]). Consider widening the annotation for readability (runtime isn’t impacted).

Apply this minimal change:

-from typing import get_args, get_origin
+from typing import Any, get_args, get_origin
@@
-def to_tdl_type(native_type: type | GenericAlias) -> TdlType:
+def to_tdl_type(native_type: type | GenericAlias | Any) -> TdlType:
@@
-def to_tdl_type_str(native_type: type | GenericAlias) -> str:
+def to_tdl_type_str(native_type: type | GenericAlias | Any) -> str:

If you prefer not to import Any, use type | object instead.


65-86: Error strings: make arity expectations explicit

While updating the generic handling, prefer messages that say “exactly one” / “exactly two” to help users fix inputs faster. The diff in my main comment applies this wording.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5602b11 and 0696cac.

📒 Files selected for processing (1)
  • python/spider-py/src/spider_py/type/tdl_convert.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
python/spider-py/src/spider_py/type/tdl_convert.py (3)
python/spider-py/src/spider_py/type/tdl_type.py (22)
  • BoolType (64-69)
  • ClassType (72-84)
  • DoubleType (16-21)
  • FloatType (24-29)
  • Int8Type (32-37)
  • Int16Type (40-45)
  • Int32Type (48-53)
  • Int64Type (56-61)
  • ListType (87-99)
  • MapType (126-144)
  • TdlType (8-13)
  • type_str (12-13)
  • type_str (20-21)
  • type_str (28-29)
  • type_str (36-37)
  • type_str (44-45)
  • type_str (52-53)
  • type_str (60-61)
  • type_str (68-69)
  • type_str (83-84)
  • type_str (98-99)
  • type_str (143-144)
python/spider-py/src/spider_py/type/type.py (6)
  • Double (69-74)
  • Float (61-66)
  • Int8 (29-34)
  • Int16 (37-42)
  • Int32 (45-50)
  • Int64 (53-58)
python/spider-py/src/spider_py/type/utils.py (1)
  • get_class_name (4-9)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
🔇 Additional comments (3)
python/spider-py/src/spider_py/type/tdl_convert.py (3)

61-64: Disallowing direct built-ins is consistent with the design

Explicitly rejecting int/float/str/complex/bytes matches the “use custom sized/int/float and List for text” policy noted in prior discussion. No change requested.


32-47: Primitive mappings look correct

Correctly maps Bool and the sized numeric wrappers to their TDL counterparts. Order avoids bool-as-int pitfalls by handling primitives before the built-in rejection.


94-101: API wrapper is clear and minimal

to_tdl_type_str delegates as expected; docstring is concise and aligned with tdl_type.TdlType.type_str().

LinZhihao-723
LinZhihao-723 previously approved these changes Aug 23, 2025
Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat(spider-py): Define TDL types; Add support for converting Python types into TDL types.

@sitaowang1998 sitaowang1998 changed the title feat: Add custom Spider types and converting Python types to TDL types. feat(spider-py): Define TDL types; Add support for converting Python types into TDL types. Aug 23, 2025
@sitaowang1998 sitaowang1998 merged commit 68805a6 into y-scope:main Aug 23, 2025
7 checks passed
@sitaowang1998 sitaowang1998 deleted the python_type branch August 23, 2025 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants