-
Notifications
You must be signed in to change notification settings - Fork 399
Add Python serialization examples #1205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @Sakeeb91, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Python SDK by introducing a robust, yet lightweight, serialization framework. The primary goal is to provide concrete, runnable examples within the documentation, improving clarity and usability for developers. This involves not only the creation of the serialization mechanism itself but also its integration into existing core components and the addition of comprehensive examples and tests to validate its functionality. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a valuable serialization capability to the Python SDK, complete with a lightweight implementation, documentation, and examples. The overall approach is solid, particularly the use of a Serializable mixin and lazy registration to avoid circular dependencies. My review highlights a few areas for improvement to enhance robustness and clarity. I've pointed out a significant limitation in the current serializer regarding the handling of object identity, which could cause issues with complex data structures. I've also noted an inconsistency in the Serializable helper class and a couple of confusing aspects in the new examples. Addressing these points will strengthen the implementation and improve the developer experience for users of this new feature.
| def _encode(cls, value: Any) -> Any: | ||
| if value is None or isinstance(value, (bool, int, float, str)): | ||
| return value | ||
|
|
||
| if isinstance(value, (list, tuple, set)): | ||
| return [cls._encode(item) for item in value] | ||
|
|
||
| if isinstance(value, dict): | ||
| return {str(key): cls._encode(item) for key, item in value.items()} | ||
|
|
||
| factory, name = cls._find_factory_for_instance(value) | ||
| plain = factory.to_plain(value) | ||
| encoded_plain = cls._encode(plain) | ||
| return { | ||
| "__serializer": True, | ||
| "__class": name, | ||
| "__value": encoded_plain, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This serialization implementation doesn't handle object identity. If the same object instance appears multiple times in the object graph, it will be serialized each time. Upon deserialization, this will result in multiple distinct object copies instead of a single shared instance. This can lead to subtle bugs and larger payloads. Consider implementing reference tracking (e.g., using object IDs and __ref fields, similar to the TypeScript implementation) to correctly handle graphs with shared objects or cycles.
docs/modules/serialization.mdx
Outdated
| from beeai_framework.backend import UserMessage | ||
| from beeai_framework.memory import UnconstrainedMemory | ||
|
|
||
| SERIALIZED_MEMORY = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a valid comment? @Sakeeb91
| instance = cls() # type: ignore[call-arg] | ||
| instance.load_snapshot(snapshot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of cls() with a type: ignore[call-arg] suggests a fragile assumption that all Serializable subclasses have a no-argument __init__. This is also inconsistent with the internal deserialization path which uses cls.__new__(cls) to create an instance without calling __init__. To make this method more robust and consistent, consider using __new__ here as well.
| instance = cls() # type: ignore[call-arg] | |
| instance.load_snapshot(snapshot) | |
| instance = cls.__new__(cls) | |
| instance.load_snapshot(snapshot) |
|
|
||
|
|
||
| def main() -> None: | ||
| memory = UnconstrainedMemory.from_serialized(SERIALIZED_MEMORY, extra_classes=[UserMessage]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SERIALIZED_MEMORY string contains both UserMessage and AssistantMessage instances, but extra_classes only lists UserMessage. While this example currently works because importing UserMessage from beeai_framework.backend triggers the registration of all message types in that module, it's misleading for a user trying to understand extra_classes. It incorrectly implies that only UserMessage needs to be specified. For clarity, consider either including AssistantMessage in extra_classes (and importing it) or simplifying SERIALIZED_MEMORY to only contain UserMessage instances.
Signed-off-by: Sakeeb91 <[email protected]>
|
Thanks for the thorough review! I pushed a follow-up that:
Let me know if you spot anything else! |
docs/modules/serialization.mdx
Outdated
| from beeai_framework.backend import UserMessage | ||
| from beeai_framework.memory import UnconstrainedMemory | ||
|
|
||
| SERIALIZED_MEMORY = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a valid comment? @Sakeeb91
| message_cls: type[Message[Any]], | ||
| allowed_content: Mapping[str, type[BaseModel]], | ||
| ) -> None: | ||
| def _registrar() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo I guess
|
Thank you for your hard work! I am thinking if there is a more Pythonic way of doing that (pickling). What do you think? |
|
@Tomas2D thanks for the thoughtful suggestion! I explored pickle early on, but stuck with this JSON serializer for a couple reasons: it keeps deserialization side-effect free (pickle executes arbitrary code), the snapshots need to round-trip through our JavaScript SDK, and JSON is a lot easier to diff/debug in docs and tooling. With the reference tracking and tighter registry in this patch we close the identity gaps the review pointed out while staying interoperable. |
Signed-off-by: Sakeeb91 <[email protected]>
|
I see your point. Thank you for the research. Before we merge this, can you verify that some basic examples work in both TS and Python so that one can use dump from TS and use it in Python, and vice versa? |
|
Verified the TS ↔︎ Python round-trip with runnable snippets:
Commands + outputs are in |
| def create_snapshot(self) -> T: | ||
| raise NotImplementedError | ||
|
|
||
| def load_snapshot(self, snapshot: T) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this should be:
- abstract method
- Awaitable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated both create_snapshot and load_snapshot to be abstract methods with Awaitable return types to match the TypeScript implementation:
@AbstractMethod
def create_snapshot(self) -> T | Awaitable[T]:
raise NotImplementedError
@AbstractMethod
def load_snapshot(self, snapshot: T) -> None | Awaitable[None]:
raise NotImplementedError
The Serializable class now inherits from ABC, and both methods use the @AbstractMethod decorator. This ensures type safety and aligns with the TypeScript interface at serializable.ts:25-26 where both methods can return Promise types.
…able Address PR review feedback: - Make create_snapshot and load_snapshot abstract methods using ABC - Add Awaitable return types to align with TypeScript implementation - Both methods can now return sync or async results - Remove to_snapshot method (not present in TypeScript) Relates to: i-am-bee#1205 (comment) Relates to: i-am-bee#1205 (comment) Signed-off-by: Sakeeb91 <[email protected]>
Break long JSON strings into multiple lines to comply with linting rules Signed-off-by: Sakeeb91 <[email protected]>
Tomas2D
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work so far! Thank you!
| def _decode_builtin(cls, class_name: str, raw: Mapping[str, Any], seen: dict[str, Any]) -> Any: | ||
| payload = raw.get("__value") | ||
| ref_id = raw.get("__ref") | ||
|
|
||
| def remember(value: Any) -> Any: | ||
| if ref_id is not None: | ||
| seen[ref_id] = value | ||
| return value | ||
|
|
||
| if class_name in {"Undefined", "Null"}: | ||
| return remember(None) | ||
| if class_name == "Number": | ||
| if payload is None: | ||
| return remember(0) | ||
| if isinstance(payload, str): | ||
| number = float(payload) | ||
| if number.is_integer(): | ||
| return remember(int(number)) | ||
| return remember(number) | ||
| if isinstance(payload, int | float): | ||
| return remember(payload) | ||
| raise SerializerError("Encountered malformed payload for JavaScript Number.") | ||
| if class_name == "Boolean": | ||
| if isinstance(payload, bool): | ||
| return remember(payload) | ||
| if isinstance(payload, str): | ||
| return remember(payload.lower() in {"true", "1"}) | ||
| return remember(bool(payload)) | ||
| if class_name == "String": | ||
| if payload is None: | ||
| return remember("") | ||
| return remember(str(payload)) | ||
| if class_name == "BigInt": | ||
| if payload is None: | ||
| return remember(0) | ||
| return remember(int(payload)) | ||
| if class_name == "Date": | ||
| if not isinstance(payload, str): | ||
| raise SerializerError("Encountered malformed payload for JavaScript Date.") | ||
| text = payload.replace("Z", "+00:00") if payload.endswith("Z") else payload | ||
| return remember(datetime.fromisoformat(text)) | ||
| if class_name in {"Array", "Set"}: | ||
| container: list[Any] | set[Any] | ||
| if payload is None: | ||
| items: list[Any] = [] | ||
| elif isinstance(payload, list): | ||
| items = payload | ||
| else: | ||
| raise SerializerError("Encountered malformed payload for JavaScript Array or Set.") | ||
| container = set() if class_name == "Set" else [] | ||
| remember(container) | ||
| for item in items: | ||
| decoded = cls._decode(item, seen) | ||
| if isinstance(container, list): | ||
| container.append(decoded) | ||
| else: | ||
| container.add(decoded) | ||
| return container | ||
| if class_name == "Map": | ||
| mapping: dict[Any, Any] = {} | ||
| remember(mapping) | ||
| if payload is None: | ||
| return mapping | ||
| if not isinstance(payload, list): | ||
| raise SerializerError("Encountered malformed payload for JavaScript Map.") | ||
| for pair in payload: | ||
| if not isinstance(pair, Iterable): | ||
| raise SerializerError("Encountered malformed payload for JavaScript Map.") | ||
| try: | ||
| key_raw, value_raw = pair | ||
| except ValueError as exc: | ||
| raise SerializerError("Encountered malformed payload for JavaScript Map.") from exc | ||
| key = cls._decode(key_raw, seen) | ||
| value = cls._decode(value_raw, seen) | ||
| mapping[key] = value | ||
| return mapping | ||
| if class_name == "Object": | ||
| result: dict[str, Any] = {} | ||
| if not isinstance(payload, Mapping): | ||
| raise SerializerError("Encountered malformed payload for JavaScript Object.") | ||
| remember(result) | ||
| for key, item in payload.items(): | ||
| result[key] = cls._decode(item, seen) | ||
| return result | ||
|
|
||
| return cls._NOT_HANDLED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Built-ins should be also registered as every other data types as it is done in TypeScript. No messy if statements.
| def _register_message_class( | ||
| message_cls: type[Message[Any]], | ||
| allowed_content: Mapping[str, type[BaseModel]], | ||
| ) -> None: | ||
| def _registrar() -> None: | ||
| from beeai_framework.serialization import Serializer, SerializerError | ||
|
|
||
| def _to_plain(message: Message[Any]) -> dict[str, Any]: | ||
| return { | ||
| "id": message.id, | ||
| "meta": dict(message.meta), | ||
| "role": str(message.role), | ||
| "content": [fragment.model_dump() for fragment in message.content], | ||
| } | ||
|
|
||
| def _from_plain(payload: Mapping[str, Any]) -> Message[Any]: | ||
| content_payload = payload.get("content", []) or [] | ||
| content_models: list[Any] = [] | ||
|
|
||
| for item in content_payload: | ||
| if not isinstance(item, Mapping): | ||
| raise SerializerError("Serialized message content must be a mapping.") | ||
|
|
||
| content_type = item.get("type") | ||
| if not isinstance(content_type, str): | ||
| raise SerializerError( | ||
| f"Serialized message content is missing or has an invalid 'type': {content_type!r}", | ||
| ) | ||
|
|
||
| model_cls = allowed_content.get(content_type) | ||
| if model_cls is None: | ||
| raise SerializerError( | ||
| f"Unsupported message content '{content_type}' for {message_cls.__name__}.", | ||
| ) | ||
| content_models.append(model_cls.model_validate(dict(item))) | ||
|
|
||
| meta = payload.get("meta") or {} | ||
| meta_copy = dict(meta) | ||
| return message_cls(content_models, meta=meta_copy, id=payload.get("id")) | ||
|
|
||
| Serializer.register( | ||
| message_cls, | ||
| to_plain=_to_plain, | ||
| from_plain=_from_plain, | ||
| ) | ||
|
|
||
| _registrar() | ||
|
|
||
| def _register_method(cls: type[Message[Any]], *, aliases: Iterable[str] | None = None) -> None: | ||
| _registrar() | ||
|
|
||
| type.__setattr__(message_cls, "register", classmethod(_register_method)) | ||
|
|
||
|
|
||
| _register_message_class( | ||
| SystemMessage, | ||
| { | ||
| "text": MessageTextContent, | ||
| }, | ||
| ) | ||
|
|
||
| _register_message_class( | ||
| UserMessage, | ||
| { | ||
| "text": MessageTextContent, | ||
| "image_url": MessageImageContent, | ||
| "file": MessageFileContent, | ||
| }, | ||
| ) | ||
|
|
||
| _register_message_class( | ||
| AssistantMessage, | ||
| { | ||
| "text": MessageTextContent, | ||
| "tool-call": MessageToolCallContent, | ||
| }, | ||
| ) | ||
|
|
||
| _register_message_class( | ||
| ToolMessage, | ||
| { | ||
| "tool-result": MessageToolResultContent, | ||
| }, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not be needed if we make an arbitrary Pydantic model a serializable which should be easy.
Seriazer.register(BaseModel, to_plain=lambda value: value.model_dump(), from_plain=lambda value, ref: ref.model_validate(value)) Note: the current implementation of from_plain differs from the one defined in TS. In TS the second parameter is a reference to the factory (class/function). It is not implemented here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message parts should inherit from Serializable instead of adding these functions. Please update.
|
Hello @Sakeeb91. Would you be willing to address the comments I mentioned? I am ready to help if you need me. Thank you! |
…to from_plain - Replace _decode_builtin if-statement logic with proper factory registrations for all built-in types (int, float, str, bool, list, set, dict, datetime) matching the TypeScript serializer pattern - Update from_plain signature to accept factory ref as second parameter for consistency with TypeScript implementation - Register Pydantic BaseModel as generic serializable type Signed-off-by: Sakeeb91 <[email protected]>
|
@Tomas2D Thank you for the buzz. I may have overlooked your previous comments and gone on with my life. I have addressed the changes. I hope they are alright! |
Tomas2D
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments have not been addressed. Please take a look at my comments regarding serialization of the message parts.
|
Addressed the review feedback. Message content classes now inherit from All serialization and backend tests pass. |
- Add SerializableModel base class combining BaseModel with Serializable - Update all message content classes to inherit from SerializableModel - Update Message class to inherit from Serializable with snapshot methods - Remove _register_message_class function and manual registration calls Signed-off-by: Sakeeb91 <[email protected]>
Summary
Serializer/Serializableimplementation for the Python SDK so the docs can point at runnable snippetsdocs/modules/serialization.mdxwith concrete Python and TypeScript examples that follow existing codegroup conventionspython/examples/serializationDesign Decisions
Testing
deepeval,outlines)Closes #1200.