Skip to content

Conversation

@Sakeeb91
Copy link
Contributor

Summary

  • add a lightweight Serializer/Serializable implementation for the Python SDK so the docs can point at runnable snippets
  • replace the "Example coming soon" placeholders in docs/modules/serialization.mdx with concrete Python and TypeScript examples that follow existing codegroup conventions
  • add focused pytest coverage for the new serializer behavior and surface the Python examples under python/examples/serialization

Design Decisions

  • kept the serializer API intentionally minimal (registry-based with JSON payloads) to mirror the TypeScript shape without pulling in heavy dependencies
  • registered core backend message types lazily so docs examples (and deserialization) work without forcing global imports
  • stored Python examples alongside existing samples to stay consistent with the docs embedding workflow

Testing

  • python/.venv/bin/python -m pytest python/tests/serialization/test_serializer.py
  • python/.venv/bin/python -m pytest -m unit (fails: missing optional deps deepeval, outlines)

Closes #1200.

@Sakeeb91 Sakeeb91 requested review from a team as code owners October 18, 2025 19:44
@github-actions github-actions bot added documentation Improvements or additions to documentation python Python related functionality labels Oct 18, 2025
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Sakeeb91, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Python SDK by introducing a robust, yet lightweight, serialization framework. The primary goal is to provide concrete, runnable examples within the documentation, improving clarity and usability for developers. This involves not only the creation of the serialization mechanism itself but also its integration into existing core components and the addition of comprehensive examples and tests to validate its functionality.

Highlights

  • New Serialization Framework: Introduced a lightweight Serializer and Serializable implementation for the Python SDK, designed to be registry-based with JSON payloads and a minimal API.
  • Documentation Examples: Replaced 'Example coming soon' placeholders in docs/modules/serialization.mdx with concrete, runnable Python and TypeScript serialization examples, following existing codegroup conventions.
  • Core Component Serialization: Implemented serialization capabilities for key framework components, including datetime objects and memory classes (UnconstrainedMemory, TokenMemory), by integrating them with the new Serializable mixin.
  • Lazy Registration of Message Types: Core backend message types are now registered lazily with the serializer, ensuring that documentation examples and deserialization work without forcing global imports.
  • New Test Coverage: Added focused pytest coverage for the new serializer behavior, including tests for datetime roundtrips, memory serialization, custom class registration, and Serializable subclass functionality.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable serialization capability to the Python SDK, complete with a lightweight implementation, documentation, and examples. The overall approach is solid, particularly the use of a Serializable mixin and lazy registration to avoid circular dependencies. My review highlights a few areas for improvement to enhance robustness and clarity. I've pointed out a significant limitation in the current serializer regarding the handling of object identity, which could cause issues with complex data structures. I've also noted an inconsistency in the Serializable helper class and a couple of confusing aspects in the new examples. Addressing these points will strengthen the implementation and improve the developer experience for users of this new feature.

Comment on lines 179 to 196
def _encode(cls, value: Any) -> Any:
if value is None or isinstance(value, (bool, int, float, str)):
return value

if isinstance(value, (list, tuple, set)):
return [cls._encode(item) for item in value]

if isinstance(value, dict):
return {str(key): cls._encode(item) for key, item in value.items()}

factory, name = cls._find_factory_for_instance(value)
plain = factory.to_plain(value)
encoded_plain = cls._encode(plain)
return {
"__serializer": True,
"__class": name,
"__value": encoded_plain,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This serialization implementation doesn't handle object identity. If the same object instance appears multiple times in the object graph, it will be serialized each time. Upon deserialization, this will result in multiple distinct object copies instead of a single shared instance. This can lead to subtle bugs and larger payloads. Consider implementing reference tracking (e.g., using object IDs and __ref fields, similar to the TypeScript implementation) to correctly handle graphs with shared objects or cycles.

from beeai_framework.backend import UserMessage
from beeai_framework.memory import UnconstrainedMemory

SERIALIZED_MEMORY = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The SERIALIZED_MEMORY constant is an empty string. This will cause json.loads() to fail with a JSONDecodeError inside the from_serialized method, preventing the example from running. To make this a working example, please provide a valid serialized JSON string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a valid comment? @Sakeeb91

Comment on lines 258 to 259
instance = cls() # type: ignore[call-arg]
instance.load_snapshot(snapshot)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of cls() with a type: ignore[call-arg] suggests a fragile assumption that all Serializable subclasses have a no-argument __init__. This is also inconsistent with the internal deserialization path which uses cls.__new__(cls) to create an instance without calling __init__. To make this method more robust and consistent, consider using __new__ here as well.

Suggested change
instance = cls() # type: ignore[call-arg]
instance.load_snapshot(snapshot)
instance = cls.__new__(cls)
instance.load_snapshot(snapshot)



def main() -> None:
memory = UnconstrainedMemory.from_serialized(SERIALIZED_MEMORY, extra_classes=[UserMessage])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The SERIALIZED_MEMORY string contains both UserMessage and AssistantMessage instances, but extra_classes only lists UserMessage. While this example currently works because importing UserMessage from beeai_framework.backend triggers the registration of all message types in that module, it's misleading for a user trying to understand extra_classes. It incorrectly implies that only UserMessage needs to be specified. For clarity, consider either including AssistantMessage in extra_classes (and importing it) or simplifying SERIALIZED_MEMORY to only contain UserMessage instances.

@Sakeeb91
Copy link
Contributor Author

Thanks for the thorough review! I pushed a follow-up that:

  • adds reference tracking/__ref support to the Python serializer so shared instances and cycles survive a round trip (with new identity + cycle tests)
  • tightens the backend message deserializer to validate the 'type' field explicitly
  • switches Serializable.from_snapshot to use __new__ instead of assuming a no-arg __init__
  • updates the context example to ship a real serialized payload and list both UserMessage and AssistantMessage in extra_classes

Let me know if you spot anything else!

@sandijean90 sandijean90 requested a review from Tomas2D October 20, 2025 16:36
@sandijean90
Copy link
Contributor

@Sakeeb91 thanks for your work here! We're excited to have you on board as a contributor. I've assigned to @Tomas2D to review/comment before merging.

from beeai_framework.backend import UserMessage
from beeai_framework.memory import UnconstrainedMemory

SERIALIZED_MEMORY = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a valid comment? @Sakeeb91

message_cls: type[Message[Any]],
allowed_content: Mapping[str, type[BaseModel]],
) -> None:
def _registrar() -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo I guess

@Tomas2D
Copy link
Contributor

Tomas2D commented Oct 21, 2025

Thank you for your hard work! I am thinking if there is a more Pythonic way of doing that (pickling). What do you think?

@Sakeeb91
Copy link
Contributor Author

Sakeeb91 commented Oct 21, 2025

@Tomas2D thanks for the thoughtful suggestion! I explored pickle early on, but stuck with this JSON serializer for a couple reasons: it keeps deserialization side-effect free (pickle executes arbitrary code), the snapshots need to round-trip through our JavaScript SDK, and JSON is a lot easier to diff/debug in docs and tooling. With the reference tracking and tighter registry in this patch we close the identity gaps the review pointed out while staying interoperable.

@Tomas2D
Copy link
Contributor

Tomas2D commented Oct 29, 2025

I see your point. Thank you for the research. Before we merge this, can you verify that some basic examples work in both TS and Python so that one can use dump from TS and use it in Python, and vice versa?

@Sakeeb91
Copy link
Contributor Author

Verified the TS ↔︎ Python round-trip with runnable snippets:

  • TS → Py: Serializer.serialize(UnconstrainedMemory) snapshot restores via UnconstrainedMemory.from_serialized(...), yielding the original UserMessage/AssistantMessage texts.
  • Py → TS: Python memory.serialize() payload hydrates in TypeScript by deserializing the memory and resolving each serializer node (same message types/text).

Commands + outputs are in reviewer_verification.md (above). I also reran poetry run pytest tests/serialization/test_serializer.py. Let me know if you’d like the raw dumps or any additional coverage.

def create_snapshot(self) -> T:
raise NotImplementedError

def load_snapshot(self, snapshot: T) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this should be:

  1. abstract method
  2. Awaitable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated both create_snapshot and load_snapshot to be abstract methods with Awaitable return types to match the TypeScript implementation:
@AbstractMethod
def create_snapshot(self) -> T | Awaitable[T]:
raise NotImplementedError

@AbstractMethod
def load_snapshot(self, snapshot: T) -> None | Awaitable[None]:
raise NotImplementedError
The Serializable class now inherits from ABC, and both methods use the @AbstractMethod decorator. This ensures type safety and aligns with the TypeScript interface at serializable.ts:25-26 where both methods can return Promise types.

…able

Address PR review feedback:
- Make create_snapshot and load_snapshot abstract methods using ABC
- Add Awaitable return types to align with TypeScript implementation
- Both methods can now return sync or async results
- Remove to_snapshot method (not present in TypeScript)

Relates to: i-am-bee#1205 (comment)
Relates to: i-am-bee#1205 (comment)

Signed-off-by: Sakeeb91 <[email protected]>
Break long JSON strings into multiple lines to comply with linting rules

Signed-off-by: Sakeeb91 <[email protected]>
Copy link
Contributor

@Tomas2D Tomas2D left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work so far! Thank you!

Comment on lines 268 to 353
def _decode_builtin(cls, class_name: str, raw: Mapping[str, Any], seen: dict[str, Any]) -> Any:
payload = raw.get("__value")
ref_id = raw.get("__ref")

def remember(value: Any) -> Any:
if ref_id is not None:
seen[ref_id] = value
return value

if class_name in {"Undefined", "Null"}:
return remember(None)
if class_name == "Number":
if payload is None:
return remember(0)
if isinstance(payload, str):
number = float(payload)
if number.is_integer():
return remember(int(number))
return remember(number)
if isinstance(payload, int | float):
return remember(payload)
raise SerializerError("Encountered malformed payload for JavaScript Number.")
if class_name == "Boolean":
if isinstance(payload, bool):
return remember(payload)
if isinstance(payload, str):
return remember(payload.lower() in {"true", "1"})
return remember(bool(payload))
if class_name == "String":
if payload is None:
return remember("")
return remember(str(payload))
if class_name == "BigInt":
if payload is None:
return remember(0)
return remember(int(payload))
if class_name == "Date":
if not isinstance(payload, str):
raise SerializerError("Encountered malformed payload for JavaScript Date.")
text = payload.replace("Z", "+00:00") if payload.endswith("Z") else payload
return remember(datetime.fromisoformat(text))
if class_name in {"Array", "Set"}:
container: list[Any] | set[Any]
if payload is None:
items: list[Any] = []
elif isinstance(payload, list):
items = payload
else:
raise SerializerError("Encountered malformed payload for JavaScript Array or Set.")
container = set() if class_name == "Set" else []
remember(container)
for item in items:
decoded = cls._decode(item, seen)
if isinstance(container, list):
container.append(decoded)
else:
container.add(decoded)
return container
if class_name == "Map":
mapping: dict[Any, Any] = {}
remember(mapping)
if payload is None:
return mapping
if not isinstance(payload, list):
raise SerializerError("Encountered malformed payload for JavaScript Map.")
for pair in payload:
if not isinstance(pair, Iterable):
raise SerializerError("Encountered malformed payload for JavaScript Map.")
try:
key_raw, value_raw = pair
except ValueError as exc:
raise SerializerError("Encountered malformed payload for JavaScript Map.") from exc
key = cls._decode(key_raw, seen)
value = cls._decode(value_raw, seen)
mapping[key] = value
return mapping
if class_name == "Object":
result: dict[str, Any] = {}
if not isinstance(payload, Mapping):
raise SerializerError("Encountered malformed payload for JavaScript Object.")
remember(result)
for key, item in payload.items():
result[key] = cls._decode(item, seen)
return result

return cls._NOT_HANDLED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Built-ins should be also registered as every other data types as it is done in TypeScript. No messy if statements.

Comment on lines 379 to 462
def _register_message_class(
message_cls: type[Message[Any]],
allowed_content: Mapping[str, type[BaseModel]],
) -> None:
def _registrar() -> None:
from beeai_framework.serialization import Serializer, SerializerError

def _to_plain(message: Message[Any]) -> dict[str, Any]:
return {
"id": message.id,
"meta": dict(message.meta),
"role": str(message.role),
"content": [fragment.model_dump() for fragment in message.content],
}

def _from_plain(payload: Mapping[str, Any]) -> Message[Any]:
content_payload = payload.get("content", []) or []
content_models: list[Any] = []

for item in content_payload:
if not isinstance(item, Mapping):
raise SerializerError("Serialized message content must be a mapping.")

content_type = item.get("type")
if not isinstance(content_type, str):
raise SerializerError(
f"Serialized message content is missing or has an invalid 'type': {content_type!r}",
)

model_cls = allowed_content.get(content_type)
if model_cls is None:
raise SerializerError(
f"Unsupported message content '{content_type}' for {message_cls.__name__}.",
)
content_models.append(model_cls.model_validate(dict(item)))

meta = payload.get("meta") or {}
meta_copy = dict(meta)
return message_cls(content_models, meta=meta_copy, id=payload.get("id"))

Serializer.register(
message_cls,
to_plain=_to_plain,
from_plain=_from_plain,
)

_registrar()

def _register_method(cls: type[Message[Any]], *, aliases: Iterable[str] | None = None) -> None:
_registrar()

type.__setattr__(message_cls, "register", classmethod(_register_method))


_register_message_class(
SystemMessage,
{
"text": MessageTextContent,
},
)

_register_message_class(
UserMessage,
{
"text": MessageTextContent,
"image_url": MessageImageContent,
"file": MessageFileContent,
},
)

_register_message_class(
AssistantMessage,
{
"text": MessageTextContent,
"tool-call": MessageToolCallContent,
},
)

_register_message_class(
ToolMessage,
{
"tool-result": MessageToolResultContent,
},
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not be needed if we make an arbitrary Pydantic model a serializable which should be easy.

Seriazer.register(BaseModel, to_plain=lambda value: value.model_dump(), from_plain=lambda value, ref: ref.model_validate(value)) 

Note: the current implementation of from_plain differs from the one defined in TS. In TS the second parameter is a reference to the factory (class/function). It is not implemented here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message parts should inherit from Serializable instead of adding these functions. Please update.

@Tomas2D
Copy link
Contributor

Tomas2D commented Dec 2, 2025

Hello @Sakeeb91. Would you be willing to address the comments I mentioned? I am ready to help if you need me. Thank you!

…to from_plain

- Replace _decode_builtin if-statement logic with proper factory
  registrations for all built-in types (int, float, str, bool, list,
  set, dict, datetime) matching the TypeScript serializer pattern
- Update from_plain signature to accept factory ref as second parameter
  for consistency with TypeScript implementation
- Register Pydantic BaseModel as generic serializable type

Signed-off-by: Sakeeb91 <[email protected]>
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Dec 2, 2025
@Sakeeb91
Copy link
Contributor Author

Sakeeb91 commented Dec 3, 2025

@Tomas2D Thank you for the buzz. I may have overlooked your previous comments and gone on with my life. I have addressed the changes. I hope they are alright!

Copy link
Contributor

@Tomas2D Tomas2D left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments have not been addressed. Please take a look at my comments regarding serialization of the message parts.

@Sakeeb91
Copy link
Contributor Author

Addressed the review feedback. Message content classes now inherit from Serializable via a new SerializableModel base class that combines BaseModel with Serializable. The Message class also inherits from Serializable directly. Removed the _register_message_class function and all manual registration calls.

All serialization and backend tests pass.

- Add SerializableModel base class combining BaseModel with Serializable
- Update all message content classes to inherit from SerializableModel
- Update Message class to inherit from Serializable with snapshot methods
- Remove _register_message_class function and manual registration calls

Signed-off-by: Sakeeb91 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation python Python related functionality size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace “Example coming soon” placeholders in serialization docs

3 participants