Conversation


@colinosullivan-ie colinosullivan-ie commented Dec 7, 2025

Which issue(s) does this pull-request address?

#1228

Closes: #

Description

Adding docstring documentation to the Backend module

Checklist

General

Code quality checks

  • Code quality checks pass: mise check (mise fix to auto-fix)

Testing

  • Unit tests pass: mise test:unit
  • E2E tests pass: mise test:e2e
  • Tests are included (for bug fixes or new features)

Documentation

  • Documentation is updated
  • Embedme embeds code examples in docs. To update after edits, run: mise docs:fix

@colinosullivan-ie colinosullivan-ie requested a review from a team as a code owner December 7, 2025 21:15
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Dec 7, 2025
@github-actions github-actions bot added the python Python related functionality label Dec 7, 2025
@gemini-code-assist

Summary of Changes

Hello @colinosullivan-ie, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the documentation within the backend module of the beeai-framework by introducing comprehensive docstrings across several key classes and methods. The primary goal is to improve the developer experience by providing clear, detailed explanations and usage examples for core components such as Backend, ChatModel, EmbeddingModel, DocumentLoader, TextSplitter, and VectorStore. This effort ensures that developers can quickly understand and effectively utilize the framework's AI model and data processing capabilities.

Highlights

  • Backend Module Documentation: Comprehensive docstrings have been added to the Backend class and its core methods (__init__, from_name, from_provider, clone) in python/beeai_framework/backend/backend.py. These docstrings clarify the purpose, attributes, arguments, returns, and provide usage examples for each component, significantly improving the clarity of the unified interface for chat and embedding models.
  • Chat Model Documentation: Detailed docstrings are now present for ChatModelKwargs and the ChatModel abstract base class, along with all its abstract methods and key functionalities like __init__, config, from_name, clone, and get_default_parameters in python/beeai_framework/backend/chat.py. This enhances understanding of how to interact with various LLM providers, including tool calling, structured outputs, and model configuration.
  • Embedding Model Documentation: The EmbeddingModelKwargs and EmbeddingModel abstract base class, including its properties (model_id, provider_id, emitter) and methods (__init__, create, from_name, clone, destroy), have received extensive docstrings in python/beeai_framework/backend/embedding.py. This provides clear guidance on converting text into vector embeddings and managing embedding model instances.
  • Document and Text Processing Documentation: Docstrings have been added to the DocumentLoader, TextSplitter, QueryLike protocol, and VectorStore abstract base classes and their respective methods in python/beeai_framework/backend/document_loader.py, python/beeai_framework/backend/text_splitter.py, and python/beeai_framework/backend/vector_store.py. This improves the documentation for loading documents, splitting text into chunks, and interacting with vector databases for semantic search.
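To make the Highlights above concrete, here is a minimal sketch of the Google-style docstring format the new documentation follows (the attribute names and wording below are illustrative stand-ins, not text copied from the diff):

class Backend:
    """Unified interface over a chat model and an embedding model.

    Attributes:
        chat: The chat model used for text generation.
        embedding: The embedding model used to produce vector embeddings.

    Example:
        >>> backend = Backend.from_provider("openai")
        >>> response = await backend.chat.create(messages=messages)
    """
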
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize your Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request does an excellent job of adding comprehensive docstrings to the backend module, significantly improving the documentation. The new docstrings are clear, follow a consistent style, and include helpful examples.

I've identified a couple of areas for improvement:

  • The examples for the clone methods in Backend, ChatModel, and EmbeddingModel can be misleading. They imply a deep copy behavior that is not guaranteed by the base implementation, which could lead to user confusion.
  • The placeholder error messages in NotImplementedError exceptions for abstract methods could be made more descriptive to improve the developer experience when subclassing.

These are minor points in an otherwise great contribution to the project's documentation.

Comment on lines +114 to +117
Example:
>>> original = Backend.from_provider("openai")
>>> cloned = await original.clone()
>>> # Modifications to cloned won't affect original

medium

The example for clone assumes that the underlying chat and embedding models implement a deep copy clone method. As noted in the ChatModel and EmbeddingModel base classes, their default clone implementation performs a shallow copy (returns self). If the specific models (e.g., from the "openai" provider) do not override this, the example is misleading as cloned will not be a deep copy. This could cause unexpected side effects for users. The documentation should be clear about this dependency or use a more robust example that doesn't rely on a specific provider's implementation.
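
One way to make the example hold regardless of provider is for a subclass to return a genuinely independent copy. A rough sketch under that assumption (the subclass below is hypothetical, and a naive deep copy may not suit providers that hold live client objects):

import copy

from beeai_framework.backend.backend import Backend


class DeepCopyingBackend(Backend):
    async def clone(self) -> "DeepCopyingBackend":
        # Hypothetical override: copy everything so that changes to the
        # clone's chat or embedding models cannot leak back into the original.
        return copy.deepcopy(self)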

Comment on lines +747 to +751
Example:
>>> original = ChatModel.from_name("openai:gpt-4")
>>> cloned = await original.clone()
>>> # Modifications to cloned won't affect original
>>> cloned.config(parameters=ChatModelParameters(temperature=0.9))

medium

The example provided for the clone method can be misleading. It uses ChatModel.from_name("openai:gpt-4"), which returns an OpenAIChatModel instance. However, the base ChatModel.clone method returns self (a shallow copy) and logs a warning if not overridden. If OpenAIChatModel or its parents do not override clone to perform a deep copy, the example is incorrect because modifications to cloned will affect original. This could lead to subtle bugs for users who copy this example.

To avoid confusion, the example should either use a mock class that is known to implement clone correctly, or be rephrased to not rely on a specific implementation that may not have this feature.
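
A toy illustration of that mock-based approach (FakeChatModel and its attributes are invented purely for the docstring example and are not part of the framework):

import asyncio
import copy


class FakeChatModel:
    """Stand-in model whose clone() returns an independent deep copy."""

    def __init__(self) -> None:
        self.temperature = 0.0

    async def clone(self) -> "FakeChatModel":
        return copy.deepcopy(self)


async def main() -> None:
    original = FakeChatModel()
    cloned = await original.clone()
    cloned.temperature = 0.9
    assert original.temperature == 0.0  # modifying the clone leaves the original untouched


asyncio.run(main())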

Comment on lines +286 to +289
Example:
>>> original = EmbeddingModel.from_name("openai:text-embedding-3-small")
>>> cloned = await original.clone()
>>> # Modifications to cloned won't affect original

medium

Similar to the ChatModel.clone method, the example for EmbeddingModel.clone can be misleading. It uses EmbeddingModel.from_name("openai:text-embedding-3-small") and implies a deep copy is made. However, the base implementation returns self and warns if not overridden. If OpenAIEmbeddingModel does not implement a deep clone, this example is incorrect and could cause bugs for users.

Consider using a mock class in the example to demonstrate the intended behavior of a correctly implemented clone method, to avoid confusion.

>>> print(len(chunks))
>>> print(chunks[0]) # First chunk
"""
raise NotImplementedError("Implement me")

medium

The NotImplementedError message "Implement me" is generic. For a better developer experience, consider providing a more descriptive message.

Suggested change
raise NotImplementedError("Implement me")
raise NotImplementedError("Subclasses must implement the `split_text` method.")
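
Taken together with the docstring, the flagged method might end up looking roughly like this; whether the framework marks it with @abstractmethod, or whether split_text is async, is not visible in this diff, so treat the signature as an assumption. The same descriptive-message pattern applies to the other occurrences flagged below.

from abc import ABC, abstractmethod


class TextSplitter(ABC):
    @abstractmethod
    async def split_text(self, text: str) -> list[str]:
        """Split the input text into chunks.

        Raises:
            NotImplementedError: If the subclass doesn't implement this method.
        """
        raise NotImplementedError("Subclasses must implement the `split_text` method.")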

Raises:
NotImplementedError: If the subclass doesn't implement this method.
"""
raise NotImplementedError("Implement me")

medium

The NotImplementedError message "Implement me" is generic. For a better developer experience, consider providing a more descriptive message.

Suggested change
raise NotImplementedError("Implement me")
raise NotImplementedError("Subclasses must implement the `_class_from_name` method.")

>>> ids = await vector_store.add_documents(documents)
>>> print(ids) # ['id1', 'id2']
"""
raise NotImplementedError("Implement me")

medium

The NotImplementedError message "Implement me" is generic. For a better developer experience, consider providing a more descriptive message.

Suggested change
raise NotImplementedError("Implement me")
raise NotImplementedError("Subclasses must implement the `add_documents` method.")

... filter={"source": "documentation"}
... )
"""
raise NotImplementedError("Implement me")

medium

The NotImplementedError message "Implement me" is generic. For a better developer experience, consider providing a more descriptive message.

Suggested change
raise NotImplementedError("Implement me")
raise NotImplementedError("Subclasses must implement the `search` method.")

Raises:
NotImplementedError: If the subclass doesn't implement this method.
"""
raise NotImplementedError("Implement me")

medium

The NotImplementedError message "Implement me" is generic. For a better developer experience, consider providing a more descriptive message.

Suggested change
raise NotImplementedError("Implement me")
raise NotImplementedError("Subclasses must implement the `_class_from_name` method.")

Comment on lines +69 to +84
Attributes:
tool_call_fallback_via_response_format: Enable fallback to response format for tool calls.
retry_on_empty_response: Automatically retry when the model returns an empty response.
model_supports_tool_calling: Whether the underlying model supports native tool calling.
allow_parallel_tool_calls: Allow the model to make multiple tool calls simultaneously.
ignore_parallel_tool_calls: Ignore all but the first tool call when multiple are returned.
use_strict_tool_schema: Use strict JSON schema validation for tool parameters.
use_strict_model_schema: Use strict JSON schema validation for structured outputs.
supports_top_level_unions: Whether the model supports union types at the top level.
parameters: Default parameters for model generation (temperature, max_tokens, etc.).
cache: Cache implementation for storing and retrieving model outputs.
settings: Additional provider-specific settings.
middlewares: List of middleware to apply during model execution.
tool_choice_support: Set of supported tool choice modes (required, none, single, auto).
fix_invalid_tool_calls: Automatically attempt to fix malformed tool calls.
"""
Contributor

Could you do comments on each attribute instead? Like it is done in AgentOptions.

@Tomas2D Tomas2D left a comment

Thank you for your contribution. Just please update the way comments are done for Pydantic Models / Kwargs. Prefer an inline description instead of the top-level one, as it improves readability.
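
For reference, a rough sketch of the inline-description style being asked for, assuming ChatModelKwargs is a Pydantic model with placeholder defaults (only two attributes from the quoted block are shown):

from pydantic import BaseModel


class ChatModelKwargs(BaseModel):
    tool_call_fallback_via_response_format: bool = False
    """Enable fallback to response format for tool calls."""

    retry_on_empty_response: bool = False
    """Automatically retry when the model returns an empty response."""

    # ...the remaining attributes would carry their own inline description
    # in the same way, instead of one top-level Attributes block.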
