6 changes: 3 additions & 3 deletions docs/aggregators.rst
@@ -72,13 +72,13 @@ Several additional parameters that guide general behavior of the aggregation flo
- Whether to retain or delete model update files after they have been processed (default is to delete them)


Extending Scaleout Edge with new Aggregators
--------------------------------------------
Implement your own aggregators
------------------------------

Scaleout Edge supports a flexible architecture that allows developers to implement custom aggregation logic beyond the built-in options.
To define and register your own aggregator, you should use the server functions interface, where aggregation behavior can be customized to suit specific research or production needs.

For detailed instructions and examples on how to implement new aggregators, see the section on :ref:server-functions.
For detailed instructions and examples on how to implement new aggregators, see the section on :ref:`server-functions`.
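
Below is a minimal sketch of what a custom aggregation hook could look like. The
class and method names (``MyServerFunctions``, ``aggregate``) and the update format
are illustrative assumptions, not the actual Scaleout Edge server functions API;
see :ref:`server-functions` for the real interface.

.. code-block:: python

    import numpy as np


    class MyServerFunctions:
        """Hypothetical server-side hook implementing a plain model average."""

        def aggregate(self, client_updates):
            # client_updates is assumed to be a list with one entry per client,
            # each entry a list of numpy arrays (the model parameters).
            return [
                np.mean(np.stack(layers), axis=0)
                for layers in zip(*client_updates)
            ]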


.. meta::
4 changes: 2 additions & 2 deletions docs/apiclient.rst
@@ -27,7 +27,7 @@ To obtain an admin API token press "Generate" in the "Generate Admin token" sect

.. image:: img/generate_admin_token.png

To initalize the connection to the Scaleout Edge REST API:
To initialize the connection to the Scaleout REST API:

.. code-block:: python

@@ -146,7 +146,7 @@ To get a specific session:
For more information on how to use the APIClient, see the :py:mod:`scaleout-client.scaleout.network.api.client`.
There is also a collection of Jupyter Notebooks showcasing more advanced use of the API, including how to work with other built-in aggregators and how to automate hyperparameter tuning:

- `API Example <https://github.com/scaleoutsystems/scaleout-client/tree/master/scaleout/examples/api-tutorials>`_ .
- `API Example <https://github.com/scaleoutsystems/scaleout-client/tree/master/python/examples/api-tutorials>`_.
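
As a sketch of the kind of automation those notebooks demonstrate, a simple
hyperparameter sweep could be driven from Python roughly as follows. The import
path, the constructor arguments, and the ``start_session`` call are illustrative
assumptions, not a documented interface; consult the API reference above for the
actual client methods.

.. code-block:: python

    # Hypothetical sketch of scripting training sessions via the API client.
    from scaleout.network.api.client import APIClient  # assumed import path

    client = APIClient(host="api.example.com", token="<admin token>")

    for learning_rate in (0.1, 0.01, 0.001):
        # One session per hyperparameter setting, named so results are comparable.
        client.start_session(
            name=f"lr-{learning_rate}",
            rounds=10,
            session_config={"learning_rate": learning_rate},
        )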


.. meta::
106 changes: 78 additions & 28 deletions docs/architecture.rst
@@ -3,54 +3,104 @@
Architecture overview
=====================

Constructing a federated model with Scaleout Edge amounts to a) specifying the details of the client-side training code and data integrations, and b) deploying the federated network. A Scaleout Edge network, as illustrated in the picture below, is made up of components into three different tiers: the *Controller* tier (3), one or more *Combiners* in second tier (2), and a number of *Clients* in tier (1).
The combiners forms the backbone of the federated ML orchestration mechanism, while the Controller tier provides discovery services and controls to coordinate training over the federated network.
By horizontally scaling the number of combiners, one can meet the needs of a growing number of clients.
This page provides an overview of the **Scaleout Edge architecture**. What
follows is a conceptual description of the components that make up a Scaleout
Edge network and how they interact during a federated training session.

A Scaleout Edge network consists of three tiers:

- **Tier 1: Clients**
- **Tier 2: Combiners**
- **Tier 3: Controller and supporting services**

These components work together to coordinate distributed, privacy-preserving
machine learning across a large number of participating data nodes.

.. image:: img/Scaleout_Edge_network.png
:alt: Scaleout Edge network
:width: 100%
:align: center

Tier 1 — Clients
----------------

A **Client** (gRPC client) is a data node holding private data and connecting to
a Combiner (gRPC server) to receive training tasks and validation requests during
federated sessions.

Key characteristics:

- Clients communicate **outbound only** using RPC.
No inbound or publicly exposed ports are required.
- Upon connecting to the network, a client receives a **compute package** from the
  Controller, or uses one that is already available locally. The package contains,
  for example, the training and validation code to execute locally.
- The compute package is defined by entry points in the client code and can be
customized to support various model types, frameworks, and even programming
languages.

Python, C++ and Kotlin client implementations are provided out-of-the-box, but clients may be
implemented in any language to suit specific hardware or software environments.
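
As an illustration, a training entry point in a compute package might look like the
sketch below. The file name, the JSON serialization, and the command-line contract
are assumptions made for this example only; the actual entry-point format is defined
by your compute package configuration.

.. code-block:: python

    # train.py -- hypothetical training entry point bundled in a compute package.
    import json
    import sys


    def train(in_model_path, out_model_path):
        # Read the current global model parameters handed out by the combiner.
        with open(in_model_path) as f:
            params = json.load(f)

        # ... update params here using the locally held, private dataset ...

        # Write the updated parameters so the client can report them back.
        with open(out_model_path, "w") as f:
            json.dump(params, f)


    if __name__ == "__main__":
        train(sys.argv[1], sys.argv[2])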

Tier 2 — Combiners
------------------

A **Combiner** orchestrates and aggregates model updates coming from its local
group of clients. It is responsible for the mid-level federated learning workflow.

Key responsibilities:

- Running a dedicated gRPC server for interacting with clients and the Controller.
- Executing the orchestration plan defined in the global **compute plan**
provided by the Controller.
- Reducing client model updates into a single **combiner-level model**.

Because each Combiner operates independently, the total number of clients that
can be supported scales with the number of deployed Combiners. Combiners may be
placed in the cloud, on fog/edge nodes, or in any environment suited for running
the aggregation service.
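
Conceptually, the reduce step can be viewed as an incremental weighted average over
the client updates received during a round, as in the simplified sketch below. This
illustrates the idea only and is not the Combiner's actual implementation; the
built-in aggregators are described in :ref:`agg-label`.

.. code-block:: python

    import numpy as np


    def reduce_updates(updates):
        """Incrementally average (parameters, num_examples) client updates."""
        model = None
        seen_examples = 0
        for params, num_examples in updates:
            if model is None:
                # The first update initializes the running average.
                model = [np.array(layer, copy=True) for layer in params]
                seen_examples = num_examples
                continue
            seen_examples += num_examples
            weight = num_examples / seen_examples
            model = [
                (1.0 - weight) * current + weight * np.asarray(new)
                for current, new in zip(model, params)
            ]
        return model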

**The clients: tier 1**
Tier 3 — Controller and supporting services
-------------------------------------------

A Client (gRPC client) is a data node, holding private data and connecting to a Combiner (gRPC server) to receive model update requests and model validation requests during training sessions.
Importantly, clients uses remote procedure calls (RPC) to ask for model updates tasks, thus the clients not require any open ingress ports! A client receives the code (called package or compute package) to be executed from the *Controller*
upon connecting to the network, and thus they only need to be configured prior to connection to read the local datasets during training and validation. The package is based on entry points in the client code, and can be customized to fit the needs of the user.
This allows for a high degree of flexibility in terms of what kind of training and validation tasks that can be performed on the client side. Such as different types of machine learning models and framework, and even programming languages.
A python3 client implementation is provided out of the box, and it is possible to write clients in a variety of languages to target different software and hardware requirements.
Tier 3 contains several services, with the **Controller** being the central
component coordinating global training. The Controller has three primary roles:

**The combiners: tier 2**
1. **Global orchestration**
It defines the overall training strategy, distributes the compute plan, and
specifies how combiner-level models should be combined into a global model.

A combiner is an actor whose main role is to orchestrate and aggregate model updates from a number of clients during a training session.
When and how to trigger such orchestration are specified in the overall *compute plan* laid out by the *Controller*.
Each combiner in the network runs an independent gRPC server, providing RPCs for interacting with the federated network it controls.
Hence, the total number of clients that can be accommodated in a Scaleout Edge network is proportional to the number of active combiners in the Scaleout Edge network.
Combiners can be deployed anywhere, e.g. in a cloud or on a fog node to provide aggregation services near the cloud edge.
2. **Global state management**
The Controller maintains the **model trail**—an immutable record of global
model updates forming the training timeline.

**The controller: tier 3**
3. **Discovery and connectivity**
It provides discovery services and mediates connections between clients and
combiners. For this purpose, the Controller exposes a standard REST API used
by RPC clients/servers and by user interfaces.

Tier 3 does actually contain several components and services, but we tend to associate it with the *Controller* the most. The *Controller* fills three main roles in the Scaleout Edge network:
Additional Tier 3 services include:

1. it lays out the overall, global training strategy and communicates that to the combiner network.
It also dictates the strategy to aggregate model updates from individual combiners into a single global model,
2. it handles global state and maintains the *model trail* - an immutable trail of global model updates uniquely defining the federated ML training timeline, and
3. it provides discovery services, mediating connections between clients and combiners. For this purpose, the *Controller* exposes a standard REST API both for RPC clients and servers, but also for user interfaces and other services.
- **Reducer**
Aggregates the combiner-level models into a single global model.

Tier 3 also contain a *Reducer* component, which is responsible for aggregating combiner-level models into a single global model. Further, it contains a *StateStore* database,
which is responsible for storing various states of the network and training sessions. The final global model trail from a traning session is stored in the *ModelRegistry* database.
- **StateStore**
Stores the state of the network, training sessions, and metadata.

- **ModelRegistry**
Stores the final global model trail after a completed training session.

**Notes on aggregating algorithms**
Notes on aggregation algorithms
-------------------------------

Scaleout Edge is designed to allow customization of the FedML algorithm, following a specified pattern, or programming model.
Model aggregation happens on two levels in the network. First, each Combiner can be configured with a custom orchestration and aggregation implementation, that reduces model updates from Clients into a single, *combiner level* model.
Then, a configurable aggregation protocol on the *Controller* level is responsible for combining the combiner-level models into a global model. By varying the aggregation schemes on the two levels in the system,
many different possible outcomes can be achieved. Good starting configurations are provided out-of-the-box to help the user get started. See :ref:`agg-label` and API reference for more details.
Scaleout Edge includes several **built-in aggregators** for common FL workflows
(see :ref:`agg-label`). For advanced scenarios, users may override the
Combiner-level behavior using **server functions** (:ref:`server-functions`),
allowing custom orchestration or aggregation logic.

Aggregation happens in two stages:

1. Each Combiner reduces client updates into a *combiner-level model*.
2. The Controller (Reducer) combines these into the final global model.
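
For example, with weighted averaging at both levels (one possible configuration),
the two stages can be written as

.. math::

   \bar{w}_j = \sum_{k \in C_j} \frac{n_k}{n_j} \, w_k,
   \qquad
   w_{\mathrm{global}} = \sum_{j} \frac{n_j}{n} \, \bar{w}_j,

where :math:`C_j` is the set of clients attached to combiner :math:`j`,
:math:`n_k` is the number of local examples at client :math:`k`,
:math:`n_j = \sum_{k \in C_j} n_k`, and :math:`n = \sum_j n_j`.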

.. meta::
:description lang=en:
27 changes: 14 additions & 13 deletions docs/cli.rst
@@ -3,7 +3,7 @@
CLI
=================================

The Scaleout Edge Command-Line Interface (CLI) is a powerful tool that allows users to interact with the Scaleout Edge server. It provides a comprehensive set of commands to manage and operate various components of the Scaleout Edge network, including starting services, managing sessions, and retrieving data.
The Scaleout Edge Command-Line Interface (CLI) is designed to streamline management of the Scaleout Edge platform, making it easier for users to deploy, monitor and interact with their federated learning networks.

With the Scaleout Edge CLI, users can:

@@ -19,6 +19,19 @@ The Scaleout Edge CLI is designed to streamline the management of the Scaleout E

For detailed usage and examples, refer to the sections below.

Login
------

The `scaleout` commands allow users to log in to Scaleout Edge and interact with the platform.

**Commands:**

- **scaleout login** - Log in to the Scaleout Edge using a username, password, and host. Example:

.. code-block:: bash

scaleout login -u username -P password -H host

Client
------

@@ -44,18 +57,6 @@ The `scaleout client` commands allow users to start and manage Scaleout Edge cli

scaleout client get-config --name test-client

Login
------

The `scaleout` commands allow users to log in to Scaleout Edge and interact with the platform.

**Commands:**

- **scaleout login** - Log in to the Scaleout Edge using a username, password, and host. Example:

.. code-block:: bash

scaleout login -u username -P password -H host

Combiner
--------
25 changes: 9 additions & 16 deletions docs/faq.rst
@@ -26,17 +26,20 @@ Yes, to facilitate interactive development of the compute package you can start

.. code-block:: bash

scaleout client start --remote=False -in client.yaml
scaleout client start --local-package


Note that in production federations this options should in most cases be disallowed.
Note that in production federations the local compute package option should in most cases be disallowed.

Q: How can other aggregation algorithms can be defined?
-------------------------------------------------------
Q: How can I define custom aggregation algorithms?
--------------------------------------------------

There is a plugin interface for extending the framework with new aggregators. See
Scaleout Edge provides several built-in aggregators, but custom aggregation or
server-side behavior can be implemented through the **server functions**
interface. This allows you to override or extend the Combiner-level logic as
needed.

:ref:`agg-label`
See :ref:`agg-label` and :ref:`server-functions` for details.


Q: What is needed to include additional ML frameworks in Scaleout Edge?
@@ -49,16 +52,6 @@ see the section about model marshaling:

:ref:`helper-label`

Q: Can I start a client listening only to training requests or only on validation requests?:
--------------------------------------------------------------------------------------------

Yes! You can toggle which message streams a client subscribes to when starting the client. For example, to start a pure validation client:

.. code-block:: bash

scaleout client start --trainer=False -in client.yaml


Q: How do you approach the question of output privacy?
----------------------------------------------------------------------------------

4 changes: 2 additions & 2 deletions docs/index.rst
@@ -1,7 +1,7 @@
Welcome to Scaleout Edge Documentation
======================================

Scaleout Edge is an open-source framework for scalable federated learning. This documentation covers architecture, setup, deployment, API references, and troubleshooting guidance. Quickly locate configuration examples, technical concepts, or operational details you need to deploy federated models efficiently in production environments.
Scaleout Edge is a framework for scalable federated learning. This documentation covers architecture, setup, deployment, API references, and troubleshooting guidance. Quickly locate configuration examples, technical concepts, or operational details you need to deploy federated models efficiently in production environments.

.. toctree::
:maxdepth: 1
@@ -23,10 +23,10 @@ Scaleout Edge is an open-source framework for scalable federated learning. This
A Guide to the Scaleout Edge Project Structure <projects>
Architecture Overview <architecture>
aggregators
serverfunctions
cli
helpers
apiclient
serverfunctions
localcompute

.. toctree::
62 changes: 49 additions & 13 deletions docs/introduction.rst
@@ -20,7 +20,7 @@ How federated learning works
In federated learning, models are trained across multiple devices or servers (called client nodes) without moving the data. Here's how it works:

1. **Initialize the global model -** A central server starts with an initial global model—like a neural network or decision tree.
2. **Sending to clients -** The model's parameters are sent to selected clients. Each client keeps its local dataset private.
2. **Model retrieval -** Selected clients download the current model parameters from the server. Their local datasets remain private.
3. **Local training -** Each client updates the model using its local data. This training is repeated in several rounds — not to completion.
4. **Combining the updates -** The updated models from each client are sent back to the central server, where they are combined.
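
The combination in step 4 is commonly a data-weighted average of the client updates
(the FedAvg scheme). Other aggregation rules exist, but as an illustration:

.. math::

   w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n} \, w^{k}_{t+1}

where :math:`w^{k}_{t+1}` is the model produced by client :math:`k` in round
:math:`t`, :math:`n_k` is the number of local examples held by that client, and
:math:`n = \sum_k n_k`.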

@@ -29,7 +29,7 @@ This cycle repeats until the global model reaches the desired accuracy.
The Scaleout Edge framework
---------------------------

Scaleout Edge is a federated learning framework focused on security, scalability, and ease of use. It supports the full development lifecycle—from early experiments to production deployments—with minimal code changes. Key design goals include:
Scaleout Edge focuses on security, scalability, and ease of use. It supports the full development lifecycle—from early experiments to production deployments—with minimal code changes. Key design goals include:

- **Minimal server-side complexity for the end-user**. Scaleout Edge handles orchestration, providing a UI, REST API, and Python interface for managing experiments and tracking metrics in real time.

@@ -57,15 +57,51 @@ Federated learning:
- No inbound ports required on client devices


From development to FL in production:

- Secure deployment of server-side / control-plane on Kubernetes.
- UI with dashboards for orchestrating FL experiments and for visualizing results
- Team features - collaborate with other users in shared project workspaces.
- Features for the trusted-third party: Manage access to the FL network, FL clients and training progress.
- REST API for handling experiments/jobs.
- View and export logging and tracing information.
- Public cloud, dedicated cloud and on-premise deployment options.
From development to FL in production
------------------------------------

Scaleout Edge provides a complete operational toolkit for moving federated
learning from early prototypes to production deployments. The platform’s
capabilities can be grouped into the following categories:

ModelOps / FL Ops
~~~~~~~~~~~~~~~~~
- UI and dashboards for orchestrating FL experiments and monitoring training
progress.
- REST API for managing experiments and jobs.
- Support for multi-round orchestration and model lifecycle management.
- Plug-in architecture for extending aggregators, storage backends, load
balancers, and orchestration components.

Observability & Telemetry
~~~~~~~~~~~~~~~~~~~~~~~~~
- Built-in logging, tracing, and experiment metrics.
- Export and integration options for external observability systems.
- Visual dashboards showing experiment status, model performance, client
activity, and system health.

Security & Trust
~~~~~~~~~~~~~~~~
- Secure, cloud-native control plane deployed on Kubernetes.
- Token-based authentication (JWT) and role-based access control (RBAC).
- Outbound-only connectivity for clients (no inbound ports required).
- Trusted third-party features: manage access to the FL network, clients,
and training progress.

Collaboration & Governance
~~~~~~~~~~~~~~~~~~~~~~~~~~
- Shared project workspaces for collaborative experimentation.
- User and role management for multi-team or multi-organization setups.
- Clear separation of responsibilities between data owners, model owners,
and infrastructure operators.

Deployment & Infrastructure
~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Flexible deployment options: public cloud, private cloud, dedicated cloud,
or fully on-premise.
- Horizontal scalability through multiple combiners.
- Resilience to intermittent client availability and failures across
critical components.

Available client APIs:

@@ -76,10 +112,10 @@
Support
--------

Community support in available in our `Discord
Community support is available in our `Discord
server <https://discord.gg/KMg4VwszAd>`__.

For professionals / Enteprise, we offer `Dedicated support <https://www.scaleoutsystems.com/start#pricing>`__.
For professionals / Enterprise, we offer `Dedicated support <https://www.scaleoutsystems.com/start#pricing>`__.

.. meta::
:description lang=en: