Changes from 3 commits
3 changes: 0 additions & 3 deletions docs/Makefile
Original file line number Diff line number Diff line change
@@ -5,8 +5,5 @@ clean:
rm -rf fedn.rst
rm -rf _build/

apidoc:
sphinx-apidoc --ext-autodoc --module-first -o . ../fedn ../*tests* ../fedn/cli* ../fedn/common* ../fedn/network/api/v1* ../fedn/network/grpc/fedn_pb2.py ../fedn/network/grpc/fedn_pb2_grpc.py ../fedn/network/api/server.py ../fedn/network/controller/controlbase.py

html: clean apidoc
sphinx-build . _build
4 changes: 3 additions & 1 deletion docs/README.md
@@ -1,5 +1,7 @@
FEDn is using sphinx with reStructuredText.
Scaleout Edge is using sphinx with reStructuredText.

# Install sphinx
pip install -r requirements.txt

# Updated build Script
cd docs/
55 changes: 11 additions & 44 deletions docs/aggregators.rst
Expand Up @@ -14,17 +14,15 @@ During a training session, the combiners will instantiate an Aggregator and use
:align: center

The figure above illustrates the overall workflow. When a client completes a model update, the model parameters are streamed to the combiner,
and a model update message is sent. The parameters are saved to a file on disk, and the update message is passed to a callback function, ``on_model_update``.
This function validates the model update and, if successful, places the update message in an aggregation queue.
The model parameters are saved to disk at a configurable storage location within the combiner to prevent exhausting RAM.
As multiple clients submit updates, the aggregation queue accumulates. Once specific criteria are met, another method, ``combine_models``,
begins processing the queue, aggregating models according to the specifics of the scheme (e.g., FedAvg, FedAdam).
and a model update message is sent. The model parameters are saved to disk at a configurable storage location within the combiner to prevent exhausting RAM.
As multiple clients submit updates, the aggregation queue accumulates. Once specific criteria are met, the combiner begins processing
the queue, aggregating models according to the specifics of the scheme (e.g., FedAvg, FedAdam).
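The queue-then-reduce flow described above can be sketched in a few lines. This is an illustrative stand-in, not the actual Scaleout Edge implementation: the function name and the `(parameters, num_examples)` queue items are assumptions, and the reduction shown is the FedAvg data-size-weighted average.

```python
# Illustrative sketch of combiner-side aggregation: client updates
# accumulate in a queue and, once the round criteria are met, are
# reduced with a data-size-weighted average (FedAvg).
from queue import Queue


def fedavg(update_queue):
    """Drain the queue and return the weighted average of parameter lists."""
    total_examples = 0
    aggregate = None
    while not update_queue.empty():
        params, num_examples = update_queue.get()
        if aggregate is None:
            aggregate = [p * num_examples for p in params]
        else:
            aggregate = [a + p * num_examples for a, p in zip(aggregate, params)]
        total_examples += num_examples
    return [a / total_examples for a in aggregate]


q = Queue()
q.put(([1.0, 2.0], 10))   # client A: parameters, number of training examples
q.put(([3.0, 4.0], 30))   # client B trained on 3x more data, so it weighs more
global_params = fedavg(q)  # -> [2.5, 3.5]
```

Weighting by the number of training examples is what distinguishes FedAvg from a plain mean: client B's parameters pull the result three times harder than client A's.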


Using built-in Aggregators
--------------------------

FEDn supports the following aggregation algorithms:
Scaleout Edge supports the following aggregation algorithms:

- FedAvg (default)
- FedAdam (FedOpt)
@@ -55,7 +53,7 @@ Training sessions can be configured to use a given aggregator. For example, to u

.. note::

The FedOpt family of methods use server-side momentum. FEDn resets the aggregator for each new session.
The FedOpt family of methods use server-side momentum. Scaleout Edge resets the aggregator for each new session.
    This means that the history will also be reset, i.e. the momentum terms will be forgotten.
When using FedAdam, FedYogi and FedAdaGrad, the user needs to strike a
balance between the number of rounds in the session from a convergence and utility perspective.
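To make the note above concrete, here is a simplified sketch of a server-side Adam step on the pseudo-gradient, with the momentum state `(m, v)` that gets zeroed at the start of each session. The function name and state layout are illustrative, not the real implementation.

```python
# Illustrative sketch of why resetting the aggregator matters for FedAdam:
# the server carries momentum state (m, v) across rounds, and a new
# session starts with that state zeroed, forgetting the history.
def fedadam_step(global_params, client_avg, state, lr=0.1, b1=0.9, b2=0.99, eps=1e-3):
    """One server-side Adam step on the pseudo-gradient (global - client average)."""
    new_params, m, v = [], [], []
    for g_old, x, m_old, v_old in zip(global_params, client_avg, state["m"], state["v"]):
        grad = g_old - x                         # pseudo-gradient from client updates
        m_i = b1 * m_old + (1 - b1) * grad       # first-moment (momentum) estimate
        v_i = b2 * v_old + (1 - b2) * grad * grad  # second-moment estimate
        new_params.append(g_old - lr * m_i / (v_i ** 0.5 + eps))
        m.append(m_i)
        v.append(v_i)
    return new_params, {"m": m, "v": v}


state = {"m": [0.0], "v": [0.0]}                 # reset at every session start
params, state = fedadam_step([1.0], [0.0], state)
```

Because `m` and `v` start at zero each session, the first rounds of a session behave differently from later ones, which is why session length matters for convergence with the FedOpt family.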
@@ -74,47 +72,16 @@ Several additional parameters that guide general behavior of the aggregation flo
- Whether to retain or delete model update files after they have been processed (default is to delete them)

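Such round-control settings are typically passed in the session configuration. The keys below are illustrative placeholders only; consult the API reference for the actual Scaleout Edge parameter names.

```python
# Hypothetical session configuration showing the kind of round-control
# knobs described above. Key names are placeholders, not the real API.
session_config = {
    "aggregator": "fedavg",
    "rounds": 10,
    "round_timeout": 180,      # seconds before an incomplete round times out
    "requested_clients": 8,    # clients asked to participate in each round
    "required_clients": 5,     # minimum updates needed before aggregating
    "delete_models": True,     # delete model update files after processing
}
```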

Extending FEDn with new Aggregators
-----------------------------------
Implement own Aggregators
-------------------------

A developer can extend FEDn with his/her own Aggregator(s) by implementing the interface specified in
:py:mod:`fedn.network.combiner.aggregators.aggregatorbase.AggregatorBase`. This involes implementing the two methods:
Scaleout Edge supports a flexible architecture that allows developers to implement custom aggregation logic beyond the built-in options.
To define and register your own aggregator, you should use the server functions interface, where aggregation behavior can be customized to suit specific research or production needs.

- ``on_model_update`` (perform model update validation before update is placed on queue, optional)
- ``combine_models`` (process the queue and aggregate updates)

**on_model_update**

The ``on_model_update`` callback recieves the model update messages from clients (including all metadata) and can be used to perform validation and
potential transformation of the model update before it is placed on the aggregation queue (see image above).
The base class implements a default callback that checks that all metadata assumed by the aggregation algorithms FedAvg and FedOpt is available. The callback could also be used to implement custom pre-processing and additional checks including strategies
to filter out updates that are suspected to be corrupted or malicious.

**combine_models**

When a certain criteria is met, e.g. if all clients have sent updates, or the round has times out, the ``combine_model_update`` method
processes the model update queue, producing an aggregated model. This is the main extension point where the
numerical details of the aggregation scheme is implemented. The best way to understand how to implement this method is to study the built-in aggregation algorithms:

- :py:mod:`fedn.network.combiner.aggregators.fedavg` (weighted average of parameters)
- :py:mod:`fedn.network.combiner.aggregators.fedopt` (compute pseudo-gradients and apply a server-side optmizer)

To add an aggregator plugin ``myaggregator``, the developer implements the interface and places a file called ‘myaggregator.py’ in the folder ‘fedn.network.combiner.aggregators’.
This extension can then simply be called as such:

.. code:: python

session_config = {
"helper": "numpyhelper",
"id": "experiment_myaggregator",
"aggregator": "myaggregator",
"rounds": 10
}

result_myaggregator = client.start_session(**session_config)
For detailed instructions and examples on how to implement new aggregators, see the section on :ref:`server-functions`.
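As a taste of what custom aggregation logic can look like, here is a minimal, self-contained sketch of an aggregator that takes the coordinate-wise median of client updates, a simple robust alternative to the weighted mean. The class and method names are illustrative only; the actual interface is described in the server-functions section.

```python
# Illustrative custom aggregator: coordinate-wise median of client
# updates. Class and method names are placeholders, not the actual
# Scaleout Edge server-functions interface.
class MedianAggregator:
    def __init__(self):
        self.updates = []

    def on_model_update(self, params):
        """Queue a validated client update (a list of parameters)."""
        self.updates.append(params)

    def combine_models(self):
        """Reduce queued updates to their coordinate-wise median."""
        combined = []
        for values in zip(*self.updates):
            ordered = sorted(values)
            mid = len(ordered) // 2
            if len(ordered) % 2:
                combined.append(ordered[mid])
            else:
                combined.append((ordered[mid - 1] + ordered[mid]) / 2)
        self.updates = []   # clear the queue for the next round
        return combined


agg = MedianAggregator()
agg.on_model_update([1.0, 10.0])
agg.on_model_update([2.0, 20.0])
agg.on_model_update([9.0, 30.0])   # an outlier in the first coordinate
model = agg.combine_models()       # -> [2.0, 20.0]
```

Note how the outlier value 9.0 does not drag the first coordinate upward the way a mean would; robustness of this kind is a common motivation for replacing the default aggregator.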


.. meta::
:description lang=en:
Aggregators are responsible for combining client model updates into a combiner-level global model.
:keywords: Federated Learning, Aggregators, Federated Learning Framework, Federated Learning Platform, FEDn, Scaleout Systems
:keywords: Federated Learning, Aggregators, Federated Learning Framework, Federated Learning Platform, FEDn, Scaleout Systems, Scaleout Edge
32 changes: 16 additions & 16 deletions docs/apiclient.rst
@@ -3,9 +3,9 @@
Using the API Client
====================

FEDn comes with an *APIClient* - a Python3 library that is used to interact with FEDn programmatically.
Scaleout Edge comes with an *APIClient* - a Python3 library that is used to interact with your project programmatically.

This guide assumes that the user has aleady taken the :ref:`quickstart-label` tutorial. If this is not the case, please start there to learn how to set up a FEDn Studio project and learn
This guide assumes that the user has already taken the :ref:`quickstart-label` tutorial. If this is not the case, please start there to learn how to set up a Scaleout Edge project and learn
to connect clients. In this guide we will build on that same PyTorch example (MNIST), showing how to use the APIClient to control training sessions, use different aggregators, and to retrieve models and metrics.

**Installation**
@@ -14,47 +14,47 @@ The APIClient is available as a Python package on PyPI, and can be installed usi

.. code-block:: bash

$ pip install fedn
$ pip install scaleout

**Connect the APIClient to the FEDn project**
**Connect the APIClient to the Scaleout Edge project**

To access the API you need the URL to the controller-host, as well as an admin API token. You
obtain these from your Studio project. Navigate to your "Project settings" and copy the "Project url", this is the controller host address:
obtain these from your Scaleout Edge project. Navigate to your "Project settings" and copy the "Project url"; this is the controller host address:

.. image:: img/find_controller_url.png

To obtain an admin API token press "Generate" in the "Generate Admin token" section and copy the token:

.. image:: img/generate_admin_token.png

To initalize the connection to the FEDn REST API:
To initalize the connection to the Scaleout REST API:
To initialize the connection to the Scaleout REST API:

.. code-block:: python

>>> from fedn import APIClient
>>> from scaleout import APIClient
>>> client = APIClient(host="<controller-host>", token="<access-token>", secure=True, verify=True)

Alternatively, the access token can be sourced from an environment variable.

.. code-block:: bash

$ export FEDN_AUTH_TOKEN=<access-token>
$ export SCALEOUT_AUTH_TOKEN=<access-token>

Then passing a token as an argument is not required.

.. code-block:: python

>>> from fedn import APIClient
>>> from scaleout import APIClient
>>> client = APIClient(host="<controller-host>", secure=True, verify=True)

We are now ready to work with the API.

We assume here that you have worked through steps 1-2 in the quickstart tutorial, i.e. that you have created the compute package and seed model on your local machine.
In the next step, we will use the API to upload these objects to the Studio project (corresponding to step 3 in the quickstart tutorial).
In the next step, we will use the API to upload these objects to the Scaleout Edge project (corresponding to step 3 in the quickstart tutorial).

**Set the active compute package and seed model**

To set the active compute package in the FEDn Studio Project:
To set the active compute package in the Scaleout Edge Project:

.. code:: python

@@ -78,7 +78,7 @@ using the default aggregator (FedAvg):
>>> model_id = models[-1]['model']
>>> validations = client.get_validations(model_id=model_id)

You can follow the progress of the training in the Studio UI.
You can follow the progress of the training in the Scaleout Edge UI.

To run a session with the FedAdam aggregator using custom hyperparameters:

@@ -143,14 +143,14 @@ To get a specific session:

>>> session = client.get_session(id="session_id")

For more information on how to use the APIClient, see the :py:mod:`fedn.network.api.client`.
For more information on how to use the APIClient, see the :py:mod:`scaleout-client.scaleout.network.api.client`.
There is also a collection of Jupyter Notebooks showcasing more advanced use of the API, including how to work with other built-in aggregators and how to automate hyperparameter tuning:

- `API Example <https://github.com/scaleoutsystems/fedn/tree/master/examples/api-tutorials>`_ .
- `API Example <https://github.com/scaleoutsystems/scaleout-client/python/examples/api-tutorials>`_ .


.. meta::
:description lang=en:
FEDn comes with an APIClient - a Python3 library that can be used to interact with FEDn programmatically.
:keywords: Federated Learning, APIClient, Federated Learning Framework, Federated Learning Platform, FEDn, Scaleout Systems
Scaleout Edge comes with an APIClient - a Python3 library that can be used to interact with Scaleout Edge programmatically.
:keywords: Federated Learning, APIClient, Federated Learning Framework, Federated Learning Platform, FEDn, Scaleout Systems, Scaleout Edge

114 changes: 82 additions & 32 deletions docs/architecture.rst
@@ -3,57 +3,107 @@
Architecture overview
=====================

Constructing a federated model with FEDn amounts to a) specifying the details of the client-side training code and data integrations, and b) deploying the federated network. A FEDn network, as illustrated in the picture below, is made up of components into three different tiers: the *Controller* tier (3), one or more *Combiners* in second tier (2), and a number of *Clients* in tier (1).
The combiners forms the backbone of the federated ML orchestration mechanism, while the Controller tier provides discovery services and controls to coordinate training over the federated network.
By horizontally scaling the number of combiners, one can meet the needs of a growing number of clients.
This page provides an overview of the **Scaleout Edge architecture**. What
follows is a conceptual description of the components that make up a Scaleout
Edge network and how they interact during a federated training session.

A Scaleout Edge network consists of three tiers:

- **Tier 1: Clients**
- **Tier 2: Combiners**
- **Tier 3: Controller and supporting services**

These components work together to coordinate distributed, privacy-preserving
machine learning across a large number of participating data nodes.

.. image:: img/FEDn_network.png
:alt: FEDn network
.. image:: img/Scaleout_Edge_network.png
:alt: Scaleout Edge network
:width: 100%
:align: center

Tier 1 — Clients
----------------

A **Client** (gRPC client) is a data node holding private data and connecting to
a Combiner (gRPC server) to receive training tasks and validation requests during
federated sessions.

Key characteristics:

- Clients communicate **outbound only** using RPC.
No inbound or publicly exposed ports are required.
- Upon connecting to the network, a client receives a **compute package** from the
  Controller, or uses one that is locally available. This package contains, for
  example, the training and validation code to execute locally.
- The compute package is defined by entry points in the client code and can be
customized to support various model types, frameworks, and even programming
languages.

Python, C++ and Kotlin client implementations are provided out-of-the-box, but clients may be
implemented in any language to suit specific hardware or software environments.
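The outbound-only pattern described above can be sketched as a simple polling loop: the client repeatedly asks for work over a connection it initiated, so it never listens on a port. The function names and the task tuples below are stand-ins for the real gRPC client, not the actual Scaleout Edge API.

```python
# Conceptual sketch of a client's outbound-only communication loop.
# fetch_task stands in for an outbound RPC to the combiner; train and
# validate stand in for the compute-package entry points.
def run_client(fetch_task, train, validate, max_tasks=10):
    """Poll for tasks and dispatch them to local training/validation code."""
    results = []
    for _ in range(max_tasks):
        task = fetch_task()            # outbound call: "any work for me?"
        if task is None:               # no more work; disconnect
            break
        kind, params = task
        if kind == "train":
            results.append(("update", train(params)))
        elif kind == "validate":
            results.append(("metrics", validate(params)))
    return results


# Stand-in task stream: one training task, one validation task, then done.
tasks = iter([("train", [0.0]), ("validate", [0.5]), None])
out = run_client(
    lambda: next(tasks),
    train=lambda p: [x + 1 for x in p],      # dummy local training step
    validate=lambda p: {"loss": p[0]},       # dummy local validation
)
```

Because every exchange starts on the client side, this pattern works behind NATs and strict firewalls, which is exactly why no inbound ports are required.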

Tier 2 — Combiners
------------------

A **Combiner** orchestrates and aggregates model updates coming from its local
group of clients. It is responsible for the mid-level federated learning workflow.

Key responsibilities:

- Running a dedicated gRPC server for interacting with clients and the Controller.
- Executing the orchestration plan defined in the global **compute plan**
provided by the Controller.
- Reducing client model updates into a single **combiner-level model**.

Because each Combiner operates independently, the total number of clients that
can be supported scales with the number of deployed Combiners. Combiners may be
placed in the cloud, on fog/edge nodes, or in any environment suited for running
the aggregation service.

**The clients: tier 1**
Tier 3 — Controller and base services
-------------------------------------

A Client (gRPC client) is a data node, holding private data and connecting to a Combiner (gRPC server) to receive model update requests and model validation requests during training sessions.
Importantly, clients uses remote procedure calls (RPC) to ask for model updates tasks, thus the clients not require any open ingress ports! A client receives the code (called package or compute package) to be executed from the *Controller*
upon connecting to the network, and thus they only need to be configured prior to connection to read the local datasets during training and validation. The package is based on entry points in the client code, and can be customized to fit the needs of the user.
This allows for a high degree of flexibility in terms of what kind of training and validation tasks that can be performed on the client side. Such as different types of machine learning models and framework, and even programming languages.
A python3 client implementation is provided out of the box, and it is possible to write clients in a variety of languages to target different software and hardware requirements.
Tier 3 contains several services, with the **Controller** being the central
component coordinating global training. The Controller has three primary roles:

**The combiners: tier 2**
1. **Global orchestration**
It defines the overall training strategy, distributes the compute plan, and
specifies how combiner-level models should be combined into a global model.

A combiner is an actor whose main role is to orchestrate and aggregate model updates from a number of clients during a training session.
When and how to trigger such orchestration are specified in the overall *compute plan* laid out by the *Controller*.
Each combiner in the network runs an independent gRPC server, providing RPCs for interacting with the federated network it controls.
Hence, the total number of clients that can be accommodated in a FEDn network is proportional to the number of active combiners in the FEDn network.
Combiners can be deployed anywhere, e.g. in a cloud or on a fog node to provide aggregation services near the cloud edge.
2. **Global state management**
The Controller maintains the **model trail**—an immutable record of global
model updates forming the training timeline.

**The controller: tier 3**
3. **Discovery and connectivity**
It provides discovery services and mediates connections between clients and
combiners. For this purpose, the Controller exposes a standard REST API used
by RPC clients/servers and by user interfaces.

Tier 3 does actually contain several components and services, but we tend to associate it with the *Controller* the most. The *Controller* fills three main roles in the FEDn network:
Additional Tier 3 services include:

1. it lays out the overall, global training strategy and communicates that to the combiner network.
It also dictates the strategy to aggregate model updates from individual combiners into a single global model,
2. it handles global state and maintains the *model trail* - an immutable trail of global model updates uniquely defining the federated ML training timeline, and
3. it provides discovery services, mediating connections between clients and combiners. For this purpose, the *Controller* exposes a standard REST API both for RPC clients and servers, but also for user interfaces and other services.
- **Reducer**
Aggregates the combiner-level models into a single global model.

Tier 3 also contain a *Reducer* component, which is responsible for aggregating combiner-level models into a single global model. Further, it contains a *StateStore* database,
which is responsible for storing various states of the network and training sessions. The final global model trail from a traning session is stored in the *ModelRegistry* database.
- **StateStore**
Stores the state of the network, training sessions, and metadata.

- **ModelRegistry**
Stores the final global model trail after a completed training session.

**Notes on aggregating algorithms**
Notes on aggregation algorithms
-------------------------------

FEDn is designed to allow customization of the FedML algorithm, following a specified pattern, or programming model.
Model aggregation happens on two levels in the network. First, each Combiner can be configured with a custom orchestration and aggregation implementation, that reduces model updates from Clients into a single, *combiner level* model.
Then, a configurable aggregation protocol on the *Controller* level is responsible for combining the combiner-level models into a global model. By varying the aggregation schemes on the two levels in the system,
many different possible outcomes can be achieved. Good starting configurations are provided out-of-the-box to help the user get started. See :ref:`agg-label` and API reference for more details.
Scaleout Edge includes several **built-in aggregators** for common FL workflows
(see :ref:`agg-label`). For advanced scenarios, users may override the
Combiner-level behavior using **server functions** (:ref:`server-functions`),
allowing custom orchestration or aggregation logic.

Aggregation happens in two stages:

1. Each Combiner reduces client updates into a *combiner-level model*.
2. The Controller (Reducer) combines these into the final global model.
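The two-stage reduction can be illustrated with a single weighted-mean helper applied at both levels. This is a conceptual sketch only; the weights and function names are assumptions, chosen here so that the two-stage result matches a flat average over all clients.

```python
# Sketch of two-stage aggregation: combiners average their own clients'
# updates, then the controller-level Reducer averages the combiner
# models, weighted by how many clients each combiner served.
def weighted_mean(models_with_weights):
    """Weighted average of equally-sized parameter lists."""
    total = sum(w for _, w in models_with_weights)
    size = len(models_with_weights[0][0])
    return [
        sum(m[i] * w for m, w in models_with_weights) / total
        for i in range(size)
    ]


# Stage 1: each combiner reduces its clients' updates independently.
combiner_a = weighted_mean([([1.0], 1), ([3.0], 1)])   # two clients -> [2.0]
combiner_b = weighted_mean([([5.0], 1)])               # one client  -> [5.0]

# Stage 2: the Reducer combines combiner-level models, weighted by
# client count, giving the same result as averaging all three clients.
global_model = weighted_mean([(combiner_a, 2), (combiner_b, 1)])  # -> [3.0]
```

Varying the reduction used at either stage (e.g. a robust median at the combiner level, a weighted mean at the controller) is what gives the two-level design its flexibility.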

.. meta::
:description lang=en:
Architecture overview - An overview of the FEDn federated learning platform architecture.
:keywords: Federated Learning, Architecture, Federated Learning Framework, Federated Learning Platform, FEDn, Scaleout Systems
Architecture overview - An overview of the Scaleout Edge federated learning platform architecture.
:keywords: Federated Learning, Architecture, Federated Learning Framework, Federated Learning Platform, FEDn, Scaleout Systems, Scaleout Edge
