Commit 90996c0

Aggregator shouldn't refer to persistence DB in case of evaluation (#1632)
* Enhance Collaborator module with improved error handling and documentation. Refactor the `run` method to include detailed error logging and a new `_execute_collaborator_rounds` method for better task management. Update the `start_` function to handle exceptions during collaborator initialization and execution, ensuring critical errors are logged and communicated to the user. Additionally, improve type hints and docstrings for better code clarity and maintainability.

  Signed-off-by: Rahul Garg <[email protected]>

* Improve error logging in Collaborator class by ensuring consistent formatting of log messages. This change enhances the clarity of critical error messages and maintains a uniform style across the logging functionality.

  Signed-off-by: Rahul Garg <[email protected]>

* .DS_Store banished!

* Refine persistent checkpoint handling in Aggregator class to prevent recovery during evaluation mode. Update logging messages for clarity on checkpoint status. This change ensures that the system behaves correctly in different operational contexts.

  Signed-off-by: Rahul Garg <[email protected]>

* Refine logging message in Aggregator class to clarify conditions for persistent checkpoint usage. The updated message improves understanding by specifying when checkpoints are disabled or when the experiment is in evaluation mode.

  Signed-off-by: Rahul Garg <[email protected]>

---------

Signed-off-by: Rahul Garg <[email protected]>
1 parent 6524c75 commit 90996c0

1 file changed: +8 -4 lines changed

openfl/component/aggregator/aggregator.py

Lines changed: 8 additions & 4 deletions
@@ -150,15 +150,17 @@ def __init__(
         self.quit_job_sent_to = []
 
         self.tensor_db = TensorDB()
-        if persist_checkpoint:
+        if persist_checkpoint and not self.assigner.is_task_group_evaluation():
             persistent_db_path = persistent_db_path or "tensor.db"
             logger.info(
                 "Persistent checkpoint is enabled, setting persistent db at path %s",
                 persistent_db_path,
             )
             self.persistent_db = PersistentTensorDB(persistent_db_path)
         else:
-            logger.info("Persistent checkpoint is disabled")
+            logger.info(
+                "Either persistent checkpoint is disabled or the experiment is in evaluation mode"
+            )
             self.persistent_db = None
         # FIXME: I think next line generates an error on the second round
         # if it is set to 1 for the aggregator.
@@ -225,8 +227,10 @@ def __init__(
 
         self.secagg = SecAggSetup(self.uuid, self.authorized_cols, self.tensor_db)
 
-        if self.persistent_db and self._recover():
-            logger.info("Recovered state of aggregator")
+        # Only recover from persistent DB if not in evaluation mode
+        if self.persistent_db and not self.assigner.is_task_group_evaluation():
+            if self._recover():
+                logger.info("Recovered state of aggregator")
 
         # TODO: Aggregator has no concrete notion of round_begin.
         # https://github.com/securefederatedai/openfl/pull/1195#discussion_r1879479537
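
The guard introduced by this change can be illustrated in isolation. The following is a minimal sketch and not the OpenFL implementation: `StubAssigner` and `resolve_persistent_db_path` are hypothetical names used only for illustration, and only the `is_task_group_evaluation()` method name and the `"tensor.db"` default are taken from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubAssigner:
    """Hypothetical stand-in for the aggregator's assigner (not an OpenFL class)."""

    evaluation: bool

    def is_task_group_evaluation(self) -> bool:
        # Mirrors the method name used in the diff; the body is an assumption.
        return self.evaluation


def resolve_persistent_db_path(
    persist_checkpoint: bool,
    assigner: StubAssigner,
    persistent_db_path: Optional[str] = None,
) -> Optional[str]:
    """Return the path a persistent DB would be opened at, or None if it is skipped."""
    # Same condition as the changed `if` in the first hunk.
    if persist_checkpoint and not assigner.is_task_group_evaluation():
        return persistent_db_path or "tensor.db"
    return None


if __name__ == "__main__":
    # Walk through all four combinations of the two flags.
    for persist in (True, False):
        for evaluation in (True, False):
            path = resolve_persistent_db_path(persist, StubAssigner(evaluation))
            print(f"persist_checkpoint={persist} evaluation={evaluation} -> {path}")
```

Under this sketch, a path is returned only when checkpointing is requested and the task group is not an evaluation group. In the diff itself, evaluation mode already leaves `self.persistent_db` as `None`, so the extra check before `_recover()` appears to act as a second safeguard, assuming nothing between the two hunks reassigns `self.persistent_db`.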
