-
Notifications
You must be signed in to change notification settings - Fork 661
[Optimize] Reduce comm overhead of engine-worker by obtaining requests asynchronously #5105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -159,6 +159,27 @@ def __init__(self, fd_config: FDConfig, ranks: int = 1, local_rank: int = 0) -> | |||||||||
|
|
||||||||||
| self.max_chips_per_node = 16 if current_platform.is_iluvatar() else 8 | ||||||||||
|
|
||||||||||
| def _exist_tasks_from_engine(self): | ||||||||||
| """ | ||||||||||
| Check if there exists new tasks sent from engine process | ||||||||||
| """ | ||||||||||
| if envs.DISABLE_ENGINE_WORKER_ASYNC_TASK_COMM: | ||||||||||
| return self.task_queue.num_tasks() > 0 | ||||||||||
| else: | ||||||||||
| return self.local_synced_tasks is not None | ||||||||||
|
|
||||||||||
| def _get_tasks_from_engine(self): | ||||||||||
| """ | ||||||||||
| Get new tasks that sent from engine process | ||||||||||
| """ | ||||||||||
| if envs.DISABLE_ENGINE_WORKER_ASYNC_TASK_COMM: | ||||||||||
| return self.task_queue.get_tasks() | ||||||||||
| else: | ||||||||||
| new_tasks, read_finished = self.local_synced_tasks, self.all_local_tp_synced | ||||||||||
Jiang-Jia-Jun marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| self.local_synced_tasks = None | ||||||||||
| self.all_local_tp_synced = False | ||||||||||
| return new_tasks, read_finished | ||||||||||
|
||||||||||
|
|
||||||||||
| def init_health_status(self) -> None: | ||||||||||
| """ | ||||||||||
| Initialize the health status of the worker. | ||||||||||
|
|
@@ -406,7 +427,7 @@ def event_loop_normal(self) -> None: | |||||||||
|
|
||||||||||
| # The first worker detects whether there are tasks in the task queue | ||||||||||
| if tp_rank == 0: | ||||||||||
| if self.task_queue.num_tasks() > 0: | ||||||||||
| if self._exist_tasks_from_engine(): | ||||||||||
|
||||||||||
| if self._exist_tasks_from_engine(): | |
| if self._exist_requests_from_engine(): |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Logging inconsistency: When async mode is enabled, this log message will output the raw object self.local_synced_tasks which could be None or a tuple/list. This differs from the original behavior which logged the number of tasks. Consider:
if envs.DISABLE_ENGINE_WORKER_ASYNC_TASK_COMM:
logger.info(f"current task queue data: {self.task_queue.num_tasks()}")
else:
logger.info(f"current task queue data: {self.local_synced_tasks}")Or provide more descriptive logging:
task_info = self.task_queue.num_tasks() if envs.DISABLE_ENGINE_WORKER_ASYNC_TASK_COMM else len(self.local_synced_tasks) if self.local_synced_tasks else 0
logger.info(f"current task queue data: {task_info}")| logger.info(f"current task queue data: {self.local_synced_tasks}") | |
| logger.info(f"current task queue data: {len(self.local_synced_tasks) if self.local_synced_tasks else 0}") |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable name inconsistency: The variable is defined as self.local_synced_requests (line 164), but it's referenced as self.local_synced_tasks here. This will cause an AttributeError at runtime.
| logger.info(f"current task queue data: {self.local_synced_tasks}") | |
| logger.info(f"current task queue data: {self.local_synced_requests}") |
Jiang-Jia-Jun marked this conversation as resolved.
Show resolved
Hide resolved
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method name inconsistency: The method is defined as _get_requests_from_engine (line 177), but it's called as self.task_queue._get_tasks_from_engine() here. This will cause an AttributeError at runtime. Should be self._get_requests_from_engine() instead.
| tasks, read_finish = self.task_queue._get_tasks_from_engine() | |
| tasks, read_finish = self._get_requests_from_engine() |
Uh oh!
There was an error while loading. Please reload this page.