feat(spider-py): Add support for client-end task graph grouping and chaining. #191
Merged: sitaowang1998 merged 207 commits into y-scope:main from sitaowang1998:python_client_task on Sep 3, 2025
207 commits
All commits below were authored by sitaowang1998.
d2ac198 Update yscope-dev-utils
cc8a097 Use boost install task
7f68bdc Update install task variable names
2394649 Set CMP0074 to NEW to find boost
6ff3e33 Add uv to install script
7ae8a78 Fix cpp-lint root paths
56502f9 Fix clang-tidy file pattern
6f6becf Limit build parallelism
a74523d Bug fix
2345d22 Merge branch 'dep-concurrency' into yscope-dev-utils
77ee494 Bug fix
38b86c8 Merge branch 'dep-concurrency' into yscope-dev-utils
3698339 Bug fix
369e9f1 Rename variables to mirror CLP core
90aa5a2 Rename variables to mirror clp core
1769c95 Merge branch 'dep-concurrency' of github.com:sitaowang1998/spider int…
edaa834 Merge branch 'dep-concurrency' into yscope-dev-utils
7faac8f Revert "Merge branch 'dep-concurrency' of github.com:sitaowang1998/sp…
f494a90 Add comment for deps parallelism default value
eb01bb2 Merge branch 'dep-concurrency' into yscope-dev-utils
ff2fe1c Update yscope-dev-utils
d476e42 Merge branch 'main' into yscope-dev-utils
850126d Merge branch 'yscope-dev-utils' into python_lint
65841a0 Add latest python lint config files
8564fc2 Update ruff lint tasks
a6e7d29 Fix ruff lint
573b448 Fix ruff lint
6ad72c3 Bug fix
ba7c6e5 Fix ruff
209acb1 Fix ruff
406b514 Reformat files
66892f5 Remove .inc from cpp linting
80f0a10 Merge branch 'yscope-dev-utils' into python_lint
3186a42 Add mypy and merge lint and test requirements.txt
0925938 Fix mysql connection type
6936a9b Fix socket name type
7407090 Fix return type from db cursor
6a95b24 Fix db cursor return type
f29dcba Fix mypy import untyped
453f209 Fix mypy and Popen
d1011c7 Fix generator type hint
a7dc642 Fix mypy
a7d92c2 Fix ruff
1c6213b Simply socket return types.
0f13d07 Merge branch 'main' into python_lint
14d6326 Merge branch 'python_lint' into mypy_lint
55a727a Add tombi lint tasks
c98fea1 Rename cpp build tasks
ef64454 Merge branch 'main' into mypy_lint
09ba8ad Merge branch 'main' into tombi
4e7237e Add basic python structure
1f944d0 Fix code structure
68992fc Fix ruff lint
1127b22 Extend lint tasks to python directory
4c51c91 Add python build tasks
56878b2 Merge branch 'main' into mypy_lint
bb428c9 Merge branch 'mypy_lint' into tombi
944bd5e Merge branch 'tombi' into build-task
d75bc15 Merge branch 'build-task' into python_structure
4b7eca2 Merge branch 'main' into tombi
8ddbda7 Merge branch 'tombi' into build-task
0d998ec Merge branch 'build-task' into python_structure
6004d64 Fix yaml lint error
bcab5db Fix tombi lint
a50d25a Remove wrong mypy config
fbebb96 Fix typo and format file
0322bd5 Use typed msgpack and remove mypy config for msgpack from pyproject
2a08155 Update uv lock
dd29c23 Add task module
570cf2f Merge branch 'python_structure' into python_task
9b2a175 Add data and improve type alias
32afe0a Add task graph and reformat files
0b62f73 Merge branch 'main' into tombi
7384158 Merge branch 'main' into build-task
70ef90a Merge branch 'main' into python_structure
1aae0cb Merge branch 'main' into python_task
2f9b4a0 Fix typo
4e080ac Fix set remove key error
b09515c Increase min version of tombi
bad9675 Merge branch 'tombi' into build-task
e009ff4 Merge branch 'tombi' into python_structure
f11a3b6 Merge branch 'python_structure' into python_task
a0407a7 Merge branch 'main' into tombi
aa9a8de Merge branch 'tombi' into build-task
3c7e794 Merge branch 'tombi' into python_structure
ca4ec29 Merge branch 'python_structure' into python_task
b4d6576 Add uv in README
23f31cd Merge branch 'main' into build-task
0167ac0 Merge branch 'build-task' into python_structure
5a9cb76 Merge branch 'python_structure' into python_task
f447614 Fix redenduncy caused by merge
a84de83 Merge branch 'build-task' into python_structure
f94460f Merge branch 'python_structure' into python_task
2d1b6fc Merge branch 'main' into python_structure
5a35122 Merge branch 'main' into python_task
061d101 Merge branch 'main' into python_structure
451a80e Merge branch 'main' into python_task
de149ee Add integral types
d332dbe Revert "Add integral types"
80b6772 Add package export control for type
f2ebaa6 Add floating point type
8c27b16 Restructure under src/spider
1b7ab77 Merge branch 'python_structure' into python_task
b7f1884 Merge branch 'python_structure' into python_type
a7aa6b0 Re-export spider.type under spider
95b98c8 Merge branch 'main' into python_structure
e7b119c Merge branch 'main' into python_task
8ffbd03 Fix import path
5765acf Add tdl types; Allow no docstring for override function
427dd6b Merge branch 'main' into python_structure
a9c9e57 Merge branch 'main' into python_task
213d2fc Add type conversion to tdl type
a1a86f3 Add to_tdl_str
97af71b Add to_tdl_type_str to type package export
74f31e2 Add pytest and basic test structure
7d95706 Add python test tasks
96b5324 Rename some cpp tests and add python tests to GH workflow and doc
14ea6fb Don't create __pycache__ when running pytest
854cd11 Fix missing link
5565c9e Merge branch 'python_structure' of github.com:sitaowang1998/spider in…
a9791e5 Merge branch 'python_structure' into python_task
2504429 Merge branch 'python_structure' into python_test_setup
7ea4928 Use task env
01dd0c6 Fix task executor path
c6ccce6 Merge branch 'python_structure' into python_task
4cec09b Merge branch 'python_structure' into python_test_setup
e022404 Merge branch 'main' into python_structure
8894b90 Merge branch 'main' into python_test_setup
01c9c64 Merge branch 'main' into python_task
7753153 Merge branch 'main' into python_structure
dd59915 Merge branch 'main' into python_test_setup
b86d47c Merge branch 'main' into python_task
ac1d4ee Merge branch 'python_structure' into python_type
0d38c43 Merge branch 'python_test_setup' into python_type
438f6e7 Remove unnecessary __init__.py files
2caad5d Merge branch 'python_structure' into python_test_setup
c6815ce Merge branch 'python_test_setup' into python_type
61e0651 Fix bugs
8b13abd Add some type convertion tests
50b2b25 Bug fix
519dcd8 Merge branch 'python_test_setup' into python_type
8e2786b Add more tests
a8c8641 Fix pytest
9fabc44 Merge branch 'python_test_setup' into python_type
e558f67 Improve code according to coderabbit
09d3fd0 Merge branch 'python_type' into python_client_task
ae2a935 Add reset id
008c6fb Export core
cd60dc6 Merge branch 'python_task' into python_client_task
720265f Add client task graph
ebe64c6 Add client taskgraph to spider export
404cf3b Excempt _impl from private access check
3f9fe04 Add basic TaskFunction defintion
4d10f88 "Add default values for tasks"
fe4121f Merge branch 'python_task' into python_client_task
f3ac001 Fix task IO type and add create_task from function
95b7773 Add client task group
a72e4a6 Add task graph chain
f289e5a Reset ids after chain
2e829bc Add export
ec6802d Export TaskContext
3685dd7 Add unit tests
2862f01 Fix tuple check
03af08d Fix size check in chain
c3dda18 Fix chain dependencies
2a1f33e Fix id collision
0e4452c Add more tests and more fixes
3705ef3 Satisfy ruff
fb8938a Satisfy mypy
9ed09be Bug fix
ef827a3 Remove unnecessary reset_ids
220a2b0 Fix task input id reset
56f7866 Fix comment grammar
638d577 Fix tuple check
2bcc4a4 Add guard for function with no argument
aa1723e Disallow varargs in task function
2469008 Disallow variable tuple return in task function
082476d Use identity check for Parameter.empty sentinel
72239f7 Fix error msg for ruff
e13acec Make create_task private
84a6256 Split create_task into multiple functions
fe904b0 Bug fix
75793de Bug fix
72a0920 Bug fix
fecdc68 Bug fix
08ce667 Restructure code to perpare for merge
4dc9623 Merge branch 'main' into python_client_task
ff76756 Style improvement
b9607ae Use index instead of uuid to identify task in graph
75c8b76 Rename files
8e98c1e Fix task input refs
fe306bc Add test for task input output ref size
d0fb7e2 Avoid deep copy tasks
6fe936d Fix chain task input output refs
82dab16 Merge branch 'main' into python_client_task
5ef3520 Add spider-py unit test tasks
b879134 Revert "Add spider-py unit test tasks"
831e399 Apply suggestions from code review
ff7f06a Wrap intput output tuple in a class
a09b499 Rename variables to clarify indices
1051b51 Merge branch 'main' into python_client_task
bca62dd Apply suggestions from code review
437b39d Update function rename
2c28e9d Rename functions
0e87a54 Merge branch 'main' into python_client_task
6af6770 Use RAII for task creation
608036e Apply suggestions from code review
@@ -1 +1,11 @@

    """Spider python client."""

    from .task import TaskContext
    from .task_graph import chain, group, TaskGraph

    __all__ = [
        "TaskContext",
        "TaskGraph",
        "chain",
        "group",
    ]
@@ -0,0 +1,5 @@

    """Spider client Data module."""


    class Data:
        """Represents a spider client data."""
@@ -0,0 +1,108 @@

    """Spider client task module."""

    import inspect
    from collections.abc import Callable
    from types import FunctionType, GenericAlias
    from typing import get_args, get_origin

    from spider_py import core
    from spider_py.client.data import Data
    from spider_py.core import TaskInput, TaskOutput, TaskOutputData, TaskOutputValue
    from spider_py.type import to_tdl_type_str


    class TaskContext:
        """Spider task context."""

        # TODO: Implement task context for use in task executor


    # NOTE: This type alias is for clarification purposes only. It does not enforce static type checks.
    # Instead, we rely on the runtime check to ensure the first argument is `TaskContext`. To statically
    # enforce the first argument to be `TaskContext`, `Protocol` is required, which is not compatible
    # with `Callable` without explicit type casting.
    TaskFunction = Callable[..., object]


    def _is_tuple(t: type | GenericAlias) -> bool:
        """
        :param t:
        :return: Whether t is a tuple.
        """
        return get_origin(t) is tuple


    def _validate_and_convert_params(signature: inspect.Signature) -> list[TaskInput]:
        """
        Validates the task parameters and converts them into a list of `core.TaskInput`.
        :param signature:
        :return: The converted task parameters.
        :raises TypeError: If the parameters are invalid.
        """
        params = list(signature.parameters.values())
        inputs = []
        if not params or params[0].annotation is not TaskContext:
            msg = "First argument is not a TaskContext."
            raise TypeError(msg)
        for param in params[1:]:
            if param.kind in {inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD}:
                msg = "Variadic parameters are not supported."
                raise TypeError(msg)
            if param.annotation is inspect.Parameter.empty:
                msg = "Parameters must have type annotation."
                raise TypeError(msg)
            tdl_type_str = to_tdl_type_str(param.annotation)
            inputs.append(TaskInput(tdl_type_str, None))
        return inputs


    def _validate_and_convert_return(signature: inspect.Signature) -> list[TaskOutput]:
        """
        Validates the task returns and converts them into a list of `core.TaskOutput`.
        :param signature:
        :return: The converted task returns.
        :raises TypeError: If the return type is invalid.
        """
        returns = signature.return_annotation
        outputs = []
        if returns is inspect.Parameter.empty:
            msg = "Return type must have type annotation."
            raise TypeError(msg)

        if not _is_tuple(returns):
            tdl_type_str = to_tdl_type_str(returns)
            if returns is Data:
                outputs.append(TaskOutput(tdl_type_str, TaskOutputData()))
            else:
                outputs.append(TaskOutput(tdl_type_str, TaskOutputValue()))
            return outputs

        args = get_args(returns)
        if Ellipsis in args:
            msg = "Variable-length tuple return types are not supported."
            raise TypeError(msg)
        for arg in args:
            tdl_type_str = to_tdl_type_str(arg)
            if arg is Data:
                outputs.append(TaskOutput(tdl_type_str, TaskOutputData()))
            else:
                outputs.append(TaskOutput(tdl_type_str, TaskOutputValue()))
        return outputs


    def create_task(func: TaskFunction) -> core.Task:
        """
        Creates a core Task object from the task function.
        :param func:
        :return: The created core Task object.
        :raise TypeError: If the function signature contains unsupported types.
        """
        if not isinstance(func, FunctionType):
            msg = "`func` is not a function."
            raise TypeError(msg)
        signature = inspect.signature(func)
        return core.Task(
            function_name=func.__qualname__,
            task_inputs=_validate_and_convert_params(signature),
            task_outputs=_validate_and_convert_return(signature),
        )
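
To make the signature checks in this file easier to follow without installing `spider_py`, here is a minimal standalone sketch. It assumes a stand-in `TaskContext` class, and `describe_signature` is a hypothetical helper that returns the raw annotation objects instead of `core.TaskInput` and `core.TaskOutput`. It follows the same order of checks as the diff but is not the package's API.

```python
import inspect
from typing import get_args, get_origin


class TaskContext:
    """Stand-in for the client TaskContext (assumption for this sketch)."""


def describe_signature(func):
    """Hypothetical helper: return (input annotations, output annotations) or raise TypeError."""
    signature = inspect.signature(func)
    params = list(signature.parameters.values())
    # The first parameter must be annotated with TaskContext itself.
    if not params or params[0].annotation is not TaskContext:
        raise TypeError("First argument is not a TaskContext.")
    inputs = []
    for param in params[1:]:
        # *args and **kwargs have no fixed arity, so they are rejected.
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            raise TypeError("Variadic parameters are not supported.")
        if param.annotation is inspect.Parameter.empty:
            raise TypeError("Parameters must have type annotation.")
        inputs.append(param.annotation)

    returns = signature.return_annotation
    if returns is inspect.Signature.empty:
        raise TypeError("Return type must have type annotation.")
    # Only parameterized tuples such as tuple[str, int] have origin `tuple`.
    if get_origin(returns) is not tuple:
        return inputs, [returns]
    args = get_args(returns)
    # tuple[int, ...] carries Ellipsis as its second type argument.
    if Ellipsis in args:
        raise TypeError("Variable-length tuple return types are not supported.")
    return inputs, list(args)


def add(ctx: TaskContext, x: int, y: int) -> int:
    return x + y


def split(ctx: TaskContext, s: str) -> tuple[str, int]:
    return s, len(s)


def bad(ctx: TaskContext, *values: int) -> int:
    return sum(values)
```

In this sketch, `describe_signature(split)` yields one input (`str`) and two outputs (`str`, `int`), while `describe_signature(bad)` raises `TypeError` because of the `*values` parameter. Note that the annotations must be real objects at inspection time: a task module that uses `from __future__ import annotations` would expose them as strings, and the identity check against `TaskContext` would then fail.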
@@ -0,0 +1,67 @@

    """Spider client TaskGraph module."""

    from __future__ import annotations

    from typing import TYPE_CHECKING

    from spider_py import core
    from spider_py.client.task import create_task, TaskFunction

    if TYPE_CHECKING:
        from collections.abc import Sequence


    class TaskGraph:
        """
        Represents a client-side task graph.

        This class is a wrapper of `spider_py.core.TaskGraph`.
        """

        def __init__(self) -> None:
            """Initializes TaskGraph."""
            self._impl = core.TaskGraph()


    def group(tasks: Sequence[TaskFunction | TaskGraph]) -> TaskGraph:
        """
        Groups task functions and task graphs into a single task graph.
        :param tasks: List of task functions or task graphs.
        :return: The new task graph.
        """
        graph = TaskGraph()
        for task in tasks:
            if callable(task):
                graph._impl.add_task(create_task(task))
            else:
                graph._impl.merge_graph(task._impl)

        return graph


    def chain(parent: TaskFunction | TaskGraph, child: TaskFunction | TaskGraph) -> TaskGraph:
        """
        Chains two task functions or task graphs into a single task graph.
        :param parent:
        :param child:
        :return: The new task graph.
        :raises TypeError: If the parent outputs and child inputs do not match.
        """
        parent_core_graph: core.TaskGraph
        child_core_graph: core.TaskGraph

        if callable(parent):
            parent_core_graph = core.TaskGraph()
            parent_core_graph.add_task(create_task(parent))
        else:
            parent_core_graph = parent._impl

        if callable(child):
            child_core_graph = core.TaskGraph()
            child_core_graph.add_task(create_task(child))
        else:
            child_core_graph = child._impl

        graph = TaskGraph()
        graph._impl = core.TaskGraph.chain_graph(parent_core_graph, child_core_graph)
        return graph
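
The bodies of `core.TaskGraph.merge_graph` and `core.TaskGraph.chain_graph` are not part of this diff, so the following is only a toy model of the behaviour that the `group` and `chain` docstrings describe. `ToyGraph`, `toy_group`, and `toy_chain` are hypothetical names, and the rules they encode (grouping concatenates the dangling inputs and outputs; chaining requires the parent's outputs to match the child's inputs position by position) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Hypothetical stand-in that only tracks dangling input/output type names."""

    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def toy_group(graphs):
    """Assumed rule: grouped graphs sit side by side, so their inputs and outputs concatenate."""
    grouped = ToyGraph()
    for graph in graphs:
        grouped.inputs.extend(graph.inputs)
        grouped.outputs.extend(graph.outputs)
    return grouped


def toy_chain(parent, child):
    """Assumed rule: each parent output feeds the child input at the same position."""
    if len(parent.outputs) != len(child.inputs):
        raise TypeError("Parent outputs and child inputs differ in size.")
    for position, (out_type, in_type) in enumerate(zip(parent.outputs, child.inputs)):
        if out_type != in_type:
            raise TypeError(f"Type mismatch at position {position}: {out_type} vs {in_type}.")
    # The chained graph takes the parent's inputs and exposes the child's outputs.
    return ToyGraph(inputs=list(parent.inputs), outputs=list(child.outputs))


producer = ToyGraph(inputs=["int"], outputs=["int"])
consumer = ToyGraph(inputs=["int", "int"], outputs=["float"])
pipeline = toy_chain(toy_group([producer, producer]), consumer)
```

Under these assumptions, two single-output producers must be grouped before they can be chained into a two-input consumer; chaining a single `producer` directly into `consumer` raises `TypeError` in this model.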
I feel like we should move these utility functions into core's task so that the creation of a task is more like RAII. The parameter and return type checks can be inside `__init__`, right?
A major problem here is the circular import between core's Task and client's Task.
Then can we justify why we need two tasks? Like can we directly use core's Task?
Core's Task contains details that should be hidden from the user. We could change the underlying implementation later (e.g., add a language column).
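
For readers following this thread, the two ideas discussed (running the checks inside `__init__` so an invalid task object can never exist, and keeping a thin client-side wrapper that hides the core object) could look roughly like the sketch below. `CoreTask` and `ClientTask` are hypothetical names, the validation is reduced to a single rule for brevity, and this is not the code that was merged.

```python
import inspect


class CoreTask:
    """Hypothetical core-side task: construction and validation happen together."""

    def __init__(self, func):
        if not inspect.isfunction(func):
            raise TypeError("`func` is not a function.")
        signature = inspect.signature(func)
        if signature.return_annotation is inspect.Signature.empty:
            raise TypeError("Return type must have type annotation.")
        # Reaching this point means the object is fully valid.
        self.function_name = func.__qualname__
        self.return_annotation = signature.return_annotation


class ClientTask:
    """Hypothetical client-side wrapper that exposes only what users need."""

    def __init__(self, func):
        # Core details stay behind `_impl` and can change without affecting users.
        self._impl = CoreTask(func)

    @property
    def name(self):
        return self._impl.function_name


def double(x: int) -> int:
    return 2 * x


def untyped(x):
    return x
```

Here `ClientTask(double)` succeeds, while `ClientTask(untyped)` raises `TypeError` before any partially built object can escape. Because the wrapper depends on the core class and not the other way around, this layout also avoids the circular import raised earlier in the thread.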