refactor(pipeline): make StepContext generic by moving domain fields to subclasses
Strip StepContext down to sample + metadata only; domain-specific fields
(skillbook, agent_output, reflection, etc.) are added via subclassing.
Update branch merge functions to inspect subclass fields via type(ctxs[0]),
accept pre-built StepContext in run()/run_async() instead of raw samples,
and add background_stats() for monitoring background thread progress.
The engine never reads anything beyond `sample` and `metadata`. All domain-specific fields are added by subclassing.
#### Subclassing for domain fields
Consuming applications subclass `StepContext` to add named fields for concepts shared across their pipelines:
```python
@dataclass(frozen=True)
class ACEContext(StepContext):
    # Shared across all ACE pipelines
    skillbook: Skillbook | None = None
    environment: TaskEnvironment | None = None

    # Produced by steps (None until the providing step runs)
    agent_output: AgentOutput | None = None
    environment_result: EnvironmentResult | None = None
    reflection: ReflectorOutput | None = None
    skill_manager_output: UpdateBatch | None = None

    # Runner bookkeeping
    epoch: int = 1
    total_epochs: int = 1
    step_index: int = 0
    total_steps: int = 0
```
The `requires`/`provides` validation works on attribute names (strings) — it checks that the field exists on the context object at runtime, so it is subclass-agnostic. A step that declares `requires = {"skillbook"}` works whether the context is `ACEContext` or any other subclass that has a `skillbook` attribute.
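One way such string-based, subclass-agnostic validation could look — a sketch only; `validate_requires` and the demo classes are hypothetical, not the library's API:

```python
from dataclasses import dataclass

def validate_requires(step, ctx):
    """Check that every attribute name in step.requires is set on ctx."""
    missing = sorted(
        name for name in getattr(step, "requires", set())
        if getattr(ctx, name, None) is None
    )
    if missing:
        raise ValueError(f"{type(step).__name__} requires missing fields: {missing}")

@dataclass(frozen=True)
class DemoContext:          # stands in for any StepContext subclass
    skillbook: object = None

class DemoStep:
    requires = {"skillbook"}

validate_requires(DemoStep(), DemoContext(skillbook="sb"))  # passes
```

Because the check uses `getattr` on string names, it never imports or mentions a concrete context subclass.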
Data that is specific to a single integration or step goes in `metadata` to prevent field accumulation on the subclass. For example, `metadata["browser_history"]` for browser-use or `metadata["transcript_path"]` for Claude Code.
#### Immutable update patterns
Updating metadata follows the same immutable pattern as any other field.
`frozen=True` makes mutation a hard error at runtime rather than a subtle bug. It also makes `Branch` safe by default — since `StepContext` is immutable, all branches can receive the same object without risk; no deep copy is needed.
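For illustration, the same pattern with a plain frozen dataclass and `dataclasses.replace` (this sketch assumes the context's `replace` behaves like `dataclasses.replace`):

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace

@dataclass(frozen=True)
class StepContext:
    sample: object = None
    metadata: dict = field(default_factory=dict)

ctx = StepContext(sample="task-1", metadata={"epoch": 1})

# Build a new metadata dict instead of mutating the old one:
new_ctx = replace(ctx, metadata={**ctx.metadata, "browser_history": ["page"]})

try:
    ctx.sample = "other"              # frozen=True: hard error, not a silent bug
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

The original `ctx` is untouched, which is exactly why handing the same object to every branch is safe.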
**Field naming rule:** Named fields (`agent_output`, `reflection`) are reserved for concepts shared across all ACE pipelines. Integration-specific data always goes in `metadata`. This prevents the base class from accumulating fields over time as integrations are added.
---
## Pipeline
**Fan-out across contexts:**
```python
pipe.run(contexts, workers=4)  # same pipeline, N contexts in parallel
```
### Inner pipeline as a fan-out step
A `Pipeline`-as-`Step` receives one context and must return one context — but it can fan out internally:

```python
class MultiSearchStep:
    """Generates N queries from one context, runs them in parallel, merges."""

    def __call__(self, ctx: StepContext) -> StepContext:
        # Body reconstructed from context; make_query_contexts is illustrative.
        queries = make_query_contexts(ctx)                      # 1 → N
        results = sub_pipe.run(queries, workers=len(queries))   # parallel
        return ctx.replace(agent_output=merge(results))         # N → 1
```
`sub_pipe.run()` is a top-level runner call, so `async_boundary` and `workers` on its inner steps fire normally. From the outer pipeline's perspective, `MultiSearchStep` is a black box that takes one context and returns one context — the fan-out is an internal implementation detail.
```python
for step in self.steps:
    ctx = await asyncio.to_thread(step, ctx)
```
Pipeline entry points: `pipe.run(contexts)` for sync callers, `await pipe.run_async(contexts)` for async callers (e.g. inside browser-use).
This type is about **not blocking**. Nothing runs in parallel — the pipeline is still sequential, it just yields the thread during waits.
These two knobs control different thread pools and do not interact:

| Knob | Pool | Controls |
|---|---|---|
| `pipe.run(contexts, workers=N)` | foreground pool | how many contexts run through pre-boundary steps simultaneously |
| `step.max_workers = K` | background pool per step class | how many instances of that step run in the background simultaneously |
A sample leaves the foreground pool when it crosses the `async_boundary` point and enters the background step's pool. With `workers=4` and `ReflectStep.max_workers=3`, you can have 4 samples in Agent/Evaluate and 3 reflections running concurrently — two separate pools, no multiplication.
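The two-pool arrangement can be sketched with stdlib executors — the step functions and pool wiring here are illustrative, not the library's internals:

```python
from concurrent.futures import ThreadPoolExecutor

foreground = ThreadPoolExecutor(max_workers=4)    # pipe.run(contexts, workers=4)
reflect_pool = ThreadPoolExecutor(max_workers=3)  # ReflectStep.max_workers = 3

def agent_and_evaluate(ctx):
    return ctx + ["agent", "evaluate"]

def reflect(ctx):
    return ctx + ["reflect"]

def run_sample(ctx):
    ctx = agent_and_evaluate(ctx)        # pre-boundary: foreground pool
    reflect_pool.submit(reflect, ctx)    # post-boundary: background pool
    return ctx                           # caller gets the context back immediately

futures = [foreground.submit(run_sample, [i]) for i in range(4)]
results = [f.result() for f in futures]

reflect_pool.shutdown(wait=True)   # drain background work
foreground.shutdown(wait=True)
```

Each pool has its own `max_workers`, so foreground throughput and background reflection concurrency are capped independently.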
Failure semantics differ depending on which side of the `async_boundary` a step runs on:
**Background steps** (after the boundary): the caller has already moved on, so exceptions cannot propagate. Background failures are captured and attached to the `SampleResult` — nothing is dropped silently.
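A hedged sketch of the capture-and-attach pattern; the `SampleResult` fields and step shown here are illustrative, not the library's exact schema:

```python
class SampleResult:
    """Illustrative result object."""
    def __init__(self):
        self.failed_at = None
        self.error = None

def run_background_step(step, ctx, result):
    try:
        step(ctx)
    except Exception as exc:       # the caller has already moved on; never re-raise
        result.failed_at = type(step).__name__
        result.error = exc         # attached to the result, not silently dropped

class ReflectStep:
    def __call__(self, ctx):
        raise RuntimeError("LLM call failed")

result = SampleResult()
run_background_step(ReflectStep(), {"sample": 1}, result)
```

The caller inspects `result.error` later instead of receiving a propagated exception.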
When a `Branch` step fails, `failed_at` is `"Branch"` and `error` is a `BranchError`.
Retry logic is the responsibility of individual steps, not the pipeline.
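For example, a step might own its retries like this — `RetryingStep` is a hypothetical sketch, not part of the library:

```python
import time

class RetryingStep:
    """A step that retries transient failures internally; the pipeline never retries."""
    max_attempts = 3

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, ctx):
        for attempt in range(1, self.max_attempts + 1):
            self.calls += 1
            try:
                return self.fn(ctx)
            except RuntimeError:
                if attempt == self.max_attempts:
                    raise              # the pipeline only sees the final failure
                time.sleep(0)          # a real step would back off here

attempts = []
def flaky(ctx):
    attempts.append(ctx)
    if len(attempts) < 3:
        raise RuntimeError("transient")
    return ctx

step = RetryingStep(flaky)
out = step("sample")
```

Keeping the retry loop inside the step means the pipeline's failure semantics stay simple: a step either returns a context or raises once.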
**Shutdown:** `wait_for_background(timeout=N)` raises `TimeoutError` if background steps have not drained within `N` seconds. Individual step implementations are responsible for their own per-call timeouts (e.g. LLM API call timeouts).
**Monitoring:** `background_stats()` returns a `dict` with `active` and `completed` counts for background threads. Thread-safe — can be called from any thread while the pipeline is running. This is the public API for monitoring background progress; callers should not access `_bg_lock` or `_bg_threads` directly.
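The text names `_bg_lock` and `_bg_threads` as internals; a hypothetical sketch of how `background_stats()` and `wait_for_background()` could sit on top of them (not the library's actual code):

```python
import threading
import time

class Pipeline:
    def __init__(self):
        self._bg_lock = threading.Lock()
        self._bg_threads = []
        self._completed = 0

    def _spawn_background(self, fn, *args):
        def runner():
            fn(*args)
            with self._bg_lock:
                self._completed += 1
        t = threading.Thread(target=runner)
        with self._bg_lock:
            self._bg_threads.append(t)
        t.start()

    def background_stats(self):
        # Lock makes this safe to call from any thread while running.
        with self._bg_lock:
            active = sum(1 for t in self._bg_threads if t.is_alive())
            return {"active": active, "completed": self._completed}

    def wait_for_background(self, timeout=None):
        deadline = None if timeout is None else time.monotonic() + timeout
        for t in list(self._bg_threads):
            remaining = None if deadline is None else deadline - time.monotonic()
            if remaining is not None and remaining <= 0:
                raise TimeoutError("background steps did not drain in time")
            t.join(remaining)
            if t.is_alive():
                raise TimeoutError("background steps did not drain in time")
```

Returning a fresh `dict` from `background_stats()` keeps callers decoupled from the lock and thread list.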