diff --git a/docs/concepts/pipeline-wrapper.md b/docs/concepts/pipeline-wrapper.md index 0372d85d..63280348 100644 --- a/docs/concepts/pipeline-wrapper.md +++ b/docs/concepts/pipeline-wrapper.md @@ -176,6 +176,69 @@ async def run_chat_completion_async(self, model: str, messages: List[dict], body ) ``` +## Streaming from Multiple Components + +!!! info "Automatic Multi-Component Streaming" + Hayhooks automatically enables streaming for **all** streaming-capable components in your pipeline - no special configuration needed! + +When your pipeline contains multiple components that support streaming (e.g., multiple LLMs), all of them stream their outputs automatically as the pipeline executes. + +### Example: Sequential LLMs with Streaming + +```python +class MultiLLMWrapper(BasePipelineWrapper): + def setup(self) -> None: + from haystack.components.builders import ChatPromptBuilder + from haystack.components.generators.chat import OpenAIChatGenerator + from haystack.dataclasses import ChatMessage + + self.pipeline = Pipeline() + + # First LLM - initial answer + self.pipeline.add_component( + "prompt_1", + ChatPromptBuilder( + template=[ + ChatMessage.from_system("You are a helpful assistant."), + ChatMessage.from_user("{{query}}") + ] + ) + ) + self.pipeline.add_component("llm_1", OpenAIChatGenerator(model="gpt-4o-mini")) + + # Second LLM - refines the answer using Jinja2 to access ChatMessage attributes + self.pipeline.add_component( + "prompt_2", + ChatPromptBuilder( + template=[ + ChatMessage.from_system("You are a helpful assistant that refines responses."), + ChatMessage.from_user( + "Previous response: {{previous_response[0].text}}\n\nRefine this." + ) + ] + ) + ) + self.pipeline.add_component("llm_2", OpenAIChatGenerator(model="gpt-4o-mini")) + + # Connect components - LLM 1's replies go directly to prompt_2 + self.pipeline.connect("prompt_1.prompt", "llm_1.messages") + self.pipeline.connect("llm_1.replies", "prompt_2.previous_response") + self.pipeline.connect("prompt_2.prompt", "llm_2.messages") + + def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Generator: + question = get_last_user_message(messages) + + # Both LLMs will stream automatically! + return streaming_generator( + pipeline=self.pipeline, + pipeline_run_args={"prompt_1": {"template_variables": {"query": question}}} + ) +``` + +**What happens:** Both LLMs **automatically stream** their responses token by token as the pipeline executes. The second prompt builder uses Jinja2 syntax (`{{previous_response[0].text}}`) to access the text content from the first LLM's `ChatMessage` response. **No custom extraction components needed** - streaming works for any number of components. + +See the [Multi-LLM Streaming Example](https://github.com/deepset-ai/hayhooks/tree/main/examples/pipeline_wrappers/multi_llm_streaming) for a complete working implementation. + ## File Upload Support Hayhooks can handle file uploads by adding a `files` parameter: diff --git a/examples/README.md b/examples/README.md index 2eafe761..c0921929 100644 --- a/examples/README.md +++ b/examples/README.md @@ -6,6 +6,7 @@ This directory contains various examples demonstrating different use cases and f | Example | Description | Key Features | Use Case | |---------|-------------|--------------|----------| +| [multi_llm_streaming](./pipeline_wrappers/multi_llm_streaming/) | Multiple LLM components with automatic streaming | • Two sequential LLMs
• Automatic multi-component streaming<br>
• No special configuration needed<br>
• Shows default streaming behavior | Demonstrating how Hayhooks automatically streams from all components in a pipeline | | [async_question_answer](./pipeline_wrappers/async_question_answer/) | Async question-answering pipeline with streaming support | • Async pipeline execution<br>
• Streaming responses<br>
• OpenAI Chat Generator<br>
• Both API and chat completion interfaces | Building conversational AI systems that need async processing and real-time streaming responses | | [chat_with_website](./pipeline_wrappers/chat_with_website/) | Answer questions about website content | • Web content fetching<br>
• HTML to document conversion<br>
• Content-based Q&A<br>
• Configurable URLs | Creating AI assistants that can answer questions about specific websites or web-based documentation | | [chat_with_website_mcp](./pipeline_wrappers/chat_with_website_mcp/) | MCP-compatible website chat pipeline | • MCP (Model Context Protocol) support<br>
• Website content analysis<br>
• API-only interface<br>
• Simplified deployment | Integrating website analysis capabilities into MCP-compatible AI systems and tools | diff --git a/examples/pipeline_wrappers/multi_llm_streaming/README.md b/examples/pipeline_wrappers/multi_llm_streaming/README.md new file mode 100644 index 00000000..a134cbbb --- /dev/null +++ b/examples/pipeline_wrappers/multi_llm_streaming/README.md @@ -0,0 +1,98 @@ +# Multi-LLM Streaming Example + +This example demonstrates hayhooks' automatic multi-component streaming support. + +## Overview + +The pipeline contains **two LLM components in sequence**: + +1. **LLM 1** (`gpt-5-nano` with `reasoning_effort: low`): Provides a short, concise initial answer to the user's question +2. **LLM 2** (`gpt-5-nano` with `reasoning_effort: medium`): Refines and expands the answer into a detailed, professional response + +Both LLMs automatically stream their responses - no special configuration needed! + +![Multi-LLM Streaming Example](./multi_stream.gif) + +## How It Works + +Hayhooks automatically enables streaming for **all** streaming-capable components. Both LLMs stream their responses serially (one after another) without any special configuration. + +The pipeline connects LLM 1's replies directly to the second prompt builder. Using Jinja2 template syntax, the second prompt builder can access the `ChatMessage` attributes directly: `{{previous_response[0].text}}`. This approach is simple and doesn't require any custom extraction components. + +This example also demonstrates injecting a visual separator (`**[LLM 2 - Refining the response]**`) between the two LLM outputs using `StreamingChunk.component_info` to detect component transitions. + +## Usage + +### Deploy with Hayhooks + +```bash +# Set your OpenAI API key +export OPENAI_API_KEY=your_api_key_here + +# Deploy the pipeline +hayhooks deploy examples/pipeline_wrappers/multi_llm_streaming + +# Test it via OpenAI-compatible API +curl -X POST http://localhost:1416/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "multi_llm_streaming", + "messages": [{"role": "user", "content": "What is machine learning?"}], + "stream": true + }' +``` + +### Use Directly in Code + +```python +from haystack import Pipeline +from haystack.components.builders import ChatPromptBuilder +from haystack.dataclasses import ChatMessage +from hayhooks import streaming_generator + +# Create your pipeline with multiple streaming components +pipeline = Pipeline() +# ... add LLM 1 and prompt_builder_1 ... + +# Add second prompt builder that accesses ChatMessage attributes via Jinja2 +pipeline.add_component( + "prompt_builder_2", + ChatPromptBuilder( + template=[ + ChatMessage.from_system("You are a helpful assistant."), + ChatMessage.from_user("Previous: {{previous_response[0].text}}\n\nRefine this.") + ] + ) +) +# ... add LLM 2 ... + +# Connect: LLM 1 replies directly to prompt_builder_2 +pipeline.connect("llm_1.replies", "prompt_builder_2.previous_response") + +# streaming_generator automatically streams from ALL components +for chunk in streaming_generator( + pipeline=pipeline, + pipeline_run_args={"prompt_builder_1": {"template_variables": {"query": "Your question"}}} +): + print(chunk.content, end="", flush=True) +``` + +## Integration with OpenWebUI + +This pipeline works seamlessly with OpenWebUI: + +1. Configure OpenWebUI to connect to hayhooks (see [OpenWebUI Integration docs](../../../docs/features/openwebui-integration.md)) +2. Deploy this pipeline +3. Select it as a model in OpenWebUI +4. 
Watch both LLMs stream their responses in real-time + +## Technical Details + +- **Pipeline Flow**: `LLM 1 → Prompt Builder 2 → LLM 2` +- **Jinja2 Templates**: `ChatPromptBuilder` uses Jinja2, allowing direct access to `ChatMessage` attributes in templates +- **Template Variables**: LLM 1's `List[ChatMessage]` replies are passed directly as `previous_response` to the second prompt builder +- **Accessing ChatMessage Content**: Use `{{previous_response[0].text}}` in templates to access the text content +- **Streaming**: Serial execution with automatic callback management for all components +- **Transition Detection**: Uses `StreamingChunk.component_info.name` to detect when LLM 2 starts +- **Visual Separator**: Injects a `StreamingChunk` between LLM outputs +- **Error Handling**: Stream terminates gracefully if any component fails diff --git a/examples/pipeline_wrappers/multi_llm_streaming/multi_stream.gif b/examples/pipeline_wrappers/multi_llm_streaming/multi_stream.gif new file mode 100644 index 00000000..673d1f6b Binary files /dev/null and b/examples/pipeline_wrappers/multi_llm_streaming/multi_stream.gif differ diff --git a/examples/pipeline_wrappers/multi_llm_streaming/pipeline_wrapper.py b/examples/pipeline_wrappers/multi_llm_streaming/pipeline_wrapper.py new file mode 100644 index 00000000..d1abe1c4 --- /dev/null +++ b/examples/pipeline_wrappers/multi_llm_streaming/pipeline_wrapper.py @@ -0,0 +1,129 @@ +from collections.abc import Generator +from typing import Any, List, Union # noqa: UP035 + +from haystack import Pipeline +from haystack.components.builders import ChatPromptBuilder +from haystack.components.generators.chat import OpenAIChatGenerator +from haystack.dataclasses import ChatMessage, StreamingChunk +from haystack.utils import Secret + +from hayhooks import BasePipelineWrapper, get_last_user_message, streaming_generator + + +class PipelineWrapper(BasePipelineWrapper): + """ + A pipeline with two sequential LLM components that both stream. + + The first LLM (low reasoning) provides a concise answer, and the second LLM + (medium reasoning) refines and expands it with more detail. + Both automatically stream their responses - this is the default behavior in hayhooks. + """ + + def setup(self) -> None: + """Initialize the pipeline with two streaming LLM components.""" + self.pipeline = Pipeline() + + # First stage: Initial answer + self.pipeline.add_component( + "prompt_builder_1", + ChatPromptBuilder( + template=[ + ChatMessage.from_system( + "You are a helpful assistant. \nAnswer the user's question in a short and concise manner." + ), + ChatMessage.from_user("{{query}}"), + ] + ), + ) + self.pipeline.add_component( + "llm_1", + OpenAIChatGenerator( + api_key=Secret.from_env_var("OPENAI_API_KEY"), + model="gpt-5-nano", + generation_kwargs={ + "reasoning_effort": "low", + }, + ), + ) + + # Second stage: Refinement + # The prompt builder can directly access ChatMessage attributes via Jinja2 + self.pipeline.add_component( + "prompt_builder_2", + ChatPromptBuilder( + template=[ + ChatMessage.from_system("You are a helpful assistant that refines and improves responses."), + ChatMessage.from_user( + "Here is the previous response:\n\n{{previous_response[0].text}}\n\n" + "Please refine and improve this response. " + "Make it a bit more detailed, clear, and professional. " + "Please state that you're refining the response in the beginning of your answer." 
+ ), + ] + ), + ) + self.pipeline.add_component( + "llm_2", + OpenAIChatGenerator( + api_key=Secret.from_env_var("OPENAI_API_KEY"), + model="gpt-5-nano", + generation_kwargs={ + "reasoning_effort": "medium", + }, + ), + ) + + # Connect the components + self.pipeline.connect("prompt_builder_1.prompt", "llm_1.messages") + self.pipeline.connect("llm_1.replies", "prompt_builder_2.previous_response") + self.pipeline.connect("prompt_builder_2.prompt", "llm_2.messages") + + def run_api(self, query: str) -> dict[str, Any]: + """Run the pipeline in non-streaming mode.""" + result = self.pipeline.run( + { + "prompt_builder_1": {"template_variables": {"query": query}}, + } + ) + return {"reply": result["llm_2"]["replies"][0].text if result["llm_2"]["replies"] else ""} + + def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Union[str, Generator]: # noqa: ARG002, UP006 + """ + Run the pipeline in streaming mode. + + Both LLMs will automatically stream their responses thanks to + hayhooks' built-in multi-component streaming support. + + We inject a visual separator between LLM 1 and LLM 2 outputs. + """ + question = get_last_user_message(messages) + + def custom_streaming(): + """ + Enhanced streaming that injects a visual separator between LLM outputs. + + Uses StreamingChunk.component_info.name to reliably detect which component + is streaming, avoiding fragile chunk counting or heuristics. + + NOTE: This is simply a workaround to inject a visual separator between LLM outputs. + """ + llm2_started = False + + for chunk in streaming_generator( + pipeline=self.pipeline, + pipeline_run_args={ + "prompt_builder_1": {"template_variables": {"query": question}}, + }, + ): + # Use component_info to detect which LLM is streaming + if hasattr(chunk, "component_info") and chunk.component_info: + component_name = chunk.component_info.name + + # When we see llm_2 for the first time, inject a visual separator + if component_name == "llm_2" and not llm2_started: + llm2_started = True + yield StreamingChunk(content="\n\n**[LLM 2 - Refining the response]**\n\n") + + yield chunk + + return custom_streaming() diff --git a/src/hayhooks/server/pipelines/utils.py b/src/hayhooks/server/pipelines/utils.py index 9d84792a..06c7c78a 100644 --- a/src/hayhooks/server/pipelines/utils.py +++ b/src/hayhooks/server/pipelines/utils.py @@ -40,33 +40,32 @@ def get_last_user_message(messages: list[Union[Message, dict]]) -> Union[str, No return None -def find_streaming_component(pipeline: Union[Pipeline, AsyncPipeline]) -> tuple[Component, str]: +def find_all_streaming_components(pipeline: Union[Pipeline, AsyncPipeline]) -> list[tuple[Component, str]]: """ - Finds the component in the pipeline that supports streaming_callback + Finds all components in the pipeline that support streaming_callback. 
Returns: - The first component that supports streaming + A list of tuples containing (component, component_name) for all streaming components """ - streaming_component = None - streaming_component_name = "" + streaming_components = [] for name, component in pipeline.walk(): if hasattr(component, "streaming_callback"): log.trace(f"Streaming component found in '{name}' with type {type(component)}") - streaming_component = component - streaming_component_name = name - if not streaming_component: - msg = "No streaming-capable component found in the pipeline" + streaming_components.append((component, name)) + + if not streaming_components: + msg = "No streaming-capable components found in the pipeline" raise ValueError(msg) - return streaming_component, streaming_component_name + return streaming_components def _setup_streaming_callback_for_pipeline( pipeline: Union[Pipeline, AsyncPipeline], pipeline_run_args: dict[str, Any], streaming_callback: Any ) -> dict[str, Any]: """ - Sets up streaming callback for pipeline components. + Sets up streaming callbacks for all streaming-capable components in the pipeline. Args: pipeline: The pipeline to configure @@ -76,16 +75,17 @@ def _setup_streaming_callback_for_pipeline( Returns: Updated pipeline run arguments """ - _, streaming_component_name = find_streaming_component(pipeline) + streaming_components = find_all_streaming_components(pipeline) - # Ensure component args exist in pipeline run args - if streaming_component_name not in pipeline_run_args: - pipeline_run_args[streaming_component_name] = {} + for _, component_name in streaming_components: + # Ensure component args exist in pipeline run args + if component_name not in pipeline_run_args: + pipeline_run_args[component_name] = {} - # Set the streaming callback on the component - streaming_component = pipeline.get_component(streaming_component_name) - assert hasattr(streaming_component, "streaming_callback") - streaming_component.streaming_callback = streaming_callback + # Set the streaming callback on the component + streaming_component = pipeline.get_component(component_name) + assert hasattr(streaming_component, "streaming_callback") + streaming_component.streaming_callback = streaming_callback return pipeline_run_args @@ -157,7 +157,8 @@ def streaming_generator( # noqa: C901, PLR0912 """ Creates a generator that yields streaming chunks from a pipeline or agent execution. - Automatically finds the streaming-capable component in pipelines or uses the agent's streaming callback. + Automatically finds all streaming-capable components in pipelines and sets up streaming for all of them. + For agents, uses the agent's streaming callback. Args: pipeline: The Pipeline, AsyncPipeline, or Agent to execute @@ -171,8 +172,9 @@ def streaming_generator( # noqa: C901, PLR0912 OpenWebUIEvent: Event for tool call str: Tool name or stream content - NOTE: This generator works with sync/async pipelines and agents, but pipeline components - which support streaming must have a _sync_ `streaming_callback`. + NOTE: This generator works with sync/async pipelines and agents. Pipeline components + which support streaming must have a _sync_ `streaming_callback`. All streaming-capable + components in the pipeline will stream their outputs serially as the pipeline executes. 
""" if pipeline_run_args is None: pipeline_run_args = {} @@ -247,26 +249,27 @@ def run_pipeline() -> None: def _validate_async_streaming_support(pipeline: Union[Pipeline, AsyncPipeline]) -> None: """ - Validates that the pipeline supports async streaming callbacks. + Validates that all streaming components in the pipeline support async streaming callbacks. Args: pipeline: The pipeline to validate Raises: - ValueError: If the pipeline doesn't support async streaming + ValueError: If any streaming component doesn't support async streaming """ - streaming_component, streaming_component_name = find_streaming_component(pipeline) - - # Check if the streaming component supports async streaming callbacks - # We check for run_async method as an indicator of async support - if not hasattr(streaming_component, "run_async"): - component_type = type(streaming_component).__name__ - msg = ( - f"Component '{streaming_component_name}' of type '{component_type}' seems to not support async streaming " - "callbacks. Use the sync 'streaming_generator' function instead, or switch to a component that supports " - "async streaming callbacks (e.g., OpenAIChatGenerator instead of OpenAIGenerator)." - ) - raise ValueError(msg) + streaming_components = find_all_streaming_components(pipeline) + + for streaming_component, streaming_component_name in streaming_components: + # Check if the streaming component supports async streaming callbacks + # We check for run_async method as an indicator of async support + if not hasattr(streaming_component, "run_async"): + component_type = type(streaming_component).__name__ + msg = ( + f"Component '{streaming_component_name}' of type '{component_type}' seems to not support async " + "streaming callbacks. Use the sync 'streaming_generator' function instead, or switch to a component " + "that supports async streaming callbacks (e.g., OpenAIChatGenerator instead of OpenAIGenerator)." + ) + raise ValueError(msg) async def _execute_pipeline_async( @@ -352,7 +355,8 @@ async def async_streaming_generator( # noqa: C901, PLR0912 """ Creates an async generator that yields streaming chunks from a pipeline or agent execution. - Automatically finds the streaming-capable component in pipelines or uses the agent's streaming callback. + Automatically finds all streaming-capable components in pipelines and sets up streaming for all of them. + For agents, uses the agent's streaming callback. Args: pipeline: The Pipeline, AsyncPipeline, or Agent to execute @@ -366,8 +370,9 @@ async def async_streaming_generator( # noqa: C901, PLR0912 OpenWebUIEvent: Event for tool call str: Tool name or stream content - NOTE: This generator works with sync/async pipelines and agents. For pipelines, the streaming component + NOTE: This generator works with sync/async pipelines and agents. For pipelines, the streaming components must support an _async_ `streaming_callback`. Agents have built-in async streaming support. + All streaming-capable components in the pipeline will stream their outputs serially as the pipeline executes. 
""" # Validate async streaming support for pipelines (not needed for agents) if pipeline_run_args is None: diff --git a/tests/test_it_pipeline_utils.py b/tests/test_it_pipeline_utils.py index 6f238044..91c00667 100644 --- a/tests/test_it_pipeline_utils.py +++ b/tests/test_it_pipeline_utils.py @@ -14,7 +14,11 @@ from hayhooks import callbacks from hayhooks.open_webui import OpenWebUIEvent, create_notification_event -from hayhooks.server.pipelines.utils import async_streaming_generator, find_streaming_component, streaming_generator +from hayhooks.server.pipelines.utils import ( + async_streaming_generator, + find_all_streaming_components, + streaming_generator, +) QUESTION = "Is Haystack a framework for developing AI applications? Answer Yes or No" @@ -141,33 +145,10 @@ def mocked_pipeline_with_streaming_component(mocker): return streaming_component, pipeline -def test_find_streaming_component_no_streaming_component(): - pipeline = Pipeline() - - with pytest.raises(ValueError, match="No streaming-capable component found in the pipeline"): - find_streaming_component(pipeline) - - -def test_find_streaming_component_finds_streaming_component(mocker): - streaming_component = MockComponent(has_streaming=True) - non_streaming_component = MockComponent(has_streaming=False) - - pipeline = mocker.Mock(spec=Pipeline) - pipeline.walk.return_value = [ - ("component1", non_streaming_component), - ("streaming_component", streaming_component), - ("component2", non_streaming_component), - ] - - component, name = find_streaming_component(pipeline) - assert component == streaming_component - assert name == "streaming_component" - - def test_streaming_generator_no_streaming_component(): pipeline = Pipeline() - with pytest.raises(ValueError, match="No streaming-capable component found in the pipeline"): + with pytest.raises(ValueError, match="No streaming-capable components found in the pipeline"): list(streaming_generator(pipeline)) @@ -219,7 +200,7 @@ def test_streaming_generator_empty_output(mocked_pipeline_with_streaming_compone async def test_async_streaming_generator_no_streaming_component(): pipeline = Pipeline() - with pytest.raises(ValueError, match="No streaming-capable component found in the pipeline"): + with pytest.raises(ValueError, match="No streaming-capable components found in the pipeline"): _ = [chunk async for chunk in async_streaming_generator(pipeline)] @@ -961,3 +942,114 @@ def custom_on_pipeline_end(output): logger.add(lambda msg: messages.append(msg), level="ERROR") _ = [chunk async for chunk in generator] assert "Callback error" in messages[0] + + +def test_find_all_streaming_components_finds_multiple(mocker): + streaming_component1 = MockComponent(has_streaming=True) + streaming_component2 = MockComponent(has_streaming=True) + non_streaming_component = MockComponent(has_streaming=False) + + pipeline = mocker.Mock(spec=Pipeline) + pipeline.walk.return_value = [ + ("component1", streaming_component1), + ("non_streaming", non_streaming_component), + ("component2", streaming_component2), + ] + + components = find_all_streaming_components(pipeline) + assert len(components) == 2 + assert components[0] == (streaming_component1, "component1") + assert components[1] == (streaming_component2, "component2") + + +def test_find_all_streaming_components_raises_when_none_found(): + pipeline = Pipeline() + + with pytest.raises(ValueError, match="No streaming-capable components found in the pipeline"): + find_all_streaming_components(pipeline) + + +@pytest.fixture +def 
pipeline_with_multiple_streaming_components(mocker): + streaming_component1 = MockComponent(has_streaming=True) + streaming_component2 = MockComponent(has_streaming=True) + non_streaming_component = MockComponent(has_streaming=False) + + pipeline = mocker.Mock(spec=AsyncPipeline) + pipeline._spec_class = AsyncPipeline + pipeline.walk.return_value = [ + ("component1", streaming_component1), + ("non_streaming", non_streaming_component), + ("component2", streaming_component2), + ] + + def mock_get_component(name): + if name == "component1": + return streaming_component1 + elif name == "component2": + return streaming_component2 + return non_streaming_component + + pipeline.get_component.side_effect = mock_get_component + + return streaming_component1, streaming_component2, pipeline + + +def test_streaming_generator_with_multiple_components(pipeline_with_multiple_streaming_components): + streaming_component1, streaming_component2, pipeline = pipeline_with_multiple_streaming_components + + mock_chunks = [ + StreamingChunk(content="chunk1_from_component1"), + StreamingChunk(content="chunk2_from_component1"), + StreamingChunk(content="chunk1_from_component2"), + StreamingChunk(content="chunk2_from_component2"), + ] + + def mock_run(data): + # Simulate both components streaming + if streaming_component1.streaming_callback: + streaming_component1.streaming_callback(mock_chunks[0]) + streaming_component1.streaming_callback(mock_chunks[1]) + if streaming_component2.streaming_callback: + streaming_component2.streaming_callback(mock_chunks[2]) + streaming_component2.streaming_callback(mock_chunks[3]) + + pipeline.run.side_effect = mock_run + + generator = streaming_generator(pipeline) + chunks = list(generator) + + assert chunks == mock_chunks + # Verify both components had their callbacks set + assert streaming_component1.streaming_callback is not None + assert streaming_component2.streaming_callback is not None + + +@pytest.mark.asyncio +async def test_async_streaming_generator_with_multiple_components(mocker, pipeline_with_multiple_streaming_components): + streaming_component1, streaming_component2, pipeline = pipeline_with_multiple_streaming_components + + mock_chunks = [ + StreamingChunk(content="async_chunk1_from_component1"), + StreamingChunk(content="async_chunk2_from_component1"), + StreamingChunk(content="async_chunk1_from_component2"), + StreamingChunk(content="async_chunk2_from_component2"), + ] + + async def mock_run_async(data): + # Simulate both components streaming + if streaming_component1.streaming_callback: + await streaming_component1.streaming_callback(mock_chunks[0]) + await streaming_component1.streaming_callback(mock_chunks[1]) + if streaming_component2.streaming_callback: + await streaming_component2.streaming_callback(mock_chunks[2]) + await streaming_component2.streaming_callback(mock_chunks[3]) + + pipeline.run_async = mocker.AsyncMock(side_effect=mock_run_async) + + chunks = [chunk async for chunk in async_streaming_generator(pipeline)] + + assert chunks == mock_chunks + # Verify both components had their callbacks set + assert streaming_component1.streaming_callback is not None + assert streaming_component2.streaming_callback is not None