

mpangrazzi
Contributor

@mpangrazzi mpangrazzi commented Sep 2, 2025

Fixes #156.

  • Resolve declared input and output parameters, and their types, from a Haystack pipeline YAML definition
  • Add YAML pipelines to the registry, dynamically creating Pydantic request/response models from the data parsed above (stored in metadata)
  • Dynamically add an API route for each YAML pipeline
  • Add a /deploy-yaml API route for deploying YAML pipelines
  • Discard YAML pipelines without inputs / outputs fields (note that this old way of deploying is deprecated / removed)
  • Add the hayhooks pipeline deploy-yaml CLI command
  • Ensure YAML pipelines with inputs / outputs fields are deployed at startup (instead of going through the old YAML deploy logic)
  • Remove all old YAML deploy logic and the /deploy endpoint. NOTE: this was needed to avoid confusion between the new and old logic
  • Update the README, adding sections on YAML pipelines and removing all legacy information about the old logic
  • Use an async handler by default instead of a sync one
  • Enable YAML pipelines to be used as MCP Tools (when using the Hayhooks MCP Server)
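A minimal sketch of the first and fifth points (resolving declared inputs/outputs, and discarding YAML pipelines that lack them, modeled here as raising an error). It assumes the YAML has already been parsed into a dict (e.g. with yaml.safe_load), that each input maps to one or more "component.socket" strings, and that each output maps to a single such string. PipelineYamlError is a local stand-in for the exception of the same name in the diff; split_socket, resolve_io and the "llm.replies" output are hypothetical, not Hayhooks' actual code.

```python
from typing import Any


class PipelineYamlError(ValueError):
    """Local stand-in for the PipelineYamlError raised in the PR's code."""


def split_socket(ref: str) -> tuple[str, str]:
    """Split a "component.socket" reference into (component, socket)."""
    component, sep, socket = ref.partition(".")
    if not sep or not component or not socket:
        raise PipelineYamlError(f"Invalid socket reference: {ref!r}")
    return component, socket


def resolve_io(
    definition: dict[str, Any],
) -> tuple[dict[str, list[tuple[str, str]]], dict[str, tuple[str, str]]]:
    """Resolve declared inputs/outputs of a parsed YAML pipeline definition.

    Assumption: inputs map a name to one or more "component.socket" strings,
    outputs map a name to exactly one such string.
    """
    inputs = definition.get("inputs")
    outputs = definition.get("outputs")
    if not inputs or not outputs:
        # Old-style YAML without inputs/outputs is not deployed.
        raise PipelineYamlError("YAML pipeline must declare both 'inputs' and 'outputs'")

    resolved_inputs: dict[str, list[tuple[str, str]]] = {}
    for name, refs in inputs.items():
        # Accept either a single string or a list of strings per input.
        ref_list = [refs] if isinstance(refs, str) else list(refs)
        resolved_inputs[name] = [split_socket(ref) for ref in ref_list]

    resolved_outputs = {name: split_socket(ref) for name, ref in outputs.items()}
    return resolved_inputs, resolved_outputs


# Inputs mirror the mapping from the question later in this conversation;
# the outputs entry is hypothetical.
example = {
    "components": {},
    "inputs": {
        "query": ["bm25_retriever.query", "query_embedder.text", "ConditionalRouter.question"],
        "filters": ["bm25_retriever.filters", "embedding_retriever.filters"],
    },
    "outputs": {"replies": "llm.replies"},
}
ins, outs = resolve_io(example)
print(ins["filters"])  # [('bm25_retriever', 'filters'), ('embedding_retriever', 'filters')]
print(outs)            # {'replies': ('llm', 'replies')}
```

The resolved (component, socket) pairs are what the request/response models would then be built from.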

Note: I initially wanted to remove the old YAML logic in a separate PR, but that would have ended up being quite confusing. It's better to remove it now and update the README accordingly.

@mpangrazzi mpangrazzi self-assigned this Sep 2, 2025
@mpangrazzi
Contributor Author

mpangrazzi commented Sep 2, 2025

I'll put a question here as a reminder. Consider this situation (from a complex Haystack pipeline):

inputs:
  query:
  - bm25_retriever.query
  - query_embedder.text
  - ConditionalRouter.question
  filters:
  - bm25_retriever.filters
  - embedding_retriever.filters

Is it always safe to assume that bm25_retriever.query, query_embedder.text and ConditionalRouter.question will have the same input type? (The same goes for filters.) I assume so, of course 😉

@sjrl sjrl self-requested a review September 9, 2025 07:45
@sjrl

sjrl commented Sep 9, 2025

@mpangrazzi yes, based on the provided mapping I'd say we can assume they all have the same input type. Technically you could inspect the actual types, but that would require creating the pipeline first.
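If the socket types were available (for example after building the pipeline), the consistency check could look like this minimal sketch. socket_types is a hypothetical lookup table from "component.socket" to its declared type, standing in for whatever introspection the built pipeline offers; resolve_input_type is likewise a hypothetical helper. Without such a table, taking the type of the first listed socket matches the assumption agreed on above.

```python
from typing import Any, Optional


def resolve_input_type(input_name: str, sockets: list[str], socket_types: dict[str, Any]) -> Any:
    """Return the single type shared by all sockets an input maps to.

    Raises TypeError if the sockets disagree, which the mapping alone cannot detect.
    """
    found = {socket_types[s] for s in sockets}
    if len(found) != 1:
        listed = ", ".join(sorted(repr(t) for t in found))
        raise TypeError(f"Input '{input_name}' maps to sockets with different types: {listed}")
    return next(iter(found))


# Hypothetical socket types for the mapping quoted in this thread.
FILTERS_TYPE = Optional[dict[str, Any]]
socket_types = {
    "bm25_retriever.query": str,
    "query_embedder.text": str,
    "ConditionalRouter.question": str,
    "bm25_retriever.filters": FILTERS_TYPE,
    "embedding_retriever.filters": FILTERS_TYPE,
}

query_sockets = ["bm25_retriever.query", "query_embedder.text", "ConditionalRouter.question"]
print(resolve_input_type("query", query_sockets, socket_types))  # <class 'str'>
```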

@mpangrazzi mpangrazzi requested a review from anakin87 September 15, 2025 14:38
@mpangrazzi mpangrazzi marked this pull request as ready for review September 16, 2025 07:44
Member

@anakin87 anakin87 left a comment

I left a few comments.

Some general notes:

  • I would improve the PR title (for the release notes) to make it clear that the change is breaking and that we are introducing a non-backward-compatible way to deploy YAML pipelines.
  • Related to the previous point: are you considering releasing a major version?

README.md Outdated
@@ -162,8 +163,9 @@ CLI commands are basically wrappers around the HTTP API of the server. The full
hayhooks run # Start the server
hayhooks status # Check the status of the server and show deployed pipelines

hayhooks pipeline deploy-files <path_to_dir> # Deploy a pipeline using PipelineWrapper
hayhooks pipeline deploy <pipeline_name> # Deploy a pipeline from a YAML file
hayhooks pipeline deploy-yaml <path_to_yaml> # Deploy a pipeline from a YAML file (preferred)
Member

Why is this option preferred?

It seems convenient if you are used to working with YAML, but it is less flexible (no OpenAI/OpenWebUI compatibility), so I would not mark it as preferred. But maybe I am missing something. WDYT?

I also had the same question :)

Contributor Author

That's probably a typo, sorry: deploy-files should obviously be the preferred one, for exactly the reasons you mention ;)

Member

There are still some mentions of "preferred" here and there

tools = await client.list_tools()
# Find YAML tool by name, e.g., "calc" (the pipeline name)
result = await client.call_tool("calc", {"value": 3})
assert result.content[0].text == '{"double": {"value": 10}}'
Member

I can only understand this example by looking at the calc pipeline code.
Perhaps we could use something easier to grasp.

Contributor Author

What do you have in mind? Or maybe we could reference the calc pipeline more clearly?

Member

Something like

tools = await client.list_tools()
# Find YAML tool by name, e.g., "multiply" (the pipeline name)
result = await client.call_tool("multiply", {"x": 3, "y": 4})
assert result.content[0].text == '{"product": 12}'

I just find it hard to understand the example without knowing the Pipeline.
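To show what the suggested "multiply" example would exercise without a running Hayhooks MCP server, here is a self-contained sketch. StubClient, TextContent, CallToolResult and the multiply function are hypothetical stand-ins that only mimic the result.content[0].text shape used in the snippets above; serializing pipeline outputs with json.dumps is an assumption, not confirmed Hayhooks behaviour.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TextContent:
    text: str


@dataclass
class CallToolResult:
    content: list[TextContent]


class StubClient:
    """Hypothetical in-process stand-in for an MCP client session."""

    def __init__(self, tools: dict[str, Callable[..., dict[str, Any]]]) -> None:
        self._tools = tools

    async def list_tools(self) -> list[str]:
        # A real MCP client returns full tool descriptions; names suffice here.
        return list(self._tools)

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> CallToolResult:
        # Run the "pipeline" and return its outputs as JSON text.
        outputs = self._tools[name](**arguments)
        return CallToolResult(content=[TextContent(text=json.dumps(outputs))])


def multiply(x: int, y: int) -> dict[str, int]:
    """Hypothetical pipeline: inputs x and y, single output "product"."""
    return {"product": x * y}


async def main() -> str:
    client = StubClient({"multiply": multiply})
    tools = await client.list_tools()
    assert "multiply" in tools
    result = await client.call_tool("multiply", {"x": 3, "y": 4})
    return result.content[0].text


print(asyncio.run(main()))  # {"product": 12}
```

The appeal of this example is that the tool arguments map one-to-one onto the pipeline's declared inputs, so the expected output can be checked without reading the pipeline definition.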

@@ -91,6 +119,10 @@ def deploy_files(
_deploy_with_progress(ctx=ctx, name=name, endpoint="deploy_files", payload=payload)


# Register alias: `deploy` -> `deploy-files`
pipeline.command(name="deploy")(deploy_files)
Member

Just two thoughts:

  • This reinforces the idea expressed above that the preferred option is using pipeline wrappers.

  • Even though the overall impact of this PR is already highly breaking, I find it confusing that deploy is now an alias for deploy-files, while previously it was used to deploy YAML pipelines the old way. To me, it would be clearer to remove deploy altogether for now.

WDYT?

Contributor Author

Yeah, maybe we can remove the deploy command completely, since this PR already contains breaking changes. Or maybe add a message telling users to use deploy-files or deploy-yaml?

Member

Makes sense. Let's add an error message explaining that to deploy YAML pipelines you now need to use deploy-yaml with a new YAML structure.
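A minimal sketch of the agreed behaviour, assuming the old deploy command is kept only to print guidance and exit with a non-zero status. deploy_removed and DEPLOY_REMOVED_MSG are hypothetical names, and the message wording is illustrative.

```python
import sys

# Hypothetical wording; the final message is up to the PR.
DEPLOY_REMOVED_MSG = (
    "'hayhooks pipeline deploy' has been removed. "
    "Use 'hayhooks pipeline deploy-files' to deploy a PipelineWrapper, or "
    "'hayhooks pipeline deploy-yaml' to deploy a YAML pipeline that declares "
    "'inputs' and 'outputs' fields."
)


def deploy_removed(*_args: object, **_kwargs: object) -> None:
    """Replacement body for the old `deploy` command: explain and fail."""
    print(DEPLOY_REMOVED_MSG, file=sys.stderr)
    # A non-zero exit code lets scripts still calling `deploy` detect the failure.
    raise SystemExit(1)
```

If this replaces the alias in the Typer CLI, the command would likely also need to tolerate the old positional arguments (in Click terms, allow_extra_args and ignore_unknown_options in context_settings), so users see this message instead of a generic usage error.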

@@ -1,79 +0,0 @@
components:
Member

Can you confirm that most of these removed files are not used in tests?
(My impression is that they were present for manual tests before proper tests were put in place)

Contributor Author

Yes, I confirm! Those files were used while we were iterating on improving the type handling of component inputs in the old YAML logic (i.e. without inputs/outputs fields).

Comment on lines +102 to +103
msg = f"Failed to save YAML pipeline file: {e!s}"
raise PipelineFilesError(msg) from e

Would it be worth including the pipeline_name as part of the error message?
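A minimal sketch of what that could look like, assuming a hypothetical save_yaml_pipeline_file helper that writes <pipeline_name>.yml into a directory (the real file layout may differ). PipelineFilesError is a local stand-in for the exception in the diff.

```python
from pathlib import Path


class PipelineFilesError(Exception):
    """Local stand-in for the PipelineFilesError raised in the PR."""


def save_yaml_pipeline_file(pipeline_name: str, source_code: str, pipelines_dir: Path) -> Path:
    """Write a YAML pipeline definition to <pipelines_dir>/<pipeline_name>.yml (hypothetical layout)."""
    target = pipelines_dir / f"{pipeline_name}.yml"
    try:
        target.write_text(source_code, encoding="utf-8")
    except OSError as e:
        # Naming the pipeline tells the operator which deployment failed.
        msg = f"Failed to save YAML pipeline file for pipeline '{pipeline_name}': {e!s}"
        raise PipelineFilesError(msg) from e
    return target
```

Chaining with `from e` keeps the original OSError available as `__cause__`, so the extra context does not hide the underlying error.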

Comment on lines +316 to +319
# Ensure the registered object is a Haystack Pipeline, not a wrapper
if not isinstance(pipeline_instance, AsyncPipeline):
msg = f"Pipeline '{pipeline_name}' is not a Haystack AsyncPipeline instance"
raise PipelineYamlError(msg)

The dev comment doesn't quite line up with the actual check, which is that it must be an AsyncPipeline and not a regular Pipeline?

Does this also mean that deploying with YAML only works with AsyncPipeline?
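One way to make the comment and the error message match the actual check is sketched below. Pipeline and AsyncPipeline are empty local stand-ins for the Haystack classes (no relationship between them is modeled), PipelineYamlError stands in for the exception in the diff, and ensure_async_pipeline is a hypothetical helper name.

```python
class Pipeline:
    """Local stand-in for Haystack's Pipeline."""


class AsyncPipeline:
    """Local stand-in for Haystack's AsyncPipeline."""


class PipelineYamlError(Exception):
    """Local stand-in for the PipelineYamlError used in the PR."""


def ensure_async_pipeline(pipeline_name: str, pipeline_instance: object) -> AsyncPipeline:
    # YAML deployment supports only Haystack AsyncPipeline: a synchronous
    # Pipeline, or any other object such as a PipelineWrapper, is rejected.
    if not isinstance(pipeline_instance, AsyncPipeline):
        msg = (
            f"Pipeline '{pipeline_name}' is a {type(pipeline_instance).__name__}, "
            "but YAML-deployed pipelines must be Haystack AsyncPipeline instances"
        )
        raise PipelineYamlError(msg)
    return pipeline_instance
```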

Comment on lines +465 to +466
clog.error(f"Failed creating request/response models for YAML pipeline: {e!s}")
raise

Here as well, should we include pipeline_name in the error message?
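The same idea applied to the logging call around dynamic model creation is sketched below. The PR builds Pydantic models; to stay dependency-free this sketch swaps in the standard library's dataclasses.make_dataclass, which also raises TypeError on invalid field names. create_request_model and the logger name are hypothetical.

```python
import logging
from dataclasses import make_dataclass
from typing import Any

logger = logging.getLogger("hayhooks_yaml_sketch")  # hypothetical logger name


def create_request_model(pipeline_name: str, input_types: dict[str, Any]) -> type:
    """Build a request model class for a YAML pipeline from its resolved input types."""
    try:
        # make_dataclass stands in for Pydantic model creation in this sketch.
        return make_dataclass(f"{pipeline_name}_Request", list(input_types.items()))
    except TypeError as e:
        # Including the pipeline name makes the log line actionable when
        # several pipelines are deployed at startup.
        logger.error("Failed creating request/response models for YAML pipeline '%s': %s", pipeline_name, e)
        raise


Request = create_request_model("multiply", {"x": int, "y": int})
print(Request(x=3, y=4))  # multiply_Request(x=3, y=4)
```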

Comment on lines +478 to +479
# NOTE: We want to create an AsyncPipeline here so we can avoid using
# run_in_threadpool when running the pipeline.

I think we should make it clearer in the docstrings that only AsyncPipeline is supported when deploying with YAML, e.g. use "Haystack AsyncPipeline.run_async" instead of "Haystack Pipeline.run".
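The NOTE above about avoiding run_in_threadpool can be illustrated with stub pipelines in place of Haystack's: a blocking run() has to be moved off the event loop (asyncio.to_thread plays the role of Starlette's run_in_threadpool here), while a coroutine like AsyncPipeline.run_async can be awaited directly. The stub classes and handler names are hypothetical.

```python
import asyncio
import time
from typing import Any


class StubSyncPipeline:
    """Hypothetical stand-in for a blocking Haystack Pipeline."""

    def run(self, data: dict[str, Any]) -> dict[str, Any]:
        time.sleep(0.01)  # simulate blocking work
        return {"echo": data}


class StubAsyncPipeline:
    """Hypothetical stand-in for a Haystack AsyncPipeline."""

    async def run_async(self, data: dict[str, Any]) -> dict[str, Any]:
        await asyncio.sleep(0.01)  # simulate non-blocking I/O
        return {"echo": data}


async def handle_sync(pipeline: StubSyncPipeline, data: dict[str, Any]) -> dict[str, Any]:
    # Calling run() directly would block the event loop, so it is offloaded
    # to a worker thread.
    return await asyncio.to_thread(pipeline.run, data)


async def handle_async(pipeline: StubAsyncPipeline, data: dict[str, Any]) -> dict[str, Any]:
    # run_async() is a coroutine, so it is awaited on the event loop directly.
    return await pipeline.run_async(data)


async def main() -> list[dict[str, Any]]:
    # Several async requests share the event loop without extra threads.
    return await asyncio.gather(*(handle_async(StubAsyncPipeline(), {"i": i}) for i in range(3)))


print(asyncio.run(main()))  # [{'echo': {'i': 0}}, {'echo': {'i': 1}}, {'echo': {'i': 2}}]
```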

Comment on lines +369 to +371
Limitations:

- YAML-deployed pipelines do not support OpenAI-compatible chat completion endpoints, so they cannot be used with Open WebUI. If you need chat completion/streaming, use a `PipelineWrapper` and implement `run_chat_completion` or `run_chat_completion_async` (see the OpenAI compatibility section below).

@sjrl sjrl Sep 16, 2025

I might also add here that YAML-deployed pipelines only work with AsyncPipeline.

Development

Successfully merging this pull request may close these issues.

Improve YAML-only deployment supporting inputs and outputs fields
3 participants