Enhance YAML pipeline deployments using `inputs` / `outputs` fields #161
I'll put a question here as a reminder. Consider this situation (coming from a complex Haystack pipeline):
inputs:
  query:
    - bm25_retriever.query
    - query_embedder.text
    - ConditionalRouter.question
  filters:
    - bm25_retriever.filters
    - embedding_retriever.filters
It's always safe to assume that
@mpangrazzi yes, I'd say that based on the mapping provided here we can assume they will all have the same input type. Technically you could inspect it, but that would require creating the pipeline first.
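For reference, a minimal sketch of how such a mapping could fan a single top-level value out to several component inputs. This is not the Hayhooks implementation: `expand_inputs` and `INPUTS_MAPPING` are hypothetical names, and the sketch assumes (as discussed above) that all targets of one top-level input accept the same type.

```python
# Hypothetical illustration, not Hayhooks code: expand top-level YAML inputs
# into the nested {component: {socket: value}} dict passed to a pipeline run.
from collections import defaultdict

INPUTS_MAPPING = {
    "query": [
        "bm25_retriever.query",
        "query_embedder.text",
        "ConditionalRouter.question",
    ],
    "filters": [
        "bm25_retriever.filters",
        "embedding_retriever.filters",
    ],
}


def expand_inputs(mapping, values):
    """Copy each top-level value to every component socket it is mapped to."""
    run_data = defaultdict(dict)
    for name, targets in mapping.items():
        if name not in values:
            continue  # skip inputs the caller did not provide
        for target in targets:
            component, socket = target.split(".", 1)
            run_data[component][socket] = values[name]
    return dict(run_data)


run_data = expand_inputs(INPUTS_MAPPING, {"query": "What is Hayhooks?", "filters": {}})
```

Because the same value is copied to every target, a type mismatch between targets would only surface when the pipeline runs, which is why the "same input type" assumption matters here.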
…yaml ; refactoring
…sing inputs/outputs
…t using old YAML deploy logic)
…threadpool when running it)
I left a few comments.
Some general notes:
- I would improve the PR title (for release notes) to make it clear that the change is breaking and that we are introducing a non-backward compatible way to deploy YAML pipelines.
- Related to the previous point: are you considering releasing a major version?
README.md
Outdated
@@ -162,8 +163,9 @@ CLI commands are basically wrappers around the HTTP API of the server. The full
hayhooks run # Start the server
hayhooks status # Check the status of the server and show deployed pipelines

hayhooks pipeline deploy-files <path_to_dir> # Deploy a pipeline using PipelineWrapper
hayhooks pipeline deploy <pipeline_name> # Deploy a pipeline from a YAML file
hayhooks pipeline deploy-yaml <path_to_yaml> # Deploy a pipeline from a YAML file (preferred)
Why is this option preferred?
It seems a convenient option if you are used to working with YAML, but is less flexible (no OpenAI/OpenWebUI compatibility), so I would not indicate it as preferred. But maybe I am missing something. WDYT?
Also had the same question :)
That's probably a typo, sorry. `deploy-files` should obviously be the preferred one, for exactly the reason you gave ;)
There are still some mentions of "preferred" here and there
tools = await client.list_tools()
# Find YAML tool by name, e.g., "calc" (the pipeline name)
result = await client.call_tool("calc", {"value": 3})
assert result.content[0].text == '{"double": {"value": 10}}'
I can only understand this example by looking at the calc pipeline code.
Perhaps we can use something easier to grasp
What do you have in mind? Or maybe we can reference the calc pipeline better?
Something like
tools = await client.list_tools()
# Find YAML tool by name, e.g., "multiply" (the pipeline name)
result = await client.call_tool("multiply", {"x": 3, "y": 4})
assert result.content[0].text == '{"product": 12}'
I just find it hard to understand the example without knowing the Pipeline.
@@ -91,6 +119,10 @@ def deploy_files(
    _deploy_with_progress(ctx=ctx, name=name, endpoint="deploy_files", payload=payload)


# Register alias: `deploy` -> `deploy-files`
pipeline.command(name="deploy")(deploy_files)
Just two thoughts:
- this reinforces the idea expressed above, that the preferred option is using pipeline wrappers
- even if the overall impact of this PR is highly breaking, I find it confusing that now `deploy` is an alias for `deploy-files` while previously it was used to deploy YAML pipelines in the old way. For me, it would be clearer to remove `deploy` altogether at the moment.
WDYT?
Yeah, maybe we can remove the `deploy` command completely, since this will already contain breaking changes. Or maybe add a message that tells users to use `deploy-files` or `deploy-yaml`?
Makes sense. Let's add an error message explaining that to deploy YAML pipelines you now need to use `deploy-yaml` with a new YAML structure.
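A rough sketch of what such a message could look like. The function name and the exact wording are assumptions for illustration, not what the PR ships:

```python
# Hypothetical sketch: the helper name and the wording are assumptions.
def legacy_deploy_error(pipeline_name: str) -> str:
    """Build the message shown when the removed `deploy` command is invoked."""
    return (
        f"Cannot deploy '{pipeline_name}': the `deploy` command has been removed. "
        "Use `deploy-files` for a PipelineWrapper, or `deploy-yaml` for a YAML "
        "pipeline that declares `inputs` and `outputs`."
    )


message = legacy_deploy_error("my_pipeline")
print(message)
```

Naming both replacement commands in one message tells users where to go whichever deployment style they were using before.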
@@ -1,79 +0,0 @@
components:
can you confirm that most of these removed files are not used in tests?
(My impression is that they were present for manual tests before proper tests were put in place)
Yes I confirm! Those files were used when we were iterating on improving type handling of components inputs in old YAML logic (i.e. without inputs/outputs fields).
msg = f"Failed to save YAML pipeline file: {e!s}"
raise PipelineFilesError(msg) from e
Would it be worth including the `pipeline_name` as part of the error message?
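Something along these lines, as a sketch only. `save_yaml_pipeline` and its signature are hypothetical, and `PipelineFilesError` is redefined here as a stand-in so the snippet runs on its own:

```python
from pathlib import Path


class PipelineFilesError(Exception):
    """Stand-in for the exception raised in the diff above."""


def save_yaml_pipeline(pipeline_name: str, source_code: str, pipelines_dir: Path) -> Path:
    # Hypothetical helper: name and signature are assumptions for illustration.
    target = pipelines_dir / f"{pipeline_name}.yml"
    try:
        target.write_text(source_code, encoding="utf-8")
    except OSError as e:
        # Naming the pipeline makes the failure traceable when several are deployed
        msg = f"Failed to save YAML pipeline file for '{pipeline_name}': {e!s}"
        raise PipelineFilesError(msg) from e
    return target


# Writing into a directory that does not exist triggers the error path
try:
    save_yaml_pipeline("calc", "components: {}", Path("no_such_dir_for_sketch"))
except PipelineFilesError as err:
    error_text = str(err)
```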
# Ensure the registered object is a Haystack Pipeline, not a wrapper
if not isinstance(pipeline_instance, AsyncPipeline):
    msg = f"Pipeline '{pipeline_name}' is not a Haystack AsyncPipeline instance"
    raise PipelineYamlError(msg)
The dev comment doesn't quite line up with the actual check, which is that it must be an `AsyncPipeline` and not a normal `Pipeline`?
Does this also mean that deploying with YAML only works with `AsyncPipeline`?
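To illustrate the concern, here is a sketch with stand-in classes. It assumes that `Pipeline` and `AsyncPipeline` are sibling classes sharing a base rather than one subclassing the other; the class definitions and the `accepts` helper are hypothetical and only exist so the snippet runs without Haystack installed:

```python
# Stand-in classes (assumption: Pipeline and AsyncPipeline are siblings).
class PipelineBase:
    pass


class Pipeline(PipelineBase):
    pass


class AsyncPipeline(PipelineBase):
    pass


class PipelineYamlError(Exception):
    pass


def ensure_async_pipeline(pipeline_name, pipeline_instance):
    # Rejects wrappers AND synchronous Pipeline objects, not only wrappers
    if not isinstance(pipeline_instance, AsyncPipeline):
        msg = f"Pipeline '{pipeline_name}' is not a Haystack AsyncPipeline instance"
        raise PipelineYamlError(msg)
    return pipeline_instance


def accepts(instance) -> bool:
    """Return True if the check lets the instance through."""
    try:
        ensure_async_pipeline("calc", instance)
    except PipelineYamlError:
        return False
    return True
```

Under that assumption a plain `Pipeline` instance fails the check just like a wrapper would, so the code comment and the docs should probably say so explicitly.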
clog.error(f"Failed creating request/response models for YAML pipeline: {e!s}")
raise
Here as well, should we include `pipeline_name` in the error message?
# NOTE: We want to create an AsyncPipeline here so we can avoid using
# run_in_threadpool when running the pipeline.
I think we should make it clearer in the docstrings that only `AsyncPipeline` is supported when deploying with YAML. E.g. use "Haystack AsyncPipeline.run_async" instead of "Haystack Pipeline.run".
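The difference the NOTE refers to can be sketched with stubs. The stub classes and handler names below are hypothetical, and `asyncio.to_thread` stands in for a threadpool helper such as `run_in_threadpool`; this is not the server code:

```python
import asyncio
import time


class SyncPipelineStub:
    def run(self, data):
        time.sleep(0.01)  # blocking work, would stall the event loop if called directly
        return {"echo": data}


class AsyncPipelineStub:
    async def run_async(self, data):
        await asyncio.sleep(0.01)  # non-blocking work
        return {"echo": data}


async def handle_sync(pipeline, data):
    # A blocking run() has to be moved off the event loop thread
    return await asyncio.to_thread(pipeline.run, data)


async def handle_async(pipeline, data):
    # run_async() can be awaited directly on the event loop
    return await pipeline.run_async(data)


async def main():
    return await asyncio.gather(
        handle_sync(SyncPipelineStub(), {"value": 3}),
        handle_async(AsyncPipelineStub(), {"value": 3}),
    )


sync_result, async_result = asyncio.run(main())
```

Both handlers return the same payload; the async variant simply skips the thread hop, which is the motivation given for building an `AsyncPipeline` from the YAML.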
Limitations:

- YAML-deployed pipelines do not support OpenAI-compatible chat completion endpoints, so they cannot be used with Open WebUI. If you need chat completion/streaming, use a `PipelineWrapper` and implement `run_chat_completion` or `run_chat_completion_async` (see the OpenAI compatibility section below).
I might also add here that YAML-deployed pipelines only work with `AsyncPipeline`.
Fixes #156.
- `inputs` and `outputs` params / types from a Haystack pipeline YAML definition
- `/deploy-yaml` API route for deploying YAML pipelines
- `hayhooks pipeline deploy-yaml` CLI command
- `/deploy` endpoint - NOTE: This was needed to avoid confusion between new and old logic

Note: initially I wanted to remove the old YAML logic in another PR, but that would have ended up being quite confusing. Better to remove it now and update the README accordingly.