mpangrazzi
Contributor

@mpangrazzi mpangrazzi commented Sep 2, 2025

Fixes #156.

  • Resolve declared inputs and outputs params / types from a Haystack pipeline YAML definition
  • Add YAML pipeline to registry - dynamically create Pydantic request/response models using above parsed data (in metadata)
  • Dynamically add API route for YAML pipeline
  • Add /deploy-yaml API route for deploying YAML pipelines
  • Discard YAML pipelines without inputs / outputs fields (note that this old way of deploying would have been deprecated / removed anyway)
  • Add hayhooks pipeline deploy-yaml CLI command
  • Ensure inputs / outputs YAML pipelines are deployed at startup (so not using old YAML deploy logic)
  • Remove all old YAML deploy logic and /deploy endpoint - NOTE: This was needed to avoid confusion between new and old logic
  • Update README adding sections for YAML pipelines and remove all legacy info regarding old logic
  • Use async handler instead of sync one by default
  • Enable YAML pipelines to be usable as MCP Tools (when using Hayhooks MCP Server)
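
As a rough illustration of the first three points, here is a minimal sketch of resolving declared inputs / outputs into flat request / response models. It is not the code from this PR: the `definition` dict stands in for an already-parsed YAML file, `SOCKET_TYPES`, `resolve_declared` and `build_model` are hypothetical names, and the standard library's `dataclasses.make_dataclass` is used as a stand-in for dynamic Pydantic model creation.

```python
from dataclasses import fields, make_dataclass
from typing import Any

# Parsed form of a pipeline YAML's top-level `inputs` / `outputs` sections
# (roughly what a YAML loader would return); component names are illustrative.
definition = {
    "inputs": {
        "query": ["bm25_retriever.query", "query_embedder.text"],
        "filters": ["bm25_retriever.filters"],
    },
    "outputs": {"answers": "answer_builder.answers"},
}

# Hypothetical socket types; in practice they would come from the components.
SOCKET_TYPES = {
    ("bm25_retriever", "query"): str,
    ("query_embedder", "text"): str,
    ("bm25_retriever", "filters"): dict,
    ("answer_builder", "answers"): list,
}


def resolve_declared(section: dict[str, Any]) -> dict[str, list[tuple[str, ...]]]:
    """Map each declared name to its (component, socket) targets."""
    resolved = {}
    for name, targets in section.items():
        # A declared name may point at one target or at a list of targets.
        if isinstance(targets, str):
            targets = [targets]
        resolved[name] = [tuple(target.split(".", 1)) for target in targets]
    return resolved


def build_model(model_name: str, resolved: dict[str, list[tuple[str, ...]]]) -> type:
    """Create a flat model whose fields are the declared names."""
    # The first target decides the field type (assumes all targets agree).
    spec = [(name, SOCKET_TYPES[targets[0]]) for name, targets in resolved.items()]
    return make_dataclass(model_name, spec)


RequestModel = build_model("RunRequest", resolve_declared(definition["inputs"]))
ResponseModel = build_model("RunResponse", resolve_declared(definition["outputs"]))
```

With models like these, a route handler could accept `RequestModel(query="...", filters={})` and fan each field out to every target listed for it.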

Note: initially I wanted to remove the old YAML logic in a separate PR, but that would have ended up being quite confusing. Better to remove it now and update the README accordingly.

@mpangrazzi mpangrazzi self-assigned this Sep 2, 2025
@mpangrazzi
Contributor Author

mpangrazzi commented Sep 2, 2025

I'll put here a question as a reminder. Considering this situation (coming from a complex Haystack pipeline):

inputs:
  query:
  - bm25_retriever.query
  - query_embedder.text
  - ConditionalRouter.question
  filters:
  - bm25_retriever.filters
  - embedding_retriever.filters

Is it always safe to assume that bm25_retriever.query, query_embedder.text and ConditionalRouter.question will have the same input type? (The same can be said for filters.) I assume yes, of course 😉

@sjrl sjrl self-requested a review September 9, 2025 07:45
@sjrl

sjrl commented Sep 9, 2025

I'll put here a question as a reminder. Considering this situation (coming from a complex Haystack pipeline):

inputs:
  query:
  - bm25_retriever.query
  - query_embedder.text
  - ConditionalRouter.question
  filters:
  - bm25_retriever.filters
  - embedding_retriever.filters

Is it always safe to assume that bm25_retriever.query, query_embedder.text and ConditionalRouter.question will have the same input type? (The same can be said for filters.) I assume yes, of course 😉

@mpangrazzi yes, I'd say that based on the mapping provided here we can assume they will all have the same input type. Technically you could inspect it, but that would require creating the pipeline first.
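
If that inspection were ever wanted, it could look roughly like the sketch below. This is only an illustration under stated assumptions: `socket_types` is a hypothetical lookup of the kind one might collect after building the pipeline, and `shared_input_types` is a made-up helper, not part of Hayhooks or Haystack.

```python
# Hypothetical socket types, as they might be collected from a built pipeline.
socket_types = {
    "bm25_retriever.query": str,
    "query_embedder.text": str,
    "ConditionalRouter.question": str,
    "bm25_retriever.filters": dict,
    "embedding_retriever.filters": dict,
}

# The declared inputs from the question above.
declared_inputs = {
    "query": [
        "bm25_retriever.query",
        "query_embedder.text",
        "ConditionalRouter.question",
    ],
    "filters": ["bm25_retriever.filters", "embedding_retriever.filters"],
}


def shared_input_types(
    declared: dict[str, list[str]], types: dict[str, type]
) -> dict[str, type]:
    """Return one type per declared input, or raise if its targets disagree."""
    result = {}
    for name, targets in declared.items():
        found = {types[target] for target in targets}
        if len(found) != 1:
            raise TypeError(f"input '{name}' maps to mismatched types: {found}")
        result[name] = found.pop()
    return result


resolved_types = shared_input_types(declared_inputs, socket_types)
```

Skipping this check and taking the type of the first target keeps deployment independent of pipeline construction, which is the trade-off described in the reply above.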

@mpangrazzi mpangrazzi requested a review from anakin87 September 15, 2025 14:38
@mpangrazzi mpangrazzi marked this pull request as ready for review September 16, 2025 07:44
Member

@anakin87 anakin87 left a comment

I left a few comments.

Some general notes:

  • I would improve the PR title (for release notes) to make it clear that the change is breaking and that we are introducing a non-backward compatible way to deploy YAML pipelines.
  • Related to the previous point: are you considering releasing a major version?

@mpangrazzi
Contributor Author

@sjrl @anakin87 I should have addressed all your very useful feedback, thanks! Let me know if you have more, of course.

I would improve the PR title (for release notes) to make it clear that the change is breaking and that we are introducing a non-backward compatible way to deploy YAML pipelines.

Makes sense since it became bigger than expected, but I don't have a clear idea here. If you have a suggestion @anakin87 feel free to update it!

Related to the previous point: are you considering releasing a major version?

Hmm, maybe we can do it. We haven't strictly followed SemVer here, but maybe we can start doing so from e.g. v1.0.0. WDYT?

@mpangrazzi mpangrazzi requested review from anakin87 and sjrl September 17, 2025 09:50
@sjrl

sjrl commented Sep 17, 2025

Hmm, maybe we can do it. We haven't strictly followed SemVer here, but maybe we can start doing so from e.g. v1.0.0. WDYT?

Yeah, I think we should start following SemVer, and starting with v1.0.0 sounds good!

"""
Create a flat Pydantic request model from declared inputs resolved by yaml_utils.

Args:

Maybe a general comment for a separate PR, but I wonder if we should use the same docstring style we use in Haystack. Is there an advantage to using this style here?

Contributor Author

Google-style docstrings were already there when I started working on Hayhooks, I simply kept using that style. IMHO for consistency yes, we can maybe migrate them to reST-style in another PR!
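
For comparison, here is the same docstring written in both styles. The two functions are hypothetical and exist only to show the formatting difference; they are not taken from Hayhooks or Haystack.

```python
def list_request_fields_google(declared_inputs: dict) -> list:
    """
    List the field names of a flat request model.

    Args:
        declared_inputs: Mapping of declared input names to their targets.

    Returns:
        The declared input names, in declaration order.
    """
    return list(declared_inputs)


def list_request_fields_rest(declared_inputs: dict) -> list:
    """
    List the field names of a flat request model.

    :param declared_inputs: Mapping of declared input names to their targets.
    :returns: The declared input names, in declaration order.
    """
    return list(declared_inputs)
```

The behavior is identical; only the section markers change (`Args:` / `Returns:` versus `:param ...:` / `:returns:`), so a migration would be mechanical.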

@sjrl sjrl left a comment

Looks good! I've left a few more minor comments.

Member

@anakin87 anakin87 left a comment

Looks good!

For the title, I'd suggest something like this (maybe a bit long):

Breaking: Remove hayhooks deploy; introduce hayhooks deploy-yaml to deploy YAML pipelines with required inputs/outputs fields

@mpangrazzi mpangrazzi changed the title from "Enhance YAML pipeline deployments using inputs / outputs fields" to "Breaking: Remove hayhooks deploy; introduce hayhooks deploy-yaml to deploy YAML pipelines with required inputs/outputs fields" Sep 17, 2025
@mpangrazzi mpangrazzi changed the title from "Breaking: Remove hayhooks deploy; introduce hayhooks deploy-yaml to deploy YAML pipelines with required inputs/outputs fields" to "Breaking: Remove legacy hayhooks pipeline deploy; introduce hayhooks pipeline deploy-yaml to deploy YAML pipelines with required inputs/outputs fields" Sep 17, 2025
@mpangrazzi mpangrazzi merged commit f688f7b into main Sep 17, 2025
5 checks passed
@mpangrazzi mpangrazzi deleted the yaml_deploy_enhancements branch September 17, 2025 11:54
Development

Successfully merging this pull request may close these issues.

Improve YAML-only deployment supporting inputs and outputs fields