Skip to content

Conversation

jhrozek
Copy link
Contributor

@jhrozek jhrozek commented Sep 17, 2025

Opening as a draft until PR #1909 get merged. My testing went well, there are 2 things that we might address as part of the review or as follow-ups:

  • the status doesn't print the deployed servers
  • the update of the Deployment is not implemented

Fixes: #1746

Extends the registry API server to support reading registry data from
local files in addition to Kubernetes ConfigMaps. This enables
deployment scenarios where ConfigMaps are mounted as volumes rather
than accessed via the Kubernetes API.

The server now accepts either:
- `--configmap` flag: Reads from Kubernetes ConfigMap via API
- `--file` flag: Reads from mounted file (for volume-mounted ConfigMaps)
@jhrozek jhrozek marked this pull request as draft September 17, 2025 23:29
Copy link
Collaborator

@dmartinol dmartinol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this draft version! Added some comments but I generally like the idea

desc: Build the ToolHive runtime and Operator image locally and deploy it to the K8s cluster
desc: |
Build the ToolHive runtime and Operator image locally and deploy it to the K8s cluster.
Set DEPLOY_REGISTRY_API=true to also build and deploy the registry API image.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just thinking aloud: to enable the registry controller we use ENABLE_EXPERIMENTAL_FEATURES=true, shouldn't we have a single flag for these experimental features?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm going to just use ENABLE_EXPERIMENTAL_FEATURES=true for -deploy-local because there we need to also do a conditional build. For non-local installations from registry we already have a value one can set during the helm invocation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a leftover of previously closed issue: the ENABLE_EXPERIMENTAL_FEATURES is not propagated to the *-deploy-* tasks. The only way to deploy the registry controller so far is by executing ENABLE_EXPERIMENTAL_FEATURES=true task operator-run

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this will be now fixed for deploy-local


// GetStorageReference returns a reference to where the data is stored
GetStorageReference(mcpRegistry *mcpv1alpha1.MCPRegistry) *mcpv1alpha1.StorageReference

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original purpose of the StorageManager interface in the sources package was to implement data storage functions for the data imported from the configured data source.
Shouldn't we have a dedicate package (registry_api? deployment?...) for the functions related to deploying the Registry API?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, I'd move the functions to deploy the registry API to a dedicated package

func (r *MCPRegistryReconciler) isAPIReady(ctx context.Context, mcpRegistry *mcpv1alpha1.MCPRegistry) bool {
ctxLogger := log.FromContext(ctx).WithValues("mcpregistry", mcpRegistry.Name)

deploymentName := fmt.Sprintf("%s-api", mcpRegistry.Name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this and similar name generators will go to dedicated functions

// Get current hash from source using the existing handler
hash, err := sourceHandler.CurrentHash(context.Background(), mcpRegistry)
if err != nil {
// If we can't get the hash, return a time-based hash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think fmt.Sprintf("error-%d", time.Now().Unix()) is a "time-based hash". Either update the comment or the code?

ctxLogger := log.FromContext(ctx).WithValues("mcpregistry", mcpRegistry.Name)

// Build the desired service configuration
service := buildRegistryAPIService(mcpRegistry)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I can't follow the flows in the review, but it seems that the ensureDeployment method first checks for the existence of the deployment before invoking buildRegistryAPIDeployment, while here we do the other way: first we invoke buildRegistryAPIService then check for the existence of the service.

},
containerName: "registry-api",
expectError: false,
expectedArgs: []string{"serve", "--file=/data/registry/registry.json"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this /data/registry/registry.json should be a new constant or a function using the 2 constants, the mount path and ConfigMapStorageDataKey

fetchResult, err := s.fetchAndProcessRegistryData(ctx, mcpRegistry)
if err != nil {
return ctrl.Result{RequeueAfter: time.Minute * 5}, nil
return ctrl.Result{RequeueAfter: time.Minute * 5}, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning both a ctrl.Result and an error would trigger a WARNING while runnig the controller, because the reconciliation loop expects either a requeue request or an error. You should then see a warning like:

9:44AM  INFO    Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes requeuing with exponential backoff. For more details, see: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/reconcile#Reconciler ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this shouldn't have gone to this PR. I'll back off this change. I think I was testing failure conditions and updates yesterday and managed to run git -am.

# -- Service account configuration for the registry API
serviceAccount:
# -- Specifies whether a service account should be created
create: true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the alternative, if the SA is not created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just reused the pattern from deploy/charts/operator/templates/serviceaccount.yaml .

I will get rid of this condition and just rely on the experimental feature one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Kubernetes MCPRegistry] Registry API deployment from controller
2 participants