Finetune Controller is a robust and flexible system designed to manage and streamline the fine-tuning of machine learning models on Kubernetes, particularly within OpenShift clusters. This project leverages modern tools and workflows, enabling efficient development and deployment processes for AI-driven applications.
- Local Development: Get started quickly with a streamlined setup process using uv, a high-performance Python package and project manager.
- OpenShift Integration: Simplify deployment and scaling with OpenShift-specific configurations and GPU support for intensive workloads.
- MongoDB Backend: Seamlessly connect to a local or cluster-based MongoDB database.
- Extensibility: Easily integrate with the Kubeflow Training Operator and other components for advanced workflows.
If the cluster is already set up, continue; otherwise, follow the cluster setup instructions here
-
We recommend using uv, an extremely fast Python package and project manager
pip install uv
-
A container engine such as Docker or Podman
-
Create virtual environment and install dependencies
uv sync
-
Start a local development MongoDB database (or connect to one on the cluster with port-forward)
Local
docker run -d --rm --name mongodb \
  -e MONGODB_INITDB_ROOT_USERNAME="default-user" \
  -e MONGODB_INITDB_ROOT_PASSWORD="admin123456789" \
  -e MONGODB_INITDB_DATABASE="finetune" \
  -p 27017:27017 \
  mongodb/mongodb-community-server:latest
On a cluster, you can port-forward this connection to your local machine
oc port-forward service/mongodb-community-server 27017:27017 -n <namespace>
-
Connect to the OpenShift cluster with the CLI login command
oc login
If the cluster is not already set up, follow these steps
-
Create a project-level .env file (see .env.example) and update the variables.
cp .env.example .env
-
Make sure the virtual environment is activated and start the local finetuning controller application.
source .venv/bin/activate
uvicorn app.main:app --reload
This will:
- Start MongoDB with the required configuration
- Build and start the FastAPI server
- Make the application available at http://localhost:8000
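Because the local setup depends on a project-level .env file copied from .env.example, it can help to confirm that every variable named in the example is also present in your copy before starting the server. The following is a minimal sketch and not part of the project: the function name missing_env_keys is hypothetical, and it assumes both files use plain KEY=value lines.

```shell
# Print each variable that is defined in the example file ($1)
# but has no KEY= line in the target file ($2).
# Assumption: both files use simple KEY=value lines; comments and
# blank lines in the example file are ignored.
missing_env_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | while IFS= read -r key; do
    # A missing target file counts as "every key is missing".
    grep -q "^${key}=" "$2" 2>/dev/null || printf '%s\n' "$key"
  done
}
```

Running missing_env_keys .env.example .env prints one variable name per line, and prints nothing when the two files list the same variables.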
Set up pre-commit to keep linting and code styling up to standard.
uv sync
pre-commit install
The name can be descriptive; for these examples we will use finetune-controller
oc new-project finetune-controller
oc new-project kubeflow
kubectl apply --server-side -k "github.com/kubeflow/training-operator.git/manifests/overlays/standalone?ref=v1.8.1"
Requires Kubernetes 1.29 or newer
Follow the latest docs
Install a released version
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.1/manifests.yaml
To wait for Kueue to be fully available, run:
kubectl wait deploy/kueue-controller-manager -nkueue-system --for=condition=available --timeout=5m
Restart pods
kubectl delete pods -lcontrol-plane=controller-manager -nkueue-system
First update the namespace for the CRD LocalQueue object in default-user-queue.yaml. The default namespace is "default".
yq e '.metadata.namespace = "finetune-controller"' -i crds/kueue/default-user-queue.yaml
Apply the default CRD config for Kueue, or update it by following their docs
kubectl apply -f crds/kueue/
Example configuration. Configure it properly for production.
oc new-app -e MONGODB_INITDB_ROOT_USERNAME="default-user" -e MONGODB_INITDB_ROOT_PASSWORD="admin123456789" -e MONGODB_INITDB_DATABASE="finetune" mongodb/mongodb-community-server:latest --namespace finetune-controller
Go to your cluster on the Red Hat console admin dashboard. Add a machine pool of your choosing with the following configuration:
Taints
key: nvidia.com/gpu
value: <machine pool type or other>
effect: NoSchedule
Node Labels
Key: cluster-api/accelerator
Value: <gpu type e.g. V100 or empty>
Example AWS config
# aws_credentials.yaml
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID: <base64 encoded secret>
  AWS_SECRET_ACCESS_KEY: <base64 encoded secret>
  AWS_REGION: <base64 encoded string>
kind: Secret
metadata:
  name: aws-credentials
type: Opaque
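The AWS manifest above can also be generated from plain-text values instead of pasting encoded strings by hand. This is a minimal sketch under stated assumptions: the helper names b64 and render_aws_secret are hypothetical and not part of the project. It uses printf so that no trailing newline ends up inside the encoded value, and tr to strip the line wrapping that some base64 implementations add to long output.

```shell
# Base64-encode a single value with no trailing newline in the input
# and no line breaks in the output.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Print a Secret manifest shaped like the AWS example.
# Arguments (plain text): $1 access key id, $2 secret access key, $3 region.
render_aws_secret() {
  cat <<EOF
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID: $(b64 "$1")
  AWS_SECRET_ACCESS_KEY: $(b64 "$2")
  AWS_REGION: $(b64 "$3")
kind: Secret
metadata:
  name: aws-credentials
type: Opaque
EOF
}
```

For example, render_aws_secret "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" "$AWS_REGION" > aws_credentials.yaml writes a file that can be applied like the hand-written one.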
Example base64 command in terminal
echo -n "VALUE" | base64
Example Docker pull secret config
# pull_secret.yaml
apiVersion: v1
data:
  .dockerconfigjson: ...
kind: Secret
metadata:
  name: cr-pull-secret
type: kubernetes.io/dockerconfigjson
Apply these secrets
oc apply -f aws_credentials.yaml -n finetune-controller
oc apply -f pull_secret.yaml -n finetune-controller
-
Create a .env.production file and update the defaults. For this example set MONGODB_URL=mongodb://mongodb-community-server.finetune-controller.svc.cluster.local:27017
cp .env.example .env.production
-
Create the application
oc new-app --strategy=docker --binary --name finetune-controller --env-file=".env.production" --namespace finetune-controller
-
Expose services and patch the TLS config
oc expose deployment/finetune-controller --port=8000
oc expose svc/finetune-controller --port=8000
oc patch route finetune-controller --type=merge -p '{"spec":{"tls":{"termination":"edge"}}}'
-
Add cluster role binding permissions to the application
-
Start a build
oc start-build finetune-controller --from-dir=. --namespace=finetune-controller
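The MONGODB_URL value used for .env.production follows the standard Kubernetes Service DNS pattern <service>.<namespace>.svc.cluster.local, so it changes if you choose a different project name or service name. A minimal sketch of deriving it, assuming the default cluster domain cluster.local; the helper names service_dns and mongodb_url are hypothetical and not part of the project.

```shell
# Build the in-cluster DNS name for a Service.
# Arguments: $1 service name, $2 namespace.
# Assumption: the cluster uses the default domain "cluster.local".
service_dns() {
  printf '%s.%s.svc.cluster.local' "$1" "$2"
}

# Compose a MongoDB connection URL for a Service.
# Arguments: $1 service name, $2 namespace, $3 port (defaults to 27017).
mongodb_url() {
  printf 'mongodb://%s:%s' "$(service_dns "$1" "$2")" "${3:-27017}"
}
```

Calling mongodb_url mongodb-community-server finetune-controller reproduces the URL from the example, and passing a different namespace gives the value to put in .env.production for that project.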
Publish from current project
./scripts/publish.sh
Publish from git ~HEAD
./scripts/publish_git.sh
Next, look at setting up a finetune model