CH-189 docs for cluster configuration #802
Open
filippomc
wants to merge
4
commits into
develop
from
CH-189
67363cb
CH-189 docs for cluster configuration
filippomc 1724b0e
Update docs/build-deploy/cluster-configuration.md
filippomc c7ee1bb
Update infrastructure/cluster-configuration/README.md
filippomc f3a812e
Apply suggestions from code review
filippomc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| # Cluster configuration | ||
|
|
||
| CloudHarness deploys with Helm and uses an ingress controller to serve application endpoints. | ||
|
|
||
| The ingress controller is not installed as a part of the application, and has to be installed separately. The installation process can be different depending on the Kubernetes provider. | ||
|
|
||
| See [here](../../infrastructure/cluster-configuration/README.md) for more information and resources for setting up your cluster. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,36 @@ | ||
| # Cluster configuration | ||
|
|
||
| 1. Initialize kubectl credentials to work with your cluster (e.g minikube or google cloud) | ||
| 1. Run `source cluster-init.sh` | ||
| ## Simple setup on an existing cluster | ||
|
|
||
| ### TL;DR | ||
| 1. Create a Kubernetes cluster (e.g. minikube or Google Cloud) | ||
| 1. Initialize kubectl credentials to work with your cluster | ||
| 1. Run `source cluster-init.sh` (This script installs the ingress-nginx controller and cert-manager using Helm in the configured k8s cluster.) | ||
|
|
||
| ### Cert-manager | ||
|
|
||
| Follow [these instructions](https://cert-manager.io/docs/installation/kubernetes/) to deploy cert-manager. | ||
|
|
||
| ### Ingress | ||
|
|
||
| The ingress controller is the entry point for all CloudHarness applications. | ||
| Info on how to deploy ingress-nginx can be found [here](https://kubernetes.github.io/ingress-nginx/deploy/). | ||
|
|
||
| ``` | ||
| helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx | ||
| helm repo update | ||
| helm install ingress-nginx ingress-nginx/ingress-nginx | ||
| ``` | ||
|
|
||
| On local clusters and GCP/GKE, the ingress-nginx chart deploys a load balancer with a given IP address, while in other environments you may need to configure the load balancer manually. Use that address to create the CNAME and A records for the website. | ||
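A minimal sketch of how that address could be read back once the chart is installed. It assumes the Helm release is named `ingress-nginx` as in the commands above, so the controller service is called `ingress-nginx-controller`; the function name `ingress_lb_address` is hypothetical and not part of the repository.

```shell
# Print the external address of the ingress-nginx controller service.
# Assumption: the Helm release is named "ingress-nginx", so the service
# is "ingress-nginx-controller". $1 is the namespace it was installed into.
ingress_lb_address() {
  namespace="${1:-default}"
  # Some providers report an IP, others a hostname; print whichever is set.
  kubectl get service ingress-nginx-controller \
    --namespace "$namespace" \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
}

# Usage (against a configured cluster):
#   ingress_lb_address kube-system
```

The returned value is what the DNS records should point to: an A record for an IP address, a CNAME for a hostname.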
|
|
||
|
|
||
|
|
||
| ## GCP GKE cluster setup | ||
|
|
||
| GKE setup is fairly straightforward: you can create a cluster and a node pool from the Google Cloud console, and internet-facing load balancers are created directly by the ingress controller. | ||
|
|
||
| For additional info see [here](gcp-setup.md). | ||
|
|
||
| ## AWS EKS setup | ||
| AWS requires some additional steps to install the load balancer and the ingress, see [here](./aws-setup.md) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,128 @@ | ||
|
|
||
|
|
||
|
|
||
| ## Create the EKS cluster | ||
|
|
||
| There are many ways to create a cluster; which one to use depends on requirements that are outside the scope of this guide. | ||
|
|
||
| This is a good starting point: https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html. | ||
| When in doubt, [Auto Mode Cluster](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-automode.html) is a good place to start. | ||
|
|
||
| ## Ingress setup | ||
|
|
||
| The following is inspired by https://aws.amazon.com/blogs/containers/exposing-kubernetes-applications-part-3-nginx-ingress-controller/, section "Exposing Ingress-Nginx Controller via a Load Balancer". | ||
| Be aware that the article is from 2022 and its steps no longer work exactly as written. | ||
| The following steps worked for us in May 2025. | ||
|
|
||
| ### Set up the policy and service account | ||
|
|
||
| Note that the version of the aws-load-balancer-controller must match the version of the policy; a mismatched version will cause failures. | ||
|
|
||
| ```bash | ||
| curl -o iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json | ||
| aws iam create-policy --policy-name AWSLoadBalancerControllerIAMPolicy --policy-document file://iam-policy.json | ||
| AWS_ACCOUNT=123456789 # Your AWS account | ||
| eksctl create iamserviceaccount \ | ||
| --cluster=metacell-dev \ | ||
| --name=aws-load-balancer-controller \ | ||
| --namespace=kube-system \ | ||
| --attach-policy-arn=arn:aws:iam::${AWS_ACCOUNT}:policy/AWSLoadBalancerControllerIAMPolicy \ | ||
| --approve | ||
| ``` | ||
|
|
||
| ### Install the aws-load-balancer-controller | ||
|
|
||
| First, apply the custom resource definitions: | ||
| ```bash | ||
| wget https://raw.githubusercontent.com/aws/eks-charts/refs/heads/master/stable/aws-load-balancer-controller/crds/crds.yaml | ||
| kubectl apply -f crds.yaml | ||
| ``` | ||
|
|
||
| Then install the helm chart | ||
| From https://github.com/aws/eks-charts/tree/master/stable/aws-load-balancer-controller | ||
| ```bash | ||
| helm repo add eks https://aws.github.io/eks-charts | ||
| # If using IAM Roles for service account install as follows - NOTE: you need to specify both of the chart values `serviceAccount.create=false` and `serviceAccount.name=aws-load-balancer-controller` | ||
| helm install aws-load-balancer-controller eks/aws-load-balancer-controller --set clusterName=metacell-dev -n kube-system --set serviceAccount.create=false --set serviceAccount.name=aws-load-balancer-controller | ||
| ``` | ||
|
|
||
|
|
||
| ### Fix vpc | ||
|
|
||
| If you encounter the following error related to the VPC: | ||
|
|
||
| > {"level":"info","ts":"2025-05-21T13:53:48Z","msg":"version","GitVersion":"v2.13.2","GitCommit":"4236bd7928711874ae4d8aff6b97870b5625140f","BuildDate":"2025-05-15T17:37:55+0000"} | ||
| > {"level":"error","ts":"2025-05-21T13:53:53Z","logger":"setup","msg":"unable to initialize AWS cloud","error":"failed to get VPC ID: failed to fetch VPC ID from instance metadata: error in fetching vpc id through ec2 metadata: get mac metadata: operation error ec2imds: GetMetadata, canceled, context deadline exceeded"} | ||
|
|
||
| First get the vpc id: | ||
|
|
||
| ```bash | ||
| aws eks describe-cluster \ | ||
| --name metacell-dev \ | ||
| --region us-west-2 \ | ||
| --query "cluster.resourcesVpcConfig.vpcId" \ | ||
| --output text | ||
| ``` | ||
|
|
||
| Then set the VPC ID value in the chart (`$VPC_ID` holds the output of the previous command): | ||
| ```bash | ||
| helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --reuse-values --set vpcId=$VPC_ID | ||
| ``` | ||
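The two commands above can be chained so that `VPC_ID` is actually assigned before it is used. This is a sketch only: the function name `set_lb_controller_vpc` is hypothetical, and the cluster name and region are passed as arguments rather than hardcoded.

```shell
# Look up the cluster's VPC ID and pass it to the controller chart.
# $1 = EKS cluster name, $2 = AWS region (both supplied by the caller).
set_lb_controller_vpc() {
  cluster_name="$1"
  aws_region="$2"
  # Capture the VPC ID; stop here if the lookup fails.
  VPC_ID=$(aws eks describe-cluster \
    --name "$cluster_name" \
    --region "$aws_region" \
    --query "cluster.resourcesVpcConfig.vpcId" \
    --output text) || return 1
  # Keep the values from the original install and only add vpcId.
  helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
    -n kube-system --reuse-values --set vpcId="$VPC_ID"
}

# Usage:
#   set_lb_controller_vpc metacell-dev us-west-2
```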
|
|
||
| ### Install ingress nginx | ||
|
|
||
| ```bash | ||
| helm upgrade -i ingress-nginx ingress-nginx/ingress-nginx \ | ||
| --namespace kube-system \ | ||
| --values ingress/values-aws.yaml | ||
|
|
||
| kubectl -n kube-system rollout status deployment ingress-nginx-controller | ||
|
|
||
| kubectl get deployment -n kube-system ingress-nginx-controller | ||
| ``` | ||
|
|
||
| ### Associate the DNS | ||
|
|
||
| The endpoint can be assigned with 2 CNAME entries. | ||
| For instance, if you run `harness-deployment ... -d myapp.mydomain.com`, | ||
| the following CNAME entries are needed | ||
| - myapp [LB_ADDRESS] | ||
| - *.myapp [LB_ADDRESS] | ||
|
|
||
|
|
||
| The easiest way to get the load balancer address is to run the deployment and | ||
| read it from the ingress with | ||
|
|
||
| ``` | ||
| kubectl get ingress | ||
| ``` | ||
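As an illustration of the two entries described above, the following sketch prints them in fully qualified form for a given deployment domain. The function name `cname_records` and the sample load balancer address are hypothetical; how the records are created depends on your DNS provider.

```shell
# Print the two CNAME entries needed for a deployment domain.
# $1 = domain passed to harness-deployment with -d, $2 = load balancer address.
cname_records() {
  domain="$1"
  lb_address="$2"
  printf '%s CNAME %s\n' "$domain" "$lb_address"
  printf '*.%s CNAME %s\n' "$domain" "$lb_address"
}

# Usage, reading the address of the first ingress in the current namespace:
#   cname_records myapp.mydomain.com \
#     "$(kubectl get ingress -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')"
```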
|
|
||
| ## Storage class | ||
|
|
||
| EKS does not provide a default storage class. | ||
| To create one, run | ||
|
|
||
| ```bash | ||
| kubectl apply -f storageclass-default-aws.yaml | ||
| ``` | ||
|
|
||
| ## Container registry | ||
|
|
||
| CloudHarness pushes images to a container registry, which has to be readable from EKS. | ||
|
|
||
| Any public registry can be used seamlessly, while ECR is recommended to pull private images | ||
|
|
||
| 1. Create a new ECR registry | ||
| 2. Create all the repositories within the deployment (ECR does not create repositories automatically on push, unless this is implemented https://aws.amazon.com/blogs/containers/dynamically-create-repositories-upon-image-push-to-amazon-ecr/) | ||
| 3. Give the permissions to the Node IAM role | ||
| https://docs.aws.amazon.com/AmazonECR/latest/userguide/ECR_on_EKS.html (the role should be AmazonSSMRoleForInstancesQuickSetup for Auto Mode Clusters) | ||
|
|
||
| To push images, you have to authenticate to the registry. | ||
|
|
||
| To authenticate from the local console, the command looks like the following: | ||
|
|
||
| ```bash | ||
| aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 527966638683.dkr.ecr.us-west-2.amazonaws.com | ||
| ``` | ||
|
|
||
| The exact command can also be viewed by hitting "View push commands" from the web console. | ||
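The login command and the repository creation step can be parameterized by account and region. This is a sketch under assumptions: the function names are hypothetical, and the repository names to create must be taken from your own deployment.

```shell
# Build the ECR registry host name. $1 = AWS account ID, $2 = region.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Authenticate the local Docker client against the registry.
ecr_login() {
  aws ecr get-login-password --region "$2" |
    docker login --username AWS --password-stdin "$(ecr_registry "$1" "$2")"
}

# Create one ECR repository per image name, since ECR does not create
# repositories on push by default. $1 = region, remaining args = repository names.
ecr_create_repositories() {
  region="$1"
  shift
  for repository in "$@"; do
    aws ecr create-repository --region "$region" \
      --repository-name "$repository" || return 1
  done
}

# Usage (account ID and repository names are placeholders):
#   ecr_login 123456789012 us-west-2
#   ecr_create_repositories us-west-2 myapp/frontend myapp/backend
```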
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| # Azure AKS setup | ||
|
|
||
| The main complication with AKS is that the nginx ingress controller | ||
| is not easy to set up, so the AGIC controller is preferred. | ||
|
|
||
|
|
||
| 1. Create new public IP for the AGIC load balancer | ||
| 2. Create the application gateway | ||
| 3. Enable AGIC add-on in existing AKS cluster with Azure CLI | ||
| 4. Peer the AKS and AG virtual networks together | ||
|
|
||
| See https://learn.microsoft.com/en-us/azure/application-gateway/tutorial-ingress-controller-add-on-existing. | ||
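Step 3 of the list above can be sketched with the Azure CLI as follows, along the lines of the linked tutorial. The function name `enable_agic_addon` is hypothetical, and the resource group, cluster and gateway names are supplied by the caller; steps 1, 2 and 4 are not covered here.

```shell
# Enable the AGIC add-on on an existing AKS cluster.
# $1 = resource group, $2 = AKS cluster name, $3 = application gateway name.
enable_agic_addon() {
  resource_group="$1"
  cluster_name="$2"
  gateway_name="$3"
  # Resolve the resource ID of the existing application gateway.
  appgw_id=$(az network application-gateway show \
    --name "$gateway_name" --resource-group "$resource_group" \
    --query id --output tsv) || return 1
  az aks enable-addons \
    --name "$cluster_name" --resource-group "$resource_group" \
    --addons ingress-appgw --appgw-id "$appgw_id"
}

# Usage (all three names are placeholders):
#   enable_agic_addon myResourceGroup myCluster myApplicationGateway
```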
|
|
||
| ## Adapt / override Ingress template | ||
| In addition to this, the [Ingress template](../../deployment-configuration/helm/templates/ingress.yaml) has to be adapted to use the AGIC controller. | ||
|
|
||
| An adaptation of the Ingress template that optionally supports AGIC is the following: | ||
| ```yaml | ||
| kind: Ingress | ||
| metadata: | ||
| name: {{ .Values.ingress.name | quote }} | ||
| annotations: | ||
| {{- if eq .Values.ingress.className "azure-application-gateway" }} | ||
| appgw.ingress.kubernetes.io/backend-path-prefix: / | ||
| {{- if $tls }} | ||
| appgw.ingress.kubernetes.io/appgw-ssl-certificate: "nwwcssl" | ||
| {{- end }} | ||
| appgw.ingress.kubernetes.io/ssl-redirect: {{ (and $tls .Values.ingress.ssl_redirect) | quote }} | ||
| {{- else }} | ||
| nginx.ingress.kubernetes.io/ssl-redirect: {{ (and $tls .Values.ingress.ssl_redirect) | quote }} | ||
| nginx.ingress.kubernetes.io/proxy-body-size: '{{ .Values.proxy.payload.max }}m' | ||
| nginx.ingress.kubernetes.io/proxy-buffer-size: '128k' | ||
| nginx.ingress.kubernetes.io/from-to-www-redirect: 'true' | ||
| nginx.ingress.kubernetes.io/rewrite-target: /$1 | ||
| nginx.ingress.kubernetes.io/auth-keepalive-timeout: {{ .Values.proxy.timeout.keepalive | quote }} | ||
| nginx.ingress.kubernetes.io/proxy-read-timeout: {{ .Values.proxy.timeout.read | quote }} | ||
| nginx.ingress.kubernetes.io/proxy-send-timeout: {{ .Values.proxy.timeout.send | quote }} | ||
| nginx.ingress.kubernetes.io/use-forwarded-headers: {{ .Values.proxy.forwardedHeaders | quote }} | ||
|
|
||
| {{- end }} | ||
| {{- if and (and (not .Values.local) (not .Values.certs)) $tls }} | ||
| kubernetes.io/tls-acme: 'true' | ||
| cert-manager.io/issuer: {{ printf "%s-%s" "letsencrypt" .Values.namespace }} | ||
| {{- end }} | ||
| ``` |
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,3 +1,14 @@ | ||||||||||||||
| helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx | ||||||||||||||
| helm repo update | ||||||||||||||
| helm install ingress ingress-nginx/ingress-nginx -f ingress/values.yaml -n kube-system | ||||||||||||||
| helm install ingress ingress-nginx/ingress-nginx -f ingress/values.yaml -n kube-system | ||||||||||||||
|
|
||||||||||||||
| helm repo add jetstack https://charts.jetstack.io --force-update | ||||||||||||||
| helm install \ | ||||||||||||||
| cert-manager jetstack/cert-manager \ | ||||||||||||||
| --namespace cert-manager \ | ||||||||||||||
| --create-namespace \ | ||||||||||||||
| --version v1.17.2 \ | ||||||||||||||
| --set crds.enabled=true | ||||||||||||||
|
|
||||||||||||||
|
|
||||||||||||||
| helm install --name cert-manager --namespace cert-manager --version v0.14.0 jetstack/cert-manager --set webhook.enabled=false | ||||||||||||||
|
Comment on lines +11 to +14
|
||||||||||||||
| --set crds.enabled=true | |
| helm install --name cert-manager --namespace cert-manager --version v0.14.0 jetstack/cert-manager --set webhook.enabled=false | |
| --set crds.enabled=true \ | |
| --set webhook.enabled=false |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| ## Google GKE setup | ||
|
|
||
| The easiest way to create the cluster is to use the GCP web console. Define a node pool that satisfies the application requirements (the node pools generally have no other particular requirements). | ||
| The node pool can always be scaled to a different number of nodes, so the only important choice is the node type. | ||
| Other node pools can also be added to the cluster at any time to replace the original one. | ||
|
|
||
| After creating the cluster, use the `gcloud` command-line client to fetch the credentials onto your machine: | ||
| - `gcloud init` | ||
| - `gcloud container clusters get-credentials --zone us-central1-a <CLUSTER_NAME>` | ||
|
|
||
|
|
16 changes: 16 additions & 0 deletions
infrastructure/cluster-configuration/ingress/values-aws.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| controller: | ||
| service: | ||
| annotations: | ||
| service.beta.kubernetes.io/aws-load-balancer-name: apps-ingress | ||
| service.beta.kubernetes.io/aws-load-balancer-type: external | ||
| service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing | ||
| service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip | ||
| service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: http | ||
| service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /healthz | ||
| service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: 10254 | ||
| config: | ||
| http-snippet: | | ||
| proxy_cache_path /tmp/nginx-cache levels=1:2 keys_zone=static-cache:2m max_size=100m inactive=7d use_temp_path=off; | ||
| proxy_cache_key $scheme$proxy_host$request_uri; | ||
| proxy_cache_lock on; | ||
| proxy_cache_use_stale updating; |
10 changes: 10 additions & 0 deletions
infrastructure/cluster-configuration/storageclass-default-aws.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| apiVersion: storage.k8s.io/v1 | ||
| kind: StorageClass | ||
| metadata: | ||
| name: default | ||
| annotations: | ||
| storageclass.kubernetes.io/is-default-class: "true" | ||
| reclaimPolicy: Delete | ||
| provisioner: ebs.csi.eks.amazonaws.com | ||
| volumeBindingMode: Immediate | ||
| --- |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,8 +1,8 @@ | ||
| apiVersion: storage.k8s.io/v1 | ||
| kind: StorageClass | ||
| metadata: | ||
| name: persisted | ||
| reclaimPolicy: Retain | ||
| volumeProvisioner: kubernetes.io/gce-pd | ||
| name: standard | ||
| reclaimPolicy: Delete | ||
| provisioner: kubernetes.io/gce-pd | ||
| volumeBindingMode: Immediate | ||
| --- |
The IAM policy should not use `main` by default; it should match the version of the controller installed, e.g.:
https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.13.2/docs/install/iam_policy.json
Similarly, a specific controller version compatible with the policy should be installed.
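A possible way to apply this suggestion is to derive the policy URL from a single version variable, so the policy and the controller cannot drift apart. The variable `LBC_VERSION` and the function `iam_policy_url` are hypothetical names; `v2.13.2` is the version from the controller log quoted in the docs.

```shell
# Controller version to pin; the IAM policy is fetched from the matching tag.
LBC_VERSION="v2.13.2"

# Build the URL of the IAM policy document for a given controller version.
iam_policy_url() {
  printf 'https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/%s/docs/install/iam_policy.json\n' "$1"
}

# Usage:
#   curl -o iam-policy.json "$(iam_policy_url "$LBC_VERSION")"
```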