From 67363cbc04f39ab4d3d1940a69226fd1202c36ac Mon Sep 17 00:00:00 2001
From: Filippo Ledda
Date: Thu, 22 May 2025 19:59:29 +0200
Subject: [PATCH 1/4] CH-189 docs for cluster configuration

---
 docs/build-deploy/cluster-configuration.md    |   7 +
 .../cluster-configuration/README.md           |  36 ++++-
 .../cluster-configuration/aws-setup.md        | 128 ++++++++++++++++++
 .../cluster-configuration/azure-setup.md      |  45 ++++++
 .../cluster-configuration/cluster-init.sh     |  13 +-
 .../cluster-configuration/gcp-setup.md        |  11 ++
 .../ingress/values-aws.yaml                   |  16 +++
 .../storageclass-default-aws.yaml             |  10 ++
 .../cluster-configuration/storageclass.yaml   |   6 +-
 9 files changed, 266 insertions(+), 6 deletions(-)
 create mode 100644 docs/build-deploy/cluster-configuration.md
 create mode 100644 infrastructure/cluster-configuration/aws-setup.md
 create mode 100644 infrastructure/cluster-configuration/azure-setup.md
 create mode 100644 infrastructure/cluster-configuration/gcp-setup.md
 create mode 100644 infrastructure/cluster-configuration/ingress/values-aws.yaml
 create mode 100644 infrastructure/cluster-configuration/storageclass-default-aws.yaml

diff --git a/docs/build-deploy/cluster-configuration.md b/docs/build-deploy/cluster-configuration.md
new file mode 100644
index 000000000..9ca1eed7e
--- /dev/null
+++ b/docs/build-deploy/cluster-configuration.md
@@ -0,0 +1,7 @@
+# Cluster configuration
+
+Cloud Harness deploys with helm and uses an ingress controller to serve application endpoints.
+
+The ingress controller is not installed as a part of the application, and has to be installed separately. The imstallation process can be different depending on the Kubernetes provider.
+
+See [here](../../infrastructure/cluster-configuration/README.md) for more information and resources for setting up your cluster.
\ No newline at end of file
diff --git a/infrastructure/cluster-configuration/README.md b/infrastructure/cluster-configuration/README.md
index a96d972ee..ed181ad21 100644
--- a/infrastructure/cluster-configuration/README.md
+++ b/infrastructure/cluster-configuration/README.md
@@ -1,4 +1,36 @@
 # Cluster configuration
 
-1. Initialize kubectl credentials to work with your cluster (e.g minikube or google cloud)
-1. Run `source cluster-init.sh`
\ No newline at end of file
+## Simple setup on an existing cluster
+
+### TLDR;
+1. Create a Kubernetes cluster (e.g. minikube or Google Cloud)
+1. Initialize kubectl credentials to work with your cluster
+1. Run `source cluster-init.sh`
+
+### Cert-manager
+
+Follow [these](https://cert-manager.io/docs/installation/kubernetes/) instructions to deploy cert-manager.
+
+### Ingress
+
+The ingress controller is the entry point of all CloudHarness applications.
+Info on how to deploy nginx-ingress can be found [here](https://kubernetes.github.io/ingress-nginx/deploy/).
+
+```
+helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
+helm repo update
+helm install ingress-nginx ingress-nginx/ingress-nginx
+```
+
+On local clusters and GCP/GKE, the nginx-ingress chart will deploy a Load Balancer with a given IP address, while . Use that address to create the CNAMEs and A records for the website.
+
+
+
+## GCP GKE cluster setup
+
+GKE setup is pretty straightforward. You can create a cluster and a node pool from the Google Cloud console, and internet-facing load balancers are created directly with the ingress controller.
+
+For additional info see [here](gcp-setup.md).
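+
+To retrieve the address assigned to the ingress controller's load balancer (assuming the chart was installed with the release name `ingress-nginx` in the default namespace, as in the example above), something like the following can be used:
+
+```bash
+kubectl get service ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
+```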
+
+## AWS EKS setup
+AWS requires some additional steps to install the load balancer and the ingress; see [here](./aws-setup.md).
diff --git a/infrastructure/cluster-configuration/aws-setup.md b/infrastructure/cluster-configuration/aws-setup.md
new file mode 100644
index 000000000..9dd63f2a5
--- /dev/null
+++ b/infrastructure/cluster-configuration/aws-setup.md
@@ -0,0 +1,128 @@
+
+
+
+## Create the EKS cluster
+
+There are many ways to create a cluster, and which one to use depends on requirements that are outside the scope of this guide.
+
+This is a good starting point: https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html.
+If in doubt, an [Auto Mode Cluster](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-automode.html) is a good default choice.
+
+## Ingress setup
+
+The following is inspired by https://aws.amazon.com/blogs/containers/exposing-kubernetes-applications-part-3-nginx-ingress-controller/, section "Exposing Ingress-Nginx Controller via a Load Balancer".
+Be aware that the article is from 2022 and no longer works exactly as written.
+The following are the steps that worked for us in May 2025.
+
+### Set up the policy and service account
+
+Note that you have to make sure that the version of the aws-load-balancer-controller matches the policy. A mismatched version will make the installation fail.
+
+```bash
+curl -o iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json
+aws iam create-policy --policy-name AWSLoadBalancerControllerIAMPolicy --policy-document file://iam-policy.json
+AWS_ACCOUNT=123456789 # Your AWS account
+eksctl create iamserviceaccount \
+  --cluster=metacell-dev \
+  --name=aws-load-balancer-controller \
+  --namespace=kube-system \
+  --attach-policy-arn=arn:aws:iam::${AWS_ACCOUNT}:policy/AWSLoadBalancerControllerIAMPolicy \
+  --approve
+```
+
+### Install the aws-load-balancer-controller
+
+First, apply the custom resource definitions:
+```bash
+wget https://raw.githubusercontent.com/aws/eks-charts/refs/heads/master/stable/aws-load-balancer-controller/crds/crds.yaml
+kubectl apply -f crds.yaml
+```
+
+Then install the Helm chart.
+From https://github.com/aws/eks-charts/tree/master/stable/aws-load-balancer-controller:
+```bash
+helm repo add eks https://aws.github.io/eks-charts
+# If using IAM Roles for service account install as follows - NOTE: you need to specify both of the chart values `serviceAccount.create=false` and `serviceAccount.name=aws-load-balancer-controller`
+helm install aws-load-balancer-controller eks/aws-load-balancer-controller --set clusterName=metacell-dev -n kube-system --set serviceAccount.create=false --set serviceAccount.name=aws-load-balancer-controller
+```
+
+
+### Fix the VPC ID
+
+If you encounter the following error related to the VPC:
+
+> {"level":"info","ts":"2025-05-21T13:53:48Z","msg":"version","GitVersion":"v2.13.2","GitCommit":"4236bd7928711874ae4d8aff6b97870b5625140f","BuildDate":"2025-05-15T17:37:55+0000"}
+> {"level":"error","ts":"2025-05-21T13:53:53Z","logger":"setup","msg":"unable to initialize AWS cloud","error":"failed to get VPC ID: failed to fetch VPC ID from instance metadata: error in fetching vpc id through ec2 metadata: get mac metadata: operation error ec2imds: GetMetadata, canceled, context deadline exceeded"}
+
+First, get the VPC ID:
+
+```bash
+VPC_ID=$(aws eks describe-cluster \
+  --name metacell-dev \
+  --region us-west-2 \
+  --query "cluster.resourcesVpcConfig.vpcId" \
+  --output text)
+```
+
+Then set the vpcId value:
+```bash
+helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --reuse-values --set vpcId=$VPC_ID
+```
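+
+To verify that the controller is running (assuming the release name `aws-load-balancer-controller` used above), the deployment status can be checked with something like:
+
+```bash
+kubectl -n kube-system rollout status deployment aws-load-balancer-controller
+```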
+
+### Install ingress-nginx
+
+```bash
+helm upgrade -i ingress-nginx ingress-nginx/ingress-nginx \
+  --namespace kube-system \
+  --values ingress/values-aws.yaml
+
+kubectl -n kube-system rollout status deployment ingress-nginx-controller
+
+kubectl get deployment -n kube-system ingress-nginx-controller
+```
+
+### Associate the DNS
+
+The endpoint can be assigned with two CNAME entries.
+For instance, if you run `harness-deployment ... -d myapp.mydomain.com`,
+the following CNAME entries are needed
+- myapp [LB_ADDRESS]
+- *.myapp [LB_ADDRESS]
+
+
+The easiest way to get the load balancer addressis to do the deployment and
+read it from the ingress with
+
+```
+kubectl get ingress
+```
+
+## Storage class
+
+EKS does not provide a default storage class.
+To create one, run:
+
+```bash
+kubectl apply -f storageclass-default-aws.yaml
+```
+
+## Container registry
+
+CloudHarness pushes images to a container registry, which has to be readable from EKS.
+
+Any public registry can be used seamlessly, while ECR is recommended for private images.
+
+1. Create a new ECR registry
+2. Create all the repositories within the deployment (ECR does not create repositories automatically on push, unless this is implemented: https://aws.amazon.com/blogs/containers/dynamically-create-repositories-upon-image-push-to-amazon-ecr/)
+3. Give the required permissions to the Node IAM role, as described in
+https://docs.aws.amazon.com/AmazonECR/latest/userguide/ECR_on_EKS.html (the role should be AmazonSSMRoleForInstancesQuickSetup for Auto Mode Clusters)
+
+To push images, you have to authenticate to the registry.
+
+To authenticate from the local console, the command looks like the following:
+
+```bash
+aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 527966638683.dkr.ecr.us-west-2.amazonaws.com
+```
+
+The exact command can also be viewed by hitting "View push commands" from the web console.
\ No newline at end of file
diff --git a/infrastructure/cluster-configuration/azure-setup.md b/infrastructure/cluster-configuration/azure-setup.md
new file mode 100644
index 000000000..d0ca8a867
--- /dev/null
+++ b/infrastructure/cluster-configuration/azure-setup.md
@@ -0,0 +1,45 @@
+# Azure AKS setup
+
+The main complication with AKS is that the nginx ingress controller
+is not easy to set up, so the AGIC (Application Gateway Ingress Controller) is preferred.
+
+
+1. Create a new public IP for the AGIC load balancer
+2. Create the application gateway
+3. Enable AGIC add-on in existing AKS cluster with Azure CLI
+4. Peer the AKS and AG virtual networks together
+
+See https://learn.microsoft.com/en-us/azure/application-gateway/tutorial-ingress-controller-add-on-existing for the full tutorial; a command-line sketch of these steps is shown below.
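+
+As a rough sketch (the resource group and resource names are hypothetical placeholders to adapt), the steps above could look like the following with the Azure CLI:
+
+```bash
+# 1. Public IP for the AGIC load balancer
+az network public-ip create -g myResourceGroup -n agicPublicIp --sku Standard --allocation-method Static
+# 2. Application gateway (creates the myVnet/mySubnet virtual network if it does not exist)
+az network application-gateway create -g myResourceGroup -n agicGateway --sku Standard_v2 --priority 100 \
+  --public-ip-address agicPublicIp --vnet-name myVnet --subnet mySubnet
+# 3. Enable the AGIC add-on on the existing AKS cluster
+appgwId=$(az network application-gateway show -g myResourceGroup -n agicGateway --query id -o tsv)
+az aks enable-addons -g myResourceGroup -n myAksCluster --addons ingress-appgw --appgw-id $appgwId
+# 4. Peer the AKS and Application Gateway virtual networks (in both directions), as described in the tutorial
+```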
+
+## Adapt / override Ingress template
+In addition to this, the [Ingress template](../../deployment-configuration/helm/templates/ingress.yaml) has to be adapted to use the AGIC controller.
+
+An adaptation of the Ingress template that optionally supports AGIC is the following:
+```yaml
+kind: Ingress
+metadata:
+  name: {{ .Values.ingress.name | quote }}
+  annotations:
+    {{- if eq .Values.ingress.className "azure-application-gateway" }}
+    appgw.ingress.kubernetes.io/backend-path-prefix: /
+    {{- if $tls }}
+    appgw.ingress.kubernetes.io/appgw-ssl-certificate: "nwwcssl"
+    {{- end }}
+    appgw.ingress.kubernetes.io/ssl-redirect: {{ (and $tls .Values.ingress.ssl_redirect) | quote }}
+    {{- else }}
+    nginx.ingress.kubernetes.io/ssl-redirect: {{ (and $tls .Values.ingress.ssl_redirect) | quote }}
+    nginx.ingress.kubernetes.io/proxy-body-size: '{{ .Values.proxy.payload.max }}m'
+    nginx.ingress.kubernetes.io/proxy-buffer-size: '128k'
+    nginx.ingress.kubernetes.io/from-to-www-redirect: 'true'
+    nginx.ingress.kubernetes.io/rewrite-target: /$1
+    nginx.ingress.kubernetes.io/auth-keepalive-timeout: {{ .Values.proxy.timeout.keepalive | quote }}
+    nginx.ingress.kubernetes.io/proxy-read-timeout: {{ .Values.proxy.timeout.read | quote }}
+    nginx.ingress.kubernetes.io/proxy-send-timeout: {{ .Values.proxy.timeout.send | quote }}
+    nginx.ingress.kubernetes.io/use-forwarded-headers: {{ .Values.proxy.forwardedHeaders | quote }}
+
+    {{- end }}
+    {{- if and (and (not .Values.local) (not .Values.certs)) $tls }}
+    kubernetes.io/tls-acme: 'true'
+    cert-manager.io/issuer: {{ printf "%s-%s" "letsencrypt" .Values.namespace }}
+    {{- end }}
+```
\ No newline at end of file
diff --git a/infrastructure/cluster-configuration/cluster-init.sh b/infrastructure/cluster-configuration/cluster-init.sh
index bf9a7f5fb..748ca8b3d 100644
--- a/infrastructure/cluster-configuration/cluster-init.sh
+++ b/infrastructure/cluster-configuration/cluster-init.sh
@@ -1,3 +1,11 @@
 helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
 helm repo update
-helm install ingress ingress-nginx/ingress-nginx -f ingress/values.yaml -n kube-system
\ No newline at end of file
+helm install ingress ingress-nginx/ingress-nginx -f ingress/values.yaml -n kube-system
+
+helm repo add jetstack https://charts.jetstack.io --force-update
+helm install \
+  cert-manager jetstack/cert-manager \
+  --namespace cert-manager \
+  --create-namespace \
+  --version v1.17.2 \
+  --set crds.enabled=true
\ No newline at end of file
diff --git a/infrastructure/cluster-configuration/gcp-setup.md b/infrastructure/cluster-configuration/gcp-setup.md
new file mode 100644
index 000000000..afb41d34a
--- /dev/null
+++ b/infrastructure/cluster-configuration/gcp-setup.md
@@ -0,0 +1,11 @@
+## Google GKE setup
+
+The easiest way to create the cluster is to use the GCP web console. Define a node pool that satisfies the application requirements (no other particular configuration is generally necessary for the node pools).
+The node pool can always be scaled to a different number of nodes, so the only important choice is the node type.
+Other node pools can also be added to the cluster at any time to replace the original one.
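+
+As an alternative to the web console, a cluster with a single node pool can also be created from the command line; a minimal sketch (cluster name, zone, machine type and node count are placeholders):
+
+```bash
+gcloud container clusters create my-cluster \
+  --zone us-central1-a \
+  --machine-type e2-standard-4 \
+  --num-nodes 3
+```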
+
+After creating the cluster, use the `gcloud` command line client to get the credentials on your machine:
+- `gcloud init`
+- `gcloud container clusters get-credentials <cluster name> --zone us-central1-a`
+
+
diff --git a/infrastructure/cluster-configuration/ingress/values-aws.yaml b/infrastructure/cluster-configuration/ingress/values-aws.yaml
new file mode 100644
index 000000000..fcbf780a7
--- /dev/null
+++ b/infrastructure/cluster-configuration/ingress/values-aws.yaml
@@ -0,0 +1,16 @@
+controller:
+  service:
+    annotations:
+      service.beta.kubernetes.io/aws-load-balancer-name: apps-ingress
+      service.beta.kubernetes.io/aws-load-balancer-type: external
+      service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
+      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
+      service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: http
+      service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /healthz
+      service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: 10254
+  config:
+    http-snippet: |
+      proxy_cache_path /tmp/nginx-cache levels=1:2 keys_zone=static-cache:2m max_size=100m inactive=7d use_temp_path=off;
+      proxy_cache_key $scheme$proxy_host$request_uri;
+      proxy_cache_lock on;
+      proxy_cache_use_stale updating;
\ No newline at end of file
diff --git a/infrastructure/cluster-configuration/storageclass-default-aws.yaml b/infrastructure/cluster-configuration/storageclass-default-aws.yaml
new file mode 100644
index 000000000..6c450e900
--- /dev/null
+++ b/infrastructure/cluster-configuration/storageclass-default-aws.yaml
@@ -0,0 +1,10 @@
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: default
+  annotations:
+    storageclass.kubernetes.io/is-default-class: "true"
+reclaimPolicy: Delete
+provisioner: ebs.csi.eks.amazonaws.com
+volumeBindingMode: Immediate
+---
\ No newline at end of file
diff --git a/infrastructure/cluster-configuration/storageclass.yaml b/infrastructure/cluster-configuration/storageclass.yaml
index 97e42d96e..c17cfb680 100644
--- a/infrastructure/cluster-configuration/storageclass.yaml
+++ b/infrastructure/cluster-configuration/storageclass.yaml
@@ -1,8 +1,8 @@
 apiVersion: storage.k8s.io/v1
 kind: StorageClass
 metadata:
-  name: persisted
-reclaimPolicy: Retain
-volumeProvisioner: kubernetes.io/gce-pd
+  name: standard
+reclaimPolicy: Delete
+provisioner: kubernetes.io/gce-pd
 volumeBindingMode: Immediate
 ---
\ No newline at end of file

From 1724b0e6f351a88c83761478e57d5c213ac67bee Mon Sep 17 00:00:00 2001
From: Filippo Ledda <46561561+filippomc@users.noreply.github.com>
Date: Fri, 23 May 2025 10:28:26 +0200
Subject: [PATCH 2/4] Update docs/build-deploy/cluster-configuration.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 docs/build-deploy/cluster-configuration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/build-deploy/cluster-configuration.md b/docs/build-deploy/cluster-configuration.md
index 9ca1eed7e..26bad003c 100644
--- a/docs/build-deploy/cluster-configuration.md
+++ b/docs/build-deploy/cluster-configuration.md
@@ -2,6 +2,6 @@
 
 Cloud Harness deploys with helm and uses an ingress controller to serve application endpoints.
 
-The ingress controller is not installed as a part of the application, and has to be installed separately. The imstallation process can be different depending on the Kubernetes provider.
+The ingress controller is not installed as a part of the application, and has to be installed separately. The installation process can be different depending on the Kubernetes provider.
 
 See [here](../../infrastructure/cluster-configuration/README.md) for more information and resources for setting up your cluster.
\ No newline at end of file

From c7ee1bbc929ade1c7953cf8a4c94ff683a1e23f2 Mon Sep 17 00:00:00 2001
From: Filippo Ledda <46561561+filippomc@users.noreply.github.com>
Date: Fri, 23 May 2025 10:28:43 +0200
Subject: [PATCH 3/4] Update infrastructure/cluster-configuration/README.md

Co-authored-by: Alex
---
 infrastructure/cluster-configuration/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/infrastructure/cluster-configuration/README.md b/infrastructure/cluster-configuration/README.md
index ed181ad21..8a0406eb8 100644
--- a/infrastructure/cluster-configuration/README.md
+++ b/infrastructure/cluster-configuration/README.md
@@ -5,7 +5,7 @@
 ### TLDR;
 1. Create a Kubernetes cluster (e.g. minikube or Google Cloud)
 1. Initialize kubectl credentials to work with your cluster
-1. Run `source cluster-init.sh`
+1. Run `source cluster-init.sh` (This script installs the ingress-nginx controller and cert-manager using Helm in the configured k8s cluster.)
 
 ### Cert-manager
 
 Follow [these](https://cert-manager.io/docs/installation/kubernetes/) instructions to deploy cert-manager.

From f3a812e18e7d2739d318d02a46dac9cc5aa3f5e2 Mon Sep 17 00:00:00 2001
From: Filippo Ledda <46561561+filippomc@users.noreply.github.com>
Date: Fri, 23 May 2025 10:29:49 +0200
Subject: [PATCH 4/4] Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 infrastructure/cluster-configuration/README.md    | 2 +-
 infrastructure/cluster-configuration/aws-setup.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/infrastructure/cluster-configuration/README.md b/infrastructure/cluster-configuration/README.md
index 8a0406eb8..21c341383 100644
--- a/infrastructure/cluster-configuration/README.md
+++ b/infrastructure/cluster-configuration/README.md
@@ -22,7 +22,7 @@
 helm repo update
 helm install ingress-nginx ingress-nginx/ingress-nginx
 ```
 
-On local clusters and GCP/GKE, the nginx-ingress chart will deploy a Load Balancer with a given IP address, while . Use that address to create the CNAMEs and A records for the website.
+On local clusters and GCP/GKE, the nginx-ingress chart will deploy a Load Balancer with a given IP address, while in other environments, you may need to configure the Load Balancer manually. Use that address to create the CNAMEs and A records for the website.
 
 
diff --git a/infrastructure/cluster-configuration/aws-setup.md b/infrastructure/cluster-configuration/aws-setup.md
index 9dd63f2a5..06099cc6e 100644
--- a/infrastructure/cluster-configuration/aws-setup.md
+++ b/infrastructure/cluster-configuration/aws-setup.md
@@ -90,7 +90,7 @@
 the following CNAME entries are needed
 - myapp [LB_ADDRESS]
 - *.myapp [LB_ADDRESS]
 
 
-The easiest way to get the load balancer addressis to do the deployment and
+The easiest way to get the load balancer address is to do the deployment and
 read it from the ingress with
 
 ```