---
id: eks
title: EKS
---

import useBaseUrl from '@docusaurus/useBaseUrl';

Xenit Kubernetes Framework supports both AKS and EKS, though AKS is our main platform.
This document describes how to set up XKF on EKS and how it differs from AKS.

## Differences

To set up XKF using EKS you still need an Azure environment.

XKF relies heavily on Azure AD (AAD), and we have developed our own tool to
manage access to our clusters, called [azad-kube-proxy](https://github.com/XenitAB/azad-kube-proxy).

Our governance solution is therefore still fully located in Azure, together with our Terraform state.

### Repo structure

This is how an AWS repo structure can look:

```txt
├── Makefile
├── README.md
├── aws-core
│   ├── main.tf
│   ├── outputs.tf
│   ├── variables
│   │   ├── common.tfvars
│   │   ├── dev.tfvars
│   │   ├── prod.tfvars
│   │   └── qa.tfvars
│   └── variables.tf
├── aws-eks
│   ├── main.tf
│   ├── outputs.tf
│   ├── variables
│   │   ├── common.tfvars
│   │   ├── dev.tfvars
│   │   ├── prod.tfvars
│   │   └── qa.tfvars
│   └── variables.tf
├── azure-governance
│   ├── main.tf
│   ├── outputs.tf
│   ├── variables
│   │   ├── common.tfvars
│   │   ├── dev.tfvars
│   │   ├── prod.tfvars
│   │   └── qa.tfvars
│   └── variables.tf
└── global.tfvars
```

### EKS

Just like in AKS, we use Calico as our CNI, because:

- The AWS CNI doesn't support network policies
- The AWS CNI heavily limits how many pods we can run on a single node
- We want to be consistent with AKS

Right after setting up the EKS cluster we use a null_resource to first delete
the AWS CNI daemon set and then install Calico.
This is all done before we add a single node to the cluster.

After this we add an EKS node group and Calico starts.

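The CNI swap can be sketched in Terraform roughly like this. This is a minimal sketch, not the actual XKF module code: the resource names, the kubeconfig reference, and the Calico manifest URL are all illustrative assumptions.

```hcl
# Sketch only: swap the AWS CNI for Calico before any nodes join the cluster.
# Names and paths below are assumptions, not XKF's real configuration.
resource "null_resource" "remove_aws_cni" {
  provisioner "local-exec" {
    # Delete the AWS VPC CNI daemon set; --ignore-not-found makes reruns safe.
    command = "kubectl --kubeconfig ${local.kubeconfig_path} -n kube-system delete daemonset aws-node --ignore-not-found"
  }

  depends_on = [aws_eks_cluster.this]
}

resource "null_resource" "install_calico" {
  provisioner "local-exec" {
    # Apply the Calico manifests; the pods only start once the first node group exists.
    command = "kubectl --kubeconfig ${local.kubeconfig_path} apply -f https://docs.projectcalico.org/manifests/calico-vxlan.yaml"
  }

  depends_on = [null_resource.remove_aws_cni]
}
```

Ordering matters here: both resources run while the cluster has zero nodes, so no workload ever starts on the AWS CNI.
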
### IRSA

In AKS we use AAD Pod Identity to give pods access to Azure resources.
We support the same thing in EKS, but use IAM Roles for Service Accounts (IRSA).

To make it easier to use IRSA we have developed a small Terraform [module](https://github.com/XenitAB/terraform-modules/blob/main/modules/aws/irsa/README.md).

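Under the hood, IRSA ties an IAM role to a Kubernetes service account through the cluster's OIDC provider. A plain-Terraform sketch of that trust relationship follows; every name, and the namespace/service-account pair, is a hypothetical example, and the XenitAB module wraps the same idea behind its own inputs.

```hcl
# Sketch of an IRSA trust policy; all names here are hypothetical examples.
data "aws_iam_policy_document" "assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
    }

    # Only the stated service account in the stated namespace may assume the role.
    condition {
      test     = "StringEquals"
      variable = "${replace(aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")}:sub"
      values   = ["system:serviceaccount:team1:my-app"]
    }
  }
}

resource "aws_iam_role" "my_app" {
  name               = "my-app-irsa"
  assume_role_policy = data.aws_iam_policy_document.assume.json
}
```

The Kubernetes service account is then annotated with `eks.amazonaws.com/role-arn: <role ARN>`, and EKS injects a web identity token into pods using that service account.
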
## Bootstrap

By default the AWS CNI limits the number of pods that you can have on a single node.
Since we are using Calico we don't have this limit,
but when setting up a default EKS environment the EKS [bootstrap script](https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh)
defines a pod limit. To remove this limit we have created our own AWS launch template for our EKS node group.

It sets `--use-max-pods false` and some required Kubernetes node labels; if these labels aren't set, the EKS cluster is unable to "find" the nodes in the node group.

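A hedged sketch of what such a launch template's user data can look like; the cluster name, node-group name, and label values are placeholder assumptions, not the real XKF template.

```hcl
# Illustrative user data for a custom EKS launch template.
# Cluster name, node-group name, and AMI id are placeholder assumptions.
locals {
  node_user_data = <<-EOF
    #!/bin/bash
    /etc/eks/bootstrap.sh dev-eks1 \
      --use-max-pods false \
      --kubelet-extra-args '--node-labels=eks.amazonaws.com/nodegroup=default,eks.amazonaws.com/nodegroup-image=ami-12345678'
  EOF
}

resource "aws_launch_template" "eks_nodes" {
  name_prefix = "dev-eks1-"
  user_data   = base64encode(local.node_user_data)
}
```

The `eks.amazonaws.com/nodegroup*` labels are what lets the managed node group associate the registering nodes with itself.
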
## Tenant account peering

In Azure we separate XKF and our tenants by using Resource Groups; in AWS we use separate accounts.

To set up a VPC peering you need to know the target VPC id, which creates a chicken-and-egg problem.
To work around this problem we sadly have to run the eks/core module multiple times, on both the XKF side and the tenant side.

Run Terraform in the following order:

- XKF core without any `vpc_peering_config_requester` defined.
- Tenant core without any `vpc_peering_config_accepter` defined.
- XKF core with `vpc_peering_config_requester` defined, manually getting the needed information from the tenant account.
- Tenant core with `vpc_peering_config_accepter` defined, manually getting the needed information from the XKF account.

Make sure that you only have one peering request open at a time, otherwise the accepter side won't be able to find a unique request.
Now you should be able to see the VPC peering connected on both sides.

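In plain Terraform terms, the two halves of the handshake map onto resources like the following sketch. The IDs are placeholders, and the actual XKF modules expose this through their own `vpc_peering_config_*` variable shapes.

```hcl
# Requester side, run in the XKF account.
# Peer VPC id and account id are fetched manually from the tenant (placeholders here).
resource "aws_vpc_peering_connection" "to_tenant" {
  vpc_id        = aws_vpc.this.id
  peer_vpc_id   = "vpc-0aaaaaaaaaaaaaaaa"
  peer_owner_id = "222222222222"
}

# Accepter side, run in the tenant account once the request above exists.
resource "aws_vpc_peering_connection_accepter" "from_xkf" {
  vpc_peering_connection_id = "pcx-0bbbbbbbbbbbbbbbb"
  auto_accept               = true
}
```

The accepter's lookup of the pending request is why only one open peering request is allowed at a time.
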
## Break glass

We depend heavily on azad-kube-proxy, but if something happens to the
ingress, azad-kube-proxy, or AAD, we need other ways of reaching the cluster.

```bash
aws eks --region eu-west-1 update-kubeconfig --name dev-eks1 --alias dev-eks1 --role-arn arn:aws:iam::111111111111:role/xkf-eu-west-1-dev-eks-admin
```