Kubernetes notes repo used for reference and studying to hopefully get certification some day.
k8s - "kates"
kubectl - pronounced "koob-e", "koob", or "cube", plus "cuttle", "C-T-L", or "control". It is the command line interface for working with a Kubernetes cluster.
Kubernetes doesn't absolve administrators from having to thoroughly know and understand the complexity of the applications they're deploying. If you want a well-tuned cluster, especially when running in the cloud, it is very important to first understand the details of your application. Don't think to yourself that Kubernetes lets you reap all the benefits and skip the complexity. If anything, it forces you to take time to deeply understand your architecture.
Started as Google open source project announced in 2014. Google used lessons learned from Borg, their internal data center cluster management platform. Goal was to make running containers in production easier since complexity of deploying, monitoring, networking, etc. increases greatly when going from single machine to multiple distributed machines. Google donated Kubernetes to Cloud Native Computing Foundation (CNCF), which is a project within the Linux Foundation, in July 2015.
Originally it only supported Docker, but after Container Runtime Interface (CRI) and Open Container Initiative (OCI) standardization, things opened up a lot to allow other container runtimes.
Kubernetes clusters are made up of control plane node(s) and worker node(s). A control plane node runs the main manager (kube-controller-manager), the API server (kube-apiserver), a scheduler (kube-scheduler), optionally a cloud controller (cloud-controller-manager), and a datastore (e.g. etcd) which stores the state of the cluster, container settings, and the networking configuration.
- kube-apiserver - Exposes a RESTful API for the cluster. You can communicate with it using the kubectl command line interface, write a custom client, or even use something like curl to interact with the API directly. Primary manager of the cluster.
- kube-scheduler - Determines which is the best node to host a Pod of containers and uses an algorithm to do this. More details here.
- kube-controller-manager - Runs the core control loops (controllers) that watch and reconcile cluster state, including:
  - node-controller - takes care of nodes in the cluster, things like onboarding new nodes and handling when nodes become unavailable.
  - replication-controller - ensures the desired number of containers are running.
- cloud-controller-manager - Interacts with other tools, such as Rancher or DigitalOcean, for third-party cluster management and reporting.
Every node in the cluster, including the control plane and worker nodes, runs the following components:
- A container runtime like containerd or cri-o to run containers.
- kubelet - receives PodSpecs for container configuration, downloads and manages any necessary resources, and works with the container runtime on the local node to ensure containers are running, as well as handling error modes like restarting containers upon failure. A PodSpec is a JSON or YAML blob that describes a Pod. kubelet will work to configure the local node until the PodSpec has been met. It also sends status back to the kube-apiserver for eventual persistence.
- kube-proxy - creates and manages local firewall rules and networking configuration to expose containers on the network. This allows containers in the cluster to connect to each other.
kubeadm is a command line tool for bootstrapping a cluster. It allows for easy deployment of a control plane and joining workers to the cluster. It can even set up a multi-control-plane cluster. kubeadm uses the Container Network Interface (CNI) specification as the default network interface mechanism. CNI is an emerging specification with associated libraries to write plugins that configure container networking and remove allocated resources when the container is deleted. Its aim is to provide a common interface between the various networking solutions and container runtimes. See more details here. There are many providers of CNI implementations now. Common ones are Calico, Flannel, Cilium, and Kube Router.
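A minimal sketch of the kubeadm bootstrap flow described above. The pod CIDR, address, token, and hash are placeholders; kubeadm init prints the exact join command to run on workers.

```shell
# On the control plane node (pod network CIDR depends on your chosen CNI plugin):
sudo kubeadm init --pod-network-cidr=192.168.0.0/16

# kubeadm init prints a join command with a bootstrap token; run it on each worker:
sudo kubeadm join <control-plane-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
```

After init, a CNI plugin (e.g. Calico or Flannel) still needs to be installed before nodes report Ready.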
crictl is used for debugging container runtimes. It works across all CRI-compatible container runtimes, unlike ctr and nerdctl, which are specific to containerd. It is not generally used for Kubernetes cluster administration.
etcd is a key-value store used by Kubernetes to store stateful information about the cluster, container settings, and the networking configuration. It can be run as a backing service on external nodes or stacked on control plane nodes.
etcdctl is a command line tool to interact with an etcd data store. Be aware that there are different versions of etcdctl and that etcdctl itself supports different API versions. Set the API version using the ETCDCTL_API environment variable. Commands are very different between version 2 and 3 of the API. See etcdctl usage for details.
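A hedged sketch of typical v3 API usage. The endpoint and certificate paths shown are common kubeadm defaults; adjust them for your cluster.

```shell
# ETCDCTL_API selects the API version; modern clusters use v3.
export ETCDCTL_API=3

etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
```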
A Pod is one or more containers which share an IP address, access to storage and namespace. Typically, one container in a Pod runs a primary application, while other containers in the Pod support the primary application.
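A minimal sketch of a Pod manifest matching the description above; names and images are illustrative, with a primary application container and a hypothetical supporting container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  containers:
  - name: nginx            # primary application container
    image: nginx:1.25
  - name: log-shipper      # hypothetical supporting container sharing the Pod's IP and namespaces
    image: busybox:1.36
    command: ["sh", "-c", "tail -f /dev/null"]
```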
Controllers interrogate the kube-apiserver for a particular resource state, then modify the cluster until it matches the declared state. One commonly used resource for containers is a Deployment. A Deployment deploys and manages a different resource called a ReplicaSet. A ReplicaSet is a resource which deploys multiple Pods, each with the same spec information. There are many other Resources such as Jobs and CronJobs to handle single or recurring tasks. You can also write Custom Resource Definitions which allows you to add your own Resources.
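As a sketch of the Deployment → ReplicaSet → Pods chain described above (name, label, and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # the managed ReplicaSet keeps 3 Pods running
  selector:
    matchLabels:
      app: web
  template:                # Pod spec used for every replica
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
```

Applying this with `kubectl apply -f` creates the Deployment, which in turn creates a ReplicaSet and three identical Pods.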
Other native Kubernetes resources:
- PersistentVolumeClaim (PVC) - Stores state (e.g. a volume for database content or drupal code resources). Most cloud Kubernetes providers have a default storage mechanism (e.g. Block Volume on Linode). Lots of the default volume types only allow mounting to one container, which means if you try to scale up the Deployment you'll see Kubernetes throw multi-mount errors.
- Deployment - Stateless containerized component (e.g. a MariaDB server process, a web frontend, etc.). Often references PVCs to store state. Use initContainers to do pre-run tasks in a container (e.g. drupal themes setup, which requires a running drupal instance). Set up liveness and readiness probes, which are basically heartbeat mechanisms. Be careful not to make these too stringent.
- Service - Exposes a Deployment. Can expose internal to the Kubernetes cluster or outside the cluster.
- ConfigMap - Configuration such as variables, files, etc. Config maps can be directly mounted into Deployments. You can use template to build config file, change file permissions.
- ClusterIssuer - Used to obtain things like certificates for exposed services.
- Job and CronJob - Run something once or periodically, similar to Linux cron jobs. There are options to handle concurrency and parallelism. Be aware that k8s CronJobs are not super precise; for example, a job may run several seconds after its configured time, so if you need very precise runs it's better to use a different tool or build scheduling into your application.
- DaemonSet - Ensure that a single Pod is deployed on every node. Often used for logging, metrics, and security pods.
- StatefulSet - Deploy Pods in a particular order, such that subsequent Pods are only deployed if previous Pods report a ready status. This can be useful for legacy applications which have runtime dependencies.
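To illustrate a couple of the resources above, here is a hedged sketch of a Service exposing Pods by label and a CronJob; all names, labels, and schedules are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP          # internal to the cluster; NodePort/LoadBalancer expose it outside
  selector:
    app: web               # routes to Pods carrying this label
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 3 * * *"    # standard cron syntax; runs may start a few seconds late
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: busybox:1.36
            command: ["sh", "-c", "echo cleanup"]
```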
Context
A combination of user, cluster name and namespace. A convenient way to switch between combinations of permissions and restrictions. For example you may have a development cluster and a production cluster, or may be part of both the operations and architecture namespaces. This information is referenced from ~/.kube/config.
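A sketch of how contexts look inside ~/.kube/config, matching the development/production example above; cluster, user, and namespace names are illustrative, and the cluster/user entries themselves are omitted.

```yaml
# Fragment of ~/.kube/config:
contexts:
- name: dev-ops
  context:
    cluster: development
    user: alice
    namespace: operations
- name: prod-arch
  context:
    cluster: production
    user: alice
    namespace: architecture
current-context: dev-ops
```

Switch between them with `kubectl config use-context prod-arch`.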
Resource Limits A way to limit the amount of resources consumed by a pod, or to request a minimum amount of resources reserved, but not necessarily consumed, by a pod. Limits can also be set per namespace; these take priority over those in the PodSpec.
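A hedged sketch of both mechanisms: per-container requests/limits in a PodSpec, and a LimitRange applying namespace-level defaults (names, namespace, and values are illustrative).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limited
spec:
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:            # reserved for scheduling, not necessarily consumed
        cpu: "250m"
        memory: "64Mi"
      limits:              # hard cap; exceeding the memory limit gets the container killed
        cpu: "500m"
        memory: "128Mi"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: dev           # applies to containers created in this namespace
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "128Mi"
```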
Pod Security Admission A feature (beta as of these notes) to restrict pod behavior in an easy-to-implement and easy-to-understand manner, applied at the namespace level when a pod is created. It leverages three profiles: the Privileged, Baseline, and Restricted policies.
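The profiles are applied via namespace labels; a minimal sketch (namespace name is illustrative, the label keys are the standard Pod Security Admission ones):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: restricted-apps
  labels:
    pod-security.kubernetes.io/enforce: restricted  # reject pods violating the Restricted profile
    pod-security.kubernetes.io/warn: restricted     # also surface warnings on violations
```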
Network Policies The ability to have an inside-the-cluster firewall. Ingress and Egress traffic can be limited according to namespaces and labels as well as typical network traffic characteristics.
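A hedged sketch of a NetworkPolicy limiting ingress by label, as described above; app labels and the port are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-db
spec:
  podSelector:
    matchLabels:
      app: db              # the policy applies to database Pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web         # only Pods labeled app=web may connect
    ports:
    - protocol: TCP
      port: 3306
```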
Kubernetes Manifests are files that define a set of Kubernetes objects describing the desired state of the cluster in a declarative way.
When writing Manifests, separate objects based on how you want to manage the cluster (e.g. separate the database from the app). It is good practice to use labels and set resource limits even though these things aren't required. Labels are arbitrary strings which become part of the object metadata; they can then be used as selectors when checking or changing the state of objects. A good way to start learning Manifests is to copy an existing Kubernetes Manifest or Helm chart and tweak it. You'll get working examples running quickly while learning the configuration fields.
A namespace is a segregation of resources, upon which resource quotas and permissions can be applied. Kubernetes objects may be created in a namespace or be cluster-scoped. Users can be limited by the object verbs allowed per namespace. Use namespaces to organize related components of your application. Kubernetes allows you to deploy into a namespace. If no namespace is given, Kubernetes will deploy to the default namespace.
When designing applications to run on k8s, there are lots of things to consider. Here are some helpful guiding questions:
- Is my application as decoupled as it could be?
- Is there anything that could be taken out and made into its own container?
- Is each container transient and does it properly react when other containers are transient? If yes, test it out with Chaos Monkey.
- Can I scale any particular component to meet workload demand?
Common design patterns for multi-container pods
- Sidecar: Adds a function not present in the main container. This code may not be needed in all deployments, so rather than bloating an existing container, run it in a sidecar container instead; log reporting is a common example. Use a sidecar container when you need to run another service in support of an existing service or container. Since Kubernetes favors one service per container, it can get messy to try to run more than one service from a single container. This may come into play when you need something like a syslog server for a component: run the syslog server in a sidecar container, which will come up alongside the main container.
- Adapter: Modifies data on ingress or egress to match the needs of the deployment. For example, converting a JSON payload to XML or a different format.
- Ambassador: Allows access to outside without having to implement a service. For example, routing/splitting traffic going to other containers.
- initContainer: Allows one or more containers to run only if one or more previous containers run and exit successfully. For example, checksum verification, or data and settings initialization.
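Combining two of the patterns above in one hedged sketch: an initContainer that must succeed before the main containers start, plus a sidecar. Names, images, and commands are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: patterns-demo
spec:
  initContainers:
  - name: init-settings    # must exit 0 before the main containers start
    image: busybox:1.36
    command: ["sh", "-c", "echo initialized > /work/ready"]
    volumeMounts:
    - name: work
      mountPath: /work
  containers:
  - name: main             # primary application
    image: nginx:1.25
    volumeMounts:
    - name: work
      mountPath: /work
  - name: log-sidecar      # supporting function kept out of the main image
    image: busybox:1.36
    command: ["sh", "-c", "tail -f /dev/null"]
  volumes:
  - name: work             # shared scratch volume between init and main containers
    emptyDir: {}
```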
Custom Resource Definitions allow you to add new k8s resources. A CRD is typically paired with a controller that retrieves the resource spec and applies it to the cluster. The functions encoded into a controller should be all the tasks a human would need to perform if deploying the application outside of Kubernetes. For example, Calico is an addon that provides several network and security related CRDs.
Lots of cloud providers have fully managed Kubernetes offerings, for example Google GKE, Amazon EKS, and Azure AKS.
A simple way to think of Helm is that it's a way to share Kubernetes Manifests. These bundles are called Helm charts, and an installation of a chart into a cluster is referred to as a "release".
Be careful about using Helm early on in learning because there is a lot of complexity that Helm hides from you. While this hiding is done on purpose and part of its value add, it doesn't allow you to master Kubernetes.
Ingress is a Kubernetes-managed way to configure inbound data requests to the cluster. It provides load balancing to a set of backend services via an ingress controller. Can use nginx, haproxy, traefik, etc. for the ingress controller.
For example, we can use a Helm chart to install the nginx ingress controller:

```shell
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx
```
The ingress controller shows up as another Kubernetes Service, so you can find it with kubectl get svc.
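With the controller installed, routing is configured via Ingress resources; a hedged sketch (hostname and backend Service name are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx   # matches the installed ingress-nginx controller
  rules:
  - host: example.com       # hypothetical hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web       # existing Service to route traffic to
            port:
              number: 80
```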
Kubernetes can hook directly into DNS provider using ExternalDNS.
Can use cert-manager to manage certificates for Kubernetes. It can be installed in a lot of ways, including via Helm. After installing, you'll need to set up a ClusterIssuer, which can be done via a Kubernetes manifest. See documentation.
A simple way to think about Rook is that it's RAID for Kubernetes volumes. Similar to Helm, be careful because it is more complex than it first appears. There are a lot of layers of things going on with it. If you need it, it is a great tool, but don't start here for 101 volume mounts. Most use cases can probably use a Network File System (NFS).
Apache Bench is a command line HTTP benchmark tool, ab. Great tool to quickly test scaled sites and APIs that use HTTP. It's part of the Apache HTTP project. You can install it with apt install apache2-utils.
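A minimal usage sketch against a hypothetical scaled endpoint (URL and numbers are illustrative):

```shell
# 1000 total requests, 10 at a time, against the exposed service:
ab -n 1000 -c 10 http://example.com/
```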
Operators, sometimes called watch-loops or controllers, allow you to write your own logic for managing Kubernetes-native applications. This is useful for more complex applications/pods that you need to deploy. See more info here or browse existing ones on Operator Hub.
kubevirt is one way to move existing Virtual Machine-based workloads that cannot be easily containerized to run on Kubernetes clusters. Can be helpful when migrating legacy applications.
You can use one of the many cloud services to do this, such as Elastic, Sumo Logic, or Datadog. Be aware of cost, because it can grow quite large if you send absolutely every log to the service. Another option is to run your own Elasticsearch, Logstash, Kibana (ELK) stack within the Kubernetes cluster. This really only works to a certain scale and can be problematic if the cluster is having issues, because you may not be able to see logs.
Cluster-wide metrics collection is not quite fully mature in native Kubernetes, so Prometheus is one common option used to gather metrics from nodes and potentially even some applications.
Use kubespray, kOps, or kubeadm to create a Kubernetes cluster.
The Cloud Native Computing Foundation publishes a curriculum outlining the skills that are expected for each certification type.
There is also a Candidate Handbook which should be read and understood prior to sitting for the exam.
The following are fundamental k8s skills. Find more details here:
- Define, build, and modify container images. For example, time yourself creating a new pod with an nginx image. Show all containers running with a Ready status. Create a new service exposing the nginx pod as a nodePort. Update the pod to run the nginx:1.11-alpine image and re-verify you can view the webserver via the nodePort.
- Create, update, and troubleshoot services
- Utilize container logs
- Understand multi-container Pod design
- Implement probes and health check patterns. Add a LivenessProbe and a ReadinessProbe on port 80 of an nginx container.
- In the lab3 folder there is a build-review1.yaml that creates a non-working deployment. Practice by fixing the deployment such that both containers are running and in a Ready state. The web server should listen on port 80, and the goproxy should listen on port 8080. Test it by viewing the default root page of the web server, then verify the HTTP GET activity in the container log. Remove all the resources you just created.
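For the probe exercise above, a hedged sketch of a LivenessProbe and ReadinessProbe on port 80 of an nginx container (pod name and timings are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-probes
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    livenessProbe:          # restart the container if this keeps failing
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:         # remove the Pod from Service endpoints while failing
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
```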
