|
| 1 | += Guided OpenShift setup |
| 2 | + |
| 3 | +[abstract] |
| 4 | +-- |
| 5 | +Architecture documentation for a guided OpenShift setup tool that provides an interactive, state-aware installation experience for OpenShift clusters on VSHN-supported cloud providers. |
| 6 | + |
| 7 | +The goal is an easy-to-use and extensible installation framework that abstracts cloud provider specifics while ensuring consistent and repeatable deployments. |
| 8 | +-- |
| 9 | + |
| 10 | +== Overview |
| 11 | + |
| 12 | +== Problem statement |
| 13 | + |
| 14 | +Setting up OpenShift clusters on diverse cloud providers such as cloudscale and Exoscale is a complex, error-prone process that requires technical expertise and manual coordination across roughly 100 steps, more than 30 environment variables holding state, configuration files, Git repositories, and the VSHN portal. |
| 15 | + |
| 16 | +The existing installation workflows are somewhat fragmented and hard to extend due to an array of assorted templates loosely tied together. |
| 17 | +A change in one template can have unforeseen consequences on other parts of the installation process. |
| 18 | +Every new cloud provider requires more branching paths and adds to the overall complexity. |
| 19 | + |
| 20 | +We eventually want to support a fully automated setup, without any manual steps involved. |
| 21 | +As we're not quite there yet and might never be, we need a solution that lets us gradually automate more and more steps. |
| 22 | + |
| 23 | +.An example of the branching complexity in the installation process |
| 24 | +image::installation-branching.drawio.svg[alt="installation branching",width=400] |
| 25 | + |
| 26 | +== Goals |
| 27 | + |
| 28 | +* Provide an interactive, state-aware installation experience for OpenShift clusters on VSHN-supported cloud providers. |
| 29 | +* Abstract away cloud provider specifics to ensure a consistent and repeatable deployment process. |
| 30 | +* Create an easy-to-use and extensible installation framework that can adapt to new requirements and cloud providers. |
| 31 | +* Enable gradual automation of the setup process, reducing the need for manual intervention over time. |
| 32 | +* Automate installation state management while still allowing the user to fix state issues manually if needed. |
| 33 | +* Allow static analysis of whether all inputs are given for every step, making it easier to iterate on the installation process. |
| 34 | + |
| 35 | +== Non-Goals |
| 36 | + |
| 37 | +* Fully automated setup without any manual steps involved (at least not initially). |
| 38 | +* Replacing existing tools like `openshift-install` or `terraform`; the guided setup tool complements them. |
| 39 | + |
| 40 | +== Architecture overview |
| 41 | + |
| 42 | +Setting up a cluster consists of multiple steps, each responsible for a specific part of the installation process. |
| 43 | + |
| 44 | +We've got plain text installation files containing the steps to perform, and a runner tool that looks up how to execute these steps while managing the installation state. |
| 45 | + |
| 46 | +=== Step definitions |
| 47 | + |
| 48 | +Steps are defined in plain text, each line representing a single step to perform. |
| 49 | +The format is heavily inspired by https://cucumber.io/docs#what-are-step-definitions[Gherkin] syntax used in BDD testing frameworks. |
| 50 | + |
| 51 | +.Gherkin-like definition |
| 52 | +[source,gherkin] |
| 53 | +---- |
| 54 | +Given a cloudscale organization |
| 55 | +Given a Lieutenant cluster ID |
| 56 | +I upload the OpenShift image to cloudscale |
| 57 | +I prepare the Terraform configuration |
| 58 | +I create the loadbalancer on cloudscale |
| 59 | +I create the DNS records in our hieradata |
| 60 | +I create the bootstrap VM on cloudscale |
| 61 | +The bootstrap VM should be reachable |
| 62 | +I create the master VMs on cloudscale |
| 63 | +I create the infra VMs on cloudscale |
| 64 | +---- |
| 65 | + |
| 66 | +A step can be interactive ("Given a cloudscale organization"), asking the user for input, or non-interactive ("I create the bootstrap VM on cloudscale"), performing automated tasks based on the current state. |
| 67 | + |
| 68 | +Steps can depend on the output of previous steps, creating a directed acyclic graph (DAG) of dependencies that we should be able to analyze statically to verify that all inputs are given. |
| 69 | + |
| 70 | +=== Step implementations |
| 71 | + |
| 72 | +Step implementations are defined in YAML files, and the guided setup tool can load multiple such files. |
| 73 | +While YAML has well-documented issues, it's parsable by many languages and reasonably easy to read and write. |
| 74 | +Additionally, with a reasonable YAML linting configuration, the most egregious ambiguities can be caught before they become issues. |
| 75 | +The tool matches the text of each step against the regex of every implementation to find the correct implementation for that step. |
| 76 | +Steps can contain a script to execute, prompt for user input, and have metadata such as extended descriptions, inputs and outputs attached. |
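
The regex lookup described above could be sketched in Go as follows.
This is a minimal sketch, not the tool's actual code: the `StepImpl` type and the `findImpl` function are hypothetical names, and anchoring each pattern to the whole step text is an assumption the design doesn't spell out.

```go
package main

import (
	"fmt"
	"regexp"
)

// StepImpl is a hypothetical, trimmed-down step implementation.
type StepImpl struct {
	Match   string   // regex taken from the YAML "match" field
	Outputs []string // names of the outputs the step produces
}

// findImpl returns the single implementation whose regex matches the step text.
// Assumption: patterns are anchored so they must match the whole line.
func findImpl(impls []StepImpl, text string) (*StepImpl, error) {
	var found *StepImpl
	for i := range impls {
		re, err := regexp.Compile("^(?:" + impls[i].Match + ")$")
		if err != nil {
			return nil, fmt.Errorf("invalid match %q: %w", impls[i].Match, err)
		}
		if !re.MatchString(text) {
			continue
		}
		if found != nil {
			return nil, fmt.Errorf("step %q is defined multiple times", text)
		}
		found = &impls[i]
	}
	if found == nil {
		return nil, fmt.Errorf("no implementation for step %q", text)
	}
	return found, nil
}

func main() {
	impls := []StepImpl{
		{Match: "Given a cloudscale organization", Outputs: []string{"cloudscale_rw_token"}},
		{Match: "I create the (bootstrap|master|infra) VMs? on cloudscale"},
	}
	impl, err := findImpl(impls, "I create the master VMs on cloudscale")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("matched:", impl.Match)
}
```

Rejecting ambiguous matches at lookup time keeps two step files from silently shadowing each other.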
| 77 | + |
| 78 | +All prompted user input can be provided by environment variables to allow for non-interactive execution as well. |
| 79 | + |
| 80 | +[source,yaml] |
| 81 | +---- |
| 82 | +steps: |
| 83 | + - match: Given a cloudscale organization <1> |
| 84 | + inputs: [] |
| 85 | + outputs: |
| 86 | + - cloudscale_rw_token |
| 87 | + description: | |
| 88 | + The cloudscale token might be retrieved from https://control.cloudscale.ch/service/MY_PROJECT/api-token. |
| 89 | +
|
| 90 | + The token needs to have read and write permissions. |
| 91 | + interaction: <2> |
| 92 | + type: prompt |
| 93 | + prompt: Please enter your cloudscale read/write API token |
| 94 | + into: cloudscale_rw_token |
| 95 | + run: | <3> |
| 96 | + echo "cloudscale_rw_token=$cloudscale_rw_token" >> $STATE <4> |
| 97 | + - match: I upload the OpenShift image to cloudscale |
| 98 | + inputs: |
| 99 | + - cloudscale_rw_token |
| 100 | + - cloudscale_zone <5> |
| 101 | + run: | |
| 102 | + ... upload logic ... |
| 103 | + outputs: |
| 104 | + - image_id |
| 105 | + - match: I prepare the Terraform configuration |
| 106 | + inputs: |
| 107 | + - cloudscale_rw_token |
| 108 | + - image_id |
| 109 | + outputs: |
| 110 | + - terraform_config |
| 111 | + - match: I create the loadbalancer on cloudscale |
| 112 | + inputs: |
| 113 | + - terraform_config |
| 114 | + outputs: |
| 115 | + - loadbalancer_id |
| 116 | + - match: I create the bootstrap VM on cloudscale |
| 117 | + inputs: |
| 118 | + - terraform_config |
| 119 | + outputs: |
| 120 | + - loadbalancer_id <6> |
| 121 | +---- |
| 122 | +<1> Match field containing a regex. |
| 123 | +Used to identify the step implementation. |
| 124 | +<2> Interaction metadata, text prompt, yes/no, or selection from a list of options. |
| 125 | +<3> Each step can execute arbitrary shell scripts. |
| 126 | +<4> Scripts can write outputs to a state file for later steps to consume. |
| 127 | +This is managed by the runner tool; `$STATE` is an environment variable pointing to a temporary state file. |
| 128 | +<5> This input isn't defined anywhere, so static analysis should report an error. |
| 129 | +<6> Ideally, redefining outputs isn't allowed, and static analysis should report an error. |
| 130 | + |
| 131 | +=== State file |
| 132 | + |
| 133 | +The state file needs to be human-readable and human-fixable. |
| 134 | +We use a YAML file here as well. |
| 135 | + |
| 136 | +The tool should be able to upload the state file to an S3-compatible object storage so that other team members can resume an interrupted installation or help debug issues. |
| 137 | +As the state file contains secrets, the tool should support encrypting it with a user-provided password before uploading. |
| 138 | +It should be possible to always ask for personalized tokens instead of storing them in the state file. |
| 139 | + |
| 140 | +[source,yaml] |
| 141 | +---- |
| 142 | +current_step: I upload the OpenShift image to cloudscale <1> |
| 143 | +
|
| 144 | +completed_steps: <2> |
| 145 | + - Given a cloudscale organization |
| 146 | + - Given a Lieutenant cluster ID |
| 147 | +
|
| 148 | +outputs: <3> |
| 149 | + cloudscale_rw_token: |
| 150 | + value: "mysecrettoken" |
| 151 | + image_id: |
| 152 | + value: "1234-5678-90ab-cdef" |
| 153 | +
|
| 154 | +artifacts: <4> |
| 155 | + terraform_config: |
| 156 | + path: "/path/to/generated/terraform.tfvars" |
| 157 | +---- |
| 158 | +<1> The current step, or `+__FINAL__+` if all steps are completed. |
| 159 | +This allows resuming an interrupted installation. |
| 160 | +We might also use `last_step` and derive the current step from that. |
| 161 | +This would allow us to remove the final marker, but might make user interaction with the state file harder. |
| 162 | +<2> A list of completed steps; technically not required, but useful for debugging. |
| 163 | +<3> A map of all outputs from completed steps. |
| 164 | +<4> We might need to store files generated during the installation here as well. |
| 165 | +The simpler approach would be for the steps to just return paths to files, but cleanup might be tricky then. |
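
Resuming from `current_step` could look roughly like the following Go sketch.
The `resumeIndex` function is hypothetical, and treating an empty `current_step` as a fresh installation is an assumption.

```go
package main

import "fmt"

// finalMarker is the value of current_step once all steps are completed.
const finalMarker = "__FINAL__"

// resumeIndex returns the index of the next step to run,
// or len(steps) when the installation is already complete.
func resumeIndex(steps []string, currentStep string) (int, error) {
	if currentStep == "" {
		return 0, nil // assumption: no state yet means a fresh installation
	}
	if currentStep == finalMarker {
		return len(steps), nil
	}
	for i, s := range steps {
		if s == currentStep {
			return i, nil
		}
	}
	return 0, fmt.Errorf("current_step %q not found in the installation file", currentStep)
}

func main() {
	steps := []string{
		"Given a cloudscale organization",
		"Given a Lieutenant cluster ID",
		"I upload the OpenShift image to cloudscale",
	}
	idx, err := resumeIndex(steps, "I upload the OpenShift image to cloudscale")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("resuming at step %d/%d\n", idx+1, len(steps))
}
```

Returning an error for an unknown step makes a hand-edited state file with a typo fail early instead of silently restarting from the beginning.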
| 166 | + |
| 167 | +== Runner tool |
| 168 | + |
| 169 | +A runner tool will be responsible for executing the steps listed in the installation file, using the implementations from the YAML files. |
| 170 | +The tool has an interactive TUI showing the current step, progress, and terminal output of the current step. |
| 171 | + |
| 172 | +[source] |
| 173 | +---- |
| 174 | +$ guided-setup run cloudscale.guide.txt --state ./install-state.yaml --steps ./steps/*.yaml |
| 175 | +
|
| 176 | += Step 1/34: Given a cloudscale organization |
| 177 | +
|
| 178 | + The cloudscale token might be retrieved from https://control.cloudscale.ch/service/MY_PROJECT/api-token. |
| 179 | +
|
| 180 | + The token needs to have read and write permissions. |
| 181 | +
|
| 182 | +Please enter your cloudscale read/write API token: |
| 183 | +> *** |
| 184 | +---- |
| 185 | + |
| 186 | +[source] |
| 187 | +---- |
| 188 | +$ guided-setup run cloudscale.guide.txt --state ./install-state.yaml --steps ./steps/*.yaml |
| 189 | +
|
| 190 | += Step 3/34: I upload the OpenShift image to cloudscale |
| 191 | +
|
| 192 | + Checks for the presence of the OpenShift image in cloudscale and uploads it if not found. |
| 193 | +
|
| 194 | ++ mc cp vshncloudscale/openshift-vshn-4.12.6-cloudscale.qcow2.gz . |
| 195 | +[########################################] 100% |
| 196 | +
|
| 197 | +---- |
| 198 | + |
| 199 | +=== Static analysis |
| 200 | + |
| 201 | +The tool checks whether all inputs for every step are satisfied by previous steps and that no outputs are redefined. |
| 202 | + |
| 203 | +[source,bash] |
| 204 | +---- |
| 205 | +guided-setup analyze cloudscale.guide.txt --state ./install-state.yaml --steps ./steps/*.yaml |
| 206 | +
|
| 207 | +Error: Step "I upload the OpenShift image to cloudscale" is missing input "cloudscale_zone" at position 3 |
| 208 | +Error: Step "I create the bootstrap VM on cloudscale" output "loadbalancer_id" is redefined at position 7 |
| 209 | +Error: Step "I prepare the Terraform configuration" is defined multiple times at cloudscale-steps.yml:7 and exoscale-steps.yml:15 |
| 210 | +---- |
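
The two input/output checks can be expressed as a single pass over the ordered steps.
The Go sketch below is illustrative only: the `Step` type and the `analyze` function are hypothetical, the error wording mirrors the example output above, and outputs already present in a persisted state file aren't taken into account.

```go
package main

import "fmt"

// Step is a hypothetical resolved step: its text plus declared inputs and outputs.
type Step struct {
	Text    string
	Inputs  []string
	Outputs []string
}

// analyze walks the steps in order and reports inputs that no earlier step
// produced as well as outputs that an earlier step already defined.
func analyze(steps []Step) []string {
	var errs []string
	produced := map[string]bool{}
	for i, s := range steps {
		pos := i + 1 // positions are 1-based, as in the installation file
		for _, in := range s.Inputs {
			if !produced[in] {
				errs = append(errs, fmt.Sprintf("Step %q is missing input %q at position %d", s.Text, in, pos))
			}
		}
		for _, out := range s.Outputs {
			if produced[out] {
				errs = append(errs, fmt.Sprintf("Step %q output %q is redefined at position %d", s.Text, out, pos))
				continue
			}
			produced[out] = true
		}
	}
	return errs
}

func main() {
	// A shortened step list, so positions differ from the full installation file.
	steps := []Step{
		{Text: "Given a cloudscale organization", Outputs: []string{"cloudscale_rw_token"}},
		{Text: "I upload the OpenShift image to cloudscale", Inputs: []string{"cloudscale_rw_token", "cloudscale_zone"}, Outputs: []string{"image_id"}},
		{Text: "I create the loadbalancer on cloudscale", Outputs: []string{"loadbalancer_id"}},
		{Text: "I create the bootstrap VM on cloudscale", Outputs: []string{"loadbalancer_id"}},
	}
	for _, e := range analyze(steps) {
		fmt.Println("Error:", e)
	}
}
```

Because the installation file is a linear list, a single ordered pass is enough here; an explicit graph would only be needed if steps could run out of order.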
| 211 | + |
| 212 | +=== Documentation generation |
| 213 | + |
| 214 | +The tool can generate documentation for the installation process based on the step definitions, including descriptions, inputs, and outputs. |
| 215 | + |
| 216 | +[source,markdown] |
| 217 | +---- |
| 218 | +# Generated by: guided-setup generate-docs cloudscale.guide.txt --steps ./steps/*.yaml |
| 219 | +
|
| 220 | += TOC |
| 221 | +
|
| 222 | +* [Given a cloudscale organization](#given-a-cloudscale-organization) |
| 223 | +* [I upload the OpenShift image to cloudscale](#i-upload-the-openshift-image-to-cloudscale) |
| 224 | +
|
| 225 | += Steps |
| 226 | +
|
| 227 | +== Given a cloudscale organization |
| 228 | +
|
| 229 | +The cloudscale token might be retrieved from https://control.cloudscale.ch/service/MY_PROJECT/api-token. |
| 230 | +The token needs to have read and write permissions. |
| 231 | +
|
| 232 | +=== Inputs |
| 233 | +
|
| 234 | +None |
| 235 | +
|
| 236 | +=== Outputs |
| 237 | +
|
| 238 | +* cloudscale_rw_token |
| 239 | +
|
| 240 | +=== Prompts |
| 241 | +
|
| 242 | +* Please enter your cloudscale read/write API token |
| 243 | +
|
| 244 | +=== Script |
| 245 | +
|
| 246 | +``` |
| 247 | +echo "cloudscale_rw_token=$cloudscale_rw_token" >> $STATE |
| 248 | +``` |
| 249 | +
|
| 250 | +== I upload the OpenShift image to cloudscale |
| 251 | +
|
| 252 | +Checks for the presence of the OpenShift image in cloudscale and uploads it if not found. |
| 253 | +
|
| 254 | +=== Inputs |
| 255 | +
|
| 256 | +* cloudscale_rw_token |
| 257 | +* cloudscale_zone |
| 258 | +
|
| 259 | +=== Outputs |
| 260 | +
|
| 261 | +* image_id |
| 262 | +
|
| 263 | +=== Script |
| 264 | +
|
| 265 | +``` |
| 266 | +... upload logic ... |
| 267 | +``` |
| 268 | +---- |
| 269 | + |
| 270 | +=== Tool programming language |
| 271 | + |
| 272 | +We will implement the guided setup tool in Go. |
| 273 | +Go provides excellent support for I/O operations and building standalone binaries. |
| 274 | +The team has lots of experience with Go, making it easier to maintain and extend the tool in the future. |
| 275 | +https://github.com/charmbracelet/bubbletea[Bubble Tea] allows building rich TUIs with a nice Elm-like architecture. |
| 276 | + |
| 277 | +=== Distribution |
| 278 | + |
| 279 | +The runner tool and all required binaries to execute the steps are bundled into a single container image for easy distribution and execution. |