Commit 1eebd8b

Initial draft of guided setup (#434)
1 parent 4895786 commit 1eebd8b

4 files changed (+375 -0 lines)


docs/modules/ROOT/assets/images/installation-branching.drawio.svg

Lines changed: 4 additions & 0 deletions
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
= Guided Setup Tool for Cluster Installations

== Problem

Setting up OpenShift clusters on diverse cloud providers such as cloudscale and Exoscale is a complex, error-prone process.
It requires technical expertise and manual coordination, spans roughly 100 steps, and spreads a lot of state across 30+ environment variables, configuration files, Git repositories, and the VSHN portal.

The existing installation workflows are fragmented and hard to extend, because they consist of an array of assorted templates that are only loosely tied together.
A change in one template can have unforeseen consequences on other parts of the installation process.
Every new cloud provider requires more branching paths and adds to the overall complexity.

We eventually want to support a fully automated setup, without any manual steps involved.
As we're not quite there yet, and might never be, we need a solution that lets us gradually automate more and more steps.


.An example of the branching complexity in the installation process
image::installation-branching.drawio.svg[alt="installation branching",width=400]


== Goals

* Guide users through the necessary steps, ensuring all prerequisites are met before proceeding.
* Allow user input where necessary.
* Abstract away cloud provider specifics to ensure a consistent and repeatable deployment process.
* Allow error recovery and resumption of interrupted installations.
* Enable gradual automation of the setup process, reducing the need for manual intervention over time.
* Make it easier to iterate on interconnected steps and templates without breaking the overall installation process.

== Non-Goals

* Fully automated setup without any manual steps involved (at least not initially).
* Replacing existing tools like `openshift-install` or `terraform`; the tool complements them instead.

== Proposals

=== Proposal 1: Use a configuration management tool (for example Ansible) to build a guided setup tool

We use a configuration management tool like Ansible to build a guided setup tool that orchestrates the installation process.

There is a myriad of existing configuration management tools (Ansible, SaltStack, Puppet, Chef, etc.).
Some of them support interactive prompts and guided workflows.
Most of them also have good support for modularization, allowing us to break the installation process down into smaller, manageable tasks or roles.
We can create a series of playbooks or roles that represent each step of the installation process.

State management and interruption handling aren't trivial with most of these tools, but can be achieved with some custom logic and careful planning.
Most tools don't have a single state file that the user can easily modify, which makes it harder to resume from a specific point after fixing an issue in the state.

==== Advantages

* Big ecosystem
* Mature tooling
* Active community

==== Disadvantages

* Complexity in setup, configuration, and maintenance
* Not every framework has good support for interactive prompts
* State management can be tricky, especially when dealing with interrupted installations and resuming from a specific point
* Learning curve for team members unfamiliar with the chosen tool

=== Proposal 2: Write a custom guided setup tool

We develop a custom tool tailored specifically to our installation process, focusing on the unique requirements and challenges we face.

A big focus can be put on state management and interruption handling, allowing users to easily resume where they left off.
We can design a user-friendly interface that guides users through the installation steps, providing clear instructions and feedback.
State management can be implemented so that users can easily modify the state file to fix issues and resume the installation process.

==== Advantages

* Full control over the implementation and user experience
* Ability to design the tool specifically for our use case
* Easier integration with existing workflows and tools
* Potential for simpler state management and interruption handling

==== Disadvantages

* Limited community support and resources compared to popular configuration management tools

== Decision

We will proceed with Proposal 2: Write a custom guided setup tool.

== Rationale

We believe that a custom tool will provide us with the flexibility and control we need to address the specific challenges of our installation process.
By tailoring the tool to our requirements, we can create a more seamless and efficient user experience.

== References

* https://docs.ansible.com/[Ansible Documentation]
* https://docs.chef.io/[Chef Documentation]
Lines changed: 279 additions & 0 deletions
@@ -0,0 +1,279 @@
= Guided OpenShift setup

[abstract]
--
Architecture documentation for a guided OpenShift setup tool that provides an interactive, state-aware installation experience for OpenShift clusters on VSHN-supported cloud providers.

The goal is an easy-to-use and extensible installation framework that abstracts cloud provider specifics while ensuring consistent and repeatable deployments.
--

== Overview

== Problem statement

Setting up OpenShift clusters on diverse cloud providers such as cloudscale and Exoscale is a complex, error-prone process.
It requires technical expertise and manual coordination, spans roughly 100 steps, and spreads a lot of state across 30+ environment variables, configuration files, Git repositories, and the VSHN portal.

The existing installation workflows are fragmented and hard to extend, because they consist of an array of assorted templates that are only loosely tied together.
A change in one template can have unforeseen consequences on other parts of the installation process.
Every new cloud provider requires more branching paths and adds to the overall complexity.

We eventually want to support a fully automated setup, without any manual steps involved.
As we're not quite there yet, and might never be, we need a solution that lets us gradually automate more and more steps.

.An example of the branching complexity in the installation process
image::installation-branching.drawio.svg[alt="installation branching",width=400]

== Goals

* Provide an interactive, state-aware installation experience for OpenShift clusters on VSHN-supported cloud providers.
* Abstract away cloud provider specifics to ensure a consistent and repeatable deployment process.
* Create an easy-to-use and extensible installation framework that can adapt to new requirements and cloud providers.
* Enable gradual automation of the setup process, reducing the need for manual intervention over time.
* Automate installation state management while still allowing the user to fix state issues manually if needed.
* Allow static analysis of the installation process, for example checking that every step's inputs are provided by earlier steps, to make iterating on it easier.

== Non-Goals

* Fully automated setup without any manual steps involved (at least not initially).
* Replacing existing tools like `openshift-install` or `terraform`; the tool complements them instead.

== Architecture overview

Setting up a cluster consists of multiple steps, each responsible for a specific part of the installation process.

The design has two parts: plain-text installation files (guides) that list the steps to perform, and a runner tool that looks up how to execute each step while managing the installation state.

=== Step definitions

Steps are defined in plain text, with each line representing a single step to perform.
The format is heavily inspired by the https://cucumber.io/docs#what-are-step-definitions[Gherkin] syntax used in BDD testing frameworks.

.Gherkin-like definition
[source,gherkin]
----
Given a cloudscale organization
Given a Lieutenant cluster ID
I upload the OpenShift image to cloudscale
I prepare the Terraform configuration
I create the loadbalancer on cloudscale
I create the DNS records in our hieradata
I create the bootstrap VM on cloudscale
The bootstrap VM should be reachable
I create the master VMs on cloudscale
I create the infra VMs on cloudscale
----

A step can be interactive ("Given a cloudscale organization"), asking the user for input, or non-interactive ("I create the bootstrap VM on cloudscale"), performing automated tasks based on the current state.

Steps can depend on the outputs of previous steps, forming a directed acyclic graph (DAG) of dependencies that we should be able to analyze statically if all inputs are declared.
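
A minimal sketch, in Go, of how the runner might read a guide file. It assumes one step per line, trims surrounding whitespace, and skips blank lines. The `parseGuide` function is hypothetical. Because the steps keep their order, a step's 1-based index matches the `Step 3/34` numbering used in the runner output.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseGuide turns a plain-text guide into an ordered list of step texts.
// Hypothetical helper: one step per line, surrounding whitespace is trimmed
// and blank lines are skipped.
func parseGuide(guide string) ([]string, error) {
	var steps []string
	scanner := bufio.NewScanner(strings.NewReader(guide))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		steps = append(steps, line)
	}
	return steps, scanner.Err()
}

func main() {
	guide := `Given a cloudscale organization
Given a Lieutenant cluster ID

I upload the OpenShift image to cloudscale
`
	steps, err := parseGuide(guide)
	if err != nil {
		panic(err)
	}
	for i, step := range steps {
		// Positions are 1-based, like "Step 3/34" in the runner output.
		fmt.Printf("Step %d/%d: %s\n", i+1, len(steps), step)
	}
}
```

With the guide in `main`, the last line printed is `Step 3/3: I upload the OpenShift image to cloudscale`.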

=== Step implementations

Step implementations are defined in YAML files, and the guided setup tool can load multiple implementation files.
While YAML has well-documented issues, it can be parsed by many languages and is reasonably easy to read and write.
Additionally, with a reasonable YAML linting configuration, the most egregious ambiguities can be caught before they become issues.
The tool uses regex matching on the step text to find the correct implementation for each step.
An implementation can contain a script to execute, prompt for user input, and carry metadata such as an extended description, inputs, and outputs.

All prompted user input can also be provided through environment variables, allowing non-interactive execution.

[source,yaml]
----
steps:
  - match: Given a cloudscale organization <1>
    inputs: []
    outputs:
      - cloudscale_rw_token
    description: |
      The cloudscale token might be retrieved from https://control.cloudscale.ch/service/MY_PROJECT/api-token.

      The token needs to have read and write permissions.
    interaction: <2>
      type: prompt
      prompt: Please enter your cloudscale read/write API token
      into: cloudscale_rw_token
    run: | <3>
      echo "cloudscale_rw_token=$cloudscale_rw_token" >> "$STATE" <4>
  - match: I upload the OpenShift image to cloudscale
    description: Checks for the presence of the OpenShift image in cloudscale and uploads it if not found.
    inputs:
      - cloudscale_rw_token
      - cloudscale_zone <5>
    run: |
      ... upload logic ...
    outputs:
      - image_id
  - match: I prepare the Terraform configuration
    inputs:
      - cloudscale_rw_token
      - image_id
    outputs:
      - terraform_config
  - match: I create the loadbalancer on cloudscale
    inputs:
      - terraform_config
    outputs:
      - loadbalancer_id
  - match: I create the bootstrap VM on cloudscale
    inputs:
      - terraform_config
    outputs:
      - loadbalancer_id <6>
----
<1> Match field containing a regex.
Used to identify the step implementation.
<2> Interaction metadata: a text prompt, a yes/no question, or a selection from a list of options.
<3> Each step can execute arbitrary shell scripts.
<4> Scripts can write outputs to a state file for later steps to consume.
The runner tool manages this file; `$STATE` is an environment variable pointing to a temporary state file.
<5> No previous step produces this input, so static analysis should report an error.
<6> Ideally, redefining outputs is not allowed, so static analysis should report an error here as well.
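
A minimal sketch of the regex lookup using Go's `regexp` package. It assumes `match` patterns are anchored to the whole step text; the design does not say whether they are. The `Impl` type and the `compile` and `findImpl` functions are hypothetical. The sketch reports both "no implementation" and "defined multiple times" errors, the latter mirroring the duplicate-definition error shown for static analysis.

```go
package main

import (
	"fmt"
	"regexp"
)

// Impl is a hypothetical, trimmed-down step implementation loaded from YAML.
type Impl struct {
	Match   string // regex from the `match` field
	Source  string // file:line, used in error messages
	pattern *regexp.Regexp
}

// compile anchors each pattern so that it must match the whole step text
// (an assumption, not part of the design).
func compile(impls []Impl) error {
	for i := range impls {
		re, err := regexp.Compile("^(?:" + impls[i].Match + ")$")
		if err != nil {
			return fmt.Errorf("%s: invalid match %q: %w", impls[i].Source, impls[i].Match, err)
		}
		impls[i].pattern = re
	}
	return nil
}

// findImpl returns the single implementation whose pattern matches the step.
func findImpl(step string, impls []Impl) (*Impl, error) {
	var found []*Impl
	for i := range impls {
		if impls[i].pattern.MatchString(step) {
			found = append(found, &impls[i])
		}
	}
	switch len(found) {
	case 0:
		return nil, fmt.Errorf("no implementation for step %q", step)
	case 1:
		return found[0], nil
	default:
		return nil, fmt.Errorf("step %q is defined multiple times at %s and %s",
			step, found[0].Source, found[1].Source)
	}
}

func main() {
	// Hypothetical implementations and source locations.
	impls := []Impl{
		{Match: "Given a cloudscale organization", Source: "cloudscale-steps.yml:1"},
		{Match: "I upload the OpenShift image to (cloudscale|exoscale)", Source: "cloudscale-steps.yml:4"},
	}
	if err := compile(impls); err != nil {
		panic(err)
	}
	impl, err := findImpl("I upload the OpenShift image to cloudscale", impls)
	if err != nil {
		panic(err)
	}
	fmt.Println(impl.Source)
}
```

Treating an ambiguous match as an error means a step is never silently bound to whichever file happened to be loaded first.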

=== State file

The state file needs to be human-readable and human-fixable.
We use a YAML file here as well.

The tool should be able to upload the state file to an S3-compatible object storage, so that other team members can resume an interrupted installation or help debug issues.
As the state file contains secrets, the tool should support encrypting it with a user-provided password before uploading it.
It should also be possible to always prompt for personal tokens instead of storing them in the state file.

[source,yaml]
----
current_step: I upload the OpenShift image to cloudscale <1>

completed_steps: <2>
  - Given a cloudscale organization
  - Given a Lieutenant cluster ID

outputs: <3>
  cloudscale_rw_token:
    value: "mysecrettoken"
  image_id:
    value: "1234-5678-90ab-cdef"

artifacts: <4>
  terraform_config:
    path: "/path/to/generated/terraform.tfvars"
----
<1> The current step, or `+__FINAL__+` if all steps are completed.
This allows resuming an interrupted installation.
We might instead store `last_step` and derive the current step from it.
This would let us drop the final marker, but might make it harder for users to work with the state file.
<2> A list of completed steps.
Not strictly required, but it makes debugging easier.
<3> A map of all outputs from completed steps.
<4> We might need to store files generated during the installation here as well.
The simpler approach would be for steps to just return paths to files, but cleanup might be tricky then.
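
A minimal sketch of how the runner could decide where to resume. It uses the `current_step` field and the `+__FINAL__+` marker from the example above. The `State` struct and `resumeIndex` helper are hypothetical. Outputs are flattened to plain strings, and YAML (un)marshalling is left out to keep the sketch dependency-free.

```go
package main

import "fmt"

// finalMarker is the value of current_step once all steps are completed.
const finalMarker = "__FINAL__"

// State is a hypothetical Go view of the state file; outputs are flattened
// to plain strings and YAML (un)marshalling is omitted.
type State struct {
	CurrentStep    string
	CompletedSteps []string
	Outputs        map[string]string
	Artifacts      map[string]string // artifact name -> file path
}

// resumeIndex returns the 0-based index of the guide step to run next, or
// len(steps) if the installation is already finished. It assumes step texts
// are unique within a guide.
func resumeIndex(steps []string, st State) (int, error) {
	switch st.CurrentStep {
	case "":
		return 0, nil // no state yet: start from the beginning
	case finalMarker:
		return len(steps), nil
	}
	for i, s := range steps {
		if s == st.CurrentStep {
			return i, nil
		}
	}
	return 0, fmt.Errorf("current_step %q is not part of the guide", st.CurrentStep)
}

func main() {
	steps := []string{
		"Given a cloudscale organization",
		"Given a Lieutenant cluster ID",
		"I upload the OpenShift image to cloudscale",
	}
	st := State{
		CurrentStep:    "I upload the OpenShift image to cloudscale",
		CompletedSteps: steps[:2],
		Outputs:        map[string]string{"cloudscale_rw_token": "mysecrettoken"},
	}
	i, err := resumeIndex(steps, st)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Resuming at step %d/%d: %s\n", i+1, len(steps), steps[i])
}
```

Keying on the step text rather than an index keeps the state file readable. The trade-off is that renaming a step in the guide invalidates `current_step`; returning an error makes that visible instead of silently restarting.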

== Runner tool

A runner tool is responsible for executing the steps listed in a guide file, using the implementations from the step YAML files.
The tool has an interactive TUI showing the current step, the overall progress, and the terminal output of the current step.

[source]
----
$ guided-setup run cloudscale.guide.txt --state ./install-state.yaml --steps ./steps/*.yaml

= Step 1/34: Given a cloudscale organization

The cloudscale token might be retrieved from https://control.cloudscale.ch/service/MY_PROJECT/api-token.

The token needs to have read and write permissions.

Please enter your cloudscale read/write API token:
> ***
----

[source]
----
$ guided-setup run cloudscale.guide.txt --state ./install-state.yaml --steps ./steps/*.yaml

= Step 3/34: I upload the OpenShift image to cloudscale

Checks for the presence of the OpenShift image in cloudscale and uploads it if not found.

+ mc cp vshncloudscale/openshift-vshn-4.12.6-cloudscale.qcow2.gz .
[########################################] 100%

----

=== Static analysis

The tool checks that every step's inputs are satisfied by the outputs of previous steps, that no output is redefined, and that each step matches exactly one implementation.

[source]
----
$ guided-setup analyze cloudscale.guide.txt --state ./install-state.yaml --steps ./steps/*.yaml

Error: Step "I upload the OpenShift image to cloudscale" is missing input "cloudscale_zone" at position 3
Error: Step "I create the bootstrap VM on cloudscale" output "loadbalancer_id" is redefined at position 7
Error: Step "I prepare the Terraform configuration" is defined multiple times at cloudscale-steps.yml:7 and exoscale-steps.yml:15
----

=== Documentation generation

The tool can generate documentation for the installation process from the step implementations, including descriptions, inputs, and outputs.

[source,markdown]
----
<!-- Generated by: guided-setup generate-docs cloudscale.guide.txt --steps ./steps/*.yaml -->

# TOC

* [Given a cloudscale organization](#given-a-cloudscale-organization)
* [I upload the OpenShift image to cloudscale](#i-upload-the-openshift-image-to-cloudscale)

# Steps

## Given a cloudscale organization

The cloudscale token might be retrieved from https://control.cloudscale.ch/service/MY_PROJECT/api-token.

The token needs to have read and write permissions.

### Inputs

None

### Outputs

* cloudscale_rw_token

### Prompts

* Please enter your cloudscale read/write API token

### Script

```
echo "cloudscale_rw_token=$cloudscale_rw_token" >> "$STATE"
```

## I upload the OpenShift image to cloudscale

Checks for the presence of the OpenShift image in cloudscale and uploads it if not found.

### Inputs

* cloudscale_rw_token
* cloudscale_zone

### Outputs

* image_id

### Script

```
... upload logic ...
```
----

=== Tool programming language

We will implement the guided setup tool in Go.
Go has good support for I/O operations and compiles to standalone binaries.
The team has lots of experience with Go, making it easier to maintain and extend the tool in the future.
https://github.com/charmbracelet/bubbletea[Bubble Tea] allows building rich TUIs and is based on The Elm Architecture.

=== Distribution

The runner tool and all binaries required to execute the steps are bundled into a single container image for easy distribution and execution.

docs/modules/ROOT/partials/nav.adoc

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,7 @@
** xref:oc4:ROOT:references/architecture/single_sign_on.adoc[]
** xref:oc4:ROOT:references/architecture/espejote-in-cluster-templating-controller.adoc[]
** xref:oc4:ROOT:references/architecture/sli_reporting.adoc[]
+** xref:oc4:ROOT:references/architecture/guided-setup-architecture.adoc[]

** xref:oc4:ROOT:references/cloudscale/architecture.adoc[cloudscale.ch]

@@ -283,3 +284,4 @@
** xref:oc4:ROOT:explanations/decisions/prometheusrule-controller.adoc[]
** xref:oc4:ROOT:explanations/decisions/customer-facing-slo.adoc[]
** xref:oc4:ROOT:explanations/decisions/feature-based-metering.adoc[]
+** xref:oc4:ROOT:explanations/decisions/guided-setup-tool.adoc[]
