Conversation

@nirs
Member

@nirs nirs commented Jan 26, 2024

Various tweaks for FOSDEM demo:

  • Use local git server
  • Use local registry
  • Add drenv suspend and resume commands
  • Extend kubevirt certificates lifetime
  • Extend cdi certificates lifetime
  • Move cdi insecure registry config to kustomization

Based on #1140

@nirs nirs force-pushed the fosdem branch 4 times, most recently from f8aff04 to fb6c149 on January 28, 2024 21:31
nirs added 26 commits February 20, 2024 14:36
The scripts are tiny but it is nice to verify them with flake8,
pylint and black.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Add "path" config, so test config looks like OpenShift UI:

    repo: https://github.com/RamenDR/ocm-ramen-samples.git
    path: subscription
    branch: main
    name: busybox-sample
    namespace: busybox-sample

With this we can use the basic test to test any subscription-based
application in ocm-ramen-samples[1] and ocm-kubevirt-samples[2].

[1] https://github.com/RamenDR/ocm-ramen-samples
[2] https://github.com/aglitke/ocm-kubevirt-samples

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
It is now possible to enable Kubernetes feature gates[1] using the
minikube --feature-gates option[2]. We will use this to enable the
StatefulSetAutoDeletePVC feature gate.

Example config:

    profiles:
      - name: featured
        feature_gates:
          - StatefulSetAutoDeletePVC=true

[1] https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
[2] https://minikube.sigs.k8s.io/docs/handbook/config/#enabling-feature-gates

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
With this we can create a statefulset with
persistentVolumeClaimRetentionPolicy[1] to have the PVCs deleted when
the stateful set is deleted. This policy is required for relocate;
otherwise ramen gets stuck waiting for the VRs to become secondary.

[1] https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention
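
A minimal sketch of the relevant StatefulSet stanza (values illustrative;
this relies on the StatefulSetAutoDeletePVC feature gate enabled in the
previous commit):

    spec:
      persistentVolumeClaimRetentionPolicy:
        whenDeleted: Delete
        whenScaled: Retain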

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We use `--namespace ramen-system` but these resources are deployed at
cluster scope. Presumably `--namespace` is ignored in this case, since
this code works as is.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Add `pvc_label` configuration so we can test any application. With this
we can run basic-test with VMs from ocm-kubevirt-samples[1].

[1] https://github.com/aglitke/ocm-kubevirt-samples
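
A hypothetical snippet showing the new key in the test config (the value
is illustrative):

    pvc_label: busybox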

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The basic-test can now be used with OpenShift clusters (using the new
ocm-ramen-samples providing subscription and dr kustomizations).
The only issue is the dr policy - basic-test is hard coded to use the
`dr-policy` installed by ramenctl, which is not available in our
OpenShift test clusters.

Fix by using a dr policy owned by the test, created by the test when
deploying the application, and removed when undeploying the application.
The name of the policy must be configured in the test `config.yaml`.

To be able to run concurrent tests, each test config must have its own
dr policy.
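
For example, a hypothetical entry in the test `config.yaml` (the actual
key name may differ):

    dr_policy: dr-policy-1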

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Since we can test any application now (e.g. busybox, kubevirt), we don't
want to mention busybox in the logs. Use config['name'] when we can to
make the logs clearer.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The channel is part of the subscription kustomization in
ocm-ramen-samples, so we don't need to deploy or undeploy it.

The basic config now uses the new deployment from my repo. We will
update the repo when the ocm-ramen-samples PR is merged.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
basic-test can now be run with a custom configuration file. This can be
used to run multiple tests concurrently.

    test/basic-test/run --config rbd-deploy.yaml $env 2>rbd.log &
    test/basic-test/run --config cephfs-deploy.yaml $env 2>cephfs.log &
    wait

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Add test configurations for multiple applications for OpenShift and
Kubernetes using the ocm-ramen-samples repo.

Currently using my own repo until the new applications are merged.

To run a test using a custom configuration, use:

    basic-test/run \
        --config configs/odr/busybox-regional-rbd-deploy.yaml \
        env.yaml

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Using my repo until the relevant PR[1] is merged.

[1] aglitke/ocm-kubevirt-samples#6

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This tiny tool reads a test suites YAML and runs the tests in parallel,
logging each test's output to a separate file.

A test suite binds tests (e.g. basic-test) to application configurations
(e.g. busybox-regional-rbd-deploy).

We have 2 test suites:

    $ tree suites/
    suites/
    ├── basic-k8s.yaml
    └── basic-odr.yaml
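
A hypothetical sketch of what a suite file might contain (the actual
schema and paths may differ):

    name: Basic Kubernetes Regional DR tests
    tests:
      - name: deployment
        config: configs/k8s/deployment-k8s-regional-rbd.yaml
      - name: statefulset
        config: configs/k8s/statefulset-k8s-regional-rbd.yaml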

Example run with a drenv-created environment:

    $ ./drtest --outdir /tmp/k8s-logs suites/basic-k8s.yaml envs/regional-dr.yaml
    2023-11-28 00:55:54,099 INFO    Running 'Basic Kubernetes Regional DR tests'
    2023-11-28 00:55:54,099 INFO    Storing output to '/tmp/k8s-logs'
    2023-11-28 00:55:54,101 INFO    Starting test 'deploymnet'
    2023-11-28 00:55:54,101 INFO    Starting test 'statefulset'
    2023-11-28 00:55:54,102 INFO    Starting test 'daemonset'
    2023-11-28 01:04:23,274 INFO    Test 'daemonset' PASS
    2023-11-28 01:04:24,161 INFO    Test 'deploymnet' PASS
    2023-11-28 01:04:53,600 INFO    Test 'statefulset' PASS
    2023-11-28 01:04:53,600 INFO    PASS (3 pass, 0 fail)

Each test logs to a separate file:

    $ tree /tmp/k8s-logs
    /tmp/k8s-logs
    ├── daemonset.log
    ├── deploymnet.log
    └── statefulset.log

To test with OpenShift we need to create a tiny environment file:

    $ cat env.yaml
    ramen:
      hub: hub
      clusters: [cluster1, cluster2]
      topology: regional-dr

And use a kubeconfig file with the clusters. The file can be created
with `oc login` and some `oc config` commands, or using the
oc-clusterset plugin:

    $ cat config.yaml
    clusters:
      - name: cluster1
        url: perf1.example.com:6443
        username: kubeadmin
        password: PeSkM-R6YcH-LyPZa-oTOO1
      - name: cluster2
        url: perf2.example.com:6443
        username: kubeadmin
        password: ZjIZn-SFUyR-aE4gI-fJcfL
      - name: hub
        url: perf3.example.com:6443
        username: kubeadmin
        password: 7C700-oVS3Q-25rtx-YMew5
    current-context: hub

    $ oc clusterset login --config config.yaml --kubeconfig kubeconfig

    $ oc config get-contexts --kubeconfig kubeconfig
    CURRENT   NAME       CLUSTER                  AUTHINFO                            NAMESPACE
              cluster1   perf1-example-com:6443   kube:admin/perf1-example-com:6443   default
              cluster2   perf2-example-com:6443   kube:admin/perf2-example-com:6443   default
    *         hub        perf3-example-com:6443   kube:admin/perf3-example-com:6443   default

Example run with the OpenShift environment:

    $ ./drtest --kubeconfig kubeconfig --outdir /tmp/odr-logs suites/basic-odr.yaml env.yaml
    2023-11-29 23:45:14,849 INFO    Running 'Basic OpenShift Regional DR tests'
    2023-11-29 23:45:14,849 INFO    Storing output to '/tmp/odr-logs'
    2023-11-29 23:45:14,850 INFO    Starting test 'rbd'
    2023-11-29 23:45:14,850 INFO    Starting test 'cephfs'
    2023-11-29 23:54:24,599 INFO    Test 'rbd' PASS
    2023-11-29 23:54:51,461 INFO    Test 'cephfs' PASS
    2023-11-29 23:54:51,461 INFO    PASS (2 pass, 0 fail)

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.58.0
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
https://github.com/kubevirt/kubevirt/releases/tag/v1.1.1
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
There is no point in using two versions of the same image. Using this
image in the CDI test can save time in the kubevirt tests later, using
the cached image.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
CDI may become available before it is ready to use. If we try to use it
while it is progressing we may fail with errors about missing CRDs. Wait
until the progressing condition becomes false.

Example run showing the issue:

    2024-01-10 21:42:24,080 DEBUG   [kubevirt/1] Deploying cdi cr
    2024-01-10 21:42:25,674 DEBUG   [kubevirt/1] Waiting until cdi cr is available
    2024-01-10 21:42:26,005 DEBUG   [kubevirt/1] cdi.cdi.kubevirt.io/cdi condition met

We stopped waiting here...

    2024-01-10 21:42:26,007 DEBUG   [kubevirt/1] Waiting until cdi cr finished progressing
    2024-01-10 21:42:39,472 DEBUG   [kubevirt/1] cdi.cdi.kubevirt.io/cdi condition met

But CDI finished progressing 13 seconds later.
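
The extra wait is roughly equivalent to this kubectl command (the addon
uses its own helpers, so this is only an approximation; context
illustrative):

    $ kubectl wait cdi.cdi.kubevirt.io/cdi --for=condition=Progressing=False --timeout=300s --context dr1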

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We cannot use volsync with ramen yet, and the kubevirt environment is
already too big. Without volsync we can remove the volumesnapshot addon
and submariner, which does not handle suspending the machine running the
minikube VMs well.

With this change we should be able to start an environment, suspend the
laptop, and resume it in an environment with an unreliable network or no
network access. This will be useful for live demos at conferences.

Keep volsync enabled in `regional-dr` and `regional-dr-hubless` to keep
the submariner and volsync addons functional.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This is useful for quickly starting a stopped working environment
without trying to redeploy everything. The main motivation is using a
pre-created environment in a location with a weak network, like a
conference.

Other use cases are working around bugs in addons that do not work well
when starting a stopped cluster, for example clusteradm.

With `--skip-addons` we skip the `start` and `stop` hooks, but we do run
the `test` hooks. This is useful for starting a stopped environment
faster while still testing that the environment works. To skip all hooks
run with both `--skip-addons` and `--skip-tests`.

Example run:

    $ drenv start --skip-addons --skip-tests $env
    2023-11-20 00:59:25,341 INFO    [rdr-kubevirt] Starting environment
    2023-11-20 00:59:25,464 INFO    [dr1] Starting minikube cluster
    2023-11-20 00:59:29,566 INFO    [hub] Starting minikube cluster
    2023-11-20 00:59:29,578 INFO    [dr2] Starting minikube cluster
    2023-11-20 01:00:23,402 INFO    [dr1] Cluster started in 57.94 seconds
    2023-11-20 01:00:23,402 INFO    [dr1] Configuring containerd
    2023-11-20 01:00:24,936 INFO    [dr1] Waiting until all deployments are available
    2023-11-20 01:00:28,749 INFO    [hub] Cluster started in 59.18 seconds
    2023-11-20 01:00:28,750 INFO    [hub] Waiting until all deployments are available
    2023-11-20 01:00:53,834 INFO    [dr2] Cluster started in 84.26 seconds
    2023-11-20 01:00:53,834 INFO    [dr2] Configuring containerd
    2023-11-20 01:00:55,042 INFO    [dr2] Waiting until all deployments are available
    2023-11-20 01:01:01,063 INFO    [hub] Deployments are available in 32.31 seconds
    2023-11-20 01:01:09,482 INFO    [dr1] Deployments are available in 44.55 seconds
    2023-11-20 01:01:34,661 INFO    [dr2] Deployments are available in 39.62 seconds
    2023-11-20 01:01:34,661 INFO    [rdr-kubevirt] Dumping ramen e2e config to '/home/nsoffer/.config/drenv/rdr-kubevirt'
    2023-11-20 01:01:34,827 INFO    [rdr-kubevirt] Environment started in 129.49 seconds

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Configure CDI to allow pulling from a local insecure registry. This is
useful for demos in an environment with an unreliable network, or for a
CI environment where we want to avoid random failures due to a flaky
network.

The image must be pushed to the local registry; this is easy using the
standard podman push command.
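
For example, assuming the registry listens on localhost:5000 (address and
image are illustrative):

    $ podman pull quay.io/containerdisks/fedora:39
    $ podman push --tls-verify=false quay.io/containerdisks/fedora:39 localhost:5000/containerdisks/fedora:39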

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Using a local git server we can deploy OCM applications without network
access to GitHub. This is useful for demos when the network is
unreliable, for example at a conference.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Using a local registry is useful for demos when the network is
unreliable, for example at a conference. It can also be used to avoid
random failures when the network is flaky, by caching remote images
locally.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Suspend or resume the underlying virtual machines. We assume the kvm2
driver to keep it simple for now; this needs to be implemented better
later so it also works with the qemu2 driver.

The use case is building the environment with a good network, suspending
it, and resuming it in an environment with a flaky network for a demo.
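
A rough sketch of the idea using virsh, assuming the kvm2 domains are
named after the minikube profiles (names illustrative; the drenv
implementation may differ):

    $ for vm in hub dr1 dr2; do virsh suspend "$vm"; done
    $ for vm in hub dr1 dr2; do virsh resume "$vm"; done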

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Using the local server to verify that we can demo kubevirt DR flows in
an environment with an unreliable network.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
To avoid certificate renewals during testing.

Without this I experienced this error:

    drenv.commands.Error: Command failed:
       command: ('kubectl', 'apply', '--context', 'dr1', '--kustomize=cr')
       exitcode: 1
       error:
          Error from server (InternalError): error when applying patch:
          {"spec":{"configuration":{"developerConfiguration":{"featureGates":[]}}}}
          to:
          Resource: "kubevirt.io/v1, Resource=kubevirts", GroupVersionKind: "kubevirt.io/v1, Kind=KubeVirt"
          Name: "kubevirt", Namespace: "kubevirt"
          for: "cr": error when patching "cr": Internal error occurred: failed calling webhook
               "kubevirt-update-validator.kubevirt.io": failed to call webhook: Post
               "https://kubevirt-operator-webhook.kubevirt.svc:443/kubevirt-validate-update?timeout=10s":
               tls: failed to verify certificate: x509: certificate has expired or is not yet valid:
               current time 2024-01-26T19:05:52Z is after 2024-01-26T16:24:46Z
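
One way to extend the lifetime is via the KubeVirt CR certificate
rotation settings; a sketch assuming the certificateRotateStrategy API
(durations illustrative):

    spec:
      certificateRotateStrategy:
        selfSigned:
          ca:
            duration: 87600h
            renewBefore: 720h
          server:
            duration: 87600h
            renewBefore: 720h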

Thanks: Michael Henriksen <mhenriks@redhat.com>
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
To avoid certificate renewals during testing.

Without this I experienced this error when starting a stopped
environment after a day:

   drenv.commands.Error: Command failed:
      command: ('kubectl', 'apply', '--context', 'dr2', '--kustomize=disk')
      exitcode: 1
      error:
         Error from server (InternalError): error when creating "disk": Internal
         error occurred: failed calling webhook "populator-validate.cdi.kubevirt.io":
         failed to call webhook: Post "https://cdi-api.cdi.svc:443/populator-validate?timeout=30s":
         tls: failed to verify certificate: x509: certificate has expired or is not yet valid:
         current time 2024-01-28T14:08:01Z is after 2024-01-27T19:15:20Z
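
Similarly, a sketch assuming CDI's certConfig API (durations
illustrative):

    spec:
      certConfig:
        ca:
          duration: 87600h
          renewBefore: 720h
        server:
          duration: 87600h
          renewBefore: 720h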

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added 2 commits February 20, 2024 14:37
Instead of patching the installed resource, patch it via kustomization.
With this we can check the correctness using:

    kustomize build addons/cdi/cr

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
With this you can run the local registry as a systemd service starting
at boot, instead of starting the registry manually when you want to use
it.
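
A hypothetical user unit along these lines (unit name, image, and port
are illustrative):

    # ~/.config/systemd/user/drenv-registry.service
    [Unit]
    Description=Local container registry for drenv

    [Service]
    ExecStart=/usr/bin/podman run --rm --name drenv-registry -p 5000:5000 docker.io/library/registry:2
    ExecStop=/usr/bin/podman stop drenv-registry

    [Install]
    WantedBy=default.target

Enable it with `systemctl --user enable --now drenv-registry.service`.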

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added 2 commits February 20, 2024 15:06
Explain why we need Go 1.20 and how to maintain multiple Go versions so
ramen can be built and tested while using a newer default Go version.
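
One common way to keep an extra Go toolchain around, assuming the
golang.org/dl wrappers (exact version illustrative):

    $ go install golang.org/dl/go1.20.14@latest
    $ go1.20.14 download
    $ go1.20.14 version
    go version go1.20.14 linux/amd64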

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When comparing PVs, skip comparing an unset "Spec.ClaimRef.Kind".
Comparing it breaks validation when using KubeVirt VMs, where the actual
resources in the system do not match the backed-up resources in the s3
store. It is correct to ignore an unset kind since this is an optional
field[1].

Previously we failed with:

    Failed to restore PVs: failed to restore ClusterData for VolRep
    (failed to restore PVs and PVCs using profile list
    ([s3profile-perf8-ocs-storagecluster]): failed to restore all
    []v1.PersistentVolume. Total/Restored 1/0)

And then the VRG does not make any progress. Now we consider an unset
"kind" equal and continue the flow normally.

[1] https://github.com/kubernetes/api/blob/f3648a53522eb60ea75d70d36a50c799f7e4e23b/core/v1/types.go#L6381
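
A hypothetical Go sketch of the idea (not the actual ramen code): an
unset Kind on either side counts as a match.

    package compare

    import corev1 "k8s.io/api/core/v1"

    // claimRefKindMatches treats an unset Kind as a match, since
    // ClaimRef.Kind is an optional field and may be missing on either side.
    func claimRefKindMatches(backedUp, current *corev1.ObjectReference) bool {
        if backedUp.Kind == "" || current.Kind == "" {
            return true
        }
        return backedUp.Kind == current.Kind
    }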

Bug: https://bugzilla.redhat.com/2262455
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@nirs
Member Author

nirs commented Feb 22, 2024

Not needed now, replaced by #1213

@nirs nirs closed this Feb 22, 2024
