Conversation

@nirs
Member

@nirs nirs commented Jan 26, 2024

Various tweaks for FOSDEM demo:

  • Use local git server
  • Use local registry
  • Add drenv suspend and resume commands
  • Extend kubevirt certificates lifetime
  • Extend cdi certificates lifetime
  • Move cdi insecure registry config to kustomization

Based on #1140

@nirs nirs force-pushed the fosdem branch 4 times, most recently from f8aff04 to fb6c149 on January 28, 2024 21:31
nirs added 26 commits February 20, 2024 14:36
The scripts are tiny but it is nice to verify them with flake8,
pylint and black.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Add "path" config, so test config looks like OpenShift UI:

    repo: https://github.com/RamenDR/ocm-ramen-samples.git
    path: subscription
    branch: main
    name: busybox-sample
    namespace: busybox-sample

With this we can use the basic test to test any subscription-based
application in ocm-ramen-samples[1] and ocm-kubevirt-samples[2].

[1] https://github.com/RamenDR/ocm-ramen-samples
[2] https://github.com/aglitke/ocm-kubevirt-samples

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
It is now possible to enable Kubernetes feature gates[1] using the
minikube --feature-gates option[2]. We will use this to enable the
StatefulSetAutoDeletePVC feature gate.

Example config:

    profiles:
      - name: featured
        feature_gates:
          - StatefulSetAutoDeletePVC=true

[1] https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
[2] https://minikube.sigs.k8s.io/docs/handbook/config/#enabling-feature-gates

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
With this we can create a statefulset with
persistentVolumeClaimRetentionPolicy[1] to have the PVCs deleted when
the stateful set is deleted. This policy is required for relocate;
otherwise ramen gets stuck waiting for the VRs to become secondary.

[1] https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention
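
A minimal sketch of the relevant StatefulSet stanza (values illustrative;
this relies on the StatefulSetAutoDeletePVC feature gate enabled in the
previous commit):

    spec:
      persistentVolumeClaimRetentionPolicy:
        whenDeleted: Delete
        whenScaled: Retain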

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We use `--namespace ramen-system` but these resources are deployed at
cluster scope. Presumably `--namespace` is ignored in this case, since
this code works as is.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Add `pvc_label` configuration so we can test any application. With this
we can run basic-test with VMs from ocm-kubevirt-samples[1].

[1] https://github.com/aglitke/ocm-kubevirt-samples
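
A hypothetical snippet showing the new key in the test config (the value
is illustrative):

    pvc_label: busybox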

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The basic-test can now be used with OpenShift clusters (using the new
ocm-ramen-samples providing subscription and dr kustomizations).
The only issue is the dr policy - basic-test is hard coded to use the
`dr-policy` installed by ramenctl, which is not available in our
OpenShift test clusters.

Fix by using a dr policy owned by the test, created by the test when
deploying the application, and removed when undeploying the application.
The name of the policy must be configured in the test `config.yaml`.

To be able to run concurrent tests, each test config must have its own
dr policy.
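
For example, a hypothetical entry in the test `config.yaml` (the actual
key name may differ):

    dr_policy: dr-policy-1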

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Since we can test any application now (e.g. busybox, kubevirt), we don't
want to mention busybox in the logs. Use config['name'] when we can to
make the logs clearer.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The channel is part of the subscription kustomization in
ocm-ramen-samples, so we don't need to deploy or undeploy it.

The basic config now uses the new deployment from my repo. We will
update the repo when the ocm-ramen-samples PR is merged.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
basic-test can now be run with a custom configuration file. This can be
used to run multiple tests concurrently.

    test/basic-test/run --config rbd-deploy.yaml $env 2>rbd.log &
    test/basic-test/run --config cephfs-deploy.yaml $env 2>cephfs.log &
    wait

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Add test configurations for multiple applications for OpenShift and
Kubernetes using the ocm-ramen-samples repo.

Currently using my own repo until the new applications are merged.

To run a test using a custom configuration, use:

    basic-test/run \
        --config configs/odr/busybox-regional-rbd-deploy.yaml \
        env.yaml

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Using my repo until the relevant PR[1] is merged.

[1] aglitke/ocm-kubevirt-samples#6

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This tiny tool reads a test suites YAML and runs the tests in parallel,
logging each test's output to a separate file.

A test suite binds tests (e.g. basic-test) to application configurations
(e.g. busybox-regional-rbd-deploy).

We have 2 test suites:

    $ tree suites/
    suites/
    ├── basic-k8s.yaml
    └── basic-odr.yaml
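
A hypothetical sketch of what a suite file might contain (the actual
schema and paths may differ):

    name: Basic Kubernetes Regional DR tests
    tests:
      - name: deployment
        config: configs/k8s/deployment-k8s-regional-rbd.yaml
      - name: statefulset
        config: configs/k8s/statefulset-k8s-regional-rbd.yaml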

Example run with a drenv-created environment:

    $ ./drtest --outdir /tmp/k8s-logs suites/basic-k8s.yaml envs/regional-dr.yaml
    2023-11-28 00:55:54,099 INFO    Running 'Basic Kubernetes Regional DR tests'
    2023-11-28 00:55:54,099 INFO    Storing output to '/tmp/k8s-logs'
    2023-11-28 00:55:54,101 INFO    Starting test 'deploymnet'
    2023-11-28 00:55:54,101 INFO    Starting test 'statefulset'
    2023-11-28 00:55:54,102 INFO    Starting test 'daemonset'
    2023-11-28 01:04:23,274 INFO    Test 'daemonset' PASS
    2023-11-28 01:04:24,161 INFO    Test 'deploymnet' PASS
    2023-11-28 01:04:53,600 INFO    Test 'statefulset' PASS
    2023-11-28 01:04:53,600 INFO    PASS (3 pass, 0 fail)

Each test logs to a separate file:

    $ tree /tmp/k8s-logs
    /tmp/k8s-logs
    ├── daemonset.log
    ├── deploymnet.log
    └── statefulset.log

To test with OpenShift we need to create a tiny environment file:

    $ cat env.yaml
    ramen:
      hub: hub
      clusters: [cluster1, cluster2]
      topology: regional-dr

And use a kubeconfig file with the clusters. The file can be created
with `oc login` and some `oc config` commands, or using the
oc-clusterset plugin:

    $ cat config.yaml
    clusters:
      - name: cluster1
        url: perf1.example.com:6443
        username: kubeadmin
        password: PeSkM-R6YcH-LyPZa-oTOO1
      - name: cluster2
        url: perf2.example.com:6443
        username: kubeadmin
        password: ZjIZn-SFUyR-aE4gI-fJcfL
      - name: hub
        url: perf3.example.com:6443
        username: kubeadmin
        password: 7C700-oVS3Q-25rtx-YMew5
    current-context: hub

    $ oc clusterset login --config config.yaml --kubeconfig kubeconfig

    $ oc config get-contexts --kubeconfig kubeconfig
    CURRENT   NAME       CLUSTER                  AUTHINFO                            NAMESPACE
              cluster1   perf1-example-com:6443   kube:admin/perf1-example-com:6443   default
              cluster2   perf2-example-com:6443   kube:admin/perf2-example-com:6443   default
    *         hub        perf3-example-com:6443   kube:admin/perf3-example-com:6443   default

Example run with the OpenShift environment:

    $ ./drtest --kubeconfig kubeconfig --outdir /tmp/odr-logs suites/basic-odr.yaml env.yaml
    2023-11-29 23:45:14,849 INFO    Running 'Basic OpenShift Regional DR tests'
    2023-11-29 23:45:14,849 INFO    Storing output to '/tmp/odr-logs'
    2023-11-29 23:45:14,850 INFO    Starting test 'rbd'
    2023-11-29 23:45:14,850 INFO    Starting test 'cephfs'
    2023-11-29 23:54:24,599 INFO    Test 'rbd' PASS
    2023-11-29 23:54:51,461 INFO    Test 'cephfs' PASS
    2023-11-29 23:54:51,461 INFO    PASS (2 pass, 0 fail)

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.58.0
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
https://github.com/kubevirt/kubevirt/releases/tag/v1.1.1
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
There is no point in using two versions of the same image. Using this
image in the CDI test can save time in the kubevirt tests later, using
the cached image.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
CDI may become available before it is ready to use. If we try to use it
while it is progressing we may fail with errors about missing CRDs. Wait
until the progressing condition becomes false.

Example run showing the issue:

    2024-01-10 21:42:24,080 DEBUG   [kubevirt/1] Deploying cdi cr
    2024-01-10 21:42:25,674 DEBUG   [kubevirt/1] Waiting until cdi cr is available
    2024-01-10 21:42:26,005 DEBUG   [kubevirt/1] cdi.cdi.kubevirt.io/cdi condition met

We stopped waiting here...

    2024-01-10 21:42:26,007 DEBUG   [kubevirt/1] Waiting until cdi cr finished progressing
    2024-01-10 21:42:39,472 DEBUG   [kubevirt/1] cdi.cdi.kubevirt.io/cdi condition met

But CDI finished progressing 13 seconds later.
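
The extra wait is roughly equivalent to this kubectl command (the addon
uses its own helpers, so this is only an approximation; context
illustrative):

    $ kubectl wait cdi.cdi.kubevirt.io/cdi --for=condition=Progressing=False --timeout=300s --context dr1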

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We cannot use volsync with ramen yet, and the kubevirt environment is
already too big. Without volsync we can remove the volumesnapshot addon
and submariner, which does not handle suspending the machine running the
minikube VMs well.

With this change we should be able to start an environment, suspend the
laptop, and resume it in an environment with an unreliable network or no
network access. This will be useful for live demos at conferences.

Keep volsync enabled in `regional-dr` and `regional-dr-hubless` to keep
the submariner and volsync addons functional.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This is useful for quickly starting a stopped working environment
without trying to redeploy everything. The main motivation is using a
pre-created environment in a location with a weak network, like a
conference.

Other use cases are working around bugs in addons that do not work well
when starting a stopped cluster, for example clusteradm.

With `--skip-addons` we skip the `start` and `stop` hooks, but we do run
the `test` hooks. This is useful for starting a stopped environment
faster while still testing that the environment works. To skip all hooks
run with both `--skip-addons` and `--skip-tests`.

Example run:

    $ drenv start --skip-addons --skip-tests $env
    2023-11-20 00:59:25,341 INFO    [rdr-kubevirt] Starting environment
    2023-11-20 00:59:25,464 INFO    [dr1] Starting minikube cluster
    2023-11-20 00:59:29,566 INFO    [hub] Starting minikube cluster
    2023-11-20 00:59:29,578 INFO    [dr2] Starting minikube cluster
    2023-11-20 01:00:23,402 INFO    [dr1] Cluster started in 57.94 seconds
    2023-11-20 01:00:23,402 INFO    [dr1] Configuring containerd
    2023-11-20 01:00:24,936 INFO    [dr1] Waiting until all deployments are available
    2023-11-20 01:00:28,749 INFO    [hub] Cluster started in 59.18 seconds
    2023-11-20 01:00:28,750 INFO    [hub] Waiting until all deployments are available
    2023-11-20 01:00:53,834 INFO    [dr2] Cluster started in 84.26 seconds
    2023-11-20 01:00:53,834 INFO    [dr2] Configuring containerd
    2023-11-20 01:00:55,042 INFO    [dr2] Waiting until all deployments are available
    2023-11-20 01:01:01,063 INFO    [hub] Deployments are available in 32.31 seconds
    2023-11-20 01:01:09,482 INFO    [dr1] Deployments are available in 44.55 seconds
    2023-11-20 01:01:34,661 INFO    [dr2] Deployments are available in 39.62 seconds
    2023-11-20 01:01:34,661 INFO    [rdr-kubevirt] Dumping ramen e2e config to '/home/nsoffer/.config/drenv/rdr-kubevirt'
    2023-11-20 01:01:34,827 INFO    [rdr-kubevirt] Environment started in 129.49 seconds

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Configure CDI to allow pulling from a local insecure registry. This is
useful for demos in an environment with an unreliable network, or for a
CI environment where we want to avoid random failures due to a flaky
network.

The image must be pushed to the local registry; this is easy using the
standard podman push command.
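
For example, assuming the registry listens on localhost:5000 (address and
image are illustrative):

    $ podman pull quay.io/containerdisks/fedora:39
    $ podman push --tls-verify=false quay.io/containerdisks/fedora:39 localhost:5000/containerdisks/fedora:39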

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Using a local git server we can deploy OCM applications without network
access to GitHub. This is useful for demos when the network is
unreliable, for example at a conference.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Using a local registry is useful for demos when the network is
unreliable, for example at a conference. It can also be used to avoid
random failures when the network is flaky, by caching remote images
locally.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Suspend or resume the underlying virtual machines. We assume the kvm2
driver to keep it simple for now; this needs to be implemented better
later so it also works with the qemu2 driver.

The use case is building the environment with a good network, suspending
it, and resuming it in an environment with a flaky network for a demo.
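
A rough sketch of the idea using virsh, assuming the kvm2 domains are
named after the minikube profiles (names illustrative; the drenv
implementation may differ):

    $ for vm in hub dr1 dr2; do virsh suspend "$vm"; done
    $ for vm in hub dr1 dr2; do virsh resume "$vm"; done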

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Using the local server to verify that we can demo kubevirt DR flows in
an environment with an unreliable network.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
To avoid certificate renewals during testing.

Without this I experienced this error:

    drenv.commands.Error: Command failed:
       command: ('kubectl', 'apply', '--context', 'dr1', '--kustomize=cr')
       exitcode: 1
       error:
          Error from server (InternalError): error when applying patch:
          {"spec":{"configuration":{"developerConfiguration":{"featureGates":[]}}}}
          to:
          Resource: "kubevirt.io/v1, Resource=kubevirts", GroupVersionKind: "kubevirt.io/v1, Kind=KubeVirt"
          Name: "kubevirt", Namespace: "kubevirt"
          for: "cr": error when patching "cr": Internal error occurred: failed calling webhook
               "kubevirt-update-validator.kubevirt.io": failed to call webhook: Post
               "https://kubevirt-operator-webhook.kubevirt.svc:443/kubevirt-validate-update?timeout=10s":
               tls: failed to verify certificate: x509: certificate has expired or is not yet valid:
               current time 2024-01-26T19:05:52Z is after 2024-01-26T16:24:46Z
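
One way to extend the lifetime is via the KubeVirt CR certificate
rotation settings; a sketch assuming the certificateRotateStrategy API
(durations illustrative):

    spec:
      certificateRotateStrategy:
        selfSigned:
          ca:
            duration: 87600h
            renewBefore: 720h
          server:
            duration: 87600h
            renewBefore: 720h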

Thanks: Michael Henriksen <mhenriks@redhat.com>
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
To avoid certificate renewals during testing.

Without this I experienced this error when starting a stopped
environment after a day:

   drenv.commands.Error: Command failed:
      command: ('kubectl', 'apply', '--context', 'dr2', '--kustomize=disk')
      exitcode: 1
      error:
         Error from server (InternalError): error when creating "disk": Internal
         error occurred: failed calling webhook "populator-validate.cdi.kubevirt.io":
         failed to call webhook: Post "https://cdi-api.cdi.svc:443/populator-validate?timeout=30s":
         tls: failed to verify certificate: x509: certificate has expired or is not yet valid:
         current time 2024-01-28T14:08:01Z is after 2024-01-27T19:15:20Z
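
Similarly, a sketch assuming CDI's certConfig API (durations
illustrative):

    spec:
      certConfig:
        ca:
          duration: 87600h
          renewBefore: 720h
        server:
          duration: 87600h
          renewBefore: 720h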

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added 2 commits February 20, 2024 14:37
Instead of patching the installed resource, patch it via kustomization.
With this we can check the correctness using:

    kustomize build addons/cdi/cr

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
With this you can run the local registry as a systemd service starting
at boot, instead of starting the registry manually when you want to use
it.
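
A hypothetical user unit along these lines (unit name, image, and port
are illustrative):

    # ~/.config/systemd/user/drenv-registry.service
    [Unit]
    Description=Local container registry for drenv

    [Service]
    ExecStart=/usr/bin/podman run --rm --name drenv-registry -p 5000:5000 docker.io/library/registry:2
    ExecStop=/usr/bin/podman stop drenv-registry

    [Install]
    WantedBy=default.target

Enable it with `systemctl --user enable --now drenv-registry.service`.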

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added 2 commits February 20, 2024 15:06
Explain why we need Go 1.20 and how to maintain multiple Go versions so
ramen can be built and tested while using a newer default Go version.
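
One common way to keep an extra Go toolchain around, assuming the
golang.org/dl wrappers (exact version illustrative):

    $ go install golang.org/dl/go1.20.14@latest
    $ go1.20.14 download
    $ go1.20.14 version
    go version go1.20.14 linux/amd64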

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When comparing PVs, skip comparing an unset "Spec.ClaimRef.Kind".
Comparing it breaks validation when using KubeVirt VMs, where the actual
resources in the system do not match the backed-up resources in the s3
store. It is correct to ignore an unset kind since this is an optional
field[1].

Previously we failed with:

    Failed to restore PVs: failed to restore ClusterData for VolRep
    (failed to restore PVs and PVCs using profile list
    ([s3profile-perf8-ocs-storagecluster]): failed to restore all
    []v1.PersistentVolume. Total/Restored 1/0)

And then the VRG does not make any progress. Now we consider an unset
"kind" equal and continue the flow normally.

[1] https://github.com/kubernetes/api/blob/f3648a53522eb60ea75d70d36a50c799f7e4e23b/core/v1/types.go#L6381
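
A hypothetical Go sketch of the idea (not the actual ramen code): an
unset Kind on either side counts as a match.

    package compare

    import corev1 "k8s.io/api/core/v1"

    // claimRefKindMatches treats an unset Kind as a match, since
    // ClaimRef.Kind is an optional field and may be missing on either side.
    func claimRefKindMatches(backedUp, current *corev1.ObjectReference) bool {
        if backedUp.Kind == "" || current.Kind == "" {
            return true
        }
        return backedUp.Kind == current.Kind
    }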

Bug: https://bugzilla.redhat.com/2262455
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@nirs
Member Author

nirs commented Feb 22, 2024

Not needed now, replaced by #1213

@nirs nirs closed this Feb 22, 2024
