Getting corrupted data when restoring from snapshot #203

@rks889

Description

I am getting corrupted data on the restored volume in a particular scenario.

Steps to reproduce in my case:

  1. Run the initial deployment with a PVC (storage class piraeus-storage-replicated-lvm):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-cluster
  namespace: ts-k8supgrade-dev
spec:
  selector:
    matchLabels:
      app: app-test-cluster
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate      
  replicas: 1
  template:
    metadata:
      labels:
        app: app-test-cluster
        name: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - k8s-dfw-prod-worker2
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.21
        imagePullPolicy: "IfNotPresent"
        ports:
        - containerPort: 80
          name: web
        resources:
          requests:
            memory: "250Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "1"
        securityContext:
          allowPrivilegeEscalation: false
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: pvc-test-cluster-0
  2. The PVC is bound:
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  
pvc-test-cluster-0   Bound    pvc-a037d2a4-d290-41ea-92dc-7b5d4048c9a0   500Mi      RWO            piraeus-storage-replicated-lvm
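(The claim manifest itself isn't shown above; a sketch matching the listing, with the exact spec possibly differing from what the reporter used:)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test-cluster-0
  namespace: ts-k8supgrade-dev
spec:
  storageClassName: piraeus-storage-replicated-lvm
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
```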
  3. Copy an HTML file to the mounted volume:
kubectl cp ./index.html test-cluster-68f5648c54-wh2tr:/usr/share/nginx/html
  4. Check the file:
:/# md5sum /usr/share/nginx/html/index.html 
b857e29a868877e98f4cb955ef371ab5  /usr/share/nginx/html/index.html
:/# ls -lh /usr/share/nginx/html/index.html 
-rw-rw-r-- 1 1000 1000 183 Mar  5 16:31 /usr/share/nginx/html/index.html
:/# cat /usr/share/nginx/html/index.html    
<!DOCTYPE html>
<html>
    <head>
        <title>Example</title>
    </head>
    <body>
  5. Create a VolumeSnapshot from the existing PVC:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: snapshot-test-cluster-0-affw2
  namespace: ts-k8supgrade-dev
spec:
  volumeSnapshotClassName: linstor-csi-delete
  source:
    persistentVolumeClaimName: pvc-test-cluster-0
NAME                            READYTOUSE   SOURCEPVC            SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS        SNAPSHOTCONTENT                                    CREATIONTIME   AGE
snapshot-test-cluster-0-affw2   true         pvc-test-cluster-0                           500Mi         linstor-csi-delete   snapcontent-a2a749b9-071c-47b5-82c4-4c3bc870d5bb   5s             6s
  6. Create a new PVC from the VolumeSnapshot:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc-test-cluster-0
  namespace: ts-k8supgrade-dev
spec:
  storageClassName: piraeus-storage-replicated-lvm
  dataSource:
    name: snapshot-test-cluster-0-affw2
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
  7. Start a pod with the PVC "restore-pvc-test-cluster-0" mounted.
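(The pod spec for this step isn't shown in the report; a minimal sketch with a hypothetical pod name, mounting the restored claim at the same path:)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-restore
  namespace: ts-k8supgrade-dev
spec:
  containers:
  - name: nginx
    image: registry.k8s.io/nginx-slim:0.21
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: restore-pvc-test-cluster-0
```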
  8. Check the file: the size and mtime are unchanged, but the hash differs and cat prints nothing:
:/#  md5sum /usr/share/nginx/html/index.html                                                                                                                                                                                                    
9e292e386b7cebd21e02ad51f7ace213  /usr/share/nginx/html/index.html
:/# ls -lh /usr/share/nginx/html/index.html
-rw-rw-r-- 1 1000 1000 183 Mar  5 16:31 /usr/share/nginx/html/index.html
:/# cat /usr/share/nginx/html/index.html
:/#
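The unchanged size together with empty cat output suggests the restored file kept its metadata but its data blocks came back blank. A small script (an illustration, not part of the original report) to check whether a restored file of the expected size is entirely NUL bytes:

```python
import hashlib

def inspect(path):
    """Return size, md5, and whether the file is entirely NUL bytes."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "all_zero": bool(data) and all(b == 0 for b in data),
    }

if __name__ == "__main__":
    # Simulate the symptom: 183 bytes of zeros has the right size,
    # but a different hash than the original HTML, and cat shows nothing.
    with open("/tmp/restored-index.html", "wb") as f:
        f.write(b"\x00" * 183)
    print(inspect("/tmp/restored-index.html"))
```

If all_zero comes back true on the restored volume, the snapshot captured allocated-but-unwritten blocks rather than garbage, which would point at unflushed page cache at snapshot time.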

Additional info:

  1. Kubernetes 1.31.4, worker OS: Oracle Linux 8.10 5.15.0-305.176.4.el8uek.x86_64
  2. piraeus_deployment_verion: v2.8.0
    piraeus_snapshot_controller_charts_version: 4.0.1
    piraeus_snapshot_controller_crd_version: v8.2.0
  3. If I shut down the original deployment before creating the VolumeSnapshot (before step 5), the restore succeeds. From that point on, a VolumeSnapshot can be taken even while the initial deployment is running, and the restored volume contains a valid copy of the data.
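Point 3 suggests the snapshot may be taken while recent writes are still in the pod's page cache. A sketch of the two workarounds implied above, using standard kubectl commands (the deployment and namespace names are from this report; snapshot.yaml is a hypothetical filename for the step 5 manifest, and whether a plain sync is sufficient for a consistent LINSTOR snapshot is an assumption):

```shell
# Option A (assumption): flush dirty pages to the volume before snapshotting.
kubectl -n ts-k8supgrade-dev exec deploy/test-cluster -- sync

# Option B, as described in point 3: stop the workload entirely first.
kubectl -n ts-k8supgrade-dev scale deploy/test-cluster --replicas=0
kubectl -n ts-k8supgrade-dev apply -f snapshot.yaml   # the VolumeSnapshot from step 5
kubectl -n ts-k8supgrade-dev wait volumesnapshot/snapshot-test-cluster-0-affw2 \
  --for=jsonpath='{.status.readyToUse}'=true
kubectl -n ts-k8supgrade-dev scale deploy/test-cluster --replicas=1
```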
