Helm upgrade errors imply broker is not shutdown gracefully #331

@michaeljmarshall

Describe the bug
I observed apache/pulsar#18236 when doing a helm upgrade while testing out 3.0.0-candidate-1 for #326.

To Reproduce
My steps are listed in apache/pulsar#18236. The only difference is that instead of restarting the broker forcefully, I started the cluster with:

    helm install test -f pulsar-chart-3.0.0/examples/values-minikube.yaml --version 2.9.4 apache-pulsar-dist-dev/pulsar

and then upgraded it with:

    helm upgrade test -f pulsar-chart-3.0.0/examples/values-minikube.yaml --version 3.0.0 apache-pulsar-dist-dev/pulsar

The upgrade triggered a broker shutdown.
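For anyone re-running this, the install-then-upgrade sequence can be wrapped in a small script. This is only a sketch: the release name, values file, chart, and versions are the ones quoted above, while the `DRY_RUN` switch and the `run`, `install_old`, and `upgrade_new` helpers are hypothetical names added for illustration. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the install-then-upgrade sequence from this report.
# DRY_RUN is a hypothetical switch: by default the commands are only
# printed; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

RELEASE="test"
VALUES="pulsar-chart-3.0.0/examples/values-minikube.yaml"
CHART="apache-pulsar-dist-dev/pulsar"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Start the cluster on the older chart version.
install_old() {
  run helm install "$RELEASE" -f "$VALUES" --version 2.9.4 "$CHART"
}

# Upgrade to the newer chart version; this is what triggers the broker shutdown.
upgrade_new() {
  run helm upgrade "$RELEASE" -f "$VALUES" --version 3.0.0 "$CHART"
}

install_old
# Wait for the cluster to become healthy and follow the steps from
# apache/pulsar#18236 before upgrading.
upgrade_new
```

Running it with `DRY_RUN=0` executes both commands back to back, so in practice the upgrade should be delayed until the brokers are up and the topic state from the linked issue exists.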

Here are the values:

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

## deployed with emptyDir
volumes:
  persistence: true

# disabled AntiAffinity
affinity:
  anti_affinity: false

# disable auto recovery
components:
  autorecovery: false
  pulsar_manager: true

zookeeper:
  replicaCount: 1
  securityContext:
    fsGroup: 0
    fsGroupChangePolicy: "Always"

bookkeeper:
  replicaCount: 1
  securityContext:
    fsGroup: 0
    fsGroupChangePolicy: "Always"

broker:
  replicaCount: 1
  configData:
    ## Enable `autoSkipNonRecoverableData` since bookkeeper is running
    ## without persistence
    autoSkipNonRecoverableData: "true"
    # storage settings
    managedLedgerDefaultEnsembleSize: "1"
    managedLedgerDefaultWriteQuorum: "1"
    managedLedgerDefaultAckQuorum: "1"
    PULSAR_EXTRA_OPTS: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005"

## disable monitoring stack
kube-prometheus-stack:
  enabled: false
  prometheusOperator:
    enabled: false
  grafana:
    enabled: false
  alertmanager:
    enabled: false
  prometheus:
    enabled: false

proxy:
  replicaCount: 1

Expected behavior
The helm upgrade should shut down the broker gracefully. I suspect that the broker did not shut down gracefully because of the code path that was executed: the broker had to load the cursor data from BookKeeper instead of ZooKeeper, as described in apache/pulsar#18237.

The main goal of this issue is to verify what kind of shutdown the broker goes through during a helm upgrade. Perhaps there is an issue with the clean shutdown in Apache Pulsar 2.9.3.
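One rough way to check this, sketched here under assumptions rather than taken from the chart, is to compare how long the broker pod takes to terminate against its `terminationGracePeriodSeconds`. Kubernetes sends SIGTERM first and only sends SIGKILL once the grace period runs out, so a termination that takes the full grace period suggests the broker did not exit on its own. The pod name `test-pulsar-broker-0` and the helper names are hypothetical, and the sketch triggers termination with `kubectl delete pod` rather than a helm upgrade.

```shell
#!/usr/bin/env bash
# Assumed pod name; adjust to your release and namespace.
POD="${POD:-test-pulsar-broker-0}"

# Print the pod's configured termination grace period in seconds.
grace_period() {
  kubectl get pod "$POD" -o jsonpath='{.spec.terminationGracePeriodSeconds}'
}

# Delete the pod and print how many seconds the deletion took.
# kubectl delete waits for the pod to be removed by default.
time_termination() {
  local start end
  start=$(date +%s)
  kubectl delete pod "$POD" >/dev/null
  end=$(date +%s)
  echo $((end - start))
}

# Succeeds when the elapsed time reached the grace period, which hints that
# the container may have been killed instead of exiting cleanly.
hit_grace_period() {
  local elapsed="$1" grace="$2"
  [ "$elapsed" -ge "$grace" ]
}
```

A possible use is `grace=$(grace_period); elapsed=$(time_termination); hit_grace_period "$elapsed" "$grace" && echo "possibly force-killed"`. The timing is coarse (one-second resolution, and it includes API round trips), so the broker log from just before termination is still the more reliable evidence.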

Additional context
I reproduced this issue in minikube and GKE. Both were running Kubernetes 1.23.
