[BUG]: NIMCache creation fails - template references undefined storageClass value #1686

@dafinley

Description

Version

  • nv-ingest chart: 26.1.2 (also verified on main branch)
  • k8s-nim-operator: 3.1.0

Installation method

Kubernetes/Helm (NIM Operator mode)

Describe the bug

The NIMCache templates directly reference .Values.nimOperator.<service>.storage.pvc.storageClass, but this value is not defined in the default values.yaml for several NIM services (including nemotron_parse). When a Helm template references an undefined value, the field renders with no value, which YAML parses as null. The NIM Operator v3.1.0 CRD rejects this because it expects a string.

Root cause analysis:

In helm/templates/nemotron-parse.yaml, the NIMCache template references:

  storage:
    pvc:
      create: {{ .Values.nimOperator.nemotron_parse.storage.pvc.create }}
      storageClass: {{ .Values.nimOperator.nemotron_parse.storage.pvc.storageClass }}
      size: {{ .Values.nimOperator.nemotron_parse.storage.pvc.size }}
      volumeAccessMode: {{ .Values.nimOperator.nemotron_parse.storage.pvc.volumeAccessMode }}

But in helm/values.yaml, the nemotron_parse.storage.pvc section does NOT define storageClass:

  nemotron_parse:
    enabled: false
    storage:
      pvc:
        create: true
        size: "100Gi"
        volumeAccessMode: "ReadWriteOnce"
        # storageClass is NOT defined here!

Note: While nimOperator.nimCache.pvc.storageClass IS defined as "default", the templates do not fall back to this global default - they directly reference the per-service value which doesn't exist.
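For illustration, with the default values shown above the storage block of the NIMCache manifest would presumably render roughly as follows. This is a sketch reconstructed from the template and values quoted in this issue, not output captured from an actual helm template run:

```yaml
# Presumed rendered output (not captured from a real run)
storage:
  pvc:
    create: true
    storageClass:            # nothing rendered here, so YAML reads this as null
    size: 100Gi
    volumeAccessMode: ReadWriteOnce
```

The empty storageClass field is what the CRD validation in the error message below complains about.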

Error message

Error: UPGRADE FAILED: failed to create resource: NIMCache.apps.nvidia.com "nemotron-parse" is invalid: [spec.storage.pvc.storageClass: Invalid value: "null": spec.storage.pvc.storageClass in body must be of type string: "null", <nil>: Invalid value: null: some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]

Steps to reproduce

  1. Deploy nv-ingest with nemotron_parse.enabled: true using the default values (no override for storageClass)
  2. Observe the CRD validation error
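A minimal values override that should trigger the error might look like the sketch below. The file name is hypothetical; it would be passed to helm with -f alongside whatever other settings the deployment already uses:

```yaml
# repro-values.yaml (hypothetical file name)
nimOperator:
  nemotron_parse:
    enabled: true
    # storage.pvc.storageClass is deliberately left unset,
    # so the chart defaults (which omit it) apply
```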

Expected behavior

Either:

  1. The templates should use | default to fall back to the global nimOperator.nimCache.pvc.storageClass value, OR
  2. Each NIM service in values.yaml should define storageClass explicitly

Workaround

Explicitly set storageClass for each enabled NIM service in your values file:

nimOperator:
  nemotron_parse:
    enabled: true
    storage:
      pvc:
        create: true
        storageClass: "standard"  # Must be explicit
        size: "100Gi"
        volumeAccessMode: ReadWriteOnce

Affected NIM services

Checking values.yaml, the following services are missing storageClass in their storage.pvc section:

  • nemotron_parse
  • audio
  • nemotron_nano_12b_v2_vl

Other services (page_elements, graphic_elements, table_structure, embedqa, ocr) also don't define storageClass. Whether they fail the same way or pick up a value from somewhere else has not been verified.

Suggested fix

Option 1: Add default fallback in templates

Update each NIMCache template to fall back to the global default:

storageClass: {{ .Values.nimOperator.nemotron_parse.storage.pvc.storageClass | default .Values.nimOperator.nimCache.pvc.storageClass }}
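Applied to the full storage block from helm/templates/nemotron-parse.yaml, Option 1 could look like the sketch below. Adding quote is an extra suggestion beyond the one-line fix above and has not been tested against the chart:

```yaml
storage:
  pvc:
    create: {{ .Values.nimOperator.nemotron_parse.storage.pvc.create }}
    # Per-service value first, then the global nimCache value.
    # quote makes any non-empty result render as a YAML string.
    storageClass: {{ .Values.nimOperator.nemotron_parse.storage.pvc.storageClass | default .Values.nimOperator.nimCache.pvc.storageClass | quote }}
    size: {{ .Values.nimOperator.nemotron_parse.storage.pvc.size }}
    volumeAccessMode: {{ .Values.nimOperator.nemotron_parse.storage.pvc.volumeAccessMode }}
```

Two caveats: default treats an empty string as unset, so a per-service storageClass of "" would also fall back to the global value. And if the global value is itself unset, the field still renders empty, so the fallback only helps while nimOperator.nimCache.pvc.storageClass stays defined.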

Option 2: Add storageClass to each service in values.yaml

Add storageClass to every NIM service's storage.pvc section:

  nemotron_parse:
    storage:
      pvc:
        create: true
        storageClass: ""  # Empty string to use cluster default, or inherit from nimCache
        size: "100Gi"
        volumeAccessMode: "ReadWriteOnce"

Affected templates

  • helm/templates/nemotron-parse.yaml
  • helm/templates/audio.yaml (if similar structure)
  • Potentially others - recommend auditing all NIMCache templates
