[BUG]: NIMCache creation fails - template references undefined storageClass value #1686
Description
Version
- nv-ingest chart: 26.1.2 (also verified on main branch)
- k8s-nim-operator: 3.1.0
Installation method
Kubernetes/Helm (NIM Operator mode)
Describe the bug
The NIMCache templates directly reference .Values.nimOperator.<service>.storage.pvc.storageClass, but this value is not defined in the default values.yaml for several NIM services (including nemotron_parse). When Helm renders a reference to an undefined value, the storageClass field ends up with no value, which YAML parses as null. The NIM Operator CRD v3.1.0 rejects this because it expects a string type.
Root cause analysis:
In helm/templates/nemotron-parse.yaml, the NIMCache template references:
storage:
  pvc:
    create: {{ .Values.nimOperator.nemotron_parse.storage.pvc.create }}
    storageClass: {{ .Values.nimOperator.nemotron_parse.storage.pvc.storageClass }}
    size: {{ .Values.nimOperator.nemotron_parse.storage.pvc.size }}
    volumeAccessMode: {{ .Values.nimOperator.nemotron_parse.storage.pvc.volumeAccessMode }}
But in helm/values.yaml, the nemotron_parse.storage.pvc section does NOT define storageClass:
nemotron_parse:
  enabled: false
  storage:
    pvc:
      create: true
      size: "100Gi"
      volumeAccessMode: "ReadWriteOnce"
      # storageClass is NOT defined here!
Note: While nimOperator.nimCache.pvc.storageClass IS defined as "default", the templates do not fall back to this global default. They directly reference the per-service value, which doesn't exist.
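For illustration, the fragment below sketches what the rendered NIMCache manifest would look like with the default values. It is reconstructed from the template and values shown above, not captured from a real helm template run, and it assumes Helm's default (non-strict) rendering, where an undefined value renders as an empty string.

```yaml
# Sketch of the rendered output (assumption: default, non-strict Helm rendering)
storage:
  pvc:
    create: true
    storageClass:        # nothing rendered here, so YAML reads the value as null
    size: 100Gi
    volumeAccessMode: ReadWriteOnce
```

The empty storageClass field is what the CRD validation then reports as "null" instead of a string.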
Error message
Error: UPGRADE FAILED: failed to create resource: NIMCache.apps.nvidia.com "nemotron-parse" is invalid: [spec.storage.pvc.storageClass: Invalid value: "null": spec.storage.pvc.storageClass in body must be of type string: "null", <nil>: Invalid value: null: some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]
Steps to reproduce
- Deploy nv-ingest with nemotron_parse.enabled: true using the default values (no override for storageClass)
- Observe the CRD validation error
Expected behavior
Either:
- The templates should use | default to fall back to the global nimOperator.nimCache.pvc.storageClass value, OR
- Each NIM service in values.yaml should define storageClass explicitly
Workaround
Explicitly set storageClass for each enabled NIM service in your values file:
nimOperator:
  nemotron_parse:
    enabled: true
    storage:
      pvc:
        create: true
        storageClass: "standard"  # Must be explicit
        size: "100Gi"
        volumeAccessMode: ReadWriteOnce
Affected NIM services
Checking values.yaml, the following services are missing storageClass in their storage.pvc section:
- nemotron_parse
- audio
- nemotron_nano_12b_v2_vl
Other services (page_elements, graphic_elements, table_structure, embedqa, ocr) also don't define storageClass. Whether they fail the same way or pick up a value some other way still needs verification.
Suggested fix
Option 1: Add default fallback in templates
Update each NIMCache template to fall back to the global default:
storageClass: {{ .Values.nimOperator.nemotron_parse.storage.pvc.storageClass | default .Values.nimOperator.nimCache.pvc.storageClass }}
Option 2: Add storageClass to each service in values.yaml
Add storageClass to every NIM service's storage.pvc section:
nemotron_parse:
  storage:
    pvc:
      create: true
      storageClass: ""  # Empty string to use cluster default, or inherit from nimCache
      size: "100Gi"
      volumeAccessMode: "ReadWriteOnce"
Affected templates
- helm/templates/nemotron-parse.yaml
- helm/templates/audio.yaml (if similar structure)
- Potentially others - recommend auditing all NIMCache templates
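If Option 1 is chosen, the fallback could be centralized in a named template so that each NIMCache template does not repeat the expression. The sketch below is only an illustration: the helper name nv-ingest.nimCacheStorageClass and its placement in helm/templates/_helpers.tpl are assumptions, not part of the chart. It also assumes every service defines a storage.pvc section, as nemotron_parse does.

```yaml
{{/* helm/templates/_helpers.tpl (hypothetical helper; the name is an assumption) */}}
{{/* Resolves the per-service storageClass, falling back to the global nimCache value. */}}
{{- define "nv-ingest.nimCacheStorageClass" -}}
{{- .service.storage.pvc.storageClass | default .root.Values.nimOperator.nimCache.pvc.storageClass -}}
{{- end -}}

{{/* helm/templates/nemotron-parse.yaml: usage inside the NIMCache spec */}}
storage:
  pvc:
    storageClass: {{ include "nv-ingest.nimCacheStorageClass" (dict "service" .Values.nimOperator.nemotron_parse "root" .) | quote }}
```

Because the default function treats an empty string as unset, a per-service storageClass: "" (as in Option 2) would also resolve to the global value with this helper. Piping through quote keeps the rendered field a string.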