Fix for GPU accelerated vector indexing silently falling back to using CPU instead by rahulgoswami · Pull Request #4328 · apache/solr

rahulgoswami · 2026-04-23T22:04:51Z

https://issues.apache.org/jira/browse/SOLR-18210

Description

GPU accelerated vector indexing uses the Lucene99AcceleratedHNSWVectorsFormat from the cuvs-lucene library. It currently falls back to Lucene99HnswVectorsWriter (which runs on the CPU instead) because of a native library loading issue.

Log lines observed:

2026-04-23 06:20:37.554 INFO (qtp858232531-60-null-3) [ x:techproducts t:null-3] o.a.s.c.CuVSCodec Initializing Lucene99AcceleratedHNSWVectorsFormat with parameter values: cuvsWriterThreads 32, cuvsIntGraphDegree 128, cuvsGraphDegree 64, cuvsHnswLayers 1, cuvsHnswM 16, cuvsHnswEfConstruction 100
2026-04-23 06:20:37.556 WARN (qtp858232531-60-null-3) [ x:techproducts t:null-3] c.n.c.l.Utils Exception occurred during creation of cuVS resources. java.lang.NoClassDefFoundError: Could not initialize class com.nvidia.cuvs.spi.CuVSServiceProvider$Holder
2026-04-23 06:20:37.556 INFO (qtp858232531-60-null-3) [ x:techproducts t:null-3] o.a.s.c.CuVSCodec Initializing Lucene99AcceleratedHNSWVectorsFormat with parameter values: cuvsWriterThreads 32, cuvsIntGraphDegree 128, cuvsGraphDegree 64, cuvsHnswLayers 1, cuvsHnswM 16, cuvsHnswEfConstruction 100
2026-04-23 06:20:37.559 WARN (qtp858232531-60-null-3) [ x:techproducts t:null-3] c.n.c.l.Lucene99AcceleratedHNSWVectorsFormat GPU based indexing not supported, falling back to using the Lucene99HnswVectorsWriter
2026-04-23 06:20:38.104 WARN (qtp858232531-60-null-3) [ x:techproducts t:null-3] c.n.c.l.Utils Exception occurred during creation of cuVS resources. java.lang.NoClassDefFoundError: Could not initialize class com.nvidia.cuvs.spi.CuVSServiceProvider$Holder

Solution

Root Cause: Initialization Race Between Two Independent Code Paths

There are two independent code paths that access com.nvidia.cuvs.spi.CuVSServiceProvider$Holder.INSTANCE, but only one of them loads the required libcudart.so native library first. The wrong one wins the race.

Path A: GpuMetricsService (runs FIRST, does NOT load libcudart)

org.apache.solr.cuvs.GpuMetricsService is started on a ScheduledExecutorService during CoreContainer initialization. Its updateGpuMetrics() method directly calls:

GpuMetricsService.updateGpuMetrics() // scheduled executor thread
→ CuVSProvider.provider() // cuvs-java: CuVSProvider.java:159
→ CuVSServiceProvider$Holder.INSTANCE // triggers $Holder.<clinit>

This triggers $Holder class initialization → loadProvider() → builtinProvider(). That chain eventually throws UnsatisfiedLinkError: unresolved symbol: cudaMemcpyAsync, so $Holder initialization fails.

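When a class's static initializer fails, the JVM marks the class as erroneous. Every later access to it then throws NoClassDefFoundError ("Could not initialize class com.nvidia.cuvs.spi.CuVSServiceProvider$Holder"), which matches the WARN lines in the log above.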
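The JVM behavior described above can be reproduced without a GPU. In the sketch below, Holder is a hypothetical stand-in and not the real cuvs-java class. Its static initializer throws the same UnsatisfiedLinkError. UnsatisfiedLinkError is itself an Error, so the JVM rethrows it unwrapped on the first access. Every later access gets NoClassDefFoundError, and the initializer is never run again.

```java
import java.util.List;

public final class PoisonedHolderDemo {
  // Hypothetical stand-in for CuVSServiceProvider$Holder (not the real class):
  // its static initializer fails the way the real one does without libcudart.
  static final class Holder {
    static final Object INSTANCE = loadProvider();

    private static Object loadProvider() {
      throw new UnsatisfiedLinkError("unresolved symbol: cudaMemcpyAsync");
    }
  }

  /** Reads Holder.INSTANCE and reports either the value or the error type. */
  static String touchHolder() {
    try {
      return "ok: " + Holder.INSTANCE;
    } catch (Throwable t) {
      return t.getClass().getSimpleName();
    }
  }

  // Three accesses, evaluated left to right exactly once when this class initializes.
  static final List<String> OUTCOMES = List.of(touchHolder(), touchHolder(), touchHolder());

  public static void main(String[] args) {
    // Prints UnsatisfiedLinkError, then NoClassDefFoundError twice.
    OUTCOMES.forEach(System.out::println);
  }
}
```

Retrying the access does not help. Only a fresh class loader (in practice, a JVM restart) would run the initializer again.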

Path B: Utils.cuVSResourcesOrNull() (runs SECOND, DOES load libcudart)

Lucene99AcceleratedHNSWVectorsFormat class init calls Utils.cuVSResourcesOrNull() (class com.nvidia.cuvs.lucene.Utils):

static CuVSResources cuVSResourcesOrNull() {
    try {
        System.loadLibrary("cudart");            // loads libcudart.so into this classloader
    } catch (UnsatisfiedLinkError e) {
        log.warning("Could not load CUDA runtime library: " + e.getMessage());
    }
    return CuVSResources.create();               // would trigger $Holder.<clinit> successfully
}

This method correctly calls System.loadLibrary("cudart") before touching $Holder. If it ran first, cudaMemcpyAsync would be resolvable via SymbolLookup.loaderLookup() and GPU initialization would succeed. But by the time this path runs, $Holder has already been poisoned by Path A.

As a result, Lucene99AcceleratedHNSWVectorsFormat.supported() returns false and indexing silently falls back to Lucene99HnswVectorsWriter. Indexing still succeeds, but it runs on the CPU. The only sign of the problem is the warning "GPU based indexing not supported, falling back to using the Lucene99HnswVectorsWriter".
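The ordering dependency between the two paths can be sketched in plain Java. This is a hypothetical demo, not Solr or cuvs-java code:

- A boolean flag stands in for "System.loadLibrary("cudart") has already run".
- createProvider stands in for loadProvider()/builtinProvider().
- There are two identical holder classes, one per ordering. A class is initialized at most once per class loader, so each scenario needs its own.

```java
import java.util.List;
import java.util.function.Supplier;

public final class InitRaceDemo {
  // Hypothetical stand-ins for "System.loadLibrary(\"cudart\") has already run".
  static boolean cudartLoaded1;
  static boolean cudartLoaded2;

  // Identical stand-ins for CuVSServiceProvider$Holder, one per scenario.
  static final class Holder1 { static final String INSTANCE = createProvider(cudartLoaded1); }
  static final class Holder2 { static final String INSTANCE = createProvider(cudartLoaded2); }

  // Hypothetical stand-in for loadProvider()/builtinProvider(): needs the runtime.
  static String createProvider(boolean cudartLoaded) {
    if (!cudartLoaded) {
      throw new UnsatisfiedLinkError("unresolved symbol: cudaMemcpyAsync");
    }
    return "gpu";
  }

  /** Performs one holder access and reports the value or the error type. */
  static String touch(Supplier<String> holderAccess) {
    try {
      return holderAccess.get();
    } catch (Throwable t) {
      return t.getClass().getSimpleName();
    }
  }

  // Observed order: Path A (metrics) touches the holder before anything loads cudart.
  static List<String> pathAThenB() {
    String a = touch(() -> Holder1.INSTANCE); // Path A: no library load
    cudartLoaded1 = true;                     // Path B loads cudart...
    String b = touch(() -> Holder1.INSTANCE); // ...but the class is already erroneous
    return List.of(a, b);
  }

  // Intended order: Path B loads cudart before the first holder access.
  static List<String> pathBThenA() {
    cudartLoaded2 = true;
    String b = touch(() -> Holder2.INSTANCE);
    String a = touch(() -> Holder2.INSTANCE); // Path A now reuses the good instance
    return List.of(b, a);
  }

  static final List<String> A_THEN_B = pathAThenB();
  static final List<String> B_THEN_A = pathBThenA();

  public static void main(String[] args) {
    System.out.println("A then B: " + A_THEN_B); // [UnsatisfiedLinkError, NoClassDefFoundError]
    System.out.println("B then A: " + B_THEN_A); // [gpu, gpu]
  }
}
```

The outcome depends only on which path touches the holder first. Loading the library later, even in the correct path, cannot repair an already failed class initialization.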

Fix:
Load the CUDA runtime library (libcudart) when GpuMetricsService initializes, so that it is already loaded before Path A first calls CuVSProvider.provider().
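One way to make "load libcudart before the first provider access" safe to call from several places is an idempotent loader. The class below is a hypothetical sketch and not the code in this PR: NativeRuntimeLoader, ensureCudartLoaded and LOAD_ATTEMPTS are illustrative names.

It assumes the loader is defined in the same class loader as cuvs-java (for example, both jars in WEB-INF/lib). SymbolLookup.loaderLookup() only sees libraries loaded via System.loadLibrary from classes in the caller's class loader.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical helper (not part of Solr or cuvs-java): loads libcudart at most
 * once and does not throw on a missing library, so it can be called from every
 * code path that is about to touch CuVSProvider.provider().
 */
public final class NativeRuntimeLoader {
  private static final Logger log = Logger.getLogger(NativeRuntimeLoader.class.getName());

  // Counts real System.loadLibrary attempts; exposed only for illustration/testing.
  static final AtomicInteger LOAD_ATTEMPTS = new AtomicInteger();

  private NativeRuntimeLoader() {}

  // Initialization-on-demand holder: the JVM runs this static initializer exactly
  // once, even under concurrent first calls. Because tryLoad() catches
  // UnsatisfiedLinkError, a missing library does not poison this class.
  private static final class CudartHolder {
    static final boolean LOADED = tryLoad("cudart");
  }

  private static boolean tryLoad(String libName) {
    LOAD_ATTEMPTS.incrementAndGet();
    try {
      // Associates the library with this class's class loader.
      System.loadLibrary(libName);
      return true;
    } catch (UnsatisfiedLinkError e) {
      // Pass the exception object itself so the stack trace is kept.
      log.log(Level.WARNING, "Could not load CUDA runtime library '" + libName + "'", e);
      return false;
    }
  }

  /** Returns whether libcudart is loaded; only the first call attempts the load. */
  public static boolean ensureCudartLoaded() {
    return CudartHolder.LOADED;
  }
}
```

Under these assumptions, a caller would invoke NativeRuntimeLoader.ensureCudartLoaded() before its first CuVSProvider.provider() call, for example at the start of updateGpuMetrics(). On a CPU-only machine it logs one warning and returns false instead of failing.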

Tests

Built solr-cuvs.jar locally and placed it in WEB-INF/lib of the Solr web app. Then ran vector indexing on an L40S GPU machine using the configuration described in https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html#gpu-acceleration.
The log then prints "cuVS is supported so using the Lucene99AcceleratedHNSWVectorsWriter", which comes from cuvs-lucene's Lucene99AcceleratedHNSWVectorsFormat.

rahulgoswami · 2026-04-27T03:38:37Z

@chatman Requesting a review please. I'd add tests, but since they would run on a CPU machine anyway, they would still only produce the "GPU based indexing not supported, falling back to using the Lucene99HnswVectorsWriter" log line.

I'm not sure whether we have any GPU instances provisioned for tests. FWIW, I have tested the patch on a GPU machine and it works.

rahulgoswami · 2026-04-30T15:51:26Z

Hi @chatman Circling back on this in case you got a chance to take a look? If it looks good, I'd like to go ahead and merge.

fix CUDA runtime not being available upon first call

6f9ed1c

rahulgoswami changed the title ~~Fix for GPU vector indexing silently falling back to using CPU instead~~ Fix for GPU accelerated vector indexing silently falling back to using CPU instead Apr 23, 2026

rahulgoswami added 2 commits April 23, 2026 18:55

changelog

fa6d1a3

changelog edit

a5ba39d

rahulgoswami requested a review from chatman April 27, 2026 03:36

rahulgoswami added 2 commits April 27, 2026 00:29

use exception object in log call instead of e.getMessage

cf7c3b0

tidy

a4b3ef3


rahulgoswami wants to merge 5 commits into apache:main from Commvault:fix-GPU-vector-indexing
