Conversation

@KrishnanPrash (Contributor) commented Nov 12, 2025

Overview:

With #3988, we have functional image decoding in the frontend for any base64 or HTTP URL passed with an inference request. This PR builds on top of #3988 and implements the NIXL read() portion of the image-decoding workflow for the backend.

Details:

See handlers.py for the additions to the DECODED workflow.

milesial and others added 8 commits November 10, 2025 14:18
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
@KrishnanPrash KrishnanPrash requested review from a team as code owners November 12, 2025 21:06
@KrishnanPrash KrishnanPrash marked this pull request as draft November 12, 2025 21:06
@github-actions github-actions bot added the feat label Nov 12, 2025
@KrishnanPrash KrishnanPrash reopened this Nov 13, 2025
…h/vllm-nixl-read

Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Comment on lines +127 to +166
async def _read_decoded_image_via_nixl(
    self, decoded_meta: Dict[str, Any]
) -> PIL.Image.Image:
    """Read decoded image via NIXL RDMA and convert to PIL.Image."""
    # Lazy-init connector
    if self._connector is None:
        self._connector = connect.Connector()
        await self._connector.initialize()
        logger.info("NIXL connector initialized for decoded media")

    # Extract fields
    meta_str = decoded_meta["nixl_metadata"]
    desc = decoded_meta["nixl_descriptor"]
    shape = decoded_meta["shape"]

    # Create tensor to receive RDMA data
    tensor = torch.empty(shape, dtype=torch.uint8)

    # Build RdmaMetadata from frontend-provided descriptor
    # Frontend sends compressed metadata (matches Python nixl_connect)
    rdma_meta = RdmaMetadata(
        descriptors=[
            SerializedDescriptor(
                device="cpu"
                if desc.get("mem_type") == "Dram"
                else f"cuda:{desc.get('device_id', 0)}",
                ptr=desc["addr"],
                size=desc["size"],
            )
        ],
        nixl_metadata=meta_str,
        notification_key=f"img-{shape}",
        operation_kind=int(OperationKind.READ),
    )

    # RDMA read
    read_op = await self._connector.begin_read(
        rdma_meta, connect.Descriptor(tensor)
    )
    await read_op.wait_for_completion()
@KrishnanPrash (Contributor, Author):
Not a NIXL expert, so please let me know if anything here could be done better.
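
For readers skimming the snippet above: the mem_type-to-device mapping it performs can be summarized as a standalone helper. This is an illustrative sketch, not code from the PR; the "Vram" value is assumed for the non-DRAM case, since the snippet only checks for "Dram" explicitly.

```python
def descriptor_device(desc: dict) -> str:
    """Mirror of the device selection above: host DRAM maps to "cpu",
    anything else to the CUDA device named by device_id (default 0)."""
    if desc.get("mem_type") == "Dram":
        return "cpu"
    return f"cuda:{desc.get('device_id', 0)}"


print(descriptor_device({"mem_type": "Dram"}))                  # -> cpu
print(descriptor_device({"mem_type": "Vram", "device_id": 1}))  # -> cuda:1
```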

Comment on lines +125 to +129
// Compress metadata before base64 encoding (matches Python nixl_connect behavior)
// Backend expects: b64:<base64_of_compressed_bytes>
let mut encoder = ZlibEncoder::new(Vec::new(), Compression::new(6));
encoder.write_all(&nixl_md)?;
let compressed = encoder.finish()?;
@KrishnanPrash (Contributor, Author) Nov 13, 2025:
Once again, welcome any suggestions on correct nixl usage.

@KrishnanPrash (Contributor, Author) commented Nov 13, 2025

Open Question for Testing:

Ideally, we would like to cover both test cases:

  1. Frontend URL pass-through + backend decoding: this requires building without NIXL.
  2. Frontend decoding + backend NIXL read: this requires building dynamo with maturin develop --features media-nixl

Based on my conversation with @nv-tusharma, IIUC they suggested creating a separate workflow outside .github/workflows/container-backends-validation.yaml that would run as a non-blocking test in our current CI.

@KrishnanPrash KrishnanPrash marked this pull request as ready for review November 13, 2025 23:54
@KrishnanPrash KrishnanPrash changed the title feat: Adding nixl read support for decoded path feat: Adding nixl read() multimodal support for vLLM backend Nov 13, 2025
Comment on lines +182 to +183
1. Url: Frontend passes URL, backend decodes
2. Decoded: Frontend decoded, NIXL RDMA transfer
Contributor:
How does a user control which one happens (1) or (2)?

@KrishnanPrash (Contributor, Author) Nov 14, 2025:
Currently, the only way to opt in or out depends on which flags are included at build time (--features media-nixl). Do you have a better workflow in mind? Might be worth mentioning on #3988 as well.

Contributor:
I think it would be worthwhile to have an argument at startup time for this, which would be provided to the frontend and workers.

Contributor:
After reading @milesial 's doc here: https://github.com/ai-dynamo/dynamo/pull/3988/files?short_path=c817023#diff-c817023e4a07199f620dfc8dbf04021b0edc558d6b30b7e8bbb089615dc040ec

It sounds to me like passing media_decoder and media_fetcher to register_llm enables the feature / hints the frontend to do the decoding if available. Please read up on that part and see if that approach makes sense to you or not @indrajit96
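
If that reading is right, the opt-in reduces to a simple dispatch at request time: decode in the frontend when decoder/fetcher handles are registered, otherwise fall through to URL pass-through. A minimal illustrative sketch (the function and parameter names here are hypothetical, not the dynamo API):

```python
def choose_media_path(media_decoder=None, media_fetcher=None) -> str:
    """Hypothetical dispatch: path (2) Decoded + NIXL when the frontend has
    a decoder and fetcher registered, otherwise path (1) URL pass-through."""
    if media_decoder is not None and media_fetcher is not None:
        return "decoded"  # frontend decodes, backend does a NIXL read
    return "url"  # backend fetches and decodes the URL itself


print(choose_media_path())  # -> url
print(choose_media_path(media_decoder=object(), media_fetcher=object()))  # -> decoded
```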


let b64_encoded = general_purpose::STANDARD.encode(&nixl_md);
// Compress metadata before base64 encoding (matches Python nixl_connect behavior)
// Backend expects: b64:<base64_of_compressed_bytes>
let mut encoder = ZlibEncoder::new(Vec::new(), Compression::new(6));
Contributor:
Nit: rename encoder to zlib_encoder; encoder on its own can be confused with the Encoder from E->P->D.

let b64_encoded = general_purpose::STANDARD.encode(&nixl_md);
// Compress metadata before base64 encoding (matches Python nixl_connect behavior)
// Backend expects: b64:<base64_of_compressed_bytes>
let mut encoder = ZlibEncoder::new(Vec::new(), Compression::new(6));
Contributor:

Just for my understanding: I'm curious why you don't need to decompress on the worker?
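
For context on the question: the Rust side zlib-compresses the NIXL agent metadata before base64-encoding it, so some layer on the worker must apply the inverse before handing the bytes to NIXL. The PR notes this "matches Python nixl_connect behavior", so the decompression presumably happens inside that library rather than in handlers.py. A self-contained sketch of the symmetric round-trip, with the b64: framing taken from the comment in the diff (the function names are illustrative, not from the PR):

```python
import base64
import zlib


def encode_nixl_metadata(raw: bytes) -> str:
    """Frontend side: zlib-compress (level 6, as in the Rust snippet), then base64."""
    compressed = zlib.compress(raw, level=6)
    return "b64:" + base64.b64encode(compressed).decode("ascii")


def decode_nixl_metadata(encoded: str) -> bytes:
    """Worker side: strip the framing prefix, base64-decode, then decompress."""
    payload = base64.b64decode(encoded.removeprefix("b64:"))
    return zlib.decompress(payload)


raw = b"example NIXL agent metadata"
assert decode_nixl_metadata(encode_nixl_metadata(raw)) == raw
```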

@rmccorm4 rmccorm4 requested a review from ayushag-nv November 14, 2025 18:57