feat: Adding nixl read() multimodal support for vLLM backend #4271
base: alexandrem/frontend-image-decoding-nixl
Conversation
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
```python
async def _read_decoded_image_via_nixl(
    self, decoded_meta: Dict[str, Any]
) -> PIL.Image.Image:
    """Read decoded image via NIXL RDMA and convert to PIL.Image."""
    # Lazy-init connector
    if self._connector is None:
        self._connector = connect.Connector()
        await self._connector.initialize()
        logger.info("NIXL connector initialized for decoded media")

    # Extract fields
    meta_str = decoded_meta["nixl_metadata"]
    desc = decoded_meta["nixl_descriptor"]
    shape = decoded_meta["shape"]

    # Create tensor to receive RDMA data
    tensor = torch.empty(shape, dtype=torch.uint8)

    # Build RdmaMetadata from frontend-provided descriptor
    # Frontend sends compressed metadata (matches Python nixl_connect)
    rdma_meta = RdmaMetadata(
        descriptors=[
            SerializedDescriptor(
                device="cpu"
                if desc.get("mem_type") == "Dram"
                else f"cuda:{desc.get('device_id', 0)}",
                ptr=desc["addr"],
                size=desc["size"],
            )
        ],
        nixl_metadata=meta_str,
        notification_key=f"img-{shape}",
        operation_kind=int(OperationKind.READ),
    )

    # RDMA read
    read_op = await self._connector.begin_read(
        rdma_meta, connect.Descriptor(tensor)
    )
    await read_op.wait_for_completion()
```
Not a NIXL expert, so please let me know if I can be doing anything here better.
```rust
// Compress metadata before base64 encoding (matches Python nixl_connect behavior)
// Backend expects: b64:<base64_of_compressed_bytes>
let mut encoder = ZlibEncoder::new(Vec::new(), Compression::new(6));
encoder.write_all(&nixl_md)?;
let compressed = encoder.finish()?;
```
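For reference, the frontend's compress-then-encode step can be mirrored in Python with only the standard library. This is a sketch, not code from the PR: the function name encode_nixl_metadata is made up here, and level 6 is assumed to match Compression::new(6) in the Rust snippet.

```python
import base64
import zlib


def encode_nixl_metadata(nixl_md: bytes) -> str:
    # Compress with zlib at level 6 (same level as the Rust ZlibEncoder),
    # then base64-encode and add the "b64:" prefix the backend expects.
    compressed = zlib.compress(nixl_md, level=6)
    return "b64:" + base64.b64encode(compressed).decode("ascii")
```

The round trip is symmetric: stripping the prefix, base64-decoding, and zlib-decompressing recovers the original metadata bytes.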
Once again, welcome any suggestions on correct nixl usage.
Open Question for Testing: Ideally, we would like to test both cases. Based on my conversation with @nv-tusharma, IIUC they suggested creating a separate workflow outside
1. Url: Frontend passes URL, backend decodes
2. Decoded: Frontend decoded, NIXL RDMA transfer
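A minimal sketch of how a backend could distinguish the two paths at request time. The function name and dict keys here are illustrative, not the PR's actual handler code; the "nixl_metadata" key is taken from the Python handler above.

```python
from typing import Any, Dict


def media_path(item: Dict[str, Any]) -> str:
    """Pick the workflow for one media item (illustrative keys)."""
    if "nixl_metadata" in item:
        # (2) Decoded: the frontend already decoded the image;
        # the backend RDMA-reads the raw tensor via NIXL.
        return "decoded"
    # (1) Url: the frontend forwarded the URL; the backend fetches and decodes.
    return "url"
```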
How does a user control which one happens (1) or (2)?
Currently, opting in or out is determined by which flags are included at build time (--features media-nixl). Do you have a better workflow in mind? Might be worth mentioning on #3988 as well.
I think it would be worthwhile to have an argument at startup time for this, which would be provided to the frontend and workers.
After reading @milesial's doc here: https://github.com/ai-dynamo/dynamo/pull/3988/files?short_path=c817023#diff-c817023e4a07199f620dfc8dbf04021b0edc558d6b30b7e8bbb089615dc040ec
It sounds to me like passing media_decoder and media_fetcher to register_llm enables the feature / hints the frontend to do the decoding when available. Please read up on that part and see whether that approach makes sense to you, @indrajit96.
```rust
let b64_encoded = general_purpose::STANDARD.encode(&nixl_md);
// Compress metadata before base64 encoding (matches Python nixl_connect behavior)
// Backend expects: b64:<base64_of_compressed_bytes>
let mut encoder = ZlibEncoder::new(Vec::new(), Compression::new(6));
```
NIT: rename encoder to zlib_encoder; a bare "encoder" could be confused with the Encoder in E->P->D.
```rust
let b64_encoded = general_purpose::STANDARD.encode(&nixl_md);
// Compress metadata before base64 encoding (matches Python nixl_connect behavior)
// Backend expects: b64:<base64_of_compressed_bytes>
let mut encoder = ZlibEncoder::new(Vec::new(), Compression::new(6));
```
Just for my understanding: I'm curious, don't you need to decompress this on the worker?
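If decompression is needed on the worker, it would be the inverse of the frontend step. A stdlib sketch, assuming the b64:<base64_of_compressed_bytes> format described in the Rust comments; decode_nixl_metadata is a hypothetical helper, not a function from this PR.

```python
import base64
import zlib


def decode_nixl_metadata(meta_str: str) -> bytes:
    # Inverse of the frontend: strip the "b64:" prefix, base64-decode,
    # then zlib-decompress to recover the raw NIXL agent metadata.
    if not meta_str.startswith("b64:"):
        raise ValueError("expected 'b64:' prefix")
    return zlib.decompress(base64.b64decode(meta_str[len("b64:"):]))
```

If the NIXL connector already accepts the compressed form (as the Python nixl_connect comment suggests), this step may be handled inside the library rather than in handler code.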
Overview:
With #3988, we have functional image decoding in the frontend for any b64 or HTTP URLs passed with the inference request. This PR builds on top of #3988 and implements the nixl read() portion of the image decoding workflow for the backend.

Details:
Look at handlers.py for the additions to the DECODED workflow.