Hello,
I'd like to express my appreciation for the xmanager tool! However, I've noticed a few issues with how the A100 GPU and its associated machine types are specified for the Vertex API, which I'd like to bring to your attention:
- GPU Naming Discrepancy:
According to the Google Cloud resource documentation, the correct name for the A100 GPU with 80 GB of memory is A100_80GB, not A100_80GIB. This naming inconsistency leads to an error when requesting this resource. Reference: Google Cloud Documentation. Additionally, I've attached an image from the documentation.

- Incorrect API Call Formation:
When A100_80GIB is referenced in the Vertex API, the resulting accelerator type string is 'NVIDIA_TESLA_A100_80GIB', whereas it should be 'NVIDIA_A100_80GB'. I believe this error stems from the line accelerator_type = 'NVIDIA_TESLA_' + str(resource).upper() in the vertex.py script.
- Machine Type Mismatch:
The A100_80GB GPU should be associated with machine types such as 'a2-ultragpu-1g', 'a2-ultragpu-2g', 'a2-ultragpu-4g', and 'a2-ultragpu-8g'. However, the current specification only attempts to map A100 GPUs to the following machine types:
_A100_GPUS_TO_MACHINE_TYPE = {
    1: 'a2-highgpu-1g',
    2: 'a2-highgpu-2g',
    4: 'a2-highgpu-4g',
    8: 'a2-highgpu-8g',
    16: 'a2-megagpu-16g',
}
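To illustrate the three points above, here is a minimal sketch of one possible fix. It is not xmanager's actual code: the helper names `vertex_accelerator_type` and `a100_machine_type`, the lookup table `_NON_TESLA_ACCELERATORS`, and the second mapping `_A100_80GB_GPUS_TO_MACHINE_TYPE` are hypothetical, and the sketch assumes the resource is renamed to A100_80GB and passed in as a plain string.

```python
# Hypothetical sketch; names other than _A100_GPUS_TO_MACHINE_TYPE are assumptions.

# Existing mapping quoted in this issue (40 GB A100).
_A100_GPUS_TO_MACHINE_TYPE = {
    1: 'a2-highgpu-1g',
    2: 'a2-highgpu-2g',
    4: 'a2-highgpu-4g',
    8: 'a2-highgpu-8g',
    16: 'a2-megagpu-16g',
}

# Proposed mapping for the 80 GB A100, using the machine types listed above.
_A100_80GB_GPUS_TO_MACHINE_TYPE = {
    1: 'a2-ultragpu-1g',
    2: 'a2-ultragpu-2g',
    4: 'a2-ultragpu-4g',
    8: 'a2-ultragpu-8g',
}

# Accelerators whose Vertex name does not use the 'NVIDIA_TESLA_' prefix.
_NON_TESLA_ACCELERATORS = {
    'A100_80GB': 'NVIDIA_A100_80GB',
}


def vertex_accelerator_type(resource_name: str) -> str:
    """Returns the Vertex accelerator type string for a GPU resource name."""
    name = resource_name.upper()
    # Fall back to the current prefixing behaviour for all other GPUs.
    return _NON_TESLA_ACCELERATORS.get(name, 'NVIDIA_TESLA_' + name)


def a100_machine_type(resource_name: str, count: int) -> str:
    """Returns the A2 machine type for the given A100 variant and GPU count."""
    if resource_name.upper() == 'A100_80GB':
        table = _A100_80GB_GPUS_TO_MACHINE_TYPE
    else:
        table = _A100_GPUS_TO_MACHINE_TYPE
    if count not in table:
        raise ValueError(
            f'Unsupported GPU count {count} for {resource_name}; '
            f'expected one of {sorted(table)}.')
    return table[count]


print(vertex_accelerator_type('A100_80GB'))
print(a100_machine_type('A100_80GB', 4))
```

Keeping the exceptions in a small lookup table leaves the existing 'NVIDIA_TESLA_' prefixing untouched for every other GPU, so only the 80 GB A100 takes the new path.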
Thank you for your attention to this matter.