Incorrect GPU Specification and Machine Type Mapping for A100 in Vertex API

Hello,

I'd like to express my appreciation for the xmanager tool! However, I've noticed three issues with how the A100 80GB GPU and its associated machine types are specified for the Vertex API, which I'd like to bring to your attention:

1. **GPU Naming Discrepancy:** 
According to the Google Cloud resource documentation, the correct name for the 80GB A100 GPU is `A100_80GB`, not `A100_80GIB`. This naming inconsistency leads to an error when requesting this resource. Reference: [Google Cloud Documentation](https://cloud.google.com/vertex-ai/docs/training/configure-compute#create_custom_job_gpus-gcloud). I've also attached a screenshot from the documentation.
![Documentation Screenshot](https://github.com/google-deepmind/xmanager/assets/26931037/87f90633-208e-4b1e-a2b9-32f2c92e8773) 

2. **Incorrect API Call Formation:** 
When `A100_80GIB` is requested, the accelerator type sent to the Vertex API becomes `NVIDIA_TESLA_A100_80GIB`, whereas it should be `NVIDIA_A100_80GB`. I believe this error stems from the line `accelerator_type = 'NVIDIA_TESLA_' + str(resource).upper()` in the [vertex.py script](https://github.com/google-deepmind/xmanager/blob/main/xmanager/cloud/vertex.py#L299C28-L299C28), which prepends `NVIDIA_TESLA_` to every resource name.

3. **Machine Type Mismatch:** 
The A100_80GB GPU should be associated with machine types such as `'a2-ultragpu-1g'`, `'a2-ultragpu-2g'`, `'a2-ultragpu-4g'`, and `'a2-ultragpu-8g'`. However, the current specification only attempts to map A100 GPUs to the following machine types:
```python
_A100_GPUS_TO_MACHINE_TYPE = {
    1: 'a2-highgpu-1g',
    2: 'a2-highgpu-2g',
    4: 'a2-highgpu-4g',
    8: 'a2-highgpu-8g',
    16: 'a2-megagpu-16g',
}
```
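To make points 2 and 3 concrete, here is a rough sketch of one way the lookup could be structured. It is not the actual xmanager code: the helper names `accelerator_type_for` and `machine_type_for`, the override table, and the 80GB mapping are hypothetical, and the resource is passed as a plain string purely for illustration.

```python
# Illustrative sketch only; the helpers and the 80GB tables below are
# hypothetical and are not part of xmanager.

# Resources whose Vertex AI accelerator type does not follow the
# 'NVIDIA_TESLA_' + NAME pattern.
_ACCELERATOR_TYPE_OVERRIDES = {
    'A100_80GIB': 'NVIDIA_A100_80GB',
}

# Existing mapping for the 40GB A100 (copied from the snippet above).
_A100_GPUS_TO_MACHINE_TYPE = {
    1: 'a2-highgpu-1g',
    2: 'a2-highgpu-2g',
    4: 'a2-highgpu-4g',
    8: 'a2-highgpu-8g',
    16: 'a2-megagpu-16g',
}

# Assumed mapping for the 80GB A100, using the a2-ultragpu machine types.
_A100_80GIB_GPUS_TO_MACHINE_TYPE = {
    1: 'a2-ultragpu-1g',
    2: 'a2-ultragpu-2g',
    4: 'a2-ultragpu-4g',
    8: 'a2-ultragpu-8g',
}

_MACHINE_TYPE_TABLES = {
    'A100': _A100_GPUS_TO_MACHINE_TYPE,
    'A100_80GIB': _A100_80GIB_GPUS_TO_MACHINE_TYPE,
}


def accelerator_type_for(resource_name: str) -> str:
    """Returns the accelerator type string, honoring special-case names."""
    name = resource_name.upper()
    return _ACCELERATOR_TYPE_OVERRIDES.get(name, 'NVIDIA_TESLA_' + name)


def machine_type_for(resource_name: str, gpu_count: int) -> str:
    """Returns the machine type for a given A100 variant and GPU count."""
    name = resource_name.upper()
    table = _MACHINE_TYPE_TABLES.get(name)
    if table is None:
        raise ValueError(f'No machine type table for {name}.')
    if gpu_count not in table:
        raise ValueError(f'{gpu_count} x {name} is not a supported shape.')
    return table[gpu_count]
```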
Thank you for your attention to this matter.

Incorrect GPU Specification and Machine Type Mapping for A100 in Vertex API #37
