Support for L4 GPUs #57

@hartikainen

I see references to L4 GPUs, e.g. here:

L4_24TH = 68

This doesn't seem to work, though, because it hasn't been implemented. A simple test case in xmanager/cloud/vertex_test.py could look something like:

  def test_get_machine_spec_l4(self):
    job = xm.Job(
        executable=local_executables.GoogleContainerRegistryImage('name', ''),
        executor=local_executors.Vertex(
            requirements=xm.JobRequirements(L4_24TH=2)
        ),
        args={},
    )
    machine_spec = vertex.get_machine_spec(job)
    self.assertDictEqual(
        machine_spec,
        {
            'machine_type': 'g2-standard-4',
            'accelerator_type': vertex.aip_v1.AcceleratorType.NVIDIA_L4,
            'accelerator_count': 2,
        },
    )

This gives an error:

python -m xmanager.cloud.vertex_test
..........E.Creating CustomJob
CustomJob created. Resource name: <MagicMock name='WrappedClient().create_custom_job().name' id='4726134768'>
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('<MagicMock name='WrappedClient().create_custom_job().name' id='4726134768'>')
View Custom Job:
<MagicMock name='_dashboard_uri()' id='4726135440'>
Job launched at: <MagicMock name='_dashboard_uri()' id='4726135440'>
.
======================================================================
ERROR: test_get_machine_spec_l4 (__main__.VertexTest.test_get_machine_spec_l4)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/xmanager/main/xmanager/cloud/vertex_test.py", line 166, in test_get_machine_spec_l4
    machine_spec = vertex.get_machine_spec(job)
  File "/Users/user/xmanager/main/xmanager/cloud/vertex.py", line 305, in get_machine_spec
    spec['accelerator_type'] = aip_v1.AcceleratorType[accelerator_type]
                               ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.13/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/enum.py", line 791, in __getitem__
    return cls._member_map_[name]
           ~~~~~~~~~~~~~~~~^^^^^^
KeyError: 'NVIDIA_TESLA_L4_24TH'

----------------------------------------------------------------------
Ran 13 tests in 0.003s

FAILED (errors=1)
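
Judging from the traceback, the enum member name seems to be built by prefixing the resource name with NVIDIA_TESLA_, whereas the Vertex enum member used in the test above is NVIDIA_L4. Here is a minimal, self-contained sketch of one possible fix. It is not the actual xmanager code: AcceleratorType below is a stand-in enum with arbitrary values, and _GPU_NAME_OVERRIDES and accelerator_name are hypothetical names I made up for illustration.

```python
import enum


class AcceleratorType(enum.Enum):
  """Stand-in for aip_v1.AcceleratorType (values here are arbitrary)."""
  NVIDIA_TESLA_V100 = 1
  NVIDIA_TESLA_T4 = 2
  NVIDIA_L4 = 3


# Hypothetical override table: resource name -> enum member name, for GPUs
# whose Vertex name does not follow the NVIDIA_TESLA_<NAME> pattern.
_GPU_NAME_OVERRIDES = {'L4_24TH': 'NVIDIA_L4'}


def accelerator_name(resource_name: str) -> str:
  """Returns the enum member name, consulting the overrides first."""
  return _GPU_NAME_OVERRIDES.get(
      resource_name, 'NVIDIA_TESLA_' + resource_name
  )


def lookup(resource_name: str) -> AcceleratorType:
  """Looks up the stand-in enum member for a resource name."""
  return AcceleratorType[accelerator_name(resource_name)]


# The naive prefixing reproduces the KeyError from the traceback.
try:
  AcceleratorType['NVIDIA_TESLA_' + 'L4_24TH']
  naive_failed = False
except KeyError:
  naive_failed = True
```

With the override in place, lookup('L4_24TH') resolves to the NVIDIA_L4 member, and names like 'V100' still go through the existing prefix path. Whether an override table or a different naming scheme fits the codebase better is up to the maintainers.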
