Support GCP spot machines #905

@williamflynt

What would you like to be added:

Google Cloud recently rolled out Spot VMs. Unlike the existing preemptible machine type, which is always terminated after at most 24 hours, a Spot VM has no maximum runtime, so its lifetime is notionally unlimited. Pricing for Spot VMs is the same as for preemptible VMs, typically about 30% of the cost of a standard VM.

Why is this needed:

Imagine running a cluster on relatively expensive machines, like Tau T2D instances. Your pods are part of a big data serving platform with automatic shard management - great! That means you can use ephemeral VMs, because the cluster will automatically reflow data away from non-operational nodes and redistribute it when new nodes come up.

Of course, this reshuffle takes time, and loading the indices into memory takes more time on top of that. The tradeoff is that we can run at a fraction of the cost!

Overall, this use case is well served by Spot instances and poorly served by preemptible instances. With preemptible VMs, the data reshuffle and index build/load consume ~15% of each VM's lifetime. With Spot VMs it is more like 4% in practice. The overall improvement in query latency is even larger than that difference suggests.

Extra info (e.g. existing slack convo link):

The SPOT provisioning_model is supported as a beta feature in version 4.23 of the Terraform Google provider. The provider documentation describes the field as follows:

(Optional, Beta) Describe the type of preemptible VM. This field accepts the value STANDARD or SPOT. If the value is STANDARD, there will be no discount. If this is set to SPOT, preemptible should be true and auto_restart should be false.

https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#provisioning_model
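For reference, the settings described above could look roughly like this on a plain google_compute_instance. This is only a sketch, not how this project would necessarily wire it up: the resource name, instance name, machine type, zone, image, and network are placeholder assumptions. Note that the provider argument is spelled automatic_restart, even though the quoted description says auto_restart.

```hcl
# Illustrative only: all names and values outside the scheduling block
# are hypothetical placeholders.
resource "google_compute_instance" "spot_worker" {
  # provisioning_model was a beta field at the time of writing,
  # so this assumes the google-beta provider is configured.
  provider     = google-beta
  name         = "spot-worker"
  machine_type = "t2d-standard-4"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }

  scheduling {
    # SPOT requires preemptible = true and automatic_restart = false.
    provisioning_model = "SPOT"
    preemptible        = true
    automatic_restart  = false
  }
}
```

Exposing a single variable that switches provisioning_model between STANDARD and SPOT, and derives the other two scheduling fields from it, would probably be the least intrusive way to surface this.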

Slack link
