feat(BRE2-661): Add Support for San Francisco Compute#87
Conversation
* add scaffolding, and client authentication.
* return APITypeGlobal from GetAPIType function, as SFC accounts are not tied to specific regions.
* fix apiKey in SFCCredential struct
* scaffolding for instance.go
* add instance creation implementation with SSH key support
* add function to map the status of a node reported from SFC API to v1.LifecycleStatus in Brev
* implement GetInstance in sfcompute with node data retrieval inluding SSH Hostname / Public IP
* add TerminateInstance implementation with node release and delete logic
* set default SSH port to 2222, as is standard for our platform
* implement GetSSHHostname for retrieving the SSH hostname of an instance in sfcompute
* remove unneeded call to api for GetSSHHostname
* use VM ID instead of instance ID to retrieve SSH Hostname
* remove get ssh hostname function
* implement ListInstances
* add validation test for sfcompute with API key check and skip logic
* add getInstanceTypeID method for generating instance type IDs in sfcompute
* bump sfcnodes version to v0.1.0-alpha.4 which adds support for the /v0/zones endpoint
* implement GetLocations
* only return approved zones
* only return regions that have more than zero capacity instead of any zones that have capacity not equal to zero, in case the availability ever returns a negative number
* update location description to include formatted hardware type information. example: `sfc_hayesvalley_h100`
* return unavailable regions with v1.Location{Available: false}
* fix an error where a nil map was returned
* start implementing GetInstanceTypes
* fix tests failing due to ValidateRegionalInstanceTypes and ValidateStableInstanceTypeIDs fails errors
* set MaxPricePerNodePerHour to 1600 ($16/node/h, $2/gpu/h)
* add regions excelsior and yerba
* remove region excelsior
894ca32 to
29104ff
Compare
| t.Fatalf("failed to make client: %v", err) | ||
| } | ||
|
|
||
| instance, err := client.GetInstance(context.Background(), "6c7a3ade-1e59-4e04-af6e-365046995a81_test") |
There was a problem hiding this comment.
what's with the hardcoded id?
There was a problem hiding this comment.
Ah these are basically test tasks, so this was left over from testing.
| ) | ||
|
|
||
| const ( | ||
| maxPricePerNodeHour = 1600 |
| } | ||
|
|
||
| func sshKeyCloudInit(sshKey string) string { | ||
| script := fmt.Sprintf("#cloud-config\nssh_authorized_keys:\n - %s", sshKey) |
There was a problem hiding this comment.
do we want to double check for ufw, or does sfcompute have firewalls external?
There was a problem hiding this comment.
Yeah SFC essentially only has port 2222 open at an infrastructural level.
v1/providers/sfcompute/instance.go
Outdated
|
|
||
| func sfcStatusToLifecycleStatus(status string) v1.LifecycleStatus { | ||
| switch strings.ToLower(status) { | ||
| case "pending", "nodefailure", "unspecified", "awaitingcapacity", "unknown", "failed": |
There was a problem hiding this comment.
failed/nodefailure will sit in pending and then our 30 minute timeout will hit and turn these to failed, right?
There was a problem hiding this comment.
Ah good catch I will adjust these to map to LifecycleStatusFailed
| diskTypeSSD = "ssd" | ||
| ) | ||
|
|
||
| var allowedZones = []string{"hayesvalley", "yerba"} |
There was a problem hiding this comment.
do the zones have an "enabled" or "available" so we don't have to hardcode them?
There was a problem hiding this comment.
They have an "enabled" but we also want to explicitly exclude zones (per SFC's suggestion out of band).
No description provided.