Nomad Enterprise has the `nomad operator snapshot agent` command, which you can use to periodically snapshot your cluster and ship backups to external object storage such as AWS S3. We recommend deploying `nomad operator snapshot agent` as a Nomad job, but we don't have a recommended resource configuration that has actually been benchmarked.
CPU probably doesn't matter much, because at worst an undersized allocation just slows things down. Memory is harder to handwave, because an undersized allocation risks the task getting OOM-killed. Even if we assume the snapshot is streamed efficiently, the RSS is going to include the paged-in parts of the Nomad binary, and this code path hasn't been specifically optimized the way we've done for logmon, etc. The bulk of the functionality is over in https://github.com/hashicorp/raft-snapshotagent (internal repo), which in turn pulls in the Azure/AWS/GCP clients, which are all pretty large.
Let's benchmark this on a real cluster with real external object storage, so that we can make accurate recommendations here.
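
As a starting point for the measurement, here is a minimal sketch of how the agent's resident memory could be sampled on a Linux client. It assumes the agent's PID is known and that `/proc/<pid>/status` is readable; the helper names `parse_status_kib` and `read_memory_kib` are made up for illustration and are not part of Nomad.

```python
import re
from pathlib import Path
from typing import Dict, Optional


def parse_status_kib(status_text: str, field: str) -> Optional[int]:
    """Return the value (in KiB) of a /proc/<pid>/status field, or None if absent."""
    # Lines look like "VmHWM:\t  204800 kB"; anchor on the field name at line start.
    match = re.search(
        rf"^{re.escape(field)}:\s+(\d+)\s+kB$", status_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def read_memory_kib(pid: int) -> Dict[str, Optional[int]]:
    """Read current RSS (VmRSS) and peak RSS (VmHWM) for a process. Linux only."""
    text = Path(f"/proc/{pid}/status").read_text()
    return {field: parse_status_kib(text, field) for field in ("VmRSS", "VmHWM")}
```

Calling `read_memory_kib(pid)` after the agent has completed a few snapshot uploads gives the peak RSS via `VmHWM`. Note this is only the process's own view: the memory limit Nomad enforces is applied at the cgroup level, so the cgroup's memory accounting for the task should be checked alongside it before settling on a number.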
Ref: #27890
Ref (internal): https://hashicorp.atlassian.net/browse/NMD-1426