
Support queue-related logic with kube-queue #1519

@zw0610

In a deep learning cluster, it is common for all kinds of tasks (such as TFJob, MPIJob, Deployment, StatefulSet, etc.) submitted by users to wait for resources to be allocated. Unfortunately, the Pod is the minimal scheduling unit, which makes it hard to manage tasks the way other cluster managers such as Slurm do.

To fill this gap, @denkensk and I have worked with other contributors on a new queue system for tasks on Kubernetes clusters, called kube-queue. Unlike the queue in Volcano, kube-queue does not hijack the creation/submission of tasks. Instead, kube-queue relies on the operator of each task API (like TFJob, MPIJob) to wait until a clear ready-to-go message is confirmed by kube-queue and delivered to the task itself via an annotation on the CR.

We'd like to integrate kube-queue with training-operator, which requires minimal changes to the Reconcile method:

import (
    ...
    queuev1alpha1 "github.com/kube-queue/pkg/apis/scheduling/v1alpha1"
    ...
)

func (r *XXJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    ...
    // While kube-queue has not released the job yet, skip reconciliation
    // and check again after a short delay.
    if queuev1alpha1.JobSuspended(job) {
        logger.Info("job suspended by kube-queue")
        return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
    }
    ...
}

Of course, this logic can be turned on and off via a launch argument of training-operator.
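
As a rough illustration of such a switch, the sketch below gates the queue check behind a boolean launch flag using Go's standard `flag` package. The flag name `enable-queue`, the `options` struct, and the `shouldWait` helper are hypothetical and chosen only for this example; the actual argument name is not specified in this issue.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds launch arguments. EnableQueue is a hypothetical option name.
type options struct {
	EnableQueue bool
}

// parseOptions parses launch arguments into options.
// The queue check defaults to off, so existing deployments are unaffected.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("training-operator", flag.ContinueOnError)
	fs.BoolVar(&o.EnableQueue, "enable-queue", false,
		"wait for the queue to release a job before reconciling it (hypothetical flag)")
	err := fs.Parse(args)
	return o, err
}

// shouldWait combines the launch option with the job's suspended state:
// the operator only waits when the feature is enabled AND the job is suspended.
func shouldWait(o options, suspended bool) bool {
	return o.EnableQueue && suspended
}

func main() {
	o, err := parseOptions([]string{"--enable-queue=true"})
	if err != nil {
		panic(err)
	}
	fmt.Println("wait for queue:", shouldWait(o, true))
}
```

Defaulting the flag to off keeps the operator's behavior unchanged for clusters that do not run a queue.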

The kube-queue proposal has been submitted to Kubernetes wg-batch and is pending further discussion. The implementation is already managing thousands of tasks within Alibaba and Baidu.
