
Executor pods not starting while submitting spark application from operator #352

@sp-matrix

Description

Hello,

The operator is installed in our OpenShift cluster (organization-wide). When the example Spark application (spark-examples_2.11-2.4.5.jar) is submitted through the operator, the submitter pod and driver pod are created, but the executor pod is never created and fails with the error below.

```
INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
ERROR Utils: Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc/api/v1/namespaces/nm/pods. Message: Pod "my-spark-app-1653416287945-exec-4" is invalid: spec.containers[0].resources.requests: Invalid value: "1": must be less than or equal to cpu limit. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.containers[0].resources.requests, message=Invalid value: "1": must be less than or equal to cpu limit, reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, name=my-spark-app-1653416287945-exec-4, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod "my-spark-app-1653416287945-exec-4" is invalid: spec.containers[0].resources.requests: Invalid value: "1": must be less than or equal to cpu limit, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure
```
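For reference, the 422 above is the API server rejecting the pod at admission time: the executor's CPU request (`1`, i.e. 1000m) ends up greater than its effective CPU limit. A minimal sketch of that comparison, assuming Kubernetes-style CPU quantity strings (illustrative only, not the operator's or the API server's actual code):

```python
def cpu_to_millicores(q: str) -> int:
    """Convert a Kubernetes CPU quantity ('1', '0.5', '500m') to millicores."""
    q = str(q)
    if q.endswith("m"):
        return int(q[:-1])
    return int(float(q) * 1000)

def request_is_valid(request: str, limit: str) -> bool:
    """Mirror the admission rule from the error: request must be <= limit."""
    return cpu_to_millicores(request) <= cpu_to_millicores(limit)

# An executor request of 1 CPU checked against a 400m limit (e.g. a
# LimitRange default applied when the pod spec carries no explicit limit):
print(request_is_valid("1", "400m"))     # False -> the 422 above
print(request_is_valid("0.5", "1000m"))  # True  -> what the YAML below intends
```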

After going through the documentation, we tried limiting the CPU and cores, but the issue persists.

SparkApplication YAML:

```yaml
...
spec:
  driver:
    coreLimit: 500m
    cores: 0.2
  executor:
    coreLimit: 1000m
    coreRequest: 0.5
    cores: 1        # float values below 1 are rejected; if omitted, the default of 1 is used
    cpuLimit: 1000m
    instances: 1
...
```

In the driver's ConfigMap:

```properties
spark.executor.memory=512m
spark.driver.blockManager.port=7079
spark.ui.reverseProxy=true
spark.executorEnv.APPLICATION_NAME=my-spark-app
spark.kubernetes.container.image=quay.io/radanalyticsio/openshift-spark:2.4-latest
spark.jars=/opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
spark.ui.reverseProxyUrl=/
spark.kubernetes.driver.limit.cores=500m
spark.kubernetes.submitInDriver=true
spark.driver.memory=512m
spark.submit.deployMode=cluster
spark.kubernetes.driverEnv.APPLICATION_NAME=my-spark-app
spark.kubernetes.executor.label.radanalytics.io/SparkApplication=my-spark-app
spark.kubernetes.driver.label.radanalytics.io/SparkApplication=my-spark-app
spark.executor.cores=1
spark.kubernetes.authenticate.driver.serviceAccountName=spark-operator
spark.jars.ivy=/tmp/.ivy2
spark.kubernetes.driver.pod.name=my-spark-app-1653416287945-driver
spark.executor.instances=1
spark.kubernetes.namespace=nm-np
spark.app.id=spark-d7cd179d47a047dd9d11811a99e1060c
spark.app.name=my-spark-app
spark.kubernetes.driver.label.version=2.3.0
spark.driver.cores=0.2
spark.driver.port=7078
```

Note: we also tried applying the 'cluster-limreq' resource with a 4-CPU limit for executors, as mentioned in the README, but that did not resolve the issue either.

Resource quota of our namespace, as allocated by the cluster manager:

```
Name:       core-resource-limits-hermes
Namespace:  nm-np

Type       Resource  Min   Max  Default Request  Default Limit  Max Limit/Request Ratio
----       --------  ---   ---  ---------------  -------------  -----------------------
Container  cpu       25m   4    150m             400m           -
Container  memory    25Mi  4Gi  256Mi            512Mi          -
Pod        cpu       25m   4    -                -              -
Pod        memory    25Mi  4Gi  -                -              -
```
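For context, the table above reads like `kubectl describe limitrange` output; a LimitRange manifest equivalent to it (reconstructed from the values shown; only illustrative, the actual object in the cluster may differ) would look roughly like:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: core-resource-limits-hermes
  namespace: nm-np
spec:
  limits:
    - type: Container
      min:            { cpu: 25m,  memory: 25Mi }
      max:            { cpu: "4",  memory: 4Gi }
      defaultRequest: { cpu: 150m, memory: 256Mi }
      default:        { cpu: 400m, memory: 512Mi }  # applied when a container sets no limit
    - type: Pod
      min: { cpu: 25m, memory: 25Mi }
      max: { cpu: "4", memory: 4Gi }
```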

Can anyone please help us find out why the executor pod is not starting, even though we request only 1 CPU out of the allocated quota of 4 CPUs?

Thanks!

Labels: bug