
KBase Spark Standalone Deployment

This repository contains the Dockerfile and associated configurations for deploying Apache Spark in standalone mode, using Docker locally and on Kubernetes.

It reuses the BERDL Docker image as a base, which provides a consistent environment for running Spark jobs and Python applications.

ENV VARS

  • Variables with the SPARK_ prefix are official Spark environment variables, such as SPARK_MASTER_URL, SPARK_DRIVER_HOST, etc.
  • Variables with the BERDL_ prefix are specific to this deployment.
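
As a quick illustration, a job can read these variables from its environment. The sketch below is illustrative only; BERDL_EXAMPLE_SETTING is a hypothetical placeholder name, not a variable actually defined by this image.

    import os

    # SPARK_MASTER_URL is a standard Spark variable (e.g. spark://<master-host>:7077).
    master_url = os.environ["SPARK_MASTER_URL"]

    # BERDL_EXAMPLE_SETTING is a hypothetical placeholder for one of the
    # BERDL_-prefixed variables provided by this deployment.
    example_setting = os.environ.get("BERDL_EXAMPLE_SETTING", "default-value")

    print(f"Connecting to Spark master at {master_url}")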

Getting Started

  1. Clone the repository

    git clone git@github.com:kbase/cdm-spark-standalone.git
    cd cdm-spark-standalone
  2. Build and start the containers

    docker compose up -d --build
  3. Access the Spark UI in your browser.

Testing the Spark Cluster

To test that the cluster is working:

  1. Start a shell in the spark-user container:

    docker compose exec -it spark-user bash
  2. Submit a test job:

    spark_user@d30c26e91ae0:/opt/bitnami/spark$ bin/spark-submit --master $SPARK_MASTER_URL --deploy-mode client examples/src/main/python/pi.py 10
    

You should see a line like

Pi is roughly 3.138040

in the output.
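
Beyond the bundled pi.py example, you can submit a minimal PySpark script of your own in the same way. The sketch below is illustrative; the filename smoke_test.py and its location are assumptions, and the master URL is supplied via spark-submit's --master flag just as in the command above.

    # smoke_test.py -- a minimal job to confirm the cluster can schedule work.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("cluster-smoke-test")
        .getOrCreate()  # the master URL comes from spark-submit's --master flag
    )

    # Distribute a small range across the workers and sum it.
    total = spark.sparkContext.parallelize(range(1000)).sum()
    print(f"Sum of 0..999 = {total}")  # expected: 499500

    spark.stop()

Submit it with bin/spark-submit --master $SPARK_MASTER_URL --deploy-mode client smoke_test.py.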

Using Redis for Caching

  1. Start a shell in the spark-user container:

    docker compose exec -it spark-user bash
  2. Run the example:

    spark-submit /app/redis_container_script.py
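
The contents of redis_container_script.py are not reproduced here, but a Spark-to-Redis write typically looks like the sketch below. It assumes the spark-redis connector is on the classpath and that the Redis service is reachable as "redis" on port 6379; treat the details as illustrative rather than the script's actual code.

    # Hypothetical sketch: caching a DataFrame in Redis with the spark-redis connector.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("redis-cache-example")
        # Assumed values: the compose file's Redis service name and the default port.
        .config("spark.redis.host", "redis")
        .config("spark.redis.port", "6379")
        .getOrCreate()
    )

    people = spark.createDataFrame(
        [("Alice", 34), ("Bob", 45)],
        ["name", "age"],
    )

    # Each row is stored as a Redis hash under a "people:<id>" key,
    # which is the key pattern the verification steps below search for.
    (people.write
        .format("org.apache.spark.sql.redis")
        .option("table", "people")
        .mode("overwrite")
        .save())

    spark.stop()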

Verifying Cache in Redis

  1. Start a shell in the Redis container:

    docker compose exec -it redis bash
  2. Start the Redis CLI:

    redis-cli
  3. List all keys for your cached table:

    keys people:*
  4. View the contents of a specific key (replace the key with one from the previous command):

    hgetall people:d6d606a747ae40368fc7fdae784b835b
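
As a final check, the cached table can also be read back into Spark. The sketch below is illustrative and reuses the assumptions from the write example above (spark-redis connector available, Redis reachable as "redis" on port 6379).

    # Hypothetical sketch: reading the cached "people" table back from Redis.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("redis-read-example")
        .config("spark.redis.host", "redis")   # assumed service name
        .config("spark.redis.port", "6379")    # assumed default port
        .getOrCreate()
    )

    # Without an explicit schema, field values may come back as strings.
    people = (
        spark.read
        .format("org.apache.spark.sql.redis")
        .option("table", "people")
        .load()
    )

    people.show()
    spark.stop()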
