Hi @jcrist. I'm trying to understand whether Skein is the right tool for my needs.
We have a fairly large static cluster on GCP, and we launch jobs on it from login/bastion/edge nodes (not on GCP). Along with distributed jobs like Hive and Spark, we also run a lot of single-machine jobs.
Currently we don't have a good solution for running these single-machine Python jobs on the cluster nodes, so we end up running them on the edge nodes themselves (ideally edge nodes should not be used for computation).
Skein really seems like a good solution for this problem.
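For context, my understanding is that a single-machine Python job would map onto a Skein application spec along these lines (a rough sketch; the name, resource amounts, and script below are placeholders I made up, not something from our setup):

```yaml
# Hypothetical Skein application spec: run one Python script in a
# single YARN container (sketch only; values are placeholders).
name: single-node-job
master:
  resources:
    vcores: 4        # cores for the single container
    memory: 8 GiB    # memory for the single container
  files:
    my_job.py: my_job.py   # ship the script with the application
  script: |
    python my_job.py
```

which, if I've read the docs right, would be submitted with something like `skein application submit spec.yaml`.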
However, we could also run such a Python job through PySpark by just adding the following lines at the top of the Python file:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
```

and then doing a `spark-submit` with `--driver-memory` and `--driver-cores` specified.
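Concretely, I imagine the submission would look something like this (a sketch; the script name and resource values are placeholders, and my assumption is that in `cluster` deploy mode the driver, and hence the single-machine workload, runs inside a YARN container rather than on the edge node):

```shell
# Hypothetical spark-submit for a single-machine Python job.
# --deploy-mode cluster places the driver (where the job's code runs)
# on a cluster node, keeping the edge node free of computation.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 8g \
  --driver-cores 4 \
  my_job.py
```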
- I'd really appreciate it if you and the community here could share your thoughts on this.
- Do you see any potential problems with the above approach?
- What are the use cases where people are using Skein?