
Commit d3bcc1c

Migrate prefect-databricks to core (#12820)
Co-authored-by: Alexander Streed <desertaxle@users.noreply.github.com>
1 parent 7c260d4 commit d3bcc1c

File tree

27 files changed: +16694 −15 lines changed
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
::: prefect_databricks.credentials
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
---
description:
notes: This documentation page is generated from source file docstrings.
---

::: prefect_databricks.flows
Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
# prefect-databricks

<p align="center">
    <a href="https://pypi.python.org/pypi/prefect-databricks/" alt="PyPI version">
        <img alt="PyPI" src="https://img.shields.io/pypi/v/prefect-databricks?color=0052FF&labelColor=090422"></a>
    <a href="https://pepy.tech/badge/prefect-databricks/" alt="Downloads">
        <img src="https://img.shields.io/pypi/dm/prefect-databricks?color=0052FF&labelColor=090422" /></a>
</p>

## Welcome!

Prefect integrations for interacting with Databricks.

The tasks within this collection were created by a code generator using the service's OpenAPI spec.

The service's REST API documentation can be found [here](https://docs.databricks.com/dev-tools/api/latest/index.html).

## Getting Started

### Python setup

Requires an installation of Python 3.8+.

We recommend using a Python virtual environment manager such as pipenv, conda, or virtualenv.

These tasks are designed to work with Prefect 2. For more information about how to use Prefect, please refer to the [Prefect documentation](https://docs.prefect.io/).

### Installation

Install `prefect-databricks` with `pip`:

```bash
pip install prefect-databricks
```

### List jobs on the Databricks instance

```python
from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_list


@flow
def example_execute_endpoint_flow():
    databricks_credentials = DatabricksCredentials.load("my-block")
    jobs = jobs_list(
        databricks_credentials,
        limit=5
    )
    return jobs

example_execute_endpoint_flow()
```

### Use `with_options` to customize options on any existing task or flow

```python
custom_example_execute_endpoint_flow = example_execute_endpoint_flow.with_options(
    name="My custom flow name",
    retries=2,
    retry_delay_seconds=10,
)
```

### Launch a new cluster and run a Databricks notebook

Notebook named `example.ipynb` on Databricks which accepts a name parameter:

```python
name = dbutils.widgets.get("name")
message = f"Don't worry {name}, I got your request! Welcome to prefect-databricks!"
print(message)
```

Prefect flow that launches a new cluster to run `example.ipynb`:

```python
from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_runs_submit
from prefect_databricks.models.jobs import (
    AutoScale,
    AwsAttributes,
    JobTaskSettings,
    NotebookTask,
    NewCluster,
)


@flow
def jobs_runs_submit_flow(notebook_path, **base_parameters):
    databricks_credentials = DatabricksCredentials.load("my-block")

    # specify new cluster settings
    aws_attributes = AwsAttributes(
        availability="SPOT",
        zone_id="us-west-2a",
        ebs_volume_type="GENERAL_PURPOSE_SSD",
        ebs_volume_count=3,
        ebs_volume_size=100,
    )
    auto_scale = AutoScale(min_workers=1, max_workers=2)
    new_cluster = NewCluster(
        aws_attributes=aws_attributes,
        autoscale=auto_scale,
        node_type_id="m4.large",
        spark_version="10.4.x-scala2.12",
        spark_conf={"spark.speculation": True},
    )

    # specify notebook to use and parameters to pass
    notebook_task = NotebookTask(
        notebook_path=notebook_path,
        base_parameters=base_parameters,
    )

    # compile job task settings
    job_task_settings = JobTaskSettings(
        new_cluster=new_cluster,
        notebook_task=notebook_task,
        task_key="prefect-task",
    )

    run = jobs_runs_submit(
        databricks_credentials=databricks_credentials,
        run_name="prefect-job",
        tasks=[job_task_settings],
    )

    return run


jobs_runs_submit_flow("/Users/username@gmail.com/example.ipynb", name="Marvin")
```

Note: instead of using the built-in models, you may also input valid JSON. For example, `AutoScale(min_workers=1, max_workers=2)` is equivalent to `{"min_workers": 1, "max_workers": 2}`.
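
That JSON equivalence holds for the whole request: the `tasks` payload for `jobs_runs_submit` can be assembled from plain dicts instead of the model classes. A minimal sketch of the same job-run submission built that way (plain Python only; field names follow the Databricks Jobs API, and no request is actually sent here):

```python
# Sketch: the job-run submission from the example above, expressed as
# JSON-style dicts rather than AutoScale/NewCluster/... model instances.
auto_scale = {"min_workers": 1, "max_workers": 2}

new_cluster = {
    "autoscale": auto_scale,
    "node_type_id": "m4.large",
    "spark_version": "10.4.x-scala2.12",
    "spark_conf": {"spark.speculation": True},
}

task = {
    "new_cluster": new_cluster,
    "notebook_task": {
        "notebook_path": "/Users/username@gmail.com/example.ipynb",
        "base_parameters": {"name": "Marvin"},
    },
    "task_key": "prefect-task",
}

# This dict mirrors the keyword arguments passed to jobs_runs_submit.
payload = {"run_name": "prefect-job", "tasks": [task]}
```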
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
::: prefect_databricks.jobs
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
::: prefect_databricks.models.jobs
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
::: prefect_databricks.rest
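
The generated tasks in `jobs` are built on REST helpers like those in `rest`, wrapping Databricks HTTP endpoints. As a rough, hypothetical sketch (the endpoint path, host, token, and header shape are assumptions based on the public Databricks Jobs API documentation, not taken from this module), a `jobs_list(..., limit=5)` call resolves to a GET request along these lines:

```python
from urllib.parse import urlencode

# Hypothetical values: a workspace host and personal access token, as a
# DatabricksCredentials block might store them. Nothing is sent here.
host = "https://dbc-1234abcd-5678.cloud.databricks.com"
token = "dapi-example-token"

# Build the request jobs_list(limit=5) would roughly correspond to.
params = urlencode({"limit": 5})
url = f"{host}/api/2.1/jobs/list?{params}"
headers = {"Authorization": f"Bearer {token}"}
```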

mkdocs.yml

Lines changed: 6 additions & 0 deletions
@@ -248,6 +248,12 @@ nav:
         - Container Instance Worker: integrations/prefect-azure/container_instance_worker.md
         - Deployments:
           - Steps: integrations/prefect-azure/deployments/steps.md
+    - Databricks:
+        - integrations/prefect-databricks/index.md
+        - Credentials: integrations/prefect-databricks/credentials.md
+        - Jobs: integrations/prefect-databricks/jobs.md
+        - Rest: integrations/prefect-databricks/rest.md
+        - Flows: integrations/prefect-databricks/flows.md
     - API Reference:
       - api-ref/index.md
       - Python SDK:

src/integrations/prefect-azure/pyproject.toml

Lines changed: 0 additions & 15 deletions
@@ -48,11 +48,8 @@ dev = [
     "azure-cosmos",
     "azure-storage-blob",
     "azureml-core",
-    "black",
     "coverage",
-    "flake8",
     "interrogate",
-    "isort",
     "mkdocs-gen-files",
     "mkdocs-material",
     "mkdocs",
@@ -79,18 +76,6 @@ tag_regex = "^prefect-azure-(?P<version>\\d+\\.\\d+\\.\\d+)$"
 fallback_version = "0.0.0"
 git_describe_command = 'git describe --dirty --tags --long --match "prefect-azure-*[0-9]*"'

-[tool.flake8]
-exclude = [".git", "__pycache__", "build", "dist"]
-per-file-ignores = ["setup.py:E501"]
-max-line-length = 88
-extend-ignore = ["E203"]
-
-[tool.isort]
-skip = ["__init__.py"]
-profile = "black"
-skip_gitignore = true
-multi_line_output = 3
-
 [tool.interrogate]
 ignore-init-module = true
 ignore_init_method = true
Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
# prefect-databricks

Visit the full docs [here](https://PrefectHQ.github.io/prefect-databricks) to see additional examples and the API reference.

<p align="center">
    <a href="https://pypi.python.org/pypi/prefect-databricks/" alt="PyPI version">
        <img alt="PyPI" src="https://img.shields.io/pypi/v/prefect-databricks?color=0052FF&labelColor=090422"></a>
    <a href="https://pepy.tech/badge/prefect-databricks/" alt="Downloads">
        <img src="https://img.shields.io/pypi/dm/prefect-databricks?color=0052FF&labelColor=090422" /></a>
</p>

## Welcome!

Prefect integrations for interacting with Databricks.

The tasks within this collection were created by a code generator using the service's OpenAPI spec.

The service's REST API documentation can be found [here](https://docs.databricks.com/dev-tools/api/latest/index.html).

## Getting Started

### Python setup

Requires an installation of Python 3.7+.

We recommend using a Python virtual environment manager such as pipenv, conda, or virtualenv.

These tasks are designed to work with Prefect 2. For more information about how to use Prefect, please refer to the [Prefect documentation](https://orion-docs.prefect.io/).

### Installation

Install `prefect-databricks` with `pip`:

```bash
pip install prefect-databricks
```

A list of available blocks in `prefect-databricks` and their setup instructions can be found [here](https://PrefectHQ.github.io/prefect-databricks/#blocks-catalog).

### List jobs on the Databricks instance

```python
from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_list


@flow
def example_execute_endpoint_flow():
    databricks_credentials = DatabricksCredentials.load("my-block")
    jobs = jobs_list(
        databricks_credentials,
        limit=5
    )
    return jobs

example_execute_endpoint_flow()
```

### Use `with_options` to customize options on any existing task or flow

```python
custom_example_execute_endpoint_flow = example_execute_endpoint_flow.with_options(
    name="My custom flow name",
    retries=2,
    retry_delay_seconds=10,
)
```

### Launch a new cluster and run a Databricks notebook

Notebook named `example.ipynb` on Databricks which accepts a name parameter:

```python
name = dbutils.widgets.get("name")
message = f"Don't worry {name}, I got your request! Welcome to prefect-databricks!"
print(message)
```

Prefect flow that launches a new cluster to run `example.ipynb`:

```python
from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_runs_submit
from prefect_databricks.models.jobs import (
    AutoScale,
    AwsAttributes,
    JobTaskSettings,
    NotebookTask,
    NewCluster,
)


@flow
def jobs_runs_submit_flow(notebook_path, **base_parameters):
    databricks_credentials = DatabricksCredentials.load("my-block")

    # specify new cluster settings
    aws_attributes = AwsAttributes(
        availability="SPOT",
        zone_id="us-west-2a",
        ebs_volume_type="GENERAL_PURPOSE_SSD",
        ebs_volume_count=3,
        ebs_volume_size=100,
    )
    auto_scale = AutoScale(min_workers=1, max_workers=2)
    new_cluster = NewCluster(
        aws_attributes=aws_attributes,
        autoscale=auto_scale,
        node_type_id="m4.large",
        spark_version="10.4.x-scala2.12",
        spark_conf={"spark.speculation": True},
    )

    # specify notebook to use and parameters to pass
    notebook_task = NotebookTask(
        notebook_path=notebook_path,
        base_parameters=base_parameters,
    )

    # compile job task settings
    job_task_settings = JobTaskSettings(
        new_cluster=new_cluster,
        notebook_task=notebook_task,
        task_key="prefect-task",
    )

    run = jobs_runs_submit(
        databricks_credentials=databricks_credentials,
        run_name="prefect-job",
        tasks=[job_task_settings],
    )

    return run


jobs_runs_submit_flow("/Users/username@gmail.com/example.ipynb", name="Marvin")
```

Note: instead of using the built-in models, you may also input valid JSON. For example, `AutoScale(min_workers=1, max_workers=2)` is equivalent to `{"min_workers": 1, "max_workers": 2}`.

For more tips on how to use tasks and flows in a Collection, check out [Using Collections](https://orion-docs.prefect.io/collections/usage/)!

## Resources

If you encounter any bugs while using `prefect-databricks`, feel free to open an issue in the [prefect-databricks](https://github.com/PrefectHQ/prefect-databricks) repository.

If you have any questions or issues while using `prefect-databricks`, you can find help in either the [Prefect Discourse forum](https://discourse.prefect.io/) or the [Prefect Slack community](https://prefect.io/slack).

Feel free to star or watch [`prefect-databricks`](https://github.com/PrefectHQ/prefect-databricks) for updates too!

## Contributing

If you'd like to contribute a fix or a feature to `prefect-databricks`, please [propose changes through a pull request from a fork of the repository](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork).

Here are the steps:

1. [Fork the repository](https://docs.github.com/en/get-started/quickstart/fork-a-repo#forking-a-repository)
2. [Clone the forked repository](https://docs.github.com/en/get-started/quickstart/fork-a-repo#cloning-your-forked-repository)
3. Install the repository and its dependencies:
   ```
   pip install -e ".[dev]"
   ```
4. Make desired changes
5. Add tests
6. Insert an entry to [CHANGELOG.md](https://github.com/PrefectHQ/prefect-databricks/blob/main/CHANGELOG.md)
7. Install `pre-commit` to perform quality checks prior to commit:
   ```
   pre-commit install
   ```
8. `git commit`, `git push`, and create a pull request
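
The `with_options(retries=2, retry_delay_seconds=10)` call shown earlier in this README means a failed run is attempted once and then retried up to twice more, pausing between attempts. A plain-Python sketch of that retry semantic (an illustration of the behavior only, not Prefect's actual implementation):

```python
import time


def run_with_retries(fn, retries=2, retry_delay_seconds=10):
    """Call fn; on failure, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            time.sleep(retry_delay_seconds)


# Demo: a callable that fails twice, then succeeds on the third attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

# Delay set to 0 here only to keep the demo instant.
result = run_with_retries(flaky, retries=2, retry_delay_seconds=0)
```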
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
from .credentials import DatabricksCredentials  # noqa
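
This re-export is what lets the examples above write `from prefect_databricks import DatabricksCredentials` rather than importing from the `credentials` submodule. Conceptually, such a credentials block holds the workspace host and token and turns them into request auth; a hypothetical, simplified stand-in (not the real block, which also plugs into Prefect's block storage):

```python
from dataclasses import dataclass


@dataclass
class FakeDatabricksCredentials:
    """Hypothetical stand-in for DatabricksCredentials: stores a
    workspace host and a personal access token."""

    databricks_instance: str
    token: str

    def auth_headers(self) -> dict:
        # Databricks REST APIs authenticate with a Bearer token.
        return {"Authorization": f"Bearer {self.token}"}


creds = FakeDatabricksCredentials(
    databricks_instance="dbc-1234abcd-5678.cloud.databricks.com",  # hypothetical
    token="dapi-example-token",                                    # hypothetical
)
headers = creds.auth_headers()
```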

0 commit comments
