
Commit a0935cb

Simple example of querying a parquet file located in a public bucket on Google Storage (#68)
1 parent 7dcaac3 commit a0935cb

File tree: 4 files changed, +97 -1 lines changed

Community-Supported/README.md

Lines changed: 3 additions & 0 deletions
@@ -35,6 +35,9 @@ The community samples focus on individual use cases and are Python-only. They ha
  It demonstrates how to implement an incremental refresh based on the Hyper API and the [Hyper Update API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm). It showcases this based on flights data from the [OpenSky API](https://github.com/openskynetwork/opensky-api).

+ - [__s3-compatible-services__](https://github.com/aetperf/hyper-api-samples/tree/main/Community-Supported/s3-compatible-services)
+   Demonstrates how Hyper can natively interact with S3-compatible services, such as Google Storage, without the need to install any external dependencies like `google-cloud-bigquery`.

  </br>
  </br>
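As an aside on that new bullet: because the S3 endpoint is just a process parameter, the same pattern should extend to other S3-compatible services beyond Google Storage. A minimal sketch, assuming a hypothetical local MinIO server on `localhost:9000` with a bucket `demo` — the hostname, bucket, and file name are illustrative and not part of this commit:

```python
# Sketch: point Hyper's experimental S3 support at a different S3-compatible
# endpoint. Hostname, bucket, and object key below are hypothetical, and we
# assume the endpoint (including its port) is accepted by the hostname parameter.
from tableauhyperapi import Connection, HyperProcess, Telemetry, escape_string_literal

parameters = {
    "experimental_external_s3": "true",       # experimental feature flag (see the script below)
    "external_s3_hostname": "localhost:9000", # e.g. a local MinIO server
}

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU,
                  parameters=parameters) as hyper:
    with Connection(endpoint=hyper.endpoint) as connection:
        url = escape_string_literal("s3://demo/data.parquet")
        count = connection.execute_scalar_query(
            f"SELECT COUNT(*) FROM EXTERNAL(S3_LOCATION({url}))")
        print(count)
```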

Community-Supported/native-s3/README.md

Lines changed: 1 addition & 1 deletion
@@ -69,4 +69,4 @@ Check out these resources to learn more:
  - [The EXTERNAL function in the Hyper API SQL Reference](https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/functions-srf.html#FUNCTIONS-SRF-EXTERNAL)

- - [AWS command line tools documentation](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html), e.g. if you want to download some of the sample files to your local machine and explore them
+ - [AWS command line tools documentation](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html), e.g. if you want to download some of the sample files to your local machine and explore them
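Following that last pointer, a hedged example of what such a download could look like with the AWS CLI — the bucket and key are placeholders, not actual sample paths:

```bash
# Copy a publicly readable object to the current directory without AWS
# credentials; <bucket> and <key> are placeholders for one of the sample files.
aws s3 cp s3://<bucket>/<key> . --no-sign-request
```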
Community-Supported/s3-compatible-services/README.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
# s3-compatible-services
![Community Supported](https://img.shields.io/badge/Support%20Level-Community%20Supported-53bd92.svg)

__Current Version__: 1.0

These samples show you how Hyper can natively interact with S3-compatible services, such as Google Storage, without the need to install any external dependencies like `google-cloud-bigquery`.
# Get started

## __Prerequisites__

To run the script, you will need:

- a computer running Windows, macOS, or Linux

- Python 3.9+

- the dependencies from the `requirements.txt` file installed
## Run the samples

The following instructions assume that you have set up a virtual environment for Python. For more information on creating virtual environments, see [venv - Creation of virtual environments](https://docs.python.org/3/library/venv.html) in the Python Standard Library.
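For example, a typical setup might look like the following sketch (commands assume macOS/Linux; the Windows activation command differs):

```bash
# Create and activate a virtual environment, then install the sample's
# dependencies from its requirements.txt file.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```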
1. Open a terminal and activate the Python virtual environment (`venv`).

1. Navigate to the folder where you installed the samples.

1. Follow the steps below to run one of the samples.
**Live query against a `.parquet` file stored on Google Storage**

Run the Python script:

```bash
$ python query-parquet-on-gs.py
```

This script performs a live query on the Parquet file stored at this public Google Storage location: `gs://cloud-samples-data/bigquery/us-states/us-states.parquet`.
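If you want to inspect the same file locally, objects in public Google Storage buckets are also reachable over plain HTTPS under `storage.googleapis.com`; a one-liner sketch, assuming `curl` is available:

```bash
# Download the public sample file over HTTPS for local inspection.
curl -O https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.parquet
```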
## __Resources__

Check out these resources to learn more:

- [Hyper API docs](https://help.tableau.com/current/api/hyper_api/en-us/index.html)

- [Tableau Hyper API Reference (Python)](https://help.tableau.com/current/api/hyper_api/en-us/reference/py/index.html)

- [The EXTERNAL function in the Hyper API SQL Reference](https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/functions-srf.html#FUNCTIONS-SRF-EXTERNAL)
Community-Supported/s3-compatible-services/query-parquet-on-gs.py

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
```python
"""Connect to Google Storage and query a parquet file located in a public bucket.

Adapted from hyper-api-samples/Community-Supported/native-s3/query-csv-on-s3.py
"""

from tableauhyperapi import Connection, HyperProcess, Telemetry, escape_string_literal

BUCKET_NAME = "cloud-samples-data"
FILE_PATH = "bigquery/us-states/us-states.parquet"

states_dataset_gs = escape_string_literal(
    f"s3://{BUCKET_NAME.strip('/')}/{FILE_PATH.strip('/')}"
)

# Hyper process parameters
parameters = {}
# We need to manually enable S3 connectivity, as this is still an experimental feature
parameters["experimental_external_s3"] = "true"
# Endpoint URL: route S3 requests to Google Storage instead of AWS
parameters["external_s3_hostname"] = "storage.googleapis.com"
# We do not need to specify credentials or a bucket location, as the GS bucket is
# publicly accessible; this may be different when used with your own data

with HyperProcess(
    telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU,
    parameters=parameters,
) as hyper:
    # Create a connection to the Hyper process - we do not connect to a database
    with Connection(endpoint=hyper.endpoint) as connection:
        # Use the SELECT FROM EXTERNAL(S3_LOCATION()) syntax - this allows us to use
        # the parquet file like a normal table name in SQL queries
        sql_query = f"SELECT COUNT(*) FROM EXTERNAL(S3_LOCATION({states_dataset_gs}))"

        # Execute the query with `execute_scalar_query`, as we expect a single number
        count = connection.execute_scalar_query(sql_query)
        print(f"number of rows: {count}")
```
