Commit adc5b12

Add a 'query-external-data' example (#48)
Add an example that demonstrates several possibilities that became available with the new `external()` functionality.
1 parent 0a93a47 commit adc5b12

5 files changed, +138 -0 lines changed
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# query-external-data

![Community Supported](https://img.shields.io/badge/Support%20Level-Community%20Supported-53bd92.svg)

__Current Version__: 1.0

This sample demonstrates how you can use Hyper to query external data, such as Parquet or CSV files, directly. This enables a variety of ETL capabilities, such as reading multiple files at once, filtering the data as it is read, and creating additional calculated columns.
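
As an illustration of reading several files at once, the following is a minimal sketch (not part of the sample script) that assembles an `external()` query for a list of CSV files sharing one schema. The helper names `sql_string_literal` and `build_external_csv_query` and the file name `more_customers.csv` are hypothetical; the `external()` arguments mirror the ones used in `query_external_data.py`. Column names and types are inserted verbatim, so they should only come from trusted input.

```python
def sql_string_literal(value):
    """Quote a Python string as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_external_csv_query(paths, columns):
    """Build a SELECT over one or more headerless CSV files with the same schema.

    `paths` is a list of file paths and `columns` is a list of
    (name, sql_type) pairs. Both helpers are hypothetical illustrations
    and not part of the Hyper API.
    """
    if not paths:
        raise ValueError("at least one file path is required")
    file_list = ", ".join(sql_string_literal(p) for p in paths)
    descriptor = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"SELECT * FROM external(ARRAY[{file_list}], "
        f"COLUMNS => DESCRIPTOR({descriptor}), "
        "DELIMITER => ',', FORMAT => 'csv', HEADER => false)"
    )


# 'more_customers.csv' is a made-up file name used only for illustration.
query = build_external_csv_query(
    ["customers.csv", "more_customers.csv"],
    [("customer_key", "int"), ("country", "text"), ("street", "text"), ("nr", "int")],
)
print(query)
```

The resulting string can be passed to `connection.execute_list_query()` in the same way as the queries in the sample script.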

# Get started

## __Prerequisites__

To run the script, you will need:

- a computer running Windows, macOS, or Linux

- Python 3.7 or newer
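
The prerequisites above can be checked from Python before running the sample. This is a minimal sketch, assuming that the Hyper API is installed as the `tableauhyperapi` package (the module that the sample script imports); `check_prerequisites` is a hypothetical helper and not part of the Hyper API.

```python
import importlib.util
import sys

# Minimum Python version stated in the prerequisites.
MIN_PYTHON = (3, 7)


def check_prerequisites(module_name="tableauhyperapi", version_info=None):
    """Return a list of problems; an empty list means the checks passed."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required" % MIN_PYTHON)
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        problems.append("Python package '%s' is not installed" % module_name)
    return problems


for problem in check_prerequisites():
    print("Missing prerequisite:", problem)
```

If nothing is printed, both checks passed and the sample should be ready to run.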

## Run the sample

Ensure that you have installed the requirements, then run the sample Python file.
The following instructions assume that you have set up a virtual environment for Python. For more information on
creating virtual environments, see [venv - Creation of virtual environments](https://docs.python.org/3/library/venv.html)
in the Python Standard Library.

1. Open a terminal and activate the Python virtual environment (`venv`).

1. Navigate to the folder where you installed the sample.

1. Run the Python script:

**python query_external_data.py**

## __Resources__
Check out these resources to learn more:

- [Hyper API docs](https://help.tableau.com/current/api/hyper_api/en-us/index.html)

- [Tableau Hyper API Reference (Python)](https://help.tableau.com/current/api/hyper_api/en-us/reference/py/index.html)

- [The Hyper API SQL Reference](https://help.tableau.com/current/api/hyper_api/en-us/reference/sql)
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
2554,DE,Hansastrasse,15
3554,DE,Ganghoferstrasse,24
2654,US,180th Ave,174
2564,US,150th Ave,114
2114,US,80th Ave,74
9954,US,42th Ave,94
2444,EN,Oxford Rd,13
1004,EN,Dowells Cl,41
6454,DE,Radlkoferstrasse,75
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
2954,DE,Hansastrasse,11
9664,DE,Ganghoferstrasse,14
8554,US,10th Ave,184
962 Bytes
Binary file not shown.
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# -----------------------------------------------------------------------------
#
# This file is the copyrighted property of Tableau Software and is protected
# by registered patents and other applicable U.S. and international laws and
# regulations.
#
# You may adapt this file and modify it to fit into your context and use it
# as a template to start your own projects.
#
# -----------------------------------------------------------------------------
from tableauhyperapi import HyperProcess, Telemetry, \
    Connection, CreateMode, \
    HyperException

def print_list(rows):
    for row in rows:
        print(row)


def run_hyper_query_external():
    """
    An example demonstrating how to use Hyper to read data directly from external sources.

    More information can be found here:
    https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/external-data-in-sql.html
    https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/sql-copy.html
    https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/sql-createexternaltable.html
    https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/functions-srf.html#FUNCTIONS-SRF-EXTERNAL
    """

    # Start the Hyper process.
    with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
        # Open a connection to the Hyper process. This will also create the new Hyper file.
        # The `CREATE_AND_REPLACE` mode causes the file to be replaced if it
        # already exists.
        with Connection(endpoint=hyper.endpoint,
                        database="output_file.hyper",
                        create_mode=CreateMode.CREATE_AND_REPLACE) as connection:

            print("Scenario 1: Create a table from filtered parquet data with a calculated extra column")
            # This SQL command queries a parquet file directly and creates the table 'low_prio_orders' in Hyper.
            # The created table contains the data returned by the 'SELECT' part of the query, i.e., only
            # a selection of columns, a new calculated column 'employee_nr', and only the rows with low order priority.
            command_1 = """CREATE TABLE low_prio_orders AS
                SELECT order_key, customer_key, price, CAST(SUBSTRING(employee FROM 0 FOR 6) AS int) AS employee_nr
                FROM external('orders.parquet')
                WHERE priority = 'LOW'"""

            connection.execute_command(command_1)

            print("table content:")
            print_list(connection.execute_list_query("SELECT * FROM low_prio_orders"))
            print()

            print("\nScenario 2: Query multiple external data sources in one query.")
            # This query reads data from a parquet file and a CSV file and joins them. Note that, for CSV files, the schema of the file
            # has to be provided and currently cannot be inferred from the file directly (see the `DESCRIPTOR` argument below).
            command_2 = """SELECT country, SUM(quantity * price)
                FROM external('orders.parquet') orders
                JOIN external('customers.csv',
                    COLUMNS => DESCRIPTOR(customer_key int, country text, street text, nr int),
                    DELIMITER => ',', FORMAT => 'csv', HEADER => false) customers
                ON orders.customer_key = customers.customer_key GROUP BY country
                ORDER BY country"""
            print("result:")
            print_list(connection.execute_list_query(command_2))
            print()


            print("Scenario 3: Query multiple CSV files that have the same schema in one go.")
            # Note that, for CSV files, the schema of the file has to be provided and currently cannot be inferred from the file directly
            # (see the `DESCRIPTOR` argument below).
            command_3 = """SELECT *
                FROM external(ARRAY['customers.csv','customers.csv'],
                    COLUMNS => DESCRIPTOR(customer_key int, country text, street text, nr int),
                    DELIMITER => ',', FORMAT => 'csv', HEADER => false)
                ORDER BY country"""

            print("result:")
            print_list(connection.execute_list_query(command_3))



if __name__ == '__main__':
    run_hyper_query_external()
