Commit 9ad9c0d

Adds a sample demonstrating how to implement an incremental refresh based on the Hyper API and Hyper Update REST API. (#64)
* Adds an incremental refresh sample based on the Hyper Update REST API.

  Adds a sample demonstrating how to implement an incremental refresh based on the Hyper API and the Hyper Update REST API. The sample is based on the content the Hyper team presented in the Hands-on Training session "Hands-on: Leverage the Hyper Update API and Hyper API to Keep Your Data Fresh on Tableau Server" at Tableau Conference 2022.

* Minor: Added an argparse parser to pass in arguments and slightly rephrased the README file.

* Added the OpenSkyAPI to the requirements.txt file and removed the instructions to manually install it.
1 parent ecf38e2 commit 9ad9c0d

5 files changed

+181
-1
lines changed

Community-Supported/README.md

Lines changed: 3 additions & 0 deletions
@@ -30,7 +30,10 @@ The community samples focus on individual use cases and are Python-only. They ha
  - Demonstrates the full end-to-end workflow of how to create a multi-table `.hyper` file, place the extract into a `.tdsx`, and publish to Tableau Online or Server.
  - [__s3-to-hyper__](https://github.com/tableau/hyper-api-samples/tree/main/Community-Supported/s3-to-hyper)
  - Demonstrates how to create a `.hyper` file from a wildcard union on text files held in an AWS S3 bucket. The extract is then placed in a `.tdsx` file and published to Tableau Online or Server.
+ - [__flights-data-incremental-refresh__](https://github.com/tableau/hyper-api-samples/tree/main/Community-Supported/flights-data-incremental-refresh)
+ - This sample is based on the content the Hyper team presented in the Hands-on Training session "Hands-on: Leverage the Hyper Update API and Hyper API to Keep Your Data Fresh on Tableau Server" at Tableau Conference 2022 ([slides available here](https://mkt.tableau.com/tc22/sessions/live/430-HOT-D1_Hands-onLeverageTheHyperUpdate.pdf)).

+ It demonstrates how to implement an incremental refresh based on the Hyper API and the [Hyper Update API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm). It showcases this using flights data from the [OpenSkyAPI](https://github.com/openskynetwork/opensky-api).

  </br>
  </br>
Community-Supported/flights-data-incremental-refresh/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
# flights-data-incremental-refresh
## __Incremental Refresh using the OpenSkyApi__

![Community Supported](https://img.shields.io/badge/Support%20Level-Community%20Supported-53bd92.svg)

This sample is based on the content the Hyper team presented in the Hands-on Training session "Hands-on: Leverage the Hyper Update API and Hyper API to Keep Your Data Fresh on Tableau Server" at Tableau Conference 2022 ([slides available here](https://mkt.tableau.com/tc22/sessions/live/430-HOT-D1_Hands-onLeverageTheHyperUpdate.pdf)).

This script pulls down flights data from the [OpenSkyAPI](https://github.com/openskynetwork/opensky-api), creates a Hyper database with this data, and uses the [Hyper Update API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm) to implement an incremental refresh on your Tableau Server/Cloud. The first time the script runs, the database file is simply published as a new datasource; on later runs, the new data is appended to that datasource through the Hyper Update API.

# Get started

## __Prerequisites__
To run the script, you will need:
- Windows, Linux, or Mac
- Python 3
- The required packages: run `pip install -r requirements.txt`
- Tableau Server/Cloud credentials (see below)

## Tableau Server Credentials
To run this sample with your Tableau Server/Cloud, you first need the following information:
- Tableau Server URL, e.g. 'https://us-west-2a.online.tableau.com'
- Site name, e.g., use an empty string ('') for the default site on Tableau Server (on Tableau Cloud, you must always use the site name as it appears in the site's URL)
- Project name, e.g., use 'Default' for your default project
- [Token Name and Token Value](https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm)

Ensure that you have installed the requirements, then run the sample Python file with the information from above. The syntax for running the script is:

**python flights-data-incremental-refresh.py [-h] server_url site_name project_name token_name token_value**

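The five arguments are positional, so their order matters. A minimal sketch that rebuilds a parser with the same argument names and feeds it a list instead of `sys.argv` shows how the values end up on the `args` object; `build_parser` is a hypothetical helper and all argument values below are made-up placeholders, not a real site or token.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the script's five positional arguments, in the same order."""
    parser = argparse.ArgumentParser(description="Incremental refresh with flights data.")
    for name in ("server_url", "site_name", "project_name", "token_name", "token_value"):
        parser.add_argument(name)
    return parser


# Placeholder values standing in for sys.argv[1:]; none of them refer to a real site.
args = build_parser().parse_args(
    ["https://us-west-2a.online.tableau.com", "mysite", "MyProject", "my-token", "my-secret"]
)
print(args.site_name, args.project_name)  # prints: mysite MyProject
```

Because argparse matches positional arguments by position rather than by name, swapping, for example, the token name and token value on the command line would not raise a parsing error; it would only surface later as a failed sign-in.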
# Incremental Refresh using the OpenSkyApi
The script consists of two parts: first, it creates a Hyper database with flights data; then, it publishes the database to Tableau Server/Cloud.

## Create a database with flights data
The `create_hyper_database_with_flights_data` function creates an instance of `OpenSkyApi` and then pulls down the current state vectors (position, velocity, and identity of each aircraft) within a specific bounding box. This example only uses a subset of the available data because it relies on the free version of the OpenSky API.

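The bounding box is a plain 4-tuple whose order is easy to mix up. The sketch below names its fields, following the order given in the OpenSky Python API documentation (min latitude, max latitude, min longitude, max longitude), and adds a containment check. `BoundingBox` is a hypothetical helper, not part of `opensky_api`; since a `NamedTuple` is a regular tuple underneath, it can presumably be passed as `bbox=` directly.

```python
from typing import NamedTuple, Optional


class BoundingBox(NamedTuple):
    """Hypothetical helper for the bbox tuple passed to OpenSkyApi.get_states().

    Field order: (min_latitude, max_latitude, min_longitude, max_longitude),
    all in decimal degrees.
    """
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: Optional[float], lon: Optional[float]) -> bool:
        # State vectors may lack a position, so treat a missing coordinate as "outside".
        if lat is None or lon is None:
            return False
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


# The box used by the sample; it roughly covers Switzerland.
SWITZERLAND = BoundingBox(45.8389, 47.8229, 5.9962, 10.5226)

print(SWITZERLAND.contains(47.3769, 8.5417))  # Zurich, prints: True
print(SWITZERLAND.contains(48.8566, 2.3522))  # Paris, prints: False
```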
Then, it creates a Hyper database with a table named `TableName("public", "flights")` and uses an `Inserter` to insert one row per state vector.

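Each row passed to `Inserter.add_row()` must list the values in the table's column order. Since the sample's column names match the attribute names of OpenSky's state vectors, a row can be built generically from the column list. In this sketch, `FakeStateVector` is a hypothetical stand-in for `opensky_api`'s state vector class (reduced to the eight attributes the sample reads), and the values are made up.

```python
from collections import namedtuple

# Column order of the "flights" table created by the sample.
FLIGHT_COLUMNS = (
    "baro_altitude", "callsign", "latitude", "longitude",
    "on_ground", "origin_country", "time_position", "velocity",
)

# Hypothetical stand-in for an OpenSky state vector with only the attributes used here.
FakeStateVector = namedtuple("FakeStateVector", FLIGHT_COLUMNS)


def state_to_row(state) -> list:
    """Return the state's values in table column order, as Inserter.add_row() expects."""
    return [getattr(state, column) for column in FLIGHT_COLUMNS]


state = FakeStateVector(11277.6, "SWR123", 47.45, 8.56, False, "Switzerland", 1668000000, 230.5)
print(state_to_row(state))
# prints: [11277.6, 'SWR123', 47.45, 8.56, False, 'Switzerland', 1668000000, 230.5]
```

Deriving the row from a single column tuple keeps the table definition and the insert loop from drifting apart when a column is added; missing values stay `None`, so the matching columns must be declared `NULLABLE`.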
## Publish the Hyper database to Tableau Server / Cloud
The `publish_to_server` function first signs into Tableau Server / Cloud. Then, it looks up the project to which the database should be published.

There are two cases for publishing the database to the server:
- No datasource named `datasource_name_on_server` exists on Tableau Server yet. In this case, the script simply creates the initial datasource on Tableau Server. This datasource is needed for the subsequent incremental refreshes, as their data is added to it.
- A datasource named `datasource_name_on_server` already exists on Tableau Server. In this case, the script uses the Hyper Update REST API to insert the data from the database into the respective table of the datasource on Tableau Server/Cloud.

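The two cases can be sketched in plain Python without a server. `plan_publish` and `build_insert_actions` are hypothetical helpers, not part of the sample or of `tableauserverclient`; the action dictionaries use the same keys as the sample's `actions` list, which the script passes to `server.datasources.update_hyper_data(...)` together with a random request ID and the `.hyper` file as payload.

```python
import json
import uuid


def build_insert_actions(table_names, schema="public"):
    """Build one "insert" action per table, copying rows from the uploaded file's table
    into the same-named table of the published datasource (the shape the sample uses)."""
    return [
        {
            "action": "insert",
            "source-schema": schema,
            "source-table": table,
            "target-schema": schema,
            "target-table": table,
        }
        for table in table_names
    ]


def plan_publish(existing_datasource_names, datasource_name):
    """Hypothetical helper mirroring the two cases: create on the first run, update afterwards."""
    return "update" if datasource_name in existing_datasource_names else "create"


request_id = str(uuid.uuid4())  # fresh random ID for each update request, as in the sample
actions = build_insert_actions(["flights"])

print(plan_publish(set(), "flights_data_set"))                 # prints: create
print(plan_publish({"flights_data_set"}, "flights_data_set"))  # prints: update
print(json.dumps(actions, indent=2))
```

Generating the actions from a list of table names would make it straightforward to extend the sample to a multi-table extract, assuming each table exists in both the uploaded file and the published datasource.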
## __Resources__
Check out these resources to learn more:
- [Hyper API documentation](https://help.tableau.com/current/api/hyper_api/en-us/index.html)
- [Hyper Update API documentation](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm)
- [Tableau Server Client Docs](https://tableau.github.io/server-client-python/docs/)
- [REST API documentation](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api.htm)
- [Tableau Tools](https://github.com/bryantbhowell/tableau_tools)
Community-Supported/flights-data-incremental-refresh/flights-data-incremental-refresh.py

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
from tableauhyperapi import HyperProcess, Connection, Telemetry, TableDefinition, TableName, CreateMode, SqlType, Nullability, Inserter
from opensky_api import OpenSkyApi
import tableauserverclient as TSC
import uuid
import argparse

def create_hyper_database_with_flights_data(database_path):
    """
    Leverages the OpenSkyAPI (https://github.com/openskynetwork/opensky-api) to create a
    Hyper database with flights data.
    """
    # Create an instance of the OpenSky API to retrieve data from OpenSky via HTTP.
    opensky = OpenSkyApi()
    # Get the most recent state vectors within the bounding box
    # (min_latitude, max_latitude, min_longitude, max_longitude), roughly Switzerland.
    # Note that we can only call this method every 10 seconds as we are using the free version of the API.
    states = opensky.get_states(bbox=(45.8389, 47.8229, 5.9962, 10.5226))
    # get_states() returns None if the request fails (e.g., because of rate limiting).
    if states is None:
        raise SystemExit("Failed to retrieve flights data from the OpenSky API. Please try again later.")

    # Start up a local Hyper process.
    with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
        # Create a connection to the Hyper process and connect to a hyper file
        # (create the file and replace if it exists).
        with Connection(endpoint=hyper.endpoint, database=database_path, create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
            # Create a table definition with table name "flights" in the "public" schema
            # and columns for flights data.
            table_definition = TableDefinition(
                table_name=TableName("public", "flights"),
                columns=[
                    TableDefinition.Column('baro_altitude', SqlType.double(), Nullability.NULLABLE),
                    # OpenSky reports no callsign (None) if none has been received yet.
                    TableDefinition.Column('callsign', SqlType.text(), Nullability.NULLABLE),
                    TableDefinition.Column('latitude', SqlType.double(), Nullability.NULLABLE),
                    TableDefinition.Column('longitude', SqlType.double(), Nullability.NULLABLE),
                    TableDefinition.Column('on_ground', SqlType.bool(), Nullability.NOT_NULLABLE),
                    TableDefinition.Column('origin_country', SqlType.text(), Nullability.NOT_NULLABLE),
                    TableDefinition.Column('time_position', SqlType.int(), Nullability.NULLABLE),
                    TableDefinition.Column('velocity', SqlType.double(), Nullability.NULLABLE),
                ])
            # Create the flights table.
            connection.catalog.create_table(table_definition)

            # Insert each of the states into the table.
            with Inserter(connection, table_definition) as inserter:
                for s in states.states or []:
                    inserter.add_row([s.baro_altitude, s.callsign, s.latitude, s.longitude, s.on_ground, s.origin_country, s.time_position, s.velocity])
                inserter.execute()

            num_flights = connection.execute_scalar_query(query=f"SELECT COUNT(*) FROM {table_definition.table_name}")
            print(f"Inserted {num_flights} flights into {database_path}.")

def publish_to_server(server_url, tableau_auth, project_name, database_path, datasource_name_on_server):
    """
    Creates the datasource on Tableau Server if it has not yet been created. Otherwise, uses the
    Hyper Update REST API (https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm) to append the data to the datasource.
    """
    # Create a tableauserverclient object to interact with Tableau Server.
    server = TSC.Server(server_url, use_server_version=True)
    # Sign into Tableau Server with the above authentication information.
    with server.auth.sign_in(tableau_auth):
        # Get project_id from project_name.
        matching_projects = server.projects.filter(name=project_name)
        project_id = next((project.id for project in matching_projects if project.name == project_name), None)
        if project_id is None:
            # Abort with a non-zero exit status; leaving the with block still signs out.
            raise SystemExit(f"Publish failed. The specified project '{project_name}' does not exist.")

        # Get the datasource from Server (if it exists).
        matching_datasources = server.datasources.filter(name=datasource_name_on_server)
        datasource = next((ds for ds in matching_datasources), None)

        if datasource is None:
            # If the datasource does not exist on server, publish the datasource.
            publish_mode = TSC.Server.PublishMode.CreateNew
            datasource = TSC.DatasourceItem(project_id)
            # Set the name of the datasource such that it can be easily identified.
            datasource.name = datasource_name_on_server
            datasource = server.datasources.publish(datasource, database_path, publish_mode)
            print(f"New datasource published: (id: {datasource.id}, name: {datasource.name}).")
        else:
            # If the datasource already exists on Tableau Server, use the Hyper Update REST API
            # to send the delta to Tableau Server and insert the data into the respective table
            # in the datasource.

            # Create a new random request id.
            request_id = str(uuid.uuid4())

            # Create one action that inserts from the new table into the existing table.
            # For more details, see https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm#action-batch-descriptions
            actions = [
                {
                    "action": "insert",
                    "source-schema": "public",
                    "source-table": "flights",
                    "target-schema": "public",
                    "target-table": "flights",
                }
            ]

            # Start the update job on Server.
            job = server.datasources.update_hyper_data(datasource.id, request_id=request_id, actions=actions, payload=database_path)
            print(f"Update job posted (ID: {job.id}). Waiting for the job to complete...")

            # Wait for the job to finish.
            job = server.jobs.wait_for_job(job)
            print("Job finished successfully")


if __name__ == '__main__':
    argparser = argparse.ArgumentParser(description="Incremental refresh with flights data.")
    argparser.add_argument("server_url", help="The URL of Tableau Server / Cloud, e.g. 'https://us-west-2a.online.tableau.com'")
    argparser.add_argument("site_name", help="The name of your site as it appears in the site's URL, e.g., use an empty string ('') for the default site on Tableau Server. On Tableau Cloud, you must always provide the site name.")
    argparser.add_argument("project_name", help="The name of your project, e.g., use 'Default' for your default project.")
    argparser.add_argument("token_name", help="The name of your authentication token for Tableau Server/Cloud. See this url for more details: https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm")
    argparser.add_argument("token_value", help="The value of your authentication token for Tableau Server/Cloud. See this url for more details: https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm")
    args = argparser.parse_args()

    # First create the hyper database with flights data.
    database_path = "flights.hyper"
    create_hyper_database_with_flights_data(database_path)

    # Then publish the data to server.
    datasource_name_on_server = 'flights_data_set'
    # Create credentials to sign into Tableau Server.
    tableau_auth = TSC.PersonalAccessTokenAuth(args.token_name, args.token_value, args.site_name)
    publish_to_server(args.server_url, tableau_auth, args.project_name, database_path, datasource_name_on_server)
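On every update run, the `insert` action appends the complete content of the newly uploaded `flights` table to the published one. The snippet below is a rough stand-in for that behavior built on Python's standard `sqlite3` module rather than Hyper or Tableau Server: an in-memory database plays the published datasource, an attached in-memory database plays the uploaded `.hyper` file, and `INSERT INTO ... SELECT` plays the action. It only illustrates the append semantics, not how the server executes the action; the rows are made up.

```python
import sqlite3

# Stand-in for the datasource already published on the server (SQLite, not Hyper).
server_side = sqlite3.connect(":memory:")
server_side.execute("CREATE TABLE flights (callsign TEXT, time_position INTEGER)")
server_side.execute("INSERT INTO flights VALUES ('SWR1', 100), ('EZY2', 100)")

# Stand-in for the freshly created flights.hyper file that is uploaded as the payload.
server_side.execute("ATTACH DATABASE ':memory:' AS delta")
server_side.execute("CREATE TABLE delta.flights (callsign TEXT, time_position INTEGER)")
server_side.execute("INSERT INTO delta.flights VALUES ('SWR1', 110), ('DLH3', 110)")

# Analogue of {"action": "insert", "source-table": "flights", "target-table": "flights"}:
# append all source rows to the target; existing rows are left untouched.
server_side.execute("INSERT INTO main.flights SELECT * FROM delta.flights")

print(server_side.execute("SELECT COUNT(*) FROM main.flights").fetchone()[0])  # prints: 4
```

Because each run uploads a full snapshot of the current state vectors, the same aircraft (here `SWR1`) appears once per run, so the published table grows into a history of snapshots rather than a deduplicated list of flights.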
Community-Supported/flights-data-incremental-refresh/requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
tableauhyperapi>=0.0.14946
tableauserverclient>=0.19.0
https://github.com/openskynetwork/opensky-api/archive/master.zip#subdirectory=python
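The first two lines pin minimum versions with `>=`. As a rough sketch of what that constraint means, the hypothetical helpers below compare purely numeric dotted versions; they assume no pre-release or local suffixes (pip itself uses the full PEP 440 rules, available in Python through the `packaging` library).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a purely numeric dotted version such as '0.0.14946' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` satisfies `>= minimum` under the simplified rules above."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '0.19' compares equal to '0.19.0'.
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


MINIMUMS = {"tableauhyperapi": "0.0.14946", "tableauserverclient": "0.19.0"}
print(meets_minimum("0.19", MINIMUMS["tableauserverclient"]))   # prints: True
print(meets_minimum("0.0.14000", MINIMUMS["tableauhyperapi"]))  # prints: False
```

To check a real environment, the installed version string can be obtained with `importlib.metadata.version("tableauserverclient")` (Python 3.8+).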

Community-Supported/hyper-to-csv/hyper-to-csv.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
  TableName, \
  HyperException

- # An example of how to turn a .hyper file into a csv to fit within potiential ETL workflows.
+ # An example of how to turn a .hyper file into a csv to fit within potential ETL workflows.

  """
  Note: you need to follow the pantab documentation to make sure columns line up with the
