98 commits
6fd7db9
feat: tf gcs-bq-ingest module sets bq permissions
Nov 23, 2020
d2f00ce
fixup roles
Nov 23, 2020
65d9515
fixup dockerfil ci check
Nov 24, 2020
7fdffd7
docs: add note on unicode delimiters
Nov 24, 2020
537f05d
fix: update nested values in configs
Nov 30, 2020
125ca9f
chore: improve error message for wrong external table name (#200)
Nov 30, 2020
02458b8
fix: external configs not found in parent dirs
Nov 25, 2020
1525102
time series UDFs (#198)
pdunn Dec 1, 2020
cfe8f8c
Adding helper assets for JMeter performance testing on BigQuery (#203)
danieldeleo Dec 1, 2020
d951ebf
feat: bq project env-var
Dec 2, 2020
c77908f
Merge branch 'feat/bq-project-env-var'
Dec 3, 2020
1768414
Merge branch 'fix/configDiscovery'
Dec 3, 2020
ddf3391
Merge branch 'docs/delimiters'
Dec 3, 2020
abbf5ea
Merge branch 'tf/multi-project'
Dec 3, 2020
cf23f1b
Move utility methods into a utils module
Dec 4, 2020
b52d291
Fix sorting issues
Dec 4, 2020
3276214
Move out constants into their own file
Dec 4, 2020
117d91b
fixup! pylint
Dec 5, 2020
3770c84
fixup! fixup! gcb pylint issue
Dec 5, 2020
8b1982b
Merge pull request #1 from jaketf/feat/move-utils
Dec 5, 2020
81bb167
feat: sequencing with backlog publisher / subscriber
Dec 4, 2020
3c798f7
fixup! mypy pylint
Dec 9, 2020
c631150
fixup! flake8
Dec 9, 2020
d5fabfa
fixup! mypy tests
Dec 9, 2020
1c26e23
support _config/*.sql for bq tranform sql
Dec 9, 2020
d16fb1b
improve performance of wait_on_bq_job
Dec 9, 2020
c627af0
wip
Dec 10, 2020
35e26d9
fixup! handle race condition
Dec 10, 2020
8c97f5a
ordering docs and isort single line rule
Dec 10, 2020
70d2d2b
docs
Dec 10, 2020
6ec3625
fixup linters
Dec 10, 2020
2d0e5a8
fixup import style
Dec 10, 2020
7cb00e4
typing isort single line exclusion
Dec 10, 2020
0be46f9
fixup gcb no-name-in-module bug
Dec 10, 2020
9a0ee10
add test of subscriber after subscriber exit
Dec 10, 2020
feb867e
chores: tf updates, larger machine type, etc.
Dec 11, 2020
2218212
terraform fmt
Dec 11, 2020
d528d85
handle abandoned _BACKFILL and other review feedback
Dec 11, 2020
a0114e1
improve tests
Dec 12, 2020
def1ddb
fix: handle long running bq jobs
Dec 14, 2020
ddaf280
chore: add e2e test, fixup terraform
Dec 15, 2020
c18e5e9
ignore pylint redherring import errors
Dec 15, 2020
2c4376a
fixup! e2e tf to support builds where short_sha is set to empty string.
Dec 15, 2020
b6690af
fix TF_VAR env var
Dec 15, 2020
36be628
enable resource manager api
Dec 15, 2020
6103743
enable cloud functions api...
Dec 15, 2020
edcdae5
add unit test timeout
Dec 15, 2020
63f480d
explicit local backend
Dec 15, 2020
03d9b79
debug missing state file
Dec 15, 2020
fa82f12
debug
Dec 15, 2020
d1acf9e
relative state path
Dec 15, 2020
b9e741c
typo .[tf]state
Dec 15, 2020
dadacaa
fixup docs
Dec 15, 2020
41f04ae
chore: clean up subscriber
Dec 15, 2020
d8ae3cf
fix: don't try to regex match _backlog/* items
Dec 15, 2020
d9f3482
don't regex match in triage if ordering enabled (this happens later)
Dec 15, 2020
7d2f28f
fix: subscriber monitor get table prefix
Dec 15, 2020
35fe6e3
fix: get_table_prefix issues w/ backlog, backfill and historydone
Dec 15, 2020
d93a2c9
fix: look_for_config_in_parents should return empty string for empty …
Dec 15, 2020
d50fefc
fix table prefix w/ trailing slash
Dec 15, 2020
b16a8b0
use get_table_prefix instead of removesuffix
Dec 15, 2020
f685511
chore: refactor terraform into pytest fixture to always clean up
Dec 16, 2020
905949d
fix don't removesuffix for start backfill file
Dec 16, 2020
675c756
fixup isort
Dec 16, 2020
f0ebcd0
more logging statements fail on untriageable event
Dec 16, 2020
b83fee8
fix pylint
Dec 16, 2020
c9263b7
Merge branch 'master' into sequencing-develop
Dec 16, 2020
eae687f
feat: env-var t numDmlRowsAffected = 0 as a failure
Dec 16, 2020
94136b6
[skip ci] add comment to cloudbuild.yaml
Dec 16, 2020
790abb1
[skip ci] update comment in cloudbuild.yaml
Dec 16, 2020
94ca2f6
chore: clean up unused fixture, init files
Dec 16, 2020
b216d88
chore: improve terraform printint in pytest fixture
Dec 16, 2020
d5fe02b
better bq job ids
Dec 16, 2020
fcb88a0
fixup regex escaping
Dec 16, 2020
85cea34
make pylint happy
Dec 16, 2020
f7af0fb
[skip ci] more docs
Dec 16, 2020
7971bc3
fix default load config return type
Dec 17, 2020
de19c98
fix: fail on failure of children jobs
Jan 7, 2021
61d2c14
chore: add test for child job failing behavior
Jan 8, 2021
fb69a6a
fixup flake8
Jan 8, 2021
1aec908
fixup flake8
Jan 8, 2021
0490217
fixup flake8
Jan 8, 2021
3c3bd3d
feat: separate bq storage and compute project env vars
Jan 11, 2021
9e8e52f
fix: don't require escaping braces in sql, still support {dest_datase…
Jan 22, 2021
854aa68
happy newyear! copyright 2020 -> 2021
Jan 22, 2021
09daa9d
clean up newlines in logs / error messages
Feb 18, 2021
8821dc0
improve logging in bq failures
Mar 2, 2021
33ae329
fixup flake8
Mar 3, 2021
57443e0
fixup mypy
Mar 3, 2021
97f48a7
fixup pylint
Mar 3, 2021
03fb42c
Merge pull request #6 from jaketf/cf-improve-logging
Mar 3, 2021
7293f32
fixup BigQueryJobFailure docstring
Mar 3, 2021
21bd43c
Merge pull request #7 from jaketf/cf-improve-logging
Mar 3, 2021
91dd8af
FEATURE: Snapshot the table once a chunk has successfully loaded
RyandenOtter Mar 23, 2021
29b2412
Changing to a copy until the snapshotting feature is enabled
RyandenOtter Mar 24, 2021
db6a98e
Make the SNAPSHOT_DATSET and ENABLE_SNAPSHOTTING constants environment
RyandenOtter Mar 24, 2021
145c2af
force enable snapshotting
RyandenOtter Mar 24, 2021
5f69020
setting the snapshotting as enabled and included in tests by default
RyandenOtter Mar 25, 2021
128 changes: 128 additions & 0 deletions performance_testing/jmeter/README.md
@@ -0,0 +1,128 @@
# Using JMeter for BigQuery Performance Testing

## Before You Start

Make sure you've completed the following prerequisite steps before running the
provided JMeter test plans:

* Install the
[Java 8+ Oracle JDK](https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html)
from Oracle's download page
* Download the
[Simba BigQuery JDBC Driver](https://cloud.google.com/bigquery/providers/simba-drivers)
* Download the latest
[JMeter Binary](https://jmeter.apache.org/download_jmeter.cgi)

## Which JMeter Test Plan Do I Use?

### [bigquery_jdbc_sampler.jmx](bigquery_jdbc_sampler.jmx) (Runs queries using JDBC driver)

#### Pros

* **Long-running job polling** - The JDBC request sampler is necessary for
tests where queries run longer than 4 minutes and where a consistent
concurrency level must be maintained. The JDBC driver will poll the query
job until it is finished before submitting a new query, ensuring that JMeter
active threads exactly match active BigQuery query jobs.
* **Simpler query format** - The JDBC request sampler does not require you to
form a JSON configuration object to submit the query to the API. This
eliminates JSON errors as a source of problems.
* **Unescaped double quotes allowed in SQL queries** - You do not have to
escape double quotes in your SQL queries, as is required in the HTTP
sampler.

#### Cons

* **JDBC overhead latency** - The JDBC driver adds some latency compared with
calling the REST API directly. To measure query runtime alone, excluding
network and other client-side latency, use the BigQuery-provided
[INFORMATION_SCHEMA.JOBS_BY*](https://cloud.google.com/bigquery/docs/information-schema-jobs)
view.
* **BigQuery job labels unsupported** - You cannot currently set labels for
jobs submitted by the JDBC driver. To get an effect similar to labeling,
include something like a JSON object in a comment in each query, which can
then be parsed when querying the
[INFORMATION_SCHEMA.JOBS_BY*](https://cloud.google.com/bigquery/docs/information-schema-jobs)
view.
* **Response rows must be returned** - The JDBC driver does not support an
option to return 0 results. The MaxResults JDBC config should therefore be
set to 1, since the default setting of 0 instructs the JDBC driver to return
all rows.
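
The comment-based labeling workaround described above can be sketched roughly as follows. This is a minimal illustration, not part of the provided test plans: the helper names `tag_query` and `extract_labels` and the `/* {...} */` comment format are assumptions made for this example, and it assumes the comment is preserved in the query text exposed by the `INFORMATION_SCHEMA.JOBS_BY*` views.

```python
import json
import re

# Hypothetical convention: a block comment whose body is a single JSON object.
# Label values must not contain "*/", since that would end the comment early.
_LABEL_COMMENT = re.compile(r"/\*\s*(\{.*?\})\s*\*/", re.DOTALL)


def tag_query(sql: str, labels: dict) -> str:
    """Prefix the SQL text with a block comment holding the labels as JSON."""
    return f"/* {json.dumps(labels, sort_keys=True)} */\n{sql}"


def extract_labels(query_text: str) -> dict:
    """Recover the labels from tagged query text; return {} if none is found."""
    match = _LABEL_COMMENT.search(query_text)
    if match is None:
        return {}
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return {}


tagged = tag_query("SELECT 1", {"test_run": "baseline", "query_id": "q01"})
print(tagged)
print(extract_labels(tagged))
```

The tagged text is what you would put in the JMeter query input, and `extract_labels` could be applied to query text exported from the jobs view to group results by test run.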

### [bigquery_http_sampler.jmx](bigquery_http_sampler.jmx) (Runs queries using REST API)

#### Pros

* **Fully configurable job options, including job labels** - The HTTP request
sampler allows you to specify the raw JSON request body which can include
any supported BigQuery options. In particular, it's very useful to include
query labels, since these will be present in the
[jobs metadata schema](https://cloud.google.com/bigquery/docs/information-schema-jobs#schema)
in the labels field.
* **Faster performance** - Since JMeter makes REST calls directly to the
BigQuery API, it avoids the overhead of invoking the API through the Java
JDBC driver.

#### Cons

* **Default 1 hour maximum lifetime for access tokens** - The HTTP request
sampler uses an access token (which you provide as a command-line parameter
at startup) to authenticate with BigQuery. The default maximum lifetime of a
Google access token is 1 hour (3,600 seconds). However, you can extend the
maximum lifetime to 12 hours by
[modifying the organization policy](https://cloud.google.com/resource-manager/docs/organization-policy/restricting-service-accounts#extend_oauth_ttl).
JMeter calls to BigQuery APIs will start failing if your JMeter test runs
longer than your access token’s maximum lifetime.
* **JSON body configuration** - You need to configure the API request payload
using JSON, and the JSON object configuration is easy to break. A stray
quote or a missing comma can make your query fail in ways that are hard to
troubleshoot.
* **Queries must have all double quotes escaped** - Since the SQL queries
you pass to JMeter are values inside the HTTP request JSON body, you
must escape all double quotes that appear in the SQL query with a
backslash (e.g. `SELECT \"Hello World\"`).
* **4-minute maximum timeout** - If a query runs for longer than 4 minutes, it
can appear to be done in JMeter before it has actually finished. If you
intend to use JMeter's data to characterize the runtime of your queries,
this is a critical consideration: the results will be wrong for
long-running queries.
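
One way to avoid hand-escaping mistakes in the JSON body is to generate it with a JSON serializer instead of typing it. The sketch below is only an illustration: `build_query_body` is a hypothetical helper, and the field names (`query`, `useLegacySql`, `labels`) follow the BigQuery `jobs.query` request body, which is an assumption about what the provided test plan sends.

```python
import json


def build_query_body(sql: str, labels: dict) -> str:
    """Serialize a query request body; json.dumps escapes the double quotes."""
    body = {
        "query": sql,
        "useLegacySql": False,
        "labels": labels,
    }
    return json.dumps(body)


sql = 'SELECT "Hello World" AS greeting'
payload = build_query_body(sql, {"test_run": "baseline"})
print(payload)

# Round-trip check: decoding the payload yields the original SQL text.
assert json.loads(payload)["query"] == sql
```

The printed payload contains `\"Hello World\"`, which is the escaped form the HTTP sampler needs, and the labels object rides along in the same body.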

## Running the JMeter Test Plan

The JMeter test plans provided in this repo are designed to be run with very few
modifications. To simplify troubleshooting, test-run them as provided before
making any further changes.

### [run_jmeter_jdbc_sampler.sh](run_jmeter_jdbc_sampler.sh) (**Runs bigquery_jdbc_sampler.jmx**)

1. Replace the bash script placeholders shown below with your own values:
* `-Jproject_id=`*YOUR_PROJECT*
* `-Juser.classpath=`*/path/to/your/SimbaJDBCDriverforGoogleBigQuery*
1. Ensure proper authentication is set up for either service account or user
account authentication:
* Service account authentication: \
`export GOOGLE_APPLICATION_CREDENTIALS=`*/path/to/your/private_key.json*
* User account authentication: \
`gcloud auth application-default login`
1. Run the bash helper script to begin the JMeter test
* `bash run_jmeter_jdbc_sampler.sh`

### [run_jmeter_http_sampler.sh](run_jmeter_http_sampler.sh) (**Runs bigquery_http_sampler.jmx**)

1. Replace the bash script placeholders shown below with your own values:
* `-Jproject_id=`*YOUR_PROJECT*
1. Ensure proper authentication is set up for either service account or user
account authentication:
* Service account authentication: \
`gcloud auth activate-service-account
--key-file=`*/path/to/your/private_key.json*
* User account authentication: \
`gcloud auth login`
1. Run the bash helper script to begin the JMeter test
* `bash run_jmeter_http_sampler.sh`

## Inspecting the JMeter Test Plans

The best way to view and understand the JMeter test plans is to open them in JMeter's GUI mode as shown below:
* `./apache-jmeter-5.3/bin/jmeter -t bigquery_jdbc_sampler.jmx`
* `./apache-jmeter-5.3/bin/jmeter -t bigquery_http_sampler.jmx`