This community extension allows DuckDB to query data from Google BigQuery using a mix of BigQuery Storage (Read/Write) and REST API. It enables users to access, manage, and manipulate their BigQuery datasets/tables directly from DuckDB using standard SQL queries. Inspired by official DuckDB RDBMS extensions like MySQL, PostgreSQL, and SQLite, this extension offers a similar feel. See Important Notes for disclaimers and usage information.
This extension supports the following builds:
linux_amd64,linux_arm64,linux_amd64_musl,osx_amd64,osx_arm64, andwindows_amd64. The buildswasm_mvp,wasm_eh,wasm_threads, andwindows_amd64_mingware not supported.
To authenticate using your Google Account, first install the Google Cloud CLI (gcloud). Download the latest version from the Google Cloud CLI installation page and follow the instructions to select and authenticate your Google Cloud project for using BigQuery.
After installation, run the following command to authenticate and follow the steps along:
gcloud auth application-default loginYou can store your BigQuery credentials securely using DuckDB's built-in secrets management system. This eliminates the need to manage environment variables and provides better security through encrypted storage.
Using Service Account JSON:
-- Store service account credentials as a persistent secret
CREATE SECRET my_bigquery_secret (
    TYPE bigquery,
    PROVIDER service_account,
    SCOPE 'bigquery://my-project-id',
    json '{ "type": "service_account", "project_id": "....", "private_key_id": "...." }'
);
-- Now use BigQuery normally - credentials are automatically used
ATTACH 'project=my-project-id' AS bq (TYPE bigquery);Using Individual Credentials:
-- Store individual credential components
CREATE SECRET my_bigquery_secret (
    TYPE bigquery,
    PROVIDER config,
    SCOPE 'bigquery://my-project-id',
    project_id 'my-project-id',
    key_id 'my-service-account@my-project.iam.gserviceaccount.com',
    secret '-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n'
);Benefits of DuckDB Secrets:
- No environment variable management required
- Credentials stored encrypted in DuckDB
- Project-specific secrets with scope matching
- Persistent across sessions
- Seamless integration with all BigQuery functions
For more details on using secrets, see the Authentication with Secrets section below.
Alternatively, you can authenticate using a service account JSON file with an environment variable. First, create a service account in the Google Cloud Console, assign the necessary roles, and download the JSON key file. Next, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the file path. For example:
# On Linux or macOS
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/my/service-account-credentials.json"
# On Windows
set GOOGLE_APPLICATION_CREDENTIALS="C:\path\to\my\service-account-credentials.json"On Windows, gRPC requires an additional environment variable to configure the trust store for SSL certificates. Download and configure it using (see official documentation):
@powershell -NoProfile -ExecutionPolicy unrestricted -Command ^
    (new-object System.Net.WebClient).Downloadfile( ^
        'https://pki.google.com/roots.pem', 'roots.pem')
set GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=%cd%\roots.pemThis downloads the roots.pem file and sets the GRPC_DEFAULT_SSL_ROOTS_FILE_PATH environment variable to its location.
The BigQuery extension can be installed from the official Community Extension Repository, eliminating the need to enable the unsigned mode. Just use the following command to install and load the extension:
-- Install and load the DuckDB BigQuery extension from the Community Repository
FORCE INSTALL 'bigquery' FROM community;
LOAD 'bigquery';Note: Windows user require an additional step to configure the gRPC SSL certificates (see here).
After loading the extension, you can connect to your BigQuery project using the ATTACH statement. Replace my_gcp_project with the name of your actual Google Cloud Project. Here is an example:
-- Attach to your BigQuery Project
D ATTACH 'project=my_gcp_project' as bq (TYPE bigquery, READ_ONLY);
-- Show all tables in all datasets in the attached BigQuery project
D SHOW ALL TABLES;
ββββββββββββ¬βββββββββββββββββββ¬βββββββββββ¬βββββββββββββββ¬ββββββββββββββββββββ¬ββββββββββββ
β database β      schema      β   name   β column_names β   column_types    β temporary β
β varchar  β     varchar      β  varchar β  varchar[]   β     varchar[]     β  boolean  β
ββββββββββββΌβββββββββββββββββββΌβββββββββββΌβββββββββββββββΌββββββββββββββββββββΌββββββββββββ€
β bq       β quacking_dataset β duck_tbl β [i, s]       β [BIGINT, VARCHAR] β false     β
| bq       | barking_dataset  | dog_tbl  | [i, s]       | [BIGINT, VARCHAR] β false     |
ββββββββββββ΄βββββββββββββββββββ΄βββββββββββ΄βββββββββββββββ΄ββββββββββββββββββββ΄ββββββββββββ
-- Select data from a specific table in BigQuery
D SELECT * FROM bq.quacking_dataset.duck_tbl;
βββββββββ¬βββββββββββββββββ
β   i   β       s        β
β int32 β    varchar     β
βββββββββΌβββββββββββββββββ€
β    12 β quack π¦       β
β    13 β quack quack π¦ β
βββββββββ΄βββββββββββββββββDepending on the number of schemas and tables, initializing the BigQuery catalog may take some time. However, once initialized, the tables are cached. To speed up this process, you also focus the loading process on a particular dataset by specifying a dataset= parameter as follows.
-- Attach to your BigQuery Project
D ATTACH 'project=my_gcp_project dataset=quacking_dataset' as bq (TYPE bigquery);
-- Show all tables in all datasets in the attached BigQuery project
D SHOW ALL TABLES;
ββββββββββββ¬βββββββββββββββββββ¬βββββββββββ¬βββββββββββββββ¬ββββββββββββββββββββ¬ββββββββββββ
β database β      schema      β   name   β column_names β   column_types    β temporary β
β varchar  β     varchar      β  varchar β  varchar[]   β     varchar[]     β  boolean  β
ββββββββββββΌβββββββββββββββββββΌβββββββββββΌβββββββββββββββΌββββββββββββββββββββΌββββββββββββ€
β bq       β quacking_dataset β duck_tbl β [i, s]       β [BIGINT, VARCHAR] β false     β
ββββββββββββ΄βββββββββββββββββββ΄βββββββββββ΄βββββββββββββββ΄ββββββββββββββββββββ΄ββββββββββββWhen working with BigQuery, you may need to separate storage and compute across different GCP projects. You can achieve this by using the billing_project parameter with the ATTACH command:
-- Attach to a storage project while billing compute to a different project
D ATTACH 'project=my_storage_project billing_project=my_compute_project' AS bq (TYPE bigquery, READ_ONLY);
-- Query data from the storage project, billed to the compute project
D SELECT * FROM bq.quacking_dataset.duck_tbl WHERE i = 12;
βββββββββ¬βββββββββββββββββ
β   i   β       s        β
β int32 β    varchar     β
βββββββββΌβββββββββββββββββ€
β    12 β quack π¦       β
βββββββββ΄βββββββββββββββββIn this configuration:
- project=storage-projectspecifies where your data is stored
- billing_project=compute-projectspecifies which project will be billed for query execution and compute resources
This approach allows you to maintain clear separation between data storage costs and compute costs across different GCP projects.
The following SQL statements provide a brief overview of supported functionalities and include examples for interacting with BigQuery:
ATTACH 'project=my_gcp_project' as bq (TYPE bigquery);
-- Create a BigQuery dataset
CREATE SCHEMA bq.some_dataset;
-- Create a BigQuery table in some dataset
CREATE TABLE bq.some_dataset.tbl(id INTEGER, some_string VARCHAR);
-- Insert values into the table
INSERT INTO bq.some_dataset.tbl VALUES (42, "my quacking string");
-- Retrieves rows from the table
SELECT some_string FROM bq.some_dataset.tbl;
-- Drop a BigQury table in some dataset
DROP TABLE IF EXISTS bq.some_dataset.tbl;
-- Drop a BigQuery dataset
DROP SCHEMA IF EXISTS bq.some_dataset;
-- Altering tables - rename table
ALTER TABLE bq.some_dataset.tbl RENAME TO tbl_renamed;
-- Altering tables - rename column
ALTER TABLE bq.some_dataset.tbl RENAME COLUMN i TO i2;
-- Altering tables - add column
ALTER TABLE bq.some_dataset.tbl ADD COLUMN j INTEGER;
-- Altering tables - drop column
ALTER TABLE bq.some_dataset.tbl DROP COLUMN i;
-- Altering tables - change column type
ALTER TABLE bq.some_dataset.tbl ALTER COLUMN i TYPE DOUBLE;
-- Altering tables - drop not null condition
ALTER TABLE bq.some_dataset.tbl ALTER COLUMN i DROP NOT NULL;The bigquery_scan function provides direct, efficient reads from a single table within your BigQuery project. This function is ideal for simple reads where no complex SQL is required, and it supports simple projection pushdown from DuckDB.
If you would rather query just one table directly instead of attaching all tables, you can achieve this by directly using the bigquery_scan function, such as:
D SELECT * FROM bigquery_scan('my_gcp_project.quacking_dataset.duck_tbl');
βββββββββ¬βββββββββββββββββ¬βββββββββββββββββββββββββββ
β   i   β       s        β        timestamp         β
β int32 β    varchar     β        timestamp         β
βββββββββΌβββββββββββββββββΌβββββββββββββββββββββββββββ€
β    12 β quack π¦       β 2024-03-21 08:01:02 UTC  β
β    13 β quack quack π¦ β 2024-05-19 10:25:44 UTC  β
βββββββββ΄βββββββββββββββββ΄βββββββββββββββββββββββββββBy default,
bigquery_scannow uses the optimized Arrow-based implementation (formerlybigquery_arrow_scan). You can setuse_legacy_scan=trueto use the legacy implementation if needed. For backwards compatibility, the separatebigquery_arrow_scanfunction is still available and now behaves identically to the defaultbigquery_scan.
The function supports filter pushdown by accepting row restriction filter statements as an optional argument. These filters are passed directly to BigQuery and restrict which rows are transfered from the source table. For example:
D SELECT * FROM bigquery_scan('my_gcp_project.quacking_dataset.duck_tbl', filter='i=13 AND DATE(timestamp)=DATE(2023, 5, 19)'));
βββββββββ¬βββββββββββββββββ¬βββββββββββββββββββββββββββ
β   i   β       s        β        timestamp         β
β int32 β    varchar     β        timestamp         β
βββββββββΌβββββββββββββββββΌβββββββββββββββββββββββββββ€
β    13 β quack quack π¦ β 2024-05-19 10:25:44 UTC  β
βββββββββ΄βββββββββββββββββ΄βββββββββββββββββββββββββββThe filter syntax follows the same rules as the row_restriction field in BigQuery's Storage Read API. |
While bigquery_scan offers high-speed data retrieval, it does not support reading from views or external tables due to limitations of the Storage Read API. For those cases, consider using the bigquery_query function, which allows more complex querying capabilities.
The bigquery_scan function supports the following named parameters:
| Parameter | Type | Description | 
|---|---|---|
| filter | VARCHAR | Row restriction filter statements passed directly to BigQuery Storage Read API. | 
| use_legacy_scan | BOOLEAN | Use legacy scan implementation: true(legacy) orfalse(optimized, default). | 
| billing_project | VARCHAR | Project ID to bill for query execution (useful for public datasets). | 
| api_endpoint | VARCHAR | Custom BigQuery API endpoint URL. | 
| grpc_endpoint | VARCHAR | Custom BigQuery Storage gRPC endpoint URL. | 
The bigquery_query function allows you to run custom GoogleSQL read queries within your BigQuery project. Like bigquery_scan, it uses the optimized Arrow-based implementation by default, with the option to use use_legacy_scan=true for the legacy implementation if needed. This function is especially useful to get around the limitations of the BigQuery Storage Read API, such as reading from views or external tables.
D SELECT * FROM bigquery_query('my_gcp_project', 'SELECT * FROM `my_gcp_project.quacking_dataset.duck_tbl`');
βββββββββ¬βββββββββββββββββ
β   i   β       s        β
β int32 β    varchar     β
βββββββββΌβββββββββββββββββ€
β    12 β quack π¦       β
β    13 β quack quack π¦ β
βββββββββ΄βββββββββββββββββNote: If your goal is straightforward table reads,
bigquery_scanis often more efficient, as it bypasses the SQL layer for direct data access. However,bigquery_queryis ideal when you need to execute custom SQL that requires the full querying capabilities of BigQuery expressed in GoogleSQL. In this case, BigQuery transparently creates an anonymous temporary result table, which is fetched using the selected scan engine.
The dry_run parameter allows you to validate a query without executing it. This is useful for estimating query costs and checking syntax before running expensive queries:
D SELECT * FROM bigquery_query('my_gcp_project', 'SELECT * FROM `my_gcp_project.quacking_dataset.duck_tbl`', dry_run=true);
βββββββββββββββββββββββββ¬ββββββββββββ¬βββββββββββ
β total_bytes_processed β cache_hit β location β
β        int64          β  boolean  β varchar  β
βββββββββββββββββββββββββΌββββββββββββΌβββββββββββ€
β                    54 β false     β US       β
βββββββββββββββββββββββββ΄ββββββββββββ΄βββββββββββThe bigquery_query function supports the following named parameters:
| Parameter | Type | Description | 
|---|---|---|
| use_legacy_scan | BOOLEAN | Use legacy scan implementation: true(legacy) orfalse(optimized, default). | 
| dry_run | BOOLEAN | When true, validates the query without executing it. Returns metadata:total_bytes_processed,cache_hit, andlocation. | 
| billing_project | VARCHAR | Project ID to bill for query execution (useful for public datasets). | 
| api_endpoint | VARCHAR | Custom BigQuery API endpoint URL. | 
| grpc_endpoint | VARCHAR | Custom BigQuery Storage gRPC endpoint URL. | 
The bigquery_execute function runs arbitrary GoogleSQL queries directly in BigQuery. These queries are executed without interpretation by DuckDB. The call is synchronous and returns a result with details about the query execution, like the following.
D ATTACH 'project=my_gcp_project' as bq (TYPE bigquery);
D CALL bigquery_execute('bq', '
    CREATE SCHEMA deluxe_dataset
    OPTIONS(
        location="us",
        default_table_expiration_days=3.75,
        labels=[("label1","value1"),("label2","value2")]
    )
');
βββββββββββ¬βββββββββββββββββββββββββββββββββββ¬ββββββββββββββββββ¬βββββββββββ¬βββββββββββββ¬ββββββββββββββββββββββββ¬ββββββββββββββββββββββββ
β success β             job_id               β    project_id   β location β total_rows β total_bytes_processed β num_dml_affected_rows β
β boolean β             varchar              β     varchar     β varchar  β   uint64   β         int64         β        varchar        β
βββββββββββΌβββββββββββββββββββββββββββββββββββΌββββββββββββββββββΌβββββββββββΌβββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββ€
β true    β job_-Xu_D2wxe2Xjh-ArZNwZ6gut5ggi β my_gcp_project  β US       β          0 β                     0 β 0                     β
βββββββββββ΄βββββββββββββββββββββββββββββββββββ΄ββββββββββββββββββ΄βββββββββββ΄βββββββββββββ΄ββββββββββββββββββββββββ΄ββββββββββββββββββββββββSimilar to bigquery_query, the dry_run parameter allows you to validate queries without executing them:
D CALL bigquery_execute('my_gcp_project', 'SELECT * FROM `my_gcp_project.quacking_dataset.duck_tbl`', dry_run=true);
βββββββββββββββββββββββββ¬ββββββββββββ¬βββββββββββ
β total_bytes_processed β cache_hit β location β
β        int64          β  boolean  β varchar  β
βββββββββββββββββββββββββΌββββββββββββΌβββββββββββ€
β                    54 β false     β US       β
βββββββββββββββββββββββββ΄ββββββββββββ΄βββββββββββThe bigquery_execute function supports the following named parameters:
| Parameter | Type | Description | 
|---|---|---|
| dry_run | BOOLEAN | When true, validates the query without executing it. Returns metadata:total_bytes_processed,cache_hit, andlocation. | 
| api_endpoint | VARCHAR | Custom BigQuery API endpoint URL. | 
| grpc_endpoint | VARCHAR | Custom BigQuery Storage gRPC endpoint URL. | 
The bigquery_jobs fucntion retrieves a list of jobs within the specified project. It displays job metadata such as
job state, start and end time, configuration, statistics, and many more. More information on the job information can be found here.
D ATTACH 'project=my_gcp_project' as bq (TYPE bigquery);
D SELECT * FROM bigquery_jobs('bq', maxResults=2);
βββββββββββββ¬βββββββββββββββββββββββ¬ββββββββββββ¬ββββ¬βββββββββββββββββββββββ¬βββββββββββββββββββ
β   state   β        job_id        β  project  β β¦ β    configuration     β      status      β
β  varchar  β       varchar        β  varchar  β   β         json         β       json       β
βββββββββββββΌβββββββββββββββββββββββΌββββββββββββΌββββΌβββββββββββββββββββββββΌβββββββββββββββββββ€
β Completed β job_zAAv42SdMT51qkβ¦  β my_gcp_pβ¦ β β¦ β {"query":{"query":β¦  β {"state":"DONE"} β
β Completed β job_ro2WURJlGlkXCCβ¦  β my_gcp_pβ¦ β β¦ β {"query":{"query":β¦  β {"state":"DONE"} β
βββββββββββββ΄βββββββββββββββββββββββ΄ββββββββββββ΄ββββ΄βββββββββββββββββββββββ΄βββββββββββββββββββ€
β 2 rows                                                                16 columns (5 shown) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββThe operation supports the following additional named parameters as query arguments:
| Parameter | Type | Description | 
|---|---|---|
| jobId | VARCHAR | Filters results by job ID. Returns only the matching job, ignoring all other arguments. | 
| allUsers | BOOLEAN | If true, returns jobs for all users in the project. Default is false (only current user's jobs). | 
| maxResults | INTEGER | Limits the number of jobs returned. | 
| minCreationTime | TIMESTAMP | Filters jobs created after the specified time (in milliseconds since the epoch). | 
| maxCreationTime | TIMESTAMP | Filters jobs created before the specified time (in milliseconds since the epoch). | 
| stateFilter | VARCHAR | Filters jobs by state (e.g., PENDING,RUNNING,DONE). | 
| parentJobId | VARCHAR | Filters results to only include child jobs of the specified parent job ID. | 
DuckDB caches schema metadata, such as datasets and table structures, to avoid repeated fetches from BigQuery. If the schema changes externally, use bigquery_clear_cache to update the cache and retrieve the latest schema information:
D CALL bigquery_clear_cache();Public datasets can be accessed by specifying your project as a billing_project. All queries will then be executed and billed on that project. This works for functions such as bigquery_scan, bigquery_execute, and the ATTACH command.
D SELECT * FROM bigquery_scan('bigquery-public-data.geo_us_boundaries.cnecta', billing_project='my_gcp_project');
βββββββββββ¬βββββββββββββββββββ¬βββββββββββββββββββββββ¬ββββ¬ββββββββββββββββββββ¬ββββββββββββββββ¬ββββββββββββββββ¬βββββββββββββββββββββββ
β geo_id  β cnecta_fips_code β         name         β β¦ β area_water_meters β int_point_lat β int_point_lon β     cnecta_geom      β
β varchar β     varchar      β       varchar        β   β       int64       β    double     β    double     β       varchar        β
βββββββββββΌβββββββββββββββββββΌβββββββββββββββββββββββΌββββΌββββββββββββββββββββΌββββββββββββββββΌββββββββββββββββΌβββββββββββββββββββββββ€
β 710     β 710              β Augusta-Watervilleβ¦  β β¦ β         183936850 β    44.4092939 β   -69.6901717 β POLYGON((-69.79281β¦  β
β 775     β 775              β Portland-Lewiston-β¦  β β¦ β        1537827560 β    43.8562034 β   -70.3192682 β POLYGON((-70.48007β¦  β
β 770     β 770              β Pittsfield-North Aβ¦  β β¦ β          24514153 β    42.5337519 β   -73.1678825 β POLYGON((-73.30698β¦  β
β 790     β 790              β Springfield-Hartfoβ¦  β β¦ β         256922043 β    42.0359069 β   -72.6213616 β POLYGON((-72.63682β¦  β
β 715     β 715              β Boston-Worcester-Pβ¦  β β¦ β        3004814151 β    42.3307869 β   -71.3296644 β MULTIPOLYGON(((-71β¦  β
β 725     β 725              β Lebanon-Claremont,β¦  β β¦ β          58476158 β    43.6727226 β   -72.2484543 β POLYGON((-72.39601β¦  β
β 720     β 720              β Bridgeport-New Havβ¦  β β¦ β         374068423 β    41.3603421 β   -73.1284227 β MULTIPOLYGON(((-72β¦  β
βββββββββββ΄βββββββββββββββββββ΄βββββββββββββββββββββββ΄ββββ΄ββββββββββββββββββββ΄ββββββββββββββββ΄ββββββββββββββββ΄βββββββββββββββββββββββ€
β 7 rows                                                                                                      11 columns (7 shown) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββThe BigQuery extension supports geospatial data with automatic conversion between DuckDB GEOMETRY and BigQuery GEOGRAPHY types. By default, BigQuery GEOGRAPHY columns are read as VARCHAR (WKT format). With the spatial extension installed and loaded and the bq_geography_as_geometry setting enabled, they can be read as native DuckDB GEOMETRY types:
-- Load spatial extension and enable GEOMETRY support (BEFORE ATTACH)
D INSTALL spatial; LOAD spatial;
D SET bq_geography_as_geometry = true;
D ATTACH 'project=my_gcp_project' AS bq (TYPE bigquery);
-- Read GEOGRAPHY columns as native GEOMETRY
D SELECT name, geography_column FROM bq.dataset.geo_table;
ββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββ
β     name     β            geography_column              β
β   varchar    β                geometry                  β
ββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββ€
β Location A   β POINT(-122.4194 37.7749)                 β
β Location B   β POLYGON((-122.5 37.7, -122.3 37.8, ...)) β
ββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββ
-- Write GEOMETRY data - automatically converted to BigQuery GEOGRAPHY
D INSERT INTO bq.dataset.geo_table VALUES
    ('New Point', ST_Point(-122.2, 37.8)),
    ('Buffer Zone', ST_Buffer(ST_Point(-122.0, 37.9), 0.01));
β οΈ Spatial Extension Loading Order: The spatial extension must be installed and loaded before settingbq_geography_as_geometry=trueand before usingATTACH. Otherwise, the internal catalog will be configured forVARCHARtypes and geometry conversion will not work properly.
| Setting | Description | Default | 
|---|---|---|
| bq_bignumeric_as_varchar | Read BigQuery BIGNUMERICcolumns asVARCHARinstead of causing a type mapping error. Note: Only supported with legacy scan. | false | 
| bq_use_legacy_scan | Use legacy scan implementation instead of optimized Arrow-based scan. Set to truefor the old scan orfalsefor the new optimized implementation. | false | 
| bq_geography_as_geometry | Return BigQuery GEOGRAPHYcolumns as DuckDBGEOMETRYtypes (requires spatial extension). Whenfalse, returns WKT strings asVARCHAR. | false | 
| bq_query_timeout_ms | Timeout for BigQuery queries in milliseconds. If a query exceeds this time, the operation stops waiting. | 90000 | 
| bq_debug_show_queries | [DEBUG] - whether to print all queries sent to BigQuery to stdout | false | 
| bq_experimental_filter_pushdown | [EXPERIMENTAL] - Whether or not to use filter pushdown | true | 
| bq_experimental_use_info_schema | [EXPERIMENTAL] - Use information schema to fetch catalog info (often faster than REST API) | true | 
| bq_experimental_enable_bigquery_options | [EXPERIMENTAL] - Whether to enable BigQuery OPTIONS in CREATE statements | false | 
| bq_curl_ca_bundle_path | Path to the CA certificates used by cURL for SSL certificate verification | |
| bq_max_read_streams | Maximum number of read streams for BigQuery Storage Read. Set to 0 to automatically match the number of DuckDB threads. Requires SET preserve_insertion_order=FALSEfor parallelization to work. | 0 | 
| bq_arrow_compression | Compression codec for BigQuery Storage Read API. Options: UNSPECIFIED,LZ4_FRAME,ZSTD | ZSTD | 
There are some limitations that arise from the combination of DuckDB and BigQuery. These include:
- 
Reading from Views: This DuckDB extension utilizes the BigQuery Storage Read API to optimize reading results. However, this approach has limitations (see here for more information). First, the Storage Read API does not support direct reading from logical or materialized views. Second, reading external tables is not supported. To mitigate these limitations, you can use the bigquery_queryfunction to execute the query directly in BigQuery.
- 
Propagation Delay: After creating a table in BigQuery, there might be a brief propagation delay before the table becomes fully "visible". Therefore, be aware of potential delays when executing CREATE TABLE ... ASorCREATE OR REPLACE TABLE ...statements followed by immediate inserts. This delay is usually just a matter of seconds, but in rare cases, it can take up to a minute.
- 
BIGNUMERIC Type Support: The bq_bignumeric_as_varcharsetting is only supported with the legacy scan implementation. If you need to read BIGNUMERIC columns as VARCHAR, ensure you useuse_legacy_scan=truein scan functions or setbq_use_legacy_scan=trueglobally. The optimized Arrow-based scan does not currently support this conversion.
- 
Primary Keys and Foreign Keys: While BigQuery recently introduced the concept of primary keys and foreign keys constraints, they differ from what you're accustomed to in DuckDB or other traditional RDBMS. Therefore, this extension does not support this concept. 
Updates may not always be immediately available in the Community Extension repository. However, they can be obtained from a custom repository. To get the latest updates, start DuckDB with unsigned extensions setting enabled. Use the allow_unsigned_extensions flag for client connections, or start the CLI with -unsigned as follows:
# Example: CLI
duckdb -unsigned
# Example: Python
con = duckdb.connect(':memory:', config={'allow_unsigned_extensions' : 'true'})Then set the custom repository and install the extension:
-- Set the custom repository, then install and load the DuckDB BigQuery extension
D SET custom_extension_repository = 'http://storage.googleapis.com/hafenkran';
D FORCE INSTALL 'bigquery';
D LOAD 'bigquery';This extension uses VCPKG for dependency management. Enabling VCPKG is very simple: follow the installation instructions or just run the following:
git clone https://github.com/Microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
export VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmakeNow to build the extension, run:
makeThe main binaries that will be built are:
# the binary for the duckdb shell with the extension code automatically loaded.
./build/release/duckdb
# the test runner of duckdb. Again, the extension is already linked into the binary.
./build/release/test/unittest
# the loadable extension binary as it would be distributed.
./build/release/extension/bigquery/bigquery.duckdb_extensionAfter this step you can either simply start the shell with ./build/release/duckdb or by installing/loading the extension from inside your duckdb instance with:
INSTALL './build/release/extension/bigquery/bigquery.duckdb_extension'
LOAD 'bigquery'Now you can use the features from this extension.
You can also build the project using Docker. This approach simplifies dependencies management and setup. You can build the docker image by using the provided make command as follows:
make docker-buildTo run the Docker container, you'll need to authenticate with Google Cloud using a service account. Ensure you have a service account JSON file for Google Cloud authentication. The service account credentials must be set up as environment variables and mounted to the container. Use the following command to run the Docker container, replacing /path/to/my/service-account-credentials.json with the actual path to your service account JSON file:
docker run \
    -it \
    -v /path/to/my/service-account-credentials.json:/creds \
    -e GOOGLE_APPLICATION_CREDENTIALS=/creds/service-account-credentials.json \
    duckdb-bigquery:v1.4.0The extension includes two test suites designed for different testing scenarios:
The test/sql/bigquery/ directory contains tests that run against a real Google BigQuery project. These tests verify the extension's behavior with the actual BigQuery API.
Prerequisites:
- A Google Cloud project with BigQuery enabled
- Valid service account credentials or Application Default Credentials
- A test dataset in your project
Running BigQuery Tests:
# Run all BigQuery tests
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json \
BQ_TEST_PROJECT=your-project-id \
BQ_TEST_DATASET=your-dataset \
./build/release/test/unittest 'test/sql/bigquery/*.test'
# Run a specific test
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json \
BQ_TEST_PROJECT=your-project-id \
BQ_TEST_DATASET=your-dataset \
./build/release/test/unittest 'test/sql/bigquery/attach_default.test'Environment Variables:
- GOOGLE_APPLICATION_CREDENTIALS: Path to your service account JSON file
- BQ_TEST_PROJECT: Your Google Cloud project ID
- BQ_TEST_DATASET: A test dataset in your project (will be used for read/write operations)
- BQ_SERVICE_ACCOUNT_JSON: Full JSON content for secret authentication tests
The test/sql/local/ directory contains tests that run against the BigQuery Emulator, a local BigQuery-compatible server. These tests are ideal for CI/CD pipelines and development without incurring BigQuery costs.
# Start the emulator
./bigquery-emulator --project=test --dataset=dataset1Running Emulator Tests:
BQ_API_ENDPOINT=0.0.0.0:9050 \
BQ_GRPC_ENDPOINT=0.0.0.0:9060 \
./build/release/test/unittest 'test/sql/local/*.test'
# Run a specific local test
BQ_API_ENDPOINT=0.0.0.0:9050 \
BQ_GRPC_ENDPOINT=0.0.0.0:9060 \
./build/release/test/unittest 'test/sql/local/attach_insert_table.test'Limitations:
- The emulator may not support all BigQuery features
- Some advanced SQL functions might behave differently
- Performance characteristics will differ from production BigQuery
β οΈ Disclaimer: This is an independent, community-maintained open-source project and is not affiliated with, endorsed by, or officially supported by Google LLC, or any of their subsidiaries. This extension is provided "as is" without any warranties or guarantees. "DuckDB" and "BigQuery" are trademarks of their respective owners. Users are solely responsible for compliance with applicable terms of service and any costs incurred through usage.
When using this software with Google BigQuery, please ensure your usage complies with the Google API Terms of Service. Be mindful of the usage limits and quotas, and adhere to Google's Fair Use Policy.
Please be aware that using Google BigQuery through this software can incur costs. Google BigQuery is a paid service, and charges may apply based on the amount of data processed, stored, and the number of queries executed. Users are responsible for any costs associated with their use of Google BigQuery. For detailed information on BigQuery pricing, please refer to the Google BigQuery Pricing page. It is recommended to monitor your usage and set up budget alerts in the Google Cloud Console to avoid unexpected charges.
By using this software, you acknowledge and agree that you are solely responsible for any charges incurred from Google BigQuery.