It looks like the most recent update of pandas-gbq might have broken our tests. When writing to BigQuery with this:
```python
pd.DataFrame.to_gbq(
    df,
    destination_table=f"{dataset_id}.{table_id}",
    project_id=project_id,
    chunksize=5,
    if_exists="append",
)
```
with `pandas-gbq=0.15` and reading it back with `dask_bigquery.read_gbq`, we get 2 dask partitions; if the writing is done with `pandas-gbq=0.16`, reading back with `dask_bigquery.read_gbq` returns only 1 dask partition.
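For reference, here is a minimal sketch of the read-back check, assuming `dask_bigquery.read_gbq` accepts `project_id`, `dataset_id`, and `table_id` keyword arguments and that the variables match the write above:

```python
import dask_bigquery

# Read back the table written above; the names are assumed for illustration.
ddf = dask_bigquery.read_gbq(
    project_id=project_id,
    dataset_id=dataset_id,
    table_id=table_id,
)

# 2 partitions when the table was written with pandas-gbq 0.15,
# but only 1 when it was written with pandas-gbq 0.16.
print(ddf.npartitions)
```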
From the discussion on #11 we know that
> pandas-gbq 0.16 changed the default intermediate data serialization format to parquet instead of CSV. Likely this means the backend loader required fewer workers and wrote it to fewer files behind the scenes.
A couple of options:
- Pin `pandas-gbq <= 0.15` in our tests, or avoid asserting on `ddf.npartitions`.
- Avoid `pandas-gbq` and use `bigquery.Client.load_table_from_dataframe` (see the sketch below) or something like this https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#loading_csv_data_into_a_table_that_uses_column-based_time_partitioning
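A minimal sketch of the second option, assuming the same `df`, `project_id`, `dataset_id`, and `table_id` as above (`load_table_from_dataframe` comes from the `google-cloud-bigquery` client library, not pandas-gbq):

```python
from google.cloud import bigquery

client = bigquery.Client(project=project_id)

# Load the dataframe through the BigQuery client directly instead of pandas-gbq.
load_job = client.load_table_from_dataframe(
    df,
    destination=f"{dataset_id}.{table_id}",
)
load_job.result()  # wait for the load job to complete
```

This would decouple the tests from pandas-gbq's intermediate serialization defaults, which is what changed between 0.15 and 0.16.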