It looks like the most recent update of pandas-gbq might have broken our tests. When writing to BigQuery with this:
```python
pd.DataFrame.to_gbq(
    df,
    destination_table=f"{dataset_id}.{table_id}",
    project_id=project_id,
    chunksize=5,
    if_exists="append",
)
```
with `pandas-gbq=0.15` and reading it back with `dask_bigquery.read_gbq`, we get 2 dask partitions; if the writing is done with `pandas-gbq=0.16`, reading back with `dask_bigquery.read_gbq` returns only 1 dask partition.
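For reference, here is a minimal sketch of the read-back check, assuming `dask_bigquery.read_gbq` accepts `project_id`, `dataset_id`, and `table_id` keyword arguments and that the variables match the write above:

```python
import dask_bigquery

# Read back the table written above; the names are assumed for illustration.
ddf = dask_bigquery.read_gbq(
    project_id=project_id,
    dataset_id=dataset_id,
    table_id=table_id,
)

# 2 partitions when the table was written with pandas-gbq 0.15,
# but only 1 when it was written with pandas-gbq 0.16.
print(ddf.npartitions)
```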
From the discussion on #11 we know that
> pandas-gbq 0.16 changed the default intermediate data serialization format to parquet instead of CSV. Likely this means the backend loader required fewer workers and wrote it to fewer files behind the scenes.
A couple of options:
- Pin `pandas-gbq <= 0.15` in our tests, or avoid asserting on `ddf.npartitions`.
- Avoid `pandas-gbq` and use `bigquery.Client.load_table_from_dataframe` (see the sketch below) or something like this https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#loading_csv_data_into_a_table_that_uses_column-based_time_partitioning
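A minimal sketch of the second option, assuming the same `df`, `project_id`, `dataset_id`, and `table_id` as above (`load_table_from_dataframe` comes from the `google-cloud-bigquery` client library, not pandas-gbq):

```python
from google.cloud import bigquery

client = bigquery.Client(project=project_id)

# Load the dataframe through the BigQuery client directly instead of pandas-gbq.
load_job = client.load_table_from_dataframe(
    df,
    destination=f"{dataset_id}.{table_id}",
)
load_job.result()  # wait for the load job to complete
```

This would decouple the tests from pandas-gbq's intermediate serialization defaults, which is what changed between 0.15 and 0.16.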