Currently, the `to_bigquery` presented in the gist uses temporary storage. I think this is not ideal, given that the user has to create that storage before they can write anything.

I was wondering if it would be possible to take a similar approach to what was done for `dask-mongo`, where `write_gbq` would use `pandas.DataFrame.to_gbq()` on the pandas DataFrame that comes from each partition. The partition handling would look something like:
```python
def to_gbq(ddf, connection_args):
    partitions = [
        write_gbq(partition, connection_args)
        for partition in ddf.to_delayed()
    ]
    dask.compute(partitions)
```
and `write_gbq` would have something of the form:
```python
@delayed
def write_gbq(df, connection_args):
    # pandas-gbq manages the BigQuery connection internally,
    # so no explicit bigquery.Client is needed here
    df.to_gbq(**connection_args)
```