Skip to content

SNOW-295953: In CREATE TABLE AS, cursor.rowcount should return number of rows created #645

@jtcohen6

Description

@jtcohen6

What is the current behavior?

When the connector executes a CREATE TABLE ... AS SELECT or CREATE VIEW ... AS SELECT statement, the cursor.rowcount returns 1. Instead, I believe it should return the number of rows created (for tables) or -1 (for views).

I believe this to be the spirit of what rowcount represents:

rowcount: The number of records updated or selected. If not clear, -1 is returned.

While I'm not positive, I believe the current behavior (returning 1 for these query types) reflects the fact that a single "row of data" is returned, i.e. the status message. Here's cursor.fetchall() in response to both statement types:

[('Table TABLE_MODEL successfully created.',)]
[('Table VIEW_MODEL successfully created.',)]

What is the desired behavior?

I would like the Snowflake connector's cursor to behave more like the cursor in psycopg2, the standard python connector for Postgres. In psycopg2 (docs), the rowcount returned by CREATE VIEW, CREATE VIEW AS, and CREATE TABLE is always -1; whereas CREATE TABLE AS, treated as DML, returns the number of rows modified (i.e. created).

>>> import psycopg2
>>> con = psycopg2.connect(...)
>>> cur = con.cursor()
>>> cur.execute("create table my_table as (select 1 as id union all select 2 union all select 3)")
>>> cur.statusmessage
'SELECT 3'
>>> cur.rowcount
3
>>> import snowflake.connector
>>> handle = snowflake.connector.connect(...)
>>> cur = handle.cursor()
>>> cur.execute("create table my_table as (select 1 as id union all select 2 union all select 3)")
<snowflake.connector.cursor.SnowflakeCursor object at 0x10aba7eb0>
>>> cur.rowcount
1
>>> cur.fetchall()
[('Table MY_TABLE successfully created.',)]

How would this improve snowflake-connector-python?

Most directly, this would improve the connector by aligning it with the standard set by established connectors/cursors.

On a deeper level, information about the number of rows stored in a new table is tremendously useful for everyone, and strictly necessary for organizations that (for regulatory/compliance reasons) need rigorous auditing of data movement.

I'm one of the maintainers of dbt. Our models make frequent use of create or replace table as on Snowflake—a powerful feature for atomic and idempotent data transformation. When dbt materializes a model in the databases, it captures a database-specific AdapterResponse object, including a number of rows_affected, and writes those results to a metadata artifact (run_results.json, docs) that can be used for auditing, runtime analysis, etc. Here's the code for Snowflake, which directly relies on the python connector's cursor.rowcount.

I opened a related issue earlier this week (dbt-labs/dbt-core#3142), responding to a request from a joint dbt + Snowflake customer. I'm still clarifying the exact requirements, but I believe that storing record counts for dbt-produced tables in dbt-produced artifacts would be simpler and more reliable if the Snowflake connector's cursor returned the number of rows modified (created) by CTA statements.

I'm happy to hear disagreement, or that there are good reason for preserving the current behavior. Thanks folks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions