
Feature/fix arithmetic on aggregation failure#64

Open
jeffery1236 wants to merge 2 commits into PierreSenellart:master from
jeffery1236:feature/fix-arithmetic-on-aggregation-failure


@jeffery1236

jeffery1236 commented Dec 4, 2025

This PR fixes an issue where arithmetic expressions involving provsql aggregation results (e.g., SELECT AVG(col) * 10 FROM table) would fail or return empty results.

The provenance_aggregate function returns a custom agg_token type to hold both the value and the provenance token. However, this type lacked casts to standard numeric types. As a result, any subsequent operations (like multiplication or division) on the aggregation result would fail because the database did not know how to handle agg_token in those expressions.

Changes

  • src/agg_token.c: Implemented C helper functions (agg_token_numeric, agg_token_float8, agg_token_int4, agg_token_int8) that extract the value from the agg_token string representation and convert it to the appropriate Postgres type.
  • sql/provsql.common.sql: Exposed these fun

@PierreSenellart
Owner

Thanks! I will review these and integrate them in the next couple of weeks. Note that there are several issues related to maintaining provenance for this kind of query:

  • there is no good formal model of provenance for non-monoid aggregate functions, such as AVG (see https://arxiv.org/pdf/2504.12058 for the formal model, where the need for a monoid aggregate function is discussed)
  • even for monoid aggregate functions, converting them to numbers means losing the provenance token, which contains important information
  • a better approach might be, for select arithmetic operations, to push them inside the aggregation operator, which would allow them to be processed by PostgreSQL before generating the aggregation token, but this would require more work

Still, I think that your approach of converting to numbers is appropriate. But I think it would be good to add a warning that provenance information is lost while converting. And it would also be useful to add such a warning when an agg_token is generated for a non-monoid aggregate function, as there won't be much that can be done with the resulting agg_token.

Also note that we are currently working on doing useful things with these aggregate tokens (for now, ProvSQL can compute them, but does nothing with them). Feel free to reach out if you think your use of ProvSQL would benefit from this.

@staskikotx

staskikotx commented Dec 15, 2025

Will this help with a sum of two aggregate functions?

In this example, I would like to obtain

CREATE TABLE my_table (
    value INTEGER,
    kind INTEGER
);

INSERT INTO my_table (value, kind)
VALUES (1, 1),
       (2, 2),
       (3, 3);

SELECT add_provenance('my_table');

SELECT SUM(CASE WHEN t.kind = 1 THEN t.value ELSE 0 END) + SUM(CASE WHEN t.kind = 2 THEN t.value ELSE 0 END) as s from my_table as t;

the same provenance as for filtering with kind = 1 OR kind = 2.

(Yes, I know that I can rewrite this query as
SELECT SUM(CASE WHEN t.kind = 1 OR t.kind = 2 THEN t.value ELSE 0 END) as s from my_table as t;
and as
SELECT SUM(t.value) as s FROM my_table as t WHERE t.kind = 1 OR t.kind = 2 ;
)



3 participants