Skip to content

Conversation

@mlell
Copy link

@mlell mlell commented Dec 9, 2025

This implements the SQL GROUP BY modifiers CUBE, ROLLUP, and GROUPING SETS, which allow to add subtotals to the output. For me, this is very useful to produce, for example monthly reports of income vs. expenses, where the row (Total) tells me the net gain/loss of the month:

beanquery> select yearmonth(date) as month, account, sum(position) where account !~ "Assets"  group by cube (account,month) 
pivot by account,month
  account/month      2024-02-01    2024-03-01    2024-04-01      (Total)   
------------------  ------------  ------------  ------------  -------------
Expenses:Dining                     120.50 USD     45.25 USD     165.75 USD
Expenses:Groceries    250.00 USD    310.75 USD    275.60 USD     836.35 USD
Income:Salary       -5000.00 USD  -5000.00 USD  -5000.00 USD  -15000.00 USD
(Total)             -4750.00 USD  -4568.75 USD  -4679.15 USD  -13997.90 USD

What do you think of this?

@mlell mlell mentioned this pull request Dec 9, 2025
@mlell
Copy link
Author

mlell commented Dec 9, 2025

Would fix #253

… refactor _select(), remove _compile_select_base() to move closer to the original logic before ROLLUP, CUBE, etc.
- In AST: Change 'sets' modifier to 'grouping sets' for consistency
- Update AST structure to use GroupByElement for consistency with other language constructs
- Update tests to reflect new structure
@dnicolodi
Copy link
Collaborator

Thanks for working on this. This would be a nice feature to have in beanquery. Although, there are a few issues with the PR.

First, to make the PR more easily reviewable, the patches should be organized in logical commits, not in commits working a reworking the same code. In particular, GROUP BY CUBE and GROUP BY ROLLUP are simply syntactic sugar for GROUP BY GROUPING SETS thus, if anything, the latter should be introduced first, and the former added later, not the other way around.

Glancing at the code, EvalUnion is written to be specific for GROUPING SETS support, however, it does not do any optimization of the evaluation of the query for different GROUP BY clauses, but it just concatenates the queries, thus it can be used to add generic UNION support.

I don't understand which problem the addition of the Sentinel class solves: it is unused in the commit that adds and two of the module level constants added are never used, AFAICT.

Finally, the use of (Total) as a fill value for grouping columns is not standard and thus would require very good motivation, which is not provided. It also seems wrong to use (Total) for some columns when grouping over multiple columns is involved.

@mlell
Copy link
Author

mlell commented Dec 28, 2025

Thank you for considering this! I will follow your review:

  • I will rebase the commits to follow the logical schema you have suggested: EvalUnion, GROUPING SETS, then syntactic sugar.
  • I see that the Sentinel object might lead into a dead end and is out of scope here. I will remove it. The motivation was the following:
    • disambiguate the values which are NULL because of the grouping pattern from those which are NULL because of missing data.
    • The modified version of beangulp which I use for me contains the function date_cap, which can collate all dates outside of a given range into (earlier) or (later). I needed some way to make sure (earlier) sorts above the values and (later) sorts below.

Remaining questions

  • You mentioned UNION support. Its true that EvalUnion allows this. Should I add this to the BQL or is this out of scope for now?
  • Should I include ORDER BY .... NULLS (FIRST|LAST) into the BQL to allow for some way to sort the totals to the bottom?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants