Skip to content

Commit 7166cdb

Browse files
Add GROUP BY documentation
1 parent b242d54 commit 7166cdb

File tree

4 files changed

+388
-0
lines changed

4 files changed

+388
-0
lines changed
Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
Diagram(
2+
Terminal('GROUP'),
3+
Terminal('BY'),
4+
NonTerminal('expression'),
5+
Optional(
6+
Sequence(
7+
Terminal('AS'),
8+
NonTerminal('alias')
9+
)
10+
),
11+
OneOrMore(
12+
Sequence(
13+
Terminal(','),
14+
NonTerminal('expression'),
15+
Optional(
16+
Sequence(
17+
Terminal('AS'),
18+
NonTerminal('alias')
19+
)
20+
)
21+
)
22+
),
23+
)
Lines changed: 279 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,279 @@
1+
========
2+
GROUP BY
3+
========
4+
5+
.. _group_by:
6+
7+
Groups rows that have the same values in specified columns into aggregated rows, typically used with aggregate functions.
8+
9+
Syntax
10+
======
11+
12+
.. raw:: html
13+
:file: GROUP_BY.diagram.svg
14+
15+
The GROUP BY clause is used in SELECT statements:
16+
17+
.. code-block:: sql
18+
19+
SELECT column1, aggregate_function(column2)
20+
FROM table_name
21+
GROUP BY column1
22+
23+
Parameters
24+
==========
25+
26+
``GROUP BY expression [AS alias], ...``
27+
Groups rows based on the values of one or more expressions. Each unique combination of expression values creates a separate group.
28+
29+
``expression``
30+
Can be:
31+
32+
- Column names
33+
- Nested field references (e.g., ``struct_column.field``)
34+
- Expressions or calculations
35+
36+
``alias`` (optional)
37+
An optional alias for the grouping expression
38+
39+
Returns
40+
=======
41+
42+
Returns one row per unique combination of grouped values. When used with aggregate functions, computes aggregate values for each group.
43+
44+
Examples
45+
========
46+
47+
Setup
48+
-----
49+
50+
For these examples, assume we have an ``employees`` table:
51+
52+
.. code-block:: sql
53+
54+
CREATE TABLE employees(
55+
id BIGINT,
56+
department STRING,
57+
role STRING,
58+
salary BIGINT,
59+
PRIMARY KEY(id))
60+
61+
CREATE INDEX dept_idx AS SELECT department FROM employees ORDER BY department
62+
CREATE INDEX role_idx AS SELECT role FROM employees ORDER BY role
63+
64+
INSERT INTO employees VALUES
65+
(1, 'Engineering', 'Developer', 100000),
66+
(2, 'Engineering', 'Developer', 110000),
67+
(3, 'Engineering', 'Manager', 150000),
68+
(4, 'Sales', 'Representative', 80000),
69+
(5, 'Sales', 'Manager', 120000)
70+
71+
GROUP BY Single Column
72+
-----------------------
73+
74+
Count employees by department:
75+
76+
.. code-block:: sql
77+
78+
SELECT department, COUNT(*) AS employee_count
79+
FROM employees
80+
GROUP BY department
81+
82+
.. list-table::
83+
:header-rows: 1
84+
85+
* - :sql:`department`
86+
- :sql:`employee_count`
87+
* - :json:`"Engineering"`
88+
- :json:`3`
89+
* - :json:`"Sales"`
90+
- :json:`2`
91+
92+
GROUP BY Multiple Columns
93+
--------------------------
94+
95+
Count employees by department and role:
96+
97+
.. code-block:: sql
98+
99+
SELECT department, role, COUNT(*) AS employee_count
100+
FROM employees
101+
GROUP BY department, role
102+
103+
.. list-table::
104+
:header-rows: 1
105+
106+
* - :sql:`department`
107+
- :sql:`role`
108+
- :sql:`employee_count`
109+
* - :json:`"Engineering"`
110+
- :json:`"Developer"`
111+
- :json:`2`
112+
* - :json:`"Engineering"`
113+
- :json:`"Manager"`
114+
- :json:`1`
115+
* - :json:`"Sales"`
116+
- :json:`"Representative"`
117+
- :json:`1`
118+
* - :json:`"Sales"`
119+
- :json:`"Manager"`
120+
- :json:`1`
121+
122+
GROUP BY with Aggregate Functions
123+
-----------------------------------
124+
125+
Calculate average salary by department:
126+
127+
.. code-block:: sql
128+
129+
SELECT department, AVG(salary) AS avg_salary
130+
FROM employees
131+
GROUP BY department
132+
133+
.. list-table::
134+
:header-rows: 1
135+
136+
* - :sql:`department`
137+
- :sql:`avg_salary`
138+
* - :json:`"Engineering"`
139+
- :json:`120000.0`
140+
* - :json:`"Sales"`
141+
- :json:`100000.0`
142+
143+
Calculate multiple aggregates:
144+
145+
.. code-block:: sql
146+
147+
SELECT department,
148+
COUNT(*) AS employee_count,
149+
MIN(salary) AS min_salary,
150+
MAX(salary) AS max_salary,
151+
AVG(salary) AS avg_salary
152+
FROM employees
153+
GROUP BY department
154+
155+
.. list-table::
156+
:header-rows: 1
157+
158+
* - :sql:`department`
159+
- :sql:`employee_count`
160+
- :sql:`min_salary`
161+
- :sql:`max_salary`
162+
- :sql:`avg_salary`
163+
* - :json:`"Engineering"`
164+
- :json:`3`
165+
- :json:`100000`
166+
- :json:`150000`
167+
- :json:`120000.0`
168+
* - :json:`"Sales"`
169+
- :json:`2`
170+
- :json:`80000`
171+
- :json:`120000`
172+
- :json:`100000.0`
173+
174+
GROUP BY with HAVING Clause
175+
-----------------------------
176+
177+
Filter groups using HAVING:
178+
179+
.. code-block:: sql
180+
181+
SELECT department, AVG(salary) AS avg_salary
182+
FROM employees
183+
GROUP BY department
184+
HAVING AVG(salary) > 110000
185+
186+
.. list-table::
187+
:header-rows: 1
188+
189+
* - :sql:`department`
190+
- :sql:`avg_salary`
191+
* - :json:`"Engineering"`
192+
- :json:`120000.0`
193+
194+
The HAVING clause filters groups after aggregation, unlike WHERE which filters rows before grouping.
195+
196+
GROUP BY with Column Aliases
197+
------------------------------
198+
199+
Use aliases for grouped columns:
200+
201+
.. code-block:: sql
202+
203+
SELECT department AS dept, COUNT(*) AS total
204+
FROM employees
205+
GROUP BY department AS dept
206+
207+
.. list-table::
208+
:header-rows: 1
209+
210+
* - :sql:`dept`
211+
- :sql:`total`
212+
* - :json:`"Engineering"`
213+
- :json:`3`
214+
* - :json:`"Sales"`
215+
- :json:`2`
216+
217+
Important Notes
218+
===============
219+
220+
Index Requirement
221+
-----------------
222+
223+
**GROUP BY operations require an appropriate index for optimal performance.** The query planner needs an index on the grouped column(s) to execute the query efficiently. Without a suitable index, the query will fail with an "unable to plan" error.
224+
225+
Example index creation:
226+
227+
.. code-block:: sql
228+
229+
CREATE INDEX dept_idx AS SELECT department FROM employees ORDER BY department
230+
231+
See :ref:`Indexes <index_definition>` for details on creating indexes that support GROUP BY operations.
232+
233+
Column Selection Rules
234+
----------------------
235+
236+
* Only columns in the GROUP BY clause or aggregate functions can appear in the SELECT list
237+
* Selecting non-grouped, non-aggregated columns will result in error 42803
238+
239+
**Invalid example**:
240+
241+
.. code-block:: sql
242+
243+
-- ERROR: id is neither grouped nor aggregated
244+
SELECT id, department, COUNT(*)
245+
FROM employees
246+
GROUP BY department
247+
248+
**Valid example**:
249+
250+
.. code-block:: sql
251+
252+
-- OK: all non-aggregated columns are grouped
253+
SELECT department, role, COUNT(*)
254+
FROM employees
255+
GROUP BY department, role
256+
257+
Nested Fields
258+
-------------
259+
260+
GROUP BY supports grouping on nested struct fields:
261+
262+
.. code-block:: sql
263+
264+
SELECT address.city, COUNT(*) AS resident_count
265+
FROM people
266+
GROUP BY address.city
267+
268+
Execution Model
269+
---------------
270+
271+
The Relational Layer does not perform in-memory grouping. All GROUP BY operations must be backed by an appropriate index. This is a fundamental architectural constraint that ensures queries can execute efficiently over large datasets.
272+
273+
See Also
274+
========
275+
276+
* :ref:`Aggregate Functions <aggregate_functions>` - Functions used with GROUP BY
277+
* :ref:`Indexes <index_definition>` - Creating indexes for GROUP BY
278+
* :ref:`HAVING Clause <having>` - Filtering grouped results
279+
* :ref:`SELECT Statement <select>` - Full SELECT syntax

yaml-tests/src/test/java/DocumentationQueriesTests.java

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -53,4 +53,9 @@ void aggregateFunctionsDocumentationQueriesTests(YamlTest.Runner runner) throws
5353
void scalarFunctionsDocumentationQueriesTests(YamlTest.Runner runner) throws Exception {
5454
runner.runYamsql(PREFIX + "/scalar-functions-documentation-queries.yamsql");
5555
}
56+
57+
@TestTemplate
58+
void groupByDocumentationQueriesTests(YamlTest.Runner runner) throws Exception {
59+
runner.runYamsql(PREFIX + "/group-by-documentation-queries.yamsql");
60+
}
5661
}
Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
---
2+
options:
3+
supported_version: 4.3.2.0
4+
---
5+
schema_template:
6+
create table employees(id bigint, department string, role string, salary bigint, primary key(id))
7+
create index dept_idx as select department from employees order by department
8+
create index role_idx as select role from employees order by role
9+
---
10+
setup:
11+
steps:
12+
- query: insert into employees
13+
values (1, 'Engineering', 'Developer', 100000),
14+
(2, 'Engineering', 'Developer', 110000),
15+
(3, 'Engineering', 'Manager', 150000),
16+
(4, 'Sales', 'Representative', 80000),
17+
(5, 'Sales', 'Manager', 120000)
18+
---
19+
test_block:
20+
name: group-by-documentation-tests
21+
preset: single_repetition_ordered
22+
tests:
23+
# GROUP BY Single Column
24+
-
25+
- query: SELECT department, COUNT(*) AS employee_count
26+
FROM employees
27+
GROUP BY department
28+
- supported_version: 4.3.2.0
29+
- unorderedResult: [{department: "Engineering", employee_count: 3},
30+
{department: "Sales", employee_count: 2}]
31+
32+
# GROUP BY Multiple Columns
33+
-
34+
- query: SELECT department, role, COUNT(*) AS employee_count
35+
FROM employees
36+
GROUP BY department, role
37+
- supported_version: 4.3.2.0
38+
- unorderedResult: [{department: "Engineering", role: "Developer", employee_count: 2},
39+
{department: "Engineering", role: "Manager", employee_count: 1},
40+
{department: "Sales", role: "Representative", employee_count: 1},
41+
{department: "Sales", role: "Manager", employee_count: 1}]
42+
43+
# GROUP BY with Aggregate Functions - Average salary
44+
-
45+
- query: SELECT department, AVG(salary) AS avg_salary
46+
FROM employees
47+
GROUP BY department
48+
- supported_version: 4.3.2.0
49+
- unorderedResult: [{department: "Engineering", avg_salary: 120000.0},
50+
{department: "Sales", avg_salary: 100000.0}]
51+
52+
# Calculate multiple aggregates
53+
-
54+
- query: SELECT department,
55+
COUNT(*) AS employee_count,
56+
MIN(salary) AS min_salary,
57+
MAX(salary) AS max_salary,
58+
AVG(salary) AS avg_salary
59+
FROM employees
60+
GROUP BY department
61+
- supported_version: 4.3.2.0
62+
- unorderedResult: [{department: "Engineering", employee_count: 3, min_salary: 100000, max_salary: 150000, avg_salary: 120000.0},
63+
{department: "Sales", employee_count: 2, min_salary: 80000, max_salary: 120000, avg_salary: 100000.0}]
64+
65+
# GROUP BY with HAVING Clause
66+
-
67+
- query: SELECT department, AVG(salary) AS avg_salary
68+
FROM employees
69+
GROUP BY department
70+
HAVING AVG(salary) > 110000
71+
- supported_version: 4.3.2.0
72+
- unorderedResult: [{department: "Engineering", avg_salary: 120000.0}]
73+
74+
# GROUP BY with Column Aliases
75+
-
76+
- query: SELECT department AS dept, COUNT(*) AS total
77+
FROM employees
78+
GROUP BY department AS dept
79+
- supported_version: 4.3.2.0
80+
- unorderedResult: [{dept: "Engineering", total: 3},
81+
{dept: "Sales", total: 2}]

0 commit comments

Comments
 (0)