fix: apply HAVING filter after cross-shard aggregation (#281) #297
WyattJia wants to merge 2 commits into XiaoMi:main
When executing aggregate queries with a HAVING clause across multiple shards, the HAVING condition was pushed down to each shard directly. This caused incorrect results because aggregate functions (SUM, COUNT, etc.) need to be computed globally after merging results from all shards.
Changes:
- Add havingExpr field to SelectPlan to store the original HAVING expression
- Add containsAggregateFunc() to detect whether HAVING contains aggregate functions
- Only remove HAVING from the pushed-down SQL when it contains aggregate functions
- Add HavingFilter to evaluate HAVING conditions on merged results
- Call filterHavingResult() in MergeSelectResult after aggregation
Fixes XiaoMi#281
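To illustrate why the filter has to run after the merge, here is a minimal sketch. It is not the code in this PR: `partialRow`, `mergeSums` and `filterHaving` are hypothetical names, and the per-shard rows are assumed to come from a pushed-down `SELECT a, SUM(b) ... GROUP BY a` with the HAVING clause stripped.

```go
package main

import (
	"fmt"
	"sort"
)

// partialRow is a hypothetical per-shard result row for
// SELECT a, SUM(b) FROM example_table GROUP BY a.
type partialRow struct {
	Group string
	Sum   int64
}

// mergeSums adds up the partial sums of each group across all shards.
func mergeSums(shards [][]partialRow) map[string]int64 {
	total := make(map[string]int64)
	for _, rows := range shards {
		for _, r := range rows {
			total[r.Group] += r.Sum
		}
	}
	return total
}

// filterHaving keeps the groups whose merged sum satisfies pred and
// returns their names in sorted order.
func filterHaving(total map[string]int64, pred func(int64) bool) []string {
	var out []string
	for g, s := range total {
		if pred(s) {
			out = append(out, g)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Group A is split 30 + 30 across two shards. HAVING SUM(b) > 40
	// evaluated on each shard would drop it on both of them.
	shards := [][]partialRow{
		{{"A", 30}, {"D", 5}},
		{{"A", 30}, {"D", 5}},
	}
	merged := mergeSums(shards)
	fmt.Println(filterHaving(merged, func(s int64) bool { return s > 40 }))
}
```

Neither shard passes `SUM(b) > 40` for group A on its own, but the merged total of 60 does, which is the case the pushed-down HAVING got wrong.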
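@WyattJia Thank you very, very much for submitting this fix! 🙏 This solution to the cross-shard HAVING pushdown problem is very professional.
Thanks again for your key contribution to Gaea! We look forward to working with you again! 💗✨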
@WyattJia Hi, I found a problem during testing.
Reproduction environment
{
"open_general_log": true,
"is_encrypt": true,
"name": "test_namespace",
"online": true,
"read_only": false,
"allowed_dbs": {
"test": true
},
"default_phy_dbs": null,
"slow_sql_time": "1000",
"black_sql": [],
"allowed_ip": null,
"slices": [
{
"name": "slice-0",
"user_name": "superroot",
"password": "superroot",
"master": "127.0.0.1:3349",
"slaves": [],
"statistic_slaves": [],
"capacity": 1,
"max_capacity": 1,
"idle_timeout": 3600,
"capability": 41479,
"init_connect": "",
"health_check_sql": ""
},
{
"name": "slice-1",
"user_name": "superroot",
"password": "superroot",
"master": "127.0.0.1:3379",
"slaves": [],
"statistic_slaves": [],
"capacity": 1,
"max_capacity": 1,
"idle_timeout": 3600,
"capability": 41479,
"init_connect": "",
"health_check_sql": ""
}
],
"shard_rules": [
{
"db": "test",
"table": "example_table",
"type": "mycat_long",
"key": "id",
"locations": [
2,
2
],
"slices": [
"slice-0",
"slice-1"
],
"databases": [
"test_[0-3]"
],
"partition_count": "4",
"partition_length": "256"
}
],
"users": [
{
"user_name": "superroot",
"password": "superroot",
"namespace": "test_namespace",
"rw_flag": 2,
"rw_split": 1,
"other_property": 0
}
],
"default_slice": "slice-0",
"global_sequences": null,
"default_charset": "",
"default_collation": "",
"max_sql_execute_time": 0,
"max_sql_result_size": 0,
"max_client_connections": 100000,
"down_after_no_alive": 32,
"seconds_behind_master": 32,
"check_select_lock": false,
"support_multi_query": false,
"local_slave_read_priority": 0,
"set_for_keep_session": false,
"client_qps_limit": 0,
"allowed_session_variables": {}
}
--- MySQL 3349, db test_0
CREATE TABLE `example_table` (
`id` int(11) NOT NULL,
`a` varchar(50) NOT NULL,
`b` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
--- MySQL 3349, db test_1
CREATE TABLE `example_table` (
`id` int(11) NOT NULL,
`a` varchar(50) NOT NULL,
`b` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
--- MySQL 3379, db test_2
CREATE TABLE `example_table` (
`id` int(11) NOT NULL,
`a` varchar(50) NOT NULL,
`b` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
--- MySQL 3379, db test_3
CREATE TABLE `example_table` (
`id` int(11) NOT NULL,
`a` varchar(50) NOT NULL,
`b` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
--- Log in to Gaea
> mysql -h127.0.0.1 -P13306 -usuperroot -psuperroot
> use test
-- Clear old data
TRUNCATE TABLE example_table;
-- Insert data
-- Group 'A': the total is large (60), but if the rows land on two shards (e.g. 30 + 30), checking > 40 on each shard separately may fail
INSERT INTO example_table (id, a, b) VALUES (1, 'A', 30);
INSERT INTO example_table (id, a, b) VALUES (2, 'A', 30);
-- Group 'B': many rows (3), but each value is small (10). Used to test COUNT > 2
INSERT INTO example_table (id, a, b) VALUES (3, 'B', 10);
INSERT INTO example_table (id, a, b) VALUES (4, 'B', 10);
INSERT INTO example_table (id, a, b) VALUES (5, 'B', 10);
-- Group 'C': a very large value (100); a single row satisfies the condition. Serves as the control group
INSERT INTO example_table (id, a, b) VALUES (6, 'C', 100);
-- Group 'D': the total is small (10). Used to verify that groups that should not appear do not appear
INSERT INTO example_table (id, a, b) VALUES (7, 'D', 5);
INSERT INTO example_table (id, a, b) VALUES (8, 'D', 5);
-- Group 'E': mixed case, used to test compound logic (SUM > 50 AND COUNT > 1)
INSERT INTO example_table (id, a, b) VALUES (9, 'E', 40);
INSERT INTO example_table (id, a, b) VALUES (10, 'E', 40);
Test scenario
Case 6: verify raw column name reference
SELECT a
FROM example_table
GROUP BY a
HAVING SUM(b) < 20;
- Expected correct result:
- D (10)
- B (30) -> does not satisfy the condition
- Result: only D.
- Actual result
mysql> SELECT a
-> FROM example_table
-> GROUP BY a
-> HAVING SUM(b) < 20;
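Empty set (0.027 sec)
Problem
When the current handleHaving implementation detects that HAVING contains an aggregate function, it only removes the HAVING clause from the pushed-down SQL (stmt.Having) and saves it in p.havingExpr. It does not ensure that the aggregate columns used in HAVING (for example SUM(b)) are also included in the SELECT list sent to the backend database.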
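One possible shape for the missing step, as a rough sketch only: before pushing the query down, append every aggregate that HAVING needs but the SELECT list lacks, and remember how many trailing columns to strip before replying to the client. The string-based `pushdownPlan` and `addHavingAggregates` below are hypothetical stand-ins; the real plan works on AST nodes, not strings.

```go
package main

import "fmt"

// pushdownPlan is a hypothetical, string-based stand-in for the select plan.
type pushdownPlan struct {
	Fields      []string // select list sent to each shard
	HiddenCount int      // trailing columns to strip before replying to the client
}

// addHavingAggregates appends every aggregate used by HAVING that is not
// already in the select list, so merged rows carry the values to filter on.
func addHavingAggregates(fields, havingAggs []string) pushdownPlan {
	plan := pushdownPlan{Fields: append([]string(nil), fields...)}
	seen := make(map[string]bool, len(fields))
	for _, f := range fields {
		seen[f] = true
	}
	for _, agg := range havingAggs {
		if !seen[agg] {
			plan.Fields = append(plan.Fields, agg)
			plan.HiddenCount++
			seen[agg] = true
		}
	}
	return plan
}

func main() {
	// SELECT a FROM example_table GROUP BY a HAVING SUM(b) < 20
	plan := addHavingAggregates([]string{"a"}, []string{"SUM(b)"})
	fmt.Println(plan.Fields, plan.HiddenCount)
}
```

For the failing query, the shards would then be asked for `a, SUM(b)`, the merged `SUM(b)` would be available to the HAVING filter, and the extra column would be dropped from the final result set.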
@WyattJia The complete test scenarios are as follows.
Normal tests
Case 1: verify SUM aggregation (the most typical cross-shard error scenario)
SELECT a, SUM(b) as total_b
FROM example_table
GROUP BY a
HAVING total_b > 40;
Case 2: verify COUNT aggregation
SELECT a, COUNT(*) as cnt
FROM example_table
GROUP BY a
HAVING cnt > 2;
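Case 3: verify compound logic (AND)
Test intent: check SUM and COUNT together. Gaea has to compute both aggregate values in memory before evaluating the logical condition.
SELECT a, SUM(b) as total_b, COUNT(*) as cnt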
FROM example_table
GROUP BY a
HAVING total_b > 50 AND cnt >= 2;
Case 4: verify MAX/MIN filtering
SELECT a, MAX(b) as max_b
FROM example_table
GROUP BY a
HAVING max_b < 50;
Alias tests
Case 1: verify alias reference (Alias)
Test intent: verify that aliases in the SELECT list are recognized correctly.
SELECT a, SUM(b) as my_sum
FROM example_table
GROUP BY a
HAVING my_sum = 100;
Case 2: verify raw column name reference (Raw Column)
Test intent: verify that HAVING uses the aggregate function expression directly rather than an alias.
SELECT a
FROM example_table
GROUP BY a
HAVING SUM(b) < 20;
Case 3: standard table alias + aggregate function filter
Test point: verify that SUM(t.b) is recognized and computed correctly.
Expected result: A (60), C (100), E (80)
SELECT t.a, SUM(t.b) as total_b
FROM example_table AS t
GROUP BY t.a
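HAVING SUM(t.b) > 50;
Case 4: table alias + COUNT(*) filter
Test point: verify that COUNT(*) works when a table alias is present (COUNT usually carries no table alias, but the query context has one).
Expected result: B (3)
SELECT t.a, COUNT(*) as cnt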
FROM example_table t
GROUP BY t.a
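HAVING cnt > 2;
Case 5: implicit aggregate column completion
Test point: the SELECT list does not contain the aggregate column used by HAVING, and a table alias is used. This is the scenario that most stresses the column-completion logic of handleHaving.
Expected result: D (10)
SELECT t.a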
FROM example_table t
GROUP BY t.a
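HAVING SUM(t.b) < 20;
Case 6: mixed alias references
Test point: GROUP BY uses the column name, HAVING uses a column alias, and FROM uses a table alias. Verifies alias resolution.
Expected result: E (80)
SELECT t.a as group_name, SUM(t.b) as total_val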
FROM example_table t
GROUP BY t.a
HAVING total_val = 80;
Case 7: complex expression + table alias
Test point: verify the stability of restoreExpr when handling binary expressions.
Expected result: C (100) (100 > 90 and 1 > 0)
SELECT t.a, SUM(t.b)
FROM example_table t
GROUP BY t.a
HAVING SUM(t.b) > 90 AND COUNT(t.id) > 0;
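The expected results for the cases above can be cross-checked against a small single-node reference that aggregates the ten inserted rows in memory. This is only a sketch for deriving expected values, not part of Gaea; `row`, `agg`, `groupByA` and `having` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"sort"
)

// row mirrors one record of example_table (id, a, b).
type row struct {
	ID int
	A  string
	B  int
}

// agg holds the global aggregates of one group.
type agg struct {
	Sum, Count, Max, Min int
}

// dataset mirrors the INSERT statements from the reproduction steps.
var dataset = []row{
	{1, "A", 30}, {2, "A", 30},
	{3, "B", 10}, {4, "B", 10}, {5, "B", 10},
	{6, "C", 100},
	{7, "D", 5}, {8, "D", 5},
	{9, "E", 40}, {10, "E", 40},
}

// groupByA computes SUM, COUNT, MAX and MIN of b for every value of a.
func groupByA(rows []row) map[string]agg {
	out := make(map[string]agg)
	for _, r := range rows {
		g, ok := out[r.A]
		if !ok {
			g = agg{Max: r.B, Min: r.B}
		}
		g.Sum += r.B
		g.Count++
		if r.B > g.Max {
			g.Max = r.B
		}
		if r.B < g.Min {
			g.Min = r.B
		}
		out[r.A] = g
	}
	return out
}

// having returns the sorted names of the groups that satisfy pred.
func having(groups map[string]agg, pred func(agg) bool) []string {
	var out []string
	for name, g := range groups {
		if pred(g) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	g := groupByA(dataset)
	fmt.Println("SUM > 40:", having(g, func(x agg) bool { return x.Sum > 40 }))
	fmt.Println("COUNT > 2:", having(g, func(x agg) bool { return x.Count > 2 }))
	fmt.Println("SUM > 50 AND COUNT >= 2:", having(g, func(x agg) bool { return x.Sum > 50 && x.Count >= 2 }))
	fmt.Println("MAX < 50:", having(g, func(x agg) bool { return x.Max < 50 }))
	fmt.Println("SUM < 20:", having(g, func(x agg) bool { return x.Sum < 20 }))
}
```

With this data the group totals are A = 60, B = 30, C = 100, D = 10 and E = 80, so any result from Gaea that differs from the reference for the same predicate points to a merge or filter problem rather than a data problem.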
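@gongna-au Thank you for taking time out of your busy schedule to run such detailed tests. I have updated the HAVING handling to cover the two points you raised:
New tests: Verification: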
@WyattJia Good morning~ I found that a few of the more complex tests still have small problems 🥺 I see two main difficulties.
# Gaea behavior
mysql> SELECT t.a, SUM(t.b) as total_b FROM example_table AS t GROUP BY t.a HAVING SUM(t.b) > 50;
ERROR 1105 (HY000): merge select result error: filter having result error: evaluate having expression error: aggregate function SUM not found in result
# MySQL behavior
mysql> SELECT t.a, SUM(t.b) as total_b FROM example_table AS t GROUP BY t.a HAVING SUM(t.b) > 50;
Empty set (0.018 sec)
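mysql>
How can we accurately determine whether HAVING needs to be intercepted?
Problem: users may use multi-level aliases, for example SELECT count(id) as a, a+1 as b ... HAVING b > 10. A simple AST traversal cannot discover that b contains an aggregate function. So should we not only check whether expr is an aggregate function, but also recursively check whether expr is an alias that points to an aggregate function?
Table aliases in HAVING do not match the column names returned by the backend
Problem: SELECT t.a, SUM(t.b) FROM t ... HAVING SUM(t.b) > 10. The backend database may return the column name as SUM(b) or SUM(t.b) (depending on the database implementation), so it is hard to find the corresponding column reliably by string matching.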
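Both difficulties could be approached roughly as sketched below. This is an illustration under stated assumptions, not Gaea's code: `expr` is a hypothetical miniature AST rather than the parser's real node type, alias resolution is simplified to a name lookup in the SELECT list, and `normalizeKey` simply drops table qualifiers, which would be unsafe for joins where two tables share a column name.

```go
package main

import (
	"fmt"
	"strings"
)

// expr is a hypothetical miniature AST node, not the parser's real type.
type expr struct {
	Kind string  // "agg", "col", "lit", or "bin"
	Name string  // function name for "agg", column or alias name for "col"
	Args []*expr // operands for "agg" and "bin"
}

// containsAggregate reports whether e uses an aggregate function, either
// directly or through select-list aliases, which are followed recursively.
func containsAggregate(e *expr, aliases map[string]*expr, visiting map[string]bool) bool {
	if e == nil {
		return false
	}
	switch e.Kind {
	case "agg":
		return true
	case "col":
		target, ok := aliases[e.Name]
		if !ok || visiting[e.Name] {
			return false // plain column, or an alias cycle
		}
		visiting[e.Name] = true
		defer delete(visiting, e.Name)
		return containsAggregate(target, aliases, visiting)
	}
	for _, a := range e.Args {
		if containsAggregate(a, aliases, visiting) {
			return true
		}
	}
	return false
}

// normalizeKey builds a lookup key for an aggregate that ignores case and
// table qualifiers, so SUM(t.b) and sum(b) map to the same merged column.
func normalizeKey(fn, col string) string {
	if i := strings.LastIndex(col, "."); i >= 0 {
		col = col[i+1:]
	}
	col = strings.Trim(col, "`")
	return strings.ToUpper(fn) + "(" + strings.ToLower(col) + ")"
}

func main() {
	// SELECT count(id) AS a, a+1 AS b ... HAVING b > 10
	aliases := map[string]*expr{
		"a": {Kind: "agg", Name: "count", Args: []*expr{{Kind: "col", Name: "id"}}},
		"b": {Kind: "bin", Args: []*expr{{Kind: "col", Name: "a"}, {Kind: "lit"}}},
	}
	having := &expr{Kind: "bin", Args: []*expr{{Kind: "col", Name: "b"}, {Kind: "lit"}}}
	fmt.Println(containsAggregate(having, aliases, map[string]bool{}))
	fmt.Println(normalizeKey("SUM", "t.b") == normalizeKey("sum", "b"))
}
```

Matching on a normalized key, or better on the position of the column the proxy itself added to the pushed-down SELECT list, avoids depending on how the backend names the result column.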