
fix: apply HAVING filter after cross-shard aggregation (#281) #297

Open
WyattJia wants to merge 2 commits into XiaoMi:main from WyattJia:main


@WyattJia WyattJia commented Jan 7, 2026

When executing aggregate queries with a HAVING clause across multiple shards, the HAVING condition was pushed down to each shard directly. This caused incorrect results because aggregate functions (SUM, COUNT, etc.) must be computed globally, after the results from all shards have been merged.

Changes:

  • Add havingExpr field to SelectPlan to store original HAVING expression
  • Add containsAggregateFunc() to detect if HAVING contains aggregate functions
  • Only remove HAVING from pushed SQL when it contains aggregate functions
  • Add HavingFilter to evaluate HAVING conditions on merged results
  • Call filterHavingResult() in MergeSelectResult after aggregation

Fixes #281

@gongna-au
Collaborator

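@WyattJia Thank you very, very much for submitting this fix! 🙏 Your approach to the cross-shard HAVING pushdown problem is very professional.
Current progress ✅:

  • We have integrated the patch into our internal test environment (Staging) and are running:
    • Integration tests for cross-shard aggregate queries (in particular, COUNT/SUM combined with HAVING)
    • A full SQL regression suite (to make sure existing queries are unaffected)
  • We expect to finish verification by the end of January and will share the test results then. If any edge cases turn up along the way, I will discuss them with you right away.
  • Once testing is complete, we will merge this MR.

Thanks again for this key contribution to Gaea!!! We look forward to working with you again! 💗✨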

Collaborator

gongna-au commented Feb 4, 2026

@WyattJia Hi, I ran into a problem during testing.

Reproduction environment

  • Gaea Namespace
{
    "open_general_log": true,
    "is_encrypt": true,
    "name": "test_namespace",
    "online": true,
    "read_only": false,
    "allowed_dbs": {
        "test": true
    },
    "default_phy_dbs": null,
    "slow_sql_time": "1000",
    "black_sql": [],
    "allowed_ip": null,
    "slices": [
        {
            "name": "slice-0",
            "user_name": "superroot",
            "password": "superroot",
            "master": "127.0.0.1:3349",
            "slaves": [],
            "statistic_slaves": [],
            "capacity": 1,
            "max_capacity": 1,
            "idle_timeout": 3600,
            "capability": 41479,
            "init_connect": "",
            "health_check_sql": ""
        },
        {
            "name": "slice-1",
            "user_name": "superroot",
            "password": "superroot",
            "master": "127.0.0.1:3379",
            "slaves": [],
            "statistic_slaves": [],
            "capacity": 1,
            "max_capacity": 1,
            "idle_timeout": 3600,
            "capability": 41479,
            "init_connect": "",
            "health_check_sql": ""
        }
    ],
    "shard_rules": [
        {
            "db": "test",
            "table": "example_table",
            "type": "mycat_long",
            "key": "id",
            "locations": [
                2,
                2
            ],
            "slices": [
                "slice-0",
                "slice-1"
            ],
            "databases": [
                "test_[0-3]"
            ],
            "partition_count": "4",
            "partition_length": "256"
        }
        
    ],
    "users": [
        {
            "user_name": "superroot",
            "password": "superroot",
            "namespace": "test_namespace",
            "rw_flag": 2,
            "rw_split": 1,
            "other_property": 0
        }
    ],
    "default_slice": "slice-0",
    "global_sequences": null,
    "default_charset": "",
    "default_collation": "",
    "max_sql_execute_time": 0,
    "max_sql_result_size": 0,
    "max_client_connections": 100000,
    "down_after_no_alive": 32,
    "seconds_behind_master": 32,
    "check_select_lock": false,
    "support_multi_query": false,
    "local_slave_read_priority": 0,
    "set_for_keep_session": false,
    "client_qps_limit": 0,
    "allowed_session_variables": {}
}
  • MySQL environment
-- MySQL 3349, db test_0

CREATE TABLE `example_table` (
  `id` int(11) NOT NULL,
  `a` varchar(50) NOT NULL,
  `b` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- MySQL 3349, db test_1
CREATE TABLE `example_table` (
  `id` int(11) NOT NULL,
  `a` varchar(50) NOT NULL,
  `b` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- MySQL 3379, db test_2
CREATE TABLE `example_table` (
  `id` int(11) NOT NULL,
  `a` varchar(50) NOT NULL,
  `b` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- MySQL 3379, db test_3
CREATE TABLE `example_table` (
  `id` int(11) NOT NULL,
  `a` varchar(50) NOT NULL,
  `b` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
  • Initialization SQL
-- Log in to Gaea
> mysql -h127.0.0.1 -P13306 -usuperroot -psuperroot

> use test
-- Clear old data
TRUNCATE TABLE example_table;

-- Insert data
-- Group 'A': the total is large (60), but if the rows land on two shards (e.g. 30 + 30), checking > 40 on each shard alone fails
INSERT INTO example_table (id, a, b) VALUES (1, 'A', 30);
INSERT INTO example_table (id, a, b) VALUES (2, 'A', 30);

-- Group 'B': many rows (3), but each value is small (10). Used to test COUNT > 2
INSERT INTO example_table (id, a, b) VALUES (3, 'B', 10);
INSERT INTO example_table (id, a, b) VALUES (4, 'B', 10);
INSERT INTO example_table (id, a, b) VALUES (5, 'B', 10);

-- Group 'C': a single very large value (100); one row alone satisfies the condition. Serves as the control group
INSERT INTO example_table (id, a, b) VALUES (6, 'C', 100);

-- Group 'D': the total is small (10); used to verify that groups that should not appear do not appear
INSERT INTO example_table (id, a, b) VALUES (7, 'D', 5);
INSERT INTO example_table (id, a, b) VALUES (8, 'D', 5);

-- Group 'E': mixed case, used to test compound logic (SUM > 50 AND COUNT > 1)
INSERT INTO example_table (id, a, b) VALUES (9, 'E', 40);
INSERT INTO example_table (id, a, b) VALUES (10, 'E', 40);

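Test scenario

Case 6: raw expression reference
Test intent: verify that HAVING works when it uses the aggregate function expression directly rather than an alias.

SELECT a
FROM example_table
GROUP BY a
HAVING SUM(b) < 20;
- Expected result:
  - D (10)
  - B (30) -> does not satisfy the condition
  - Result: D only.

- Actual result
mysql> SELECT a
    -> FROM example_table
    -> GROUP BY a
    -> HAVING SUM(b) < 20;
Empty set (0.027 sec)

Problem

When the current handleHaving implementation detects that HAVING contains an aggregate function, it only removes the HAVING clause from the pushed-down SQL (stmt.Having) and saves it in p.havingExpr. It does not ensure that the aggregate columns used in HAVING (for example SUM(b)) are also added to the SELECT list sent to the backend database.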

Collaborator

gongna-au commented Feb 4, 2026

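@WyattJia The full set of test scenarios is below.

Basic tests

Case 1: SUM aggregation (the most typical sharding failure scenario)

  • Test intent: Group A sums to 60. Suppose the shard rule splits it across two nodes (e.g. 30 on each). If HAVING SUM(b) > 40 is pushed down directly, both nodes filter out A (because 30 < 40), and the group is missing from the final result.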
SELECT a, SUM(b) as total_b 
FROM example_table 
GROUP BY a 
HAVING total_b > 40;
  • Expected result:
    • A (60)
    • C (100)
    • E (80)

Case 2: COUNT aggregation

  • Test intent: Group B has 3 rows. If sharding splits them into 1 row and 2 rows and HAVING COUNT(*) > 2 is pushed down directly, both shards return an empty set and B is lost.
SELECT a, COUNT(*) as cnt 
FROM example_table 
GROUP BY a 
HAVING cnt > 2;
  • Expected result:
    • B (3)

Case 3: compound logic (AND)

Test intent: check SUM and COUNT together. Gaea must compute both aggregate values in memory before evaluating the condition.

SELECT a, SUM(b) as total_b, COUNT(*) as cnt
FROM example_table
GROUP BY a
HAVING total_b > 50 AND cnt >= 2;

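Case 4: MAX/MIN filtering

  • Test intent: MAX/MIN are non-additive aggregates, but the logic is the same. If we want to exclude groups whose maximum is below 50, pushing the filter down to the shards could wrongly keep rows whose local maximum satisfies the condition (the reverse-filter scenario is less common, but the logic is general). This case tests an ordinary filter.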
SELECT a, MAX(b) as max_b
FROM example_table
GROUP BY a
HAVING max_b < 50;
  • Expected result:
    • A (30)
    • B (10)
    • D (5)
    • E (40)
  • Note: C (100) should be filtered out.
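
The four aggregates exercised above combine differently across shards: SUM and COUNT add their partial values, while MAX and MIN keep the extreme one. A minimal Go sketch of that merge step follows; `aggKind` and `mergePartials` are hypothetical names, and the partial values are assumed to be integers.

```go
package main

import (
	"errors"
	"fmt"
)

// aggKind identifies how per-shard partial results are combined (hypothetical type).
type aggKind int

const (
	aggSum aggKind = iota
	aggCount
	aggMax
	aggMin
)

// mergePartials combines the per-shard partial values of one group into the
// global aggregate value that HAVING should be evaluated against.
func mergePartials(kind aggKind, partials []int64) (int64, error) {
	if kind < aggSum || kind > aggMin {
		return 0, fmt.Errorf("unsupported aggregate kind %d", kind)
	}
	if len(partials) == 0 {
		return 0, errors.New("no partial results to merge")
	}
	acc := partials[0]
	for _, p := range partials[1:] {
		switch kind {
		case aggSum, aggCount:
			acc += p // additive: partial sums and partial counts add up
		case aggMax:
			if p > acc {
				acc = p
			}
		case aggMin:
			if p < acc {
				acc = p
			}
		}
	}
	return acc, nil
}

func main() {
	// Group B split as 1 row on one shard and 2 rows on the other.
	cnt, _ := mergePartials(aggCount, []int64{1, 2})
	// Group E with b = 40 on both shards.
	maxE, _ := mergePartials(aggMax, []int64{40, 40})
	fmt.Println("B passes COUNT(*) > 2:", cnt > 2)
	fmt.Println("E passes MAX(b) < 50:", maxE < 50)
}
```

AVG is deliberately left out: it cannot be merged from per-shard averages alone and needs the underlying SUM and COUNT.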

Alias tests

Case 1: alias reference (Alias)

Test intent: verify that an alias from the SELECT list is recognized correctly.

SELECT a, SUM(b) as my_sum 
FROM example_table 
GROUP BY a 
HAVING my_sum = 100;
  • Expected result:
    • C (100)

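Case 2: raw expression reference (Raw Column)

Test intent: verify that HAVING works when it uses the aggregate function expression directly rather than an alias.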

SELECT a 
FROM example_table 
GROUP BY a 
HAVING SUM(b) < 20;
  • Expected result:
    • D (10)
    • B (30) -> does not satisfy the condition
    • Result: D only.

Case 3: standard table alias + aggregate function filter

Test point: verify that SUM(t.b) is recognized and computed correctly. Expected result: A (60), C (100), E (80)

SELECT t.a, SUM(t.b) as total_b 
FROM example_table AS t 
GROUP BY t.a 
HAVING SUM(t.b) > 50;

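Case 4: table alias + COUNT(*) filter

Test point: verify that COUNT(*) works when a table alias is in use (COUNT itself usually carries no table alias, but the query context has one). Expected result: B (3)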

SELECT t.a, COUNT(*) as cnt 
FROM example_table t 
GROUP BY t.a 
HAVING cnt > 2;

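Case 5: implicit aggregate column completion

Test point: the SELECT list does not contain the aggregate column used in HAVING, and a table alias is used. This is the scenario that stresses the column-completion logic in handleHaving the most. Expected result: D (10)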

SELECT t.a 
FROM example_table t 
GROUP BY t.a 
HAVING SUM(t.b) < 20;

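Case 6: mixed alias references

Test point: GROUP BY uses the column name, HAVING uses a column alias, and FROM uses a table alias. Verifies alias resolution. Expected result: E (80)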

SELECT t.a as group_name, SUM(t.b) as total_val 
FROM example_table t 
GROUP BY t.a 
HAVING total_val = 80;

Case 7: complex expression + table alias

Test point: verify that restoreExpr is stable when handling binary expressions. Expected result: C (100) (100 > 90 and 1 > 0)

SELECT t.a, SUM(t.b) 
FROM example_table t 
GROUP BY t.a 
HAVING SUM(t.b) > 90 AND COUNT(t.id) > 0;

Author

WyattJia commented Feb 4, 2026

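@gongna-au Thank you for taking time out of a busy schedule to test this so thoroughly.

I have updated the HAVING handling to cover the two points you raised:

  • Aggregate alias recognition: when HAVING references an aggregate alias, global recomputation is triggered and HAVING is removed from the pushed-down SQL.
  • Aggregate column completion: an aggregate expression that appears in HAVING but not in SELECT is automatically added as a column (an auxiliary field), and an aggregate merger is registered for it so the condition can be evaluated after merging.

New tests:

- TestHandleHaving_AggregateAliasTriggersGlobal
- TestHandleHaving_AddsAggregateFieldWhenMissing
- TestFilterHavingResult_SumExpr

Verification:

- go test ./proxy/plan -v PASS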

Collaborator

gongna-au commented Feb 5, 2026

@WyattJia Good morning~ A few of the more complex tests still have small problems 🥺 I think the difficulty comes down to two things.

# Gaea behavior
mysql> SELECT t.a, SUM(t.b) as total_b  FROM example_table AS t  GROUP BY t.a  HAVING SUM(t.b) > 50;
ERROR 1105 (HY000): merge select result error: filter having result error: evaluate having expression error: aggregate function SUM not found in result

# MySQL behavior
mysql> SELECT t.a, SUM(t.b) as total_b  FROM example_table AS t  GROUP BY t.a  HAVING SUM(t.b) > 50;
Empty set (0.018 sec)

mysql>

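How do we accurately decide whether HAVING needs to be intercepted?

Problem: users may use multi-level aliases, e.g. SELECT count(id) as a, a+1 as b ... HAVING b > 10. A simple AST traversal cannot detect that b contains an aggregate function. So besides checking whether expr is an aggregate function, do we also need to recursively check whether expr is an alias that points to an aggregate function?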
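
A recursive check of this kind could look like the Go sketch below. It uses a tiny stand-in AST rather than Gaea's parser types, so `expr`, `aggCall`, `colRef`, `binary`, `literal`, and `containsAggregate` are all hypothetical; the alias map is assumed to be built from the SELECT list beforehand.

```go
package main

import "fmt"

// expr is a stand-in for a parsed SQL expression node (hypothetical AST).
type expr interface{}

type aggCall struct { // e.g. COUNT(id)
	name string
	arg  expr
}
type colRef struct{ name string } // a column name or a SELECT-list alias
type literal struct{ value int }
type binary struct {
	op          string
	left, right expr
}

// containsAggregate reports whether e contains an aggregate call, following
// SELECT-list aliases recursively. visiting guards against self-referencing
// aliases such as SELECT a AS a.
func containsAggregate(e expr, aliases map[string]expr, visiting map[string]bool) bool {
	switch n := e.(type) {
	case aggCall:
		return true
	case binary:
		return containsAggregate(n.left, aliases, visiting) ||
			containsAggregate(n.right, aliases, visiting)
	case colRef:
		target, ok := aliases[n.name]
		if !ok || visiting[n.name] {
			return false // plain column, or an alias already being expanded
		}
		visiting[n.name] = true
		defer delete(visiting, n.name)
		return containsAggregate(target, aliases, visiting)
	default:
		return false
	}
}

func main() {
	// SELECT count(id) AS a, a+1 AS b ... HAVING b > 10
	aliases := map[string]expr{
		"a": aggCall{name: "COUNT", arg: colRef{name: "id"}},
		"b": binary{op: "+", left: colRef{name: "a"}, right: literal{value: 1}},
	}
	having := binary{op: ">", left: colRef{name: "b"}, right: literal{value: 10}}
	fmt.Println(containsAggregate(having, aliases, map[string]bool{}))
}
```

The `visiting` set is there only to keep the recursion finite; the lookup itself is a plain map from alias name to the aliased expression.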

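Table aliases in HAVING do not match the column names returned by the backend

Problem: SELECT t.a, SUM(t.b) FROM t ... HAVING SUM(t.b) > 10. The backend database may return the column name as SUM(b) or SUM(t.b) (depending on the database implementation), so it is hard to find the matching column reliably by string comparison.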
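
One possible way around name matching is to never rely on the backend's column names: give each aggregate needed by HAVING a synthetic alias in the pushed-down SQL and remember its column position. The Go sketch below illustrates that idea only; `hiddenAgg`, `planHiddenAggregates`, and the `_having_agg_N` alias scheme are assumptions made for this example, not part of Gaea.

```go
package main

import (
	"fmt"
	"strings"
)

// hiddenAgg records where one HAVING aggregate ends up in the shard result set.
type hiddenAgg struct {
	expr  string // aggregate exactly as written in HAVING, e.g. "SUM(t.b)"
	alias string // synthetic alias used in the pushed-down SQL (hypothetical scheme)
	index int    // zero-based column position in each shard row
}

// planHiddenAggregates builds the text to append to the pushed-down SELECT
// list and the positions at which the merged values can be read back.
// visibleCols is the number of columns the client asked for.
func planHiddenAggregates(visibleCols int, havingAggs []string) (string, []hiddenAgg) {
	var suffix strings.Builder
	plan := make([]hiddenAgg, 0, len(havingAggs))
	for i, agg := range havingAggs {
		alias := fmt.Sprintf("_having_agg_%d", i)
		fmt.Fprintf(&suffix, ", %s AS `%s`", agg, alias)
		plan = append(plan, hiddenAgg{expr: agg, alias: alias, index: visibleCols + i})
	}
	return suffix.String(), plan
}

func main() {
	// SELECT t.a FROM example_table t GROUP BY t.a HAVING SUM(t.b) < 20
	suffix, plan := planHiddenAggregates(1, []string{"SUM(t.b)"})
	fmt.Println("SELECT t.a" + suffix + " FROM example_table t GROUP BY t.a")
	fmt.Println("read", plan[0].expr, "from column", plan[0].index)
}
```

Because the HAVING evaluator reads `row[index]`, it does not matter whether the backend would have labeled the column SUM(b) or SUM(t.b).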


Development

Successfully merging this pull request may close these issues.

Fix incorrect HAVING filtering
