Problem
We seem to have encountered a bug in the handoff between query rules and fast routing rules. The gist is that when no rule matches but the last rule in the chain (matching or not) has apply=1, the fast routing rules are skipped entirely. This seems to be caused by a scope leak in Query_Processor<QP_DERIVED>::process_query.
Stepping through the code, this becomes more apparent if you consider a set of rules with exactly 0 matches in which the last rule (i.e. the one with the largest rule_id) in the chain has apply=1. In this case, the last rule is still bound to qr when the match loop exits, as qr isn't cleared when no match is found. When qr is subsequently checked in __exit_process_mysql_query, the qr->apply == false test fails even though the rule never matched, so the fast routing rules are bypassed.
The expected behaviour when no rules match is that fast routing rules continue to work as usual.
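To make the suspected mechanism concrete, here is a minimal, self-contained sketch of the pattern described above. It is not ProxySQL's actual code: the Rule struct, the fast_routing_consulted function, and the leaky switch are all hypothetical names invented for illustration, and matching is reduced to a flagIN comparison. It only assumes, as described above, that fast routing is consulted when qr is null or qr->apply is false.

```cpp
#include <vector>

// Hypothetical, heavily simplified stand-in for a query rule.
// This is NOT ProxySQL's real rule structure.
struct Rule {
    int rule_id;
    int flagIN;
    bool apply;
};

// Returns true when the fast routing rules would be consulted.
// leaky == true mimics the behaviour described in this report: the loop
// variable keeps pointing at the last rule visited, even if nothing matched.
// leaky == false only looks at a rule that actually matched.
bool fast_routing_consulted(const std::vector<Rule>& rules, int flag, bool leaky) {
    const Rule* qr = nullptr;       // loop variable, never reset on a non-match
    const Rule* matched = nullptr;  // set only when a rule really matches
    for (const Rule& r : rules) {
        qr = &r;
        if (r.flagIN != flag) continue;  // non-match: move on to the next rule
        matched = qr;
        if (r.apply) break;              // apply=1 ends rule processing
    }
    const Rule* checked = leaky ? qr : matched;
    // Assumed shape of the exit check: fast routing runs only when there is
    // no rule to inspect or the inspected rule has apply == false.
    return checked == nullptr || checked->apply == false;
}
```

With a single non-matching rule that has apply=1, the leaky variant skips fast routing while the non-leaky variant consults it. Appending a non-matching apply=0 rule makes the leaky variant consult fast routing again.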
We've reproduced this in 3.0.3 and 3.0.7, though I do not know when it was introduced.
Steps to reproduce
A fairly simple test case can be set up with a configuration that uses 1 replication hostgroup. In this test, the user has an unroutable default hostgroup of 0, and is expected to be routed to hostgroup 100 using a fast routing rule:
-- proxysql: create a new replication hostgroup;
insert into mysql_replication_hostgroups(writer_hostgroup, reader_hostgroup) values(100, 101);
-- proxysql: create the new user; default hostgroup doesn't matter.
insert into mysql_users(username, password) values('rule_test', '<password>');
load mysql users to runtime;
-- proxysql: create a new fast routing rule to hg=100 for "test"
insert into mysql_query_rules_fast_routing values('rule_test', 'test', 0, 100, '');
load mysql query rules to runtime;
At this point, connections from the rule_test user are redirected to hostgroup 100 as expected:
> mysql -h 127.0.0.1 --port 26033 -u rule_test -p --password='<password>' -e 'select @@report_host' test
+---------------+
| @@report_host |
+---------------+
| devdb001 |
+---------------+
To trigger this bug, create a non-matching query rule with apply=1:
-- proxysql
insert into mysql_query_rules(rule_id, active, flagIN, apply) values(100000, 1, 99999, 1);
load mysql query rules to runtime;
In this case, flagIN is set to 99999 to prevent a match; however, any other type of non-match will trigger it, too:
> mysql -h 127.0.0.1 --port 26033 -u rule_test -p --password='<password>' -e 'select @@report_host' test
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 9001 (HY000) at line 1: Max connect timeout reached while reaching hostgroup 0 after 10021ms
This can be worked around somewhat safely by putting a non-matching rule with apply=0 at the end of the rule list:
-- proxysql: using 2147483647 ensures this is always the last rule.
insert into mysql_query_rules(rule_id, active, flagIN, apply) values(2147483647, 1, 99999, 0);
load mysql query rules to runtime;
> mysql -h 127.0.0.1 --port 26033 -u rule_test -p --password='<password>' -e 'select @@report_host' test
+---------------+
| @@report_host |
+---------------+
| devdb001 |
+---------------+