Commit 70f118f

authored
refine mysql doc (#550)
1 parent 2a58b8c commit 70f118f

1 file changed

+80
-28
lines changed

docs/mysql-external-table.md

Lines changed: 80 additions & 28 deletions
@@ -1,48 +1,88 @@
11
# MySQL External Table
22

3-
## Overview
3+
## Overview
44

55
Timeplus can read from or write to MySQL tables directly. This unlocks a set of new use cases, such as:
66

7-
- Use Timeplus to efficiently process real-time data in Kafka/Redpanda, apply flat transformation or stateful aggregation, then write the data to the local or remote MySQL for further analysis or visualization.
8-
- Enrich the live data with the static or slow-changing data in MySQL. Apply streaming JOIN.
9-
- Use Timeplus to query historical or recent data in MySQL.
7+
- **Stream Processing**: Use Timeplus to efficiently process real-time data in Kafka/Redpanda, apply flat transformations or stateful aggregations, then write the processed data to the local or remote MySQL for further analysis or visualization.
8+
- **Data Enrichment**: Enrich live streaming data with the static or slow-changing data from MySQL using streaming JOINs.
9+
- **Unified Analytics**: Use Timeplus to query historical or recent data in MySQL alongside your streaming data for comprehensive analytics.
1010

11-
This integration is done by introducing "External Table" in Timeplus. Similar to [External Stream](/external-stream), there is no data persisted in Timeplus. However, since the data in MySQL is in the form of table, not data stream, so we call this as External Table. Currently, we support MySQL and ClickHouse. In the roadmap, we will support more integration by introducing other types of External Table.
11+
This integration is done by introducing "External Table" in Timeplus. Similar to [External Stream](/external-stream), no data is persisted in Timeplus. It is called an "External Table" because the data in MySQL is structured as a table rather than a stream.
12+
13+
The implementation is built on top of StorageMySQL with connection pooling and failover support.
1214

1315
## Create MySQL External Table
1416

1517
```sql
16-
CREATE EXTERNAL TABLE name
17-
SETTINGS type='mysql',
18-
address='host:port',
19-
user='..',
20-
password='..',
21-
database='..',
22-
config_file='..',
23-
table='..',
24-
replace_query=false, -- optional, if it is ture, use REPLACE INTO instead of INSERT INTO
25-
on_duplicate_clause='..', -- optinal, set the expression for ON DUPLICATE KEY
26-
pooled_connections=16; -- optional, the maximum pooled connections to the database. Default 16.
18+
CREATE EXTERNAL TABLE
19+
table_name
20+
SETTINGS
21+
type='mysql',
22+
address='host:port',
23+
[ database='..', ]
24+
[ table='..', ]
25+
[ user='..', ]
26+
[ password='..', ]
27+
[ replace_query=false, ]
28+
[ on_duplicate_clause='..', ]
29+
[ pooled_connections=16, ]
30+
[ config_file='..', ]
31+
[ named_collection='..' ]
2732
```
2833

29-
The required settings are type and address. For other settings, the default values are
34+
### Required Settings
35+
36+
- **type** (string) - Must be set to `'mysql'`
37+
- **address** (string) - MySQL server address in format `'host:port'`. Default port is 3306
38+
39+
### Database Settings
40+
- **user** (string, default: `'default'`) - MySQL username.
41+
- **password** (string, default: `''`) - MySQL password.
42+
- **database** (string, default: `'default'`) - MySQL database name.
43+
- **table** (string, default: external table name) - Remote MySQL table name. If omitted, uses the external table name.
44+
- **replace_query** (bool, default: `false`) - Flag that converts `INSERT INTO` queries to `REPLACE INTO`. If `true`, the query is executed as `REPLACE INTO`. If `false`, the query is executed as `INSERT INTO`.
45+
- **on_duplicate_clause** (string, default: `''`) - The `ON DUPLICATE KEY on_duplicate_clause` expression that is added to the `INSERT` query. Can be specified only with `replace_query=false`. Example: `UPDATE c=c+1`. See the [MySQL documentation](https://dev.mysql.com/doc/refman/8.4/en/insert-on-duplicate.html) for the expressions you can use in the `ON DUPLICATE KEY` clause.
46+
- **pooled_connections** (uint64, default: `16`) - Maximum pooled TCP connections.
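
As a minimal sketch of how the settings above combine, the following statement assumes a hypothetical MySQL table `page_views` with a unique key and a counter column `c`. The external table name, credentials, and remote table name are illustrative placeholders, not values from this document.

```sql
-- Hypothetical example: all names and credentials are placeholders.
CREATE EXTERNAL TABLE mysql_page_views
SETTINGS
    type='mysql',
    address='localhost:3306',
    user='root',
    password='secret123',
    database='production',
    table='page_views',
    -- Added to the INSERT query as: ON DUPLICATE KEY UPDATE c=c+1
    -- Only valid while replace_query stays false (the default).
    on_duplicate_clause='UPDATE c=c+1';
```

Because `replace_query` and `on_duplicate_clause` describe two different conflict strategies, a table definition would normally pick one of them.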
47+
48+
### Configuration Management Settings
3049

31-
- 'default' for `user`
32-
- '' (empty string) for `password`
33-
- 'default' for `database`
34-
- If you omit the table name, it will use the name of the external table
50+
- **config_file** (string, default: `''`) - Path to configuration file containing `key=value` pairs
51+
- **named_collection** (string, default: `''`) - Name of pre-defined named collection configuration
3552

36-
The `config_file` setting is available since Timeplus Enterprise 2.7. You can specify the path to a file that contains the configuration settings. The file should be in the format of `key=value` pairs, one pair per line. You can set the MySQL user and password in the file.
53+
The `config_file` setting is available since Timeplus Enterprise 2.7. It takes the path to a file that contains the settings as `key=value` pairs, one pair per line. For example, you can set the MySQL user and password in the file.
3754

38-
Please follow the example in [Kafka External Stream](/kafka-source#config_file).
55+
Example configuration file content:
3956

40-
You don't need to specify the columns, since the table schema will be fetched from the MySQL server.
57+
```ini
58+
address=localhost:3306
59+
user=root
60+
password=secret123
61+
database=production
62+
```
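
Assuming the content above is saved at a hypothetical path such as `/etc/timeplus/mysql.conf`, an external table could reference it roughly as follows. The path, the external table name, and the remote table `orders` are assumptions made for illustration.

```sql
-- Hypothetical example: the file path and table names are placeholders.
-- address, user, password, and database are read from the config file.
CREATE EXTERNAL TABLE mysql_orders
SETTINGS
    type='mysql',
    config_file='/etc/timeplus/mysql.conf',
    table='orders';
```

Keeping credentials in the file rather than in the DDL avoids exposing them in the statement text.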
63+
64+
The `named_collection` setting is available since Timeplus Enterprise 3.0. Similar to `config_file`, it lets you specify the name of a pre-defined named collection that contains the configuration settings.
65+
66+
Example named collection definition:
67+
68+
```sql
69+
CREATE NAMED COLLECTION
70+
mysql_config
71+
AS
72+
address='localhost:3306',
73+
user='root',
74+
password='secret123',
75+
database='production';
76+
```
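
A sketch of how the `mysql_config` collection defined above might then be used. The external table name and the remote table `orders` are hypothetical.

```sql
-- Hypothetical example: table names are placeholders.
-- Connection settings come from the mysql_config named collection.
CREATE EXTERNAL TABLE mysql_orders
SETTINGS
    type='mysql',
    named_collection='mysql_config',
    table='orders';
```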
77+
78+
### Columns Definition
79+
80+
You don't need to specify the columns in the external table DDL, since the table schema will be fetched from the MySQL server.
4181

4282
Once the external table is created successfully, you can run the following SQL to list the columns:
4383

4484
```sql
45-
DESCRIBE name
85+
DESCRIBE table_name;
4686
```
4787

4888
:::info
@@ -51,6 +91,12 @@ The data types in the output will be Timeplus data types, such as `uint8`, inste
5191

5292
:::
5393

94+
:::info
95+
96+
Timeplus fetches and caches the MySQL table schema when the external table is attached. When the remote MySQL table schema changes (e.g., adding columns, changing data types, dropping columns), you must **restart** to reload the updated schema.
97+
98+
:::
99+
54100
You can define the external table and use it to read data from the MySQL table, or write to it.
55101

56102
## Connect to a local MySQL {#local}
@@ -83,8 +129,14 @@ Limitations:
83129
1. tumble/hop/session/table functions are not supported for External Table (coming soon)
84130
2. scalar or aggregation functions are performed by Timeplus, not the remote MySQL
85131
3. `LIMIT n` is performed by Timeplus, not the remote MySQL
132+
4. No query predicate pushdown to MySQL (planned for future versions)
86133

87134
## Write data to MySQL {#write}
135+
MySQL external tables support standard INSERT operations with the following behaviors:
136+
137+
- **Standard INSERT**: Uses `INSERT INTO` semantics
138+
- **Replace Mode**: When `replace_query=true`, uses `REPLACE INTO` instead
139+
- **On Duplicate Key**: Custom conflict resolution with `on_duplicate_clause`
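
To illustrate the behaviors listed above, here is a minimal sketch that assumes an external table named `mysql_page_views` already exists and that the remote MySQL table has columns `url` and `c`. Both the table and the columns are hypothetical.

```sql
-- Hypothetical example: table and column names are placeholders.
-- With default settings this is sent to MySQL as a plain INSERT INTO.
-- With replace_query=true on the external table, it is sent as REPLACE INTO.
-- With on_duplicate_clause='UPDATE c=c+1', the ON DUPLICATE KEY clause is
-- added to the INSERT query.
INSERT INTO mysql_page_views (url, c) VALUES ('/home', 1);
```

The `INSERT` statement itself does not change between the three modes. The behavior is decided by the settings chosen when the external table was created.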
88140

89141
You can run a regular `INSERT INTO` to add data to a MySQL table. However, it's more common to use a Materialized View to send the streaming SQL results to MySQL.
90142

@@ -110,9 +162,9 @@ SELECT * FROM some_source_stream
110162
SETTINGS max_insert_block_size=10, max_insert_block_bytes=1024, insert_block_timeout_ms = 100;
111163
```
112164

113-
* `max_insert_block_size` - The maximum block size for insertion, i.e. maximum number of rows in a batch. Default value: 65409
114-
* `max_insert_block_bytes` - The maximum size in bytes of block for insertion. Default value: 1 MiB.
115-
* `insert_block_timeout_ms` - The maximum time in milliseconds for constructing a block(a block) for insertion. Increasing the value gives greater possibility to create bigger blocks (limited by `max_insert_block_bytes` and `max_insert_block_size`), but also increases latency. Negative numbers means no timeout. Default value: 500.
165+
- `max_insert_block_size` - The maximum block size for insertion, i.e. maximum number of rows in a batch. Default value: 65409
166+
- `max_insert_block_bytes` - The maximum size in bytes of a block for insertion. Default value: 1 MiB.
167+
- `insert_block_timeout_ms` - The maximum time in milliseconds for constructing a block for insertion. Increasing the value makes bigger blocks more likely (still limited by `max_insert_block_bytes` and `max_insert_block_size`), but also increases latency. A negative number means no timeout. Default value: 500.
116168

117169
## Supported data types {#datatype}
118170
