Timeplus can read or write PostgreSQL tables directly. This unlocks a set of new use cases, such as:

- **Stream Processing**: Use Timeplus to efficiently process real-time data in Kafka/Redpanda, apply flat transformations or stateful aggregations, then write the processed data to a local or remote PostgreSQL for further analysis or visualization.
- **Data Enrichment**: Enrich live streaming data with static or slow-changing data from PostgreSQL using streaming JOINs.
- **Unified Analytics**: Use Timeplus to query historical or recent data in PostgreSQL alongside your streaming data for comprehensive analytics.

This integration is done by introducing "External Table" in Timeplus. Similar to [External Stream](/external-stream), no data is persisted in Timeplus. It is called an "External Table" because the data in PostgreSQL is structured as a table rather than a stream.
## Create PostgreSQL External Table
```sql
CREATE EXTERNAL TABLE table_name
SETTINGS
    type='postgresql',
    address='host:port',
    [ user='..', ]
    [ password='..', ]
    [ database='..', ]
    [ table='..', ]
    [ schema='..', ]
    [ on_conflict='..', ]
    [ pooled_connections=16, ]
    [ config_file='..', ]
    [ named_collection='..' ]
```

### Required Settings

- **type** (string) - Must be set to `'postgresql'`.
- **address** (string) - PostgreSQL server address in the format `'host:port'`.

### Database Settings

- **user** (string, default: `'default'`) - Username for PostgreSQL authentication.
- **password** (string, default: `''`) - Password for PostgreSQL authentication.
- **database** (string, default: `'default'`) - Database to connect to on the PostgreSQL server.
- **table** (string, default: the external table name) - PostgreSQL table to read from or write to. If you omit the table name, the name of the external table is used.
- **schema** (string, optional) - PostgreSQL schema that contains the table.
- **on_conflict** (string, default: `''`) - An `ON CONFLICT ..` clause appended to inserts so data is upserted instead of inserted. See the upsert example below.
- **pooled_connections** (uint, default: `16`) - The maximum number of pooled connections to the database.
- **config_file** (string, default: `''`) - Path to a file that contains configuration settings as `key=value` pairs.
- **named_collection** (string, default: `''`) - Name of a pre-defined named collection with configuration settings.
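
For example, here is a table definition with inline connection settings; the address, credentials, and table names below are placeholders for illustration:

```sql
CREATE EXTERNAL TABLE pg_orders
SETTINGS type='postgresql',
         address='localhost:5432',
         user='postgres',
         password='secret123',
         database='production',
         table='orders';
```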
The `config_file` setting is available since Timeplus Enterprise 2.7. You can specify the path to a configuration file that contains the configuration settings. The file should be in the format of `key=value` pairs, one pair per line. You can set the PostgreSQL user and password in the file.

Example configuration file content:

```ini
address=localhost:5432
user=postgres
password=secret123
database=production
```
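
You can then reference the file from the DDL, keeping credentials out of the SQL statement. A minimal sketch, assuming the file above is saved at the hypothetical path `/etc/timeplus/pg.config`:

```sql
CREATE EXTERNAL TABLE pg_from_config
SETTINGS type='postgresql',
         config_file='/etc/timeplus/pg.config',
         table='orders';
```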
The `named_collection` setting is available since Timeplus Enterprise 3.0. Similar to `config_file`, you can specify the name of a pre-defined named collection that contains the configuration settings.

Example named collection definition:

```sql
CREATE NAMED COLLECTION pg_config AS
    address='localhost:5432',
    user='postgres',
    password='secret123',
    database='production';
```
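
The external table DDL can then reference the collection by name instead of repeating the connection settings. A minimal sketch using the `pg_config` collection above (the table names are placeholders):

```sql
CREATE EXTERNAL TABLE pg_from_collection
SETTINGS type='postgresql',
         named_collection='pg_config',
         table='orders';
```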
### Columns Definition

You don't need to specify the columns in the external table DDL, since the table schema will be fetched from the PostgreSQL server.

You can run the following SQL to list the columns after the external table is created:
```sql
DESCRIBE table_name;
```
:::info

The data types in the output will be Timeplus data types, such as `uint8`, instead of the PostgreSQL data types.

:::

:::info

Timeplus fetches and caches the PostgreSQL table schema when the external table is attached. When the remote PostgreSQL table schema changes (e.g., adding columns, changing data types, dropping columns), you must **restart** Timeplus to reload the updated schema.

:::

## Connect to a local PostgreSQL (example) {#local}
You can use the following command to start a local PostgreSQL via Docker:
```bash
docker run --name=postgres --rm --env=POSTGRES_PASSWORD=foo -p 5432:5432 postgres:latest -c log_statement=all
```
Then open a new terminal and run the following command to connect to the PostgreSQL server:
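
A minimal sketch, assuming the Docker container started above (named `postgres`):

```bash
docker exec -it postgres psql -U postgres
```

Inside `psql`, you can create a sample table for the examples below; the exact schema here is an assumption based on the columns used later on this page:

```sql
-- sample dimension table; the primary key enables the ON CONFLICT upsert example later
CREATE TABLE dim_products (
    product_id text PRIMARY KEY,
    price      float8
);
INSERT INTO dim_products (product_id, price) VALUES ('1', 19.99);
```
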
In Timeplus, you can create an external table to read data from the PostgreSQL table:
```sql
CREATE EXTERNAL TABLE pg_local
SETTINGS type='postgresql',
         address='localhost:5432',
         database='postgres',
         user='postgres',
         password='foo',
         table='dim_products';
```
Then query the table:
```sql
SELECT * FROM pg_local;
```
## Read data from PostgreSQL {#read}
Once the external table is created successfully, it means Timeplus can connect to the PostgreSQL server and fetch the table schema.

You can query it via the regular `SELECT .. FROM table_name`.
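
For example, assuming the `pg_local` table created above:

```sql
-- read rows from the remote PostgreSQL table, filtered in the query
SELECT product_id, price FROM pg_local WHERE price > 50;
```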
## Write data to PostgreSQL {#write}

You can run a regular `INSERT INTO` to add data to the PostgreSQL table, such as:
```sql
INSERT INTO pg_local (product_id, price) VALUES ('10', 90.99), ('20', 199.99);
```
:::info
Please note that since the `price` column is of `float8` type, in Timeplus you need to insert the number `90.99`, instead of a string `"90.99"` as in a PostgreSQL INSERT command.
:::

However, it's more common to use a Materialized View in Timeplus to send streaming SQL results to PostgreSQL.

Say you have created an external table `pg_table`. You can create a materialized view to read Kafka data (via a [Kafka External Stream](/kafka-source)), transform or aggregate the data, and send it to the external table:
```sql
-- setup the ETL pipeline via a materialized view
CREATE MATERIALIZED VIEW mv INTO pg_table AS
    SELECT ... -- the transformation/aggregation columns are elided here
    FROM kafka_events;
```
You may use the `on_conflict` setting to upsert data instead of inserting it.
```sql
CREATE EXTERNAL TABLE pg_local_upsert
SETTINGS type='postgresql',
         address='localhost:5432',
         database='postgres',
         user='postgres',
         password='foo',
         table='dim_products',
         on_conflict='ON CONFLICT (product_id) DO UPDATE SET price = EXCLUDED.price';

-- Update price of product_id=1 to 9.99
INSERT INTO pg_local_upsert (product_id, price) VALUES ('1', 9.99);
```
### Batching Settings

In Timeplus Enterprise, additional performance tuning settings are available, such as:
- `max_insert_block_size` - The maximum block size for insertion, i.e. the maximum number of rows in a batch. Default value: 65409.
- `max_insert_block_bytes` - The maximum size in bytes of a block for insertion. Default value: 1 MiB.
- `insert_block_timeout_ms` - The maximum time in milliseconds for constructing a block for insertion. Increasing the value gives a greater chance of creating bigger blocks (limited by `max_insert_block_bytes` and `max_insert_block_size`), but also increases latency. Negative numbers mean no timeout. Default value: 500.
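
A minimal sketch of how these could be applied, assuming they are set in the external table DDL alongside the connection settings (the values below are illustrative):

```sql
CREATE EXTERNAL TABLE pg_tuned
SETTINGS type='postgresql',
         address='localhost:5432',
         user='postgres',
         password='foo',
         database='postgres',
         table='dim_products',
         max_insert_block_size=100000,    -- up to 100k rows per batch
         max_insert_block_bytes=10485760, -- up to 10 MiB per batch
         insert_block_timeout_ms=1000;    -- wait up to 1s to build a batch
```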