Commit 76d50ff

Improve PostgreSQL external table documentation (#549)
* Improve PostgreSQL external table documentation
  - Add comprehensive use case descriptions with bullet points
  - Restructure settings section with clear categories and descriptions
  - Add examples for config_file and named_collection settings
  - Include on_conflict setting for upsert operations with example
  - Improve code formatting and spacing throughout
  - Add informative notes about schema caching and reloading
  - Remove outdated Aiven example and enhance local PostgreSQL example
  - Better organize batching settings section
* fix

1 parent 70f118f commit 76d50ff

1 file changed: +102 -48 lines

docs/pg-external-table.md

Lines changed: 102 additions & 48 deletions
@@ -1,48 +1,86 @@
# PostgreSQL External Table

## Overview

Timeplus can read or write PostgreSQL tables directly. This unlocks a set of new use cases, such as:

- **Stream Processing**: Use Timeplus to efficiently process real-time data in Kafka/Redpanda, apply flat transformations or stateful aggregations, then write the processed data to a local or remote PostgreSQL database for further analysis or visualization.
- **Data Enrichment**: Enrich live streaming data with static or slow-changing data from PostgreSQL using streaming JOINs.
- **Unified Analytics**: Use Timeplus to query historical or recent data in PostgreSQL alongside your streaming data for comprehensive analytics.

This integration is done by introducing "External Table" in Timeplus. Similar to [External Stream](/external-stream), no data is persisted in Timeplus. It is called an "External Table" because the data in PostgreSQL is structured as a table rather than a stream.
## Create PostgreSQL External Table

```sql
CREATE EXTERNAL TABLE table_name
SETTINGS
    type='postgresql',
    address='host:port',
    [ user='..', ]
    [ password='..', ]
    [ database='..', ]
    [ table='..', ]
    [ schema='..', ]
    [ on_conflict='..', ]
    [ pooled_connections=16, ]
    [ config_file='..', ]
    [ named_collection='..' ]
```
### Required Settings

- **type** (string) - Must be set to `'postgresql'`.
- **address** (string) - PostgreSQL server address in the format `'host:port'`.

### Database Settings

- **user** (string, default: `'default'`) - Username for PostgreSQL authentication.
- **password** (string, default: `''`) - Password for PostgreSQL authentication.
- **database** (string, default: `'default'`) - PostgreSQL database name.
- **table** (string, default: the external table name) - PostgreSQL table name. If omitted, the name of the external table is used.
- **schema** (string, default: `''`) - Non-default table schema.
- **on_conflict** (string, default: `''`) - Conflict resolution strategy for INSERT operations, e.g. `ON CONFLICT DO NOTHING` or `ON CONFLICT <conflict_target> DO UPDATE SET column = EXCLUDED.column` (see [PostgreSQL INSERT](https://www.postgresql.org/docs/current/sql-insert.html)).
- **pooled_connections** (uint64, default: `16`) - Connection pool size for PostgreSQL.

### Configuration Management Settings

- **config_file** (string, default: `''`) - Path to a configuration file containing `key=value` pairs.
- **named_collection** (string, default: `''`) - Name of a pre-defined named collection containing the configuration settings.

The `config_file` setting is available since Timeplus Enterprise 2.7. It specifies the path to a file that contains the configuration settings, in the format of `key=value` pairs, one pair per line. You can set the PostgreSQL user and password in the file.

Example configuration file content:

```ini
address=localhost:5432
user=postgres
password=secret123
database=production
```
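For example, an external table could load the connection settings from such a file instead of inlining them in the DDL. A minimal sketch (the file path and table name here are only placeholders):

```sql
CREATE EXTERNAL TABLE pg_from_file
SETTINGS
    type='postgresql',
    -- placeholder path; the file holds key=value pairs like the example above
    config_file='/etc/timeplus/pg_connection.conf',
    table='dim_products';
```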
The `named_collection` setting is available since Timeplus Enterprise 3.0. Similar to `config_file`, you can specify the name of a pre-defined named collection which contains the configuration settings.

Example named collection definition:

```sql
CREATE NAMED COLLECTION pg_config AS
    address='localhost:5432',
    user='postgres',
    password='secret123',
    database='production';
```
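Once the collection exists, an external table can reference it by name. A minimal sketch (the external table name and target table below are only placeholders):

```sql
CREATE EXTERNAL TABLE pg_from_collection
SETTINGS
    type='postgresql',
    -- reuse the connection settings stored in the named collection above
    named_collection='pg_config',
    table='dim_products';
```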
### Columns Definition

You don't need to specify the columns in the external table DDL, since the table schema will be fetched from the PostgreSQL server.

You can run the following SQL to list the columns after the external table is created:

```sql
DESCRIBE table_name;
```

:::info
@@ -51,31 +89,41 @@ The data types in the output will be Timeplus data types, such as `uint8`, inste
:::

:::info

Timeplus fetches and caches the PostgreSQL table schema when the external table is attached. When the remote PostgreSQL table schema changes (e.g., adding columns, changing data types, dropping columns), you must **restart** to reload the updated schema.

:::
## Connect to a local PostgreSQL (example) {#local}

You can use the following command to start a local PostgreSQL via Docker:

```bash
docker run --name=postgres --rm --env=POSTGRES_PASSWORD=foo -p 5432:5432 postgres:latest -c log_statement=all
```
Then open a new terminal and run the following command to connect to the PostgreSQL server:

```bash
psql -p 5432 -U postgres -h localhost
```
Create a table and add some rows:

```sql
-- Table Definition
CREATE TABLE "public"."dim_products" (
    "product_id" varchar NOT NULL,
    "price" float8,
    PRIMARY KEY ("product_id")
);

INSERT INTO "public"."dim_products" ("product_id", "price") VALUES ('1', '10.99'), ('2', '19.99'), ('3', '29.99');
```
In Timeplus, you can create an external table to read data from the PostgreSQL table:

```sql
CREATE EXTERNAL TABLE pg_local
SETTINGS type='postgresql',
         address='localhost:5432',
         database='postgres',
         user='postgres',
         password='foo',
         table='dim_products';
```
Then query the table:

```sql
SELECT * FROM pg_local;
```
## Read data from PostgreSQL {#read}

Once the external table is created successfully, it means Timeplus can connect to the PostgreSQL server and fetch the table schema.

You can query it via the regular `SELECT .. FROM table_name`.

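For example, with the `pg_local` table created earlier (a simple illustration; adjust the column names and filter to your own schema):

```sql
SELECT product_id, price
FROM pg_local
WHERE price > 15;
```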
:::warning

@@ -132,13 +167,14 @@ You can run regular `INSERT INTO` to add data to PostgreSQL table, such as:
```sql
INSERT INTO pg_local (product_id, price) VALUES ('10', 90.99), ('20', 199.99);
```
:::info
Please note, since the `price` column is of `float8` type, in Timeplus you need to insert the numeric value `90.99`, instead of a string `"90.99"` as in the PostgreSQL INSERT command.
:::

However, it's more common to use a Materialized View in Timeplus to send the streaming SQL results to PostgreSQL.

Say you have created an external table `pg_table`. You can create a materialized view to read Kafka data (via a [Kafka External Stream](/kafka-source)), transform or aggregate the data, and send the results to the external table:
```sql
-- setup the ETL pipeline via a materialized view
CREATE MATERIALIZED VIEW mv INTO pg_table AS
    ...
    FROM kafka_events;
```

You may use `on_conflict` to upsert data instead of inserting it:

```sql
CREATE EXTERNAL TABLE pg_local_upsert
SETTINGS type='postgresql',
         address='localhost:5432',
         database='postgres',
         user='postgres',
         password='foo',
         table='dim_products',
         on_conflict='ON CONFLICT (product_id) DO UPDATE SET price = EXCLUDED.price';

-- Update price of product_id=1 to 9.99
INSERT INTO pg_local_upsert (product_id, price) VALUES ('1', 9.99);
```
### Batching Settings

In Timeplus Enterprise, additional performance tuning settings are available, such as:

```sql
INSERT INTO pg_table
SELECT * FROM some_source_stream
SETTINGS max_insert_block_size=10, max_insert_block_bytes=1024, insert_block_timeout_ms=100;
```
- `max_insert_block_size` - The maximum block size for insertion, i.e. the maximum number of rows in a batch. Default value: 65409.
- `max_insert_block_bytes` - The maximum size in bytes of a block for insertion. Default value: 1 MiB.
- `insert_block_timeout_ms` - The maximum time in milliseconds for constructing a block for insertion. Increasing the value gives a greater chance of creating bigger blocks (limited by `max_insert_block_bytes` and `max_insert_block_size`), but also increases latency. A negative value means no timeout. Default value: 500.

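To favor throughput over latency, these limits can be raised; for example (the values below are illustrative, not tuned recommendations):

```sql
INSERT INTO pg_table
SELECT * FROM some_source_stream
SETTINGS max_insert_block_size=100000, max_insert_block_bytes=10485760, insert_block_timeout_ms=1000;
```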

## Supported data types {#datatype}
