
Commit d55b158

polish architecture (#374)
1 parent cf4612e commit d55b158

2 files changed (+51, -35 lines)

docs/architecture.md

Lines changed: 50 additions & 35 deletions
# Architecture

## Overview

The diagram below illustrates the high-level components of the Timeplus core engine. The following sections explain how these components work together as a unified system.

![Architecture](/img/proton-high-level-arch.gif)

## Data Flow

### Ingest
When data is ingested into Timeplus, it first lands in the **NativeLog**. As soon as the log commit completes, the data becomes instantly available for streaming queries.

In the background, dedicated threads continuously tail new entries from the NativeLog and flush them into the **Historical Store** in optimized, larger batches.
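
To make this concrete, here is a minimal sketch in Timeplus SQL. The `device_metrics` stream and its columns are illustrative placeholders, not part of the engine.

```sql
-- Hypothetical stream; names and types are illustrative.
CREATE STREAM device_metrics
(
    device string,
    temperature float32
);

-- Each INSERT first commits to the NativeLog; once the commit returns,
-- the rows are visible to running streaming queries. Background threads
-- later flush them to the Historical Store in larger batches.
INSERT INTO device_metrics (device, temperature)
VALUES ('d1', 21.5), ('d2', 36.0);
```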
### Query

Timeplus supports three query models: **historical**, **streaming**, and **hybrid (streaming + historical)**.
- **Historical Query (Table Query)**
  Works like a traditional database query. Data is read directly from the **Historical Store**, leveraging standard database optimizations for efficient lookups and scans:
  - Primary index
  - Skipping index
  - Secondary index
  - Bloom filter
  - Partition pruning

- **Streaming Query**
  Operates on the **NativeLog**, where records are strictly ordered. Queries run incrementally, enabling real-time workloads such as **incremental ETL**, **joins**, and **aggregations**.

- **Hybrid Query**
  Combines the best of both worlds. A streaming query can automatically **backfill** from the Historical Store when:
  1. Data has expired from the NativeLog (due to retention).
  2. Reading from the Historical Store is faster than rewinding and replaying from the NativeLog.

  This eliminates the need for an external batch system, avoiding the extra **latency, inconsistency, and cost** usually associated with maintaining separate batch and streaming pipelines.
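
As an illustrative sketch of how the three models surface in SQL (reusing the hypothetical `device_metrics` stream from above; the backfill predicate is a common pattern, though exact behavior is version-dependent):

```sql
-- Historical (table) query: a bounded scan of the Historical Store.
SELECT count() FROM table(device_metrics);

-- Streaming query: unbounded, incrementally evaluated over the NativeLog.
SELECT device, avg(temperature)
FROM device_metrics
GROUP BY device;

-- Hybrid query: the time predicate asks the engine to backfill past data
-- (from the NativeLog, or the Historical Store when cheaper or expired)
-- before continuing with live events.
SELECT device, avg(temperature)
FROM device_metrics
WHERE _tp_time > now() - interval 1 hour
GROUP BY device;
```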
## Dual Storage

### NativeLog

The **Timeplus NativeLog** is the system’s write-ahead log (WAL) or journal: an append-only, high-throughput store optimized for low-latency, highly concurrent data ingestion. In a cluster deployment, it is replicated using **Multi-Raft** for fault tolerance. By enforcing a strict ordering of records, NativeLog forms the backbone of stream processing in **Timeplus Core**; it is also the building block of other internal components, such as the replicated metadata store in Timeplus.
NativeLog uses its own record format, consisting of two high-level types:

- **Control records** (a.k.a. meta records) – store metadata and operational information.
- **Data records** – columnar-encoded for fast serialization/deserialization and efficient vectorized streaming execution.

Each record is assigned a monotonically increasing sequence number, similar to a Kafka offset, which guarantees ordering.

Lightweight indexes are maintained to support rapid rewind and replay operations by **timestamp** or **sequence number** for streaming queries.
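
As a sketch, rewind and replay are typically driven through query settings; Proton exposes a `seek_to` setting, though treat the exact values here as illustrative:

```sql
-- Replay the stream starting from two hours ago (by timestamp)...
SELECT * FROM device_metrics SETTINGS seek_to = '-2h';

-- ...or from the earliest record still retained in the NativeLog.
SELECT * FROM device_metrics SETTINGS seek_to = 'earliest';
```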
### Historical Store

The **Historical Store** in Timeplus stores data **derived** from the **NativeLog**. It powers use cases such as:

- **Historical queries** (a.k.a. *table queries* in Timeplus)
- **Fast backfill** into streaming queries
- Acting as a **serving layer** for applications

Timeplus supports two storage encodings for the Historical Store: **columnar** and **row**.
#### 1. Columnar Encoding (*Append Stream*)

Optimized for **append-mostly workloads** with minimal data mutation, such as telemetry events, logs, and metrics. Benefits include:

- High data compression ratios
- Blazing-fast scans for analytical workloads
- Backed by the **ClickHouse MergeTree** engine

This format is ideal when the dataset is largely immutable and query speed over large volumes is a priority.
#### 2. Row Encoding (*Mutable Stream*)

Designed for **frequently updated datasets** where `UPSERT` and `DELETE` operations are common. Features include:

- Per-row **primary indexes**

Row encoding is the better choice when low-latency, high-frequency updates are required.
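
For illustration, the encoding is chosen when a stream is created; a sketch, assuming the Timeplus Enterprise syntax for mutable streams (names are placeholders):

```sql
-- Append stream: columnar encoding, backed by the MergeTree engine.
CREATE STREAM clickstream
(
    user_id string,
    url string
);

-- Mutable stream: row encoding with a per-row primary key, so writes
-- that reuse a key UPSERT the earlier version of that row.
CREATE MUTABLE STREAM user_profile
(
    user_id string,
    plan string
)
PRIMARY KEY (user_id);
```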
## External Storage

Timeplus natively connects to external storage systems through **External Streams** and **External Tables**, giving you flexibility in how data flows in and out of the platform.

- **Ingest from External Systems**
  Stream data directly from Kafka, Redpanda, or Pulsar into Timeplus. Use **Materialized Views** for incremental processing (e.g., ETL, filtering, joins, aggregations).

- **Send Data to External Systems**
  Push processed results downstream to systems like **ClickHouse** for analytics or long-term storage.

- **Keep Data Inside Timeplus**
  Store **Materialized View outputs** in Timeplus itself to serve client queries with low latency.

- **Raw Data Pipelines**
  Ingest and persist raw data in Timeplus, then build end-to-end pipelines for **filtering, transforming, and shaping** the data, serving both **real-time** and **historical** queries from a single platform.

This flexible integration model lets you decide whether Timeplus acts as a **processing engine**, a **serving layer**, or the **primary data hub** in your stack.
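
For illustration, here is a sketch wiring the first two patterns together; the broker address, topic, and table names are placeholders, and the exact settings keys can vary by version:

```sql
-- Ingest: read from Kafka through an external stream.
CREATE EXTERNAL STREAM kafka_events (raw string)
SETTINGS type = 'kafka', brokers = 'kafka:9092', topic = 'events';

-- Egress: write to ClickHouse through an external table.
CREATE EXTERNAL TABLE ch_events
SETTINGS type = 'clickhouse', address = 'clickhouse:9000', table = 'events';

-- A materialized view runs the streaming query continuously and pushes
-- each incremental result downstream into ClickHouse.
CREATE MATERIALIZED VIEW etl_to_ch INTO ch_events AS
SELECT raw FROM kafka_events;
```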
## References

[How Timeplus Unifies Streaming and Historical Data Processing](https://www.timeplus.com/post/unify-streaming-and-historical-data-processing)

docs/cluster.md

Lines changed: 1 addition & 0 deletions

# Cluster
