RFC: Support SELECT on SUBSCRIPTION #88

Open
hzxa21 wants to merge 1 commit into main from patrick/select-from-sub

hzxa21 commented May 9, 2024

Extend the subscription concept to support:

  • Batch select on subscription to inspect the snapshot of the changelog of a MV/Table
  • Streaming query on subscription to facilitate alerting and changelog sink use cases.

Preview

- Users can create a sink or MV on the subscription to get the changelog stream of the MV/Table the subscription is created on.
- `retention` is optional for subscriptions that only need to be eligible for streaming queries.
- When `retention` is set, the corresponding log store is created and the subscription is eligible for cursor queries, batch select queries, and streaming queries.
- When `retention` is not set, no log store is created for the subscription and it is only eligible for streaming queries.
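
The eligibility rules in the list above can be summarized as a small lookup. The following is a minimal Python sketch, not part of the RFC; the function name `eligible_queries` and the query-kind labels are hypothetical and chosen only for illustration.

```python
from typing import Optional, Set

# Hypothetical labels for the three query kinds discussed in the RFC.
CURSOR = "cursor"
BATCH_SELECT = "batch_select"
STREAMING = "streaming"


def eligible_queries(retention: Optional[str]) -> Set[str]:
    """Return the query kinds a subscription supports under the proposed rules.

    `retention` is the subscription's retention setting, or None when unset.
    """
    if retention is None:
        # No log store is created, so only streaming queries are possible.
        return {STREAMING}
    # A log store backs the subscription, so all three query kinds are allowed.
    return {CURSOR, BATCH_SELECT, STREAMING}
```

Under this reading, a subscription created without `retention` maps to `{"streaming"}`, while any subscription with `retention` set maps to all three kinds.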

Author

Discussion: for convenience, should we apply a default retention (say, 10 minutes) when `retention` is not set, so that every subscription is eligible for cursor queries, batch select queries, and streaming queries?

In my initial design, retention is a mandatory parameter, to ensure that users are aware of the data cleaning.
I still believe that it's the right design choice.

Author

For the "create sink/mv from subscription" use case, we don't necessarily need to buffer the changelog data under global checkpoint. I made retention optional for this use case because we can skip creating the log store for the subscription, in which case there is no data to clean.

```SQL
CREATE TABLE t (pk int PRIMARY KEY, v int);

CREATE SUBSCRIPTION sub FROM t;
```

Contributor

Will we persist the created subscription to the catalog?

It seems weird to persist it. Users only intended to create the sink, and the subscription created here is just sugar for expressing changelog semantics, which makes it more like a temporary object. If we persist it, users will first have to check whether a suitable subscription already exists, create one if it does not, and only then create the sink. If we don't persist it, users can simply create the subscription without any concern and then create the sink.

If we won't persist it, maybe we can make the subscription a per-session temporary object, or express the subscription as a CTE, like `WITH sub AS SUBSCRIPTION from t create materialized view as select * from sub`.

neverchanje commented May 10, 2024

I think we have already designed subscription semantically as a persistent object. So for now, it must be visible across multiple sessions.

But I agree with you that we can specifically introduce a per-session "temporary subscription" for this use case to address the inconvenience, like:

```SQL
CREATE TABLE t (pk int PRIMARY KEY, v int);

-- `sub` will be consolidated into `t_changelog_sink` to avoid persisting it in the catalog.
CREATE TEMPORARY SUBSCRIPTION sub FROM t;

CREATE SINK t_changelog_sink
AS SELECT t.pk as pk, t.v as v, CASE WHEN t.op = 2 or t.op = 4 THEN TRUE ELSE FALSE END as is_delete FROM sub;
```
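
To make the `is_delete` projection concrete, here is a small Python sketch of the same CASE expression applied to changelog rows. It is illustrative only: the `ChangelogRow` type and the `to_sink_row` helper are hypothetical, and the only thing taken from the example above is that `op` values 2 and 4 are treated as deletes.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Op values treated as deletes, copied from the CASE expression above.
DELETE_OPS = frozenset({2, 4})


@dataclass(frozen=True)
class ChangelogRow:
    """Hypothetical in-memory shape of one row read from the subscription."""
    pk: int
    v: int
    op: int


def to_sink_row(row: ChangelogRow) -> Dict[str, Union[int, bool]]:
    """Project a changelog row into the (pk, v, is_delete) shape of the sink."""
    return {"pk": row.pk, "v": row.v, "is_delete": row.op in DELETE_OPS}
```

For example, `to_sink_row(ChangelogRow(pk=1, v=10, op=2))` marks the row as a delete, while the same row with `op=1` does not.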

If I understand correctly, we would need an executor to convert the op column of the StreamChunk into a changelog, so persistence is only a syntactic difference (that is, whether users need to create subscriptions explicitly)?

```SQL
distinct on window_start,
window_start,
FROM windowed_result_sub
WHERE error_count > all_count * 0.9 AND op = 1;
```

Off-topic: maybe we can introduce a static variable, e.g., `CHANGE_INSERT` with the value 1, to avoid using magic constants everywhere.
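
A sketch of what such a named constant could look like, written in Python purely for illustration. Only `CHANGE_INSERT = 1` comes from this thread; the enum name `ChangeOp` and the `is_alert` helper are hypothetical, and the predicate mirrors the `WHERE` clause quoted above.

```python
from enum import IntEnum


class ChangeOp(IntEnum):
    """Named changelog op codes (hypothetical enum; only this value is from the thread)."""
    CHANGE_INSERT = 1


def is_alert(error_count: int, all_count: int, op: int) -> bool:
    """Mirror `error_count > all_count * 0.9 AND op = 1` without the magic constant."""
    return error_count > all_count * 0.9 and op == ChangeOp.CHANGE_INSERT
```

With the constant in place, the filter reads as "newly inserted windows whose error ratio exceeds 90%" rather than relying on the reader knowing what `op = 1` means.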

neverchanje left a comment

Cannot find a fault in this proposal.
Overall, I think it widens the scope of our subscription ability.
I just want to explore more use cases that can utilize this feature.

5 participants