New lab for multi-topic fanout to Iceberg tables #279
Conversation
Demonstrates routing batched events to multiple domain-specific Iceberg topics using Wasm transforms. Includes a Go producer with franz-go, a transform with header-based routing, JSON Schema validation, and Spark/Jupyter for querying Iceberg tables.

Known issue: Iceberg tables are not being created. Redpanda logs show `type_resolver::errc::registry_error` when writing to Iceberg topics configured with `value_schema_latest` mode. Messages route correctly to all topics and JSON schemas are properly registered, but the Iceberg integration fails during the schema resolution step.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
✅ Deploy Preview for redpanda-labs-preview ready!
:env-docker: true
:page-layout: lab
:page-categories: Data Transforms, Iceberg, Schema Registry
:description: Route batched events to multiple Iceberg-enabled topics using Wasm transforms with Avro encoding and Schema Registry wire format, creating a streaming data lakehouse pipeline.
- is the word "batched" necessary here?
- "multiple Iceberg-enabled topics": that's an implementation detail, so maybe drop it from the description and replace it with "multiple Iceberg tables" (as the title says), and instead introduce later that "Redpanda supports at most one Iceberg table per topic, so to route to multiple tables we are going to use transforms to fan out". Otherwise, it is not clear why fanout is important, why transforms are necessary, or how this lab differs from the other Iceberg lab we already have.
== Overview
This lab demonstrates how to build a streaming data lakehouse using Redpanda's Wasm data transforms and Iceberg integration. You'll deploy a transform that performs true 1:N fanout, parsing batch messages and routing individual updates from a single input topic to multiple domain-specific output topics, each configured as an Iceberg table for analytics.
Similar comment as above (https://github.com/redpanda-data/redpanda-labs/pull/279/changes#r2873829510). I believe the overview should state the problem and why this lab is worth looking into.
A Go producer sends complete JSON batches to an input topic. The transform dynamically registers Avro schemas in Schema Registry during initialization, then parses each batch, converts the data to Avro format with Schema Registry wire format encoding, and fans out messages to the appropriate Iceberg-enabled topics. Redpanda automatically validates messages against the Avro schemas and writes to Iceberg tables.
Not clear why we need Avro. Worth adding a sentence on why we use both Avro and JSON, otherwise it is confusing. Fine to just say "to demo capabilities" if that's the intention.
Since the title is about fan-out, we could focus only on fan-out and not mix technologies (JSON vs. Avro). But again, fine if you want to show more things. Just make it explicit so that the reader can follow the train of thought.
This approach offers several advantages:
* No external ETL: routing and encoding happen inside Redpanda brokers using Wasm transforms.
* Messages include schema IDs for automatic validation and deserialization.
nit: what does "automatic validation" mean? We don't validate broker-side. Do you mean we do that in transforms? I'd be explicit.
* Operational simplicity: a single Redpanda cluster handles routing, encoding, validation, and storage.
* Iceberg tables are immediately queryable by Spark.
Consider this approach for use cases like:
Consider lifting this higher, well before you start describing the solution.
spark.sql("DESCRIBE lab.redpanda.customers").show()
----
NOTE: It may take a few seconds for data to appear in Iceberg tables after producing. Redpanda writes to Iceberg based on the `iceberg_target_lag_ms` setting (5 seconds in this lab).
and `iceberg_catalog_commit_interval_ms`
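A sketch of how these two knobs might be tuned with rpk, assuming both `iceberg_target_lag_ms` and `iceberg_catalog_commit_interval_ms` are cluster configuration properties in the Redpanda version the lab targets (the names come from the note and comment above; verify them and the accepted values against your version's docs):

```shell
# Assumed cluster properties; confirm with:
#   rpk cluster config get iceberg_target_lag_ms
#   rpk cluster config get iceberg_catalog_commit_interval_ms

# How far Iceberg data files may lag behind the topic (ms).
rpk cluster config set iceberg_target_lag_ms 5000

# How often the catalog commit runs (ms); data is only visible
# to query engines after a catalog commit, so both settings
# affect end-to-end freshness.
rpk cluster config set iceberg_catalog_commit_interval_ms 10000
```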
│
▼
┌──────────────────────────────┐
│ Spark / Jupyter │
Needs an extra box for the Iceberg REST catalog itself. I'd put it on the same line as object storage, as both the Iceberg-enabled topics (above) and Spark/Jupyter (below) communicate with both: object storage and the Iceberg catalog.
@@ -0,0 +1,20 @@
{
what are these used for?
@@ -0,0 +1,7 @@
jupyter==1.0.0
spylon-kernel==0.4.1
pyiceberg[pyarrow,duckdb,pandas]==0.7.1
what are pyiceberg and duckdb used for?
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAnalysisException\u001b[0m Traceback (most recent call last)",
This shouldn't contain exceptions, imho, when committed.
Either do a clean execution so that there is something to render on GitHub (which supports rendering .ipynb files), or strip outputs with https://github.com/kynan/nbstripout before committing.
rockwotj left a comment:
Can look after Nicolae's feedback is addressed.
Demonstrates 1:N fanout using Wasm transforms to route batched events to multiple domain-specific Iceberg topics. The transform parses batch messages and fans out individual updates, showcasing Redpanda's in-broker stream processing capabilities.
Key Features:

Data Flow:
10 batch messages → Transform parses → 30 individual messages (10 each to orders, inventory, customers topics)
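The 10-in / 30-out flow above can be sketched as a pure function, assuming a batch shape with `orders`, `inventory`, and `customers` arrays (the lab's actual message schema may differ); inside the broker, the transform would write each payload to its topic rather than return a map:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Batch is an assumed shape for the input messages: one JSON document
// carrying updates for three domains. The real lab's schema may differ.
type Batch struct {
	Orders    []json.RawMessage `json:"orders"`
	Inventory []json.RawMessage `json:"inventory"`
	Customers []json.RawMessage `json:"customers"`
}

// fanout splits one batch into per-topic payloads, mirroring the
// 1:N routing the transform performs inside the broker.
func fanout(batch []byte) (map[string][][]byte, error) {
	var b Batch
	if err := json.Unmarshal(batch, &b); err != nil {
		return nil, err
	}
	out := map[string][][]byte{}
	for topic, items := range map[string][]json.RawMessage{
		"orders":    b.Orders,
		"inventory": b.Inventory,
		"customers": b.Customers,
	} {
		for _, item := range items {
			out[topic] = append(out[topic], []byte(item))
		}
	}
	return out, nil
}

func main() {
	batch := []byte(`{"orders":[{"id":1}],"inventory":[{"sku":"a"}],"customers":[{"id":9}]}`)
	routed, err := fanout(batch)
	if err != nil {
		panic(err)
	}
	// One update per domain in this sample batch.
	fmt.Println(len(routed["orders"]), len(routed["inventory"]), len(routed["customers"]))
}
```

With 10 batches of 10 updates per domain, this mapping yields the 30-messages-out figure quoted above.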
To test:

Clone the repo (`git clone https://github.com/redpanda-data/redpanda-labs.git`) and check out the feature branch.

Preview: https://deploy-preview-279--redpanda-labs-preview.netlify.app/redpanda-labs/data-transforms/iceberg-fanout-go/
Closes DOC-1965