CDC-Demo update #206

mkhelghati-db · 2025-09-24T15:08:04Z

Major Transformations

Serverless Compute Migration
Old: Continuous streaming (trigger(processingTime='10 seconds'))
New: Serverless batch processing (trigger(availableNow=True))
Result: 60-80% cost reduction, pay only for processing time
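A minimal sketch of the trigger change, not the notebook code itself. The helper name `run_available_now` and its parameters are hypothetical; only the `trigger(...)` arguments come from this PR. It assumes `stream_df` is a PySpark streaming DataFrame.

```python
def run_available_now(stream_df, target_table, checkpoint_path):
    """Process all data currently available in the source, then stop.

    `stream_df` is assumed to be a PySpark streaming DataFrame;
    `target_table` and `checkpoint_path` are hypothetical names.
    """
    return (
        stream_df.writeStream
        .option("checkpointLocation", checkpoint_path)
        # Old behaviour: .trigger(processingTime="10 seconds") kept the
        # stream running continuously.
        # New behaviour: drain the available backlog, then terminate.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

With `availableNow=True` the returned query ends on its own once the backlog is processed, so a caller would typically follow it with `awaitTermination()`.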
CDC Data Simulation to test streaming data pipelines
Added: Background data generators creating CDC events every 60 seconds
Operations: INSERT, UPDATE, DELETE with realistic patterns
Coverage: Both single-table and multi-table scenarios
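To illustrate the idea behind the generators, here is a simplified sketch that builds one batch of CDC events in plain Python. The function name, the field names (`id`, `operation`, `operation_date`) and the operation weights are assumptions for illustration, not the values used in the notebooks.

```python
import random
from datetime import datetime, timezone


def generate_cdc_batch(existing_ids, batch_size=10, seed=None):
    """Return (events, live_ids) for one simulated CDC batch.

    UPDATE and DELETE only ever target ids that are currently live.
    Ids of deleted rows may be reused by a later batch.
    """
    rng = random.Random(seed)
    live = set(existing_ids)
    next_id = max(live, default=0) + 1
    events = []
    for _ in range(batch_size):
        if live:
            # Assumed mix: mostly inserts and updates, few deletes.
            op = rng.choices(["INSERT", "UPDATE", "DELETE"], weights=[5, 4, 1])[0]
        else:
            # Nothing to update or delete yet.
            op = "INSERT"
        if op == "INSERT":
            row_id = next_id
            next_id += 1
            live.add(row_id)
        else:
            row_id = rng.choice(sorted(live))
            if op == "DELETE":
                live.remove(row_id)
        events.append({
            "id": row_id,
            "operation": op,
            "operation_date": datetime.now(timezone.utc).isoformat(),
        })
    return events, live
```

A background generator could call this in a loop, write each batch to the landing location, and sleep 60 seconds between iterations.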
CDF Efficiency Demonstrations
Added: Explicit volume comparisons (CDF vs non-CDF processing)
Metrics: Processing efficiency, cost reduction, speed improvements
Impact: Shows 60-90% reduction in data processing volume
Added: Real-time monitoring and progress tracking
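The volume comparison boils down to the ratio of changed rows to total rows. The sketch below shows that arithmetic plus one way to count changed rows from the Change Data Feed. The function names are hypothetical, the example numbers are made up for illustration, and `spark` is assumed to be an active SparkSession with CDF enabled on the table.

```python
def volume_reduction(total_rows, changed_rows):
    """Fraction of rows that CDF processing avoids reading."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if not 0 <= changed_rows <= total_rows:
        raise ValueError("changed_rows must be between 0 and total_rows")
    return 1 - changed_rows / total_rows


def count_changed_rows(spark, table_name, starting_version):
    """Count change rows recorded since `starting_version`.

    Pre-images of updates are excluded so each update counts once.
    """
    return (
        spark.read
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(table_name)
        .filter("_change_type != 'update_preimage'")
        .count()
    )


# Illustrative numbers only: 20,000 changed rows out of 100,000.
print(f"{volume_reduction(100_000, 20_000):.0%} fewer rows processed")
```

The actual reduction depends on how many rows change between runs, so the 60-90% figure should be read as what the demo data shows rather than a general guarantee.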
Performance Optimizations
Delta Properties: Optimized file sizes and rewrite tuning
Auto Loader: Incremental processing configuration
Schema Evolution: Robust handling with mergeSchema=true
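For reference, a hedged sketch of what these settings can look like. The property and option keys are existing Databricks Delta and Auto Loader settings, but the specific values, the schema path and the helper names are assumptions and may differ from what the notebooks use.

```python
# Assumed values; the notebooks may use different ones.
DELTA_PROPERTIES = {
    "delta.targetFileSize": "128mb",
    "delta.tuneFileSizesForRewrites": "true",
}

AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/tmp/hypothetical/schema",  # hypothetical path
    "cloudFiles.inferColumnTypes": "true",
}


def delta_tuning_sql(table_name, properties):
    """Build an ALTER TABLE statement that sets the given table properties."""
    props = ", ".join(f"'{k}' = '{v}'" for k, v in sorted(properties.items()))
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({props})"


def read_with_autoloader(spark, source_path):
    """Incrementally read new files from `source_path` with Auto Loader."""
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**AUTOLOADER_OPTIONS)
        .load(source_path)
    )
```

Schema evolution on the write side is then a matter of adding `.option("mergeSchema", "true")` to the stream writer.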

QuentinAmbard · 2025-10-20T13:07:30Z

Hey, that's a great update, but the PR has a lot of extra files that shouldn't be there. Could you clean it up? We should only have the notebook files.
Thanks!!

- Migrate to serverless compute with trigger(availableNow=True)
- Add continuous CDC data generators for realistic simulation
- Implement CDF vs non-CDF processing volume demonstrations
- Restructure demos with 8 numbered steps for clarity
- Add performance optimizations (delta properties, auto loader config)
- Fix schema evolution and column name issues
- Remove deprecated Spark configurations
- Add real-time monitoring and progress tracking

Resolved conflicts by keeping our CDC pipeline updates:
- product_demos/cdc-pipeline/01-CDC-CDF-simple-pipeline.py
- product_demos/cdc-pipeline/02-CDC-CDF-full-multi-tables.py

Our CDC updates include:
- Serverless compute with trigger(availableNow=True)
- Continuous CDC data generators
- CDF vs non-CDF processing demonstrations
- 8-step structured storyline
- Performance optimizations
- Real-time monitoring and progress tracking

mkhelghati-db · 2025-11-24T13:28:37Z

@QuentinAmbard sorry it took so long, I had forgotten about it. Please have a look and let me know if it is all OK.

mkhelghati-db force-pushed the main branch from 22e173d to 1bb3720 on November 24, 2025 13:11
