Add comprehensive CDF vs non-CDF processing volume demonstrations
- Add clear explanations of CDF vs non-CDF approaches with processing volume examples
- Demonstrate actual processing volume differences with live metrics
- Show processing efficiency calculations (percentage reduction, speed improvements)
- Add real-time processing volume tracking in batch operations
- Display actual changes detected by CDF vs total table size
- Add multi-table CDF processing volume analysis per table
- Show cost impact and performance benefits of CDF processing
- Demonstrate reductions of 99% or more in processing volume for incremental changes
- Add visual output showing records processed vs total records
- Include real-world impact examples (1K vs 1M records processing)
- Enhance both simple and multi-table demos with processing volume insights
#getting the latest change is still needed if the cdc contains multiple time the same id. We can rank over the id and get the most recent _commit_version
+ #Select only the first value
+ #getting the latest change is still needed if the cdc contains multiple time the same id. We can rank over the id and get the most recent _commit_version
  data_deduplicated=data.withColumn("rank", dense_rank().over(windowSpec)).where("rank = 1 and _change_type!='update_preimage'").drop("_commit_version", "rank")

- #Add some data cleaning for the gold layer to remove quotes from the address
+ #Add some data cleaning for the gold layer to remove quotes from the address
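The deduplication step in the hunk above keeps only the most recent change per id and discards `update_preimage` rows. The following is a minimal plain-Python sketch of that logic, not the notebook's Spark code. It assumes `windowSpec` partitions by `id` and orders by `_commit_version` descending, which this diff does not show. The helper name `latest_changes` and the sample rows are hypothetical.

```python
def latest_changes(rows, key="id"):
    """Keep, per key, the rows from the highest _commit_version,
    excluding update_preimage rows (mirrors rank = 1 with dense_rank)."""
    # First pass: find the most recent commit version for each key.
    latest = {}
    for row in rows:
        k = row[key]
        version = row["_commit_version"]
        if k not in latest or version > latest[k]:
            latest[k] = version

    # Second pass: keep rows at that version, skip pre-images,
    # and drop the helper column as the Spark code does.
    return [
        {col: val for col, val in row.items() if col != "_commit_version"}
        for row in rows
        if row["_commit_version"] == latest[row[key]]
        and row["_change_type"] != "update_preimage"
    ]


# Hypothetical change feed: id 1 is inserted, then updated in a later commit.
sample = [
    {"id": 1, "address": "a", "_change_type": "insert", "_commit_version": 1},
    {"id": 1, "address": "a", "_change_type": "update_preimage", "_commit_version": 2},
    {"id": 1, "address": "b", "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 2, "address": "c", "_change_type": "insert", "_commit_version": 1},
]
deduplicated = latest_changes(sample)
```

As with `dense_rank`, rows that tie on the highest version are all kept, which is why the pre-image filter is still needed: an update writes its pre-image and post-image in the same commit.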
# MAGIC - **Cost Savings**: Significant reduction in compute costs per table
+ # MAGIC - **Performance**: Much faster processing times per table
+ # MAGIC
+ # MAGIC **📊 Key Metrics Per Table**:
+ # MAGIC - **Total Bronze Records**: Shows full table size per table
+ # MAGIC - **CDF Records Processed**: Shows only changed records per table
+ # MAGIC - **Efficiency Gain**: Percentage reduction in processing volume per table
+ # MAGIC - **Speed Improvement**: Multiplier for processing speed per table
+ # MAGIC
+ # MAGIC **💡 Multi-Table Impact**: In production, this can mean processing 1,000 records across 5 tables instead of 5,000,000 records for incremental updates!
+ # MAGIC
+ # MAGIC ### 3.4 Starting all the streams
  # MAGIC
  # MAGIC We can now iterate over the folders to start the bronze & silver streams for each table.
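The per-table metrics named in the added markdown (efficiency gain and speed improvement) can be sketched as below. This is an illustration under stated assumptions rather than the notebook's own calculation. The function name `cdf_metrics` is hypothetical, and the speed multiplier is assumed to be the ratio of record volumes, not a measured runtime.

```python
def cdf_metrics(total_records, cdf_records):
    """Return (efficiency_gain_pct, speed_multiplier) for one table.

    Assumption: processing cost scales linearly with record count, so the
    multiplier is a volume ratio rather than a timed benchmark.
    """
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    if cdf_records < 0 or cdf_records > total_records:
        raise ValueError("cdf_records must be between 0 and total_records")

    # Share of the table that did NOT need reprocessing, as a percentage.
    efficiency_gain_pct = (1 - cdf_records / total_records) * 100
    # With no changes there is nothing to process, so the ratio is unbounded.
    speed_multiplier = (
        total_records / cdf_records if cdf_records else float("inf")
    )
    return efficiency_gain_pct, speed_multiplier


# Figures from the markdown above: 1,000 changed records vs 5,000,000 total.
gain, multiplier = cdf_metrics(5_000_000, 1_000)
```

For those figures the volume ratio is 5,000 and the reduction is about 99.98%, consistent with the 99%+ claim in the commit message.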