
srikantande/Change-Data-Capture-CDC


  • Change Data Capture (CDC) pipeline using Debezium, Kafka, Redis and BigQuery

    • Infrastructure & Deployment:

The entire architecture runs on an Apple M1 MacBook Air. All components (databases, Debezium, Kafka, the Python app, etc.) are containerized with Docker. All software and components used are open source (OSS).

    • Left Side - Kafka UI & Monitoring:

This side of the architecture covers Kafka management and monitoring. Conduktor Console provides a UI for managing and monitoring Kafka, and it uses a PostgreSQL instance for metadata storage. Conduktor Cortex is used for monitoring Kafka performance. An admin user accesses Kafka through the Conduktor Console, which allows them to configure Kafka and connectors, monitor topics, debug issues, and analyse data.

    • Center - Apache Kafka & Debezium (CDC Engine) ecosystem:

Apache Kafka acts as the event streaming platform, receiving change events from Debezium. The architecture also includes ZooKeeper, which helps manage the Kafka brokers. Kafka Connect is a framework that runs connectors that move data in and out of Apache Kafka; here it is used to extract, transform, and load (ETL) data. Debezium is configured to monitor changes in the source databases. It captures data changes in real time and streams them into Kafka.
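As a rough illustration of how a Debezium connector is typically registered with Kafka Connect, the sketch below builds a connector definition and posts it to the Kafka Connect REST API. It is not taken from this repository: the connector name, host, credentials, topic prefix, and the `http://localhost:8083` URL are all placeholder assumptions, and only the PostgreSQL source is shown.

```python
import json
import urllib.request


def build_postgres_connector(name, host, dbname, user, password, topic_prefix):
    """Build the JSON body Kafka Connect expects when creating a connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            # pgoutput is PostgreSQL's built-in logical decoding plugin.
            "plugin.name": "pgoutput",
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Debezium 2.x property; 1.x releases use "database.server.name".
            "topic.prefix": topic_prefix,
        },
    }


def register_connector(connect_url, connector):
    """POST the connector definition to the Kafka Connect REST API."""
    request = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    connector = build_postgres_connector(
        name="pg-source",
        host="postgres",
        dbname="appdb",
        user="debezium",
        password="secret",
        topic_prefix="cdc",
    )
    print(register_connector("http://localhost:8083", connector))
```

With a definition like this, Debezium would publish change events to topics named after the prefix, schema, and table, for example `cdc.public.customers`.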

    • Top Side - Data Sources (Databases):

These databases serve as the source systems where data changes occur:

  • PostgreSQL
  • MySQL
  • MSSQL

Each database has its CDC mechanism enabled.

    • Right Side - Data Processing & Storage
      • Use Case#1

A containerised Python application consumes Kafka messages and processes them. The processed data is stored in Redis Stack, which is used as a cache with indexing.
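The processing step could look something like the sketch below, which applies a single Debezium change event to a key-value store. It is an assumption about the shape of the logic, not the repository's actual code: a plain `dict` stands in for Redis Stack, the `table:id` key layout and the `id` primary-key field are hypothetical, and the Kafka consumer loop is left out so the example runs on its own.

```python
import json


def apply_change_event(store, raw_message, key_field="id"):
    """Apply one Debezium change event to a dict-like store.

    Returns the affected key, or None if the message was skipped.
    """
    if raw_message is None:
        # Tombstone record (null value) that can follow a delete.
        return None

    event = json.loads(raw_message)
    # With schemas enabled in the JSON converter, the envelope is under "payload".
    payload = event.get("payload", event)
    op = payload["op"]
    table = payload["source"]["table"]

    if op in ("c", "u", "r"):
        # Create, update, or snapshot read: "after" holds the new row state.
        row = payload["after"]
        key = f"{table}:{row[key_field]}"
        store[key] = row
        return key

    if op == "d":
        # Delete: "before" holds the last known row state.
        row = payload["before"]
        key = f"{table}:{row[key_field]}"
        store.pop(key, None)
        return key

    # Other operations (for example truncate) are ignored in this sketch.
    return None


if __name__ == "__main__":
    cache = {}
    insert = json.dumps({
        "payload": {
            "op": "c",
            "before": None,
            "after": {"id": 1, "name": "Ada"},
            "source": {"table": "customers"},
        }
    })
    apply_change_event(cache, insert)
    print(cache)
```

In a real deployment the `dict` would be replaced by a Redis client and the function would be called for each message read from the Kafka topic.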

      • Use Case#2

WePay's Kafka Connect Google BigQuery Sink connector (Apache License 2.0) is used to stream data into BigQuery tables. The connector automatically pushes Kafka data to BigQuery (Google Cloud).
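For orientation, a sink connector definition for this use case might be assembled as in the sketch below. The project, dataset, topic, and key file path are hypothetical placeholders, and the property names reflect one version of the WePay connector as best understood here; they differ between releases, so treat them as assumptions to check against the connector's documentation for the version actually deployed.

```python
import json


def build_bigquery_sink(name, topics, project, dataset, keyfile_path):
    """Build a Kafka Connect definition for the WePay BigQuery sink connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
            # Kafka Connect expects a comma-separated list of topic names.
            "topics": ",".join(topics),
            "project": project,
            # Assumed property name; older releases use a "datasets" mapping.
            "defaultDataset": dataset,
            # Path to a Google Cloud service account key inside the container.
            "keyfile": keyfile_path,
            "autoCreateTables": "true",
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    sink = build_bigquery_sink(
        name="bq-sink",
        topics=["cdc.public.customers"],
        project="my-gcp-project",
        dataset="cdc_dataset",
        keyfile_path="/secrets/bq-key.json",
    )
    print(json.dumps(sink, indent=2))
```

The resulting JSON can be submitted to the Kafka Connect REST API's `/connectors` endpoint, or pasted into a connector form in a Kafka UI.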

About

CDC Architecture to sync data to BQ and Redis Stack
