Streaming Data Pipeline

Description

This project serves as an example for teaching the HWE Course at 1904Labs.

A Kafka producer publishes reviews to the Kafka topic named reviews.
A Spark streaming application consumes reviews from that Kafka topic. Each review contains a customer_id.
The Spark streaming application joins each review with a record retrieved from HBase, using the customer_id as the join key.
The Spark streaming application then stores the enriched record in HDFS.
Hive is used to query the data in HDFS.
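
The enrichment step described above can be sketched in plain Python, without Kafka, Spark, or HBase, to show only the join logic. This is a minimal illustration, not the project's code: the CUSTOMERS dictionary stands in for the HBase table, and the field names other than customer_id (review_body, name, state), the function names, and the choice to drop reviews with no matching customer are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical stand-in for the HBase table: customer_id -> customer record.
CUSTOMERS: Dict[str, dict] = {
    "c-100": {"name": "Ada", "state": "MO"},
}


def enrich(review: dict, customers: Dict[str, dict]) -> Optional[dict]:
    """Join one review with its customer record, keyed on customer_id."""
    customer = customers.get(review["customer_id"])
    if customer is None:
        # Assumption: a review with no matching customer is dropped.
        return None
    # Prefix customer fields so they cannot collide with review fields.
    prefixed = {f"customer_{key}": value for key, value in customer.items()}
    return {**review, **prefixed}


def enrich_stream(reviews: Iterable[dict],
                  customers: Dict[str, dict]) -> Iterator[dict]:
    """Lazily enrich reviews one at a time, as a streaming consumer would."""
    for review in reviews:
        enriched = enrich(review, customers)
        if enriched is not None:
            yield enriched
```

In the real pipeline the reviews would arrive from the Kafka topic, the lookup would go to HBase, and the enriched records would be written to HDFS rather than yielded in memory.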
