Building Open Data Lakes on AWS with Debezium and Apache Hudi

Introduction

In the following recorded demonstration, we will build a simple open data lake on AWS using a combination of open-source software (OSS), including Red Hat’s Debezium, Apache Kafka, and Kafka Connect for change data capture (CDC), and Apache Hive, Apache Spark, Apache…

AWS Senior Solutions Architect | 8x AWS Certified Pro | Polyglot Developer | DataOps | DevOps | Technology consultant, writer, and speaker

Gary A. Stafford
