Building Open Data Lakes on AWS with Debezium and Apache Hudi

Introduction

In the following recorded demonstration, we will build a simple open data lake on AWS using a combination of open-source software (OSS), including Red Hat’s Debezium, Apache Kafka, and Kafka Connect for change data capture (CDC), and Apache Hive, Apache Spark, Apache…

AWS Senior Solutions Architect | 8x AWS Certified Pro | Polyglot Developer | DataOps | DevOps | Technology consultant, writer, and speaker

Gary A. Stafford
