Getting Started with Apache Zeppelin on Amazon EMR, using AWS Glue, RDS, and S3: Part 1 — Setup

18 min readNov 23, 2019

Introduction

There is little question big data analytics, data science, artificial intelligence (AI), and machine learning (ML), a subcategory of AI, have all experienced a tremendous surge in popularity over the last 3–5 years. Behind the hype cycles and marketing buzz, these technologies are having a significant influence on all aspects of our modern lives. Due to their popularity, commercial enterprises, academic institutions, and the public sector have all rushed to develop hardware and software solutions to decrease the barrier to entry and increase the velocity of ML and Data Scientists and Engineers.

*Data Science: 5-Year Search Trend (courtesy* *Google Trends*)

*Machine Learning: 5-Year Search Trend (courtesy* *Google Trends*)

Technologies

All three major cloud providers, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud, have rapidly maturing big data analytics, data science, and AI and ML services. AWS, for example, introduced Amazon Elastic MapReduce (EMR) in 2009, primarily as an Apache Hadoop-based big data processing service. Since then, according to Amazon, EMR has evolved into a service that uses Apache Spark, Apache Hadoop, and several other leading…

Getting Started with Apache Zeppelin on Amazon EMR, using AWS Glue, RDS, and S3: Part 1 — Setup

Introduction

Technologies

Written by Gary A. Stafford