Open Source Data Lake Table Formats: Evaluating Current Interest and Rate of Adoption

Measuring the current levels of interest and potential adoption rates of leading data lake table formats using commonly available metrics

Gary A. Stafford
17 min read · Feb 12, 2022


This post examines the current levels of interest and potential adoption rates for three popular data lake table formats: Apache Hudi™, Apache Iceberg™, and Delta Lake™. Using publicly available data, it takes an unbiased look at analytics community involvement, project activity, commercial support, and the level of third-party vendor integration. Understanding these metrics is critical to an organization’s decision to adopt a data lake table format. Confidence that an open-source project or commercial product has sufficient backing, longevity, and a robust user base must be part of any product selection criteria.

Image copyright: peshkov (123rf.com)

Prelude: Big Data and Analytics Market

According to Pitchbook, US venture capital-backed companies raised $329.6 billion in 2021, nearly double the previous record of $166.6 billion raised in 2020. According to CB Insights in their Global 2021 State of Venture report, global venture funding reached a record $621 billion in 2021, more than double the 2020 mark of $294 billion. According to FactSet, over 500 VC-backed companies became unicorns in 2021, reaching valuations over $1 billion, with some reaching decacorn status with a valuation of over $10 billion.

Again according to FactSet, global investments in the technology services sector in 2021 were 5–6x greater than those in any other sector, including finance, commercial services, and health technologies. Within the technology services sector, investments in big data and analytics startups were red hot in 2021. This investment trend continues into Q1 2022. In late January, data warehouse startup Firebolt raised $100 million in a Series C funding round at a valuation of $1.4 billion. Also in January, Prophecy, creators of an…


Gary A. Stafford

Area Principal Solutions Architect @ AWS | 10x AWS Certified Pro | Polyglot Developer | DataOps | DevOps | Technology consultant, writer, and speaker