This post examines current levels of interest in, and potential adoption rates for, three popular data lake table formats: Apache Hudi™, Apache Iceberg™, and Delta Lake™. Using publicly available data, it aims to give an unbiased review of analytics community involvement, project activity, commercial support, and third-party vendor integration. Understanding these metrics is critical to an organization’s decision to adopt a data lake table format. Confidence that an open-source project or commercial product has sufficient backing, longevity, and a robust user base should be part of any product selection criteria.
Prelude: Big Data and Analytics Market
According to Pitchbook, US venture capital-backed companies raised $329.6 billion in 2021, nearly double the previous record of $166.6 billion raised in 2020. CB Insights, in its Global 2021 State of Venture report, found that global venture funding reached a record $621 billion in 2021, more than double the 2020 mark of $294 billion. According to FactSet, over 500 VC-backed companies became unicorns in 2021, reaching valuations over $1 billion, and some reached decacorn status with valuations over $10 billion.
Also according to FactSet, global investments in the technology services sector in 2021 were 5–6x greater than those in any other sector, including finance, commercial services, and health technologies. Within the technology services sector, investments in big data and analytics startups were red hot in 2021, and the trend continued into Q1 2022. In late January, data warehouse startup Firebolt raised $100 million in a Series C funding round at a valuation of $1.4 billion. Also in January, Prophecy, creators of an…