Hello, any thoughts or insights about using lakeFS with Delta Lake, Iceberg, or Hudi? Those formats are pretty popular, and all of them provide some version control functionality. Would there be any benefit to using lakeFS together with them? How should we architect things if our data lake holds some tabular data in those formats alongside some non-tabular files?
Atlan: The Rise of the Metadata Lake
Modern business operations increasingly depend on data to drive the business. As data takes a central role in business operations, the stakeholders interacting with it are more diverse than ever. In this increasingly diverse data world, metadata holds the key to the elusive promised land. Is it time to think about a metadata lake? The blog post describes the role of the metadata lake in the modern data stack.
https://towardsdatascience.com/the-rise-of-the-metadata-lake-1e95127594de
1 year ago
Hey all, I'm trying to stress-test lakeFS using Spark, and I'm looking for ideas on how to generate "heavy" Spark jobs. By heavy I mean lots of reads, writes, and computation, with a large amount of data written. Would love to get your input.
🐦 Tweet of the Week:
A question from Erik Bernhardsson that touches on the themes of unbundling and ETL connectors discussed above.
11 months ago
🎥 Video of the Week:
Advanced Apache Spark Training – Sameer Farooqui
This 6-hour video (sorry!) from 2015 is outdated in some respects, but it still gives the clearest explanations of Spark fundamentals, which are certainly still relevant today. Sameer covers:
– Spark’s history
– RDD fundamentals
– Runtime architecture & resource managers
– Memory & persistence
…and more! One thing I found interesting is the stats he shares about the Spark project and its contributors (500 total contributors, 370k lines of code, 500 active production deployments). Updated figures are hard to find for some of these (there are certainly far more than 500 production deployments today), but GitHub now shows 1,720 total contributors. That's not as high as I might have thought.
Anyway, if you’re a Spark user, jump to a section that interests you and I’m sure you’ll learn something new.