# help
Hi there, I'm very new at lakeFS and am looking for a way to implement the following:

• we get a streaming price input, coming from a message bus with GBs of data each day
• we want to read those messages into lakeFS using Python in a running service
• stored lakeFS data should get read by Dagster assets into new "tables" in an ETL manner
• eventually the data gets turned into future price estimations and stored in lakeFS again (or somewhere else) for use in a production service

In such a setup, what would be the easiest integration to use for lakeFS?

• How do I partition the data? Daily/hourly? I mean, so it does not become too big?
• How do I read data into Dagster Assets across partitions?
• What would be the best way to store the results for querying in a service? Is it possible just to use lakeFS?

Sorry for the noob'y questions. I know I'm very early on the lakeFS/data journey 🙈 Any input is appreciated.
Hi @Emil Ingerslev, welcome to the lake! These are all very fine and valid questions, so no worries 🙂

> How do I partition the data? Daily/hourly? I mean, so it does not become too big?

I would first like to mention that you can think of using lakeFS the same way you would use S3 (or Azure Blob Storage, or GCS), meaning that the data partitioning is entirely up to you to decide.

> How do I read data into Dagster Assets across partitions?

Check this and this, and tell me if it answers your question.

> What would be the best way to store the results for querying in a service? Is it possible just to use lakeFS?

It is absolutely possible to use lakeFS to store the results! As mentioned in the first answer, lakeFS is an abstraction layer on top of your object store, meaning that you can read from and write to it. lakeFS is also agnostic to the type of data you store, so the results' format doesn't matter…

I hope I managed to answer your questions (let me know if not 😅)
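To make the partitioning point a bit more concrete, here is a minimal sketch (not a definitive recipe) of the ingestion service writing micro-batches from the message bus into lakeFS through its S3-compatible gateway, using a `dt=YYYY-MM-DD/hour=HH` prefix so no single prefix grows too large. The repository name (`prices-repo`), branch (`main`), endpoint URL, credentials, and the `write_price_batch` helper are all made-up placeholders:

```python
# Sketch: write one micro-batch of price messages to lakeFS via its S3 gateway.
# Repo, branch, endpoint, and credentials below are hypothetical placeholders.
import json
from datetime import datetime, timezone

import boto3  # lakeFS exposes an S3-compatible endpoint, so boto3 works against it

s3 = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",  # your lakeFS server (assumption)
    aws_access_key_id="AKIA...",                # lakeFS access key
    aws_secret_access_key="...",                # lakeFS secret key
)

def write_price_batch(messages: list[dict]) -> None:
    """Write one batch of messages under a dt=YYYY-MM-DD/hour=HH prefix."""
    now = datetime.now(timezone.utc)
    key = (
        "prices/"
        f"dt={now:%Y-%m-%d}/hour={now:%H}/"
        f"batch-{now:%Y%m%dT%H%M%S}.json"
    )
    body = "\n".join(json.dumps(m) for m in messages).encode()
    # With the S3 gateway, the bucket is the repository name and the key
    # starts with the branch name ("main" here).
    s3.put_object(Bucket="prices-repo", Key=f"main/{key}", Body=body)
```

Whether you cut the prefixes daily or hourly depends only on how much data arrives; lakeFS itself doesn't care, so pick whatever keeps individual prefixes at a size your readers are comfortable scanning.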
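And for the Dagster side, here is a sketch of a daily-partitioned asset that reads its partition's files back from lakeFS via fsspec/s3fs. Again, the repo, branch, start date, and credentials are assumptions, and the links above cover the full partitioning story:

```python
# Sketch: a daily-partitioned Dagster asset that loads the raw JSON-lines
# batches for its partition day from lakeFS (via the S3-compatible endpoint).
# Repo/branch/paths and credentials are hypothetical placeholders.
import fsspec
import pandas as pd
from dagster import AssetExecutionContext, DailyPartitionsDefinition, asset

daily = DailyPartitionsDefinition(start_date="2024-01-01")  # assumption

@asset(partitions_def=daily)
def daily_prices(context: AssetExecutionContext) -> pd.DataFrame:
    day = context.partition_key  # e.g. "2024-06-01"
    fs = fsspec.filesystem(
        "s3",
        key="AKIA...",
        secret="...",
        client_kwargs={"endpoint_url": "https://lakefs.example.com"},
    )
    # Match every hourly batch written under this day's prefix.
    files = fs.glob(f"prices-repo/main/prices/dt={day}/*/*.json")
    frames = [pd.read_json(fs.open(path), lines=True) for path in files]
    df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    context.log.info(f"Loaded {len(df)} rows for partition {day}")
    return df
```

A downstream asset can then depend on `daily_prices` and let Dagster hand it whichever partitions it needs, and the price estimations it produces can be written back to lakeFS the same way the raw batches were.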