# help
s
Hi all, I'm trying to wrap my head around lakeFS for a specific use case. Can anyone validate whether lakeFS is the right solution? I have a process where I take external datasets, transform them, and then want to version, package, and deploy them into a production relational database that my application reads. Would lakeFS be a good fit in the middle of that, between sourcing and transforming the data and deploying it into the production database? Could it help streamline the dev/testing effort for these data packages?
e
Hi @Seth Therrien, the use case you describe is probably the most common use case for lakeFS. To make sure you can use it, can you please share your tech stack?
s
Currently Python for extract/transform, Postgres for intermediate storage, and MySQL for the production application database on AWS.
My background is with the modern data stack. I'm suggesting this gets migrated to Python extract -> Snowflake -> dbt -> deploy to production database in the future.
e
lakeFS currently supports version control only for data in object storage.
s
I don't fully understand the relationship between lakeFS versioned data and production (application relational) databases. It seems like a hook would need to sync data to the external database after verifying quality within lakeFS?
e
Snowflake support is coming towards the end of the year in lakeFS enterprise
Yes, hooks are the right solution for the flow you describe.
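A rough sketch of what the receiving side of such a hook could look like, assuming a webhook-type hook that fires after a merge into the production branch. The function name `handle_hook_event`, the `sync` callback, and the branch name `main` are hypothetical, and the payload field names (`event_type`, `branch_id`, `repository_id`, `commit_id`) are assumptions that should be checked against the lakeFS hooks docs:

```python
import json
from typing import Callable

# Assumption: merging into this branch is what "promotion" means in this flow.
PRODUCTION_BRANCH = "main"


def handle_hook_event(body: bytes, sync: Callable[[str, str], None]) -> bool:
    """Call `sync(repository, commit)` only for post-merge events on the production branch.

    Returns True if a sync was triggered, False if the event was ignored.
    """
    event = json.loads(body)
    # Field names are assumed, not confirmed; verify them against the hooks docs.
    if event.get("event_type") != "post-merge":
        return False
    if event.get("branch_id") != PRODUCTION_BRANCH:
        return False
    # `sync` is a placeholder for the job that loads the promoted objects
    # into the production relational database (MySQL in this thread).
    sync(event["repository_id"], event["commit_id"])
    return True
```

The quality checks themselves would run before the merge, so that only verified data reaches the branch that triggers the sync.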
s
I have a lot more questions. Is there a better place for me to learn quickly so I don't waste anybody's time?
Like the ideal architecture for testing data versions in the UX/UI, or referential integrity when using versioned data with lakeFS.
e
Did you see my object storage comment? Check out docs.lakefs.io and our YouTube channel.
s
I think it makes sense, assuming object data syncs to relational databases via hooks after promotion
in my use case
e
Great! So I believe it's a good fit for you.
s
awesome