Hey all! I came across this article mentioning the use of lakeFS in the CI/CD of data pipelines.
I’m wondering if anyone knows of and can recommend tools (OSS or not) that help with continuous deployment of AI inference workflows in prod.
The scenario: a complex pipeline in prod serving users in real time, involving several models and business logic that is constantly changing.
The challenge: testing how a change (to a model, data, or code) will impact this DAG before landing it.
Any pointers will be greatly appreciated, and sorry if this is a noob question or out of context 🙂
6 months ago
Hey all. A question on talking to S3 using the Hadoop-LakeFS-assembly:
Hi, I just joined this channel and have a question on how to best use lakeFS for a specific use case. I am looking for some kind of 'best practice' workflow for reprocessing data. The situation would be that you have some pipeline logic (code) that you changed and want to apply to your historical production data. I guess you would have a git branch with your modified pipeline code, and could have a lakeFS branch to test the changes on in isolation. But if you want to release those changes to production, how would you go about it? Also taking into account that during the development and testing of your new pipeline code, new data might have been ingested and processed by the current pipeline logic.
6 months ago
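To make the branch-and-merge flow described above concrete, here is a minimal sketch of one way it could be scripted, assuming lakectl is installed and configured. The repository name, branch names, and run_new_pipeline() are placeholders for illustration, not anything from the thread.

```python
# Sketch of one possible reprocessing flow, driven from Python by shelling out to lakectl.
# "example-repo", the branch names, and run_new_pipeline() are placeholders.
import subprocess

REPO = "lakefs://example-repo"


def lakectl(*args: str) -> None:
    """Run a lakectl command and raise if it fails."""
    subprocess.run(["lakectl", *args], check=True)


def run_new_pipeline(ref: str) -> None:
    """Placeholder for the modified pipeline code (Spark job, dbt run, etc.)."""
    print(f"reprocessing data on {ref}")


# 1. Branch off production so the new pipeline code runs against historical data in isolation.
lakectl("branch", "create", f"{REPO}/reprocess-v2", "--source", f"{REPO}/main")

# 2. Run the modified pipeline on the branch and commit the reprocessed output.
run_new_pipeline(f"{REPO}/reprocess-v2")
lakectl("commit", f"{REPO}/reprocess-v2", "-m", "reprocess history with pipeline v2")

# 3. Pull in data that landed on main while you were developing and testing,
#    then re-run the pipeline on that delta and commit again.
lakectl("merge", f"{REPO}/main", f"{REPO}/reprocess-v2")
run_new_pipeline(f"{REPO}/reprocess-v2")
lakectl("commit", f"{REPO}/reprocess-v2", "-m", "reprocess newly ingested data")

# 4. Promote the reprocessed data to production in a single atomic merge.
lakectl("merge", f"{REPO}/reprocess-v2", f"{REPO}/main")
```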
Been following the great tutorial on how to use lakeFS and Airflow together to branch off the main data branch, run Spark logic, and later commit automatically. Thanks! One question: using the
can I add
there too? I want to run validation checks on the data. Do I need a Spark engine to qualify the data?
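For illustration, here is a minimal sketch of what a lightweight validation task could look like in such a DAG, assuming the data on the branch can be read with pandas through the lakeFS S3 gateway; the repository, branch, path, endpoint, and credentials below are placeholders, and the task would be wired between the transform and the commit/merge steps of the tutorial's DAG.

```python
# Minimal sketch (Airflow 2.x) of a validation task that could sit between the Spark step
# and the automatic commit. Repo, branch, path, endpoint, and credentials are placeholders;
# the check reads the branch through lakeFS's S3 gateway with pandas, so no Spark engine
# is involved for these lightweight checks.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator


def validate_branch_data() -> None:
    # Read what the pipeline wrote to the isolated branch (s3://<repo>/<branch>/<path>).
    df = pd.read_parquet(
        "s3://example-repo/etl-run-branch/curated/events/",
        storage_options={
            "key": "<lakefs-access-key-id>",      # placeholder
            "secret": "<lakefs-secret-key>",      # placeholder
            "client_kwargs": {"endpoint_url": "http://lakefs:8000"},
        },
    )
    # Plain-Python data quality checks; failing the task blocks the downstream commit/merge.
    assert not df.empty, "branch contains no rows"
    assert df["event_id"].is_unique, "duplicate event_id values"


with DAG(
    dag_id="lakefs_branch_validation_sketch",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    validate = PythonOperator(
        task_id="validate_branch_data",
        python_callable=validate_branch_data,
    )
    # In the tutorial's flow this would be wired roughly as:
    # create_branch >> spark_transform >> validate >> commit_branch >> merge_to_main
```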
I have a question, and I'm sorry in advance if it's a stupid one. I work for a company that has a lot of PII in our data lake; if we create a branch, we'd be able to see that info, and that would be a problem. Has anybody had this issue and solved it somehow? Thanks in advance
I first started lakeFS by running "docker-compose up", then did some initial setup, etc., then ran "docker-compose down" to stop lakeFS, then ran "docker-compose up" to start lakeFS again, but all the previous setup was lost and I had to set everything up again. Does anyone know how I can solve this?