# help
a
Hi, everyone! I would like to use lakeFS in production. At the moment, we support only on-premises solutions. Do you have any recommendations on how to set up lakeFS in production on-premises?
o
hey @Artsiom Yudovin - I would suggest going with something similar to the recommended Kubernetes deployment, replacing S3 with something compatible such as MinIO
@Artsiom Yudovin happy to hear it btw! can you elaborate a bit about your use case for it?
a
yep, I see two use cases where lakeFS can help us: 1. we have a huge pipeline and want to run different experiments that change the logic or methodology in the pipeline, then compare the results. 2. we want to support several versions of our pipeline in production and enable whichever version we want, when we want.
o
cool! these are indeed really good use cases for lakeFS. Let me know if the Kubernetes + MinIO setup fits your needs!
Additionally, if you're using Airflow to orchestrate these pipelines, this might come in handy for running parallel pipelines: https://lakefs.io/the-airflow-and-lakefs-integration/
a
yep, we use Airflow, thx!
@Oz Katz, a question about lakeFS: when I create a new branch, does that mean the data from the main branch is physically duplicated?
How does this work in terms of disk space?
o
good question! no, lakeFS implements copy on write across branches. See https://docs.lakefs.io/faq.html#2-how-does-lakefs-data-versioning-work
it will only store the delta between the source branch and the changes you make
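To make the disk-space point concrete, here is a toy Python sketch of copy-on-write branching. It is not lakeFS code and does not use the lakeFS API; `ToyRepo` and all of its methods are hypothetical names made up for illustration, and real lakeFS metadata is organized differently. The only idea it tries to show is that a new branch copies pointers, not object bytes, so storage grows only when something is written.

```python
"""Toy model of copy-on-write branching (illustration only, not lakeFS internals)."""


class ToyRepo:
    def __init__(self):
        self.objects = {}             # object id -> bytes (stands in for physical storage)
        self.branches = {"main": {}}  # branch name -> {path: object id}
        self._next_id = 0

    def put(self, branch, path, data):
        # Every write stores one new physical object and repoints the path on this branch only.
        object_id = self._next_id
        self._next_id += 1
        self.objects[object_id] = data
        self.branches[branch][path] = object_id

    def create_branch(self, name, source):
        # Only the path -> object id mapping is copied; no object bytes are duplicated.
        self.branches[name] = dict(self.branches[source])

    def get(self, branch, path):
        return self.objects[self.branches[branch][path]]

    def stored_bytes(self):
        return sum(len(data) for data in self.objects.values())


repo = ToyRepo()
repo.put("main", "events/part-0", b"a" * 1000)
repo.put("main", "events/part-1", b"b" * 1000)
size_before = repo.stored_bytes()

repo.create_branch("experiment", source="main")
size_after_branch = repo.stored_bytes()   # unchanged: branching wrote no data

repo.put("experiment", "events/part-1", b"c" * 1000)
size_after_write = repo.stored_bytes()    # grows only by the changed object

print(size_before, size_after_branch, size_after_write)  # 2000 2000 3000
```

In this toy model, `main` still reads the original `events/part-1` after the write on `experiment`, and both branches share the single stored copy of `events/part-0`.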