王麒詳
08/11/2022, 1:32 AM
data coming->|
MINIO <--> lakeFS <--> LabelStudio <-- data annotation
|<--> User access data
However, I have several questions about data access.
1. What are the recommended ways to access data in lakeFS? Most of the data stored on my lakeFS server are image files. I’m currently using boto3 or the Python lakeFS API (client.object.get_object) to get bytes and then convert them to image files. I’m wondering whether there is a more efficient way to access data during the development stage.
2. Another question is about cloning data from a repo (or a branch) to a local machine. I tried some tutorial examples from the official docs and they ran successfully. I’d like to know whether cloning a large data repo locally is common in practice. When the data is larger than a hundred GB, cloning all of it to a local machine and working from that copy does not seem like the best choice.
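A minimal sketch of the boto3 route from question 1, which also touches on question 2: instead of cloning a whole repo, it lists only the keys under one prefix and reads each object straight into memory. It assumes the lakeFS S3 gateway addressing scheme, where the bucket is the repository name and the key starts with the branch name. The helper names, the endpoint, the repository `my-repo`, and the `images/` prefix are hypothetical, and Pillow is assumed for decoding.

```python
import io
from typing import Iterator


def make_gateway_client(endpoint_url: str, access_key: str, secret_key: str):
    """Build a boto3 S3 client that talks to the lakeFS S3 gateway.

    endpoint_url is the address of your lakeFS server; the keys are lakeFS
    credentials, not MinIO ones.
    """
    import boto3  # imported lazily so the helpers below accept any S3-like client

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def iter_keys(client, repo: str, ref: str, prefix: str = "") -> Iterator[str]:
    """Yield object keys under <ref>/<prefix>, one listing page at a time."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=repo, Prefix=f"{ref}/{prefix}"):
        for entry in page.get("Contents", []):
            yield entry["Key"]


def read_bytes(client, repo: str, key: str) -> bytes:
    """Fetch one object fully into memory."""
    return client.get_object(Bucket=repo, Key=key)["Body"].read()


def load_image(client, repo: str, key: str):
    """Decode an object as an image without writing a temporary file."""
    from PIL import Image  # assumes Pillow is installed

    return Image.open(io.BytesIO(read_bytes(client, repo, key)))


# Hypothetical usage (endpoint, credentials, and names are placeholders):
# client = make_gateway_client("http://localhost:8000", "<access-key>", "<secret-key>")
# for key in iter_keys(client, "my-repo", "main", "images/"):
#     img = load_image(client, "my-repo", key)
```

Because `iter_keys` is a generator, a training or labeling loop can stop early or filter keys without listing or downloading the full hundred-GB repo first.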
Thanks for all of your kind help and for your patience in reading my questions :)
Eden Ohana
08/11/2022, 2:03 AM
王麒詳
08/11/2022, 3:40 AM
Eden Ohana
08/11/2022, 3:51 AM
王麒詳
08/11/2022, 3:52 AM
einat.orr
08/11/2022, 6:25 AM
王麒詳
08/11/2022, 6:27 AM
einat.orr
08/11/2022, 6:30 AM
王麒詳
08/11/2022, 6:40 AM
einat.orr
08/11/2022, 6:50 AM
王麒詳
08/11/2022, 7:16 AM
"I assume you have the same concern for the case of using just min.io, correct?"
You’re right! Currently, I only fetch data through the S3 gateway of lakeFS.
"Cause with the right provisioning lakeFS should not create a significant overhead. (Same is true for min.io 🤔 )"
Wow, really 🤠!? I will try some toy examples to test the scenario I mentioned above in the next few days. I’d like to share the test results with all of you 😀.
einat.orr
08/11/2022, 7:18 AM
王麒詳
08/11/2022, 7:28 AM