# help
u
Hey, a few questions about things I might need on lakeFS: • Is there a way to read multiple files from a specific commit in one request? • Is there a way to commit a file by writing it directly to S3, or to download a specific file from S3 from a specific commit / latest master? • Is there an API for a diff showing which lines changed?
u
Hi Tamir, • you can use https://docs.lakefs.io/reference/commands.html#lakectl-fs-download • Can you please share your use case for it? Why do you need to write directly to S3? If you write the file directly to S3, you need to link it to lakeFS so you can commit it. • No, there isn't a diff API operation for line changes, only for file changes
u
1. I meant via the REST API. I see there's /repositories/{repository}/refs/{ref}/objects, but only for one object. Can it support multiple or many? 2. I want to save in Elasticsearch what changed between revisions of the files and do the comparison myself, since the answer to the third question is no. Therefore I'm thinking about saving one big file with the data of all the files together, so I can fetch it in one request (because the answer to question one seems to be no) and run the comparisons only on what the diff API says was changed. Because a file might be large (up to around 50 MB in the worst case), working over REST might be less efficient, or not possible, compared to going directly to S3. 3. Thanks for the answer
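For what it's worth, reading one object at a given commit through the objects endpoint mentioned above could look like the sketch below. The host, repository name, and path are placeholders (assumptions); the endpoint path is the one referenced in this thread, and there is one request per object since no batch read exists:

```python
# Minimal sketch: fetch a single object at a specific ref (branch name or
# commit ID) via the lakeFS REST API. BASE and all arguments are assumed
# placeholders; adjust to your deployment and add auth as needed.
import urllib.parse
import urllib.request

BASE = "http://localhost:8000/api/v1"  # assumed local lakeFS endpoint


def object_url(repo: str, ref: str, path: str) -> str:
    """Build the read URL for one object at a given ref."""
    return (
        f"{BASE}/repositories/{urllib.parse.quote(repo)}"
        f"/refs/{urllib.parse.quote(ref)}"
        f"/objects?path={urllib.parse.quote(path, safe='')}"
    )


def read_object(repo: str, ref: str, path: str) -> bytes:
    # One HTTP round trip per object; for many files you loop, since the
    # API (like S3) has no multi-object GET.
    with urllib.request.urlopen(object_url(repo, ref, path)) as resp:
        return resp.read()


# e.g. read_object("my-repo", "abc123commitid", "data/train.csv")
```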
u
Do you want to save versions of all the files in one file? lakeFS can give you information about which files have changed. You can retrieve the files from lakeFS one by one, then on your client perform a diff at the line level and save it wherever you want. lakeFS doesn't support reading multiple objects in one request (same as S3 doesn't; if the file is large, you can get a connection timeout).
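The client-side workflow described above (ask lakeFS which files changed, fetch each one, diff lines locally) could be sketched like this. The diff endpoint path and the response field names are assumptions based on the lakeFS API docs, and the host is a placeholder:

```python
# Sketch of the suggested workflow: list changed paths between two refs,
# then compute the line-level diff on the client with the standard library.
# Endpoint path, response shape, and BASE are assumptions; verify against
# your lakeFS version's API reference.
import difflib
import json
import urllib.request

BASE = "http://localhost:8000/api/v1"  # assumed local lakeFS endpoint


def changed_paths(repo: str, left_ref: str, right_ref: str) -> list:
    """Ask lakeFS which files differ between two refs (file-level only)."""
    url = f"{BASE}/repositories/{repo}/refs/{left_ref}/diff/{right_ref}"
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    return [entry["path"] for entry in body.get("results", [])]


def line_diff(old_text: str, new_text: str) -> list:
    """lakeFS reports which files changed; the line diff happens here."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(), lineterm=""
        )
    )


# e.g.: for path in changed_paths("my-repo", "main", "feature"):
#           hunks = line_diff(read_old(path), read_new(path))
#           ...index hunks into Elasticsearch...
```

Only the files the diff API flags need to be fetched and compared, which keeps the number of per-object requests down.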
u
Is the timeout configurable if I manage lakeFS on my own cluster? What is the maximum advised file size in lakeFS when we use S3 as the backend?
u
Yes, regarding "Do you want to save versions of all the files in one file?": that's the idea I have, comparing the diffs themselves in one request instead of many requests
u
The objects are saved on S3 (the bucket you provided) and not on lakeFS; lakeFS provides versioning on top of it. The file size limit is the same as S3's
u
Why do you have a limit on the number of requests?
u
I want to make the history-saving process as efficient and fast as I can, and networking seems to be a big time consumer
u
Where do you plan to deploy lakeFS?
u
On my k8s cluster, with the Helm chart as documented here: https://docs.lakefs.io/deploy/k8s.html
u
I don't think performing one request to lakeFS will help here, because S3 doesn't support getting multiple objects in one request, so lakeFS would need to make multiple requests to S3 anyway
u
Yes, but it could still reduce the number of requests to S3 if there were such an option
u
Can you please open an issue and mention your use case and what you expect from this operation? Get by what: batch, or a list of names?
u
Thanks, let me know if any other questions come up