Jonas

01/27/2023, 1:56 PM
I'm new to lakeFS. We have now set up lakeFS on our own server, with MinIO providing the S3 interface. So far this works, as far as I can tell. Now I want to read a simple iris.data file from lakeFS that was uploaded manually to the main branch. As I understand it, this should be possible using the lakeFS API, following this example: https://pydocs.lakefs.io/docs/ObjectsApi.html#get_object
I assume that the username/password is equivalent to the access key pair? I also assume that the host uses the same port as the UI, as specified in .lakectl.yaml? Unfortunately, with this setup I receive the following error message:
"""
Exception when calling ObjectsApi->get_object: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Request-Id': '02494329-b95e-4652-98d8-5733efd05f70', 'Date': 'Fri, 27 Jan 2023 13:41:49 GMT', 'Content-Length': '43'})
HTTP response body: {"message":"error authenticating request"}
"""
Any help is appreciated!
Or Tzabary

01/27/2023, 2:04 PM
Hi @Jonas, welcome to lakeFS! To use the API, you should use the access key and secret key pair generated for your user. Is this the case? And yes, the host and port should match the UI.
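For reference, here is a minimal sketch of how the key pair maps onto the Python client configuration. It assumes the legacy `lakefs_client` package described in the linked docs; `make_client` is a hypothetical helper name, and the host and keys in the usage comment are placeholders.

```python
def make_client(host, access_key_id, secret_access_key):
    """Build a lakeFS client; the access key pair is passed as username/password."""
    # Imported lazily so the helper can be defined even where the package is missing.
    import lakefs_client
    from lakefs_client.client import LakeFSClient

    configuration = lakefs_client.Configuration()
    configuration.host = host                    # same host and port as the UI
    configuration.username = access_key_id       # access key ID
    configuration.password = secret_access_key   # secret access key
    return LakeFSClient(configuration)


# Usage (placeholder values, not real credentials):
# client = make_client("http://localhost:8000", "<access key id>", "<secret access key>")
```

Depending on the client version, the host may need to include the API path. A 401 with "error authenticating request" usually means the key pair itself is not accepted, so it is worth checking that the credentials still exist on the server.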
Jonas

01/27/2023, 2:05 PM
Hi, and thanks for the fast reply! Yes, both are the case. Unfortunately, I still get the same error.
Or Tzabary

01/27/2023, 2:15 PM
Are you able to log in to the UI using these credentials and download the object?
Jonas

01/27/2023, 2:24 PM
Well... I was, until someone removed my credentials... Thanks, it works now after resetting them! 🙂
Or Tzabary

01/27/2023, 2:24 PM
🙃 Happy to hear! Happy lakeFSing!
Is there anything else I can help with?
Jonas

01/27/2023, 2:53 PM
Not for now, thank you very much!
@Or Tzabary There is actually one question I have: what is the intended/recommended way to read images from lakeFS? I have a small sample in my repo with the following structure:
repo
|_ train
|  |_ cat
|  |  |_ *.png
|  |_ dog
|     |_ *.png
|_ test
   |_ cat
   |  |_ *.png
   |_ dog
      |_ *.png
One way I found was to use the lakeFS Python API to get a list of all images and iterate over it, using PIL on the returned I/O buffer. The issue, though, is that as far as I can tell I get all files in the branch, so I cannot select the train or test set separately without filtering the paths on the results object. Is that how it is supposed to work? Would you recommend keeping that structure, or do you suggest using separate branches to store train and test data?
Oz Katz

01/27/2023, 4:47 PM
@Jonas hey! I think a directory structure is actually the correct way of doing this! When calling the object listing API you can pass a prefix (e.g. train/dog/) and the listing API will only return objects under that prefix, so it scales better and is less complex than filtering on the client.
πŸ™ 1
does that make sense?
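A minimal sketch of listing by prefix, assuming the legacy `lakefs_client` Python client, where `client.objects.list_objects` accepts `prefix` and `after` keyword arguments and returns a page with `results` and `pagination`. `list_paths` is a hypothetical helper name, and the repository and branch names in the usage comment are placeholders.

```python
def list_paths(client, repository, ref, prefix):
    """Yield the paths of all objects under `prefix`, following pagination."""
    after = None
    while True:
        kwargs = {"prefix": prefix}
        if after:
            # Continue from where the previous page ended.
            kwargs["after"] = after
        page = client.objects.list_objects(repository, ref, **kwargs)
        for obj in page.results:
            yield obj.path
        if not page.pagination.has_more:
            break
        after = page.pagination.next_offset


# Usage (placeholder repository and branch names):
# train_dogs = list(list_paths(client, "my-repo", "main", "train/dog/"))
```

Each returned path can then be fetched with `get_object` and opened with an image library, so the train and test sets are selected on the server side rather than by filtering paths on the client.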
Jonas

01/27/2023, 4:50 PM
@Oz Katz Perfect, I was looking for something like that! The doc (https://github.com/treeverse/lakeFS/blob/master/clients/python/docs/ObjectsApi.md#list_objects) just didn't show any possible prefix! I should learn to scroll down... I was only looking at the signature shown at the very top.
Thank you! 🙂
Oz Katz

01/27/2023, 4:51 PM
😊
The docs could be improved, however; there's no clear example of how to use the prefix parameter, so thanks for catching that 🙂
Jonas

01/27/2023, 4:55 PM
I was also not able to find an example of get_object for tabular data in your sample Git repo. If there is none, I'm happy to provide my code snippets for inclusion, if that helps.
:60fps_parrot: 1
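As an illustration only (this is not the snippet offered above), here is a minimal sketch of reading a small CSV-style file such as iris.data through `get_object`. It assumes the legacy `lakefs_client` Python client, where `client.objects.get_object(repository, ref, path)` returns the object body; `read_csv_object` is a hypothetical helper name and the repository name in the usage comment is a placeholder.

```python
import csv
import io


def read_csv_object(client, repository, ref, path):
    """Fetch an object and parse it as comma-separated rows of strings."""
    body = client.objects.get_object(repository, ref, path)
    # Assumption: some client versions return a file-like object, others raw bytes.
    raw = body.read() if hasattr(body, "read") else bytes(body)
    text = raw.decode("utf-8")
    # Skip blank lines, which can trail the data in files like iris.data.
    return [row for row in csv.reader(io.StringIO(text)) if row]


# Usage (placeholder repository name):
# rows = read_csv_object(client, "my-repo", "main", "iris.data")
```

The rows come back as lists of strings, so numeric columns still need converting before analysis.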
Or Tzabary

01/27/2023, 5:06 PM
πŸ™πŸ» we appreciate contributions!
thank you