I'm new to lakeFS. We have now setup lakeFS on our...
# help
j
I'm new to lakeFS. We have now set up lakeFS on our own server, with MinIO providing the S3 interface. So far this works, as far as I can tell. Now I want to read a simple iris.data file from lakeFS that was uploaded manually to the main branch. As I understand it, this should be possible using the lakeFS API, following this example: https://pydocs.lakefs.io/docs/ObjectsApi.html#get_object I assume that the username/password is equivalent to the access key pair? I also assume that the host and port are the same as those used for the UI, as specified in .lakectl.yaml? Unfortunately, with this setup I receive the following error message:
"""
Exception when calling ObjectsApi->get_object: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Request-Id': '02494329-b95e-4652-98d8-5733efd05f70', 'Date': 'Fri, 27 Jan 2023 13:41:49 GMT', 'Content-Length': '43'})
HTTP response body: {"message":"error authenticating request"}
"""
Any help is appreciated!
o
Hi @Jonas, welcome to lakeFS! To use the API, you should use the access key / secret key pair generated for your user. Is this the case? And yes, the host and port should match the UI.
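To make the mapping concrete: the client's username/password correspond to the access key ID and secret access key, which are sent as HTTP Basic credentials to the same host and port as the UI. The following is only a minimal sketch using the Python standard library rather than the lakeFS client; it assumes the REST path `/api/v1/repositories/{repository}/refs/{ref}/objects`, and the function name and all example values are hypothetical.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_get_object_request(host, access_key_id, secret_access_key,
                             repository, ref, path):
    """Build (but do not send) an authenticated GET request for one object.

    Assumption: the object endpoint lives under /api/v1 on the same
    host and port that serve the lakeFS UI.
    """
    # Basic auth: base64("<access key id>:<secret access key>")
    credentials = f"{access_key_id}:{secret_access_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    url = (
        f"{host.rstrip('/')}/api/v1"
        f"/repositories/{quote(repository, safe='')}"
        f"/refs/{quote(ref, safe='')}"
        f"/objects?{urlencode({'path': path})}"
    )
    return Request(url, headers={"Authorization": f"Basic {token}"})
```

Passing the returned request to `urllib.request.urlopen` would perform the download. A 401 at that point means the server rejected the key pair itself (for example because it was deleted or reset), not the path or the port.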
j
Hi, and thanks for the fast reply! Yes, both are the case. Unfortunately, I still get the same error.
o
Are you able to log in to the UI using these credentials and download the object?
j
Well... I was, until someone removed my credentials... Thanks, it works now after resetting them! 🙂
o
🙃 Happy to hear! Happy lakeFSing!
Is there anything else I can help with?
j
Not for now, thank you very much!
@Or Tzabary There is actually a question I have: what is the intended/recommended way to read images from lakeFS? I have a small sample in my repo with the following structure:
repo
|_train
|  |_cat
|  |  |_ *.png
|  |_dog
|     |_ *.png
|_test
   |_cat
   |  |_ *.png
   |_dog
      |_ *.png
One way I found was to use the lakeFS Python API to get a list of all images and to iterate over it using PIL on the io.Buffer. The issue, though, is that as far as I can tell I get all files in the branch, so I cannot select the train or test set separately without filtering the paths on the results object. Is that how it is supposed to work? Would you recommend keeping that structure, or do you suggest using separate branches to store train and test data?
o
@Jonas hey! I think a directory structure is actually the correct way of doing this! When calling the object listing API you can pass a prefix (e.g. train/dog/) and the listing API will only return objects under that prefix, so it scales better and is less complex than filtering on the client.
πŸ™ 1
does that make sense?
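A minimal sketch of prefix-based listing, assuming an `ObjectsApi` instance from the lakeFS Python client whose `list_objects` accepts the optional `prefix` and `after` keyword arguments and returns a page with `results` and `pagination` (`has_more`, `next_offset`). The helper name `iter_object_paths` is hypothetical, and the field names should be checked against the client version in use.

```python
def iter_object_paths(objects_api, repository, ref, prefix):
    """Yield the path of every object under `prefix`, following pagination.

    `objects_api` is assumed to behave like the lakeFS client's ObjectsApi;
    any object with a compatible `list_objects` method works.
    """
    after = None
    while True:
        kwargs = {"prefix": prefix}
        if after:
            # Resume the listing where the previous page stopped.
            kwargs["after"] = after
        page = objects_api.list_objects(repository, ref, **kwargs)

        for obj in page.results:
            yield obj.path

        if not page.pagination.has_more:
            return
        after = page.pagination.next_offset
```

With this, something like `iter_object_paths(api, "my-repo", "main", "train/")` would select only the training set and `"test/dog/"` only the test dogs, so the filtering happens on the server rather than on the results object.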
j
@Oz Katz Perfect, I was looking for something like that! The doc (https://github.com/treeverse/lakeFS/blob/master/clients/python/docs/ObjectsApi.md#list_objects) just didn't show any possible prefix! I should learn to scroll down... I was only looking at the signature shown at the very top.
Thank you! 🙂
o
😊
The docs could be improved, however; there's no clear example of how to use the prefix parameter. So thanks for catching that 🙂
j
I was also not able to find an example of the get_object for tabular data in your sample git repo. If there is none I'm happy to provide my code snippets to be included if that helps
🦜 1
o
πŸ™πŸ» we appreciate contributions!
thank you
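For the tabular case raised above (a headerless CSV such as iris.data), reading could look roughly like the sketch below. It assumes `get_object(repository, ref, path)` on an `ObjectsApi` instance returns a file-like object with a `read()` method that yields bytes; the helper name `read_rows` is hypothetical, and this is not taken from the lakeFS samples.

```python
import csv
import io


def read_rows(objects_api, repository, ref, path, encoding="utf-8"):
    """Download one object and parse it as CSV rows (lists of strings).

    Assumption: `objects_api.get_object` returns a binary file-like object.
    """
    body = objects_api.get_object(repository, ref, path)
    text = body.read().decode(encoding)

    # iris.data ends with blank lines; csv.reader yields [] for those,
    # so empty rows are dropped here.
    return [row for row in csv.reader(io.StringIO(text)) if row]
```

Each row comes back as a list of strings, so numeric columns still need converting (for example `float(row[0])`) before use.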