# help
Hi, for lakeFS running in a Docker container, what file path should I use to access files from Python? I tried local://demo-repo/main/example.csv and lakefs://demo-repo/main/example.csv, but neither works. My storage namespace is set to local.
Hi @Vino, great question. What exactly are you trying to accomplish?
Hey Shimi. I'm trying to read a file from main, modify it, and write the modified file to a new branch.
Quick question: is it because lakeFS's underlying file system is distributed (like HDFS) that I can't read the file with Python's open()? Should I use Spark's read.csv() instead?
Hey @Vino, could you please share the command you're trying to run?
Simply modify the original file you uploaded to main and recommit it to the branch.
Let me try and explain
@Yoni Augarten
with open("lakefs://demo-repo/main/example.csv") as fp:
    data = fp.read()
I get a "No such file or directory" error.
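For context on that error: Python's built-in open() has no notion of URI schemes such as lakefs:// or local://. It hands the string straight to the operating system as a relative path, so the OS looks for example.csv under directories literally named lakefs:, demo-repo and main in the current working directory. The small sketch below reproduces this; try_open is a hypothetical helper, and the behaviour shown is for Linux/macOS (Windows rejects the colon and raises a different OSError).

```python
import errno


def try_open(uri: str) -> str:
    """Try to read a lakeFS-style URI with the built-in open().

    open() treats the whole string as a local, relative file path; the
    "scheme" is just a directory name ("lakefs:") as far as the OS is concerned.
    """
    try:
        with open(uri) as fp:
            return fp.read()
    except FileNotFoundError as exc:
        # Report the errno the user saw: ENOENT, "No such file or directory"
        return f"{errno.errorcode[exc.errno]}: {exc.strerror}"


print(try_open("lakefs://demo-repo/main/example.csv"))
```

So the problem isn't that the storage is distributed; the object simply isn't exposed as a local file, which is why it has to be fetched through lakeFS itself.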
@Vino, to access files in lakeFS you can't simply use Python filesystem commands. You need to install and use the lakeFS Python client.
lakeFS works by tracking the changes made to objects between commits, so once you've added the file to a branch and committed it, go ahead and make some changes to that same local file, then upload and commit it again.
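The upload-and-commit step described above could look roughly like the sketch below. It assumes the legacy lakefs_client SDK, where LakeFSClient exposes objects.upload_object and commits.commit and a commit takes a CommitCreation model; upload_and_commit is a hypothetical helper, and the target branch must already exist. Please check these calls against the client version you have installed.

```python
def upload_and_commit(client, repository: str, branch: str, local_path: str,
                      key: str, message: str, commit_creation_cls=None):
    """Upload a local file to a lakeFS branch and commit it (hypothetical helper).

    `client` is assumed to be a lakefs_client LakeFSClient; only its
    objects.upload_object and commits.commit calls are used.
    `commit_creation_cls` defaults to lakefs_client.models.CommitCreation and
    is a parameter only so the function can be exercised without a server.
    """
    if commit_creation_cls is None:
        from lakefs_client.models import CommitCreation as commit_creation_cls

    # The key is relative to the branch: no scheme, repository or branch prefix.
    with open(local_path, "rb") as f:
        client.objects.upload_object(repository=repository, branch=branch,
                                     path=key, content=f)

    # Commit whatever is currently staged on the branch.
    return client.commits.commit(
        repository=repository,
        branch=branch,
        commit_creation=commit_creation_cls(message=message),
    )
```

For example, upload_and_commit(client, "demo-repo", "my-branch", "example.csv", "example.csv", "Modify example.csv"), where my-branch is a placeholder name. In the same SDK, creating that branch from main is a separate call, client.branches.create_branch, with a BranchCreation(name=..., source="main") model.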
Does this answer your question?
If you're interested in reading the file as it's stored in lakeFS, you'll need the client, as @Yoni Augarten suggested.
Hi @Yoni Augarten, I have lakefs_client installed, but I can't find an API in the reference for reading the contents of a file in lakeFS. There's one for adding a file, though.
I see
Let me try to guide you
What you're looking for is the get_object method.
Note that when using the client, paths (or more accurately, keys) in lakeFS are to be used without a scheme, repository or branch. The repository and branch are given through the other method arguments. So, for example:
api_instance.get_object("demo-repo", "main", "example.csv")
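Since the client takes the repository, ref and key as separate arguments, a lakefs:// URI like the one from the original attempt has to be split up first. Below is a minimal sketch that does that and then reads the CSV into rows. Assumptions: objects_api is an ObjectsApi instance like api_instance above; split_lakefs_uri and read_csv_rows are hypothetical helpers, not part of lakefs_client; and get_object returning a binary file-like object is my understanding of the generated client (raw bytes are accepted too, just in case).

```python
import csv
import io
from urllib.parse import urlparse


def split_lakefs_uri(uri: str):
    """Split lakefs://<repository>/<ref>/<key> into get_object arguments.

    The key keeps any further slashes, e.g. "data/2021/example.csv".
    """
    parsed = urlparse(uri)
    if parsed.scheme != "lakefs":
        raise ValueError(f"not a lakefs:// URI: {uri!r}")
    ref, _, key = parsed.path.lstrip("/").partition("/")
    if not parsed.netloc or not ref or not key:
        raise ValueError(f"expected lakefs://<repository>/<ref>/<key>, got {uri!r}")
    return parsed.netloc, ref, key


def read_csv_rows(objects_api, uri: str, encoding: str = "utf-8"):
    """Fetch a CSV object through get_object and parse it into a list of rows."""
    repository, ref, key = split_lakefs_uri(uri)
    obj = objects_api.get_object(repository, ref, key)
    # Assumption: a binary file-like object is returned; fall back to raw bytes.
    raw = obj.read() if hasattr(obj, "read") else obj
    return list(csv.reader(io.StringIO(raw.decode(encoding))))
```

Usage: rows = read_csv_rows(api_instance, "lakefs://demo-repo/main/example.csv"). From there the rows can be modified, written to a local file, and uploaded and committed to the new branch.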
Got it. Thanks a ton!