# help
u
Silly question, sorry, but once I've created a lakeFS repo and ingested some data into a branch, how does a different user read data from that branch? Using the Python or Spark client? Looking through the GitHub repo, it seems like it would be this: https://github.com/treeverse/lakeFS/blob/master/clients/python/lakefs_client/api/objects_api.py#:~:text=def-,get_object,-( Does anyone have any community examples of being an end user of a lakeFS repository?
u
Hi @Yusuf K! Which application are you using to read data from the object store in your organization? The answer really depends on how you work.
u
I'm thinking of the data scientist end user. They will use an HPC cluster or a GPU-backed VM, so vanilla Python is one use case. Down the line we will also have managed Spark, but for now, Python.
u
Let's distinguish between two things:
• How to make lakeFS accessible to the data scientists in your org
• How to read data from an accessible lakeFS server
Your question is about the latter, and yes, the Python client is the right way to go. There's a minimal sketch below.
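For reference, here's a minimal sketch of reading one object with the generated Python client's `get_object` (the method linked above). The host, credentials, repository, branch, and path are all placeholders you'd swap for your own:

```python
# Minimal sketch: read a single object via the lakeFS Python client.
# Host, credentials, repository, branch, and path below are placeholders.
import lakefs_client
from lakefs_client.client import LakeFSClient

configuration = lakefs_client.Configuration(
    host="https://lakefs.example.com/api/v1",  # your lakeFS API endpoint
    username="AKIA...",                        # lakeFS access key ID
    password="...",                            # lakeFS secret access key
)
client = LakeFSClient(configuration)

# get_object returns a file-like response; read() yields the raw bytes
obj = client.objects.get_object(
    repository="my-repo",
    ref="my-branch",
    path="path/to/something.csv",
)
data = obj.read()
```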
u
Sorry if I am stating the obvious, but in many cases using the S3 gateway is a really good choice, one which many of our users make. End users need to configure a different S3 endpoint URL and reuse their lakeFS access key ID and secret access key. Then accessing s3://my-repo/branch/path/to/something.csv will access the path on the specified repo and branch. Since most (all?) tooling supports the S3 API, switching to lakeFS often requires nothing more than configuration. See the boto3 sketch below.
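For example, a minimal sketch with boto3 pointed at the S3 gateway. The endpoint URL and credentials here are placeholders; the gateway endpoint is just your lakeFS server's address:

```python
# Minimal sketch: read from lakeFS through its S3 gateway using boto3.
# Endpoint and credentials are placeholders for your lakeFS server and keys.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",  # lakeFS server, acting as an S3 endpoint
    aws_access_key_id="AKIA...",                # lakeFS access key ID
    aws_secret_access_key="...",                # lakeFS secret access key
)

# Bucket = repository, Key = "<branch>/<path>",
# i.e. s3://my-repo/my-branch/path/to/something.csv
resp = s3.get_object(Bucket="my-repo", Key="my-branch/path/to/something.csv")
data = resp["Body"].read()
```

The same idea should carry over to Spark later on: point the S3A endpoint (e.g. the fs.s3a.endpoint Hadoop setting) at the lakeFS server and keep reading s3a://my-repo/my-branch/... paths.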