# help
u
Silly question, sorry, but once I've created a lakeFS repo and ingested some data into a branch, how does a different user read data from that branch? Using the Python or Spark client? Looking through the GitHub repo, it seems like it would be this: https://github.com/treeverse/lakeFS/blob/master/clients/python/lakefs_client/api/objects_api.py#:~:text=def-,get_object,-( Does anyone have any community examples of being an end user of a lakeFS repository?
u
Hi @Yusuf K! Which application are you using to read data from the object store in your organization? The answer really depends on how you work.
u
I'm thinking of the data scientist end user. They will use an HPC cluster or a GPU-backed VM, so vanilla Python is one use case. Down the line we will also have managed Spark, but for now, Python.
u
Let's distinguish between two things:
• How to make lakeFS accessible to the data scientists in your org
• How to read data from an accessible lakeFS server
Your question is about the latter, and yes, the Python client is the right way to go. There's a minimal sketch below.
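For reference, here's a minimal sketch of reading one object with the generated Python client's `get_object` (the method linked above). The host, credentials, repository, branch, and path are all placeholders you'd swap for your own:

```python
# Minimal sketch: read a single object via the lakeFS Python client.
# Host, credentials, repository, branch, and path below are placeholders.
import lakefs_client
from lakefs_client.client import LakeFSClient

configuration = lakefs_client.Configuration(
    host="https://lakefs.example.com/api/v1",  # your lakeFS API endpoint
    username="AKIA...",                        # lakeFS access key ID
    password="...",                            # lakeFS secret access key
)
client = LakeFSClient(configuration)

# get_object returns a file-like response; read() yields the raw bytes
obj = client.objects.get_object(
    repository="my-repo",
    ref="my-branch",
    path="path/to/something.csv",
)
data = obj.read()
```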
u
Sorry if I am stating the obvious, but in many cases using the S3 gateway is a really good choice, one which many of our users make. End users need to configure a different S3 endpoint URL and reuse their lakeFS access key ID and secret access key. Then accessing s3://my-repo/branch/path/to/something.csv will access the path on the specified repo and branch. Since most (all?) tooling supports the S3 API, switching to lakeFS often requires nothing more than configuration. See the boto3 sketch below.
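For example, a minimal sketch with boto3 pointed at the S3 gateway. The endpoint URL and credentials here are placeholders; the gateway endpoint is just your lakeFS server's address:

```python
# Minimal sketch: read from lakeFS through its S3 gateway using boto3.
# Endpoint and credentials are placeholders for your lakeFS server and keys.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",  # lakeFS server, acting as an S3 endpoint
    aws_access_key_id="AKIA...",                # lakeFS access key ID
    aws_secret_access_key="...",                # lakeFS secret access key
)

# Bucket = repository, Key = "<branch>/<path>",
# i.e. s3://my-repo/my-branch/path/to/something.csv
resp = s3.get_object(Bucket="my-repo", Key="my-branch/path/to/something.csv")
data = resp["Body"].read()
```

The same idea should carry over to Spark later on: point the S3A endpoint (e.g. the fs.s3a.endpoint Hadoop setting) at the lakeFS server and keep reading s3a://my-repo/my-branch/... paths.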