
Sanidhya Singh

02/22/2023, 4:18 AM
Hi everyone! I’m exploring LakeFS for our data versioning use case. I followed the tutorial to create a repo -> upload file -> commit to master. However, I’m unable to actually read the file from the MinIO bucket using Pandas.
import pandas as pd
import os
import s3fs

# Patch s3fs so every S3FileSystem picks up the MinIO credentials and endpoint from the environment.
class S3FileSystemPatched(s3fs.S3FileSystem):
    def __init__(self, *k, **kw):
        super().__init__(*k,
                         key=os.environ["AWS_ACCESS_KEY_ID"],
                         secret=os.environ["AWS_SECRET_ACCESS_KEY"],
                         client_kwargs={"endpoint_url": os.environ["AWS_S3_ENDPOINT"]},
                         **kw)
        print("S3FileSystem is patched")

s3fs.S3FileSystem = S3FileSystemPatched

data = pd.read_csv("s3://example/master/test.csv")
it throws
FileNotFoundError: example/master/test.csv

Adi Polak

02/22/2023, 7:13 AM
Hi @Sanidhya Singh, interesting case. I wonder, can you read directly from lakeFS?
lakefs://example/master/test.csv

Jonathan Rosenberg

02/22/2023, 7:42 AM
Hi @Sanidhya Singh, Can you share your s3a configurations (minus the secrets)?

Barak Amar

02/22/2023, 8:53 AM
@Sanidhya Singh also check that the object is on master or main, which is the new default.
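For example, a quick way to check which branches the repo actually has is the lakeFS Python client (a sketch; the repo name example, the localhost:8000 endpoint, and the credential placeholders are assumptions from this thread):

import lakefs_client
from lakefs_client.client import LakeFSClient

# Placeholder values; depending on the client version the /api/v1 suffix may or may not be needed.
configuration = lakefs_client.Configuration(host="http://localhost:8000/api/v1")
configuration.username = "<lakeFS access key id>"
configuration.password = "<lakeFS secret access key>"

client = LakeFSClient(configuration)
for branch in client.branches.list_branches(repository="example").results:
    print(branch.id)  # e.g. "master" or "main"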

Sanidhya Singh

02/22/2023, 9:00 AM
Hi @Adi Polak, it says
ValueError: Protocol not known: lakefs
Hi @Jonathan Rosenberg, I’m using just s3, updated my code to reflect it. The configuration is being passed through s3fs above.

Jonathan Rosenberg

02/22/2023, 9:02 AM
And what’s the endpoint that you pass to it as the endpoint_url?

Sanidhya Singh

02/22/2023, 9:02 AM
Hi @Barak Amar, I’m using master.
@Jonathan Rosenberg, the endpoint is the URL to the MinIO API.

Jonathan Rosenberg

02/22/2023, 9:03 AM
It should be the endpoint of your lakeFS server, and the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY should be your lakeFS key and secret.
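In other words, with your patched s3fs above it would be something like this (placeholders, not real values):

import os

# Point the patched s3fs at lakeFS instead of MinIO -- substitute your own values.
os.environ["AWS_ACCESS_KEY_ID"] = "<lakeFS access key id>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<lakeFS secret access key>"
os.environ["AWS_S3_ENDPOINT"] = "<lakeFS server endpoint>"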

Sanidhya Singh

02/22/2023, 9:07 AM
Ah, I see. I’m running a lakeFS container locally on my machine and using a remote MinIO instance. What would be the default lakeFS endpoint, http://localhost/api/v1?
Tried with http://localhost:8000/api/v1, and got
FileNotFoundError: The specified bucket does not exist

Jonathan Rosenberg

02/22/2023, 9:11 AM
So what you would want to do is use lakeFS’s S3 gateway, which is an S3-compatible endpoint that S3 clients (like s3fs) can work with. The endpoint should be: http://localhost:8000
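For reference, something along these lines should then work end to end, even without patching the s3fs class (a sketch; the repo/branch/path follow this thread, the credentials are placeholders) -- pandas can pass the gateway endpoint straight through storage_options:

import pandas as pd

# Read through the lakeFS S3 gateway: bucket = repository, first path segment = branch.
# The key/secret are the lakeFS credentials, not MinIO's.
data = pd.read_csv(
    "s3://example/master/test.csv",
    storage_options={
        "key": "<lakeFS access key id>",
        "secret": "<lakeFS secret access key>",
        "client_kwargs": {"endpoint_url": "http://localhost:8000"},
    },
)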
If you want to take it to the next level, you can try the lakeFS Hadoop Filesystem (instead of s3fs), which will use lakeFS for your metadata only and interact directly with MinIO for the data itself.
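Roughly, in (Py)Spark that route looks like this (a sketch based on the lakeFS docs; the property names, placeholder values, and endpoints should be double-checked against your lakeFS/Hadoop versions, and I believe you also need the lakeFS Hadoop FS assembly jar on the classpath):

from pyspark.sql import SparkSession

# Sketch: the lakeFS Hadoop FileSystem talks to the lakeFS API for metadata,
# while the actual data I/O goes directly to the underlying MinIO store via s3a.
spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
    .config("spark.hadoop.fs.lakefs.access.key", "<lakeFS access key id>")
    .config("spark.hadoop.fs.lakefs.secret.key", "<lakeFS secret access key>")
    .config("spark.hadoop.fs.lakefs.endpoint", "http://localhost:8000/api/v1")
    # Underlying object store (MinIO) credentials for the data itself:
    .config("spark.hadoop.fs.s3a.access.key", "<MinIO access key>")
    .config("spark.hadoop.fs.s3a.secret.key", "<MinIO secret key>")
    .config("spark.hadoop.fs.s3a.endpoint", "<MinIO endpoint>")
    .getOrCreate()
)

df = spark.read.csv("lakefs://example/master/test.csv", header=True)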

Sanidhya Singh

02/22/2023, 9:27 AM
I haven’t gotten it to work, even after passing in the lakeFS URL, access key, and secret. The idea is to read a CSV versioned through lakeFS and served by MinIO. Please let me know if I’m doing something fundamentally incorrect.

Jonathan Rosenberg

02/22/2023, 9:34 AM
If you can share your configurations (you can DM me if you prefer), it would be very helpful. Also make sure that you provided all 4 necessary configurations specified here