# help
y
Hello. I'm trying to iterate over and read every object in my repository, using the example provided in the package, with lakeFS deployed locally through lakefs-samples/everything bagel, but I am getting a MaxRetryError with:
Failed to resolve 'minio' ([Errno -2] Name or service not known)"))
This is what my code currently looks like. Is there something obvious I'm missing?
for object in repo.branch("main").objects():
    source = get_filename(object.path)
    file_size = repo.branch("main").object(object.path).stat().size_bytes

    with repo.branch("main").object(object.path).reader(
        mode="r", pre_sign=True
    ) as fd:
        while fd.tell() < file_size:
            print(fd.read(10))
            fd.seek(10, os.SEEK_CUR)
o
Hey @Yiannis Zachariadis! If you wish to use `pre_sign` mode, your client application must be able to communicate with the HTTP/HTTPS endpoint used by your lakeFS installation (i.e. if lakeFS is deployed in front of minio, and uses `minio` as the hostname, these should resolve and be accessible by your python program as well). If you want lakeFS to proxy the data from the underlying storage back to the client, you can explicitly set `pre_sign` to False. Does that help?
btw - you can skip the extra `stat()` call in your example - the `object` used in your for-loop already has a `size_bytes` member that you can use.
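Here is a minimal sketch of both suggestions together, in case it helps. It assumes `repo` is the same repository object used in the loop above; `read_objects` and `chunk_size` are hypothetical names for illustration, and binary mode is used so that no decoding is involved.

```python
def read_objects(repo, branch_name="main", chunk_size=10):
    """Yield (path, data) for every object on a branch.

    `repo` is assumed to be the repository object from the question;
    `read_objects` and `chunk_size` are illustrative names only.
    """
    branch = repo.branch(branch_name)
    for obj in branch.objects():
        # pre_sign=False asks lakeFS to proxy the data itself, so the client
        # never has to resolve the storage hostname (e.g. 'minio').
        with branch.object(obj.path).reader(mode="rb", pre_sign=False) as fd:
            data = b""
            # The listing entry already carries size_bytes, so no stat() call.
            while len(data) < obj.size_bytes:
                chunk = fd.read(chunk_size)
                if not chunk:
                    break  # stop early if the stream ends unexpectedly
                data += chunk
        yield obj.path, data
```

Something like `for path, data in read_objects(repo): ...` would then walk the branch without needing direct access to the underlying storage.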
y
Yes, setting `pre_sign=False` seems to have done it.
Now it's throwing a decode error, but that should be unrelated. Thank you.
o
Keep in mind, `mode="r"` means decoding the read bytes into characters. If you're reading binary data, this might be the cause.
y
I'm reading HTML files to try and pass on to unstructured for partitioning, so I probably won't need `mode="r"` in the long run. I'm just sanity-checking the procedure.
o
Got it. Keep in mind that, for performance reasons, the size parameter of `fd.read` is always in bytes, not characters, even in `"r"` mode. So reading an arbitrary number of bytes is not guaranteed to be decodable with a variable-width encoding such as UTF-8.
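To illustrate the point about byte boundaries, here is a rough sketch that uses only the Python standard library. It assumes the file is opened in binary mode and decoded on the client side; `read_text_in_chunks` is a hypothetical helper, and `io.BytesIO` stands in for the real reader.

```python
import codecs
import io


def read_text_in_chunks(fd, chunk_size=10, encoding="utf-8"):
    """Yield decoded text from a binary reader, chunk_size bytes at a time.

    An incremental decoder buffers any trailing partial character until the
    next chunk arrives, so a multi-byte character split across two reads
    still decodes correctly.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        chunk = fd.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; this raises if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    # Stand-in for a reader opened with mode="rb"; the sample text is made up.
    sample = "<p>café ☕</p>".encode("utf-8")
    for piece in read_text_in_chunks(io.BytesIO(sample), chunk_size=7):
        print(repr(piece))
```

With a 7-byte chunk the first read ends in the middle of the two-byte `é`, which is the kind of split that makes a plain `.decode("utf-8")` on each chunk fail.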