Anybody know how to make polars talk to lakefs ? I...
# help
h
Does anybody know how to make Polars talk to lakeFS? I tried:
import polars as pl

# conf (credentials and endpoint) and paths (list of object paths) are defined elsewhere
optPl = {
    "aws_access_key_id": conf["access_key_id"],
    "aws_secret_access_key": conf["secret_access_key"],
    "aws_region": "us-east-1",
    "endpoint_url": conf["endpoint"],
}
pl.concat([pl.scan_parquet(f"s3://{file}", storage_options=optPl) for file in paths])
But I got:
ComputeError: Generic S3 error: response error "No Body", after 0 retries: HTTP status client error (403 Forbidden) for url (https://LAKEFS-SELF-HOSTED-SERVER/repo/main/<redacted_path>/annotation.parquet)
I checked that the path to the file is correct ...
i
Hey @HT, not a Polars user myself, but I’ll try to help. I found this blog post, in which they use Polars with MinIO. They created a boto client using the MinIO creds and endpoint, and you can configure it to use lakeFS in the same way. They then processed the data with the Polars library and uploaded it with the boto client. Will the same setup with lakeFS creds and endpoint work for you?
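For reference, here is a minimal sketch of that boto-based setup pointed at lakeFS rather than MinIO. It assumes `conf` has the same keys as in the snippet above and that each entry in `paths` looks like `repo/branch/path/to/file.parquet`. The helper names `split_lakefs_path` and `read_parquet_via_boto` are made up for illustration. On the lakeFS S3 gateway the repository acts as the bucket and the key starts with the branch name.

```python
import io


def split_lakefs_path(path: str) -> tuple[str, str]:
    """Split 'repo/branch/path/to/file' into (bucket, key).

    On the lakeFS S3 gateway the repository plays the role of the bucket
    and the key starts with the branch (or other ref) name.
    """
    repo, _, key = path.strip("/").partition("/")
    if not repo or not key:
        raise ValueError(f"expected 'repo/branch/path', got {path!r}")
    return repo, key


def read_parquet_via_boto(conf: dict, paths: list[str]):
    """Download each object with boto3 and concatenate the frames in Polars."""
    # Imported here so the path helper above works without these packages.
    import boto3
    import polars as pl

    # Assumption: conf uses the same keys as the snippet in the question.
    s3 = boto3.client(
        "s3",
        endpoint_url=conf["endpoint"],
        aws_access_key_id=conf["access_key_id"],
        aws_secret_access_key=conf["secret_access_key"],
    )
    frames = []
    for path in paths:
        bucket, key = split_lakefs_path(path)
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        frames.append(pl.read_parquet(io.BytesIO(body)))
    return pl.concat(frames)
```

This reads each file eagerly into memory, one after another, so it gives up the lazy, parallel scanning that `pl.scan_parquet` offers.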
h
Ah yes, I did come across that post. The thing is that the latest version of Polars supports S3 natively and allows fancy stuff like reading multiple Parquet files using a glob pattern, or reading multiple Parquet files from a given list, all done in parallel by Polars. I get the feeling that their current implementation is expecting the AWS S3 URL format ... or I am just missing a flag somewhere ... https://github.com/pola-rs/polars/pull/11210
I like Polars as it has all the Spark-like features without being the daunting Spark ....
i
What does “support S3 natively” mean? I couldn’t find it in their docs. If I could understand how this configuration works, we could find out how to make it happen with lakeFS as well.
h
Historically, <= 0.18.x, Polars in Python used fsspec as the interface for all cloud access: Azure, S3, .... Now they have their own "native" implementation for Azure, AWS and Google, in Rust I believe?
And I think this "native" code may have a bunch of assumptions that S3 == AWS ...
I am asking on their Discord for a way to run it in verbose mode, to see what it is actually trying to do when talking to the S3 API ... Rust code is way beyond my understanding ... a bit like Go 😛
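As a rough sketch of the older fsspec-style route mentioned above, one can open the objects through `s3fs` (the fsspec implementation for S3) and hand the file handles to Polars. The helper names `s3fs_options` and `read_parquet_via_fsspec` are hypothetical, and `conf` is assumed to have the same keys as in the original snippet. Whether this avoids the 403 against a given lakeFS server has not been verified here.

```python
def s3fs_options(conf: dict) -> dict:
    """Map the thread's conf dict to s3fs.S3FileSystem keyword arguments."""
    return {
        "key": conf["access_key_id"],
        "secret": conf["secret_access_key"],
        # Point the underlying client at the lakeFS server instead of AWS.
        "client_kwargs": {"endpoint_url": conf["endpoint"]},
    }


def read_parquet_via_fsspec(conf: dict, paths: list[str]):
    """Open each object through s3fs and let Polars read the file handle."""
    # Imported lazily; both are third-party packages.
    import polars as pl
    import s3fs

    fs = s3fs.S3FileSystem(**s3fs_options(conf))
    frames = []
    for path in paths:
        # path is 'repo/branch/path/to/file.parquet', without the s3:// scheme
        with fs.open(path, "rb") as f:
            frames.append(pl.read_parquet(f))
    return pl.concat(frames)
```

Like the boto route, this reads eagerly and sequentially, so it is a workaround and not a replacement for native `scan_parquet` support.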