# help
Rahul Kumar:
Hi, I am trying to read lakeFS files from PySpark, can anybody guide me on where I am going wrong? Below is my code snippet.
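A minimal sketch of what such a snippet typically looks like (the repository, branch, and object path below are placeholders, not the actual values from this thread):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakefs-read").getOrCreate()

# lakeFS exposes each repository as a bucket over s3a:
#   s3a://<repository>/<branch>/<path-to-object>
df = spark.read.parquet('s3a://example-repo/main/path/to/data.parquet')
df.show()
```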
Yoni Augarten:
Hi @Rahul Kumar! In order for Spark to be able to reach lakeFS, you need to set the s3a endpoint to the address of the lakeFS S3-compatible API. This is done in a similar way to how you set the s3a credentials.
```python
spark._jsc.hadoopConfiguration().set('fs.s3a.endpoint', 'http://s3.lakefs.example.com')
```
Could you tell me which version of lakeFS you're using, so I can guide you exactly which address to use there? The version can be found in the UI when you click your username at the top-right corner.
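Taken together with the credentials, the relevant Hadoop settings look roughly like this (the gateway address is an assumed example, and `fs.s3a.path.style.access` is a common extra setting for custom S3 endpoints rather than something stated in this thread):

```python
# assuming the SparkSession `spark` from the sketch above
hc = spark._jsc.hadoopConfiguration()

# lakeFS access key and secret (placeholders)
hc.set('fs.s3a.access.key', 'AKIAIOSFODNN7EXAMPLE')
hc.set('fs.s3a.secret.key', 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY')

# The S3-compatible gateway address, not the lakeFS UI/API address
hc.set('fs.s3a.endpoint', 'http://s3.lakefs.example.com')

# Resolve the bucket (repository) from the path rather than the hostname
hc.set('fs.s3a.path.style.access', 'true')
```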
Rahul Kumar:
Hi @Yoni Augarten! I have edited my code. Now I'm getting an error: the bucket name doesn't exist.
Yoni Augarten:
Thanks for the information, @Rahul Kumar. Can you please share with me the version of lakeFS that you're using?
Rahul Kumar:
Hi @Yoni Augarten, forgot to mention that. The lakeFS version is 0.40.3.
Yoni Augarten:
And can you tell me how you are running lakeFS (binary/docker-compose/k8s/other?) and what is the "http://lakefs" address you're using?
The newest version of lakeFS (0.48.0) simplifies addressing the S3-compatible API, so if it's possible for you, I suggest that you upgrade. Then, you can remove the `gateways.s3.domain_name` configuration from lakeFS.
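For reference, that setting lives under the gateways block of the lakeFS server configuration; a sketch of the relevant excerpt, with an assumed domain value:

```yaml
# Excerpt of a lakeFS server config (sketch). On 0.48.0 and later this
# block can be dropped so the S3 gateway shares the API/UI domain.
gateways:
  s3:
    domain_name: s3.lakefs.example.com
```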
Rahul Kumar:
Hi @Yoni Augarten, I can't upgrade lakeFS; I don't have permission to do that. I am using the exact address which I used to connect to the lakeFS UI.
Yoni Augarten:
Can you try to put this URL in a browser and see what is returned?
Rahul Kumar:
Yes, the lakeFS UI is opening.
Yoni Augarten:
Sorry, @Rahul Kumar, I missed your message. In your version of lakeFS, there should be two different subdomains configured: one for the lakeFS API and UI, and the other for the S3 gateway. The fact that you are seeing the UI confirms that you are using the first rather than the second. Usually, if your API/UI domain is lakefs.example.com, the S3 gateway would be s3.lakefs.example.com. In the new version we made things simpler by allowing you to use the same domain for both.
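In other words, the fix on the Spark side is to point the s3a endpoint at the gateway subdomain rather than the UI one (the domains below follow the example naming above):

```python
hc = spark._jsc.hadoopConfiguration()

# Wrong: the API/UI domain. Opening it in a browser shows the lakeFS UI,
# and s3a requests against it fail with errors like "bucket doesn't exist".
# hc.set('fs.s3a.endpoint', 'http://lakefs.example.com')

# Right: the S3 gateway subdomain (set via gateways.s3.domain_name)
hc.set('fs.s3a.endpoint', 'http://s3.lakefs.example.com')
```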