# help
a
hi, I am working with a lakeFS Python client and trying to read data from a lakeFS branch. The lakeFS client API works for creating and listing branches, yet something in the configuration for reading from a branch with Spark is probably missing. File path to load the dataframe from:
```python
main_repo_path = "lakefs://adi-test/main/"
```
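(For context, the failing read is along these lines; a minimal sketch, where the Parquet format is an assumption and any supported format would go through the same code path:)

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

main_repo_path = "lakefs://adi-test/main/"

# Load a DataFrame from the lakeFS branch via the lakefs:// scheme;
# DataFrameReader.load is the call that raises the error below.
df = spark.read.format("parquet").load(main_repo_path)
df.show()
```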
This is the exception I get:
```
Py4JJavaError: An error occurred while calling o625.load.
: java.io.IOException: statObject
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:731)
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:43)
	at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1777)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:59)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:407)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:369)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:325)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:238)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)
Caused by: io.lakefs.shaded.api.ApiException: Content type "text/html; charset=utf-8" is not supported for type: class io.lakefs.shaded.api.model.ObjectStats
	at io.lakefs.shaded.api.ApiClient.deserialize(ApiClient.java:822)
	at io.lakefs.shaded.api.ApiClient.handleResponse(ApiClient.java:1020)
	at io.lakefs.shaded.api.ApiClient.execute(ApiClient.java:944)
	at io.lakefs.shaded.api.ObjectsApi.statObjectWithHttpInfo(ObjectsApi.java:1115)
	at io.lakefs.shaded.api.ObjectsApi.statObject(ObjectsApi.java:1089)
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:727)
```
I configured the Spark cluster with the same credentials and endpoint as the programmatic lakeFS client. Any idea what is missing?
b
From the error it looks like this is not related to the Python client API, but to how Spark is configured to work with the "lakefs://" scheme. Check your configuration for lakeFS's Hadoop FileSystem (hadoopfs).
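(For reference, the lakeFS Hadoop FileSystem is typically wired up roughly as below; a minimal sketch where every value is a placeholder, not the asker's actual settings:)

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Map the lakefs:// scheme to the lakeFS Hadoop FileSystem implementation.
    .config("spark.hadoop.fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
    # lakeFS API credentials and endpoint; note the /api/v1 suffix,
    # without which the API client may get HTML back instead of JSON.
    .config("spark.hadoop.fs.lakefs.access.key", "<lakefs-access-key>")
    .config("spark.hadoop.fs.lakefs.secret.key", "<lakefs-secret-key>")
    .config("spark.hadoop.fs.lakefs.endpoint", "https://<lakefs-host>/api/v1")
    .getOrCreate()
)
```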
a
Thanks, I am validating it now!
OK, added S3 credentials and now it works. Thanks @Barak Amar!
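(For future readers: with the lakeFS Hadoop FileSystem, Spark reads object data directly from the underlying store, so it needs storage credentials in addition to the lakeFS API ones. A sketch of the kind of S3 settings that resolved it here, assuming an S3A-backed repository; the key names are standard Hadoop S3A configuration and the values are placeholders:)

```python
# Credentials for the underlying S3 bucket that backs the repository.
# _jsc.hadoopConfiguration() is the usual PySpark route to the Hadoop conf.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "<s3-access-key>")
hadoop_conf.set("fs.s3a.secret.key", "<s3-secret-key>")
```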