# help
Hello. I have a problem when I try to read a file from a lakeFS repository. The repository is backed by an Azure Storage account. The preinstall config is:
```python
lakefsEndPoint = 'https://rntlj-151-236-193-133.a.free.pinggy.link/api'

spark.conf.set("fs.lakefs.access.mode", "presigned")
spark.conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
spark.conf.set("fs.lakefs.access.key", f"{lakefsAccessKey}")
spark.conf.set("fs.lakefs.secret.key", f"{lakefsSecretKey}")
spark.conf.set("fs.lakefs.endpoint", f"{lakefsEndPoint}")

spark.conf.set("spark.databricks.delta.logStore.crossCloud.fatal", "false")
```
Installed libraries in Databricks on Azure:
```
io.lakefs:hadoop-lakefs-assembly:0.2.1                  (Maven)
io.lakefs:lakefs-spark-client-312-hadoop3_2.12:0.10.0   (Maven)
lakefs-client                                           (PyPI)
```
Attempt to read a file:
```python
repo_name = "test-rep-1"
sourceBranch = "main"

dataPath = 'tst_3a.csv'
dataPath = f"lakefs://{repo_name}/{sourceBranch}/{dataPath}"
print(f"Reading CSV from {dataPath}")
df = spark.read.csv(dataPath)
```
causes error:
```
Py4JJavaError: An error occurred while calling o480.csv.
: java.io.IOException: statObject
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:764)
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:74)
	at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1777)
Caused by: io.lakefs.hadoop.shade.sdk.ApiException: Message: Content type "text/html; charset=utf-8" is not supported for type: class io.lakefs.hadoop.shade.sdk.model.ObjectStats
HTTP response code: 200
HTTP response body: <!DOCTYPE html>
<html lang="en">
    <!-- Generated with Vite-->

HTTP response headers: {accept-ranges=[bytes], cache-control=[no-cache, no-store, no-transform, must-revalidate, private, max-age=0], content-length=[480], content-type=[text/html; charset=utf-8], date=[Thu, 30 Nov 2023 12:52:33 GMT], expires=[Thu, 01 Jan 1970 00:00:00 GMT], pragma=[no-cache], x-accel-expires=[0], x-frame-options=[SAMEORIGIN]}
	at io.lakefs.hadoop.shade.sdk.ApiClient.deserialize(ApiClient.java:925)
	at io.lakefs.hadoop.shade.sdk.ApiClient.handleResponse(ApiClient.java:1127)
```
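The `Caused by` line is the key symptom: the client expected a JSON `ObjectStats` object but got back the lakeFS web UI's HTML page with status 200, meaning the request never reached the API routes. A minimal sketch (plain Python, no lakeFS dependency; the function name is mine) of checking a `Content-Type` header for this failure mode, using the value from the trace above:

```python
def is_api_content_type(content_type: str) -> bool:
    """Return True if a Content-Type header indicates a JSON API response,
    False for anything else (e.g. the web UI's HTML page)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    return content_type.split(";")[0].strip().lower() == "application/json"

print(is_api_content_type("text/html; charset=utf-8"))  # the trace's content type
print(is_api_content_type("application/json"))
```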
And when I try to write a parquet file to the repository, I get the same error:
```python
fileName = 'tst_3.csv'
dataPath = 'tst_3_df'

df = spark.read.csv(f'/{source_data}/{fileName}')
```
What could be the mistake? I followed these instructions https://lakefs.io/blog/databricks-lakefs-integration-tutorial/ (for Azure storage), but it doesn't work for me.
From a quick look, I think you need to update:
```python
lakefsEndPoint = 'https://rntlj-151-236-193-133.a.free.pinggy.link/api'
```
setting the endpoint to:
```python
lakefsEndPoint = 'https://rntlj-151-236-193-133.a.free.pinggy.link/api/v1'
```
Yes, it works. I was confused because when I tried https://rntlj-151-236-193-133.a.free.pinggy.link/api/v1 directly in the browser it returned {"message":"invalid API endpoint"}, which looked like a bad link. But https://rntlj-151-236-193-133.a.free.pinggy.link/api and https://rntlj-151-236-193-133.a.free.pinggy.link work fine in the browser.
The same listen port serves both the web UI and the API. In some SDKs the base URL can be dropped or only partly specified, and the client code completes what's missing; that's not the case here, which is something we need to align.
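To make that concrete, here is a hypothetical helper (not part of any lakeFS SDK; name and behavior are my own sketch) showing the completion the Hadoop filesystem client does not do for you, so the configured endpoint has to end with `/api/v1` itself:

```python
def normalize_lakefs_endpoint(url: str) -> str:
    """Hypothetical helper: ensure an endpoint URL ends with /api/v1.

    Assumes the server serves its API under /api/v1. A bare host or a
    '/api' suffix would otherwise be answered by the web UI, producing
    the HTML-instead-of-JSON error seen in the thread above.
    """
    url = url.rstrip("/")
    if url.endswith("/api/v1"):
        return url
    if url.endswith("/api"):
        return url + "/v1"
    return url + "/api/v1"

print(normalize_lakefs_endpoint("https://example.lakefs.io/api"))
# https://example.lakefs.io/api/v1
```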