# help
a
Hello. I have a problem when I try to read a file from a lakeFS repository. The repository is based on an Azure Storage account. My initial config is:
```python
lakefsEndPoint = 'https://rntlj-151-236-193-133.a.free.pinggy.link/api'

spark.conf.set("fs.lakefs.access.mode", "presigned")
spark.conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
spark.conf.set("fs.lakefs.access.key", f"{lakefsAccessKey}")
spark.conf.set("fs.lakefs.secret.key", f"{lakefsSecretKey}")
spark.conf.set("fs.lakefs.endpoint", f"{lakefsEndPoint}")

spark.conf.set("spark.databricks.delta.logStore.crossCloud.fatal", "false")
installed libraries in the Databricks on Azure
```
io.lakefs:hadoop-lakefs-assembly:0.2.1 Maven
io.lakefs:lakefs-spark-client-312-hadoop3_2.12:0.10.0 Maven
lakefs-client PyPI
```
Attempt to read a file:
```python
repo_name = "test-rep-1" 
sourceBranch = "main" 

dataPath = 'tst_3a.csv'
dataPath = f"lakefs://{repo_name}/{sourceBranch}/{dataPath}"
print(dataPath)
print(f"Reading CSV from {dataPath}")
df = spark.read.csv(dataPath)
df.show()
```
causes this error:
```
Py4JJavaError: An error occurred while calling o480.csv.
: java.io.IOException: statObject
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:764)
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:74)
	at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1777)
...
Caused by: io.lakefs.hadoop.shade.sdk.ApiException: Message: Content type "text/html; charset=utf-8" is not supported for type: class io.lakefs.hadoop.shade.sdk.model.ObjectStats
HTTP response code: 200
HTTP response body: <!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Generated with Vite-->
...
</html>

HTTP response headers: {accept-ranges=[bytes], cache-control=[no-cache, no-store, no-transform, must-revalidate, private, max-age=0], content-length=[480], content-type=[text/html; charset=utf-8], date=[Thu, 30 Nov 2023 12:52:33 GMT], expires=[Thu, 01 Jan 1970 00:00:00 GMT], pragma=[no-cache], x-accel-expires=[0], x-frame-options=[SAMEORIGIN]}
	at io.lakefs.hadoop.shade.sdk.ApiClient.deserialize(ApiClient.java:925)
	at io.lakefs.hadoop.shade.sdk.ApiClient.handleResponse(ApiClient.java:1127)
```
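For reference, the trace shows the client received the web UI's index.html (HTTP 200, text/html) where it expected JSON. A quick way to check what the configured endpoint actually serves, outside Spark (a sketch using requests; the URL is the one from my config above):
```python
import requests

endpoint = "https://rntlj-151-236-193-133.a.free.pinggy.link/api"

# An API route should answer with application/json; if this prints
# "text/html", the endpoint is serving the web UI, not the REST API.
r = requests.get(f"{endpoint}/repositories")
print(r.status_code, r.headers.get("content-type"))
```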
And when I try to write a file to the repository, I get the same error:
```python
fileName = 'tst_3.csv'
dataPath = 'tst_3_df'

df = spark.read.csv(f'/{source_data}/{fileName}')
df.write.format("csv").save(f"lakefs://{repo_name}/{sourceBranch}/{dataPath}")
```
What could be the mistake? I followed these instructions (for Azure storage): https://lakefs.io/blog/databricks-lakefs-integration-tutorial/ but it doesn't work for me.
b
From a quick look, I think you need to update:
```python
lakefsEndPoint = 'https://rntlj-151-236-193-133.a.free.pinggy.link/api'
```
setting the endpoint to
```
https://rntlj-151-236-193-133.a.free.pinggy.link/api/v1
```
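That is, the config block from above becomes:
```python
lakefsEndPoint = 'https://rntlj-151-236-193-133.a.free.pinggy.link/api/v1'

spark.conf.set("fs.lakefs.access.mode", "presigned")
spark.conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
spark.conf.set("fs.lakefs.access.key", f"{lakefsAccessKey}")
spark.conf.set("fs.lakefs.secret.key", f"{lakefsSecretKey}")
spark.conf.set("fs.lakefs.endpoint", f"{lakefsEndPoint}")
```
Only the endpoint changes; everything else stays as you had it.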
a
Yes, it works. I was confused because when I open https://rntlj-151-236-193-133.a.free.pinggy.link/api/v1 directly in the browser it returns {"message":"invalid API endpoint"}, so it looked like a bad link, while https://rntlj-151-236-193-133.a.free.pinggy.link/api and https://rntlj-151-236-193-133.a.free.pinggy.link work fine.
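(The bare base path is not a route itself, which is why the browser shows that JSON error. One way to sanity-check the endpoint from code instead, a sketch assuming the standard lakeFS healthcheck route, which needs no credentials:)
```python
import requests

base = "https://rntlj-151-236-193-133.a.free.pinggy.link"

# /api/v1 alone is not a route, hence {"message":"invalid API endpoint"};
# a real route under it answers properly. healthcheck returns 204 on success.
print(requests.get(f"{base}/api/v1/healthcheck").status_code)  # expect 204
```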
b
The same listen port serves both the web UI and the API. In some SDKs the /api/v1 base URL can be dropped or only partly specified and the client fills in what's missing, but not in this case; that's something we need to align.
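For example, with the lakefs-client package you listed, something like this should work (a rough sketch; I'm assuming here that the host must carry the full /api/v1 base path, since not every client version fills it in):
```python
import lakefs_client
from lakefs_client.client import LakeFSClient

# Assumption: host includes the full /api/v1 base path explicitly.
configuration = lakefs_client.Configuration(
    host="https://rntlj-151-236-193-133.a.free.pinggy.link/api/v1",
    username=lakefsAccessKey,  # same credentials as in the Spark config
    password=lakefsSecretKey,
)
client = LakeFSClient(configuration)
print(client.repositories.list_repositories())
```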