# help
Hello. I have a problem when I try to read a file from a lakeFS repository. The repository is backed by an Azure Storage account. The preinstall config is:
```python
lakefsEndPoint = 'https://rntlj-151-236-193-133.a.free.pinggy.link/api'

spark.conf.set("fs.lakefs.access.mode", "presigned")
spark.conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
spark.conf.set("fs.lakefs.access.key", f"{lakefsAccessKey}")
spark.conf.set("fs.lakefs.secret.key", f"{lakefsSecretKey}")
spark.conf.set("fs.lakefs.endpoint", f"{lakefsEndPoint}")

spark.conf.set("spark.databricks.delta.logStore.crossCloud.fatal", "false")
```
Installed libraries in Databricks on Azure:
```
io.lakefs:hadoop-lakefs-assembly:0.2.1                  (Maven)
io.lakefs:lakefs-spark-client-312-hadoop3_2.12:0.10.0   (Maven)
lakefs-client                                           (PyPI)
```
Attempt to read a file:
```python
repo_name = "test-rep-1"
sourceBranch = "main"

dataPath = 'tst_3a.csv'
dataPath = f"lakefs://{repo_name}/{sourceBranch}/{dataPath}"
print(f"Reading CSV from {dataPath}")
df = spark.read.csv(dataPath)
```
causes error:
```
Py4JJavaError: An error occurred while calling o480.csv.
: java.io.IOException: statObject
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:764)
	at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:74)
	at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1777)
Caused by: io.lakefs.hadoop.shade.sdk.ApiException: Message: Content type "text/html; charset=utf-8" is not supported for type: class io.lakefs.hadoop.shade.sdk.model.ObjectStats
HTTP response code: 200
HTTP response body: <!DOCTYPE html>
<html lang="en">
    <!-- Generated with Vite-->

HTTP response headers: {accept-ranges=[bytes], cache-control=[no-cache, no-store, no-transform, must-revalidate, private, max-age=0], content-length=[480], content-type=[text/html; charset=utf-8], date=[Thu, 30 Nov 2023 12:52:33 GMT], expires=[Thu, 01 Jan 1970 00:00:00 GMT], pragma=[no-cache], x-accel-expires=[0], x-frame-options=[SAMEORIGIN]}
	at io.lakefs.hadoop.shade.sdk.ApiClient.deserialize(ApiClient.java:925)
	at io.lakefs.hadoop.shade.sdk.ApiClient.handleResponse(ApiClient.java:1127)
```
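The `Caused by` line is the key symptom: the client expected a JSON `ObjectStats` object but got back the lakeFS web UI's HTML page with status 200, meaning the request never reached the API routes. A minimal sketch (plain Python, no lakeFS dependency; the function name is mine) of checking a `Content-Type` header for this failure mode, using the value from the trace above:

```python
def is_api_content_type(content_type: str) -> bool:
    """Return True if a Content-Type header indicates a JSON API response,
    False for anything else (e.g. the web UI's HTML page)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    return content_type.split(";")[0].strip().lower() == "application/json"

print(is_api_content_type("text/html; charset=utf-8"))  # the trace's content type
print(is_api_content_type("application/json"))
```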
And when I try to write a parquet file to the repository, I get the same error:
```python
fileName = 'tst_3.csv'
dataPath = 'tst_3_df'

df = spark.read.csv(f'/{source_data}/{fileName}')
```
What could be the mistake? I followed these instructions https://lakefs.io/blog/databricks-lakefs-integration-tutorial/ (for Azure storage), but it doesn't work for me.
From a quick look, I think you need to update:
```python
lakefsEndPoint = 'https://rntlj-151-236-193-133.a.free.pinggy.link/api'
```
setting the endpoint to:
```python
lakefsEndPoint = 'https://rntlj-151-236-193-133.a.free.pinggy.link/api/v1'
```
Yes, it works. I was confused because when I tried https://rntlj-151-236-193-133.a.free.pinggy.link/api/v1 directly in the browser it returned {"message":"invalid API endpoint"}, which looked like a bad link. But https://rntlj-151-236-193-133.a.free.pinggy.link/api and https://rntlj-151-236-193-133.a.free.pinggy.link work fine in the browser.
The same listen port serves both the web UI and the API. In some SDKs the base URL can be dropped or only partly specified, and the client code completes what's missing; that's not the case here, which is something we need to align.
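To make that concrete, here is a hypothetical helper (not part of any lakeFS SDK; name and behavior are my own sketch) showing the completion the Hadoop filesystem client does not do for you, so the configured endpoint has to end with `/api/v1` itself:

```python
def normalize_lakefs_endpoint(url: str) -> str:
    """Hypothetical helper: ensure an endpoint URL ends with /api/v1.

    Assumes the server serves its API under /api/v1. A bare host or a
    '/api' suffix would otherwise be answered by the web UI, producing
    the HTML-instead-of-JSON error seen in the thread above.
    """
    url = url.rstrip("/")
    if url.endswith("/api/v1"):
        return url
    if url.endswith("/api"):
        return url + "/v1"
    return url + "/api/v1"

print(normalize_lakefs_endpoint("https://example.lakefs.io/api"))
# https://example.lakefs.io/api/v1
```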