#help

James Daus

10/04/2023, 6:31 PM
Hi guys! We're importing folders containing Parquet files from Azure Blob (see below) using common_prefix, but upon import an additional "file" with the same name as every folder is created (see below). When we try to run
df = sedona.read.parquet(lakefspath)
we are met with this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 75.0 failed 4 times, most recent failure: Lost task 0.3 in stage 75.0: org.apache.spark.SparkException: Exception thrown in awaitResult: [CANNOT_READ_FILE_FOOTER] Could not read footer for file: <lakefs://test/testrepo1/providerfeeds/places1/theme=places1>
Note: Upon deleting this extra file and all extra files within every level of nesting, we are then able to read.
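The workaround above (deleting every extra object that shares its name with a folder) can be sketched roughly as follows. This is only a minimal illustration that works on a plain list of object keys; it does not call lakeFS, and the function name and sample keys are hypothetical. It assumes the extra "files" are exactly those keys that also appear as a parent folder of some other key.

```python
def find_folder_marker_keys(keys):
    """Return keys that also act as a parent 'folder' of another key.

    Assumption: these are the extra objects created on import that
    make the Parquet reader fail with CANNOT_READ_FILE_FOOTER.
    """
    key_set = set(keys)
    markers = set()
    for key in key_set:
        parts = key.split("/")
        # Check every parent "folder" of this key, at every nesting level.
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            if parent in key_set:
                markers.add(parent)
    return sorted(markers)


# Hypothetical listing modeled on the path in the error message.
keys = [
    "providerfeeds/places1",
    "providerfeeds/places1/theme=places1",
    "providerfeeds/places1/theme=places1/part-0000.snappy.parquet",
]
markers = find_folder_marker_keys(keys)
```

The returned keys are the candidates to delete before reading; real data files are never returned because nothing is nested beneath them.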

Amit Kesarwani

10/04/2023, 7:13 PM
Are you using the adls or the blob subdomain in the URL when importing? The adls form looks like this: https://<my-account>.adls.core.windows.net/path/to/import/

James Daus

10/04/2023, 8:01 PM
Using blob, like so: https://{}.blob.core.windows.net/{}/{}/{}/

Amit Kesarwani

10/04/2023, 8:25 PM
I think adls resolves the issue
So, try adls

James Daus

10/04/2023, 8:43 PM
That worked perfectly, thanks

Amit Kesarwani

10/04/2023, 8:52 PM
👍

Niro

10/05/2023, 7:46 AM
@James Daus Please note that "adls" should be used as part of the subdomain for import only when the import source is an ADLS Gen2 storage account. It is a hint to lakeFS that lets us choose the correct way to list the storage objects. It is not a valid URL when creating a repository or when importing from a Blob Storage account.
👍 1
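For illustration, switching an import source URL from the blob subdomain to the adls hint could look like the sketch below. The helper name and the account/container names are hypothetical, and, per the note above, the result is only meaningful as an import source for an ADLS Gen2 storage account, not as a repository storage namespace.

```python
from urllib.parse import urlsplit, urlunsplit


def to_adls_import_url(url):
    """Swap the 'blob' subdomain label for 'adls' in an Azure storage URL.

    Hypothetical helper: use the result only as a lakeFS import source
    for an ADLS Gen2 account.
    """
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    # Expect a host of the form <account>.blob.core.windows.net
    if len(labels) < 2 or labels[1] != "blob":
        raise ValueError(f"not a blob endpoint URL: {url}")
    labels[1] = "adls"
    return urlunsplit(parts._replace(netloc=".".join(labels)))


# Hypothetical account, container, and path.
adls_url = to_adls_import_url(
    "https://myaccount.blob.core.windows.net/container/path/to/import/"
)
```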

James Daus

10/05/2023, 11:09 PM
Thanks for the replies @Amit Kesarwani and @Niro! Unfortunately, it looks like the read is still sometimes failing, especially on larger loads. Is this error related?
We also got this error on one of the failed reruns: com.databricks.sql.ui.FileReadException: Error while reading file lakefs://<path>.c000.snappy.parquet. java.io.EOFException

Amit Kesarwani

10/05/2023, 11:16 PM
@James Daus I don’t think this error is related. Can you please report this error in a new thread and with additional information? I don’t know about this error but somebody else will assist you.