# help
d
Hi! We’ve encountered something weird after we started using hadoop-lakefs 0.1.4, we’re getting the following error:
10:17:13 Caused by: java.io.FileNotFoundException: No such file or directory: lakefs://windward/hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30/ww-stage/merged-vesselstories/_temporary/0
10:17:13 	at io.lakefs.LakeFSFileSystem.listStatus(LakeFSFileSystem.java:591)
Writing the same data to S3 succeeds. I saw there are some changes to “LakeFSFileSystem.listStatus” in 0.1.4 vs. 0.1.1, which works for us; can you think of a reason why this happens?
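For context, this is roughly the shape of the write that fails. A minimal sketch, not taken from the thread; the fs.lakefs.* property names follow the lakeFS Hadoop FileSystem docs and all repository/branch/path values are placeholders, so double-check them against the hadoop-lakefs version you are running:
```scala
// Minimal sketch (placeholders, not from the thread): a Spark job writing Parquet to lakefs://.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("lakefs-write-repro")
  .config("spark.hadoop.fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
  .config("spark.hadoop.fs.lakefs.endpoint", "https://<your-lakefs-server>/api/v1")
  .config("spark.hadoop.fs.lakefs.access.key", "<lakeFS access key>")
  .config("spark.hadoop.fs.lakefs.secret.key", "<lakeFS secret key>")
  .getOrCreate()

// Writing the same DataFrame to an s3a:// path succeeds; the lakefs:// write is what fails.
val df = spark.range(10).toDF("n")
df.write.parquet("lakefs://<repository>/<branch>/<path>/")
```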
a
Hi Daniel, could you post a more complete stack trace and previous logs, please?
But at a guess: that doesn't look like a good path. Perhaps you are missing a branch name?
lakefs://windward/hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30/ww-stage/merged-vesselstories/_temporary/0
would try to write to a branch named hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30. Does this branch exist?
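To make the guess concrete, lakeFS paths are laid out as lakefs://&lt;repository&gt;/&lt;branch&gt;/&lt;key&gt;. A tiny illustrative snippet splitting the failing path, just to show where the branch name sits:
```scala
// Illustrative only: split the failing lakefs:// URI into repository, branch, and object key.
val uri = "lakefs://windward/hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30/ww-stage/merged-vesselstories/_temporary/0"
val Array(repository, branch, key) = uri.stripPrefix("lakefs://").split("/", 3)
println(s"repository=$repository") // windward
println(s"branch=$branch")         // hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30
println(s"key=$key")               // ww-stage/merged-vesselstories/_temporary/0
```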
If this issue persists (i.e. if the branch `hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30` exists at the time Spark tries to write there), I'd like to try to speed up the first few rounds of debugging by requesting a lot more information. I hope this will be OK with you -- some of it may go unused, but I hope that by reducing the number of rounds it will reduce time-to-fix and the variance of that time. If you agree, could you please send some or all of the following?
1. Versions used of Spark, Hadoop, the lakeFS API client, and the lakeFS Spark client, and where they are running (Databricks? AWS EMR? Self-hosted?)
2. The file format that you are writing.
3. The OutputCommitter used, particularly if not using the default. If this is the default `FileOutputCommitter`, the algorithm version (1 is the default, 2 is often better; config property `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version`).
4. Spark configuration: `sc.getConf().getAll()`. THIS MAY CONTAIN SECRETS - be sure to edit them out before publishing! (A small redaction sketch follows below.)
5. Spark driver and worker logs from the failed run.
6. (EDIT: Added this!) lakeFS server version 🙂
Thanks!
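A small sketch for item 4, assuming a Scala spark-shell; the key-name pattern used for masking is only a heuristic, so still review the output before sharing:
```scala
// Print the Spark configuration with likely-secret values masked (item 4 above).
// The pattern below is a guess at credential-like key names; adjust as needed.
sc.getConf.getAll.foreach { case (key, value) =>
  val masked =
    if (key.matches("(?i).*(secret|password|token|credential|access\\.key).*")) "****"
    else value
  println(s"$key=$masked")
}
```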
d
1. We are using Spark 2.4.7, Hadoop 2.7.3, lakeFS API client 0.56.0 (not sure what the lakeFS Spark client is), on a self-hosted cluster
2. Parquet
3. 2
6. 0.45
I’ll get back to you with 4 and 5 tomorrow, thanks :)
a
Thanks! Looking forward to getting even more info (I know I asked for a huge dump...). In the meantime, I’m not sure I tested Spark 2.4.7 with the v2 FileOutputCommitter (AFAIK it is less commonly used with S3 due to some known issues). Could you try it with v1? (If v1 does work, we’ll open a bug for v2 anyway...)
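For reference, switching the committer algorithm version is a one-line config change; a sketch, to be set however you normally build your session or submit your job:
```scala
import org.apache.spark.sql.SparkSession

// Use the v1 FileOutputCommitter algorithm (1 is the Hadoop default; the failing
// run above was configured with 2). This can also be passed via --conf on spark-submit.
val spark = SparkSession.builder()
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "1")
  .getOrCreate()
```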
d
Sure I’ll make this change and let you know if it helped!
👍🏼
d
2021-12-16 10:19:42,856 [testing.vesselstory-app] [pool-18-thread-1] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter- File Output Committer Algorithm version is 1
2021-12-16 10:19:57,084 [testing.vesselstory-app] [pool-18-thread-1] ERROR org.apache.spark.sql.execution.datasources.FileFormatWriter- Aborting job 4c7e13c2-a7f6-4432-832a-3ab6dc96e0fd.
java.io.FileNotFoundException: No such file or directory: lakefs://windward/testing-vesselstory-app-2021-12-16_10-17-41/ww-sandbox/vesselstories/_temporary/0
	at io.lakefs.LakeFSFileSystem.listStatus(LakeFSFileSystem.java:591)
this still happens with committer v1
a
Thanks! 😞
I'll try to look for other clues, but I think we will need to see some more logs.
d
I’ll send you the allConf soon
👍🏼