# help
d
Hi! We’ve encountered something weird after we started using hadoop-lakefs 0.1.4, we’re getting the following error:
10:17:13 Caused by: java.io.FileNotFoundException: No such file or directory: lakefs://windward/hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30/ww-stage/merged-vesselstories/_temporary/0
10:17:13 	at io.lakefs.LakeFSFileSystem.listStatus(LakeFSFileSystem.java:591)
Writing the same data to S3 succeeds. I saw there are some changes to “LakeFSFileSystem.listStatus” in 0.1.4 vs. 0.1.1, which works for us; can you think of a reason why this happens?
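For context, this is roughly the shape of the write that fails. A minimal sketch, not taken from the thread; the fs.lakefs.* property names follow the lakeFS Hadoop FileSystem docs and all repository/branch/path values are placeholders, so double-check them against the hadoop-lakefs version you are running:
```scala
// Minimal sketch (placeholders, not from the thread): a Spark job writing Parquet to lakefs://.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("lakefs-write-repro")
  .config("spark.hadoop.fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
  .config("spark.hadoop.fs.lakefs.endpoint", "https://<your-lakefs-server>/api/v1")
  .config("spark.hadoop.fs.lakefs.access.key", "<lakeFS access key>")
  .config("spark.hadoop.fs.lakefs.secret.key", "<lakeFS secret key>")
  .getOrCreate()

// Writing the same DataFrame to an s3a:// path succeeds; the lakefs:// write is what fails.
val df = spark.range(10).toDF("n")
df.write.parquet("lakefs://<repository>/<branch>/<path>/")
```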
a
Hi Daniel, could you post a more complete stack trace and previous logs, please?
But at a guess: that doesn't look like a good path. Perhaps you are missing a branch name?
lakefs://windward/hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30/ww-stage/merged-vesselstories/_temporary/0
would try to write to a branch named hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30. Does this branch exist?
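To make the guess concrete, lakeFS paths are laid out as lakefs://&lt;repository&gt;/&lt;branch&gt;/&lt;key&gt;. A tiny illustrative snippet splitting the failing path, just to show where the branch name sits:
```scala
// Illustrative only: split the failing lakefs:// URI into repository, branch, and object key.
val uri = "lakefs://windward/hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30/ww-stage/merged-vesselstories/_temporary/0"
val Array(repository, branch, key) = uri.stripPrefix("lakefs://").split("/", 3)
println(s"repository=$repository") // windward
println(s"branch=$branch")         // hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30
println(s"key=$key")               // ww-stage/merged-vesselstories/_temporary/0
```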
If this issue persists (i.e. if the branch `hourly-staging-vs-ongoing-merge-app-2021-12-15_10-11-30` exists at the time Spark tries to write there), I'd like to try to speed up the first few rounds of debugging by requesting a lot more information. I hope this will be OK with you -- some of it may go unused, but I hope that by reducing the number of rounds it will reduce time-to-fix and the variance of that time. If you agree, could you please send some or all of the following?
1. Versions used of Spark, Hadoop, the lakeFS API client, and the lakeFS Spark client, and where they are running (Databricks? AWS EMR? Self-hosted?)
2. The file format that you are writing.
3. The OutputCommitter used, particularly if not using the default. If this is the default `FileOutputCommitter`, the algorithm version (1 is the default, 2 is often better; config property `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version`).
4. Spark configuration: `sc.getConf().getAll()`. THIS MAY CONTAIN SECRETS - be sure to edit them out before publishing! (A small redaction sketch follows below.)
5. Spark driver and worker logs from the failed run.
6. (EDIT: Added this!) lakeFS server version 🙂
Thanks!
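A small sketch for item 4, assuming a Scala spark-shell; the key-name pattern used for masking is only a heuristic, so still review the output before sharing:
```scala
// Print the Spark configuration with likely-secret values masked (item 4 above).
// The pattern below is a guess at credential-like key names; adjust as needed.
sc.getConf.getAll.foreach { case (key, value) =>
  val masked =
    if (key.matches("(?i).*(secret|password|token|credential|access\\.key).*")) "****"
    else value
  println(s"$key=$masked")
}
```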
d
1. We are using Spark 2.4.7, Hadoop 2.7.3, lakeFS API client 0.56.0 (not sure what the lakeFS Spark client is), on a self-hosted cluster
2. Parquet
3. 2
6. 0.45
I’ll get back to you with 4 and 5 tomorrow, thanks :)
a
Thanks! Looking forward to getting even more info (I know I asked for a huge dump...). In the meantime, I’m not sure I tested Spark 2.4.7 with the v2 FileOutputCommitter (AFAIK it is less commonly used with S3 due to some known issues). Could you try it with v1? (If v1 does work, we’ll open a bug for v2 anyway...)
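For reference, switching the committer algorithm version is a one-line config change; a sketch, to be set however you normally build your session or submit your job:
```scala
import org.apache.spark.sql.SparkSession

// Use the v1 FileOutputCommitter algorithm (1 is the Hadoop default; the failing
// run above was configured with 2). This can also be passed via --conf on spark-submit.
val spark = SparkSession.builder()
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "1")
  .getOrCreate()
```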
d
Sure I’ll make this change and let you know if it helped!
👍🏼
d
2021-12-16 10:19:42,856 [testing.vesselstory-app] [pool-18-thread-1] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter- File Output Committer Algorithm version is 1
2021-12-16 10:19:57,084 [testing.vesselstory-app] [pool-18-thread-1] ERROR org.apache.spark.sql.execution.datasources.FileFormatWriter- Aborting job 4c7e13c2-a7f6-4432-832a-3ab6dc96e0fd.
java.io.FileNotFoundException: No such file or directory: lakefs://windward/testing-vesselstory-app-2021-12-16_10-17-41/ww-sandbox/vesselstories/_temporary/0
	at io.lakefs.LakeFSFileSystem.listStatus(LakeFSFileSystem.java:591)
this still happens with committer v1
a
Thanks! 😞
I'll try to look for other clues, but I think we will need to see some more logs.
d
I’ll send you the allConf soon
👍🏼