Vasyl Klindukhov
04/11/2024, 10:34 AM
Old cluster: io.lakefs:api-client:0.57.2 & io.lakefs:lakefs-assembly:0.1.6.
Overall Parquet data in the lakeFS branch is more than 1.5 TiB: we read, compute, and write the result of the computation back to lakeFS. This works fine on a Spark cluster with about 35 executors and 900 GB.
But we hit the issue when processing the full data set after a cluster upgrade. Part of the data (50-70%) may be processed, but processing of the full data is aborted.
New cluster info: OpenJDK 17 (Amazon Corretto), Spark 3.5 with the Magic committer, Scala 2.12, Hadoop 3.4, AWS SDK 1.12, io.lakefs:api-client:1.15.0 & io.lakefs:lakefs-assembly:0.2.3.
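For reference, a minimal sketch of the read -> compute -> write-back flow described above. The fs.lakefs.* keys and the filesystem class name follow the lakeFS Hadoop FileSystem documentation; the endpoint, credentials, repository/branch, paths, and the computation itself are placeholders, not the actual job:

import org.apache.spark.sql.SparkSession

// Minimal sketch: read Parquet from a lakeFS branch, compute, write the result back.
// Endpoint, credentials, repository/branch and paths below are placeholders.
object LakeFSJobSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lakefs-parquet-job")
      .config("spark.hadoop.fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
      .config("spark.hadoop.fs.lakefs.endpoint", "https://lakefs.example.com/api/v1") // placeholder
      .config("spark.hadoop.fs.lakefs.access.key", sys.env("LAKEFS_ACCESS_KEY"))
      .config("spark.hadoop.fs.lakefs.secret.key", sys.env("LAKEFS_SECRET_KEY"))
      .getOrCreate()

    val input  = spark.read.parquet("lakefs://example-repo/main/input/")          // placeholder path
    val result = input.groupBy("some_key").count()                                // placeholder computation
    result.write.mode("overwrite").parquet("lakefs://example-repo/main/output/")  // placeholder path
    spark.stop()
  }
}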
The issue is related to the renaming (copy and delete); the socket is closed:
Caused by: java.io.IOException: renameObject: src:<lakefs://some/path/to/temp/_temporary/0/_temporary/attempt_some/path/to/file.parquet>, dst: <lakefs://some/path/to/final/dst/file.parquet>, call to copyObject failed
...
Caused by: io.lakefs.hadoop.shade.sdk.ApiException: Message: java.net.SocketTimeoutException: timeout
...
Caused by: java.net.SocketTimeoutException: timeout
...
Caused by: java.net.SocketException: Socket closed
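To narrow this down, a minimal sketch that exercises the same rename path outside the Spark job. On lakeFSFS a rename is a copy followed by a delete, which is the copyObject call that times out above; timing a single rename separates the filesystem/API behaviour from the rest of the job. Endpoint, credentials and paths are placeholders:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Minimal sketch: perform one rename on lakeFSFS and time it.
// Endpoint, credentials and paths below are placeholders.
object RenameRepro {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
    conf.set("fs.lakefs.endpoint", "https://lakefs.example.com/api/v1") // placeholder
    conf.set("fs.lakefs.access.key", sys.env("LAKEFS_ACCESS_KEY"))
    conf.set("fs.lakefs.secret.key", sys.env("LAKEFS_SECRET_KEY"))

    val src = new Path("lakefs://example-repo/main/tmp/part-00000.parquet")   // placeholder
    val dst = new Path("lakefs://example-repo/main/final/part-00000.parquet") // placeholder

    val fs = FileSystem.get(src.toUri, conf)
    val start = System.nanoTime()
    val ok = fs.rename(src, dst) // returns false on failure, throws on I/O errors such as the timeout above
    println(f"rename returned $ok%s in ${(System.nanoTime() - start) / 1e9}%.1f s")
  }
}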
Ariel Shaqed (Scolnicov)
04/11/2024, 10:50 AM