# help
m:
Hi, has anyone experienced this type of error during a commit? This is using RouterFS and the lakefs-iceberg lib to create a table. It appears as though the metadata file is lost, or otherwise referenced incorrectly, while attempting to rename it to `v1.metadata.json`:
org.apache.iceberg.exceptions.CommitFailedException: Failed to commit changes using rename: s3a://lakefs-poc/main/rl_dev_datastage_01_ma_snapshot/sys_audit_event/metadata/v1.metadata.json
(more stacktrace in reply)
```
org.apache.iceberg.exceptions.CommitFailedException: Failed to commit changes using rename: s3a://lakefs-poc/main/rl_dev_datastage_01_ma_snapshot/sys_audit_event/metadata/v1.metadata.json
	at org.apache.iceberg.hadoop.HadoopTableOperations.renameToFinal(HadoopTableOperations.java:378) ~[iceberg-spark-runtime-3.3_2.12-1.3.1.jar:?]
	at org.apache.iceberg.hadoop.HadoopTableOperations.commit(HadoopTableOperations.java:162) ~[iceberg-spark-runtime-3.3_2.12-1.3.1.jar:?]
	at io.lakefs.iceberg.LakeFSTableOperations.commit(LakeFSTableOperations.java:37) ~[lakefs-iceberg-0.1.3.jar:0.1.3]
	at org.apache.iceberg.BaseTransaction.commitCreateTransaction(BaseTransaction.java:311) ~[iceberg-spark-runtime-3.3_2.12-1.3.1.jar:?]
	at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:290) ~[iceberg-spark-runtime-3.3_2.12-1.3.1.jar:?]
	at org.apache.iceberg.spark.source.StagedSparkTable.commitStagedChanges(StagedSparkTable.java:34) ~[iceberg-spark-runtime-3.3_2.12-1.3.1.jar:?]
...
Caused by: java.io.FileNotFoundException: No such file or directory: lakefs://lakefs-poc/main/rl_dev_datastage_01_ma_snapshot/sys_audit_event/metadata/25193118-7546-49de-b229-ef0f039bc2d9.metadata.json
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3866) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initiateRename(S3AFileSystem.java:1887) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:1988) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$rename$7(S3AFileSystem.java:1846) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499) ~[hadoop-client-api-3.3.3-amzn-0.jar:?]
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444) ~[hadoop-client-api-3.3.3-amzn-0.jar:?]
	at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2337) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
	at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:1844) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
	at io.lakefs.routerfs.RouterFileSystem.rename(RouterFileSystem.java:197) ~[hadoop-router-fs-hadoop-2.9.2-assembly-0.1.0.jar:?]
	at org.apache.iceberg.hadoop.HadoopTableOperations.renameToFinal(HadoopTableOperations.java:368) ~[iceberg-spark-runtime-3.3_2.12-1.3.1.jar:?]
```
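For context on the failure mode: Iceberg's Hadoop-backed table operations commit by writing the new table metadata to a temporary, UUID-named file and then renaming it to the next versioned `vN.metadata.json`; the rename is the commit point. A minimal Python sketch of that pattern (the `fs` handle and function are hypothetical; the real logic lives in `HadoopTableOperations.renameToFinal`):

```python
# Sketch of the rename-based commit pattern used by Iceberg's
# HadoopTableOperations (illustrative, not the actual Iceberg code).
import uuid

def commit_metadata(fs, table_location: str, metadata_json: str, next_version: int) -> None:
    # Stage the new metadata under a temporary UUID name...
    tmp = f"{table_location}/metadata/{uuid.uuid4()}.metadata.json"
    final = f"{table_location}/metadata/v{next_version}.metadata.json"
    fs.write(tmp, metadata_json)
    # ...then rename it into place; the rename is the atomic commit point.
    # A missing temp file at this step surfaces exactly like the
    # FileNotFoundException / CommitFailedException pair above.
    if not fs.rename(tmp, final):
        raise RuntimeError(f"Failed to commit changes using rename: {final}")
```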
i:
Hey @Michael Gaebel, can you share your Spark config and a minimal code snippet that causes this error?
m:
```python
#General Spark configs
    ("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"),
    ("spark.sql.sources.partitionOverwriteMode", "dynamic"),
    ("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED"),
    #LakeFS configuration for Iceberg
    ("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1,io.lakefs:lakefs-iceberg:v0.1.3,io.lakefs:hadoop-router-fs-hadoop-2.9.2-assembly:0.1.0"),
    ("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog"),
    ("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog"),
    ("spark.sql.catalog.lakefs.warehouse", f"lakefs://{lakefs_repo}"),
    ("spark.sql.catalog.lakefs.uri", lakefs_endpoint),
    ("spark.sql.catalog.lakefs.cache-enabled", "false"),
    #LakeFs filesystem
    ("spark.hadoop.fs.s3a.impl", "io.lakefs.routerfs.RouterFileSystem"),
    ("spark.hadoop.routerfs.mapping.s3a.1.replace", f"s3a://{lakefs_repo}"),
    ("spark.hadoop.routerfs.mapping.s3a.1.with", f"lakefs://{lakefs_repo}"),
    ("spark.hadoop.routerfs.default.fs.s3a", "org.apache.hadoop.fs.s3a.S3AFileSystem"),

    ("spark.hadoop.fs.lakefs.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"),

    #LakeFS S3 access
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.endpoint", f"{lakefs_endpoint}"),
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.access.key", lakefs_access_key),
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.secret.key", lakefs_secret_key),
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.path.style.access", "true"),

    #Regular S3 access
    ("spark.hadoop.fs.s3a.endpoint.region", "ca-central-1"),
    ("spark.hadoop.fs.s3a.endpoint", "<https://s3.ca-central-1.amazonaws.com>"),
    ("spark.hadoop.fs.s3a.path.style.access", "true"),

    #Configs needed for Iceberg
    ("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"),
The commit is here:
```python
lakefs.commits.commit(repo.id, lakefs_branch, CommitCreation(
        message=f"Initial load table 'lakefs.{lakefs_branch}.{target_database}.{table}' for schemas '{schemaList}'",
        metadata={'author': "glue"}
    ))
```
where `lakefs` is the configured client and the repo is fetched from that client:
```python
lakefs = LakeFSClient(lakefs_config)
repo = lakefs.repositories.get_repository(lakefs_repo)
```
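For completeness, a minimal sketch of how these pieces can be wired together, assuming the config tuples above are collected in a list named `spark_configs` and that the legacy `lakefs_client` Python SDK is in use (names are illustrative):

```python
from pyspark.sql import SparkSession
from lakefs_client import Configuration
from lakefs_client.client import LakeFSClient
from lakefs_client.models import CommitCreation

# Apply the (key, value) config tuples shown above to the session builder.
builder = SparkSession.builder.appName("lakefs-iceberg-poc")  # hypothetical app name
for key, value in spark_configs:
    builder = builder.config(key, value)
spark = builder.getOrCreate()

# lakeFS API client used for the commit call above.
lakefs_config = Configuration(host=lakefs_endpoint)
lakefs_config.username = lakefs_access_key
lakefs_config.password = lakefs_secret_key
lakefs = LakeFSClient(lakefs_config)
repo = lakefs.repositories.get_repository(lakefs_repo)
```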
i:
Thanks. I see that you're running a lakeFS API call (`lakefs.commits.commit`), but the stack trace is Java / Spark / Iceberg. What's the connection between them?
m:
The command is executed in a PySpark job running in AWS Glue. I truncated the stack trace for readability, but I can post the whole thing if you want.
Oh, wait, never mind: it's occurring during the SQL that's creating the table, prior to the lakeFS commit.
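The trace (`BaseTransaction.commitCreateTransaction` via `StagedSparkTable`) does point at a staged CREATE TABLE. The actual SQL wasn't shared in the thread, but a hypothetical CTAS of that shape, with names taken from the paths in the trace, would look like:

```python
# Hypothetical reconstruction of the failing statement; "source_view"
# is a placeholder. Catalog/branch/schema/table names mirror the s3a
# path in the stack trace above.
spark.sql("""
    CREATE TABLE lakefs.main.rl_dev_datastage_01_ma_snapshot.sys_audit_event
    USING iceberg
    AS SELECT * FROM source_view
""")
```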
i:
So I think that this won't work:
`("spark.hadoop.fs.s3a.impl", "io.lakefs.routerfs.RouterFileSystem")`
You should try:
`("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")`
Check this doc for more info.
m:
So, I need the router filesystem to be able to access regular S3 paths outside of lakeFS. This was suggested previously on this channel here: https://lakefs.slack.com/archives/C016726JLJW/p1695999430843489?thread_ts=1695995997.176189&cid=C016726JLJW
Based on the stack trace, it appears to be correctly calling the `S3AFileSystem` after routing...
```
at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:1844) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
	at io.lakefs.routerfs.RouterFileSystem.rename(RouterFileSystem.java:197) ~[hadoop-router-fs-hadoop-2.9.2-assembly-0.1.0.jar:?]
	...
```
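For anyone reading along: the `routerfs.mapping.s3a.1.replace` / `.with` settings above amount to a prefix rewrite applied before the call is delegated to the filesystem registered for the target scheme. A toy Python illustration (not RouterFS code):

```python
# Toy model of the configured RouterFS mapping: s3a:// paths under the
# repo are rewritten to the lakefs:// scheme before delegation.
MAPPINGS = [("s3a://lakefs-poc", "lakefs://lakefs-poc")]

def route(path: str) -> str:
    for prefix, replacement in MAPPINGS:
        if path.startswith(prefix):
            return replacement + path[len(prefix):]
    return path  # unmatched paths fall through to the default s3a filesystem

print(route("s3a://lakefs-poc/main/some/table/metadata/v1.metadata.json"))
# -> lakefs://lakefs-poc/main/some/table/metadata/v1.metadata.json
```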
i:
Hmm, I'm not quite sure that the RouterFS and Iceberg integrations work together. Let me verify that and I'll get back to you by tomorrow.
m:
Thanks! I'll also try to poke around more
i:
YW 🙏
j:
Hi @Michael Gaebel, I think that the RouterFS usage is redundant in your case.
> So, I need the router filesystem to be able to access regular S3 paths outside of lakefs.
This is actually done using your other configuration:
```python
#LakeFS S3 access
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.endpoint", f"{lakefs_endpoint}"),
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.access.key", lakefs_access_key),
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.secret.key", lakefs_secret_key),
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.path.style.access", "true"),
The renaming of the scheme from `s3a` to `lakefs` is what's causing the problem:
```
Caused by: java.io.FileNotFoundException: No such file or directory: lakefs://lakefs-poc/main/rl_dev_datastage_01_ma_snapshot/sys_audit_event/metadata/25193118-7546-49de-b229-ef0f039bc2d9.metadata.json
```
Spark doesn't know what to do with it, because no FileSystem was configured to handle that scheme in the lakeFS catalog's context. That's fine: you don't need RouterFS here. Can you please change your configuration to:
```python
#General Spark configs
    ("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"),
    ("spark.sql.sources.partitionOverwriteMode", "dynamic"),
    ("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED"),
    #LakeFS configuration for Iceberg
    ("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1,io.lakefs:lakefs-iceberg:v0.1.3"),
    ("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog"),
    ("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog"),
    ("spark.sql.catalog.lakefs.warehouse", f"lakefs://{lakefs_repo}"),
    ("spark.sql.catalog.lakefs.uri", lakefs_endpoint),
    ("spark.sql.catalog.lakefs.cache-enabled", "false"),

    #LakeFS S3 access
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.endpoint", f"{lakefs_endpoint}"),
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.access.key", lakefs_access_key),
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.secret.key", lakefs_secret_key),
    (f"spark.hadoop.fs.s3a.bucket.{lakefs_repo}.path.style.access", "true"),

    #Regular S3 access
    ("spark.hadoop.fs.s3a.endpoint.region", "ca-central-1"),
    ("spark.hadoop.fs.s3a.endpoint", "<https://s3.ca-central-1.amazonaws.com>"),
    ("spark.hadoop.fs.s3a.path.style.access", "true"),

    #Configs needed for Iceberg
    ("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"),
and test again?
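A minimal re-test might look like the sketch below (table, schema, and branch names are placeholders, and the schema is assumed to exist):

```python
# Sketch of a smoke test: create a small Iceberg table through the
# lakeFS catalog, then commit the branch via the lakeFS API.
spark.sql("CREATE TABLE lakefs.main.smoke_test.tbl (id INT) USING iceberg")
spark.sql("INSERT INTO lakefs.main.smoke_test.tbl VALUES (1)")

lakefs.commits.commit(repo.id, "main", CommitCreation(
    message="smoke test: Iceberg table created via the lakeFS catalog",
    metadata={"author": "glue"},
))
```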
m:
That worked! I've got a different error, but it's likely on my side. Thanks! I assumed that to be able to use the other configuration I would need to route it, but what you said makes sense. Thanks again.
j:
Glad to hear that. If you have any other issues, do share.