Michael Gaebel
10/05/2023, 2:17 PM
I've set blockstore.s3.region and gateways.s3.region as ca-central-1, and the underlying bucket is in the same region, but I'm getting this error when it tries to open a file from the translated path:
org.apache.hadoop.fs.s3a.AWSRedirectException: getFileStatus on lakefs://lakefs-poc/main/rl_dev_datastage_01_ma_snapshot/sys_audit_event/metadata/version-hint.text: com.amazonaws.services.s3.model.AmazonS3Exception: The bucket is in this region: us-east-1. Please use this region to retry the request
Have I perhaps missed a region configuration for the repo on creation? Or is this maybe related to the use of RouterFS? More of the stack in a reply:
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:242) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:175) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3796) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5401) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1465) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1441) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at io.lakefs.routerfs.RouterFileSystem.open(RouterFileSystem.java:125) ~[hadoop-router-fs-0.1.0.jar:?]
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:983) ~[hadoop-client-api-3.3.3-amzn-0.jar:?]
at org.apache.iceberg.hadoop.HadoopTableOperations.findVersion(HadoopTableOperations.java:318) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:104) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:84) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:43) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:587) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:142) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:99) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.spark.sql.connector.catalog.TableCatalog.tableExists(TableCatalog.java:156) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
2023-10-04 20:58:41,059 INFO [Thread-10] routerfs.RouterFileSystem (RouterFileSystem.java:logTranslatedPaths(348)): getFileStatus: path s3a://lakefs-poc/main/rl_dev_datastage_01_ma_snapshot/sys_audit_event/metadata converted to lakefs://lakefs-poc/main/rl_dev_datastage_01_ma_snapshot/sys_audit_event/metadata
Elad Lachmi
10/05/2023, 2:34 PM
Can you share what your fs.s3a.* configuration looks like?

Michael Gaebel
10/05/2023, 2:44 PM
("spark.hadoop.fs.s3a.impl", "io.lakefs.routerfs.RouterFileSystem"),
#LakeFS S3 access
("spark.hadoop.fs.s3a.bucket.repo.endpoint", "https://s3.ca-central-1.amazonaws.com"),
("spark.hadoop.fs.s3a.bucket.repo.endpoint.region", "ca-central-1"),
("spark.hadoop.fs.s3a.bucket.repo.access.key", lakefs_access_key),
("spark.hadoop.fs.s3a.bucket.repo.secret.key", lakefs_secret_key),
#Regular S3 access
("spark.hadoop.fs.s3a.endpoint.region", "ca-central-1"),
("spark.hadoop.fs.s3a.endpoint", "https://s3.ca-central-1.amazonaws.com"),
and the mapping for RouterFS
("spark.hadoop.routerfs.mapping.s3a.1.replace", f"s3a://{lakefs_repo}"),
("spark.hadoop.routerfs.mapping.s3a.1.with", f"lakefs://{lakefs_repo}"),
("spark.hadoop.routerfs.default.fs.s3a", "org.apache.hadoop.fs.s3a.S3AFileSystem"),
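For readers following the thread: the mapping above is a prefix rewrite — RouterFS swaps the `replace` prefix for the `with` prefix before dispatching to a filesystem. A minimal illustrative sketch of that translation (a hypothetical helper, not the actual io.lakefs.routerfs code):

```python
def translate(path: str, repo: str = "lakefs-poc") -> str:
    """Sketch of the prefix rewrite configured by routerfs.mapping.s3a.1.*"""
    src = f"s3a://{repo}"     # routerfs.mapping.s3a.1.replace
    dst = f"lakefs://{repo}"  # routerfs.mapping.s3a.1.with
    if path.startswith(src):
        return dst + path[len(src):]
    # Non-matching paths fall through to the default s3a filesystem
    return path

translate("s3a://lakefs-poc/main/sys_audit_event/metadata/version-hint.text")
# → "lakefs://lakefs-poc/main/sys_audit_event/metadata/version-hint.text"
```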
Elad Lachmi
10/05/2023, 2:46 PM
I believe this endpoint needs to point to your lakeFS server:
("spark.hadoop.fs.s3a.bucket.repo.endpoint", "https://s3.ca-central-1.amazonaws.com"),
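The suggestion here is that the per-bucket endpoint should target the lakeFS S3 gateway rather than AWS S3, so requests against the repo bucket are served by lakeFS with lakeFS credentials. A hedged sketch of what that override could look like — the endpoint URL and placeholder keys are assumptions, to be replaced with your own deployment's values:

```python
# Hypothetical lakeFS server URL; substitute your actual deployment's endpoint.
LAKEFS_ENDPOINT = "https://lakefs.example.com"

# Per-bucket s3a overrides route requests for the "repo" bucket to the lakeFS
# S3 gateway instead of AWS S3; the credentials are lakeFS access keys.
lakefs_s3a_conf = [
    ("spark.hadoop.fs.s3a.bucket.repo.endpoint", LAKEFS_ENDPOINT),
    ("spark.hadoop.fs.s3a.bucket.repo.access.key", "<lakefs_access_key>"),
    ("spark.hadoop.fs.s3a.bucket.repo.secret.key", "<lakefs_secret_key>"),
]
```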
Michael Gaebel
10/05/2023, 2:46 PM
2023-10-05 15:01:44,155 WARN [Thread-10] hadoop.HadoopTableOperations (HadoopTableOperations.java:findVersion(347)): Error trying to recover version-hint.txt data for s3a://lakefs-poc/main/rl_dev_datastage_01_ma_snapshot/sys_audit_event/metadata/version-hint.text
org.apache.hadoop.fs.s3a.AWSRedirectException: getFileStatus on lakefs://lakefs-poc/main/rl_dev_datastage_01_ma_snapshot/sys_audit_event/metadata/version-hint.text: com.amazonaws.services.s3.model.AmazonS3Exception: The bucket is in this region: us-east-1. Please use this region to retry the request (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: 95WM8SYPVNYEC2RK; S3 Extended Request ID: V1NPPfA5JD5ZbhdX6nBF4+OPSq4plxV6ep1uXbSHpwu53rleiIKZIeA/zG+ka4rv31Aj6ExbpMM=; Proxy: null), S3 Extended Request ID: V1NPPfA5JD5ZbhdX6nBF4+OPSq4plxV6ep1uXbSHpwu53rleiIKZIeA/zG+ka4rv31Aj6ExbpMM=:301 Moved Permanently: The bucket is in this region: us-east-1. Please use this region to retry the request (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: 95WM8SYPVNYEC2RK; S3 Extended Request ID: V1NPPfA5JD5ZbhdX6nBF4+OPSq4plxV6ep1uXbSHpwu53rleiIKZIeA/zG+ka4rv31Aj6ExbpMM=; Proxy: null)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:242) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:175) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3796) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5401) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1465) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1441) ~[hadoop-aws-3.3.3-amzn-0.jar:?]
at io.lakefs.routerfs.RouterFileSystem.open(RouterFileSystem.java:125) ~[hadoop-router-fs-0.1.0.jar:?]
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:983) ~[hadoop-client-api-3.3.3-amzn-0.jar:?]
at org.apache.iceberg.hadoop.HadoopTableOperations.findVersion(HadoopTableOperations.java:318) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:104) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:84) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:43) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:587) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:142) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:99) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
at org.apache.spark.sql.connector.catalog.TableCatalog.tableExists(TableCatalog.java:156) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
I double checked and ran it a couple of times to be sure.

Elad Lachmi
10/05/2023, 3:24 PM
("spark.hadoop.fs.s3a.bucket.repo.endpoint.region", "ca-central-1"),
Can you please remove this and try again?
Also, can you please show the full command you're running?

Michael Gaebel
10/05/2023, 3:38 PM
org.apache.iceberg.exceptions.RuntimeIOException: Failed to refresh the table
at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:126)
at org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:84)
at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:43)
at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:587)
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:142)
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:99)
at org.apache.spark.sql.connector.catalog.TableCatalog.tableExists(TableCatalog.java:156)
and what we're executing is lake_spark.read.table(db_tbl_name), where lake_spark is the Spark session.
#LakeFS S3 access
("spark.hadoop.fs.s3a.bucket.repo.endpoint", lakefs_endpoint),
("spark.hadoop.fs.s3a.bucket.repo.access.key", lakefs_access_key),
("spark.hadoop.fs.s3a.bucket.repo.secret.key", lakefs_secret_key),
Elad Lachmi
10/05/2023, 3:40 PM

Michael Gaebel
10/05/2023, 3:42 PM

Elad Lachmi
10/05/2023, 3:43 PM

Michael Gaebel
10/06/2023, 4:56 PM
ca-central-1. Any luck on your side?

Elad Lachmi
10/06/2023, 5:00 PM

Michael Gaebel
10/06/2023, 5:17 PM
io.lakefs.iceberg.LakeFSCatalog, but I'm not sure if it's meant to be used with configs like this:
("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog"),
("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog"),
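The two settings above register a Spark catalog named `lakefs` backed by the lakeFS catalog implementation; a catalog typically also needs a warehouse location. A hedged sketch of a fuller configuration — the `warehouse` URI and `cache-enabled` entries are assumptions to verify against the lakeFS Iceberg integration docs, and `lakefs-poc` is the repo name from this thread:

```python
# Sketch of a lakeFS Iceberg catalog configuration for Spark. The warehouse
# URI and cache-enabled entries are assumptions, not confirmed in this thread.
iceberg_catalog_conf = [
    ("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog"),
    ("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog"),
    ("spark.sql.catalog.lakefs.warehouse", "lakefs://lakefs-poc"),
    ("spark.sql.catalog.lakefs.cache-enabled", "false"),
]
```

With such a catalog registered, tables would be addressed as `lakefs.<branch>.<namespace>.<table>` rather than through s3a path translation.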
Elad Lachmi
10/06/2023, 5:19 PM

Michael Gaebel
10/06/2023, 5:23 PM

Elad Lachmi
10/06/2023, 5:25 PM