Sid Senthilnathan
04/28/2022, 1:11 PM
--conf "spark.sql.catalog.feature=org.apache.iceberg.spark.SparkCatalog" \
--conf "spark.sql.catalog.feature.type=hadoop" \
--conf "spark.sql.catalog.feature.warehouse=lakefs://origin/" \
We expect that this will create a metastore with its root at lakefs://origin/, and that when we create feature branches we will be able to reference the Iceberg tables within them as feature.branch_name.schema_name.table_name (`feature./branch_name.schema_name.table_name` in Spark syntax). Indeed, `SHOW tables in feature./sid-test.common` returns the tables we want, but querying a table with `SELECT * FROM feature./sid-test.common.paige_dimension_ai_module LIMIT 10` returns a "table or view not found" error.
This all appears to work fine when we set the warehouse path to a location on S3. Any ideas what could be going on? cc @Oz Katz @Sander Hartlage
Ariel Shaqed (Scolnicov)
04/28/2022, 1:25 PM
Sander Hartlage
04/28/2022, 1:28 PM
Sander Hartlage
04/28/2022, 1:29 PM
Ariel Shaqed (Scolnicov)
04/28/2022, 1:41 PM
`s3a://...` URLs), or using LakeFSFileSystem (and `lakefs://...` URLs)?
Sander Hartlage
04/28/2022, 1:47 PM
`s3a` but are currently working to get the lakefs filesystem configured
Sander Hartlage
04/28/2022, 1:47 PM
`s3a`, credentials are in the core-site.xml)
Sander Hartlage
04/28/2022, 1:47 PM
spark.sql.catalog.feature: org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.feature.type: hadoop
spark.sql.catalog.feature.warehouse: "lakefs://{{ lakefs_repo_name }}/"
spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
Sander Hartlage
04/28/2022, 1:49 PM
Sander Hartlage
04/28/2022, 1:49 PM
spark.sql.catalog.feature: org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.feature.type: hadoop
spark.sql.catalog.feature.warehouse: "lakefs://{{ lakefs_repo_name }}/"
spark.sql.catalog.feature.io-impl: org.apache.iceberg.hadoop.HadoopFileIO
spark.sql.catalog.feature.hadoop.fs.s3a.endpoint: "https://s3.{{ lakefs_datalake_s3_region }}.amazonaws.com/"
spark.sql.catalog.feature.hadoop.fs.s3a.aws.credentials.provider: com.amazonaws.auth.InstanceProfileCredentialsProvider
spark.sql.catalog.feature.hadoop.fs.lakefs.impl: io.lakefs.LakeFSFileSystem
spark.sql.catalog.feature.hadoop.fs.lakefs.access.key: "{{ emr_lakefs_access_key }}"
spark.sql.catalog.feature.hadoop.fs.lakefs.secret.key: "{{ emr_lakefs_secret_access_key }}"
spark.sql.catalog.feature.hadoop.fs.lakefs.endpoint: "https://{{ lakefs_api_domain_name }}"
spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
Sander Hartlage
04/28/2022, 1:50 PM
`--conf` options when we start the process)
Ariel Shaqed (Scolnicov)
04/28/2022, 2:38 PM
Sander Hartlage
04/28/2022, 2:39 PM
`fs.lakefs.impl` in the core-site to use the s3a filesystem
Sander Hartlage
04/28/2022, 2:39 PM
`lakefs://` urls are essentially `s3a://`
Ariel Shaqed (Scolnicov)
04/28/2022, 3:06 PM
Ariel Shaqed (Scolnicov)
04/28/2022, 3:10 PM
```
spark.sql.catalog.feature: org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.feature.type: hadoop
spark.sql.catalog.feature.warehouse: "lakefs://{{ lakefs_repo_name }}/"
spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
```
IIUC. From where do you get the name of the branch to use on your lakeFS repo?
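For reference, the catalog properties quoted above can be turned into the `--conf` options used in the first message of the thread. The following is only a sketch: the helper name `to_conf_args` is hypothetical, and the repository name `origin` is assumed from the `lakefs://origin/` warehouse path mentioned earlier rather than taken from the templated config.

```python
import shlex

# Catalog properties as quoted in the thread. The warehouse value assumes the
# repository is named "origin" (an assumption standing in for the
# {{ lakefs_repo_name }} template variable).
catalog_conf = {
    "spark.sql.catalog.feature": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.feature.type": "hadoop",
    "spark.sql.catalog.feature.warehouse": "lakefs://origin/",
    "spark.sql.extensions": (
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
    ),
}


def to_conf_args(conf):
    """Flatten a property dict into a list of `--conf key=value` arguments.

    Hypothetical helper, not part of Spark or lakeFS.
    """
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Print the arguments as a single shell-quoted string, suitable for appending
# to a spark-sql or spark-submit command line.
print(shlex.join(to_conf_args(catalog_conf)))
```

Keeping the properties in one dict makes it easier to compare the S3-backed and lakeFS-backed setups, since only the warehouse value and the filesystem settings should differ between them.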
Sander Hartlage
04/28/2022, 3:13 PM
`feature1` branch in lakefs, we'd like to address the schemas and tables in that branch as `feature.feature1.schema.table`, which should correspond to directories under lakefs://origin/feature1/
Sander Hartlage
04/28/2022, 3:13 PM
`master` branch in lakefs, and the `feature` catalog should expose all the branches
Sander Hartlage
04/28/2022, 3:14 PM
`SELECT * FROM schema.table` should read from lakefs://origin/master/schema/table and `SELECT * FROM feature.feature1.schema.table` should read from lakefs://origin/feature1/schema/table
Sander Hartlage
04/28/2022, 3:15 PM
`master` branch is what our BI tools are pointed at, and the feature branches allow isolated development and access to historical snapshots
Ariel Shaqed (Scolnicov)
04/28/2022, 3:48 PM
Sander Hartlage
04/28/2022, 3:52 PM
Sander Hartlage
04/28/2022, 3:57 PM
Ariel Shaqed (Scolnicov)
04/28/2022, 4:00 PM
Sander Hartlage
04/29/2022, 8:32 PM
`lakefs` hadoop filesystem, which is something that's been on our backlog for a while. thanks for all your help!
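
The addressing scheme described in the thread (plain `schema.table` reads from the `master` branch, `feature.<branch>.schema.table` reads from the named branch) can be written down as a small path-mapping sketch. This only illustrates the mapping the participants said they wanted. It is not how Iceberg or lakeFS resolve identifiers internally, and the function name `expected_location` is hypothetical. It also assumes branch, schema, and table names contain no dots.

```python
WAREHOUSE = "lakefs://origin"  # repository root, as given in the thread
DEFAULT_BRANCH = "master"      # branch the default catalog points at
BRANCH_CATALOG = "feature"     # catalog that should expose all branches


def expected_location(identifier: str) -> str:
    """Return the lakeFS path a table identifier is expected to read from.

    Hypothetical helper that encodes the mapping described in the thread.
    Splitting on "." is naive and assumes no name contains a dot.
    """
    parts = identifier.split(".")
    if len(parts) == 2:
        # schema.table -> default catalog -> master branch
        branch = DEFAULT_BRANCH
        schema, table = parts
    elif len(parts) == 4 and parts[0] == BRANCH_CATALOG:
        # feature.<branch>.<schema>.<table> -> named branch
        _, branch, schema, table = parts
    else:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    return f"{WAREHOUSE}/{branch}/{schema}/{table}"


print(expected_location("schema.table"))
print(expected_location("feature.feature1.schema.table"))
```

In this layout the first namespace level under the `feature` catalog doubles as the lakeFS branch name, which is why the warehouse is rooted at the repository rather than at a single branch.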