# help
u
hi, can someone confirm if lakefs works with iceberg tables in hive as well? i came across this: https://lakefs.io/hudi-iceberg-and-delta-lake-data-lake-table-formats-compared/ which suggests that it should work seamlessly, but currently we are having a hard time getting hive + iceberg + lakefs to work with external tables.
u
```sql
create table my_schema.z_verun_test
using ICEBERG
location "lakefs://origin/..."
```
the CREATE TABLE doesn’t error out, but when issuing a SELECT against the table, nothing is returned. we’ve already double-checked our spark configuration and it seems to be ok, and the above works with ORC tables
u
fyi we are using spark sql for the above against a hive metastore, with data stored in lakefs (s3 is the backing store)
u
spark version is 3.1.1 if that matters
u
Hi, @Verun Rahimtoola. I am looking into this, and will update shortly
u
@Verun Rahimtoola I assume you followed this documentation in order to configure lakeFS with hive?
u
yeah we already have lakefs working with hive for ORC files
u
no issues there
u
recently we switched to iceberg and have started to see these issues
u
Did you try using a regular s3 bucket to create the table? Maybe try swapping the location for a bucket and see if it is successful
u
FWIW, while Shimi is taking a look, just wanted to make sure we communicate that lakeFS is format agnostic and should work with any open table format. 🙂
u
hmm I suppose I could try to go against the actual s3 location itself… will report back on what I find later today
u
Also, note that in the doc I linked above, the example uses
```
's3a://example/main/request_logs'
```
as the location. I am not sure a `lakefs://repo`-style location will work, as that scheme is mostly used by the CLI.
u
but it has worked before for ORC
u
we have already been using that “lakefs://origin/…” location URI for our hive tables in ORC format without issues, so we were hoping using it for iceberg tables (in parquet format) would also just work
u
I understand. The first thing I would try is creating the table against a regular bucket, to verify the table creation itself is successful
u
ok, will do - be back shortly…
u
Hi @Verun Rahimtoola! How is it going? Wanted to ask if you configured Spark to use lakeFS’ Hadoop file system? I’m asking because you mentioned that you are already using URIs of the form “lakefs://..”. Also, when you try to create the iceberg table, do you see any errors in the lakeFS logs? We will be happy to assist with troubleshooting.
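(For readers following this thread later: a rough sketch of the kind of Spark properties the lakeFS Hadoop file system setup involves. The endpoint, keys, and bucket below are placeholders, not the actual configuration discussed here.)

```shell
# Hypothetical spark-sql invocation wiring up the lakeFS Hadoop file system.
# All values (endpoint, keys, s3 settings) are placeholders.
spark-sql \
  --conf spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem \
  --conf spark.hadoop.fs.lakefs.endpoint=https://lakefs.example.com/api/v1 \
  --conf spark.hadoop.fs.lakefs.access.key=<lakefs-access-key> \
  --conf spark.hadoop.fs.lakefs.secret.key=<lakefs-secret-key> \
  --conf spark.hadoop.fs.s3a.access.key=<s3-access-key> \
  --conf spark.hadoop.fs.s3a.secret.key=<s3-secret-key>
```

With this in place, `lakefs://repo/branch/path` URIs are resolved by the lakeFS file system rather than treated as opaque strings.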
u
hi - sorry for the delay - but i think our data team has resolved these issues. thanks for your help!
u
Hi @Verun Rahimtoola! So happy to hear this was resolved! Would you mind sharing what the issue was and how it was resolved?
u
our data team had to set up the hadoop catalog properly and configure spark accordingly and then it started to work - it had nothing to do with lakefs as such
u
basically just some configuration tweaks for spark
u
(we use spark sql)
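(For anyone who lands on this thread with the same symptom: the thread does not include the actual settings, but a hedged sketch of the kind of Iceberg Hadoop catalog configuration the resolution likely involved looks like this. The catalog name `my_catalog` and the warehouse path are placeholders.)

```shell
# Hypothetical configuration registering an Iceberg Hadoop catalog in Spark.
# Catalog name and warehouse location are placeholders, not the poster's values.
spark-sql \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.my_catalog.type=hadoop \
  --conf spark.sql.catalog.my_catalog.warehouse=s3a://example-bucket/main/warehouse
```

Tables are then created and queried through the catalog, e.g. `CREATE TABLE my_catalog.my_schema.t (id bigint) USING iceberg;` — without a properly configured catalog, a CREATE TABLE can appear to succeed while SELECTs return nothing, which matches the symptom above.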