# help
u
Hello all. I started experimenting with lakeFS and have not found a definitive answer for the Athena integration use case. I would like to know if it's possible to continue using Athena as is on top of lakeFS locations, i.e., CREATE TABLE, ADD PARTITION. Based on the documentation here, the first paragraph sounds like it's not possible. Yet it then goes on to describe how to update the metastore, assuming there is an existing table pointing to a lakeFS location. How does one create the table in the first place? Any success stories with Athena? Any gotchas? Our shop uses Athena heavily, but we are open to running our own PrestoDB clusters if it's justified.
u
Hi Alex! I agree that the documentation is somewhat confusing; we’ll make sure to update it soon. If you created a table using Glue (which can point to lakeFS as the underlying object store), you can use the create-symlink command to create symlinks in S3 and update Glue to point to those symlinks. Then you can query the table with Athena.
u
Thanks @Itai Admi. I am still unclear on how the table is created initially. Can it be done in Athena? Specifically, can I do the following? (I guess I can try when I get back to my office.)
1. Create the table in Athena with location s3://my-lakefs-repo/main/mytable/
2. Create the symlink using the lakectl metastore command
u
The create-symlink command creates symlinks for tables already stored in lakeFS. Since tables created in Athena cannot be stored in lakeFS (due to Athena's inflexibility in setting the S3 endpoint), you can't perform the steps you suggested. The command is intended for tables that were created by either Glue or Hive and are then queried by Athena. Let us discuss a solution internally that would enable something similar for Athena-created tables, and we'll get back to you.
u
Sounds good. Thanks