Anandkarthick Krishnakumar (09/06/2022, 4:44 AM)
Hello, I'm setting up lakeFS but I don't see any reference to creating the metastore table using lakectl. If you're using it for the first time, how would you create the table before enabling symlinks? Thanks.
Itai Admi (09/06/2022, 5:08 AM)
Hey @Anandkarthick Krishnakumar, welcome to lakeFS. lakectl provides several metastore commands that allow your metadata (managed by the metastore) to have the same 'versions' as your data (managed by lakeFS). To create your first table, you would use the metastore's own CREATE TABLE command (e.g. Hive), not a lakectl command. I think the lakeFS flow with a metastore is summed up nicely in this doc. Let me know if your use case isn't covered there.
Anandkarthick Krishnakumar (09/06/2022, 5:40 AM)
@Itai Admi - thanks! I guess I have a basic question: the document doesn't say anything about where or how to run it. Is there an example? I have the DDL.
Itai Admi (09/06/2022, 5:53 AM)
Do you have a metastore up and running? Hive/Glue/other? How do you access it?
Anandkarthick Krishnakumar (09/06/2022, 5:57 AM)
Metastore with Glue
Itai Admi (09/06/2022, 6:00 AM)
You have several options for creating the table. The AWS console UI is one option, the AWS CLI is another.
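As a rough sketch of the AWS CLI route, creating a Glue table whose location points at a lakeFS path might look like this (the database, table, column names, repo, and branch below are all illustrative placeholders, not from the thread):

```shell
# Create a Glue table whose Location points at a path served by lakeFS.
# All names (my_db, my_table, my-repo, main, the columns) are illustrative.
aws glue create-table \
  --database-name my_db \
  --table-input '{
    "Name": "my_table",
    "TableType": "EXTERNAL_TABLE",
    "StorageDescriptor": {
      "Columns": [{"Name": "id", "Type": "bigint"}],
      "Location": "s3://my-repo/main/path/to/data/",
      "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
      "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
      "SerdeInfo": {
        "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
      }
    }
  }'
```

This requires AWS credentials with Glue permissions; the exact StorageDescriptor fields depend on your file format.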
Anandkarthick Krishnakumar (09/06/2022, 6:05 AM)
Excellent. Now, I have lakeFS with its own storage namespace, and the document example shows the same, but the table location is at the S3 level and not at the lakeFS level - is that why we are creating symlinks? My question is: should I create the Glue table using the AWS CLI with the lakeFS storage namespace? Happy to get on a call if that helps.
Itai Admi (09/06/2022, 6:24 AM)
How do you plan on querying the data? Spark/Athena/other? The location of the table should point to lakeFS, e.g. s3a://my-repo/main/path/to/data. Your question depends on the application you choose to process the data. For Athena, you need symlinks. For Spark, you can simply change the s3a.endpoint.
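For the Spark route, pointing s3a at the lakeFS S3 gateway might look roughly like this (the endpoint URL, keys, and repo path are placeholders for your deployment, not values from the thread):

```shell
# Route Spark's S3A traffic through the lakeFS S3 gateway instead of AWS S3.
# The endpoint URL and credentials below are placeholders; use your lakeFS
# server address and a lakeFS access-key pair.
spark-shell \
  --conf spark.hadoop.fs.s3a.endpoint=https://lakefs.example.com \
  --conf spark.hadoop.fs.s3a.access.key=<lakefs-access-key-id> \
  --conf spark.hadoop.fs.s3a.secret.key=<lakefs-secret-key> \
  --conf spark.hadoop.fs.s3a.path.style.access=true

# Then read through lakeFS logical paths, e.g.:
#   spark.read.parquet("s3a://my-repo/main/path/to/data/")
```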
Anandkarthick Krishnakumar (09/06/2022, 6:42 AM)
Thanks, but I think I'm still a bit confused by "path" here. I'm working with Athena at the moment but will eventually move to Spark. lakeFS creates references to objects without touching the actual data, so with Athena (or even Spark), are we supposed to create the table with the "physical_address" (like s3://lakefs-XXX/) or with the path where our actual data resides?
Itai Admi (09/06/2022, 6:50 AM)
Cool - let me try to clarify things. lakeFS has an S3 gateway that enables many apps to interact with it as if it were S3. For example, Spark lets you override the s3.endpoint (and credentials) to point wherever you like; if you point it at lakeFS, you can read from it seamlessly. Unfortunately, Athena can only work with S3 directly. lakeFS stores the objects in its storage namespace inside an S3 bucket, but it uses UUID names for the objects. lakeFS metadata links the logical path to the actual location (e.g. lakefs://repo/branch/foo/bar to s3://storagenamespace/uuid1). For Athena to be able to query lakeFS data, it needs to get the paths from lakeFS. That's where the lakectl metastore create-symlinks command helps. I think the lakeFS Athena docs have a walkthrough example that clarifies what create-symlinks does behind the scenes.
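As a rough sketch of that command (the flag names are recalled from the lakectl docs and may differ between versions; the repo, branch, and table names are illustrative - check lakectl metastore create-symlink --help on your version):

```shell
# Generate SymlinkTextInputFormat metadata so Athena can resolve lakeFS
# logical paths to the physical objects in the storage namespace.
# All names below are illustrative placeholders.
lakectl metastore create-symlink \
  --repo my-repo \
  --branch main \
  --path path/to/data \
  --from-schema my_db --from-table my_table \
  --to-schema my_db --to-table my_table_athena
```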
Anandkarthick Krishnakumar (09/06/2022, 7:14 AM)
Let me try these out. Meanwhile, branch revert is showing this error:
must specify 1-based parent number for reverting merge commit
I'm following the documentation here.
Itai Admi (09/06/2022, 7:17 AM)
Are you trying to revert a merge commit? I think you need to provide the parent number to revert:
-m, --parent-number int   the parent number (starting from 1) of the mainline. The revert will reverse the change relative to the specified parent.
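So reverting a merge commit relative to its first parent might look like this (the repo URI and commit ref are placeholders):

```shell
# Revert a merge commit on main, treating parent #1 as the mainline.
# lakefs://my-repo/main and <merge-commit-id> are placeholders.
lakectl branch revert lakefs://my-repo/main <merge-commit-id> -m 1
```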
Anandkarthick Krishnakumar (09/06/2022, 7:19 AM)
Of course - this is fantastic! Thanks.