# help
a
Hello - I'm setting up lakeFS but I don't see any reference to creating the metastore table using lakectl. If you're using it for the first time, how would you create the table before enabling symlink? Thanks
i
Hey @Anandkarthick Krishnakumar, welcome to lakeFS. lakectl provides several metastore commands that let your metadata (managed by the metastore) have the same 'versions' as your data (managed by lakeFS). To create your first table, you would have to use the metastore's CREATE TABLE commands (e.g. Hive), not a lakectl command. I think the lakeFS flow with a metastore is summed up nicely in this doc. Let me know if your use case isn't covered there.
a
@Itai Admi - thanks! I guess I have a basic question: the document doesn't say anything about where or how to run it. Is there an example? I have the DDL.
i
Do you have metastore up and running? Hive/Glue/other? How do you access it?
a
Metastore with Glue
i
I guess you have many options to create the table. The UI is one option, the AWS CLI is another.
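A third option, if scripting is preferred, is the Glue API. The sketch below is only an illustration under assumptions: it assumes boto3 is installed and AWS credentials are configured, that the data is Parquet, and the database, table, column, and location values are hypothetical placeholders rather than anything from this thread.

```python
from typing import Any, Dict, List, Tuple


def build_table_input(name: str, location: str,
                      columns: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Build the TableInput structure passed to Glue's CreateTable call.

    Assumption: the table holds Parquet data, so the Hive Parquet
    input/output formats and SerDe are used.
    """
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }


def create_table(database: str, table_input: Dict[str, Any]) -> None:
    """Create the table in Glue. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    glue = boto3.client("glue")
    glue.create_table(DatabaseName=database, TableInput=table_input)


if __name__ == "__main__":
    # Hypothetical names; replace with your own database, table, and path.
    table = build_table_input("events", "s3://example-bucket/path/to/data",
                              [("id", "bigint"), ("payload", "string")])
    print(table["StorageDescriptor"]["Location"])
    # create_table("default", table)  # uncomment to call Glue for real
```

Keeping the dictionary builder separate from the API call makes it easy to inspect the table definition before anything is created.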
a
Excellent. Now, I have lakeFS with its own storage namespace, and the document example shows the same, but the table location is at the S3 level and not at the lakeFS level. Is that why we are creating the symlink? My question is: should I create the Glue table using the AWS CLI with the lakeFS storage namespace?
Happy to get on a call if that helps.
i
How do you plan on querying the data? Spark/Athena/other? The location of the table should point to lakeFS, e.g. s3a://my-repo/main/path/to/data. The answer to your question depends on the application you choose to process the data. For Athena, you need symlinks. For Spark, you can simply change the s3a.endpoint.
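For the Spark case, a minimal sketch of the endpoint override might look like this. It assumes PySpark with the hadoop-aws (S3A) connector available; the endpoint URL and keys are hypothetical placeholders. The `fs.s3a.*` names are the standard Hadoop S3A properties, passed through Spark with the `spark.hadoop.` prefix.

```python
from typing import Dict


def lakefs_s3a_conf(endpoint: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Spark settings that send s3a:// traffic to a lakeFS S3 gateway.

    All three arguments are placeholders supplied by the caller.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style access keeps the repository name in the URL path
        # instead of the hostname.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def build_session(conf: Dict[str, str]):
    """Create a SparkSession with the given settings. Needs pyspark."""
    from pyspark.sql import SparkSession  # lazy import: optional dependency

    builder = SparkSession.builder.appName("lakefs-example")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Hypothetical endpoint and credentials.
    conf = lakefs_s3a_conf("https://lakefs.example.com", "ACCESS_KEY", "SECRET_KEY")
    for key in sorted(conf):
        print(key)
    # spark = build_session(conf)
    # df = spark.read.parquet("s3a://my-repo/main/path/to/data")
```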
a
Thanks, but I think I'm still a bit confused about "path" here. I'm working with Athena at the moment but will eventually move to Spark. lakeFS creates references to objects without touching the actual data, so with Athena (or even Spark), are we supposed to create the table with the "physical_address" (like s3://lakefs-XXX/) or with the path where our actual data resides?
i
Cool - let me try to clarify things. lakeFS has an S3 gateway that enables many apps to interact with it as if it were S3. For example, Spark lets you override the s3.endpoint (and credentials) to point wherever you like. If you point it to lakeFS, you can read from it seamlessly. Unfortunately, Athena can only work with S3 directly. lakeFS stores the objects in its storage namespace inside an S3 bucket, but it uses UUID names for the objects. lakeFS metadata links the logical path to the actual location (e.g. lakefs://repo/branch/foo/bar to s3://storagenamespace/uuid1). For Athena to be able to query lakeFS data, it needs to get the paths from lakeFS. That's where the lakectl metastore create-symlinks command helps.
I think the lakeFS Athena docs have a walkthrough example that clarifies what the create-symlinks command does behind the scenes.
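To make the logical-to-physical idea concrete, here is a toy sketch of the symlink concept. It is not the lakectl implementation: the mapping is hard-coded, the `symlink.txt` file name and the one-file-per-directory layout are assumptions for illustration, and nothing is read from lakeFS or written to S3.

```python
import posixpath
from collections import defaultdict
from typing import Dict


def build_symlink_files(logical_to_physical: Dict[str, str]) -> Dict[str, str]:
    """Group physical addresses by the logical directory they belong to.

    Returns a mapping of "<logical dir>/symlink.txt" to the file content:
    one physical address per line, which is the kind of listing a
    symlink-style table location would point at.
    """
    by_dir = defaultdict(list)
    for logical in sorted(logical_to_physical):
        by_dir[posixpath.dirname(logical)].append(logical_to_physical[logical])
    return {
        posixpath.join(directory, "symlink.txt"): "\n".join(paths) + "\n"
        for directory, paths in by_dir.items()
    }


if __name__ == "__main__":
    # Hypothetical objects: logical path in a branch -> address in the
    # storage namespace.
    mapping = {
        "foo/dt=1/a.parquet": "s3://storagenamespace/uuid1",
        "foo/dt=1/b.parquet": "s3://storagenamespace/uuid2",
        "foo/dt=2/c.parquet": "s3://storagenamespace/uuid3",
    }
    for name, content in build_symlink_files(mapping).items():
        print(name)
        print(content, end="")
```

The point of the sketch is only that Athena ends up reading a list of real S3 addresses, while the logical paths stay in lakeFS.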
a
Let me try these out. Meanwhile, branch revert is showing this error:
must specify 1-based parent number for reverting merge commit
I'm following the documentation here.
i
Are you trying to revert a merge commit? I think you need to provide the parent number to revert:
-m, --parent-number int   the parent number (starting from 1) of the mainline. The revert will reverse the change relative to the specified parent.
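As a rough sketch of how the flag fits into a call, the helper below assembles the command line for a scripted revert. The `--parent-number` flag comes from the help text above; the positional order (branch URI, then the commit to revert) and the example repository, branch, and commit values are assumptions, so check `lakectl branch revert --help` before relying on it.

```python
from typing import List, Optional


def revert_command(branch_uri: str, commit_ref: str,
                   parent_number: Optional[int] = None) -> List[str]:
    """Assemble the argument list for a lakectl branch revert call.

    parent_number is only needed when commit_ref is a merge commit; it is
    1-based, matching the error message in the thread.
    """
    cmd = ["lakectl", "branch", "revert", branch_uri, commit_ref]
    if parent_number is not None:
        if parent_number < 1:
            raise ValueError("parent number is 1-based")
        cmd += ["--parent-number", str(parent_number)]
    return cmd


if __name__ == "__main__":
    # Hypothetical repository, branch, and commit reference.
    print(" ".join(revert_command("lakefs://my-repo/main", "abc123", parent_number=1)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where lakectl is installed and configured.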
a
Of course, this is fantastic! Thanks.