# dev
j
Hello everyone! I have 2 questions: In the open source version, is there a way to manage multiple workspaces or user access to repos/branches? And is there a configuration to tune lakeFS so it can query Delta tables, or create a table using an external location (a pointer to the actual Delta files), the way it can with Parquet?
t
Hi @Jacobo Calderon!
> In the open source version, is there a way to manage multiple workspaces or user access to repos/branches?
There is no built-in support for access control in the open source version; you have the option of building and maintaining an ACL server. As for working with Delta tables over lakeFS - absolutely. You have two options:
1. Importing tables into lakeFS (without copying table data)
2. Writing Delta tables directly to lakeFS
You may want to check out our Delta Lake integration docs.
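For concreteness, here is a minimal PySpark sketch of option 2: writing a Delta table straight to a lakeFS branch through the lakeFS S3 gateway. The endpoint, credentials, repository ("example-repo"), branch ("main"), and paths are hypothetical placeholders, and it assumes a delta-spark package matching your Spark version:
from pyspark.sql import SparkSession

# Hedged sketch - endpoint, credentials, repo and branch names are placeholders.
spark = (
    SparkSession.builder
    # Delta Lake package and catalog extensions (the extensions enable SQL DDL):
    .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    # Point the S3A filesystem at the lakeFS S3 gateway:
    .config("spark.hadoop.fs.s3a.endpoint", "https://lakefs.example.com")
    .config("spark.hadoop.fs.s3a.access.key", "<lakeFS access key>")
    .config("spark.hadoop.fs.s3a.secret.key", "<lakeFS secret key>")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# lakeFS paths follow s3a://<repo>/<branch>/<path>. Writing a Delta table to a
# branch versions the data files and the _delta_log together.
df = spark.read.parquet("s3a://example-repo/main/raw/events")
df.write.format("delta").mode("overwrite").save("s3a://example-repo/main/tables/events")

# Read it back from the same branch:
events = spark.read.format("delta").load("s3a://example-repo/main/tables/events")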
j
Hi Tal! Thank you so much for your answers! The first one is very insightful! For the second one I have the following issue (or probably this is what you meant): I'm only able to query each partition as-is. Is there a way to create a table out of all the partitions? For example, in Databricks you can create an external table as follows:
CREATE OR REPLACE TABLE [table_name]
USING delta
LOCATION '/path/to/delta_log'
Can we do something similar to point at the Parquet files and create a logical table around them? So that when I query:
select * from [repo].[branch].[table]
I actually query across all partitions (current and incremental)?
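For illustration, a minimal sketch of that pattern over lakeFS, assuming the Spark session, catalog extensions, and hypothetical names from the sketch above, plus a metastore for Spark to register the table in - the registered table then acts as a single logical entry point over all partitions:
# Register one logical table over the Delta files stored in lakeFS.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events
    USING DELTA
    LOCATION 's3a://example-repo/main/tables/events'
""")

# Queries now span every partition under that location, current and future.
spark.sql("SELECT * FROM events LIMIT 10").show()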
t
IIUC you are asking if you can create an external table in the sense that the Delta table data and metadata sit outside of lakeFS? Can you help me understand the use case?
And you think of lakeFS as a catalog? Please correct me if I didn't understand you correctly
j
The external table was just a Databricks Unity Catalog example. In this scenario, I want to consolidate partitioned tables into a single entry point, so downstream processes can query the data with SQL syntax and Spark can load the data as a table instead of as a filesystem. For the second question, yes, I'm trying to use lakeFS as a catalog. Is that a wrong understanding of the platform?
t
Thanks for clarifying!
> For the second question, yes, I'm trying to use lakeFS as a catalog. Is that a wrong understanding of the platform?
lakeFS isn't a catalog, it's a data version control system that manages any type of data (including structured data). We do have an Iceberg REST catalog, but it's for Iceberg. For Delta Lake the case is different - it does not require a catalog, and you can use lakeFS to manage your Delta Lake tables (data + metadata), as you can see in our Delta Lake docs. Hope this helps make things clearer 🙂
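To illustrate the version-control (rather than catalog) angle, a minimal sketch with the same hypothetical setup as above - the same table path read from two branches yields two isolated versions:
# Same table path, two branches of the hypothetical repo:
main_events = spark.read.format("delta").load("s3a://example-repo/main/tables/events")
dev_events = spark.read.format("delta").load("s3a://example-repo/dev/tables/events")

# Writes to the dev branch stay isolated from main until the branch is merged.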
j
Interesting... I was hoping to use it as part of the metadata catalog and access control to data. Thanks for clarifying!
t
What metadata catalog are you using if I may ask?
j
Right now, Athena + Glue, which we are trying to replace with a more robust, integrated solution
t
Got it - you can use lakeFS with Glue and Athena to read versioned tables managed by lakeFS. You may want to check this page out
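As a rough illustration of that setup, a minimal boto3 sketch, assuming a Glue database and table have already been created over data exported from lakeFS as described on that page; the database, table, region, and results bucket below are all hypothetical:
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Query a hypothetical Glue table assumed to point at data exported from lakeFS.
resp = athena.start_query_execution(
    QueryString="SELECT * FROM events LIMIT 10",
    QueryExecutionContext={"Database": "lakefs_demo"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])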