# help
@Iddo Avneri is there a tutorial on how to use Spark tables with lakeFS? I was wondering what's the right approach. Should one create a clone of a database but point it to a different branch?
Are you on Databricks?
hey @Edmondo Porcu ! mind sharing some context on the use-case?
We have our Spark integration guide, which contains a high-level introduction on how to use lakeFS with Spark; you can also read Similarweb's case study. Looking forward to hearing more about your use case.
This uses the S3 API directly, while people often use the Hive metastore and interact with data through tables.
In practice, if you want to restore a previous version of a database, you need to change the location the database points to.
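To make the "change the location" idea concrete, here's a minimal sketch that builds the Hive/Spark SQL DDL to repoint a table at data under a specific lakeFS branch. It relies only on the documented lakeFS path layout (`s3a://<repo>/<branch>/<key>`); the table and repo names are hypothetical.

```python
def repoint_table_ddl(table: str, repo: str, branch: str, key: str) -> str:
    """Build an ALTER TABLE statement pointing `table` at the data stored
    under a lakeFS branch (lakeFS exposes objects as s3a://<repo>/<branch>/<key>)."""
    location = f"s3a://{repo}/{branch}/{key}"
    return f"ALTER TABLE {table} SET LOCATION '{location}'"

# Hypothetical usage with a SparkSession:
#   spark.sql(repoint_table_ddl("sales", "my-repo", "experiment", "tables/sales"))
print(repoint_table_ddl("sales", "my-repo", "experiment", "tables/sales"))
```

Reverting the table to a previous version would then just mean generating the same statement with a different branch (or a branch reset to an older commit).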
You can use lakeFS to revert a branch to a specific commit or tag, so you won't need to change the location.
You can see it in the first example here: https://docs.lakefs.io/usecases/production.html
@Edmondo Porcu You can use lakeFS with Hive Metastore. Instructions on how to do so are here: https://docs.lakefs.io/integrations/hive.html
🙏 1
hi @Edmondo Porcu I wonder, did you manage to solve your issue?
nope 😞
I don't understand how you switch the data underneath a table transparently, but maybe that's the wrong thing to do. However, that's how people on Spark are used to dealing with tables.
I'm not sure I understand, are you referring to lakeFS in general? Mind clarifying?
So, if you have a pipeline that works with "file API" Spark, you have a prefix that points to the current branch. You create a new branch, you change the prefix, great. But what about when you work with the tables API?
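The "file API" workflow described above can be sketched as a one-line prefix change, since lakeFS encodes the branch as the first path element under the repository. The repo/path names below are hypothetical:

```python
def lakefs_prefix(repo: str, branch: str, path: str = "") -> str:
    """lakeFS exposes branches as the first path element under the repo,
    so switching branches for file-API Spark jobs is just a prefix change."""
    return f"s3a://{repo}/{branch}/{path}"

main_path = lakefs_prefix("my-repo", "main", "events/")         # current branch
branch_path = lakefs_prefix("my-repo", "new-feature", "events/")  # after branching
# Hypothetical usage: spark.read.parquet(branch_path)
print(main_path, branch_path)
```

With the tables API there is no such prefix in user code, which is the gap being discussed: the location lives in the metastore instead.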
lakeFS does support Hive integration. Have you seen the link Einat shared?
yes, but it sounds like the workflow would not be what people expect
care to share how you would expect it to work? we're open source and open to changes 🙂
You would probably want to change the branch of a data source and have all the Hive tables updated automatically...
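That expectation could be sketched as a bulk version of the table-repointing idea: generate one location change per table so an entire database follows a branch switch. This is a hypothetical illustration (table names and keys invented), not a lakeFS feature:

```python
def repoint_database(tables: dict, repo: str, branch: str) -> list:
    """Generate one ALTER TABLE ... SET LOCATION statement per table so a
    whole database follows a branch switch. `tables` maps table name to its
    key (path) inside the lakeFS repo; all names here are hypothetical."""
    return [
        f"ALTER TABLE {name} SET LOCATION 's3a://{repo}/{branch}/{key}'"
        for name, key in tables.items()
    ]

stmts = repoint_database(
    {"sales": "tables/sales", "users": "tables/users"},
    "my-repo", "new-feature",
)
# Hypothetical usage: for s in stmts: spark.sql(s)
print(stmts)
```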