Hello guys, I found this project yesterday ...
# help
a
Hello guys, I found this project yesterday and it's amazing what you're doing here, I'm really impressed. I need to start navigating the tool, but I want to make sure whether it integrates with AWS Glue or not. If so, could you please send me some reference on how to do that? I googled a bit, but it seems there aren't many resources.
y
Hi @Adly Mousa, welcome! Yes, lakeFS can be used for tables managed by AWS Glue. We also have a tool for syncing metadata in AWS Glue between lakeFS branches. You can read about it here: https://docs.lakefs.io/using/glue_hive_metastore.html
a
@Yoni Augarten so it can be integrated with the ETL flow in Glue?
y
Could you please explain a bit what you want to achieve?
a
Let's assume I am using Glue as the ETL tool. I want to use Delta Lake to support time travel capabilities in the data lake on S3, and I need some kind of version control for the data, and this is where I found you. I want to be able, from a Glue ETL job, to write to a specific branch and so on.
@Yoni Augarten
y
Thanks for the explanation! In order to interact with lakeFS you need to be able to set the S3 endpoint to the lakeFS server. This can be done in Spark as explained here. We haven't tested this with AWS Glue ETL jobs yet. We will test this soon and get back to you.
@Adly Mousa, I've tested this and successfully ran a Spark job on Glue ETL and wrote the results into lakeFS. I used the configuration method from the above link to access lakeFS (setting the Hadoop fs.s3a configurations).
Let me know if you need anything else 🙂
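For reference, here is a minimal sketch of what setting the Hadoop fs.s3a configurations for lakeFS could look like in a PySpark job. It is not the exact script used in the test above. The endpoint URL, credentials, repository, and branch names are placeholders, the helper names (`lakefs_s3a_conf`, `lakefs_path`, `build_session`) are made up for illustration, and it assumes the `s3a://<repository>/<branch>/<path>` addressing that lakeFS exposes through its S3-compatible endpoint.

```python
def lakefs_s3a_conf(endpoint, access_key, secret_key):
    """Return Spark settings that point the S3A filesystem at a lakeFS server.

    The keys are standard Hadoop S3A properties, prefixed with
    "spark.hadoop." so that Spark forwards them to Hadoop.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style access keeps the repository name in the URL path
        # instead of treating it as a virtual-host subdomain.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def lakefs_path(repo, branch, path):
    """Build an s3a:// URI for an object on a given lakeFS branch."""
    return f"s3a://{repo}/{branch}/{path.lstrip('/')}"


def build_session(conf, app_name="lakefs-example"):
    """Create a SparkSession with the given settings (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Hypothetical usage inside a job (all values are placeholders):
#   conf = lakefs_s3a_conf("https://lakefs.example.com", "<key id>", "<secret>")
#   spark = build_session(conf)
#   df.write.parquet(lakefs_path("example-repo", "my-branch", "tables/events"))
```

Writing to a different branch only changes the second component of the path, so the same job can target a branch by passing its name as a parameter.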
a
@Yoni Augarten Thank you