Hello, I am a contributor to OpenLineage and was just talking to the guys at the Airflow Summit. You're doing amazing things, especially with the Spark integration. I was wondering if you have considered using OpenLineage and including some lakeFS-specific information within the events (we call these facets). If you did, but then decided to do it a different way, I would be happy to hear your reasons. The dataset versioning information you have could be interesting for data catalogs that already integrate with OpenLineage, like Atlan, Metaphor, etc.
Hi Pawel, I'm sure sharing a lot of information between OpenLineage and lakeFS could help users on both sides. @Oz Katz and/or I are interested in making this happen. Let me read up on OpenLineage, and then let's talk!
Nice to see you here @Pawel Leszczynski. We can't do better than @Ariel Shaqed (Scolnicov) & @Oz Katz looking into this!
Cool, thx. OpenLineage may allow you to correlate dataset versions with job runs and see the impact of a faulty job on the datasets it wrote. This can help identify other datasets that need to be reverted.
