# dev
o
Hey @Yusuf K! As @Itai Admi mentioned, we’re indeed talking to the Snowflake team about this use case (external table support for lakeFS). We'd love to work with you to make sure we're building something that meets your needs!
Yusuf K
Yeah for sure! That's great to hear you guys are already working with them. The use case is that in Snowflake you can create a directory table, which is an external table over an object store. The downstream value is that I can then merge it into a view or table alongside structured/semi-structured data.

One demo I did for my team: I had image data, and alongside it I used the location and time to retrieve environmental conditions from a micro-climate API. That let me query the unstructured image data using filters like 'where visibility < 10m and precipitation = True' and retrieve just those images. This is primarily useful for model diagnostics, because as you can imagine you can also store alongside each image the error between the ground-truth labels and the inferred labels.

The shortcoming of the directory table is that by itself it doesn't have any versioning; it just statically represents what it saw at the object storage path provided. So the best of both worlds for me would be for Snowflake to be able to ingest from a lakeFS path, which can include a branch, or even a branch + commit or branch + tag. That way lakeFS does the versioning, and the fact that the directory table is static doesn't matter anymore.

Databricks, I believe, already has this integration, since I can read from a lakeFS path from a Databricks cluster. That's why it's not urgent, but I'm trying to decouple the dependency on Spark that Delta Lake comes with (I know there are standalone readers, but those are super early stage).
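To make the directory-table pattern above concrete, here's a rough Snowflake SQL sketch. The stage name, the metadata table, and its columns (`image_stage`, `image_metadata`, `image_path`, `visibility_m`, `precipitation`, `label_error`) are made up for illustration, and the stage credentials/storage integration are omitted; only `CREATE STAGE ... DIRECTORY = (ENABLE = TRUE)`, `ALTER STAGE ... REFRESH`, and the `DIRECTORY(@stage)` table function are standard Snowflake.

```sql
-- External stage over the object store, with a directory table enabled.
-- (Credentials / storage integration omitted for brevity.)
CREATE STAGE image_stage
  URL = 's3://my-bucket/images/'
  DIRECTORY = (ENABLE = TRUE);

-- Refresh the directory table so it reflects the current contents of the path.
ALTER STAGE image_stage REFRESH;

-- Query unstructured image files through structured metadata: join the
-- directory listing against a (hypothetical) metadata table that holds the
-- micro-climate conditions and label error per image.
SELECT d.relative_path,
       d.file_url,
       m.visibility_m,
       m.precipitation,
       m.label_error
FROM DIRECTORY(@image_stage) d
JOIN image_metadata m
  ON m.image_path = d.relative_path
WHERE m.visibility_m < 10
  AND m.precipitation = TRUE;
```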
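And a rough sketch of the "best of both worlds" version described above, i.e. what pointing the same kind of stage at a lakeFS ref could look like. This is illustrative only, not an existing integration: the repo name, the endpoint host, and the use of Snowflake's S3-compatible stage syntax (`s3compat://` + `ENDPOINT`) against lakeFS's S3 gateway are all assumptions.

```sql
-- Hypothetical: the same directory-table pattern over a lakeFS path, where the
-- ref (branch, commit, or tag) is part of the path and lakeFS does the versioning.
-- lakeFS object paths look like <repo>/<branch|commit|tag>/<object path>, so a
-- stage pinned to a ref stays meaningful even though the directory table is static.
CREATE STAGE image_stage_main
  URL = 's3compat://my-repo/main/images/'   -- branch "main"; could also be a tag or commit
  ENDPOINT = 'lakefs.example.com'           -- hypothetical lakeFS S3 gateway host
  CREDENTIALS = (AWS_KEY_ID = '<lakefs-access-key>' AWS_SECRET_KEY = '<lakefs-secret-key>')
  DIRECTORY = (ENABLE = TRUE);
```

A second stage pinned to a tag (e.g. `s3compat://my-repo/v1.2.0/images/`) or to a commit ID would then give a fully reproducible snapshot of the images, per the branch/commit/tag idea above.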