I imagine that, to implement de-duplication, LakeFS must use some form of content-addressable storage, while maintaining an index that maps "commit + file name" to the file's physical location in object storage.
Is there a way this "book index" could be exposed so that it can be consumed from BigQuery, Athena, Presto, etc.?
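To make the question concrete, here is a toy model of what I have in mind. It is only a sketch under my own assumptions (SHA-256 content addresses, a `data/` prefix, a flat CSV export). `ToyCASRepo`, `put` and `export_index_csv` are hypothetical names, not lakeFS's API or its actual storage layout.

```python
import csv
import hashlib
import io
from dataclasses import dataclass, field


@dataclass
class ToyCASRepo:
    """Hypothetical content-addressable store; NOT lakeFS's real design."""
    blobs: dict = field(default_factory=dict)  # physical address -> bytes
    index: dict = field(default_factory=dict)  # (commit_id, path) -> physical address

    def put(self, commit_id: str, path: str, data: bytes) -> str:
        # The address is derived from the content, so identical files
        # written in different commits end up sharing a single blob.
        address = "data/" + hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(address, data)
        self.index[(commit_id, path)] = address
        return address

    def export_index_csv(self) -> str:
        # Flatten the "book index" into rows that a query engine could
        # read as an external table (assumed format: CSV with a header).
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["commit_id", "path", "physical_address"])
        for (commit_id, path), address in sorted(self.index.items()):
            writer.writerow([commit_id, path, address])
        return buf.getvalue()
```

Writing the same bytes under the same path in two commits stores one blob but produces two index rows. Rows like these are what I'd like to be able to query from BigQuery, Athena or Presto.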