Something I've been thinking about. We see Spark issuing a very large number of "statObject" and "listObject" calls, and I would like to speed those up. (This is not an immediate suggestion, or even a suggestion at all - I'm just trying to wrap my head around things and understand the feasible solution space!)
Now almost all of these calls seem very racy: delete a file, then check whether its "directory" is empty; create a file, then decide whether to give it a directory marker - stuff like that. The point about these calls is that the calling code does not actually expect a consistent answer! Suppose we could identify these calls on the Spark side (because some calls really do need consistency - it's just that others don't...). Then we could ask lakeFS for an inconsistent ("eventually" consistent) answer!
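To make the idea concrete, here is a minimal sketch of what an opt-in relaxed-consistency stat path might look like. Everything here is hypothetical (the `StatCache` class, the `require_consistent` flag, and the backend callback are invented for illustration - none of this is the lakeFS API); the point is just that callers which tolerate staleness could be served from a short-TTL cache instead of hitting DynamoDB/PostgreSQL every time:

```python
import time

# Hypothetical sketch, not the lakeFS API: a stat cache that lets
# callers opt in to eventually-consistent answers.
class StatCache:
    def __init__(self, backend_stat, ttl_seconds=1.0):
        self._backend_stat = backend_stat  # authoritative (consistent) lookup
        self._ttl = ttl_seconds
        self._cache = {}  # path -> (timestamp, result)

    def stat(self, path, require_consistent=True):
        if not require_consistent:
            entry = self._cache.get(path)
            if entry is not None:
                ts, result = entry
                if time.monotonic() - ts < self._ttl:
                    return result  # possibly stale, but avoids a backend round trip
        # Consistent reads (and cache misses) always go to the backend.
        result = self._backend_stat(path)
        self._cache[path] = (time.monotonic(), result)
        return result
```

In this sketch, the racy Spark-side calls (directory-marker checks and the like) would pass `require_consistent=False`, while calls that genuinely need a consistent answer keep the default and pay the full backend cost.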
Would it be possible? How much faster would it be, say on DynamoDB and on PostgreSQL?