Regarding <this performance bug> in Graveler, do y...
# dev
y
Regarding this performance bug in Graveler, do you think this kind of solution makes sense? https://github.com/treeverse/lakeFS/pull/6339
i
I think so. Looking at the results at hand before reading from the store sounds very logical to me. Even in cases where you don’t find it in the previous returned results, the performance impact is negligible (nano seconds added to tens millis)
a
I agree! I'm worried about the following subtlety: AFAICT, if a range Max is not tight and is actually not equal to the last key, then IsInRange can be true and yet SeekGE will still need to go to the next range. Currently AFAIR this is impossible, but that is almost an accident of how we wrote the Graveler code. Nothing here means we need to worry too much, but we should see when this edge case occurs. To be clear: your solution is not only good but write possibly best. It just might be subtle.
y
I'm also worried about this phenomenon in Graveler, however the "ranges" in this PR are not Graveler ranges. They are simply lists of results received from the store, and we explicitly check that the sought key is between the (actual) min and max keys.
a
I understand. Still worried about this (excuse my ASCII art):
Copy code
seekGE here
                              |
                              |
       *   *    *     *       v                     *
    |-----------------------------|           |------------------|
   min                           max       next min
When I seek to this location, the iterator will need to read again. I'm sure you wrote it correctly - but it is perhaps subtle code. We can probably make it safer by adding enough docs.
y
In the code, the max key I'm using is the actual max key - and not some max key from metadata. But I will double check this.
a
Yeah! Still need to document it really carefully, because it is easy for me to change this kind of thing in one place without realizing that I could break something. As an example: say I want to guess some upper bound (maybe for some KV it is cheaper than deserializing all elements to figure out their keys). Then I need to know this. Not too likely, but I prefer to be safe.