Hello all, New PR is up for review - <https://gith...
# dev
i
Hello all, New PR is up for review - https://github.com/treeverse/lakeFS/pull/2968 TL;DR - there is an efficiency issue when looking for a best common ancestor, as part of diff The search algorithm was (possibly) going over the same commits again and again, causing the process to take much longer than needed This PR diminishes this behaviour, while (hopefully) maintaining the same outcome Please, feel free to review. All comments are welcome 🙂
🙌 1
💪🏻 1
a
Neat! I think it's a very good change and will significantly (literally exponentially) speed up FindMergeAndBase for particular workloads that perform many merges. [+ @Yoni Augarten @Guy Hardonag @Tal Sofer ] While reviewing I noticed that
GetCommit
is a batched DB operation that returns an immutable result. That is really useful in standard object access under load -- something not relevant for how this function is used. In a sequential load like FindMergeBase this just adds latency. This flow should either use an unbatched version, or all
GetCommit
operations should be cached. I wrote it up in a comment on the issue. I think we should (separately from this PR, of course!) cache all (recent-ish) commits in memory, and never access the database. That will speed up
FindMergeBase
, as well as other operations that call this function (e.g. log listing variants). In the medium term, we will be able to support git-like range traversals more efficiently. WDYT? Long-term, if load on this function increases even further, we can spill the cache to disk in a Graveler file (LSM-style!).
t
@Ariel Shaqed (Scolnicov) I like your idea very much, it’s great.
a
The way it became "my" idea was I stole it from Git.
😂 1
t
@Yoni Augarten @Ariel Shaqed (Scolnicov) I adjusted our unit tests to verify that commits are not visited more than expected. Can you please review?
👀 1
y
Very very cool
Does it fail on the code from before the change?
t
I didn’t try but i’m sure it is. let me try to do verify that
i
Hi. A follow-up PR with additional tests https://github.com/treeverse/lakeFS/pull/2991 up for review These tests fail with the older version of
FindMergeBase
, due to multiple unexpected visits (@Tal Sofer's additional check) First test is the simplest commit tree I could come up with that satisfy the multiple visits requirement The other one is a bit more complex example, but the idea is the same Thanks