Itai David
02/26/2022, 5:08 AMAriel Shaqed (Scolnicov)
02/27/2022, 7:37 AMGetCommit is a batched DB operation that returns an immutable result. That is really useful in standard object access under load -- something not relevant for how this function is used. In a sequential load like FindMergeBase this just adds latency. This flow should either use an unbatched version, or all GetCommit operations should be cached.
I wrote it up in a comment on the issue. I think we should (separately from this PR, of course!) cache all (recent-ish) commits in memory, and never access the database. That will speed up FindMergeBase, as well as other operations that call this function (e.g. log listing variants). In the medium term, we will be able to support git-like range traversals more efficiently.
WDYT?
Long-term, if load on this function increases even further, we can spill the cache to disk in a Graveler file (LSM-style!).Tal Sofer
02/27/2022, 7:50 AMAriel Shaqed (Scolnicov)
02/27/2022, 7:51 AMTal Sofer
02/27/2022, 9:57 AMYoni Augarten
02/27/2022, 10:00 AMYoni Augarten
02/27/2022, 10:00 AMTal Sofer
02/27/2022, 10:11 AMItai David
02/28/2022, 11:04 PMFindMergeBase, due to multiple unexpected visits (@Tal Sofer's additional check)
First test is the simplest commit tree I could come up with that satisfy the multiple visits requirement
The other one is a bit more complex example, but the idea is the same
Thanks