Itai David
02/26/2022, 5:08 AMAriel Shaqed (Scolnicov)
02/27/2022, 7:37 AMGetCommit
is a batched DB operation that returns an immutable result. That is really useful in standard object access under load -- something not relevant for how this function is used. In a sequential load like FindMergeBase this just adds latency. This flow should either use an unbatched version, or all GetCommit
operations should be cached.
I wrote it up in a comment on the issue. I think we should (separately from this PR, of course!) cache all (recent-ish) commits in memory, and never access the database. That will speed up FindMergeBase
, as well as other operations that call this function (e.g. log listing variants). In the medium term, we will be able to support git-like range traversals more efficiently.
WDYT?
Long-term, if load on this function increases even further, we can spill the cache to disk in a Graveler file (LSM-style!).Tal Sofer
02/27/2022, 7:50 AMAriel Shaqed (Scolnicov)
02/27/2022, 7:51 AMTal Sofer
02/27/2022, 9:57 AMYoni Augarten
02/27/2022, 10:00 AMYoni Augarten
02/27/2022, 10:00 AMTal Sofer
02/27/2022, 10:11 AMItai David
02/28/2022, 11:04 PMFindMergeBase
, due to multiple unexpected visits (@Tal Sofer's additional check)
First test is the simplest commit tree I could come up with that satisfy the multiple visits requirement
The other one is a bit more complex example, but the idea is the same
Thanks