# dev
a
@Guy Hardonag after we talked yesterday, I came up with this neat trick for transforming any iterator into a prefetching iterator! The basic trick is for the wrapping iterator to start a goroutine that reads values from the base iterator and writes them to a buffered channel of size 10K (for instance). Now, to advance and read a value, the wrapping iterator simply reads from the channel. If the goroutine gets too far ahead, it fills up the output channel... and blocks. So it cannot run too far ahead, but while it is off reading from file or network, the wrapping iterator can still continue for a bit.

How to handle `NextRange` (and even `SeekGE`)? Basically, have another channel from the wrapping iterator to the base iterator. Every time the wrapping iterator needs to change location, it drops the old channel (@Barak Amar do we need to drain a channel, or can it just be garbage-collected while full?), creates a new channel in its stead, and sends a command `<"NextRange", ptrToNewChannel>` to the base iterator. The base iterator `select`s between writing to its output channel and reading from its command channel; if it gets a command on the input channel, it can execute it. Not perfect, but a cheap way to prefetch.
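A minimal sketch of the idea above, assuming a toy `sliceIter` base iterator and an integer `Seek` as a crude stand-in for `NextRange`/`SeekGE` (all names here are hypothetical, not lakeFS's actual API):

```go
package main

import "fmt"

// sliceIter is a toy base iterator; it stands in for the real range iterator.
type sliceIter struct {
	data []int
	pos  int
}

func (s *sliceIter) Next() (int, bool) {
	if s.pos >= len(s.data) {
		return 0, false
	}
	v := s.data[s.pos]
	s.pos++
	return v, true
}

// command tells the fetcher goroutine to reposition the base iterator and
// switch to a fresh output channel; the old channel is simply abandoned.
type command struct {
	seekTo int
	out    chan int
}

// Prefetcher wraps the base iterator with a goroutine that reads ahead into
// a buffered channel (10K in the message above; tiny here for the demo).
type Prefetcher struct {
	out  chan int
	cmds chan command
}

func NewPrefetcher(base *sliceIter, bufSize int) *Prefetcher {
	p := &Prefetcher{
		out:  make(chan int, bufSize),
		cmds: make(chan command),
	}
	go func() {
		out := p.out
		for {
			v, ok := base.Next()
			if !ok {
				close(out)            // signal exhaustion to the reader
				cmd, more := <-p.cmds // wait for a seek or Close
				if !more {
					return
				}
				base.pos = cmd.seekTo
				out = cmd.out
				continue
			}
			select {
			case out <- v: // blocks once the read-ahead buffer is full
			case cmd, more := <-p.cmds:
				if !more {
					return
				}
				base.pos = cmd.seekTo // crude stand-in for NextRange/SeekGE
				out = cmd.out         // v is dropped: we repositioned
			}
		}
	}()
	return p
}

func (p *Prefetcher) Next() (int, bool) {
	v, ok := <-p.out
	return v, ok
}

// Seek abandons the current buffer (no draining; the GC reclaims it) and
// installs a fresh one before repositioning the fetcher goroutine.
func (p *Prefetcher) Seek(pos int) {
	fresh := make(chan int, cap(p.out))
	p.cmds <- command{seekTo: pos, out: fresh}
	p.out = fresh
}

func (p *Prefetcher) Close() { close(p.cmds) }

func main() {
	p := NewPrefetcher(&sliceIter{data: []int{10, 20, 30, 40, 50}}, 2)
	v1, _ := p.Next() // 10
	p.Seek(3)         // jump; old buffered values are discarded
	v2, _ := p.Next() // 40
	v3, _ := p.Next() // 50
	fmt.Println(v1, v2, v3)
	p.Close()
}
```

Because `Seek` only returns after the fetcher goroutine has received the command, every value read from the fresh channel comes from the new position, even if stale values were still being pushed into the abandoned one.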
Now the prefetcher goroutine can run ahead when it likes. E.g., when it sees a new range header, it can prefetch the next range by running `NextRange`, `Next` on yet another goroutine.
b
Using the channel as a read-ahead buffer 😎 You don't need to drain the channel, just make sure that what you keep in it doesn't hold a reference to a resource; I don't think it does in your case. The GC will do the work later.
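Barak's point can be sanity-checked with a tiny sketch: abandoning a full buffered channel is safe, with the one caveat noted in the comment (names here are illustrative only).

```go
package main

import "fmt"

func main() {
	buf := make(chan int, 3)
	for i := 0; i < 3; i++ {
		buf <- i // fill the buffer completely
	}
	// Swap in a fresh channel without draining the old one. Once nothing
	// references the old channel, the GC reclaims it together with its
	// buffered values. Caveat: a goroutine still blocked sending on the
	// old channel would leak, which is why the design in this thread has
	// the base iterator select on the command channel as well.
	buf = make(chan int, 3)
	fmt.Println(len(buf)) // 0
}
```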
I don't have the context of the talk, but as far as I understand, the current method for speeding up processing is to divide the work among background workers, and here we try to speed up each iterator itself, because we assume the time is spent on reading/pulling ranges?
Won't it hurt performance in the case where we skip ranges?
a
It depends. We may have a problem inside the Pebble sstable implementation: readBlock prefetches from disk but doesn't decompress. A prefetching iterator will decompress in parallel and help with that. (Long term we would need to fix that in the upstream lib, of course.) We always lose some work on NextRange, but I hope to gain it back on decompressing internal blocks in parallel. There are more blocks than files in the RocksDB format, because every file has multiple blocks. And if NextRange turns out to be a second-order issue, we can avoid iterating into a new range (never go into a new range unless requested).
b
I guess we need to measure and see whether the extra work we put on the system delivers the performance gain we want. Even if it doesn't help in this use case, I'm sure it will be very helpful for use cases where we need to process the complete commit information, like the previous export or symlinks.