As part of some refactoring work we are doing, I had some thoughts about how we save the Actions run information
Niro
05/26/2022, 7:22 AM
Currently when actions are run they are saved in 2 places:
1. In our database
2. In a dedicated internal path on the storage
At first I thought there was some use case in which the data should be read from the storage, but looking at the code I see that it's written to but never read from.
Maybe I'm missing some user use case or something else, but I'm leaning toward removing the write to storage altogether.
Does anybody have any thoughts on this subject?
Tal Sofer
05/26/2022, 7:31 AM
This is a relevant read: https://github.com/treeverse/lakeFS/issues/1511. It summarizes a discussion we had in the past, but it was created before we planned the current KV work you are doing. @Barak Amar @Itai Admi we are taking a different approach to reduce the dependency on Postgres, right? You can probably provide more details 🙂
Barak Amar
05/26/2022, 7:38 AM
The extra write to the underlying storage was added to describe the action logs as part of the information we keep for each action.
This helps identify the log files and associate them with the relevant actions/runs without going to the database.
Currently lakeFS does not process this information directly from the storage, but, as with GC, we can leverage this data if we want external processes to handle the objects without having to understand our database schema/format.
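To illustrate the point about external processes, here is a minimal sketch of how a tool could associate log files with runs from a storage listing alone, without touching the database. The key layout (`<prefix>/<run_id>/<file name>`), the prefix value, and the function name are all assumptions made for illustration; they are not taken from the lakeFS code.

```python
from collections import defaultdict

# Hypothetical key layout, assumed for illustration only:
#   <prefix>/<run_id>/<file name>
ASSUMED_PREFIX = "_actions_internal/log"


def group_logs_by_run(keys, prefix=ASSUMED_PREFIX):
    """Group object keys under `prefix` by the run ID encoded in the path."""
    runs = defaultdict(list)
    marker = prefix.rstrip("/") + "/"
    for key in keys:
        if not key.startswith(marker):
            continue  # not an actions object
        rest = key[len(marker):]
        run_id, sep, name = rest.partition("/")
        if not sep or not name:
            continue  # malformed key: no file name after the run ID
        runs[run_id].append(name)
    # Sort file names so the result is deterministic
    return {run_id: sorted(names) for run_id, names in runs.items()}
```

The input would typically be the result of listing the internal path on the object store; any process that can list keys could build this mapping, which is the property being described here.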
Ariel Shaqed (Scolnicov)
05/26/2022, 7:38 AM
Personally I (always) think everything should be stored in lakeFS. So not only should it be stored on the backing store, it should be in some repo or inside the relevant repo on some branch / prefix / both, or even just as a commit with metadata.
Our support for committed data is massively better than support for any other type of data, and we should use it. Anything else guarantees disparity in support for different kinds of metadata.
Niro
05/26/2022, 7:59 AM
So maybe we should keep this information only on storage? We can leverage the KV feature to migrate the current solution to a viable one that depends on storage only.
Itai Admi
05/26/2022, 8:15 AM
We’ve been looking for some time for a way to use just the block store for actions - see 1511.
Itai Admi
05/26/2022, 8:16 AM
We have an idea of how to make it work, but I'm not sure this should be a prerequisite for the KV actions migration.
Niro
05/26/2022, 8:48 AM
It's not a prerequisite for the actions migration and can be done later, but KV gives us a good opportunity to implement it.