# help
Hello team, here is my X problem: I want to implement staging using branches (latest, devel, release, etc.) in lakeFS. Each branch contains some folders with models and a single yaml file describing where to find each model. When a new model, or a new version of a model, is pushed, it updates (1) the model file itself and (2) the yaml file (with new metadata: file hash and version).

For example, in my release stage I have model version v0.0.1, and I want to promote my devel model version v0.0.33 to release. I can neither take the devel version of the yaml file wholesale (because it would also overwrite other models' metadata), nor cherry-pick the latest yaml change from devel to release (because there have been many changes since devel and release diverged). For the model file itself, though, I could just update the file pointer in the release stage.

So here is my Y problem: I want to update only the model file pointer in the new stage, and update the yaml file manually (by downloading it, changing that one model's metadata, and pushing it back in a script). The question is: can I simply update a file pointer in lakeFS? If yes, how?

UPD: I don't want to download and re-upload models on every promotion, since they may be large. I just want to change file pointers.
It would be nice if there were something like
lakectl fs cp
for just changing file pointers.
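Concretely, the promotion script I have in mind would look something like this. Everything here is illustrative: the repository name (`models`), the gateway address, and the file layout are placeholders, and since lakeFS exposes an S3-compatible gateway, the copy goes through a standard S3 copy-object call so the model bytes never pass through the client. I use JSON for the metadata file only to keep the sketch free of third-party parsers; a real `models.yaml` would use pyyaml instead.

```python
"""Sketch of a promotion script against the lakeFS S3 gateway.

All names (endpoint, repo, paths) are illustrative assumptions.
Requires boto3, imported lazily so the pure helper stays testable.
"""


def promote_entry(meta: dict, model: str, version: str, sha256: str) -> dict:
    """Return a copy of the parsed metadata mapping with only one
    model's entry replaced -- other models' metadata stays untouched."""
    out = dict(meta)
    out[model] = {"version": version, "sha256": sha256}
    return out


def promote(model: str, version: str, sha256: str) -> None:
    import json   # stdlib; substitute pyyaml for a real models.yaml
    import boto3  # lakeFS speaks the S3 API on its gateway port

    s3 = boto3.client(
        "s3",
        endpoint_url="http://lakefs.example.com:8000",  # assumed gateway address
    )
    # Server-side copy: in lakeFS the bucket is the repository and the
    # key is prefixed by the branch; the object body is copied on the
    # backing object store, not downloaded through this client.
    s3.copy_object(
        Bucket="models",
        Key=f"release/{model}/model.bin",
        CopySource={"Bucket": "models", "Key": f"devel/{model}/model.bin"},
    )
    # Download, patch, and re-upload only the small metadata file.
    body = s3.get_object(Bucket="models", Key="release/models.json")["Body"].read()
    meta = promote_entry(json.loads(body), model, version, sha256)
    s3.put_object(
        Bucket="models",
        Key="release/models.json",
        Body=json.dumps(meta, indent=2).encode(),
    )
```

The point of keeping `promote_entry` pure is that only the one promoted model's entry changes, so the rest of the metadata file carries over from release untouched.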
You can use the import functionality of lakeFS. Upload the model once to a location outside the lakeFS storage namespace, then import it into any branch you want. Imported files keep pointers to the original object, so no data is copied.
Hey @mpn mbn, a couple of options I can think of:
1. The lakeFS API does have a copy endpoint (and the lakeFS S3 gateway too). You can leverage that endpoint for copying objects between two lakeFS addresses. It's not as simple as "changing pointers", but the actual copy happens on the object store, not through the client.
2. I wonder if the real issue isn't that one uber metadata file containing all model versions. What happens when two devs/processes try to update that file manually at the same time? Last write wins, and you may end up losing changes.
   a. Is breaking that file into small files, one per model, an option? Then every model change modifies just its matching metadata file, and you can safely merge, cherry-pick, etc.
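To illustrate 2a, here is a hedged sketch of what that split could look like (the paths and field names are made up, not anything lakeFS prescribes):

```python
def split_metadata(meta: dict) -> dict:
    """Turn one uber mapping {model name -> metadata} into per-model
    documents, one path per model. Concurrent updates to different
    models then never touch the same file, so merges and cherry-picks
    between branches stay conflict-free."""
    return {f"models/{name}/meta.json": entry for name, entry in meta.items()}
```

With that layout, promoting v0.0.33 of one model to release only ever rewrites `models/<that-model>/meta.json`, which is exactly the change you can safely cherry-pick.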