# help
s
## Repository within a repository, `lakectl local checkout/clone`
Can a lakeFS repository contain another lakeFS repository (or several), tracked with the `*.yaml` file generated by `lakectl local checkout/clone`? I'm looking for a way for a repository to "require" other repositories, similar to a `git submodule`. See the rough attached mermaid diagram for what I'm trying to achieve.

This is because I make 3D animations. A 3D model asset often requires a model, which requires a model, which requires a model, and so on. A whole scene typically contains tens or hundreds of models like this, so a dependency tree of models grows.

Why can't you use just one repository? As I understand it, when I create a repository (A), it can contain any set of objects from my object storage (MinIO in my case), including all the objects from some other repository (B), but not the knowledge that the objects in B make up a repository called B. If I then update the objects in repo B separately (check it out > update some files > commit back to B), will A get the updated objects? If my understanding is correct, the answer is no: A would retain the versions it originally pointed to, which is not my desired behaviour. That leads to the original question above.

It is "not my desired behaviour" because (again, if my understanding is correct) in order to update A with a new set of objects from B, without knowing that those objects are a collection making up a repository called B, I would have to manually request that each object be updated to a new version individually. That would mean manually keeping lists of which objects represent some "module" that I would like to depend on.
o
Hi @Sam Carter! The use case makes sense, but at the moment `lakectl local` has no submodule support. May I ask that you open an issue on GitHub?
s
Hi @Oz Katz, yes, will do 👍🏻, thanks for replying.
Hi @Oz Katz, I haven't made the issue yet as I've been investigating alternative solutions to "git submodule" behaviour for my use case. These include alternative project structures that might allow another tool to manage a set of lakeFS repositories that require other lakeFS repositories, similar to how a package manager like npm or pip keeps a manifest file based on all the traversed dependencies.

I'm curious to know your thoughts on whether that sort of approach, rather than reinventing the git wheel in lakeFS, would be of interest to the community or to the contributors as a possible addition to the documentation on how to work with projects containing more than one lakeFS repo.

I haven't finished exploring this yet, but have started looking into some possible packaging/build-tool solutions, like Gradle: https://gradle.org/features/ Even a Python script using the lakeFS Python API might be a solution for me, and then I would just control that code plus any metadata in git.
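To make the manifest idea concrete, here is a rough sketch of what such a script might look like. It is only an illustration under assumptions: the manifest layout, the repository names (`scene`, `character`, `prop`, `rig`), the refs, and the helper functions `resolve` and `clone_commands` are all hypothetical, and it does not call lakeFS at all. It just walks the dependency tree and prints the `lakectl local clone` commands one might run; the exact command form should be checked against the `lakectl` documentation.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest kept in git: each lakeFS repository lists the
# repositories it requires, pinned to a ref (branch, tag, or commit ID).
MANIFEST: Dict[str, Dict[str, str]] = {
    "scene": {"character": "v2", "prop": "main"},
    "character": {"rig": "v1"},
    "prop": {"rig": "v1"},
    "rig": {},
}


def resolve(root: str, manifest: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (repo, ref) pairs for everything `root` requires.

    Dependencies come before the repositories that need them, and each
    repository appears once. Conflicting pins and cycles raise ValueError.
    """
    order: List[Tuple[str, str]] = []
    pinned: Dict[str, str] = {}
    visiting = set()

    def visit(repo: str) -> None:
        visiting.add(repo)
        for dep, ref in manifest.get(repo, {}).items():
            if dep in visiting:
                raise ValueError(f"dependency cycle involving {dep!r}")
            if dep in pinned:
                if pinned[dep] != ref:
                    raise ValueError(
                        f"{dep!r} pinned to both {pinned[dep]!r} and {ref!r}"
                    )
                continue  # already resolved at the same ref
            pinned[dep] = ref
            visit(dep)
            order.append((dep, ref))
        visiting.discard(repo)

    visit(root)
    return order


def clone_commands(deps: List[Tuple[str, str]], target_dir: str = "deps") -> List[str]:
    """Build one `lakectl local clone` command string per dependency.

    Assumption: the command takes a lakefs://<repo>/<ref>/ URI followed by a
    local directory; verify against your lakectl version before running.
    """
    return [
        f"lakectl local clone lakefs://{repo}/{ref}/ {target_dir}/{repo}"
        for repo, ref in deps
    ]


if __name__ == "__main__":
    for command in clone_commands(resolve("scene", MANIFEST)):
        print(command)
```

Because the refs live in the manifest, updating repository A to a newer version of B becomes a one-line change to the pin rather than a per-object update, which is the behaviour I was missing above.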