# help
b
Hi, currently there is no clone from a remote lakeFS. I assume this is the question.
m
Hi! Sure. If you have a full dataset instead of a single file, it would be much easier to work on a data repository than on a single file.
b
So this is still a case of a single lakeFS, and we would like to clone a repository in the same lakeFS instance, right?
m
now I'm not sure if I understand.
I'm thinking about the following scenario:
1. lakectl clone lakefs://remote-repo
(which makes a local copy of remote-repo)
2. work on some files
3. lakectl commit
(which commits the local changes)
4. lakectl push
(which sends all changes to the remote-repo)
so it would be more similar to what git provides
b
I think I understand, but I'll try to clear up some things first. There is no "local" for lakeFS - every lakectl ... operation you perform sends a request to the lakeFS instance
m
yes
b
so push and pull, I guess, would go to a remote instance of lakeFS
m
yes
b
because the lakefs:// scheme today doesn't carry a real address
both will go to the same lakeFS instance
as specified in lakectl.yaml
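For reference, a minimal lakectl.yaml sketch - the endpoint and key values here are placeholders, not anything from this conversation:
```yaml
# ~/.lakectl.yaml - every lakectl command is sent to this one endpoint
credentials:
  access_key_id: AKIAIOSFODNN7EXAMPLE       # placeholder
  secret_access_key: wJalrXUtnEXAMPLEKEY    # placeholder
server:
  endpoint_url: https://lakefs.example.com  # your lakeFS instance, e.g. running in AWS
```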
m
yes, exactly
which is anyway hosted somewhere, e.g. in AWS
so if I have, let's say, a set of 20 files to work on -- what would be the best "lakeFS flow" to use?
b
so if the second repository is managed by the same instance of lakeFS
we can clone the repository lakeFS already manages
which keeps pointing to the same files
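If branching within the same repository is the mechanism meant here, a minimal sketch with made-up repo and branch names - lakeFS branches are zero-copy, so the new branch points at the same underlying objects:
```sh
lakectl branch create lakefs://repo1/experiment --source lakefs://repo1/main
```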
m
I think there is one repo; let's call it lakefs://repo1
and a local copy of the repo
or...
do you suggest
b
but what do you mean by "local"?
m
I should create a local repo
I mean I should have a repo created on my workstation
b
but there is nothing locally
when you create a repo, everybody working on the lakeFS instance can see it
m
well... I work on some files on my workstation?
yes, but to work on or change a file you need to grab it first
you are not working on the file by "remote edit"
you pull, edit and push, don't you?
🙂
b
yes, if you work locally and are not running a job that processes the data on the lake
so you are suggesting something like a local staging area for lakeFS
to work locally on files and push just the changes to lakeFS?
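As a rough sketch, the closest per-file flow lakectl offers today - repo, branch and file names below are made up:
```sh
# grab the file, edit it locally, then write it back to the branch
lakectl fs cat lakefs://repo1/main/data/report.csv > report.csv
# ... edit report.csv locally ...
lakectl fs upload lakefs://repo1/main/data/report.csv --source report.csv
```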
m
yes!
b
having it track local changes and have them pushed
m
that's just what I know
b
I see
m
but if there is any other option tell me, please
e.g. I have a file in S3
b
I think aws s3 has a sync command you can use to copy the local changes
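Something along these lines, assuming the lakeFS S3 gateway is reachable at lakefs.example.com and using placeholder repo/branch names - the gateway maps the bucket to a repository and the first path segment to a branch:
```sh
aws s3 sync ./local-data s3://repo1/main/data/ --endpoint-url https://lakefs.example.com
```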
m
as lakeFS is an API
I shouldn't work on the file directly
b
lakeFS supports an S3 interface
m
but only through the API
yes, I've tested that and it works very nicely!
b
for repositories with large datasets, copying/exporting/cloning the data locally is not an option
m
then how do you work on this?
b
but I understand the use-case: have the files locally, process, modify, add and push the changes in one command
use lakeFS like S3 - build an app that processes the data and writes data back to the object store
only this time you can commit the changes, roll back, etc.
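A hedged sketch of that last part with lakectl, using made-up repo, branch and message values (the exact revert syntax may vary between lakectl versions, so check lakectl branch revert --help):
```sh
# record the newly written objects as a commit on the branch
lakectl commit lakefs://repo1/main -m "processed batch of 20 files"
# undo the changes introduced by a bad commit
lakectl branch revert lakefs://repo1/main <commit-id>
```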
m
aha
so
e.g. use aws to move the data and lakeFS to manage the metadata?
b
yes
m
got it
phew!
thanks for explanation!
b
lakeFS operates as a gateway to S3 - so you can do the operations directly
it will store the data and keep track of the metadata
m
yes
b
but I get your idea of working on files locally
m
cool, I will play with this approach!
b
I guess you can open a feature request
there are a couple of things to iron out
m
thanks -- let me get familiar with this approach -- maybe I'll like it and give up on "clone/pull/push" 😉
b
like how to track the files locally
🙂
m
indeed
with small files in GitHub it's easy
but here it might be a no-go
thanks again.
have a nice Friday and weekend!
b
true, I think it can also be developed using the current open API we have
thanks! you too.
I'm here if you have more questions or want to discuss more ideas