# help
b
Hi, currently there is no clone from a remote lakeFS. I assume this is the question.
m
Hi! Sure. If you have a full dataset instead of a single file, it would be much easier to work on a data repository than on a single file.
b
So this is still a case of a single lakeFS, and we would like to clone a repository in the same lakeFS instance, right?
m
now I'm not sure if I understand.
I'm thinking about the following scenario:
1. lakectl clone lakefs://remote-repo
(which makes a local copy of remote-repo)
2. work on some files
3. lakectl commit
(which commits the local changes)
4. lakectl push
(which sends all changes to the remote-repo)
so it would be more similar to what git provides
b
I think I understand, but I'll try to clear up some things first. There is no "local" for lakeFS - every lakectl ... operation you perform sends a request to the lakeFS instance
m
yes
b
so push and pull, I guess, would go to a remote instance of lakeFS
m
yes
b
because the lakefs:// scheme today doesn't carry a real address
both will go to the same lakeFS instance
as specified in lakectl.yaml
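For reference, a minimal lakectl.yaml sketch - the endpoint and key values here are placeholders, not anything from this conversation:
```yaml
# ~/.lakectl.yaml - every lakectl command is sent to this one endpoint
credentials:
  access_key_id: AKIAIOSFODNN7EXAMPLE       # placeholder
  secret_access_key: wJalrXUtnEXAMPLEKEY    # placeholder
server:
  endpoint_url: https://lakefs.example.com  # your lakeFS instance, e.g. running in AWS
```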
m
yes, exactly
which is anyway hosted somewhere, e.g. in AWS
so if I have, let's say, a set of 20 files to work on -- what would be the best "lakeFS flow" to use?
b
so if the second repository is managed by the same instance of lakeFS
we can clone the repository lakeFS already manages
which keeps pointing to the same files
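If branching within the same repository is the mechanism meant here, a minimal sketch with made-up repo and branch names - lakeFS branches are zero-copy, so the new branch points at the same underlying objects:
```sh
lakectl branch create lakefs://repo1/experiment --source lakefs://repo1/main
```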
m
I think there is one repo; let's call it lakefs://repo1
and a local copy of the repo
or...
do you suggest
b
but what do you mean by "local"?
m
I should create a local repo
I mean I should have a repo created on my workstation
b
but there is nothing locally
when you create a repo, everybody working on the lakeFS instance can see it
m
well... I work on some files on my workstation?
yes, but to work on or change a file you need to grab it first
you are not working on the file by "remote edit"
you pull, edit and push, don't you?
🙂
b
yes, if you work locally and are not running a job that processes the data on the lake
so you are suggesting something like a local staging area for lakeFS
to work locally on files and push just the changes to lakeFS?
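As a rough sketch, the closest per-file flow lakectl offers today - repo, branch and file names below are made up:
```sh
# grab the file, edit it locally, then write it back to the branch
lakectl fs cat lakefs://repo1/main/data/report.csv > report.csv
# ... edit report.csv locally ...
lakectl fs upload lakefs://repo1/main/data/report.csv --source report.csv
```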
m
yes!
b
having it track local changes and have them pushed
m
that's just what I know
b
I see
m
but if there is any other option tell me, please
e.g. I have a file in S3
b
I think aws s3 has a sync command you can use to copy the local changes
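Something along these lines, assuming the lakeFS S3 gateway is reachable at lakefs.example.com and using placeholder repo/branch names - the gateway maps the bucket to a repository and the first path segment to a branch:
```sh
aws s3 sync ./local-data s3://repo1/main/data/ --endpoint-url https://lakefs.example.com
```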
m
as lakeFS is an API
I shouldn't work on the file directly
b
lakeFS supports an S3 interface
m
but only through the API
yes, I've tested that and it works very nicely!
b
for repositories with large datasets, copying/exporting/cloning the data locally is not an option
m
then how do you work on this?
b
but I understand the use-case: have the files locally, process, modify, add and push the changes in one command
use lakeFS like S3 - build an app that processes the data and writes data back to the object store
only this time you can commit the changes, roll back, etc.
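A hedged sketch of that last part with lakectl, using made-up repo, branch and message values (the exact revert syntax may vary between lakectl versions, so check lakectl branch revert --help):
```sh
# record the newly written objects as a commit on the branch
lakectl commit lakefs://repo1/main -m "processed batch of 20 files"
# undo the changes introduced by a bad commit
lakectl branch revert lakefs://repo1/main <commit-id>
```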
m
aha
so
e.g. use aws to move the data and lakeFS to manage the metadata?
b
yes
m
got it
phew!
thanks for explanation!
b
lakeFS operates as a gateway to S3 - so you can do the operations directly
it will store the data and keep track of the metadata
m
yes
b
but I get your idea of working on files locally
m
cool, I will play with this approach!
b
I guess you can open a feature request
there are a couple of things to iron out
m
thanks -- let me get familiar with this approach -- maybe I'll like it and give up on "clone/pull/push" 😉
b
like how to track the files locally
🙂
m
indeed
with small files in GitHub it's easy
but here it might be a no-go
thanks again.
have a nice Friday and weekend!
b
true, I think it can also be developed using the current open API we have
thanks! you too.
I'm here if you have more questions or want to discuss more ideas