#lakefs-for-beginners
Bart Keulen

08/19/2022, 2:23 PM
Hi everyone! I just started playing around with LakeFS and first of all I want to say that I really like it. Currently I am trying to ingest data from LakeFS repo-a into LakeFS repo-b:
lakectl ingest --s3-endpoint-url http://lakefs:8000 --from s3://repo-a/main --to lakefs://repo-b/main/ --dry-run
But I get the following error:
error walking object store: NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Error executing command.
Is there an easier/more direct way to ingest/import data from another lakefs repo? If not, I would like some help in getting this working.
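A note on the error itself: NoCredentialProviders is raised by the AWS SDK when it finds no credentials in any of its providers (environment variables, shared credentials file, instance role). Below is a minimal sketch, assuming lakectl ingest picks up the standard AWS environment variables; that is an assumption, not something confirmed in this thread. The helper name have_aws_env_credentials is hypothetical and the key values are placeholders. It only addresses the credential lookup, not whether a lakeFS repo can serve as an ingest source.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if both standard AWS credential
# environment variables are set and non-empty.
have_aws_env_credentials() {
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

if have_aws_env_credentials; then
  echo "AWS credentials found in environment"
else
  echo "AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY not set" >&2
fi

# Assumption: with the variables exported (placeholder values below), the
# command from the message above gets past the credential lookup.
#   export AWS_ACCESS_KEY_ID="<access-key-id>"
#   export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
#   lakectl ingest --s3-endpoint-url http://lakefs:8000 --from s3://repo-a/main --to lakefs://repo-b/main/ --dry-run
```

Checking the environment first makes it easier to tell a missing-credentials problem apart from an unsupported-source problem.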
Eden Ohana

08/19/2022, 2:40 PM
Hi Bart, welcome 😃 The ingest command is used to import data from the object store into lakeFS. Are repo-a and repo-b from the same lakeFS installation?
Bart Keulen

08/19/2022, 2:50 PM
Yes, they are from the same lakefs installation. What I would like to achieve is to have one lakefs repo as a raw data lake and have other lakefs repos import/ingest data from this raw data lake.
Oz Katz

08/19/2022, 2:55 PM
Hey @Bart Keulen - that's an interesting use case! Currently, lakeFS doesn't support linking data across repositories (only importing from an underlying object store).
2:58 PM
We do have an open proposal to add something sort of similar to that, I'm not sure if it would satisfy your use case: https://github.com/treeverse/lakeFS/blob/26818a221ea815808cba3f3425644f3954487635/design/open/declarative-views.md
2:58 PM
@Bart Keulen would you mind taking a look?
Bart Keulen

08/22/2022, 7:00 AM
@Oz Katz that looks very interesting! And I think it would suit my use case. If I understand it correctly it will work something like this: You initialize a virtual LakeFS repo which contains a Lakefile that specifies all source data to inherit. You can access all the data the same way you access a 'normal' LakeFS repo.
• Can you still add data like a 'normal' LakeFS repo? For example, this would allow you to have a repository containing processed / labeled images but have a clear declarative reference to the source data.