Bart Keulen

08/19/2022, 2:23 PM
Hi everyone! I just started playing around with LakeFS and first of all I want to say that I really like it. Currently I am trying to ingest data from LakeFS `repo-a` into LakeFS `repo-b`:
lakectl ingest --s3-endpoint-url http://lakefs:8000 --from s3://repo-a/main --to lakefs://repo-b/main/ --dry-run
But I get the following error:
error walking object store: NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Error executing command.
Is there an easier/more direct way to ingest/import data from another lakefs repo? If not, I would like some help in getting this working.

Eden Ohana

08/19/2022, 2:40 PM
Hi Bart, welcome 😃 The ingest command is used to import data from the underlying object store into lakeFS. Are repo-a and repo-b on the same lakeFS installation?

Bart Keulen

08/19/2022, 2:50 PM
Yes, they are on the same lakeFS installation. What I would like to achieve is to have one lakeFS repo act as a raw data lake, and have other lakeFS repos import/ingest data from it.

Oz Katz

08/19/2022, 2:55 PM
Hey @Bart Keulen - that's an interesting use case! Currently, lakeFS doesn't support linking data across repositories (only importing from an underlying object store).
We do have an open proposal to add something sort of similar to that, I'm not sure if it would satisfy your use case: https://github.com/treeverse/lakeFS/blob/26818a221ea815808cba3f3425644f3954487635/design/open/declarative-views.md
@Bart Keulen would you mind taking a look?

Bart Keulen

08/22/2022, 7:00 AM
@Oz Katz that looks very interesting! And I think it would suit my use case. If I understand it correctly, it will work something like this: you initialize a virtual LakeFS repo which contains a Lakefile that specifies all source data to inherit. You can access all the data the same way you access a 'normal' LakeFS repo.
• Can you still add data like a 'normal' LakeFS repo? For example, this would allow you to have a repository containing processed / labeled images but have a clear declarative reference to the source data.
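Bart's reading of the proposal can be made concrete with a toy in-memory model. This is a hypothetical sketch, not the lakeFS API and not the format defined in the declarative-views proposal: `ToyVirtualRepo`, the list of (prefix, source) pairs standing in for a Lakefile, and the rule that local writes take precedence over inherited objects are all assumptions made for illustration. It shows the two properties Bart describes: inherited data is readable like a normal repo, and (assuming the answer to his question is yes) new data can be added alongside it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVirtualRepo:
    """Hypothetical in-memory model of a 'virtual' repo that inherits sources.

    Illustration only: this is not the lakeFS API and not the format defined
    in the declarative-views proposal.
    """

    # Hypothetical stand-in for a Lakefile: (mount prefix, source objects) pairs,
    # where each source maps an object key to its bytes.
    sources: list[tuple[str, dict[str, bytes]]]
    # Objects written directly to the virtual repo.
    local: dict[str, bytes] = field(default_factory=dict)

    def put(self, path: str, data: bytes) -> None:
        # Assumption: the virtual repo also accepts its own writes.
        self.local[path] = data

    def get(self, path: str) -> bytes:
        # Assumption: local writes take precedence over inherited objects.
        if path in self.local:
            return self.local[path]
        # Otherwise search the declared sources in order.
        for prefix, objects in self.sources:
            if path.startswith(prefix):
                key = path[len(prefix):]
                if key in objects:
                    return objects[key]
        raise KeyError(path)

    def list_paths(self) -> list[str]:
        # Union of local paths and every inherited path under its mount prefix.
        paths = set(self.local)
        for prefix, objects in self.sources:
            paths.update(prefix + key for key in objects)
        return sorted(paths)


if __name__ == "__main__":
    raw = {"images/001.png": b"raw-001", "images/002.png": b"raw-002"}
    repo = ToyVirtualRepo(sources=[("raw/", raw)])
    repo.put("labels/001.json", b'{"label": "cat"}')
    print(repo.get("raw/images/001.png"))  # b'raw-001'
    print(repo.list_paths())
    # ['labels/001.json', 'raw/images/001.png', 'raw/images/002.png']
```

In this model the labels live only in the virtual repo's own storage, while the raw images are referenced through the declared source rather than copied, which mirrors the "clear declarative reference to the source data" Bart mentions.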