Thread
#lakefs-for-beginners

    Bart Keulen

    1 month ago
    Hi everyone! I just started playing around with LakeFS and first of all I want to say that I really like it. Currently I am trying to ingest data from LakeFS repo-a into LakeFS repo-b:
    lakectl ingest --s3-endpoint-url http://lakefs:8000 --from s3://repo-a/main --to lakefs://repo-b/main/ --dry-run
    But I get the following error:
    error walking object store: NoCredentialProviders: no valid providers in chain. Deprecated.
    	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
    Error executing command.
    Is there an easier/more direct way to ingest/import data from another lakefs repo? If not, I would like some help in getting this working.
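    The NoCredentialProviders error above is raised by the AWS SDK when its default credential chain finds no credentials. A minimal sketch of one way to supply them, assuming lakectl ingest picks up the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables and that the endpoint at http://lakefs:8000 accepts a lakeFS key pair. The key values are placeholders, and this only addresses the credential error, not whether ingesting from another lakeFS repository is supported:

```shell
# Placeholder values (assumption): substitute a real access key pair that the
# endpoint passed to --s3-endpoint-url will accept.
export AWS_ACCESS_KEY_ID="<lakefs-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<lakefs-secret-access-key>"

# Same command as in the message above, guarded so the script does nothing
# on machines where lakectl is not installed.
if command -v lakectl >/dev/null 2>&1; then
  lakectl ingest --s3-endpoint-url http://lakefs:8000 \
    --from s3://repo-a/main --to lakefs://repo-b/main/ --dry-run
fi
```

    Keeping --dry-run in place means nothing is written to repo-b while the credentials are being checked.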

    Eden Ohana

    1 month ago
    Hi Bart, welcome 😃 The ingest command is used to import data from the object store into lakeFS. Are repo-a and repo-b from the same lakeFS installation?

    Bart Keulen

    1 month ago
    Yes they are from the same lakefs installation. What I would like to achieve is have one lakefs repo as raw data lake and have other lakefs repos import/ingest data from this raw data lake.

    Oz Katz

    1 month ago
    Hey @Bart Keulen - that's an interesting use case! Currently, lakeFS doesn't support linking data across repositories (only importing from an underlying object store).
    We do have an open proposal to add something sort of similar to that, I'm not sure if it would satisfy your use case: https://github.com/treeverse/lakeFS/blob/26818a221ea815808cba3f3425644f3954487635/design/open/declarative-views.md
    @Bart Keulen would you mind taking a look?

    Bart Keulen

    1 month ago
    @Oz Katz that looks very interesting! And I think it would suit my use case. If I understand it correctly it will work something like this: You initialize a virtual LakeFS repo which contains a Lakefile that specifies all source data to inherit. You can access all the data the same way you access a 'normal' LakeFS repo.
    • Can you still add data like a 'normal' LakeFS repo? For example, this would allow you to have a repository containing processed / labeled images but have a clear declarative reference to the source data.