# help
Cristian:
Hi, I'm trying to ingest some data from an Azure Blob container into a branch in lakeFS (with an AWS S3 backend) using the following command (all credentials are double-checked and correct):
```
export AZURE_STORAGE_ACCOUNT="my-account-name"
export AZURE_STORAGE_ACCESS_KEY="my-access-key"
lakectl ingest \
   --from https://<my-account-name>.blob.core.windows.net/<container-name>/ \
   --to lakefs://<repository>/<branch>/<some-existing-path>
```
but I get the following error:
```
physical address is not valid for block adapter: s3
400 Bad Request
```
Does anyone have an idea what might be happening? Is such an ingestion possible (Azure Blob -> S3)? I would also highly appreciate any alternative ingestion solutions. Thanks in advance! FYI @mishraprafful
Eden Ohana:
Hi Cristian, this command will not work because lakeFS supports only one storage adapter per installation (S3 in your case). lakectl ingest stages objects from an external source into a lakeFS branch without actually copying them, and you can't reference Azure objects through an S3 adapter. I'm not familiar with any useful tool to transfer data from Azure Blob to S3, but I'll look into it and get back to you.
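For reference, against an S3-backed installation the ingest source needs to be an s3:// URI; a minimal sketch, with placeholder bucket and path names:
```
lakectl ingest \
   --from s3://<bucket-name>/<path>/ \
   --to lakefs://<repository>/<branch>/<some-existing-path>
```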
Cristian:
Thanks for the quick reply, Ohana! And thanks for looking into it.
Eden Ohana:
After the data is migrated to S3, you can use lakectl ingest to ingest it into lakeFS.
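One tool that can handle that kind of transfer is rclone; a minimal sketch, assuming two remotes named azure (type azureblob) and s3 (type s3) were set up beforehand with rclone config:
```
# Remote names "azure" and "s3" are placeholders created via `rclone config`.
# Copies everything from the Azure container into the S3 bucket.
rclone sync azure:<container-name> s3:<bucket-name> --progress
```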
Cristian:
Thanks a lot for the suggestion @Eden Ohana ! I will try that.
I managed to migrate the data from Azure Blob to S3. I am now running
```
lakectl ingest --from <s3-bucket> --to <lakefs-repo-branch>
```
and it has staged more than 1M objects and counting. In my S3 bucket I have fewer than 30k files. What counts as an object in this case?
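For comparison, one way to count the objects on the S3 side is an AWS CLI one-liner (bucket name is a placeholder):
```
# Prints "Total Objects:" and "Total Size:" for everything under the bucket.
aws s3 ls s3://<bucket-name>/ --recursive --summarize | tail -n 2
```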
I should mention that, for permission reasons on my end, I couldn't run lakefs import from inventory. It needs permissions to a database, but I couldn't find any info about that database in the documentation. I would greatly appreciate an explanation or a resource for this issue as well.
Eden Ohana:
After you've run lakectl ingest, do you have 1M objects in the lakeFS repo?
lakefs import needs access to the lakeFS Postgres database. Did you set up the config file? https://docs.lakefs.io/reference/configuration.html#example-aws-deployment
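The import reads the same config file the lakeFS server uses; a minimal sketch along the lines of the example on that page, with every value a placeholder:
```
---
database:
  connection_string: "postgres://<user>:<password>@<host>:5432/postgres"

auth:
  encrypt:
    secret_key: "<some-random-secret>"

blockstore:
  type: s3
```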
Cristian:
Thanks for the quick reply! I actually fixed the credentials issue. Now it seems to be hitting a migration version mismatch. I can look into that later.
The objects are only staged. Now 1.8M objects:
```
Staged 1868632 objects so far...
```
Eden Ohana:
Is the lakeFS storage namespace on the same S3 bucket your data is in?
Cristian:
They are on different buckets.
Eden Ohana:
Can you paste the logs from lakeFS?
Cristian:
I actually couldn't find the logs for the lakectl ingest command. In the end I fixed the version mismatch and could ingest smoothly with lakefs import.
There is one thing I thought you might find interesting. After the successful lakefs import, I got the suggestion to merge the inventory import branch as follows:
```
lakectl merge lakefs://<repo>@<commit-id> lakefs://<repo>@main
```
However, when I actually run the command, I get the following error:
```
Invalid 'source ref': parsing lakefs://<repo>@<commit-id>: malformed lakefs uri
```
I also tried adding a "/" after the commit id (sometimes it requires that). In the end I merged the branch from the UI.
Eden Ohana:
Your import is stored on the "import-from-inventory" branch (latest commit). Try:
```
lakectl merge lakefs://<repo>/import-from-inventory lakefs://<repo>/main
```
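For what it's worth, lakeFS URIs take the form lakefs://<repository>/<ref>, where the ref can be a branch name or a commit ID, so merging straight from the commit should presumably also use a slash rather than @, along the lines of:
```
lakectl merge lakefs://<repo>/<commit-id> lakefs://<repo>/main
```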
👍 1
Cristian:
Thanks for all the help!
Should I create an issue regarding the above?
Eden Ohana:
That would be great. Let me know if you run into any other issues.
Cristian:
Will do! Thank you for the support, Eden! I opened two issues today; I'm pasting the links here for reference. https://github.com/treeverse/lakeFS/issues/2984 https://github.com/treeverse/lakeFS/issues/2983
🙌 1
i
Thank you, @Cristian Caloian