# help
m
Hi all, I have an Azure lakeFS installation and I want to do a zero-copy import from an Azure Data Lake Storage Gen2 storage account. I used `lakectl ingest` as per the documentation, which ingested all the files into my repo. However, when I try to read them from Azure Databricks using the s3a path, I get a `Missing Credential Scope` error, which I guess comes from lakeFS not being able to authenticate to the storage account where the files actually are. The `ingest` command worked and could authenticate because I set the Azure Storage Account key in the `AZURE_STORAGE_ACCESS_KEY` environment variable locally, but how can I make lakeFS authenticate later to read the files when I need them?
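(For context on the read path: Databricks talks to the lakeFS S3 gateway, so the s3a credentials in the cluster's Spark config are the lakeFS key pair, not any Azure key. A sketch of those properties, with placeholder values:)
```
spark.hadoop.fs.s3a.endpoint https://<lakefs-server-address>
spark.hadoop.fs.s3a.access.key <lakeFS access key ID>
spark.hadoop.fs.s3a.secret.key <lakeFS secret access key>
spark.hadoop.fs.s3a.path.style.access true
```
A read like `spark.read.parquet("s3a://my-repo/main/some/path")` then goes through lakeFS, which in turn has to reach the storage account holding the actual data.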
i
Hi @Miguel Rodríguez
When you upload a file and read it, do you still get the error?
(i.e., is this specific to imported files?)
m
If I upload a file and read it, it works fine
i
That's helpful, thank you!
m
I guess I somehow need to tell lakeFS how to authenticate to my ADLS Gen2, but I don't see any way in the docs to add that key to lakeFS (only the key to the underlying Azure Storage, but that's a different account and I can't seem to set two keys there). LMK if I can share any more info to help @Iddo Avneri
a
@Miguel Rodríguez Can you try the `lakectl import` command instead of `lakectl ingest`? As per the documentation:
```
The lakectl cli supports import and ingest commands to import objects from an external source.
• The import command acts the same as the UI import wizard. It imports (zero copy) and commits the changes on a _<branch_name>_imported branch, with an optional flag to also merge the changes to <branch_name>.
• The ingest command lists the source bucket (and an optional prefix) from the client and creates pointers to the returned objects in lakeFS. The objects will be staged on the branch.
```
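For reference, assuming `import` takes the same `--from`/`--to` flags as `ingest` (worth confirming with `lakectl import --help` on your version), a minimal invocation would look something like:
```
# hypothetical account/container/repo names; zero-copy import of the source prefix
lakectl import \
   --from https://from_storage_account.blob.core.windows.net/container/ \
   --to lakefs://my-repo/main/
```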
m
I get an authentication error on `lakectl import` because the storage account I want to import from is different from my lakeFS underlying storage account
i
That's a great idea @Amit Kesarwani. I think one way or another we will need to grant the lakeFS account access to read these files… @Miguel Rodríguez - let us look into this and update you.
m
Thank you @Iddo Avneri and @Amit Kesarwani, looking forward to your reply. I also tried installing lakeFS in the same storage account that I need to import data from, but I had problems because hierarchical namespace is not supported for lakeFS's underlying storage. I opened an issue on GitHub so that you can please document this for future people installing lakeFS on Azure: https://github.com/treeverse/lakeFS/issues/4931
i
Thank you for the context @Miguel Rodríguez, let us consult and get back to you.
a
@Miguel Rodríguez I am using hierarchical namespace (a.k.a. ADLS Gen2) for my setup and it works fine. Would you like to get on a call with me to resolve this issue? If you can't, then I will send the instructions.
@Miguel Rodríguez You can authenticate with a Secret Key for ADLS Gen2: https://docs.lakefs.io/v0.86/setup/storage/blob.html#authenticate-with-a-secret-key (this page is missing in the latest docs and we will fix it)
@Miguel Rodríguez I used the following instructions for my setup:
• From the Azure portal, go to Azure Storage Accounts
• Create a Storage Account
  ◦ Resource Group: xyz
  ◦ Storage account name: abc
  ◦ Region: (US) West US 2
  ◦ Advanced > Data Lake Storage Gen2 > Enable hierarchical namespace
• Authenticate with a Secret Key: go to the Access Keys tab on the left menu panel and click Show Keys
• lakeFS configuration:
```
blockstore:
  type: azure
  azure:
    auth_method: access-key
    # the storage account lakeFS uses as its underlying storage
    storage_account: "abc"
    # one of that account's access keys (Access Keys tab > Show Keys)
    storage_access_key: "xxxxxxxx"
```
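If you'd rather not put the key in the config file, lakeFS also reads its settings from `LAKEFS_`-prefixed environment variables, so the same block should be expressible roughly as:
```
# same values as the YAML above; config keys map to LAKEFS_<SECTION>_<KEY>
export LAKEFS_BLOCKSTORE_TYPE="azure"
export LAKEFS_BLOCKSTORE_AZURE_AUTH_METHOD="access-key"
export LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCOUNT="abc"
export LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCESS_KEY="xxxxxxxx"
```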
@Miguel Rodríguez I verified that the `lakectl ingest` command works with 2 different storage accounts. Please use the following commands:
```
lakectl config
Access key ID: <lakeFS Access Key>
Secret access key: <lakeFS Secret Key>
Server endpoint URL: http://<lakeFS_server_ip_address>:8000/api/v1

export AZURE_STORAGE_ACCOUNT="from_storage_account"
export AZURE_STORAGE_ACCESS_KEY="access_key_for_storage_account"
lakectl ingest \
   --from https://from_storage_account.blob.core.windows.net/container/ \
   --to lakefs://my-repo/main/
```
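Once the ingest finishes, a quick sanity check is to list what was staged and commit it, along these lines (repo and branch names are placeholders):
```
# list the pointers the ingest staged on the branch
lakectl fs ls lakefs://my-repo/main/
# then commit them
lakectl commit lakefs://my-repo/main -m "ingest from second storage account"
```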
n
@Miguel Rodríguez I replied to the issue you've opened. Please note that the error you are receiving comes from Azure Blob Storage and not from lakeFS. The feature you were trying to use is not supported in HNS mode: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-feature-support-in-storage-accounts
m
Thank you @Amit Kesarwani, I did what you mentioned to install lakeFS on ADLS Gen2, but I got the error mentioned in the GitHub issue when trying to write multiple large files with Spark. I'm happy to jump on a call to discuss today, or one of these days if you have the time. @Niro I replied to the issue too; please note I'm not using the Blob Tags feature myself, it's probably the lakeFS installation trying to use it
a
@Miguel Rodríguez Will today 10am PST work for you? If not, then please schedule a call with me: https://calendly.com/d/dmv-xcy-6zr/meeting-with-amit-iddo
m
Yeah @Amit Kesarwani that time is good! See you in 30 min then. You have my email already, you can send it there 🙂
a
@Miguel Rodríguez I sent the meeting invite to your Gmail address. If you didn't receive it, then here is the Zoom link for the call at 10am PST: https://us02web.zoom.us/j/84503853423?pwd=UjRDem5QNGUwbVNwU3RIWlpycEp4UT09