
Miguel Rodríguez

01/04/2023, 10:19 PM
Hi all, I have an Azure lakeFS installation and I want to do a zero-copy import from an Azure Data Lake Storage Gen2 storage account. I used `lakectl ingest` as per the documentation, which ingested all files into my repo. However, when I try to read them from Azure Databricks using the s3a path I get a `Missing Credential Scope` error, which I guess comes from lakeFS not being able to authenticate to the storage account where the files actually are. The `ingest` command worked and could authenticate because I set the Azure Storage Account key in the `AZURE_STORAGE_ACCESS_KEY` environment variable locally, but how can I make lakeFS authenticate later to read the files when I need them?
👀 2
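[Editor's note: reading lakeFS paths from Databricks over s3a generally means pointing the s3a filesystem at the lakeFS S3 gateway with lakeFS credentials (not Azure ones). A minimal sketch of the cluster Spark configuration, with placeholder endpoint and keys:]

```
# Hypothetical Databricks cluster Spark config for reading lakeFS via s3a.
# The endpoint and keys are placeholders for your lakeFS server and lakeFS credentials.
spark.hadoop.fs.s3a.endpoint https://<lakefs-server>:8000
spark.hadoop.fs.s3a.access.key <lakeFS access key ID>
spark.hadoop.fs.s3a.secret.key <lakeFS secret access key>
spark.hadoop.fs.s3a.path.style.access true
```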

Iddo Avneri

01/04/2023, 10:23 PM
Hi @Miguel Rodríguez
When you upload a file and read it, do you still get the error?
(i.e. is this specific for imported files)

Miguel Rodríguez

01/04/2023, 10:24 PM
If I upload a file and read it, it works fine

Iddo Avneri

01/04/2023, 10:25 PM
That’s helpful, thank you!

Miguel Rodríguez

01/04/2023, 10:29 PM
I guess I somehow need to tell lakeFS how to authenticate to my ADLS Gen2, but I don't see anywhere in the docs how to add the key to lakeFS (only the key to the underlying Azure Storage, but that's a different account and I can't seem to set two keys there). LMK if I can share any more info to help @Iddo Avneri

Amit Kesarwani

01/04/2023, 10:39 PM
@Miguel Rodríguez Can you try the `lakectl import` command instead of `lakectl ingest`? As per the documentation:
The lakectl CLI supports `import` and `ingest` commands to import objects from an external source.
• The `import` command acts the same as the UI import wizard. It imports (zero copy) and commits the changes on a `_<branch_name>_imported` branch, with an optional flag to also merge the changes to `<branch_name>`.
• The `ingest` command lists the source bucket (and optional prefix) from the client and creates pointers to the returned objects in lakeFS. The objects will be staged on the branch.
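[Editor's note: for illustration, a cross-account import invocation would look roughly like this. This is a sketch: the source container URL and lakeFS branch URI are placeholders, and the exact flags may vary by lakectl version.]

```
lakectl import \
   --from https://source_account.blob.core.windows.net/container/ \
   --to lakefs://my-repo/main/
```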

Miguel Rodríguez

01/04/2023, 10:41 PM
I get an authentication error on `lakectl import` because the storage account I want to import from is different from my lakeFS underlying storage account

Iddo Avneri

01/04/2023, 10:44 PM
That’s a great idea @Amit Kesarwani. I think one way or another we will need to grant the lakeFS account read access to these files… @Miguel Rodríguez - let us look into this and update you.

Miguel Rodríguez

01/04/2023, 10:52 PM
Thank you @Iddo Avneri and @Amit Kesarwani, looking forward to your reply. I also tried installing lakeFS in the same storage account that I need to import data from, but I had problems because hierarchical namespace is not supported for lakeFS's underlying storage. I opened an issue on GitHub so that you can please document this for future people installing lakeFS on Azure: https://github.com/treeverse/lakeFS/issues/4931

Iddo Avneri

01/04/2023, 10:57 PM
Thank you for the context @Miguel Rodríguez. Let us consult and get back to you.
:dancing_lakefs: 1

Amit Kesarwani

01/04/2023, 11:06 PM
@Miguel Rodríguez I am using hierarchical namespace (a.k.a. ADLS Gen2) for my setup and it works fine. Would you like to get on a call with me to resolve this issue? If you can’t, then I will send the instructions.
@Miguel Rodríguez You can authenticate with a secret key for ADLS Gen2: https://docs.lakefs.io/v0.86/setup/storage/blob.html#authenticate-with-a-secret-key (this page is missing in the latest doc and we will fix it)
@Miguel Rodríguez I used the following instructions for my setup:
• From the Azure portal, go to Azure Storage Accounts
• Create a Storage Account
  ◦ Resource Group: xyz
  ◦ Storage account name: abc
  ◦ Region: (US) West US 2
  ◦ Advanced > Data Lake Storage Gen2 > Enable hierarchical namespace
• Authenticate with a secret key: go to the Access Keys tab on the left menu panel and click Show Keys
• lakeFS configuration:
blockstore:
  type: azure
  azure:
    auth_method: access-key
    storage_account: "abc"
    storage_access_key: "xxxxxxxx"
@Miguel Rodríguez I verified that the `lakectl ingest` command works with 2 different storage accounts. Please use the following commands:
lakectl config
Access key ID: <lakeFS Access Key>
Secret access key: <lakeFS Secret Key>
Server endpoint URL: http://<lakeFS_server_ip_address>:8000/api/v1

export AZURE_STORAGE_ACCOUNT="from_storage_account"
export AZURE_STORAGE_ACCESS_KEY="access_key_for_storage_account"
lakectl ingest \
   --from https://from_storage_account.blob.core.windows.net/container/ \
   --to lakefs://my-repo/main/

Niro

01/05/2023, 8:17 AM
@Miguel Rodríguez I replied to the issue you've opened. Please note that the error you are receiving comes from Azure Blob Storage and not from lakeFS. The feature you were trying to use is not supported in HNS mode: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-feature-support-in-storage-accounts

Miguel Rodríguez

01/05/2023, 10:05 AM
Thank you @Amit Kesarwani, I did what you mentioned to install lakeFS on ADLS Gen2, but I got the error mentioned in the GitHub issue when trying to write multiple large files with Spark. I'm happy to jump on a call to discuss today, or one of these days if you have the time. @Niro I replied to the issue too; please note I'm not using the Blob Tags feature myself, it's probably the lakeFS installation trying to use it

Amit Kesarwani

01/05/2023, 4:18 PM
@Miguel Rodríguez Will today 10am PST work for you? If not, please schedule a call with me: https://calendly.com/d/dmv-xcy-6zr/meeting-with-amit-iddo

Miguel Rodríguez

01/05/2023, 5:29 PM
Yeah @Amit Kesarwani, that time is good! See you in 30 min then. You have my email already, you can send it there 🙂

Amit Kesarwani

01/05/2023, 5:42 PM
@Miguel Rodríguez I sent you the meeting invite to your Gmail address. If you didn’t receive it then here is the Zoom link for the call at 10am PST: https://us02web.zoom.us/j/84503853423?pwd=UjRDem5QNGUwbVNwU3RIWlpycEp4UT09
👍 1