venkadesan elangovan

09/29/2022, 5:25 PM
lakeFS is installed on an on-premises server, running on top of a MinIO cluster. We want to migrate the data from the on-premises lakeFS server to Azure ADLS storage. Is there proper documentation for this migration? We are considering a few options:
1) Create an HTTP linked service between Azure and the on-premises lakeFS in Azure Data Factory, then access the lakeFS objects using the lakeFS API.
2) Can I use the Azure Data Factory S3 connector to pull data directly from MinIO, without going through lakeFS?
3) Is it possible to connect to an on-premises lakeFS server using the Azure Data Factory Hadoop connector?
Barak Amar

09/29/2022, 5:55 PM
Hi @venkadesan elangovan, I'll try to answer the above questions first:
1. Based on https://learn.microsoft.com/en-us/azure/data-factory/connector-amazon-simple-storage-service?tabs=data-factory#connector-configuration-details you can configure the service to use a custom S3 endpoint (point it to your on-premises lakeFS) and pass your lakeFS credentials to access the data you currently have on-premises from Azure Data Factory. You will probably need to open access to lakeFS (HTTPS termination, load balancer, etc.).
2. Pulling data directly from MinIO will require querying metadata from lakeFS to understand which file is which. It will be easier to export the data from lakeFS so that you get it in the same layout you currently see through lakeFS.
3. The Hadoop connector is for HDFS (https://learn.microsoft.com/en-us/azure/data-factory/connector-hdfs?tabs=data-factory), which will not give you access to lakeFS.
Option 1 is the connector you want to use. But when you wrote that you would like to migrate to Azure, will the end result be running lakeFS on Azure, or do you just want to export the data from your local lakeFS? If you want to copy the data, check https://docs.lakefs.io/reference/export.html. Depending on the size of the data and the way you want to sync it, I suggest looking into rclone or aws dist-cp.
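Before wiring up Azure Data Factory, it may help to check that the lakeFS S3-compatible endpoint is reachable with your lakeFS credentials. Here is a minimal sketch in Python, assuming boto3 is installed. Through the lakeFS S3 gateway the repository name acts as the bucket and the key starts with the branch (or other ref). The endpoint URL, credentials, and repository/branch names below are placeholders, and the helper function names are made up for illustration, so adjust them to your setup:

```python
from typing import Iterator, Tuple


def lakefs_s3_location(repo: str, ref: str, path: str = "") -> Tuple[str, str]:
    """Map a lakeFS repo/ref/path to the (bucket, key) pair used by the S3 gateway.

    The repository is addressed as the bucket; the key is "<ref>/<path>".
    """
    return repo, f"{ref}/{path.lstrip('/')}"


def list_lakefs_objects(
    endpoint_url: str,   # placeholder, e.g. "https://lakefs.example.com"
    access_key: str,     # lakeFS access key ID (not the MinIO one)
    secret_key: str,     # lakeFS secret access key
    repo: str,
    ref: str,
    prefix: str = "",
) -> Iterator[str]:
    """Yield object keys under repo/ref/prefix via the lakeFS S3 endpoint."""
    import boto3  # imported lazily so the path helper works without boto3

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    bucket, key_prefix = lakefs_s3_location(repo, ref, prefix)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=key_prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

If something like `list(list_lakefs_objects("https://lakefs.example.com", "<key>", "<secret>", "example-repo", "main"))` returns the keys you expect, the same endpoint and credentials should be what the Data Factory S3 connector needs.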