We have deployed dataset to a local lakefs (docker...
# help
h
We have deployed dataset to a local lakefs (docker). Question: later on, we will have a lakefs deployed in azure. How do you migrate the data and history to the new lakefs server?
n
Hi @HT, welcome to the lake! Are you planning on using the same storage with a new lakeFS deployment, or do you intend to migrate both the storage and lakeFS?
h
Both. Current storage is local disk. Later on will be azure blob storage
n
lakeFS currently has no feature to support migration between different block stores. If you have a relatively small repository, it might be possible to achieve this using the export / import functionality and some scripting. Note that you will have to build your repos and branches manually and then export and import each commit separately.
h
Thanks. Good to know this in advance. I will take a deeper look. Thank you
If the history is not too big and we don't have too many branches, could we check out each commit from the local lakeFS, then commit it to the new lakeFS URL, in history order? Would that work?
b
Hi @HT, I want to suggest an alternative way to move your environment. It assumes that you didn't import any data while working locally and only uploaded data. We can use lakectl's refs-dump command and then restore the repository information (commit all changes before the dump). Copy the local data to Azure and override the repository storage namespace with the new location. The commits, tags, and branch information should be kept.
h
No we don't have imported data.
So from what I understand: migrate the data from local disk to blob storage using something like rclone, then get a refs-dump of the local lakeFS, then refs-restore to the cloud lakeFS, and lastly replace all the namespaces to point to the cloud lakeFS?
b
Steps, per repository:
1. Commit changes
2. refs-dump
3. Copy the lakeFS data to the Azure blob store (rclone)
4. Create a bare repository pointing to the Azure blob store
5. refs-restore
You can experiment with the above on a small repository that takes only a short time to create and restore. Let us know if you have any issues.
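As a rough sketch of the five steps, the function below only prints the commands for one repository instead of running them. The config file names `old.yaml` and `new.yaml` follow the ones used later in this thread; the manifest file name, the commit message, the rclone destination, and the storage namespace are placeholders, and the exact `lakectl` flags should be checked against `lakectl --help` for your version.

```shell
#!/usr/bin/env bash
# Dry run: print the migration commands for one repository, in order.
# All argument values are placeholders supplied by the caller.
print_migration_plan() {
  local repo="$1"        # repository name, e.g. myrepo
  local src_dir="$2"     # local directory holding the repository's data
  local remote="$3"      # rclone destination, e.g. azure:container/myrepo
  local namespace="$4"   # storage namespace of the new repository

  # 1. Commit any uncommitted changes (shown for main; repeat per branch).
  echo "lakectl -c old.yaml commit lakefs://${repo}/main -m 'pre-migration commit'"
  # 2. Dump branches, tags and commits to a manifest.
  echo "lakectl -c old.yaml refs-dump lakefs://${repo} > manifest.json"
  # 3. Copy the underlying data to the new storage.
  echo "rclone copy ${src_dir} ${remote}"
  # 4. Create a bare repository on the new server.
  echo "lakectl -c new.yaml repo create-bare lakefs://${repo} ${namespace}"
  # 5. Restore the refs from the manifest.
  echo "lakectl -c new.yaml refs-restore lakefs://${repo} --manifest manifest.json"
}
```

Printing the plan first makes it easy to review the order and the paths before running anything against a real repository.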
h
Will give it a try with 2 local lakefs. It's easier to setup ;)
@Barak Amar can you explain a bit more about step 3 (copy the lakeFS data to the Azure blob store with rclone)? I have 2 local lakeFS servers: old and new. In the old one, I have lakefs://myrepo/main, which contains a bunch of files and commits. I committed all changes for that repo, then ran
lakectl -c old.yaml refs-dump lakefs://myrepo > manifest.json
I created a new bare repository on the new server:
lakectl -c new.yaml repo create-bare lakefs://my-new-repo local://new-repo
It is currently empty. What should I rclone now?
n
@HT did you create a bare repository or a regular repository?
b
@HT About step 3, copying the data to Azure: my bad, it should probably be swapped with step 4, creating the bare repository. When you create a repository, you select the location where you would like to keep the lakeFS metadata and data.
About the copy itself: in your old lakeFS, assuming it is local, the data is kept under ~/lakefs/data/block by default. The location is controlled by configuration or an environment variable. Each repository's data is found under a folder named after the repository, e.g. ~/lakefs/data/block/example.
Copy the content of this folder to the Azure storage namespace you specified when you created the repository on the new installation.
At this point you will have all the data and committed metadata on Azure. Once the refs are restored, you will have access to the tags, branches, and commits of this repository.
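A minimal sketch of that copy step, assuming rclone is used and a remote for the Azure storage account is already configured. `LAKEFS_BLOCK_ROOT` is a variable name made up for this sketch (it is not a lakeFS setting), and the remote name `azure:` in the usage line is a placeholder. The function only prints the command so it can be checked first.

```shell
#!/usr/bin/env bash
# Build the rclone command that copies one repository folder from the
# local block directory to the Azure location used as the new storage
# namespace. LAKEFS_BLOCK_ROOT is a hypothetical override for this sketch;
# the default matches the path mentioned above.
rclone_copy_cmd() {
  local folder="$1"   # folder name under the block root, e.g. example
  local remote="$2"   # rclone destination, e.g. azure:mycontainer/example
  local root="${LAKEFS_BLOCK_ROOT:-$HOME/lakefs/data/block}"
  printf 'rclone copy %s/%s %s\n' "$root" "$folder" "$remote"
}
```

For example, `rclone_copy_cmd example azure:mycontainer/example` prints the command to run. Adding rclone's `--dry-run` flag on the first attempt shows what would be transferred without copying anything.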
h
@Barak Amar Success! I managed to move one repo from one local server to another. There are 2 questions: 1. How do you delete a bare repo? 2. lakectl refs-dump outputs something like:
Repository: lakefs://tomato

{
  "branches_meta_range_id": "41dcb9faf1495bf6b3915ce5c18c44eb74cd88ba076459ac2f3348992197b537",
  "commits_meta_range_id": "175ab55e94e8ad486c30c3967653202de305d8b0dd01bffd75e4aae57578a2f4",
  "tags_meta_range_id": "e77ccbca0369cb68eb14455302ee10f4baaf7aff716ec1b591aeb9b86b198d19"
}
while the JSON itself is only:
{
  "branches_meta_range_id": "41dcb9faf1495bf6b3915ce5c18c44eb74cd88ba076459ac2f3348992197b537",
  "commits_meta_range_id": "175ab55e94e8ad486c30c3967653202de305d8b0dd01bffd75e4aae57578a2f4",
  "tags_meta_range_id": "e77ccbca0369cb68eb14455302ee10f4baaf7aff716ec1b591aeb9b86b198d19"
}
This looks like a bug ...
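For a lakectl version whose output still includes the header line, one possible workaround is to filter the output down to the JSON object before saving it as a manifest. This is only a sketch: it assumes the object's opening and closing braces sit alone on their own lines, as in the output shown above, and `extract_manifest_json` is a name made up for this example.

```shell
#!/usr/bin/env bash
# Keep only the JSON object from refs-dump output that begins with a
# "Repository: ..." header: print from the first line that is exactly "{"
# through the matching line that is exactly "}".
extract_manifest_json() {
  sed -n '/^{$/,/^}$/p'
}
```

Usage would look like `lakectl refs-dump lakefs://tomato | extract_manifest_json > manifest.json`.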
b
1. Same as any other repo. By the way, it is no longer bare after we restore the refs. 2. Yes, thank you! If you can open an issue, we will address this one.
h
Thanks! I was going to log an issue, but it had already been fixed when I woke up! Thanks!
b
tz differences 🙂
It will be part of this week's release 🙏
h
Re: deleting the bare repo. I could not delete them from the web UI, and I had forgotten about the CLI. I have now managed to delete them from the CLI.
i
Thanks @HT. Does that mean you are good to go?
h
Yes. I managed to delete and clean up the bare repo
i
Excellent. 🙏