# help
j
Hello, I'm having problems exporting my data out of a lakeFS repository. My goal is to export a repository called `lakefs-learning` either to my local machine or to a GCS bucket called `julien_learning`. I'm trying to use rclone v1.63.0, and have followed the tutorials available at https://docs.lakefs.io/howto/export.html and https://docs.lakefs.io/howto/copying.html. I first tried the docker command, which gave me the error:
```
2023/07/12 13:52:03 Failed to create file system for "gs://julien_learning/lakefs-backup/": didn't find section in config file
rclone copy failed
```
Then I tried using rclone from my command line to have more control over the process. I'm on Windows 10 Pro, running rclone v1.63.0 on WSL Ubuntu 20.04.6 LTS, with lakeFS deployed on localhost for trying it out. I made a remote for lakeFS following the interactive `rclone config`, but was never prompted to enter a `no-check-bucket` option. When trying `rclone lsd lakefs:`, I get the error:
```
2023/07/12 09:40:06 ERROR : : error listing: NoSuchBucket: The specified bucket does not exist
        status code: 404, request id: , host id:
2023/07/12 09:40:06 DEBUG : 6 go routines active
2023/07/12 09:40:06 Failed to lsd with 2 errors: last error was: NoSuchBucket: The specified bucket does not exist
        status code: 404, request id: , host id:
```
I have also tried adding the `--s3-no-check-bucket` flag, but this doesn't change anything. To make sure it's not just me failing to use rclone, I made another remote for my GCS bucket (called `lakefs-learning`, bad name choice I know) and tried `rclone lsd lakefs-learning:julien_learning`, which correctly outputs the directories present. The `NoSuchBucket` error makes me feel like rclone is checking that the bucket exists when it shouldn't, because of a `no-check-bucket` flag that I do not have :((. Any help is much appreciated 🤗
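For reference, this is roughly the command I'm running with the flag (remote name as configured above; the `-vv` is just to get the debug output shown):
```
rclone lsd lakefs: --s3-no-check-bucket -vv
```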
Update: the first time I tried to create my `lakefs` remote I was on version 1.50.2. When retrying with v1.63.0, in the advanced options, I see:
```
Option no_check_bucket.
If set, don't attempt to check the bucket exists or create it.
This can be useful when trying to minimise the number of transactions
rclone does if you know the bucket exists already.
It can also be needed if the user you are using does not have bucket
creation permissions. Before v1.52.0 this would have passed silently
due to a bug.
Enter a boolean value (true or false). Press Enter for the default (false).
no_check_bucket> true
```
Which is already a step further. However, `rclone lsd lakefs:` still gives the same error.
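(Side note in case it helps anyone else: I believe the option can also be set on an existing remote without redoing the whole wizard, along these lines, though I haven't tried this path myself:)
```
# Set no_check_bucket on an existing remote named lakefs (rclone >= v1.52)
rclone config update lakefs no_check_bucket true
# Show the remote's settings to confirm
rclone config show lakefs
```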
i
Hi @Jubiiz Audet 👋 Thanks for providing all that context upfront! To set the context: rclone reads from lakeFS as if it were just another S3 target.

Regarding the Docker option: this is an image we created to help, but it works with S3 only, so I don't think it's going to work at all, as you can see from the error (didn't find section in config file). Essentially the docker image is a Python script with an rclone config (s3) that you can use as a reference implementation, but it will not work with GCS.

Regarding your attempt to use standalone rclone: assuming your lakeFS instance is configured correctly, I suspect this is an issue with your rclone config. Looking now, the lakeFS documentation on using rclone references an older version, where part of the config was the `provider = AWS` field:
```
[lakefs]
provider = AWS
```
In newer rclone versions this will not work anymore, because rclone will try to talk to AWS instead of lakeFS, so the field needs to be `provider = Other`. I tested my local lakeFS server against GCS and ran `rclone copy` and other commands with the following config, and it worked:
```
[lakefs]
type = s3
provider = Other
env_auth = false
access_key_id = <lakefs-access-key-id>
secret_access_key = <lakefs-secret-key>
endpoint = http://localhost:8000
no_check_bucket = true

[gcs]
type = google cloud storage
env_auth = true
```
Can you please try that? If that still doesn't work, please attach your rclone config, lakeFS server logs, and lakeFS config (and remember to omit sensitive information 🙂). Thanks!
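Once `rclone lsd lakefs:` works, a copy out to GCS should look roughly like this; the repo, bucket, and path names are the ones from this thread, and the `main` branch is my assumption, so adjust to your repo layout:
```
# Copy the main branch of the lakefs-learning repo into the GCS bucket
rclone copy lakefs:lakefs-learning/main gcs:julien_learning/lakefs-backup --progress
```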
j
So the provider was already set to 'Other', but when I changed the endpoint from 'http://127.0.0.1:8000/api/v1' to 'https://localhost:8000', my command line would just freeze (the command never finished running). A coworker pointed out that I had put 'https' as opposed to 'http', and now my `rclone lsd` and `rclone ls` work! Thanks a ton for the help.
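For completeness, the line that ended up working in my rclone.conf, with the rest matching the config you posted:
```
# The fix: plain http, and no /api/v1 suffix
endpoint = http://localhost:8000
```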
i
Yay! Glad that worked out 🙏