user (01/27/2023, 4:56 PM): git checkout on an S3 bucket with zero-copy
user (01/27/2023, 5:42 PM):
> or you’d like to manage an existing S3 bucket within lakeFS with zero-copy
I think it would be this. Here's an example:
commit 1:
• add hello_world.txt to s3://my/bucket/path
• commit to lakeFS with import from the s3 bucket
commit 2:
• delete hello_world.txt from s3://my/bucket/path
• add hello_world_2.txt to s3://my/bucket/path
• commit to lakeFS with import from the s3 bucket
Now, in my s3 bucket, I want to "checkout" commit 1, so I should see hello_world.txt instead of hello_world_2.txt in my s3 bucket.
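A sketch of how the two import commits above might be created with lakectl (the repository and branch names here are made up, and the exact import command varies by lakeFS version — older releases use lakectl ingest, newer ones lakectl import — so check your version's docs):

```shell
# Zero-copy import: record the S3 objects' locations in lakeFS metadata
# without copying any data (hypothetical repo "my-repo", branch "main").
lakectl ingest --from s3://my/bucket/path/ --to lakefs://my-repo/main/path/
lakectl commit lakefs://my-repo/main -m "commit 1: hello_world.txt"

# ...after hello_world.txt is replaced by hello_world_2.txt in the bucket...
lakectl ingest --from s3://my/bucket/path/ --to lakefs://my-repo/main/path/
lakectl commit lakefs://my-repo/main -m "commit 2: hello_world_2.txt"
```

Note that a zero-copy import only records pointers to the objects: commit 1 still lists hello_world.txt, but if the object was actually deleted from the bucket, reading it through lakeFS will fail because the underlying data is gone.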
01/27/2023, 6:09 PMyou do this directly against the S3 bucket, right?Yes, isn't this the only way to do zero-copy?
if you’ll upload/delete the files using lakeFS and let lakeFS manage the underlying storage, once checking out to commit 1, the files will be accessible (or if they weren’t deleted from the underlying object store).Would this mean not using zero-copy? Or what use case does this look like?
user (01/27/2023, 6:13 PM): [What would the] lakectl fs upload command and/or python API look like for using an s3 bucket?
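When lakeFS manages the storage itself (no zero-copy import), uploads and deletes go through lakeFS. A hedged sketch with lakectl, using hypothetical repo/branch/path names:

```shell
# Upload a local file into a branch; lakeFS writes the object into the
# repository's storage namespace and tracks it:
lakectl fs upload lakefs://my-repo/main/images/cat.jpg -s /home/conor/cat.jpg
# Delete an object from the branch:
lakectl fs rm lakefs://my-repo/main/images/cat.jpg
# Uncommitted changes become part of history only on commit:
lakectl commit lakefs://my-repo/main -m "add cat.jpg"
```

Because lakeFS owns the underlying objects in this mode, checking out an older commit can still serve the old file contents, as long as the data hasn't been garbage-collected from the storage namespace.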
01/27/2023, 6:21 PMfile://home/conor/cat.jpg▾
user (01/27/2023, 6:24 PM): git checkout equivalent? Say I want my local files and S3 bucket to checkout a different commit.
01/27/2023, 6:29 PMA rollback operation is used to to fix critical data errors immediately.
01/27/2023, 6:33 PMIf you'd like to access objects from a different commit without changing the branch reference, you can do so also, just like all branches are accessible, commits/refs or tags are accessible too
01/27/2023, 6:36 PMYou can point your model to work with a specific ref or tag, you don't need a revert/checkoutIs it a lakefs download?
user
01/27/2023, 6:37 PMuser
01/27/2023, 6:38 PMuser
01/27/2023, 6:40 PMuser
01/27/2023, 6:40 PMuser
01/27/2023, 6:41 PMuser
user (01/27/2023, 6:42 PM): ObjectsAPI? Can I download recursively from a folder in lakeFS?
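For a recursive download, one option is to go through the lakeFS S3 gateway with rclone (as set up later in this thread), since rclone copies directories recursively by default. A sketch, assuming a configured remote named lakefs: and hypothetical repo/ref/path names:

```shell
# Copy everything under a lakeFS "folder" at a given ref to a local directory:
rclone copy lakefs:my-repo/main/datasets/train/ ./train/
```

With the S3 gateway, the first path segment is the repository and the second is the ref (branch, tag, or commit ID).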
user (01/27/2023, 6:59 PM): git pull equivalent for a whole branch and not individual objects
01/27/2023, 6:59 PMI want to point to a specific ref or tag when training and have the option to read data from local disk or directly from S3To satisfy the first option here
user (01/27/2023, 7:01 PM): git checkout && git pull equivalent
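Through the S3 gateway, a rough checkout-and-pull equivalent is rclone sync against the ref you want, which makes the local directory match that ref. A sketch with hypothetical repo, tag, and commit names:

```shell
# Make ./data mirror the state of the repo at tag v1.0
# (rclone sync deletes local files not present at that ref -- use with care):
rclone sync lakefs:my-repo/v1.0/ ./data/
# Later, "checkout" a specific commit instead:
rclone sync lakefs:my-repo/1a2b3c4d/ ./data/
```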
user (01/27/2023, 7:15 PM):
> If they're small, rsync should do the trick anyway
Do you happen to have an idea of what small means here quantitatively?
user (01/27/2023, 9:24 PM):
rclone lsd lakefs:
yields
-1 2023-01-27 12:36:40 -1 demo
which is my only lakeFS repository right now.
However, even when trying
rclone ls lakefs:
I get
Failed to ls: RequestError: send request failed
caused by: Get "https://demo.lakefs.example.com/?delimiter=&encoding-type=url&list-type=2&max-keys=1000&prefix=": dial tcp: lookup demo.lakefs.example.com on 127.0.0.53:53: no such host
with similar errors for rclone sync, etc.
Any idea on this? Could it be related to using Backblaze?
user (01/29/2023, 6:36 PM): The demo part is the name of the repository (you can see it in the ls command). It seems to add that into the host name. I substituted "example" in the actual endpoint name. The config looks like this:
[lakefs]
type = s3
provider = AWS
env_auth = false
access_key_id = xxxxxxxxxxxxxxxxxx
secret_access_key = xxxxxxxxxxxxxxxxx
endpoint = https://lakefs.example.com
no_check_bucket = true
user (01/30/2023, 6:39 PM): [One thing to check with] rclone is --s3-force-path-style
user (01/30/2023, 6:45 PM):
1. [We ran] rclone ls --s3-force-path-style lakefs: and got the same error
2. Our rclone config is based exactly on that integration guide
user (01/30/2023, 6:50 PM):
<7>DEBUG : rclone: Version "v1.61.1" starting with parameters ["rclone" "ls" "-vv" "lakefs:"]
<7>DEBUG : rclone: systemd logging support activated
<7>DEBUG : Creating backend with remote "lakefs:"
<7>DEBUG : Using config file from "/home/conor/.config/rclone/rclone.conf"
<7>DEBUG : 3 go routines active
Failed to ls: RequestError: send request failed
caused by: Get "https://demo.lakefs.example.com/?delimiter=&encoding-type=url&list-type=2&max-keys=1000&prefix=": x509: certificate is valid for lakefs.example.com, not demo.lakefs.example.com
Not much helpful info imo 😅
But note the error is a bit different since @Matija Teršek changed something on the server side.
user (01/30/2023, 6:56 PM):
rclone lsd lakefs: yields
-1 2023-01-27 14:42:27 -1 demo
which is our one lakeFS repo right now.
user (01/30/2023, 6:57 PM): [Can you try] provider = Other in place of provider = AWS? I want to rule that out before we look elsewhere.
user (01/30/2023, 6:59 PM): --s3-force-path-style=true, that's still needed even with provider = Other
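Both suggestions can be baked into the rclone config instead of being passed per-command; force_path_style stops rclone from prepending the repository name to the endpoint host, which is what caused the no-such-host and certificate errors above. A sketch of the adjusted config, based on the one shared earlier in the thread:

```
[lakefs]
type = s3
provider = Other
env_auth = false
access_key_id = xxxxxxxxxxxxxxxxxx
secret_access_key = xxxxxxxxxxxxxxxxx
endpoint = https://lakefs.example.com
no_check_bucket = true
force_path_style = true
```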