Conor Simmons
01/27/2023, 4:55 PM
git checkout on an S3 bucket with zero-copy
Or Tzabary
01/27/2023, 4:58 PM
Conor Simmons
01/27/2023, 5:42 PM
"or you’d like to manage existing S3 bucket within lakeFS with zero-copy"
I think it would be this. Here's an example:
commit 1:
• add hello_world.txt to s3://my/bucket/path
• commit to lakeFS with import from S3 bucket
commit 2:
• delete hello_world.txt from s3://my/bucket/path
• add hello_world_2.txt to s3://my/bucket/path
• commit to lakeFS with import from S3 bucket
Now, in my S3 bucket, I want to "checkout" commit 1, so I should see hello_world.txt instead of hello_world_2.txt in my S3 bucket.
Or Tzabary
01/27/2023, 5:55 PM
Conor Simmons
01/27/2023, 6:09 PM
"you do this directly against the S3 bucket, right?"
Yes, isn't this the only way to do zero-copy?
"if you’ll upload/delete the files using lakeFS and let lakeFS manage the underlying storage, once checking out to commit 1, the files will be accessible (or if they weren’t deleted from the underlying object store)."
Would this mean not using zero-copy? Or what use case does this look like?
Or Tzabary
01/27/2023, 6:10 PM
Conor Simmons
01/27/2023, 6:12 PM
lakectl fs upload command and/or Python API look like for using an S3 bucket?
Or Tzabary
01/27/2023, 6:16 PM
Conor Simmons
01/27/2023, 6:20 PM
Or Tzabary
01/27/2023, 6:20 PM
Conor Simmons
01/27/2023, 6:21 PM
file://home/conor/cat.jpg
Or Tzabary
01/27/2023, 6:22 PM
Conor Simmons
01/27/2023, 6:22 PM
Or Tzabary
01/27/2023, 6:23 PM
Conor Simmons
01/27/2023, 6:24 PM
git checkout equivalent? Say I want my local files and S3 bucket to check out a different commit
Or Tzabary
01/27/2023, 6:27 PM
Conor Simmons
01/27/2023, 6:28 PM
Or Tzabary
01/27/2023, 6:28 PM
Conor Simmons
01/27/2023, 6:29 PM
A rollback operation is used to fix critical data errors immediately.
Or Tzabary
01/27/2023, 6:31 PM
Conor Simmons
01/27/2023, 6:33 PM
If you'd like to access objects from a different commit without changing the branch reference, you can do so too: just as all branches are accessible, commits/refs and tags are accessible as well.
Or Tzabary
01/27/2023, 6:34 PM
Conor Simmons
01/27/2023, 6:36 PM
"You can point your model to work with a specific ref or tag, you don't need a revert/checkout"
Is it a lakeFS download?
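Editor's sketch of what "point your model to a specific ref or tag" can look like through the lakeFS S3-compatible endpoint, where the repository plays the role of the bucket and the ref (branch, tag, or commit ID) is the first segment of the object key. This is a minimal illustration under that assumption, not the approach confirmed in this thread; the helper name `lakefs_object_location`, the tag `v1`, and the object path are hypothetical.

```python
from typing import Tuple


def lakefs_object_location(repo: str, ref: str, path: str) -> Tuple[str, str]:
    """Return the (bucket, key) pair for an object at a given ref.

    Assumption: the S3-compatible endpoint maps bucket -> repository and
    expects the key to start with the ref, followed by the object path.
    """
    if not repo or not ref:
        raise ValueError("repo and ref are required")
    # Strip a leading slash so the key never contains "ref//path".
    return repo, f"{ref}/{path.lstrip('/')}"


# Hypothetical example: read the same path from a tag instead of a branch.
bucket, key = lakefs_object_location("demo", "v1", "images/cat.jpg")
print(f"s3://{bucket}/{key}")
```

Changing only the `ref` argument selects a different version of the same path, so no revert or checkout is involved; the resulting pair can be handed to any S3 client configured with the lakeFS endpoint.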
Or Tzabary
01/27/2023, 6:38 PM
Conor Simmons
01/27/2023, 6:41 PM
ObjectsAPI? Can I download recursively from a folder in lakeFS?
Or Tzabary
01/27/2023, 6:46 PM
Conor Simmons
01/27/2023, 6:49 PM
Or Tzabary
01/27/2023, 6:52 PM
Conor Simmons
01/27/2023, 6:59 PM
git pull equivalent for a whole branch and not individual objects
I want to point to a specific ref or tag when training and have the option to read data from local disk or directly from S3
To satisfy the first option here
Or Tzabary
01/27/2023, 7:01 PM
Conor Simmons
01/27/2023, 7:01 PM
git checkout && git pull equivalent
Or Tzabary
01/27/2023, 7:02 PM
Conor Simmons
01/27/2023, 7:02 PM
Or Tzabary
01/27/2023, 7:02 PM
Conor Simmons
01/27/2023, 7:03 PM
Or Tzabary
01/27/2023, 7:04 PM
Conor Simmons
01/27/2023, 7:11 PM
Or Tzabary
01/27/2023, 7:13 PM
Conor Simmons
01/27/2023, 7:14 PM
Or Tzabary
01/27/2023, 7:14 PM
Conor Simmons
01/27/2023, 7:15 PM
"If they're small, rsync should do the trick anyway"
Do you happen to have an idea of what small means here quantitatively?
Or Tzabary
01/27/2023, 7:18 PM
Conor Simmons
01/27/2023, 7:19 PM
Or Tzabary
01/27/2023, 7:19 PM
Conor Simmons
01/27/2023, 9:24 PM
rclone lsd lakefs:
yields
-1 2023-01-27 12:36:40 -1 demo
which is my only lakefs repository right now.
However, even when trying
rclone ls lakefs:
I get
Failed to ls: RequestError: send request failed
caused by: Get "https://demo.lakefs.example.com/?delimiter=&encoding-type=url&list-type=2&max-keys=1000&prefix=": dial tcp: lookup demo.lakefs.example.com on 127.0.0.53:53: no such host
with similar errors for rclone sync, etc.
Any idea on this? Could it be related to using Backblaze?
Iddo Avneri
01/27/2023, 9:31 PM
Conor Simmons
01/27/2023, 9:32 PM
Iddo Avneri
01/27/2023, 9:43 PM
Or Tzabary
01/29/2023, 10:04 AM
Conor Simmons
01/29/2023, 6:36 PM
demo part is the name of the repository (you can see it in the ls command). It seems to add that into the host name. I substituted "example" in the actual endpoint name. The config looks like this:
[lakefs]
type = s3
provider = AWS
env_auth = false
access_key_id = xxxxxxxxxxxxxxxxxx
secret_access_key = xxxxxxxxxxxxxxxxx
endpoint = https://lakefs.example.com
no_check_bucket = true
Or Tzabary
01/29/2023, 6:39 PM
Conor Simmons
01/29/2023, 6:41 PM
Or Tzabary
01/29/2023, 6:42 PM
Matija Teršek
01/30/2023, 5:59 PM
Elad Lachmi
01/30/2023, 6:16 PM
Ariel Shaqed (Scolnicov)
01/30/2023, 6:29 PM
Elad Lachmi
01/30/2023, 6:39 PM
rclone is --s3-force-path-style
Ariel Shaqed (Scolnicov)
01/30/2023, 6:43 PM
Conor Simmons
01/30/2023, 6:45 PM
1. rclone ls --s3-force-path-style lakefs: and got the same error
2. Our rclone config is based exactly on that integration guide
Ariel Shaqed (Scolnicov)
01/30/2023, 6:48 PM
Conor Simmons
01/30/2023, 6:50 PM
<7>DEBUG : rclone: Version "v1.61.1" starting with parameters ["rclone" "ls" "-vv" "lakefs:"]
<7>DEBUG : rclone: systemd logging support activated
<7>DEBUG : Creating backend with remote "lakefs:"
<7>DEBUG : Using config file from "/home/conor/.config/rclone/rclone.conf"
<7>DEBUG : 3 go routines active
Failed to ls: RequestError: send request failed
caused by: Get "https://demo.lakefs.example.com/?delimiter=&encoding-type=url&list-type=2&max-keys=1000&prefix=": x509: certificate is valid for lakefs.example.com, not demo.lakefs.example.com
Not much helpful info imo 😅
But note the error is a bit different since @Matija Teršek changed something on the server side
Elad Lachmi
01/30/2023, 6:52 PM
Conor Simmons
01/30/2023, 6:53 PM
rclone lsd lakefs:
yields
-1 2023-01-27 14:42:27 -1 demo
which is our one lakeFS repo right now
Elad Lachmi
01/30/2023, 6:57 PM
provider = Other in place of provider = AWS
I want to rule that out before we look elsewhere
Ariel Shaqed (Scolnicov)
01/30/2023, 6:58 PM
Elad Lachmi
01/30/2023, 6:59 PM
--s3-force-path-style=true, that's still needed even with provider = Other
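Editor's sketch of why the repository name ends up in the host name: S3 clients can address a bucket either virtual-hosted-style (bucket as a subdomain of the endpoint) or path-style (bucket as the first path segment). The failing host `demo.lakefs.example.com` in the errors above matches the virtual-hosted form. The helper `s3_bucket_url` below is hypothetical and only mimics the URL shapes; it is not rclone's implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def s3_bucket_url(endpoint: str, bucket: str, path_style: bool) -> str:
    """Build the base URL an S3 client would request for a bucket.

    path_style=False: bucket becomes a subdomain (virtual-hosted-style).
    path_style=True:  bucket becomes the first path segment (path-style).
    """
    parts = urlsplit(endpoint)
    if path_style:
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}", "", ""))
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", "/", "", ""))


endpoint = "https://lakefs.example.com"
# Virtual-hosted-style needs DNS and a TLS certificate covering the subdomain.
print(s3_bucket_url(endpoint, "demo", path_style=False))
# Path-style keeps the host that the certificate is actually valid for.
print(s3_bucket_url(endpoint, "demo", path_style=True))
```

With the virtual-hosted form, both the DNS lookup and the certificate check have to succeed for `demo.lakefs.example.com`, which is where the "no such host" and x509 errors in this thread come from; the path-style form only ever contacts `lakefs.example.com`.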
Conor Simmons
01/30/2023, 7:13 PM
Elad Lachmi
01/30/2023, 7:15 PM
Conor Simmons
01/30/2023, 7:17 PM