#general
taylor schneider

09/08/2022, 8:38 PM
Hey folks. Just set up a self-hosted 0.70.2 installation. I am walking through the wizard to create a "Spark Quickstart" repository. I am seeing the following error when I try to import from a local storage path: "creating object-store walker: no storage adapter found: for scheme: local". Anyone know what this means?
Oz Katz

09/08/2022, 8:46 PM
Hey @taylor schneider ! welcome to the lakeFS community 🤗 I agree this error message could do a better job of explaining the problem: importing in lakeFS doesn't actually copy data; it creates pointers to existing data objects on an object store. This means that for import to work, you'd need to connect your lakeFS installation to an object store first. Alternatively, if you're OK with simply uploading the data to your repository, you can keep your configuration as is and skip the import. You'll be able to upload your data after the quickstart is done.
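For reference, lakeFS selects its storage backend from the `blockstore` section of its configuration; a minimal sketch of switching from `local` to S3 (key names follow the lakeFS configuration reference; the region and credentials below are placeholders):

```yaml
# Sketch: lakeFS blockstore configuration. With type "local" the import
# wizard cannot create pointers to external objects; an object store
# backend such as S3 is required. Credential values are placeholders.
blockstore:
  type: s3
  s3:
    region: us-east-1
    credentials:
      access_key_id: <your-access-key>
      secret_access_key: <your-secret-key>
```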
8:46 PM
let me know if this makes sense!
8:47 PM
btw, I'd suggest asking questions on #help or #lakefs-for-beginners - these channels are where devs and community members typically hang out and are always happy to assist 😀
taylor schneider

09/08/2022, 9:28 PM
@Oz Katz thanks for the warm intro! The link you posted isn't working. I think maybe some chars were truncated in the URL.
Oz Katz

09/08/2022, 9:29 PM
taylor schneider

09/08/2022, 9:29 PM
I had a look at the source code in github, and I think the problematic line is here: https://github.com/treeverse/lakeFS/blob/71db8f37656ad021f3178a376651519ed24f4cf7/pkg/ingest/store/factory.go#L158
9:29 PM
Basically, we can only walk paths for s3, gs (GCS), and http/https
9:36 PM
@Oz Katz can you confirm?
Oz Katz

09/08/2022, 9:38 PM
this is where this limitation is enforced. as mentioned, the `local` adapter does not support importing, since it refers to files on a local device that are not externally available to consumers
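To make that dispatch concrete, here is a hypothetical Python sketch of the idea (not the actual Go code in factory.go; the walker names are made up for illustration): only object-store schemes resolve to a walker, so a `local` path falls through to the error from the first message.

```python
from urllib.parse import urlparse

# Hypothetical walker names, just to make the dispatch concrete.
SUPPORTED_SCHEMES = {
    "s3": "s3-walker",
    "gs": "gcs-walker",
    "http": "http-walker",
    "https": "http-walker",
}

def walker_for_scheme(raw_url: str) -> str:
    """Return a walker for the URL's scheme; unsupported schemes raise."""
    scheme = urlparse(raw_url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        # Mirrors the message from the original error: local has no adapter.
        raise ValueError(f"no storage adapter found: for scheme: {scheme}")
    return SUPPORTED_SCHEMES[scheme]
```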
9:39 PM
another option if you want to work locally is to use something like MinIO, which can run locally and expose an S3 interface
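One common way to run MinIO locally is via Docker; a rough compose sketch (ports and credentials below are just examples, not recommendations):

```yaml
# Sketch: docker-compose for a local MinIO exposing the S3 API.
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"   # S3 API endpoint
      - "9001:9001"   # web console
    environment:
      MINIO_ROOT_USER: minioadmin       # example credentials only
      MINIO_ROOT_PASSWORD: minioadmin
```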
9:41 PM
are you trying to set up a local Spark + lakeFS environment? or do you have something else in mind @taylor schneider ?
taylor schneider

09/08/2022, 9:43 PM
@Oz Katz my use case was that I have an existing CephFS installation. I installed the drivers and mounted the Ceph cluster to appear as a local directory. This is done identically on my lakeFS nodes and my Spark nodes. I was thinking that through this method I could enable the lakeFS Hadoop filesystem. But after reading the source/blogs I see that the lakeFS Hadoop filesystem only supports S3.
9:44 PM
Will have a look at MinIO... I may just reconfigure Ceph to expose its S3 endpoint. But I prefer working in POSIX
Oz Katz

09/08/2022, 9:45 PM
ahhh got it! Ceph actually supports the S3 protocol: https://docs.ceph.com/en/latest/radosgw/s3/
9:46 PM
this should work well with lakeFS the same way minio does
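Concretely, that would mean pointing the lakeFS S3 blockstore at the RGW endpoint; a rough config sketch (the endpoint URL is hypothetical, key names per the lakeFS configuration reference):

```yaml
# Sketch: lakeFS using a Ceph RGW (or MinIO) S3-compatible endpoint.
blockstore:
  type: s3
  s3:
    endpoint: http://rgw.example.local:7480   # hypothetical RGW address
    force_path_style: true   # S3-compatible stores typically need path-style URLs
```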
9:48 PM
while POSIX is a richer filesystem API, lakeFS was designed with large-scale data lakes in mind, so it's optimized for object stores, which are currently the most prevalent storage solution for those.
taylor schneider

09/08/2022, 9:48 PM
when you say optimized do you mean in terms of speed or functionality?
Oz Katz

09/08/2022, 9:49 PM
interested to hear if you do manage to get ceph working with the s3 adapter!
9:49 PM
speed (as in throughput and parallelism, not latency) and compatibility. most data frameworks nowadays optimize specifically for object stores
9:50 PM
at least that's what we're seeing
taylor schneider

09/08/2022, 9:53 PM
OK, I will let you know if I get Ceph working. I would be curious to see how the HTTP traffic over S3 compares to the ceph-fuse driver speaking directly to the OSDs. My gut is telling me S3 would be slower, all things equal
10:38 PM
@Oz Katz is it correct to assume that the only difference between the "Spark Quickstart" and "Basic Repo" routes is the creature comforts? At the end of the day, both paths through the wizard create a repo. There isn't a difference in repo type / functionality, right?
Oz Katz

09/08/2022, 11:57 PM
that's 100% correct 🙂
taylor schneider

09/09/2022, 12:01 AM
Thanks @Oz Katz!