Thread
#general

    taylor schneider

    2 weeks ago
    Hey folks. Just set up a self-hosted 0.70.2 installation. I am walking through the wizard to create a "Spark Quickstart" repository. I am seeing the following error when I try to import from a local storage path: "creating object-store walker: no storage adapter found: for scheme: local". Anyone know what this means?
    Oz Katz

    2 weeks ago
    Hey @taylor schneider ! welcome to the lakeFS community 🤗 I agree this error message could do a better job of explaining the problem: importing in lakeFS doesn't actually copy data; it creates pointers to existing data objects on an object store. This means that for import to work, you'd need to connect your lakeFS installation to an object store first. Alternatively, if you're OK with simply uploading the data to your repository, you can keep your configuration as is and skip the import. You'll be able to upload your data after the quickstart is done.
    let me know if this makes sense!
    btw, I'd suggest asking questions on #help or #lakefs-for-beginners - these channels are where devs and community members typically hang out and are always happy to assist 😀
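    The import-vs-upload distinction Oz describes can be sketched in a few lines. This is a conceptual illustration only (the names and data structures are hypothetical, not the real lakeFS code or API): "import" records pointers to objects that already live on an object store, while "upload" actually copies bytes into lakeFS-managed storage.

    ```python
    # Conceptual sketch of lakeFS import vs. upload.
    # All names here are hypothetical; this is not the real lakeFS implementation.

    def import_objects(repo_index, external_objects):
        """'Import' just records pointers to objects that already exist
        on an object store -- no bytes are copied."""
        for logical_path, physical_address in external_objects.items():
            # e.g. physical_address = "s3://my-bucket/data/part-0000.parquet"
            repo_index[logical_path] = {"address": physical_address, "copied": False}

    def upload_object(repo_index, storage, logical_path, data: bytes):
        """'Upload' writes the bytes into managed storage, then records
        a pointer to the newly created physical object."""
        physical_address = f"managed/{logical_path}"
        storage[physical_address] = data  # the actual copy happens here
        repo_index[logical_path] = {"address": physical_address, "copied": True}

    index, storage = {}, {}
    import_objects(index, {"raw/events.parquet": "s3://bucket/events.parquet"})
    upload_object(index, storage, "raw/users.csv", b"id,name\n1,taylor\n")
    ```

    This is why import requires a reachable object store: there has to be something for the pointers to point at.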

    taylor schneider

    2 weeks ago
    @Oz Katz thanks for the warm intro! The link you posted isn't working. I think maybe some chars were truncated in the URL.
    Oz Katz

    2 weeks ago

    taylor schneider

    2 weeks ago
    I had a look at the source code on GitHub, and I think the problematic line is here: https://github.com/treeverse/lakeFS/blob/71db8f37656ad021f3178a376651519ed24f4cf7/pkg/ingest/store/factory.go#L158
    Basically, we can only walk paths for s3, gs, and http / https
    @Oz Katz can you confirm?
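    The pattern in factory.go can be sketched roughly like this (illustrative Python, not the actual Go source): the walker factory dispatches on the URI scheme and fails for anything unregistered, which is why a local path produces the error above.

    ```python
    from urllib.parse import urlparse

    # Simplified sketch of the scheme-dispatch pattern used by the walker
    # factory in pkg/ingest/store/factory.go. Illustrative only; the real
    # implementation is in Go and structured differently.
    WALKERS = {
        "s3":    lambda uri: f"walking S3 prefix {uri}",
        "gs":    lambda uri: f"walking GCS prefix {uri}",
        "http":  lambda uri: f"walking HTTP listing {uri}",
        "https": lambda uri: f"walking HTTPS listing {uri}",
    }

    def get_walker(uri: str):
        scheme = urlparse(uri).scheme
        walker = WALKERS.get(scheme)
        if walker is None:
            # This is the failure mode hit with a local path:
            raise ValueError(
                f"creating object-store walker: no storage adapter found: for scheme: {scheme}"
            )
        return walker(uri)
    ```

    A `local:///mnt/...` URI has no registered adapter, so the lookup falls through to the error.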
    Oz Katz

    2 weeks ago
    this is where this limitation is enforced. as mentioned, the "local" adapter does not support importing, since it points to files on a local device that aren't externally available to consumers
    another option if you want to work locally is to use something like MinIO, which can run locally and expose an S3 interface
    are you trying to set up a local Spark + lakeFS environment? or do you have something else in mind @taylor schneider ?
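    For the MinIO route, the lakeFS blockstore has to be pointed at the local S3-compatible endpoint. A sketch of the relevant environment variables is below; the key names follow lakeFS's `LAKEFS_` + config-path convention as I recall it, so verify them against the lakeFS configuration reference, and the endpoint and credentials are placeholders.

    ```python
    import os

    # Sketch: pointing a lakeFS blockstore at a local MinIO (or any other
    # S3-compatible) endpoint via environment variables. Key names and values
    # are assumptions to be checked against the lakeFS configuration docs;
    # localhost:9000 and minioadmin are MinIO's common defaults, not secrets.
    minio_env = {
        "LAKEFS_BLOCKSTORE_TYPE": "s3",
        "LAKEFS_BLOCKSTORE_S3_ENDPOINT": "http://localhost:9000",
        "LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE": "true",  # MinIO uses path-style URLs
        "LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID": "minioadmin",
        "LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY": "minioadmin",
    }
    os.environ.update(minio_env)  # set these before starting the lakefs process
    ```

    With the blockstore type set to `s3` rather than `local`, the import walker has a registered adapter to use.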

    taylor schneider

    2 weeks ago
    @Oz Katz my use case is that I have an existing CephFS installation. I installed the drivers and mounted the Ceph cluster so it appears as a local directory. This is done identically on my lakeFS nodes and my Spark nodes. I was thinking that through this method I could enable the lakeFS Hadoop filesystem. But after reading the source/blogs I see that the lakeFS Hadoop filesystem only supports S3.
    Will have a look at MinIO... I may just reconfigure Ceph to expose its S3 endpoint. But I prefer working in POSIX
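    For reference, wiring Spark to the lakeFS Hadoop FileSystem looks roughly like the configuration below. The key names are taken from the lakeFS docs as best I remember them, so treat them as assumptions and verify against the current lakeFS Spark documentation; all endpoints and credentials are placeholders.

    ```python
    # Sketch: Spark configuration for the lakeFS Hadoop FileSystem, which
    # reads/writes the backing store over its S3 interface. Key names are
    # assumptions to verify against the lakeFS docs; endpoints/credentials
    # below are placeholders, not real values.
    spark_conf = {
        # Route lakefs:// URIs through the lakeFS Hadoop FileSystem
        "spark.hadoop.fs.lakefs.impl": "io.lakefs.LakeFSFileSystem",
        "spark.hadoop.fs.lakefs.endpoint": "http://lakefs.example.com:8000/api/v1",
        "spark.hadoop.fs.lakefs.access.key": "<lakefs-access-key>",
        "spark.hadoop.fs.lakefs.secret.key": "<lakefs-secret-key>",
        # S3A settings for the backing store (MinIO / Ceph RGW / AWS S3)
        "spark.hadoop.fs.s3a.endpoint": "http://rgw.example.com:7480",
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }
    # Usage idea: pass each pair to SparkSession.builder.config(k, v), then
    # read paths like "lakefs://my-repo/main/path/to/data/".
    ```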
    Oz Katz

    2 weeks ago
    ahhh got it! Ceph actually supports the S3 protocol https://docs.ceph.com/en/latest/radosgw/s3/
    this should work well with lakeFS, the same way MinIO does
    while POSIX is a richer filesystem API, lakeFS was designed with large-scale data lakes in mind, so it's optimized for object stores, which are currently the most prevalent storage solution for those.

    taylor schneider

    2 weeks ago
    when you say optimized do you mean in terms of speed or functionality?
    Oz Katz

    2 weeks ago
    interested to hear if you do manage to get ceph working with the s3 adapter!
    speed (as in throughput and parallelism, not latency) and compatibility. most data frameworks nowadays optimize specifically for object stores
    at least that's what we're seeing
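    The throughput-vs-latency point can be illustrated with a toy sketch (pure Python, not a real benchmark, and an in-memory blob stands in for an object): each object-store request is relatively slow, but many ranged GETs can be issued concurrently, so aggregate throughput scales with parallelism.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    # Toy illustration: object-store clients favor parallel ranged GETs over
    # one sequential stream. The in-memory blob below stands in for an object;
    # a real client would issue HTTP requests with a "Range: bytes=..." header.
    blob = bytes(range(256)) * 4096          # ~1 MiB stand-in "object"
    CHUNK = 64 * 1024

    def ranged_get(offset: int) -> bytes:
        # stand-in for: GET /bucket/key with a byte-range header
        return blob[offset:offset + CHUNK]

    offsets = range(0, len(blob), CHUNK)
    with ThreadPoolExecutor(max_workers=8) as pool:
        chunks = list(pool.map(ranged_get, offsets))  # ranges fetched concurrently

    reassembled = b"".join(chunks)           # order is preserved by pool.map
    ```

    Per-request latency stays high, but the wall-clock time to move the whole object drops with concurrency, which is the kind of optimization data frameworks target.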

    taylor schneider

    2 weeks ago
    OK, I will let you know if I get Ceph working. I would be curious to see how the HTTP traffic over S3 compares to the ceph-fuse driver speaking directly to the OSDs. My gut tells me S3 would be slower, all things being equal.
    @Oz Katz is it correct to assume that the only difference between the "Spark Quickstart" and "Basic Repo" routes is the creature comforts? At the end of the day, both paths through the wizard create a repo. There isn't a difference in repo type / functionality, right?
    Oz Katz

    2 weeks ago
    that's 100% correct 🙂

    taylor schneider

    2 weeks ago
    Thanks @Oz Katz!