• Bjorn Olsen

    5 months ago
    Hey all. I've just finished a draft write-up on my time playing with LakeFS. Would appreciate a review and some feedback before I publish 🙂 Especially where I've marked "TODO review". No rush. https://medium.com/p/1237f73d4a1c
    1 reply
  • Clinton Monk

    5 months ago
    Hi, I have a question on whether I'm using LakeFS correctly. Say that I have a main branch with tagged commits. I create an experiment branch from one of those tags. There are more recent commits on main. I want to essentially cherry-pick one of those recent commits from main onto my experiment branch. What is the process for that? I'm most interested in avoiding copying the physical data, but I may have a use case where I want to use the same commit ID if possible. From what I found, I think I can copy references to the data:
    1. Use the API to list files (ls) at the LakeFS path on the commit I want to copy.
    2. Use the API to stat each file to get the physical address and other metadata.
    3. Use the API to stage each file onto my experiment branch, using the physical address and other metadata retrieved from the stat operation.
    4. Commit those staged changes.
    Does that sound right, or is there a more preferred approach to this problem?
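    For reference, below is a minimal sketch of the list → stat → stage → commit flow described above, using the lakeFS Python client. The repository name, branch names, prefix, commit ID and credentials are placeholders, and the method and model names follow the generated lakefs_client SDK, so they may differ between client versions (newer releases may expose a link-physical-address style API instead of stage_object).

```python
# Sketch only: copy object references from a commit on main onto the
# experiment branch without copying physical data. Repo name, commit ID,
# prefix and credentials are placeholders; pagination is omitted for brevity.
import lakefs_client
from lakefs_client.client import LakeFSClient
from lakefs_client.models import ObjectStageCreation, CommitCreation

configuration = lakefs_client.Configuration(host="http://localhost:8000/api/v1")
configuration.username = "ACCESS_KEY_ID"
configuration.password = "SECRET_ACCESS_KEY"
client = LakeFSClient(configuration)

repo = "example-repo"
source_commit = "abc123def456"   # the commit on main to "cherry-pick" from
target_branch = "experiment"
prefix = "datasets/table1/"

# 1. List the objects under the prefix at the source commit.
listing = client.objects.list_objects(repository=repo, ref=source_commit, prefix=prefix)
for obj in listing.results:
    # 2. Stat each object to get its physical address, checksum and size.
    stats = client.objects.stat_object(repository=repo, ref=source_commit, path=obj.path)
    # 3. Stage a reference to the same physical object on the target branch.
    client.objects.stage_object(
        repository=repo,
        branch=target_branch,
        path=obj.path,
        object_stage_creation=ObjectStageCreation(
            physical_address=stats.physical_address,
            checksum=stats.checksum,
            size_bytes=stats.size_bytes,
        ),
    )

# 4. Commit the staged references on the experiment branch.
client.commits.commit(
    repository=repo,
    branch=target_branch,
    commit_creation=CommitCreation(message=f"Copy {prefix} from commit {source_commit}"),
)
```

    Note that this produces a new commit with its own ID on experiment; as far as I know, the original commit ID from main cannot be reused this way.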
    7 replies
  • Stéphane Burwash

    5 months ago
    Hi! I'm planning on deploying lakefs on our GCP storage so that we can version-control our data in GBQ. How would I go about doing this, and will I have to make any modifications to existing code, or make modifications every time I change branches? Thanks!
    4 replies
  • Stéphane Burwash

    5 months ago
    Hi! Me again. If I want to set up LakeFS on GCP, does the original endpoint (bucket => user) still work? I want to test LakeFS on our buckets, but without affecting the BI team's current setup until we're ready to make the transition.
    5 replies
  • donald

    5 months ago
    I used the "lakectl fs upload" command to upload a batch of data. But when I looked at the lakectl reference documentation, I didn't find an option to download/checkout the batch of data I just uploaded. Should I use the AWS CLI to do it? What is the best way to download/checkout this data?
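    In case it helps, here is a minimal sketch of downloading a batch of objects back out through the lakeFS API using the Python client (repository, branch, prefix and credentials are placeholders; method names follow the generated lakefs_client SDK and may differ between versions). Depending on the lakectl version, a lakectl fs download command and reads through the S3 gateway (e.g. with the AWS CLI) may also be options.

```python
# Sketch only: download objects under a prefix from a lakeFS branch via the API.
# Repo, branch, prefix and credentials are placeholders; pagination is omitted.
import os
import lakefs_client
from lakefs_client.client import LakeFSClient

configuration = lakefs_client.Configuration(host="http://localhost:8000/api/v1")
configuration.username = "ACCESS_KEY_ID"
configuration.password = "SECRET_ACCESS_KEY"
client = LakeFSClient(configuration)

repo, ref, prefix = "myexample", "main", "data/"

listing = client.objects.list_objects(repository=repo, ref=ref, prefix=prefix)
for obj in listing.results:
    # get_object returns a file-like object with the object's contents.
    body = client.objects.get_object(repository=repo, ref=ref, path=obj.path)
    local_path = os.path.join("downloaded", obj.path)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    with open(local_path, "wb") as out:
        out.write(body.read())
```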
    4 replies
  • donald

    5 months ago
    I just tried to use the AWS CLI to list the files in lakefs, "aws s3 --profile lakefs --endpoint-url http://127.0.0.1/api/v1 ls s3://myexample/main/", but got an error: "maximum recursion depth exceeded in comparison". Can anyone give me a clue about this?
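    For what it's worth, the S3-compatible gateway is normally served on the lakeFS server's root endpoint rather than under /api/v1 (which is the REST API). A minimal boto3 sketch under that assumption, with host, port and credentials as placeholders:

```python
# Sketch only: list a lakeFS repository through the S3 gateway with boto3.
# Assumes lakeFS listens on localhost:8000 and the gateway is served at the
# server root (not /api/v1); the keys are lakeFS access keys, not AWS keys.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:8000",
    aws_access_key_id="LAKEFS_ACCESS_KEY_ID",
    aws_secret_access_key="LAKEFS_SECRET_ACCESS_KEY",
)

# Bucket = repository name; key prefix = branch (or other ref) + path.
resp = s3.list_objects_v2(Bucket="myexample", Prefix="main/")
for item in resp.get("Contents", []):
    print(item["Key"], item["Size"])
```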
    12 replies
  • Clinton Monk

    5 months ago
    Hi! We are using S3 and want to restrict write access so that, once data is committed to LakeFS, it cannot be updated or removed. Is this possible? If so, what setup do you recommend?
    We are planning to use the Hadoop Filesystem so that our Spark jobs can write directly to S3. For this to work, the Spark job cluster must have write access to the entire LakeFS S3 bucket, which means a Spark job could overwrite existing committed files. One idea is to add logic around the staging operations (i.e. between LakeFS and the client) to update the bucket's policy to make the path writable, and similar logic on commits to remove write permission for committed paths. This doesn't seem scalable, though, if a large number of files are being staged and committed concurrently.
    We are considering the S3 Gateway instead: LakeFS would be the only one with write access to the bucket, and all data write requests would go through the gateway. It is unclear, though, whether the S3 Gateway has its own configurable permissions or whether it is simply a passthrough to S3. For example, could we create a role that has write permission in LakeFS' S3 Gateway but not write access to the underlying S3 bucket?
    10 replies
  • Ryan Skinner

    5 months ago
    Hi all, I was looking for some help using the lakectl CLI to create a new branch. I've checked the docs but I'm having trouble running the following command:
    lakectl branch create <lakefs://demo-repo> -s <lakefs://demo-repo/upload>
    Invalid branch: not a valid ref uri
    Error executing command.
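    For comparison, the same operation through the lakeFS Python client looks roughly like this (repository, branch and source names are placeholders; model and method names follow the generated lakefs_client SDK and may vary between versions):

```python
# Sketch only: create a branch via the lakeFS API instead of lakectl.
# "demo-repo", "new-branch" and "upload" are placeholders for the repository,
# the branch to create, and the source ref.
import lakefs_client
from lakefs_client.client import LakeFSClient
from lakefs_client.models import BranchCreation

configuration = lakefs_client.Configuration(host="http://localhost:8000/api/v1")
configuration.username = "ACCESS_KEY_ID"
configuration.password = "SECRET_ACCESS_KEY"
client = LakeFSClient(configuration)

client.branches.create_branch(
    repository="demo-repo",
    branch_creation=BranchCreation(name="new-branch", source="upload"),
)
```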
    11 replies
  • Ryan Green

    5 months ago
    Hi, I have a question about LakeFS and DB's DeltaTables. Here's the scenario I'm thinking about:
    1. I have a DeltaTable with, say, k versions on my main branch
    2. I then create a dev branch and merge data into the DeltaTable (creating version k+1)
    3. I merge the dev branch into the main branch
    So I'm expecting that the DeltaTable on the main branch reflects version k+1. Could someone help me understand how data is being written and copied in steps 2 and 3? Is it the entire DeltaTable or just the differences between versions k and k+1? (Also, I realize this example may not be entirely clear, so I'm happy to expand on it if that's helpful.)
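    To make steps 2 and 3 concrete, here is a rough sketch assuming Spark's s3a filesystem is already configured to go through lakeFS; the repository, branch and table paths below are placeholders, and the lakefs_client method names may differ between versions. In lakeFS the merge itself is a metadata operation: only the files newly written in step 2 occupy new physical storage, while unchanged Delta files keep pointing at the same underlying objects.

```python
# Sketch only of steps 2-3: append to the Delta table on the dev branch, commit,
# then merge dev into main via the lakeFS API. Assumes Spark's s3a filesystem is
# already configured against lakeFS; repo, branch and table paths are placeholders.
from pyspark.sql import SparkSession
import lakefs_client
from lakefs_client.client import LakeFSClient
from lakefs_client.models import CommitCreation

spark = SparkSession.builder.getOrCreate()

# Step 2: write version k+1 of the Delta table on the dev branch only.
new_rows = spark.createDataFrame([(1, "a")], ["id", "value"])
new_rows.write.format("delta").mode("append").save("s3a://example-repo/dev/tables/events")

configuration = lakefs_client.Configuration(host="http://localhost:8000/api/v1")
configuration.username = "ACCESS_KEY_ID"
configuration.password = "SECRET_ACCESS_KEY"
client = LakeFSClient(configuration)

# Commit the new Delta data files and the updated _delta_log entries on dev.
client.commits.commit(
    repository="example-repo",
    branch="dev",
    commit_creation=CommitCreation(message="Append rows (Delta version k+1)"),
)

# Step 3: merge dev into main; main now sees Delta version k+1.
client.refs.merge_into_branch(
    repository="example-repo",
    source_ref="dev",
    destination_branch="main",
)
```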
    6 replies
  • Verun Rahimtoola

    5 months ago
    Hi, can someone confirm whether lakefs works with Iceberg tables in Hive as well? I came across this: https://lakefs.io/hudi-iceberg-and-delta-lake-data-lake-table-formats-compared/ which suggests that it should work seamlessly, but currently we are having a hard time getting Hive + Iceberg + lakefs to work with external tables.
    22 replies