dev
  • a

    Ariel Shaqed (Scolnicov)

    08/05/2021, 2:49 PM
    About the 8 extra bytes at the end of each LevelDB key inside a RocksDB SSTable: those are the "internal" keys, incorporating some versioning. Here's the documentation on Internal Keys.
    🤯 1
    🤔 2
    i
    1 reply · 2 participants
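    A minimal sketch of that trailer layout (illustrative only, not lakeFS or RocksDB code): the last 8 bytes of an internal key are a little-endian fixed64 packing a 56-bit sequence number with an 8-bit value type.
    // Sketch: decode the 8-byte trailer RocksDB appends to every user key.
    function decodeInternalKey(internalKey: Uint8Array) {
        const n = internalKey.length;
        if (n < 8) throw new Error("internal key too short");
        const view = new DataView(internalKey.buffer, internalKey.byteOffset + n - 8, 8);
        const packed = view.getBigUint64(0, true); // little-endian fixed64
        return {
            userKey: internalKey.slice(0, n - 8),
            sequence: packed >> 8n,            // 56-bit sequence number (the versioning)
            valueType: Number(packed & 0xffn), // e.g. 0 = deletion, 1 = value
        };
    }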
  • a

    Ariel Shaqed (Scolnicov)

    08/10/2021, 3:53 PM
    If you're having trouble pushing or doing anything on GitHub really, bear in mind it is not feeling well: status.github.com. 😿
    o
    2 replies · 2 participants
  • a

    Ariel Shaqed (Scolnicov)

    08/11/2021, 12:12 PM
    Reminder: you should be using 2-factor authentication on your GitHub account. Go and do that now if that's not the case. Still here? Great! If you have no access to your phone then your printed recovery codes, stored in a safe location, may be the only way to gain access to your account from a new device. These are your ultimate fallback, so if you don't know where they are, go to https://github.com/settings/auth/recovery-codes, print out a new set, and keep it safe!
    🔏 1
    🗝️ 1
    📜 1
    👍 1
    2️⃣ 1
    1 reply · 1 participant
  • y

    Yoni Augarten

    08/11/2021, 12:49 PM
    Opened this issue regarding changing the docs to exclude the wildcard DNS record from our basic setup instructions, to reduce friction and simplify deployment. Question to everyone: for this to work, the user will have to force path-style access, at least in some AWS clients. Considering that 2 DNS records still need to be created, and that the s3_gateway_domain parameter still needs to be specified for lakeFS, how much simpler does it actually make things?
    g
    a
    2 replies · 3 participants
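    For reference, forcing path-style access in a JavaScript client would look roughly like this (a sketch using AWS SDK v3; the endpoint is a placeholder):
    import { S3Client } from "@aws-sdk/client-s3";

    // Without a wildcard DNS record, virtual-hosted-style URLs
    // (https://bucket.lakefs.example.com/key) won't resolve, so clients must
    // send path-style requests (https://lakefs.example.com/bucket/key).
    const s3 = new S3Client({
        endpoint: "https://lakefs.example.com", // placeholder S3 gateway endpoint
        region: "us-east-1",
        forcePathStyle: true,
    });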
  • a

    Ariel Shaqed (Scolnicov)

    08/12/2021, 2:46 PM
    TIL AWS added a VPC reachability analyzer: give it a source and a destination endpoint, maybe a port, hit a button, (wait a few minutes), it says if that path works... and why it doesn't! https://console.aws.amazon.com/vpc/home?region=us-east-1#ReachabilityAnalyzer:
    👌🏻 1
    🤯 1
    👍 2
    👌 1
    1 reply · 1 participant
  • a

    Ariel Shaqed (Scolnicov)

    08/26/2021, 2:16 PM
    I've been testing https://github.com/treeverse/lakeFS/pull/2335 on our user metadata simulation (thanks, @Yoni Augarten!).... It seems to work! Details on the issue. Any objections to me pulling it? @Oz Katz @Guy Hardonag
    👌 5
    👌🏻 1
    y
    1 reply · 2 participants
  • m

    mishraprafful

    08/27/2021, 9:15 AM
    Hey, I wish to add a PR for a quick documentation bug but am unable to create a branch as I get a 403. Is it possible for someone to add me as a contributor, maybe? My GitHub username is @mishraprafful. P.S. I have signed the CLA as mentioned here: https://docs.lakefs.io/contributing.html
    y
    k
    7 replies · 3 participants
  • y

    Yoni Augarten

    08/29/2021, 9:23 AM
    Hey all, working on designing repo-level settings in lakeFS. I'm considering using Graveler for versioning, so as to allow a discoverable history of each setting. I was wondering if anyone has thoughts about whether this is possible and how to go about it. The way I see it, if we use the same database schema as the lakeFS installation, we must create new repositories for storing the settings, but those will be visible to the user, which may be strange. WDYT?
    o
    4 replies · 2 participants
  • b

    Barak Amar

    08/31/2021, 6:14 AM
    TIL; vim users (sorry neovim) - `:smile`
    👏 1
    🙃 1
    🤩 1
    👍 1
    o
    2 replies · 2 participants
  • i

    Itai Admi

    09/02/2021, 9:59 AM
    Solving bug #2397, I think it’s clear that we want the following 2 examples to store files in the same physical location: `s3://bucket/prefix` and `s3://bucket/prefix/`, i.e. all objects stored in `s3://bucket/prefix/`. But what about cases when the user provides something strange like `s3://bucket/prefix///` - should we also store it under `s3://bucket/prefix/`, or is `s3://bucket/prefix///` appropriate in this case?
    o
    2 replies · 2 participants
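    If the answer is to normalize, the rule under discussion might look like this sketch (illustrative, not the actual fix for #2397): collapse any run of trailing slashes to exactly one.
    // Sketch: map s3://bucket/prefix, s3://bucket/prefix/ and
    // s3://bucket/prefix/// to the same physical location.
    function normalizeStorageNamespace(ns: string): string {
        return ns.replace(/\/+$/, "") + "/";
    }

    normalizeStorageNamespace("s3://bucket/prefix");    // "s3://bucket/prefix/"
    normalizeStorageNamespace("s3://bucket/prefix///"); // "s3://bucket/prefix/"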
  • a

    Ariel Shaqed (Scolnicov)

    09/09/2021, 7:36 AM
    Working on upgrade script for https://github.com/treeverse/lakeFS/pull/2430 - *Add "attach storage namespace [to repository]" IAM action type*. There's a split between possibilities here: complexity of SQL balanced against code complexity for a script that should run only once. I would like the script to "add[...] a new rule allowing action `fs:AttachRepositoryNamespace` on `*` for any policy that allows `fs:CreateRepository`". Three possibilities that I can see:
    • Golang all the way! Pros: probably more readable, easier to get right, uses a good programming language. Cons: (very) poor ops experience: not part of the migration library that we use (or, indeed, any reasonable migration library).
    • PostgreSQL `jsonb` SQL wackiness. Write 100% safe pure-SQL code that updates the table. Pros: perfect ops experience. Cons: the function itself will be tricky to write - it needs to handle IAM-style wildcards (`*`, `?`) entirely correctly.
    • PostgreSQL `jsonb` SQL wackiness, but less wacky. Write SQL code that works on every reasonable IAM policy, but fails if the policy uses special (unused!) characters in action names. Fail migration if such a policy is found (unlikely; such a policy includes an action that is unused!). Specifically, fail for actions that include SQL-style wildcards (`_`, `%`); there are no such actions in lakeFS (or anywhere in IAM, really).
    Unless I hear strenuous objections -- hopefully accompanied by suggestions how to do it better -- I am going with the third option. (Also posting to the PR, of course.)
    i
    2 replies · 2 participants
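    The wildcard handling that makes the pure-SQL option tricky, sketched here in TypeScript for illustration (IAM-style patterns: `*` matches any run of characters, `?` matches exactly one):
    // Sketch: decide whether an existing IAM-style action pattern already
    // covers a concrete action. A migration must get this exactly right.
    function iamActionMatches(pattern: string, action: string): boolean {
        const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
        const re = new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
        return re.test(action);
    }

    iamActionMatches("fs:Create*", "fs:CreateRepository"); // true
    iamActionMatches("fs:Create?", "fs:CreateRepository"); // false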
  • d

    datadavd

    09/26/2021, 9:03 PM
    Hey friends, I was wondering if someone can help point me in the right direction as to where this would need to be implemented. https://github.com/treeverse/lakeFS/issues/2415#issuecomment-923728336
    i
    2 replies · 2 participants
  • a

    Ariel Shaqed (Scolnicov)

    09/27/2021, 7:14 AM
    TIL about GopherJS. (And I'm pleased, amused, and terrified... concurrently...).
    👍 1
    o
    2 replies · 2 participants
  • a

    Ariel Shaqed (Scolnicov)

    09/27/2021, 9:10 AM
    About https://github.com/treeverse/lakeFS/issues/2491: while I disagree with the decision that we made, I am committed to it. So I am trying to use this issue to get some explicit arguments for our current design onto the record; that design is probably https://github.com/treeverse/lakeFS/pull/2369 or the repo settings PR. My ideal resolution for this issue is to close it as "invalid". AFAIR @Itai Admi and I supported the kind of change that appears in #2491 in the F2F and written design reviews of protected branches. The general feeling in the room was against re-using existing mechanisms and in favor of creating a new mechanism, IIUC for simplicity of operation by users. However, it is very likely I do not understand correctly. Most of all I want to maintain commitment to the agreed design. I agree that this is hard given the current state as documented, and I would be grateful if the people who best understand why we need a new mechanism would be willing to take the time to spell it out. But I do believe that we should continue the discussion if necessary, rather than re-opening it separately and with no context.
    i
    y
    3 replies · 3 participants
  • i

    Itai Admi

    10/10/2021, 12:01 PM
    Waiting for @Barak Amar to complete his PR before releasing a version; let me know if there are any other changes you want me to wait for.
    🙏 1
    👍 1
    b
    1 reply · 2 participants
  • d

    datadavd

    10/12/2021, 10:44 AM
    Any devs familiar with React? Trying to determine how to place paragraph-level text under a React Bootstrap Modal.Header’s Modal.Title. My react/js skills are very weak, so I’m struggling with this. All my tests place the text to the right of the Modal.Header’s Modal.Title. Also, I’m trying to similarly center paragraph-level text right under a component (closer to the component than to the border below it; hopefully that makes sense). Right now I’ve got it looking good, but the paragraph text is more “attached/closer” to the border than to the component above, which is what I actually want it near. For reference, I’m trying to match the mocks @Yoni Augarten (I believe) provided me here for the Create a Repository and Admin pages. I’m kinda struggling getting it perfect tbh 😿 https://github.com/treeverse/lakeFS/issues/2316#issuecomment-939459473
    y
    2 replies · 2 participants
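    One likely cause (a sketch, not the actual lakeFS code): Modal.Header lays its children out as a flex row, so a paragraph placed as a sibling of Modal.Title lands beside it; wrapping both in one block-level container stacks them instead.
    import { Modal } from "react-bootstrap";

    // Sketch: the wrapping <div> makes the title and the paragraph a single
    // flex child, so they stack vertically inside the flex-row header.
    function CreateRepositoryHeader() {
        return (
            <Modal.Header closeButton>
                <div>
                    <Modal.Title>Create a Repository</Modal.Title>
                    <p className="text-muted mb-0">Helper text under the title.</p>
                </div>
            </Modal.Header>
        );
    }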
  • a

    Ariel Shaqed (Scolnicov)

    10/17/2021, 7:01 AM
    Trying to test multipart upload to our gateway using Spark, as part of Nessie. I'm having a lot of trouble making Spark run multipart uploads to our S3 gateway: it's tough because we need big files and our containers are not so big. I want to be lazy, and run a program that exercises the S3a hadoopfs directly rather than going through Spark. Then I can just easily write a 20MiB file with 5MiB parts, read it back, and be done with it. Pros: short, non-brittle (small changes don't make the test silently stop testing what it should), and easy to write. Cons: doesn't actually test Spark performing multipart uploads to the lakeFS S3 gateway, but merely the S3a hadoopfs. There might be differences with some Spark writer, due to arbitrary changes that it might make to S3a config or similar. I believe it still makes sense: the cons we take on are relatively minor; most would occur regardless, unless we managed to hit the exact combination that triggers the hypothetical failure. (E.g. something like https://github.com/treeverse/lakeFS/issues/2429). WDYT?
    i
    3 replies · 2 participants
  • i

    Itai Admi

    10/21/2021, 2:03 PM
    I think we all agreed that the current way we handle the changelog is a mess (more details on the current process below). My suggestion is simple: revert to the way it was before - the releaser looks at the completed PRs and gathers a list of meaningful changes. Yes, it means that the responsibility returns to the releaser. It also means that our commit messages need to describe the change better, which I think we’ve improved at. In case of ambiguity, we can always ask for more details and/or edit the release notes later. Current process: the author and reviewer are responsible for adding a line to the changelog when required. What normally happens is that they forget, only to open an additional PR to add that line when/if they remember later. When releasing, the changelog becomes the release notes as-is, with no further action needed from the releaser.
    👍 1
    g
    a
    +3
    24 replies · 6 participants
  • o

    Oz Katz

    10/24/2021, 4:05 PM
    FYI - Apache Spark (i.e. the OSS, not the Databricks distribution) added a RocksDB StateStore in Spark 3.2.0 (recently released). This means new Spark versions now add a dependency on RocksDB - https://issues.apache.org/jira/browse/SPARK-34198 The version seems to be pinned to 6.20.3 - as we're currently bundling our own sstable parser, not sure if/how this affects us. Perhaps @Tal Sofer or @Ariel Shaqed (Scolnicov) can shed some light?
    a
    5 replies · 2 participants
  • i

    Itai Admi

    10/25/2021, 11:47 AM
    Planning on releasing a minor version soon, any PRs worth waiting for?
    p
    b
    6 replies · 3 participants
  • t

    Tal Sofer

    10/26/2021, 6:49 AM
    I’m using the getObject endpoint to get the contents of two objects (of size <= 100MB) that I would like to compare to calculate their diff. The getObject operation returns the object contents as an `application/octet-stream`, and I’m looking into using a react library that can calculate the diff for me; this library takes file contents as strings. • Should I first read the contents, save them in-memory, and then compare? • What is the right way to read from a stream in javascript? @Barak Amar @Ariel Shaqed (Scolnicov) do you have useful tips to share?
    i
    b
    +1
    4 replies · 4 participants
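    One way to answer both questions (a sketch; getObjectText and the URLs are illustrative, not the actual client code): for objects this size, buffering the whole body is simplest, and Response.text() consumes the underlying ReadableStream for you.
    // Sketch: fetch both objects, buffer each octet-stream as a string,
    // then hand the two strings to the diff library.
    async function getObjectText(url: string): Promise<string> {
        const response = await fetch(url);
        if (!response.ok) throw new Error(`getObject failed: ${response.status}`);
        return response.text(); // buffers and decodes the body as UTF-8
    }

    async function getBothSides(leftUrl: string, rightUrl: string): Promise<[string, string]> {
        return Promise.all([getObjectText(leftUrl), getObjectText(rightUrl)]);
    }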
  • t

    Tal Sofer

    10/26/2021, 7:26 AM
    Another React question - I’m trying to add a dependency to our project and I’m getting the following error:
    webui % npm i react-diff-viewer
    
    npm ERR! code ERESOLVE
    npm ERR! ERESOLVE unable to resolve dependency tree
    npm ERR! 
    npm ERR! While resolving: lakefs-ui@0.1.0
    npm ERR! Found: react@17.0.2
    npm ERR! node_modules/react
    npm ERR!   react@"^17.0.0" from the root project
    npm ERR! 
    npm ERR! Could not resolve dependency:
    npm ERR! peer react@"^15.3.0 || ^16.0.0" from react-diff-viewer@3.1.1
    npm ERR! node_modules/react-diff-viewer
    npm ERR!   react-diff-viewer@"*" from the root project
    npm ERR! 
    npm ERR! Fix the upstream dependency conflict, or retry
    npm ERR! this command with --force, or --legacy-peer-deps
    npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
    npm ERR!
    Is it safe to use the `--legacy-peer-deps` flag to resolve this? Reading about it makes me suspect that it can lead to potential conflicts in the future. Has anyone run into a similar error?
    b
    8 replies · 2 participants
  • b

    Barak Amar

    10/26/2021, 11:07 AM
    Going to create a branch `release/0.53.0` based on the tag `v0.53.0` in order to include the fixes to release the python client, and re-tag it as the version. If you see any issue please let me know.
    a
    2 replies · 2 participants
  • b

    Barak Amar

    10/26/2021, 2:24 PM
    Let me know if there is anything pending you would like to include in the next release - for now it is bug fixes and clients.
    a
    1 reply · 2 participants
  • t

    Tal Sofer

    11/01/2021, 1:01 PM
    Working on the content diff feature and looking for feedback on the visualisation. Specifically: after adding the content diff feature, directory and object entries differ only in their icon, and directories end with the “/” separator; both are expandable, and the results of expanding them are different. @Oz Katz do you have thoughts about this?
    i
    o
    +1
    21 replies · 4 participants
  • t

    Tal Sofer

    11/07/2021, 8:30 AM
    Hi! I would appreciate advice on how to implement the following. This is my code that wraps a call to our objects API -
    async get(repoId, ref, path, additionalHeaders) {
            const query = qs({path});
            const response = await apiRequest(`/repositories/${repoId}/refs/${ref}/objects?${query}`, {
                method: 'GET',
                headers: new Headers(additionalHeaders)
            });
            if (response.status !== 200 && response.status !== 206) {
                throw new Error(await extractError(response));
            }
            return response.text()
        }
    The promise it returns reads the object text from a stream and then resolves. This get operation receives a Range header as a parameter, and I need to track the response’s “Content-Range” header to determine whether the full object content was returned or only part of it. The problem I’m facing is how to return this information, given that the get function must return a promise because I’m invoking it with useAPI. Can I somehow wrap `response.text()` and `response.headers.get("Content-Range")` together in a promise? @Barak Amar @Guy Hardonag @Ariel Shaqed (Scolnicov) maybe you have advice?
    b
    a
    8 replies · 3 participants
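    One possible shape (a sketch using plain fetch rather than the apiRequest helper; names here are illustrative): a promise can resolve to any value, so resolve it with an object carrying both the body text and the Content-Range header.
    interface RangedObject {
        text: string;
        contentRange: string | null; // e.g. "bytes 0-1023/4096"; null if the header is absent
    }

    // Sketch: same control flow as the get() above, but the resolved value
    // carries the header alongside the body.
    async function getWithRange(url: string, additionalHeaders?: HeadersInit): Promise<RangedObject> {
        const response = await fetch(url, { method: "GET", headers: new Headers(additionalHeaders) });
        if (response.status !== 200 && response.status !== 206) {
            throw new Error(`unexpected status: ${response.status}`);
        }
        return {
            text: await response.text(),
            contentRange: response.headers.get("Content-Range"),
        };
    }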
  • i

    Itai Admi

    11/08/2021, 8:54 AM
    Working on issue#2629, trying to figure out which FileSystem calls are being made to lakeFSFS and which calls it makes to the underlying s3a FS. I’ve configured the following in the `/usr/lib/hadoop/etc/hadoop/log4j.properties` file:
    log4j.appender.A1=org.apache.log4j.ConsoleAppender
    log4j.appender.A1.layout=org.apache.log4j.PatternLayout
    log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
    log4j.logger.io.lakefs=TRACE, A1
    But I still can’t see the TRACE logs in my stdout. @Tal Sofer @Ariel Shaqed (Scolnicov) @Barak Amar how should I configure the logs?
    a
    1 reply · 2 participants
  • a

    Ariel Shaqed (Scolnicov)

    11/10/2021, 7:14 AM
    Does anyone know how to add fields from context to those emitted by an AWS client? This is for https://github.com/treeverse/lakefs/issues/2682. Context (sorry): Looking to "thread" our request_id field throughout all lakeFS logs. One issue that has me confused is how to get AWS to log with request IDs. Something like https://github.com/aws/aws-sdk-go/pull/3485 would be great, but that PR went silent. I do not want to dupe the client only to change its configuration; that may be overkill for the issue.
    b
    8 replies · 2 participants
  • a

    Ariel Shaqed (Scolnicov)

    11/10/2021, 9:31 AM
    Looking hard at upgrading to aws-sdk-go-v2. Why does the S3 block adapter have `streamToS3`, which handles its own HTTP request? What did v1 not offer in the interface that made us have to use our own client code?
    b
    g
    12 replies · 3 participants
  • a

    Ariel Shaqed (Scolnicov)

    11/10/2021, 10:57 AM
    Looking at `StreamingReader` in our `block/s3` package (click to see): Size is `int`. Is that correct? Regardless (huge uploads via the API might fail, assuming they succeed...) -- it seems a bit naff. Fixed it in https://github.com/treeverse/lakeFS/pull/2688 (thanks, @Guy Hardonag for the speedy review!)
    👍 1
    1 reply · 1 participant