# help
  • mpn mbn (10/24/2024, 6:41 PM)
    Hello team, it's a pity that you can't use '@' in tags for something like "model-name@v0.1.2". You can create such a tag, but it gives "ref: invalid value: validation error" when you try to do anything with it.
  • Haoming Jiang (10/24/2024, 9:37 PM)
    Hi team, I tried the latest 1.39.2 version https://github.com/treeverse/lakeFS/releases/tag/v1.39.2. The admin site seems non-functional -- I can't click on Users / Groups.
  • mpn mbn (10/25/2024, 8:03 AM)
    Hello team, here is my X problem: I want to implement staging using branches (latest, devel, release, etc.) in lakeFS. Each branch contains folders with models and a single YAML file describing where to find each model. When a new model (or a new version of a model) is pushed, it updates (1) the model file itself and (2) the YAML file (with metadata: the new file hash and version). For example, my release stage has model version v0.0.1 and I want to promote my devel model version v0.0.33 to release. I can neither use the devel version of the YAML file wholesale (it would potentially update other models' metadata), nor cherry-pick the latest change to the YAML file from devel to release (there have been many changes since devel and release were the same). But for the model file itself, I can just update the file pointer in the release stage. So here is my Y problem: I want to simply update the model's file pointer in the new stage, and update the YAML file manually (by downloading it, changing the model metadata, and pushing it back in a script). The question is: can I simply update a file pointer in lakeFS? If yes, how? UPD: I don't want to download and re-upload the models on each promotion, since they may be large; I just want to change file pointers.
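A follow-up sketch for the Y problem: lakeFS exposes a copy-object API, and within a single repository a copy is (to my understanding, worth verifying on your version) a metadata-level operation, i.e. a new pointer to the same underlying object rather than a re-upload. Below is a stdlib sketch of the call; the endpoint shape is my reading of the lakeFS OpenAPI spec, and the names (models, release, model-a/model.bin) are made up for illustration:

```python
import base64
import json
import urllib.parse
import urllib.request

def build_copy_request(base_url, access_key, secret_key,
                       repo, dest_branch, dest_path, src_ref, src_path):
    """Build a copyObject call: POST .../branches/{branch}/objects/copy.

    Endpoint shape assumed from the lakeFS OpenAPI spec; verify against
    your server version."""
    url = (f"{base_url}/api/v1/repositories/{urllib.parse.quote(repo, safe='')}"
           f"/branches/{urllib.parse.quote(dest_branch, safe='')}/objects/copy"
           f"?dest_path={urllib.parse.quote(dest_path, safe='')}")
    body = json.dumps({"src_path": src_path, "src_ref": src_ref}).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    # lakeFS accepts HTTP basic auth with an access-key / secret pair
    cred = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    req.add_header("Authorization", f"Basic {cred}")
    return req

# Promote devel's model into release without re-uploading the bytes, e.g.:
# urllib.request.urlopen(build_copy_request(
#     "http://localhost:8000", KEY, SECRET,
#     "models", "release", "model-a/model.bin", "devel", "model-a/model.bin"))
```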
  • Vincent Caldwell (10/26/2024, 5:13 AM)
    Are there any video tutorials or resources for connecting a GCP bucket, like there are for AWS? I don't necessarily want to use a Postgres DB (the current GCP getting-started instructions) for a variety of reasons. There is an AWS connection video on YouTube (great video, btw: https://www.youtube.com/watch?app=desktop&v=lr6ou-Vvy_A) but nothing for GCP. Unfortunately, I don't know GCP well enough to figure out the finer points myself, but I need to learn ASAP. Can anyone help, even by pointing me to docs, sites, etc.? I would greatly appreciate it.
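A possible starting point while waiting for a GCP-specific video: lakeFS does not require Postgres; the metadata store is configurable. A minimal config sketch, assuming the gs blockstore and the embedded local key-value store from the configuration reference (key names worth double-checking for your version):

```yaml
database:
  type: local            # embedded key-value store; fine for evaluation, not for HA
  local:
    path: ~/lakefs/metadata
blockstore:
  type: gs
  gs:
    credentials_file: /path/to/service-account-key.json
```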
  • Rudy Cortembert (10/27/2024, 9:18 PM)
    Hello, is there any plan to support Azurite? I am working on a .NET Aspire custom integration and I would like to test Azure Storage locally running Azurite as an Azure Storage emulator. So far, it looks like lakeFS requires an Azure hosted blob storage and using the Azurite emulator is not possible. Thanks a lot in advance for your guidance!
  • Parth Ghinaiya (10/28/2024, 8:14 PM)
    Hello team, is there any plan or focus on a connection between lakeFS and DLTHub? I'm using Dremio as a query engine and lakeFS for data-lake versioning, and I want to load data using DLTHub. I have tried to find solutions but couldn't. Thank you
  • Andrij David (10/29/2024, 8:33 PM)
    Hello, I know that it is possible to clone a repository using the command lakectl local clone. Are there any other ways to clone a given repository? For example, using the S3 endpoint, Python library: lakefs, or lakefs-specs?
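One generally applicable route besides lakectl local clone: any S3 client pointed at the lakeFS S3 gateway can enumerate and download a repository, with the repository acting as the bucket and "ref/path" as the object key. A small sketch of that mapping; the commented boto3 usage is an assumption, not a verified recipe:

```python
def lakefs_uri_to_s3(uri: str) -> tuple:
    """Map lakefs://repo/ref/path to the (bucket, key) pair the lakeFS S3
    gateway expects: the repository is the bucket, "ref/path" is the key."""
    prefix = "lakefs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a lakeFS URI: {uri}")
    repo, _, key = uri[len(prefix):].partition("/")
    return repo, key

# With boto3 pointed at the gateway (assumed usage):
# import boto3
# s3 = boto3.client("s3", endpoint_url="http://localhost:8000",
#                   aws_access_key_id=KEY, aws_secret_access_key=SECRET)
# bucket, key = lakefs_uri_to_s3("lakefs://example/main/")
# for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=key):
#     for obj in page.get("Contents", []):
#         print(obj["Key"])
```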
  • Andrij David (10/29/2024, 8:46 PM)
    Also, is there any way to make a repository completely public?
  • Haoming Jiang (10/30/2024, 1:11 AM)
    In the Lua hooks, is there any way we can write data, based on the reference here: https://docs.lakefs.io/howto/hooks/lua.html#lua-library-reference? I see we can read data with lakefs/get_object(repository_id, reference_id, path), but I don't see how to write data.
  • mpn mbn (10/31/2024, 12:05 PM)
    Hello team, I want to upload datasets to lakeFS and version them. Each dataset is a separate folder with random files. For example, I have folders (datasets) A and B. Files in A: a1, aa1. Files in B: b1, bb1. datasets-versions.yaml: A: v0.0.1, B: v0.0.1. I want to update dataset A, i.e. rewrite folder A's contents, so that after uploading the new dataset, the files in A are a2, aa2, aaa2 and datasets-versions.yaml reads A: v0.0.2, B: v0.0.1. I can do this with the commands:
    lakectl fs rm -r lakefs://repo/branch/A
    lakectl fs upload -r lakefs://repo/branch/A -s A
    My question is: How can I do this using Python lakefs package?
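A sketch of how the lakectl pair above might translate to Python. The path-walking helper below is plain stdlib; the commented lakefs-package calls (objects, delete_objects, object().upload) are assumptions to check against the package docs for your version:

```python
import os
from pathlib import PurePosixPath

def local_to_remote(local_root: str, remote_prefix: str):
    """Yield (local_file, remote_key) pairs for every file under local_root,
    normalizing OS path separators to the '/' keys lakeFS expects."""
    for root, _dirs, files in os.walk(local_root):
        for name in files:
            local_path = os.path.join(root, name)
            rel = os.path.relpath(local_path, local_root).replace(os.sep, "/")
            yield local_path, str(PurePosixPath(remote_prefix) / rel)

# With the high-level `lakefs` package (method names are assumptions -- check
# the package docs for your version):
# import lakefs
# branch = lakefs.repository("repo").branch("branch")
# branch.delete_objects([o.path for o in branch.objects(prefix="A/")])  # fs rm -r
# for local_file, key in local_to_remote("A", "A"):                     # fs upload
#     with open(local_file, "rb") as fh:
#         branch.object(key).upload(data=fh.read())
```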
  • Ocean Chang (11/06/2024, 8:12 AM)
    How do I set up a remote authenticator so that the lakeFS client passes additional values in the headers and body? config.yaml:
    auth:
      remote_authenticator:
        enabled: true
        endpoint: https://testendpoint.com
        default_user_group: "Developers"
      ui_config:
        logout_url: /logout
        login_cookie_names:
          - Authorization
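As far as I can tell, there is no configuration knob for extra headers or body fields; the remote authenticator simply receives the login POST. One workaround is to put your own thin authenticator service between lakeFS and the real identity provider and add whatever the upstream needs there. A minimal sketch; the {username, password} request body and external_user_identifier response field are my reading of the remote-authenticator docs, so verify them:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def authenticate(payload: dict):
    """Return the response body for a valid login, or None to reject.
    Replace this stub with a call to your real identity provider."""
    if payload.get("username") == "demo" and payload.get("password") == "demo":
        return {"external_user_identifier": payload["username"]}
    return None

class AuthHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # lakeFS POSTs JSON credentials; 200 means authenticated, 401 rejected
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        result = authenticate(json.loads(body or b"{}"))
        if result is None:
            self.send_response(401)
            self.end_headers()
            return
        data = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(data)

# To run it (point auth.remote_authenticator.endpoint at this address):
# HTTPServer(("", 8080), AuthHandler).serve_forever()
```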
  • Boris (11/07/2024, 1:01 PM)
    Hello! I am trying to make a POST request to listPullRequests via the lakeFS UI, but I get a 401 error "insufficient permissions". I used the lakeFS demo environment and the standard repository. What am I missing?
  • Ocean Chang (11/08/2024, 2:23 AM)
    Context: using the lakeFS API or Python SDK to fetch the list of repositories and call other APIs. Problem: first, I make the v1/auth/login API call (or create the Client from the SDK); both succeed with 200. The login call returns a token and token_expiration. However, when I subsequently try to call /api/v1/repositories, I get a 401: error authenticating request. Question: Do I need to attach the returned login token in order to make subsequent calls? If so, how?
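For reference while this thread gets answered: the JWT returned by v1/auth/login is normally attached to subsequent calls as an Authorization: Bearer header (the SDK Client does the equivalent for you when configured with an access key and secret, which use basic auth). A stdlib sketch; the "results" key in the /repositories response is an assumption to verify:

```python
import json
import urllib.request

def bearer_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build an API request carrying the JWT from v1/auth/login."""
    req = urllib.request.Request(f"{base_url}/api/v1{path}")
    req.add_header("Authorization", f"Bearer {token}")
    return req

def list_repositories(base_url: str, token: str):
    # Performs the authenticated GET /api/v1/repositories call
    with urllib.request.urlopen(bearer_request(base_url, "/repositories", token)) as resp:
        return json.load(resp)["results"]

# token = ...  # from POST {base_url}/api/v1/auth/login
# for repo in list_repositories("http://localhost:8000", token):
#     print(repo["id"])
```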
  • Mike Fang (11/08/2024, 7:09 PM)
    Is there a way to override the default authentication for all API requests from the lakefs_sdk Python client? I found this: ":param _request_auth: set to override the auth_settings for a single request; this effectively ignores the authentication in the spec for a single request", but that only applies per API call. Is there a way to set it on the actual api_client? I am trying to do SigV4 auth on all requests from the lakeFS SDK, since I am proxying the lakeFS API through API Gateway with IAM authorization.
  • Mike Fang (11/09/2024, 1:42 AM)
    When I try to create a repository from the UI, I get this issue with S3:
    time="2024-11-09T01:33:57Z" level=warning msg="Could not access storage namespace" func="pkg/api.(*Controller).CreateRepository" file="lakeFS/pkg/api/controller.go:2016" error="operation error S3: PutObject, https response error StatusCode: 400, RequestID: GV2RCD8F49KSN5K3, HostID: P2Te8QubRyKCczc2nt/cJ3YnGfIJFDD2vJRKYoKC7JuDkMkEgN6woYVtsfChFfRhkO2HvM10uYE=, api error InvalidRequest: Content-MD5 OR x-amz-checksum- HTTP header is required for Put Object requests with Object Lock parameters" reason=unknown service=api_gateway storage_namespace="s3://nile-data-catalog-storefangmik-406016533510-dev/test-lakefs/"
    Is there something I am missing when setting up S3 with lakeFS? I believe the bucket permissions are set up correctly. Object Lock is usually the default for our S3 buckets; does it need to be turned off for lakeFS?
  • Akshar Barot (06/29/2025, 5:48 AM)
    Sure. Thank you.
  • A. Katsikarelis (07/09/2025, 7:07 AM)
    Thank you very much for the reply @Offir Cohen. Is garbage collection part of the OSS version?
  • TsuHao Wang (07/10/2025, 9:54 PM)
    Hello team, I have questions about permission management. We have an enterprise lakeFS setup on AWS. 1. For a user to download data from a repo, say programmatically, what are the least permissions needed for the operations to succeed? Are they Get Repository, Get Commit, Get Branch, and Get Object? 2. Can we limit users to accessing a specific commit only? In the RBAC documentation, Get Commit exists only at the repo level (arn:lakefs:fs:::repository/{repositoryId}), not at the commit level. Thank you
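For question 1, a read-only policy might look roughly like the sketch below. This is an assumption pieced together from the RBAC docs: the action names and the object-level resource ARN should be checked against the preset policies (e.g. the read-all preset) that ship with lakeFS:

```json
{
  "id": "ModelsReadOnly",
  "statement": [
    {
      "action": ["fs:ReadRepository", "fs:ReadBranch", "fs:ReadCommit"],
      "effect": "allow",
      "resource": "arn:lakefs:fs:::repository/{repositoryId}"
    },
    {
      "action": ["fs:ReadObject", "fs:ListObjects"],
      "effect": "allow",
      "resource": "arn:lakefs:fs:::repository/{repositoryId}/object/*"
    }
  ]
}
```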
  • Jason Trinidad (07/16/2025, 2:53 PM)
    Hi all - I'm new to lakeFS and hoping to find a way to squash commits during merge. My thinking is that our commit history will also be the version history for the data, i.e. I'd like a repo's main branch to show just the merge commits, which would reflect the final released data for each version. I don't see squash functionality either in the GUI or in the docs. Does anyone know if one is available? Thanks!
  • Mark (07/17/2025, 2:19 PM)
    Hi all, I merged multiple branches into the main branch (the default branch), but due to dirty data I attempted to use lakectl revert to roll the main branch back to the initial commit (with the message "Repository created"). However, this operation did not succeed. Could you advise me on how to achieve this? Are there alternative methods to revert the branch to its original state?
    ./lakectl branch revert lakefs://e2e-dt/main f66e8092ece39d11e2f3a10fab5342cb3a65cf881e237fcd4321eaedd4792dcf -y
    Branch: lakefs://e2e-dt/main
    update branch: no changes
    400 Bad Request
  • Kungim (07/22/2025, 9:07 AM)
    👋 Hello, team! I am trying to set up lakeFS on-premises locally with Postgres, MinIO, and ACL. However, lakeFS fails with the following logs and keeps restarting:
    {"file":"_build/pkg/auth/basic_service.go:33","func":"pkg/auth.NewBasicAuthService","level":"info","msg":"initialized Auth service","service":"auth_service","time":"2025-07-22T08:49:39Z"}
    {"error":"no users configured: auth migration not possible","file":"_build/pkg/auth/factory/build.go:50","func":"pkg/auth/factory.NewAuthService","level":"fatal","msg":"\ncannot migrate existing user to basic auth mode!\nPlease run \"lakefs superuser -h\" and follow the instructions on how to migrate an existing user\n","time":"2025-07-22T08:49:39Z"}
    How do I fix it? Here is my docker-compose.yml:
    services:
      postgres:
        container_name: pg-lakefs
        image: postgres:13
        ports:
          - "5432:5432"
        secrets:
          - postgres_user
          - postgres_password
        environment:
          POSTGRES_DB: lakefs_db
          POSTGRES_USER_FILE: /run/secrets/postgres_user
          POSTGRES_PASSWORD_FILE: /run/secrets/postgres_password
        volumes:
          - pg_lakefs_data:/var/lib/postgresql/data
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U $(cat /run/secrets/postgres_user)"]
          interval: 1s
          timeout: 5s
          retries: 5
        restart: always
    
      minio:
        container_name: minio
        image: quay.io/minio/minio:RELEASE.2025-06-13T11-33-47Z
        ports:
          - "9000:9000"
          - "9001:9001"
        volumes: 
          - minio_data:/data
        secrets:
          - minio_root_user
          - minio_root_password
        restart: always
        environment:
          MINIO_ROOT_USER_FILE: /run/secrets/minio_root_user
          MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password
        command: ["server", "/data", "--console-address", ":9001"]
    
      lakefs:
        container_name: lakefs
        build:
          context: .
          dockerfile: Dockerfile.lakefs
        ports:
          - "8000:8000"
        volumes:
          - lakefs_data:/data
        secrets:
          - lakefs_config
        depends_on:
          postgres:
            condition: service_healthy
          minio:
            condition: service_started
          acl:
            condition: service_started
        restart: always
        command: sh -c "cp /run/secrets/lakefs_config /app/lakefs_config.yaml && /app/lakefs run --config /app/lakefs_config.yaml"
    
      acl:
        container_name: acl
        build:
          context: .
          dockerfile: Dockerfile.acl
        ports:
          - "8001:8001"
        secrets:
          - acl_config
        depends_on:
          postgres:
            condition: service_healthy
        restart: always
        command: sh -c "cp /run/secrets/acl_config /app/acl_config.yaml && /app/acl run --config /app/acl_config.yaml"
    
    volumes:
      pg_lakefs_data:
      minio_data:
      lakefs_data:
    
    secrets:
      postgres_user:
        file: .secrets/postgres_user.txt
      postgres_password:
        file: .secrets/postgres_password.txt
      minio_root_user:
        file: .secrets/minio_root_user.txt
      minio_root_password:
        file: .secrets/minio_root_password.txt
      lakefs_config:
        file: .secrets/.lakefs.yaml
      acl_config:
        file: .secrets/.aclserver.yaml
    .aclserver.yaml
    listen_address: ":8001"
    
    database:
      type: "postgres"
      postgres:
          connection_string: "postgres://user:pass@postgres:5432/db?sslmode=disable"
    
    encrypt:
      secret_key: "secret"
    .lakefs.yaml
    logging:
      format: json
      level: INFO
      output: "-"
    
    auth:
      encrypt:
        secret_key: "secret"
    
    blockstore:
      type: s3
      s3:
        force_path_style: true
        endpoint: http://minio:9000
        discover_bucket_region: false
        credentials:
          access_key_id: key_id
          secret_access_key: secret
    
    listen_address: "0.0.0.0:8000"
    
    database:
      type: "postgres"
      postgres:
        connection_string: "postgres://user:pass@postgres:5432/db?sslmode=disable"
    Please help 🙂
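One thing that stands out in the configs above: .lakefs.yaml never points lakeFS at the ACL server, so recent versions fall back to basic auth and hit the "auth migration not possible" fatal. Assuming the separate-ACL-server setup from the docs, the lakeFS config would need roughly the following (key names and the /api/v1 suffix are worth verifying for your version):

```yaml
auth:
  encrypt:
    secret_key: "secret"       # must match the ACL server's encrypt.secret_key
  ui_config:
    rbac: simplified
  api:
    endpoint: http://acl:8001/api/v1
```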
  • Nikolai Potapov (07/27/2025, 9:00 AM)
    Hello everyone! Does lakeFS have any tutorials or training lessons/videos to help understand how it works and its intricacies?
  • 薛宇豪 (08/07/2025, 1:00 AM)
    Hi, I have a question about GC: if I only call getPhysicalAddress and am writing a file through the S3 interface, and GC is triggered before linkPhysicalAddress is called, the S3 object will be collected but not marked as active. Will this cause a false GC?
  • 薛宇豪 (08/07/2025, 5:11 AM)
    About monitoring: is there import code (a dashboard JSON) for the Grafana dashboard?
  • 薛宇豪 (08/08/2025, 5:29 AM)
    Hi, what is the create commit record API used for? Can I use it to add metadata to an existing commit?
  • Aaron Taylor (08/11/2025, 11:33 PM)
    We've been encountering an issue where lakeFS files that our system is creating end up being created as directories rather than files, causing issues when other processes try to create them. We've been able to reproduce the "consumer" side of the issue with lakectl local checkout, which produces an error of the following form (file paths edited):
    $ lakectl local checkout --yes .
    ...
    download path/to/example.jsonl failed: could not create file '/Users/aaron/repo/data/path/to/example.jsonl': open /Users/aaron/repo/data/path/to/example.jsonl failed: is a directory
    The LakeFS location looks like this (paths changed, other things not):
    $ lakectl fs ls -r lakefs://example/COMMIT/path/to/
    object          2025-08-09 09:15:10 -0700 PDT    83.5 kB         path/to/example.jsonl
    object          2025-08-01 12:06:13 -0700 PDT    86.6 kB         path/to/example.jsonl/9e0b1aabbf762a4494e47dd282e5c4cca1daaed40ac96f8ffcc61ecf38a47242
    It appears that some lakeFS operation is partially failing, leaving the object in some sort of broken state. Any guidance on how best to debug this? We've written a script to clean these up and re-run things, but that's obviously not ideal! One theory is that it seems to happen when the lakeFS deployment is under higher load.
  • 薛宇豪 (08/12/2025, 9:54 AM)
    Hi, does lakeFS have a limit on the number of repositories? I ask because I noticed that the PostgreSQL implementation is configured with 100 partitioned tables, and the data related to each repository is stored in the same partitioned table. I am therefore unsure whether having a large number of repositories would cause additional issues or side effects. Also, what are the benefits of storing all data under the same table structure rather than in different tables? Would using different tables potentially reduce serialization overhead?
  • 薛宇豪 (08/13/2025, 9:16 AM)
    Is there any way to restore a branch that was accidentally deleted? Manually querying the database is also acceptable. Or is there any way to prevent a branch from being deleted?
  • Alan judi (08/13/2025, 11:39 PM)
    Hello guys, I have set up lakeFS Community on my k8s cluster. When I am in the dashboard, I get an error; upon inspecting the pod running lakeFS, I see the following:
    time="2025-08-13T22:53:36Z" level=error msg="failed to create user" func="pkg/auth.(*APIAuthService).CreateUser" file="build/pkg/auth/service.go:213" error="Post \"/auth/users\": unsupported protocol scheme \"\"" service=auth_api username=admin
    time="2025-08-13T22:53:36Z" level=error msg="API call returned status internal server error" func="pkg/api.(*Controller).handleAPIErrorCallback" file="build/pkg/api/controller.go:3033" error="create user - Post \"/auth/users\": unsupported protocol scheme \"\"" host=lakefs.*****.com method=POST operation_id=Setup path=/api/v1/setup_lakefs service=api_gateway
    time="2025-08-13T23:31:41Z" level=error msg="failed to create user" func="pkg/auth.(*APIAuthService).CreateUser" file="build/pkg/auth/service.go:213" error="Post \"/auth/users\": unsupported protocol scheme \"\"" service=auth_api username=admin
    time="2025-08-13T23:31:41Z" level=error msg="API call returned status internal server error" func="pkg/api.(*Controller).handleAPIErrorCallback" file="build/pkg/api/controller.go:3033" error="create user - Post \"/auth/users\": unsupported protocol scheme \"\"" host=lakefs.******.com method=POST operation_id=Setup path=/api/v1/setup_lakefs service=api_gateway
    Here are my helm chart values:
    # lakeFS server configuration
    lakefsConfig: |
      logging:
        level: "INFO"
      database:
        type: postgres
        postgres:
          connection_string: "postgres://****:****@****:5432/postgres?sslmode=disable"
      blockstore:
        type: s3
        s3:
          region: us-west-2
      auth:
        # Optional: map display names & default groups from ID token claims
        api:
          skip_health_check: true
          supports_invites: false
          endpoint: ""
        authentication_api:
          endpoint: ""
          external_principals_enabled: false
        ui_config:
           rbac: simplified
           login_url: /auth/login
           logout_url: /auth/logout
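A guess worth checking: Post "/auth/users": unsupported protocol scheme "" suggests lakeFS is treating auth.api.endpoint: "" as a configured external auth service with an empty URL. If no external auth/ACL service is actually running, dropping those sections entirely (sketch below, an assumption to verify against the chart docs) should let the built-in authenticator handle setup:

```yaml
auth:
  # no api: / authentication_api: sections unless an external auth/ACL
  # service is actually running; an empty endpoint "" still selects the
  # external auth client, which then fails with "unsupported protocol scheme"
  ui_config:
    login_url: /auth/login
    logout_url: /auth/logout
```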
  • Jeffrey Ji (08/17/2025, 1:28 AM)
    Hello folks, it seems the careers page doesn't work; I cannot submit my resume.