# help

    Ocean Chang

    11/06/2024, 8:12 AM
How do I set up a remote authenticator so that the lakeFS client can pass additional values in the headers and body? My config.yaml:
```yaml
auth:
  remote_authenticator:
    enabled: true
    endpoint: https://testendpoint.com
    default_user_group: "Developers"
  ui_config:
    logout_url: /logout
    login_cookie_names:
      - Authorization
```
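For reference, a minimal sketch of what the remote authenticator endpoint itself could look like, assuming the contract described in the remote_authenticator docs, where lakeFS POSTs the login credentials as JSON and treats a 200 as success. The field names and the Flask choice are assumptions; the config above only controls the endpoint and the default group.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def valid_credentials(username: str, password: str) -> bool:
    """Hypothetical credential check against your identity backend."""
    return username == "demo" and password == "demo"

@app.route("/", methods=["POST"])
def authenticate():
    # lakeFS POSTs the login form credentials as JSON (field names assumed).
    body = request.get_json(force=True)
    if not valid_credentials(body.get("username", ""), body.get("password", "")):
        return "", 401
    # A 200 response signals success; the user is then placed in
    # auth.remote_authenticator.default_user_group.
    return jsonify({"external_user_identifier": body["username"]})

if __name__ == "__main__":
    app.run(port=8080)
```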

    Boris

    11/07/2024, 1:01 PM
Hello! I am trying to make a POST request to listPullRequests via the lakeFS UI, but I get a 401 error, "insufficient permissions". I used the lakeFS demo environment and the standard repository. What did I miss?

    Ocean Chang

    11/08/2024, 2:23 AM
Context: using the lakeFS API or Python SDK to fetch the list of repositories and other APIs.
Problem: first, I make the `v1/auth/login` API call or create the `Client` from the SDK. Both succeed with 200. The login API call returns a `token` and `token_expiration`. However, when I subsequently try to call `/api/v1/repositories`, I'm getting a 401 `error authenticating request`.
Question: do I need to attach the returned login token in order to make subsequent calls? If so, how?
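For reference, a minimal sketch of attaching the returned JWT to subsequent calls with requests (the endpoint and credentials are placeholders):

```python
import requests

LAKEFS = "http://localhost:8000"  # placeholder endpoint

# Step 1: log in with the access key pair; the response carries a JWT.
resp = requests.post(
    f"{LAKEFS}/api/v1/auth/login",
    json={"access_key_id": "AKIA...", "secret_access_key": "..."},
)
resp.raise_for_status()
token = resp.json()["token"]

# Step 2: attach the JWT as a Bearer token on every subsequent call.
repos = requests.get(
    f"{LAKEFS}/api/v1/repositories",
    headers={"Authorization": f"Bearer {token}"},
)
print(repos.json())
```

The same endpoints also accept the access key pair directly via HTTP basic auth, which is what the SDK's `Client` does for you.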

    Mike Fang

    11/08/2024, 7:09 PM
Is there a way to override the default authentication for all API requests from the lakefs_sdk Python client? I found this: `:param _request_auth: set to override the auth_settings for an a single request; this effectively ignores the authentication in the spec for a single request.` But that has to be passed on every single API call. Is there a way to set it on the actual api_client? I am trying to do SigV4 auth on all requests from the lakeFS SDK, proxying the lakeFS API through API Gateway with IAM authorization.
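For reference, `_request_auth` is indeed per-call in the generated client; one transport-level alternative is to compute SigV4 headers yourself and attach them to each outgoing request. A minimal botocore sketch, where the API Gateway URL and the `execute-api` service name are assumptions:

```python
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.session import Session

def sigv4_headers(method: str, url: str, body: bytes = b"",
                  region: str = "us-east-1", service: str = "execute-api") -> dict:
    """Compute SigV4 headers for one request using the default AWS credentials."""
    creds = Session().get_credentials().get_frozen_credentials()
    req = AWSRequest(method=method, url=url, data=body)
    SigV4Auth(creds, service, region).add_auth(req)
    return dict(req.headers)

# Hypothetical API Gateway proxy in front of lakeFS.
url = "https://example.execute-api.us-east-1.amazonaws.com/prod/api/v1/repositories"
print(requests.get(url, headers=sigv4_headers("GET", url)).status_code)
```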

    Mike Fang

    11/09/2024, 1:42 AM
When I try to create a repository from the UI, I get this issue with S3:
```
time="2024-11-09T01:33:57Z" level=warning msg="Could not access storage namespace"
func="pkg/api.(*Controller).CreateRepository" file="lakeFS/pkg/api/controller.go:2016"
error="operation error S3: PutObject, https response error StatusCode: 400,
RequestID: GV2RCD8F49KSN5K3, HostID: P2Te8QubRyKCczc2nt/cJ3YnGfIJFDD2vJRKYoKC7JuDkMkEgN6woYVtsfChFfRhkO2HvM10uYE=,
api error InvalidRequest: Content-MD5 OR x-amz-checksum- HTTP header is
required for Put Object requests with Object Lock parameters"
reason=unknown service=api_gateway
storage_namespace="s3://nile-data-catalog-storefangmik-406016533510-dev/test-lakefs/"
```
Is there something I am missing with setting up S3 with lakeFS? I believe the bucket permissions are set up correctly; Object Lock is enabled by default on our S3 buckets. Does it need to be turned off for lakeFS?
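For reference, a minimal boto3 sketch to confirm whether Object Lock is enabled on the bucket named in the log (the bucket name is taken from the log above; this only verifies the condition the error names, not whether a given lakeFS version supports such buckets):

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "nile-data-catalog-storefangmik-406016533510-dev"  # bucket from the log

try:
    cfg = s3.get_object_lock_configuration(Bucket=bucket)
    print("Object Lock:", cfg["ObjectLockConfiguration"])
except ClientError as e:
    # ObjectLockConfigurationNotFoundError means Object Lock is not enabled.
    print("No Object Lock:", e.response["Error"]["Code"])
```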

    Akshar Barot

    06/29/2025, 5:48 AM
    Sure. Thank you.

    A. Katsikarelis

    07/09/2025, 7:07 AM
    Thank you very much for the reply @Offir Cohen. Is garbage collection part of the OSS version?

    TsuHao Wang

    07/10/2025, 9:54 PM
Hello team, I have questions about permission management. We have an enterprise lakeFS setup on AWS cloud.
1. For a user to download data from a repo programmatically, what are the least permissions needed for the operation to succeed? Are they `Get Repository`, `Get Commit`, `Get Branch`, and `Get Object`?
2. Can we limit users to access a specific commit only? In the RBAC documentation, Get Commit is only at the repository level (`arn:lakefs:fs:::repository/{repositoryId}`), not the commit level.
Thank you
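For reference, a minimal sketch of a read-only policy along those lines, written as a Python dict in the shape lakeFS RBAC policy documents use. The exact action strings should be verified against the RBAC docs for your lakeFS version, and the resource ARN model is repository/path scoped, which is why there is no commit-level resource:

```python
# Illustrative read-only policy for one repository (action names and the
# ARN wildcard should be checked against the RBAC documentation).
read_only_policy = {
    "id": "FSReadExampleRepo",
    "statement": [
        {
            "effect": "allow",
            "action": [
                "fs:ReadRepository",
                "fs:ReadCommit",
                "fs:ReadBranch",
                "fs:ReadObject",
                "fs:ListObjects",
            ],
            # Repository-scoped ARN; lakeFS RBAC has no commit-level ARN.
            "resource": "arn:lakefs:fs:::repository/example-repo/*",
        }
    ],
}
```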

    Jason Trinidad

    07/16/2025, 2:53 PM
Hi all - I'm new to lakeFS and hoping to find a way to squash commits during merge. My thinking is that our commit history will also be the version history for the data, i.e. I'd like a repo's `main` branch to show just the merge commits, which would reflect the final released data for each version. I don't see a squash functionality either in the GUI or in the docs. Does anyone know if one is available? Thanks!

    Mark

    07/17/2025, 2:19 PM
Hi all, I merged multiple branches into the main branch (the default branch), but due to dirty data I attempted to use `lakectl branch revert` to roll the main branch back to the initial commit (the one with the message "Repository created"). However, this operation did not succeed. Could you advise me on how to achieve this? Are there alternative methods to revert the branch to its original state?
```
./lakectl branch revert lakefs://e2e-dt/main f66e8092ece39d11e2f3a10fab5342cb3a65cf881e237fcd4321eaedd4792dcf -y
Branch: lakefs://e2e-dt/main
update branch: no changes
400 Bad Request
```

    Kungim

    07/22/2025, 9:07 AM
👋 Hello, team! I am trying to set up lakeFS on-premises locally with Postgres, MinIO, and ACL. However, lakeFS fails with the following logs and keeps restarting:
```
{"file":"_build/pkg/auth/basic_service.go:33","func":"pkg/auth.NewBasicAuthService","level":"info","msg":"initialized Auth service","service":"auth_service","time":"2025-07-22T08:49:39Z"}
{"error":"no users configured: auth migration not possible","file":"_build/pkg/auth/factory/build.go:50","func":"pkg/auth/factory.NewAuthService","level":"fatal","msg":"\ncannot migrate existing user to basic auth mode!\nPlease run \"lakefs superuser -h\" and follow the instructions on how to migrate an existing user\n","time":"2025-07-22T08:49:39Z"}
```
How do I fix it? Here is my docker-compose.yml:
```yaml
    services:
      postgres:
        container_name: pg-lakefs
        image: postgres:13
        ports:
          - "5432:5432"
        secrets:
          - postgres_user
          - postgres_password
        environment:
          POSTGRES_DB: lakefs_db
          POSTGRES_USER_FILE: /run/secrets/postgres_user
          POSTGRES_PASSWORD_FILE: /run/secrets/postgres_password
        volumes:
          - pg_lakefs_data:/var/lib/postgresql/data
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U $(cat /run/secrets/postgres_user)"]
          interval: 1s
          timeout: 5s
          retries: 5
        restart: always
    
      minio:
        container_name: minio
        image: quay.io/minio/minio:RELEASE.2025-06-13T11-33-47Z
        ports:
          - "9000:9000"
          - "9001:9001"
        volumes: 
          - minio_data:/data
        secrets:
          - minio_root_user
          - minio_root_password
        restart: always
        environment:
          MINIO_ROOT_USER_FILE: /run/secrets/minio_root_user
          MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password
        command: ["server", "/data", "--console-address", ":9001"]
    
      lakefs:
        container_name: lakefs
        build:
          context: .
          dockerfile: Dockerfile.lakefs
        ports:
          - "8000:8000"
        volumes:
          - lakefs_data:/data
        secrets:
          - lakefs_config
        depends_on:
          postgres:
            condition: service_healthy
          minio:
            condition: service_started
          acl:
            condition: service_started
        restart: always
        command: sh -c "cp /run/secrets/lakefs_config /app/lakefs_config.yaml && /app/lakefs run --config /app/lakefs_config.yaml"
    
      acl:
        container_name: acl
        build:
          context: .
          dockerfile: Dockerfile.acl
        ports:
          - "8001:8001"
        secrets:
          - acl_config
        depends_on:
          postgres:
            condition: service_healthy
        restart: always
        command: sh -c "cp /run/secrets/acl_config /app/acl_config.yaml && /app/acl run --config /app/acl_config.yaml"
    
    volumes:
      pg_lakefs_data:
      minio_data:
      lakefs_data:
    
    secrets:
      postgres_user:
        file: .secrets/postgres_user.txt
      postgres_password:
        file: .secrets/postgres_password.txt
      minio_root_user:
        file: .secrets/minio_root_user.txt
      minio_root_password:
        file: .secrets/minio_root_password.txt
      lakefs_config:
        file: .secrets/.lakefs.yaml
      acl_config:
        file: .secrets/.aclserver.yaml
```
.aclserver.yaml:
```yaml
    listen_address: ":8001"
    
    database:
      type: "postgres"
      postgres:
      connection_string: "postgres://user:pass@postgres:5432/db?sslmode=disable"
    
    encrypt:
      secret_key: "secret"
```
.lakefs.yaml:
```yaml
    logging:
      format: json
      level: INFO
      output: "-"
    
    auth:
      encrypt:
        secret_key: "secret"
    
    blockstore:
      type: s3
      s3:
        force_path_style: true
    endpoint: http://minio:9000
        discover_bucket_region: false
        credentials:
          access_key_id: key_id
          secret_access_key: secret
    
    listen_address: "0.0.0.0:8000"
    
    database:
      type: "postgres"
      postgres:
    connection_string: "postgres://user:pass@postgres:5432/db?sslmode=disable"
```
Please help 🙂

    Nikolai Potapov

    07/27/2025, 9:00 AM
    Hello everyone! Does lakeFS have any tutorials or training lessons/videos to help understand how it works and its intricacies?

    薛宇豪

    08/07/2025, 1:00 AM
Hi, I have a question about GC: if I only call `getPhysicalAddress` and am writing a file through the S3 interface, and GC is triggered before `linkPhysicalAddress` is called, the S3 object will be collected but not marked as active. Will this cause a false GC?

    薛宇豪

    08/07/2025, 5:11 AM
About monitoring: is there import code for the Grafana dashboard?

    薛宇豪

    08/08/2025, 5:29 AM
Hi, what is the `create commit record` API used for? Can I use it to add metadata to an existing commit?

    Aaron Taylor

    08/11/2025, 11:33 PM
We've been encountering an issue where lakeFS files that our system is creating end up being created as directories rather than files, causing issues when other processes try to create them. We've been able to reproduce the "consumer" side of the issue with `lakectl local checkout`, which produces an error of the following form (file paths edited):
```
    $ lakectl local checkout --yes .
    ...
    download path/to/example.jsonl failed: could not create file '/Users/aaron/repo/data/path/to/example.jsonl': open /Users/aaron/repo/data/path/to/example.jsonl failed: is a directory
```
The lakeFS location looks like this (paths changed, other things not):
```
$ lakectl fs ls -r lakefs://example/COMMIT/path/to/
    object          2025-08-09 09:15:10 -0700 PDT    83.5 kB         path/to/example.jsonl
    object          2025-08-01 12:06:13 -0700 PDT    86.6 kB         path/to/example.jsonl/9e0b1aabbf762a4494e47dd282e5c4cca1daaed40ac96f8ffcc61ecf38a47242
```
It appears that some lakeFS operation is partially failing, leaving the object in some sort of broken state? Any guidance on how best to debug this? We've written a script to clean these up and re-run things, but that's obviously not ideal! One theory is that it seems to happen when the lakeFS deployment is under higher load.

    薛宇豪

    08/12/2025, 9:54 AM
Hi, does lakeFS have a limit on the number of repositories? I ask because I noticed that the pgsql implementation is configured with 100 partitioned tables, and the data related to each repository is stored in the same partitioned table. Therefore, I am unsure whether having a large number of repositories would cause any additional issues or side effects. Additionally, what are the benefits of storing all data under the same table structure rather than using different tables? Would using different tables potentially reduce serialization overhead?

    薛宇豪

    08/13/2025, 9:16 AM
    Is there any way to restore a branch that was accidentally deleted? Manually querying the database is also acceptable. Or is there any way to prevent a branch from being deleted?
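For reference, if the head commit ID of the deleted branch is still known (from CI logs, a terminal scrollback, or a previous `lakectl log`), a minimal lakefs_sdk sketch that recreates the branch pointing at that commit; the endpoint, credentials, and names are placeholders:

```python
import lakefs_sdk
from lakefs_sdk.client import LakeFSClient

# Placeholder endpoint and credentials.
config = lakefs_sdk.Configuration(
    host="http://localhost:8000/api/v1",
    username="AKIA...",
    password="...",
)
client = LakeFSClient(config)

# Recreate the branch pointing at the known head commit (placeholder ID).
client.branches_api.create_branch(
    repository="example-repo",
    branch_creation=lakefs_sdk.BranchCreation(
        name="restored-branch",
        source="<known-head-commit-id>",
    ),
)
```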

    Alan judi

    08/13/2025, 11:39 PM
Hello guys, I have set up lakeFS Community on my k8s cluster. When I am in the dashboard, I get the following error. Upon inspecting my pod running lakeFS, I see the following:
```
    time="2025-08-13T22:53:36Z" level=error msg="failed to create user" func="pkg/auth.(*APIAuthService).CreateUser" file="build/pkg/auth/service.go:213" error="Post \"/auth/users\": unsupported protocol scheme \"\"" service=auth_api username=admin
    time="2025-08-13T22:53:36Z" level=error msg="API call returned status internal server error" func="pkg/api.(*Controller).handleAPIErrorCallback" file="build/pkg/api/controller.go:3033" error="create user - Post \"/auth/users\": unsupported protocol scheme \"\"" host=lakefs.*****.com method=POST operation_id=Setup path=/api/v1/setup_lakefs service=api_gateway
    time="2025-08-13T23:31:41Z" level=error msg="failed to create user" func="pkg/auth.(*APIAuthService).CreateUser" file="build/pkg/auth/service.go:213" error="Post \"/auth/users\": unsupported protocol scheme \"\"" service=auth_api username=admin
    time="2025-08-13T23:31:41Z" level=error msg="API call returned status internal server error" func="pkg/api.(*Controller).handleAPIErrorCallback" file="build/pkg/api/controller.go:3033" error="create user - Post \"/auth/users\": unsupported protocol scheme \"\"" host=lakefs.******.com method=POST operation_id=Setup path=/api/v1/setup_lakefs service=api_gateway
```
Here are my helm chart values:
```yaml
    # lakeFS server configuration
    lakefsConfig: |
      logging:
        level: "INFO"
      database:
        type: postgres
        postgres:
          connection_string: "postgres://****:****@****:5432/postgres?sslmode=disable"
      blockstore:
        type: s3
        s3:
          region: us-west-2
      auth:
        # Optional: map display names & default groups from ID token claims
        api:
          skip_health_check: true
          supports_invites: false
          endpoint: ""
        authentication_api:
          endpoint: ""
          external_principals_enabled: false
    ui_config:
      rbac: simplified
      login_url: /auth/login
      logout_url: /auth/logout
```

    Jeffrey Ji

    08/17/2025, 1:28 AM
Hello folks, it seems the careers page doesn't work; I cannot submit my resume.

    薛宇豪

    08/26/2025, 8:54 AM
Hi, I want to mount the lakeFS frontend under an existing domain, such as https://test.domain.com/lakefs. This way, all requests to the backend API will also include /lakefs, so I need to change the original /api/v1 to /lakefs/api/v1. I see that the current helm chart supports configuring `ingress.hosts.paths`. Is it possible to directly modify this configuration? However, I see the frontend JS has a hardcoded `export const API_ENDPOINT = '/api/v1'`: https://github.com/treeverse/lakeFS/blob/master/webui/src/lib/api/index.js#L1

    Carlos Luque

    09/02/2025, 8:18 AM
Hi! One question: does the OSS version only support one user?

    Kungim

    09/03/2025, 7:28 AM
Hello team! I am trying to make a C# client API library with OpenAPI Generator using /api/swagger.yml, but I noticed that the API is split into three files: /api/authentication.yml, /api/authorization.yml, and /api/swagger.yml. Do I need to combine them somehow to get the full API? Building with just /api/swagger.yml seems to be missing some API functionality. How do I build the full API? Looking forward to any response!

    Jose Ignacio Gascon Conde

    09/03/2025, 8:12 AM
Hi team, I'm having a persistent issue trying to deploy lakeFS to an EKS cluster using the Terraform `helm_release` resource, and I'm hoping someone might have some insight. Passing configuration via `values`: I've tried passing the configuration using both the `lakefsConfig` key and the `config` key (as shown on Artifact Hub). In both cases, `helm get values lakefs` confirms that Helm receives the correct values from Terraform. However, the resulting `ConfigMap` in the cluster is still the default one.

    Mingke Wang

    09/03/2025, 3:01 PM
Hi guys, I'm a student in ML and want to use lakeFS Mount to mount my dataset, since it is about 3TB. Is there any cheap option instead of buying the enterprise version?

    Carlos Luque

    09/04/2025, 8:18 AM
Hey everyone, just wanted to share some concerns about lakeFS (version 1.29.0):
1. Does lakeFS remove from S3 the folder created for a repository when that repository is deleted?
   a. If not, why? lakeFS is a data versioning tool; if we are keeping data that was deliberately removed by the user, why keep it in S3?
2. Removing a repository makes that name unusable afterwards (I suppose this follows from the concern above).
3. When I upload the same object to lakeFS (without any change), it stores the object again, taking up storage space. For small objects this is not a big deal, but people normally upload a whole folder directly, not just the edited files.
4. Does creating tags consume storage space?

    Jiadong Bai

    03/16/2025, 9:29 PM
Hi there, I am wondering if there is a native API to download a whole branch/commit as a zip file? I looked through the OpenAPI specification, but it seems there is no such API.
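For reference, the spec indeed has no download-as-archive endpoint; a minimal lakefs_sdk sketch that builds the zip client-side by listing a ref and fetching each object (endpoint, credentials, and names are placeholders):

```python
import zipfile
import lakefs_sdk
from lakefs_sdk.client import LakeFSClient

config = lakefs_sdk.Configuration(
    host="http://localhost:8000/api/v1",  # placeholder endpoint
    username="AKIA...",
    password="...",
)
client = LakeFSClient(config)

repo, ref = "example-repo", "main"  # placeholder repository and ref

with zipfile.ZipFile(f"{repo}-{ref}.zip", "w") as zf:
    after = ""
    while True:
        page = client.objects_api.list_objects(repo, ref, after=after, amount=1000)
        for obj in page.results:
            # get_object returns the object body; store it under its repo path.
            data = client.objects_api.get_object(repo, ref, obj.path)
            zf.writestr(obj.path, bytes(data))
        if not page.pagination.has_more:
            break
        after = page.pagination.next_offset
```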

    Ion

    09/16/2025, 12:45 PM
I am seeing random `SignatureDoesNotMatch` failures: "The request signature we calculated does not match the signature you provided. Check your key and signing method." Any ideas? I found an issue in the repo that also points to boto, but I am using obstore (object-store-rs).

    Carlos Luque

    09/17/2025, 3:12 PM
Hi! One question: are you going to introduce templates, or any way to include a template in the Compare (Pull Request) view?

    薛宇豪

    09/18/2025, 6:33 AM
Hey, I'm trying to build a customized lakeFS server. After modifying the code, running `make build-docker` doesn't seem to generate a Docker image with my local code. Is it still pulling the GitHub code for the build?