# help
  • A. Katsikarelis (07/09/2025, 7:07 AM)
    Thank you very much for the reply @Offir Cohen. Is garbage collection part of the OSS version?
  • TsuHao Wang (07/10/2025, 9:54 PM)
    Hello team, I have questions about permission management. We have an enterprise lakeFS setup on AWS. 1. For a user to download data from a repo programmatically, what are the minimum permissions needed for the operations to succeed? Are they `Get Repository`, `Get Commit`, `Get Branch`, and `Get Object`? 2. Can we limit users to accessing a specific commit only? In the RBAC documentation, Get Commit exists only at the repo level (`arn:lakefs:fs:::repository/{repositoryId}`), not at the commit level. Thank you
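    A least-privilege read-only policy sketch for question 1, assuming the fs:Read*/fs:List* action names from the RBAC docs map onto those Get operations (my-repo is a placeholder; verify the exact actions and resource ARNs against your lakeFS version):
    ```json
    {
      "id": "ReadSingleRepo",
      "statement": [
        {
          "action": ["fs:ReadRepository", "fs:ReadCommit"],
          "effect": "allow",
          "resource": "arn:lakefs:fs:::repository/my-repo"
        },
        {
          "action": ["fs:ReadBranch"],
          "effect": "allow",
          "resource": "arn:lakefs:fs:::repository/my-repo/branch/*"
        },
        {
          "action": ["fs:ReadObject", "fs:ListObjects"],
          "effect": "allow",
          "resource": "arn:lakefs:fs:::repository/my-repo/object/*"
        }
      ]
    }
    ```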
  • Jason Trinidad (07/16/2025, 2:53 PM)
    Hi all - I'm new to lakeFS and hoping to find a way to squash commits during merge. My thinking is that our commit history will also serve as the version history for the data, i.e. I'd like a repo's `main` branch to show just the merge commits, which would reflect the final released data for each version. I don't see squash functionality either in the GUI or in the docs. Does anyone know if one is available? Thanks!
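    For context, a lakeFS merge already lands on the destination as a single merge commit, so keeping all work on side branches gives `main` a one-commit-per-release history; a sketch with placeholder names (drop -m if your lakectl version lacks the message flag):
    ```
    lakectl merge lakefs://my-repo/dev lakefs://my-repo/main -m "Release v1.2: final data"
    ```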
  • Mark (07/17/2025, 2:19 PM)
    Hi all, I merged multiple branches into the main branch (the default branch), but due to dirty data I attempted to use `lakectl branch revert` to roll the main branch back to the initial commit (the one with the message "Repository created"). However, this operation did not succeed. Could you advise how to achieve this? Are there alternative ways to revert the branch to its original state?
    ```
    ./lakectl branch revert lakefs://e2e-dt/main f66e8092ece39d11e2f3a10fab5342cb3a65cf881e237fcd4321eaedd4792dcf -y
    Branch: lakefs://e2e-dt/main
    update branch: no changes
    400 Bad Request
    ```
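    One likely reading of the failure: `lakectl branch revert` undoes the changes introduced by a given commit rather than resetting the branch to it, and the initial "Repository created" commit is empty, so reverting it is a no-op ("no changes"). A sketch for undoing one of the merges instead, assuming a recent lakectl (--parent-number picks the mainline parent, like git revert -m; the commit ID is a placeholder):
    ```
    lakectl branch revert lakefs://e2e-dt/main <merge-commit-id> --parent-number 1 -y
    ```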
  • Kungim (07/22/2025, 9:07 AM)
    👋 Hello, team! I am trying to set up lakeFS on-premises locally with Postgres, MinIO, and the ACL server. However, lakeFS fails with the following logs and keeps restarting:
    ```
    {"file":"_build/pkg/auth/basic_service.go:33","func":"pkg/auth.NewBasicAuthService","level":"info","msg":"initialized Auth service","service":"auth_service","time":"2025-07-22T08:49:39Z"}
    {"error":"no users configured: auth migration not possible","file":"_build/pkg/auth/factory/build.go:50","func":"pkg/auth/factory.NewAuthService","level":"fatal","msg":"\ncannot migrate existing user to basic auth mode!\nPlease run \"lakefs superuser -h\" and follow the instructions on how to migrate an existing user\n","time":"2025-07-22T08:49:39Z"}
    ```
    How do I fix it? Here is my `docker-compose.yml`:
    ```yaml
    services:
      postgres:
        container_name: pg-lakefs
        image: postgres:13
        ports:
          - "5432:5432"
        secrets:
          - postgres_user
          - postgres_password
        environment:
          POSTGRES_DB: lakefs_db
          POSTGRES_USER_FILE: /run/secrets/postgres_user
          POSTGRES_PASSWORD_FILE: /run/secrets/postgres_password
        volumes:
          - pg_lakefs_data:/var/lib/postgresql/data
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U $(cat /run/secrets/postgres_user)"]
          interval: 1s
          timeout: 5s
          retries: 5
        restart: always
    
      minio:
        container_name: minio
        image: quay.io/minio/minio:RELEASE.2025-06-13T11-33-47Z
        ports:
          - "9000:9000"
          - "9001:9001"
        volumes: 
          - minio_data:/data
        secrets:
          - minio_root_user
          - minio_root_password
        restart: always
        environment:
          MINIO_ROOT_USER_FILE: /run/secrets/minio_root_user
          MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password
        command: ["server", "/data", "--console-address", ":9001"]
    
      lakefs:
        container_name: lakefs
        build:
          context: .
          dockerfile: Dockerfile.lakefs
        ports:
          - "8000:8000"
        volumes:
          - lakefs_data:/data
        secrets:
          - lakefs_config
        depends_on:
          postgres:
            condition: service_healthy
          minio:
            condition: service_started
          acl:
            condition: service_started
        restart: always
        command: sh -c "cp /run/secrets/lakefs_config /app/lakefs_config.yaml && /app/lakefs run --config /app/lakefs_config.yaml"
    
      acl:
        container_name: acl
        build:
          context: .
          dockerfile: Dockerfile.acl
        ports:
          - "8001:8001"
        secrets:
          - acl_config
        depends_on:
          postgres:
            condition: service_healthy
        restart: always
        command: sh -c "cp /run/secrets/acl_config /app/acl_config.yaml && /app/acl run --config /app/acl_config.yaml"
    
    volumes:
      pg_lakefs_data:
      minio_data:
      lakefs_data:
    
    secrets:
      postgres_user:
        file: .secrets/postgres_user.txt
      postgres_password:
        file: .secrets/postgres_password.txt
      minio_root_user:
        file: .secrets/minio_root_user.txt
      minio_root_password:
        file: .secrets/minio_root_password.txt
      lakefs_config:
        file: .secrets/.lakefs.yaml
      acl_config:
        file: .secrets/.aclserver.yaml
    ```
    `.aclserver.yaml`:
    ```yaml
    listen_address: ":8001"
    
    database:
      type: "postgres"
      postgres:
          connection_string: "postgres://user:pass@postgres:5432/db?sslmode=disable"
    
    encrypt:
      secret_key: "secret"
    ```
    `.lakefs.yaml`:
    ```yaml
    logging:
      format: json
      level: INFO
      output: "-"
    
    auth:
      encrypt:
        secret_key: "secret"
    
    blockstore:
      type: s3
      s3:
        force_path_style: true
        endpoint: http://minio:9000
        discover_bucket_region: false
        credentials:
          access_key_id: key_id
          secret_access_key: secret
    
    listen_address: "0.0.0.0:8000"
    
    database:
      type: "postgres"
      postgres:
        connection_string: "postgres://user:pass@postgres:5432/db?sslmode=disable"
    ```
    Please help 🙂
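    One thing worth checking: `.lakefs.yaml` above never points lakeFS at the ACL server, so it starts in basic-auth mode, which matches the "no users configured: auth migration not possible" fatal. A sketch of the auth block that wires it up, assuming the ACL server serves its API under /api/v1 on port 8001 as configured above:
    ```yaml
    auth:
      encrypt:
        secret_key: "secret"  # assumption: must match the ACL server's encrypt.secret_key
      ui_config:
        rbac: simplified
      api:
        endpoint: http://acl:8001/api/v1
    ```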
  • Nikolai Potapov (07/27/2025, 9:00 AM)
    Hello everyone! Does lakeFS have any tutorials or training lessons/videos to help understand how it works and its intricacies?
  • 薛宇豪 (08/07/2025, 1:00 AM)
    Hi, I have a question about GC: if I only call `getPhysicalAddress` and write the file through the S3 interface, and GC is triggered before `linkPhysicalAddress` is called, the S3 object has not yet been marked as active. Will GC wrongly collect it?
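    For reference, the two-step flow in question, sketched with the Python SDK (treat the exact staging_api signatures and the StagingMetadata model as assumptions to verify against your lakefs_sdk version):
    ```python
    import lakefs_sdk
    from lakefs_sdk.client import LakeFSClient
    from lakefs_sdk.models import StagingMetadata

    client = LakeFSClient(lakefs_sdk.Configuration(host="http://localhost:8000/api/v1"))

    # Step 1: reserve a physical address on the backing store.
    loc = client.staging_api.get_physical_address(
        repository="my-repo", branch="main", path="data/file.bin")

    # ...write the bytes to loc.physical_address through the S3 interface...
    # Between these two calls the address is only staged, which is the GC window asked about.

    # Step 2: link the uploaded object into the branch.
    client.staging_api.link_physical_address(
        repository="my-repo", branch="main", path="data/file.bin",
        staging_metadata=StagingMetadata(staging=loc, checksum="<etag>", size_bytes=123),
    )
    ```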
  • 薛宇豪 (08/07/2025, 5:11 AM)
    About monitoring: is there an importable dashboard JSON for the Grafana dashboard?
  • 薛宇豪 (08/08/2025, 5:29 AM)
    Hi, what is the `create commit record` API used for? Can I use it to add metadata to an existing commit?
  • Aaron Taylor (08/11/2025, 11:33 PM)
    We've been encountering an issue where lakeFS files that our system creates end up as directories rather than files, causing issues when other processes try to create them. We've been able to reproduce the "consumer" side of the issue with `lakectl local checkout`, which produces an error of the following form (file paths edited):
    ```
    $ lakectl local checkout --yes .
    ...
    download path/to/example.jsonl failed: could not create file '/Users/aaron/repo/data/path/to/example.jsonl': open /Users/aaron/repo/data/path/to/example.jsonl failed: is a directory
    ```
    The lakeFS location looks like this (paths changed, other things not):
    ```
    $ lakectl fs ls -r lakefs://example/COMMIT/path/to/
    object          2025-08-09 09:15:10 -0700 PDT    83.5 kB         path/to/example.jsonl
    object          2025-08-01 12:06:13 -0700 PDT    86.6 kB         path/to/example.jsonl/9e0b1aabbf762a4494e47dd282e5c4cca1daaed40ac96f8ffcc61ecf38a47242
    ```
    It appears that some lakeFS operation is partially failing, leaving the object in a broken state. Any guidance on how best to debug this? We've written a script to clean these up and re-run things, but that's obviously not ideal! One theory is that it happens when the lakeFS deployment is under higher load.
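    A minimal cleanup sketch for this state, assuming the hash-suffixed entry from the listing above is the stray one, and substituting a branch for the commit ref (objects can only be deleted on a branch):
    ```
    lakectl fs rm lakefs://example/BRANCH/path/to/example.jsonl/9e0b1aabbf762a4494e47dd282e5c4cca1daaed40ac96f8ffcc61ecf38a47242
    ```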
  • 薛宇豪 (08/12/2025, 9:54 AM)
    Hi, does lakeFS have a limit on the number of repositories? I ask because I noticed that the PostgreSQL implementation is configured with 100 partitioned tables, and the data for every repository is stored in the same partitioned table, so I am unsure whether having a large number of repositories would cause additional issues or side effects. Also, what are the benefits of storing all data under the same table structure rather than using separate tables? Would separate tables reduce serialization overhead?
  • 薛宇豪 (08/13/2025, 9:16 AM)
    Is there any way to restore a branch that was accidentally deleted? Manually querying the database would also be acceptable. Alternatively, is there a way to prevent a branch from being deleted?
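    If the deleted branch's head commit ID is still known (from logs or a recorded run), a sketch of recreating it, since a branch is only a pointer to a commit (names are placeholders):
    ```
    lakectl branch create lakefs://my-repo/restored-branch --source lakefs://my-repo/<head-commit-id>
    ```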
  • Alan judi (08/13/2025, 11:39 PM)
    Hello guys, I have set up lakeFS Community on my k8s cluster. When I open the dashboard, I get an error. Upon inspecting my pod running lakeFS, I see the following:
    ```
    time="2025-08-13T22:53:36Z" level=error msg="failed to create user" func="pkg/auth.(*APIAuthService).CreateUser" file="build/pkg/auth/service.go:213" error="Post \"/auth/users\": unsupported protocol scheme \"\"" service=auth_api username=admin
    time="2025-08-13T22:53:36Z" level=error msg="API call returned status internal server error" func="pkg/api.(*Controller).handleAPIErrorCallback" file="build/pkg/api/controller.go:3033" error="create user - Post \"/auth/users\": unsupported protocol scheme \"\"" host=lakefs.*****.com method=POST operation_id=Setup path=/api/v1/setup_lakefs service=api_gateway
    time="2025-08-13T23:31:41Z" level=error msg="failed to create user" func="pkg/auth.(*APIAuthService).CreateUser" file="build/pkg/auth/service.go:213" error="Post \"/auth/users\": unsupported protocol scheme \"\"" service=auth_api username=admin
    time="2025-08-13T23:31:41Z" level=error msg="API call returned status internal server error" func="pkg/api.(*Controller).handleAPIErrorCallback" file="build/pkg/api/controller.go:3033" error="create user - Post \"/auth/users\": unsupported protocol scheme \"\"" host=lakefs.******.com method=POST operation_id=Setup path=/api/v1/setup_lakefs service=api_gateway
    ```
    Here are my Helm chart values:
    ```yaml
    # lakeFS server configuration
    lakefsConfig: |
      logging:
        level: "INFO"
      database:
        type: postgres
        postgres:
          connection_string: "postgres://****:****@****:5432/postgres?sslmode=disable"
      blockstore:
        type: s3
        s3:
          region: us-west-2
      auth:
        # Optional: map display names & default groups from ID token claims
        api:
          skip_health_check: true
          supports_invites: false
          endpoint: ""
        authentication_api:
          endpoint: ""
          external_principals_enabled: false
        ui_config:
           rbac: simplified
           login_url: /auth/login
           logout_url: /auth/logout
    ```
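    The "unsupported protocol scheme \"\"" errors line up with those empty endpoint values: once `auth.api.endpoint` is set, even to an empty string, lakeFS treats auth as an external API service and POSTs /auth/users against no host. A sketch of the same auth section with the external-auth keys dropped, assuming the built-in authenticator is intended:
    ```yaml
    auth:
      ui_config:
        rbac: simplified
        login_url: /auth/login
        logout_url: /auth/logout
    ```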
  • Jeffrey Ji (08/17/2025, 1:28 AM)
    Hello folks, it seems the careers page doesn't work; I cannot submit my resume.
  • 薛宇豪 (08/26/2025, 8:54 AM)
    Hi, I want to mount the lakeFS frontend under an existing domain, such as https://test.domain.com/lakefs, so all requests to the backend API would also include /lakefs; I'd need to change the original /api/v1 to /lakefs/api/v1. I see that the current Helm chart supports configuring `ingress.hosts.paths`. Is it possible to directly modify this configuration? However, I see the frontend JS has a hardcoded `export const API_ENDPOINT = '/api/v1'`: https://github.com/treeverse/lakeFS/blob/master/webui/src/lib/api/index.js#L1
  • Carlos Luque (09/02/2025, 8:18 AM)
    Hi! One question: does the OSS version only support one user?
  • Kungim (09/03/2025, 7:28 AM)
    Hello team! I am trying to generate a C# client library with the OpenAPI generator from /api/swagger.yml, but I noticed that the API is split into three files: /api/authentication.yml, /api/authorization.yml, and /api/swagger.yml. Do I need to combine them somehow to get the full API? Building with just /api/swagger.yml seems to be missing some functionality. How do I build the full API? Looking forward to any response!
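    One possible approach, assuming the Redocly CLI's join command and the standard openapi-generator CLI (the combined output filename is arbitrary): merge the three specs into one document, then generate from that:
    ```
    npx @redocly/cli join api/swagger.yml api/authentication.yml api/authorization.yml -o api/combined.yml
    npx @openapitools/openapi-generator-cli generate -i api/combined.yml -g csharp -o ./lakefs-csharp-client
    ```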
  • Jose Ignacio Gascon Conde (09/03/2025, 8:12 AM)
    Hi team, I'm having a persistent issue trying to deploy lakeFS to an EKS cluster using the Terraform `helm_release` resource, and I'm hoping someone might have some insight. Passing configuration via `values`: I've tried passing the configuration using both the `lakefsConfig` key and the `config` key (as shown on Artifact Hub). In both cases, `helm get values lakefs` confirms that Helm receives the correct values from Terraform. However, the resulting `ConfigMap` in the cluster is still the default one.
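    For comparison, a minimal helm_release sketch that feeds lakefsConfig through `values` as the raw YAML string the chart templates into its ConfigMap (the chart repo URL and config file path are assumptions):
    ```hcl
    resource "helm_release" "lakefs" {
      name       = "lakefs"
      repository = "https://charts.lakefs.io"
      chart      = "lakefs"

      # lakefsConfig must arrive as a multiline string, not a nested object.
      values = [
        yamlencode({
          lakefsConfig = file("${path.module}/lakefs-config.yaml")
        })
      ]
    }
    ```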
  • Mingke Wang (09/03/2025, 3:01 PM)
    Hi guys, I'm an ML student and want to use lakeFS Mount for my dataset, since it is about 3 TB. Is there any cheap option instead of buying the enterprise version?
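    For scale, a sketch of the OSS route: instead of FUSE-mounting, `lakectl local` syncs a working directory with a lakeFS ref (URI and directory are placeholders):
    ```
    lakectl local clone lakefs://my-repo/main ./dataset
    ```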
  • Carlos Luque (09/04/2025, 8:18 AM)
    Hey everyone, just wanted to share some concerns about lakeFS (version 1.29.0):
    1. Does lakeFS remove the folder created in S3 when a repository is deleted? a. If not, why? lakeFS is a data versioning tool, but if the user chose to remove the data, why keep it in S3?
    2. Removing a repository makes that name unusable afterwards (I suppose this follows from the concern above).
    3. When I upload the same object to lakeFS without any change, the object is stored again, taking up storage space. For a small object this is not a big deal, but people normally upload a folder directly, not only the edited files.
    4. Does creating tags consume storage space?
  • Jiadong Bai (03/16/2025, 9:29 PM)
    Hi there, I am wondering if there is a native API to download a whole branch/commit as a zip file? I looked through the OpenAPI specification, but it seems there is no such API.
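    A close substitute sketch, assuming the installed lakectl supports recursive download (ref and paths are placeholders):
    ```
    lakectl fs download --recursive lakefs://my-repo/<ref>/ ./export/
    zip -r export.zip ./export
    ```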
  • Ion (09/16/2025, 12:45 PM)
    I am seeing random failures: `SignatureDoesNotMatch` ("the request signature we calculated does not match the signature you provided. Check your key and signing method."). Any ideas? I found an issue in the repo that also points to boto, but I am using obstore (object-store-rs).
  • Carlos Luque (09/17/2025, 3:12 PM)
    Hi! One question: are you going to introduce templates, or any way to include a template, in the Compare (Pull Request) view?
  • 薛宇豪 (09/18/2025, 6:33 AM)
    Hey, I'm trying to build a customized lakeFS server. After modifying the code, running `make build-docker` doesn't seem to generate a Docker image with my local code. Is it still pulling the GitHub code for the build?
  • Carlos Luque (09/22/2025, 10:36 AM)
    Hey, is there any way to restrict access to specific repos to specific users, either with a custom implementation of RBAC based on your code or with an up-to-date version of lakeFS? That would be a nice feature to have 😉
  • HT (09/25/2025, 10:35 AM)
    What is a fast way to retrieve `physicalAddress`? Currently:
    ```python
    import lakefs_sdk.client

    client = lakefs_sdk.client.LakeFSClient(lakefs_conf)

    res = []
    for object_path in paths:
        response = client.objects_api.stat_object(repository=repo,
                                                  ref=commit,
                                                  path=object_path,
                                                  presign=presign)
        res.append(response.physical_address)
    ```
    Can `client` be used concurrently, as in the following?
    ```python
    import concurrent.futures

    import lakefs_sdk.client

    def stat_object_path(args):
        client, repo, commit, presign, object_path = args
        response = client.objects_api.stat_object(
            repository=repo,
            ref=commit,
            path=object_path,
            presign=presign
        )
        return response.physical_address

    def get_physical_addresses(client, repo, commit, presign, paths):
        # Threads (not processes) share the one client; tune max_workers as needed.
        with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
            res = list(executor.map(
                stat_object_path,
                [(client, repo, commit, presign, p) for p in paths]
            ))
        return res

    client = lakefs_sdk.client.LakeFSClient(lakefs_conf)
    res = get_physical_addresses(client, repo, commit, presign, paths)
    ```
  • Amihay Gonen (09/29/2025, 11:00 PM)
    I'm trying to connect to the Iceberg REST catalog using DuckDB 1.4, following this guide: https://docs.lakefs.io/latest/integrations/iceberg/ (the DuckDB example). I got this error:
    ```sql
    D ATTACH 'lakefs' AS main_branch (
          TYPE iceberg,
          SECRET lakefs_credentials,
          -- notice the "/relative_to/.../" part:
          ENDPOINT 'https://.../relative_to/repo.main/api'
      );
    Invalid Input Error:
    CatalogConfig required property 'defaults' is missing
    ```
    This error is misleading (https://github.com/duckdb/duckdb-iceberg/issues/297#issuecomment-2973232577); it seems the problem is with the endpoint, but I can't understand what the issue is.
  • Manuele Nolli (10/01/2025, 2:29 PM)
    Hello everyone, I’m experiencing an issue with my AWS-hosted LakeFS. I successfully imported an S3 bucket into LakeFS, but whenever I try to view the file overview, download a file, or generate a presigned URL, I get the following error:
    ```
    AccessDenied: arn:aws:sts::XXXX:assumed-role/YYYY/i-AAAA is not authorized to perform: s3:GetObject on resource: ...
    ```
    My bucket policy includes:
    ```json
    {
      "Sid": "lakeFSObjects",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::XXX:role/[ROLE_NAME]"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::[BUCKET NAME]/*"
    },
    ```
    I’m not sure if it’s related, but I also cannot download the file using the LakeFS Python library. For reference, my bucket is located in eu-central-1. Does anyone have suggestions on how to resolve this issue? Thank you in advance!
  • John McCloud (10/01/2025, 4:19 PM)
    Hello there! I am playing around with the quickstart and trying to figure out how to add object-level metadata to files uploaded from a local filesystem. I understand that I can add arbitrary key-value pairs as part of a commit, but what about object information? Is there any way to do this with lakeFS? As an example, here's a file uploaded from my local filesystem and the "Object Information" as it exists within lakeFS. How do I add key-value pairs to this object? Thank you!
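    For the programmatic route, the high-level Python SDK takes user metadata at upload time; a sketch (repository, branch, and paths are placeholders, and the metadata parameter should be verified against the installed lakefs package):
    ```python
    import lakefs

    branch = lakefs.repository("quickstart").branch("main")

    # Attach arbitrary key-value pairs as the object's user metadata while uploading.
    with open("example.csv", "rb") as f:
        branch.object("datasets/example.csv").upload(
            data=f.read(),
            metadata={"owner": "john", "source": "local-fs"},
        )
    ```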
  • Amit Varde (10/08/2025, 9:44 PM)
    Hello, I am getting the following error. Is there a way I could run lakeFS in debug mode?
    ```
    [ec2-user@ip-xxx-xxx-xxx-xxx ~]$ /opt/lakefs/latest/lakefs --config /etc/lakefs/poc-01-newton-config.yaml run
    INFO[0000]/home/runner/work/lakeFS/lakeFS/cmd/lakefs/cmd/root.go:130 github.com/treeverse/lakefs/cmd/lakefs/cmd.initConfig() Configuration file  fields.file=/etc/lakefs/poc-01-newton-config.yaml file=/etc/lakefs/poc-01-newton-config.yaml phase=startup
    FATA[0000]/home/runner/work/lakeFS/lakeFS/cmd/lakefs/cmd/root.go:114 github.com/treeverse/lakefs/cmd/lakefs/cmd.LoadConfig() Load config  error="decoding failed due to the following error(s):\n\n'database' has invalid keys: dynamodb_table_name" phase=startup
    ```
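    The fatal line itself names the problem: `dynamodb_table_name` is not a valid key directly under `database`; the table name nests under a dynamodb block, and verbose output comes from the logging level. A sketch of the expected shape:
    ```yaml
    logging:
      level: DEBUG

    database:
      type: dynamodb
      dynamodb:
        table_name: kvstore
    ```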