# help
  • a

    Aayush Bhasin

    10/10/2024, 9:27 PM
    Hey! I'm using `lakectl local`, i.e. my local directory has a `.lakefs_ref.yaml` file and is linked to a corresponding lakeFS repo, branch and commit. Sometimes I want to run `lakectl local checkout <path>`, but I do not want it to overwrite untracked files in that directory. Is this something that is supported / on the roadmap? Similar to how git does not remove untracked paths when doing a git pull. I was able to do this by using `lakectl fs download <lakefs url> <path>`, so I'm just wondering if it's possible to implement in `lakectl local` as well. Thanks in advance!
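    For reference, a minimal sketch of the workaround described above (repo, branch and paths are placeholders; flag names may differ slightly between lakectl versions):

    # downloads the prefix and overwrites the files it fetches,
    # but leaves other untracked local files in place
    lakectl fs download --recursive lakefs://example-repo/main/data/ ./data

    # by contrast, `lakectl local checkout ./data` syncs the directory
    # to the lakeFS ref and can remove files that only exist locally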
  • m

    mpn mbn

    10/12/2024, 11:54 AM
    Hello Team, I still can't get ACL to work. Here is my Postgres `docker-compose.yaml`:
    services:
      postgres:
        container_name: pg-lakefs
        image: postgres:13
        ports:
          - "5432:5432"
        environment:
          POSTGRES_DB: lakefs
          POSTGRES_USER: lakefs
          POSTGRES_PASSWORD: lakefs
        volumes:
          - pg_data:/var/lib/postgresql/data
    volumes:
      pg_data:
    Here is my ACL `config.yaml`:
    listen_address: ":8001"
    
    database:
      type: "postgres"
      postgres:
        connection_string: "<postgres://lakefs:lakefs@localhost:5432/lakefs?sslmode=disable>"
    
    encrypt:
      secret_key: "secret"
    Here is my `.lakefs.yaml`:
    database:
      type: "postgres"
      postgres:
        connection_string: "<postgres://lakefs:lakefs@localhost:5432/lakefs?sslmode=disable>"
    
    blockstore:
      type: local
      local:
        path: ~/Code/lakefs/data
    
    auth:
      remote_authenticator:
        enabled: true
        endpoint: http://localhost:8001/api/v1/auth
        default_user_group: "Developers"
      ui_config:
        RBAC: simplified
      encrypt:
        secret_key: "secret"
    The logs I get when I `lakefs run`:
    INFO   [2024-10-12T14:51:19+03:00]pkg/auth/service.go:953 pkg/auth.(*APIAuthService).CheckHealth Performing health check, this can take up to 20s
    FATAL  [2024-10-12T14:51:37+03:00]cmd/lakefs/cmd/run.go:123 cmd/lakefs/cmd.NewAuthService Auth API health check failed                  error="Get \"/healthcheck\": unsupported protocol scheme \"\""
  • v

    Vibhath

    10/14/2024, 8:45 AM
    Hello team, does the embedded Lua VM consume resources when it is not executing any Lua hook? Does it cost anything to have this VM without ever using it?
  • g

    Gard Drasil

    10/15/2024, 8:03 AM
    Hi guys, I get the following error when trying to create the first repo in lakeFS. I used this docker-compose file: https://github.com/treeverse/lakeFS-samples/blob/main/docker-compose.yml. Any help is appreciated!! Here is the log output:
    time="2024-10-15T07:59:08Z" level=warning msg="Could not access storage namespace" func="pkg/api.(*Controller).CreateRepository" file="build/pkg/api/controller.go:2007" error="operation error S3: GetObject, https response error StatusCode: 400, RequestID: 0, HostID: , api error InvalidArgument: S3 API Requests must be made to API port." reason=unknown service=api_gateway storage_namespace="<s3://test>"
    time="2024-10-15T08:01:40Z" level=error msg="Failed to get region for bucket, falling back to default region" func="pkg/block/s3.(*ClientCache).refreshBucketRegion" file="build/pkg/block/s3/client_cache.go:151" default_region=us-east-1 error="operation error S3: HeadBucket, https response error StatusCode: 400, RequestID: , HostID: , api error BadRequest: Bad Request" host="localhost:47098" method=POST operation_id=CreateRepository path=/api/v1/repositories user=everything-bagel
    time="2024-10-15T08:01:40Z" level=error msg="failed to get S3 object bucket test-repo key dummy" func="pkg/logging.(*logrusEntryWrapper).Errorf" file="build/pkg/logging/logger.go:339" error="operation error S3: GetObject, https response error StatusCode: 400, RequestID: 0, HostID: , api error InvalidArgument: S3 API Requests must be made to API port." host="localhost:47098" method=POST operation=GetObject operation_id=CreateRepository path=/api/v1/repositories user=everything-bagel
    time="2024-10-15T08:01:40Z" level=warning msg="Could not access storage namespace" func="pkg/api.(*Controller).CreateRepository" file="build/pkg/api/controller.go:2007" error="operation error S3: GetObject, https response error StatusCode: 400, RequestID: 0, HostID: , api error InvalidArgument: S3 API Requests must be made to API port." reason=unknown service=api_gateway storage_namespace="<s3://test-repo>"
  • v

    Vibhath

    10/15/2024, 5:03 PM
    Hello team, is there a way to specify the expiry duration of a lakeFS presigned URL when creating one?
  • t

    taylor schneider

    10/15/2024, 7:59 PM
    Hey folks. I am trying to mount lakefs using s3fs-fuse. I am having a few issues though. Is anyone familiar with using this component? Also open to suggestions about alternate approaches for accessing the data in a branch remotely.
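    A minimal sketch of one way to do this through the lakeFS S3 gateway with s3fs-fuse, assuming a repo named example-repo, a branch named main, and a lakeFS endpoint at https://lakefs.example.com (all placeholders; use your own endpoint and credentials):

    # lakeFS access key and secret go in the s3fs password file
    echo 'AKIAEXAMPLE:SECRETEXAMPLE' > ~/.passwd-s3fs && chmod 600 ~/.passwd-s3fs

    # on the S3 gateway the repo maps to the bucket name and the branch is the
    # top-level prefix, so example-repo:/main mounts the main branch
    s3fs example-repo:/main /mnt/lakefs \
      -o url=https://lakefs.example.com \
      -o use_path_request_style \
      -o passwd_file=$HOME/.passwd-s3fs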
  • a

    Aaron Taylor

    10/17/2024, 7:03 PM
    Is there a way to rename a file in a LakeFS repo, or do you need to delete and then re-create it?
  • d

    Davi Gomes

    10/18/2024, 3:35 PM
    Hello everyone, can anyone explain to me why my lakeFS files are not reflected in MinIO? I'm using Trino with Delta tables; the entire versioning process works, but the files are not reflected in MinIO. The branch `lakefs://data-platform-silver/main/customers/` is not reflected in `s3://data-platform-silver/main/customers`.
  • j

    Jérôme Viveret

    10/21/2024, 12:32 PM
    Hello team, is there a way for me to find the root commit of a branch? The use case is to determine the age of a branch relative to the default branch (main); I would take the creation_date of the first commit.
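    Following the approach described above, a rough sketch with a placeholder repo and branch (lakectl prints the log newest-first, so the last entry is the oldest commit reachable from the branch; output formatting may vary by version):

    # show the tail of the commit log; the final entry's Date line is the
    # creation date of the oldest commit reachable from the branch
    lakectl log lakefs://example-repo/my-branch | tail -n 8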
  • m

    Matthew Butler

    10/21/2024, 6:27 PM
    Good morning team! I am running into an issue where the lakeFS login creds (`access_key_id` and `secret_access_key`) are changing somehow without my knowledge. I'll set up lakeFS and successfully log in one day, then a few days or weeks later the creds no longer work. I'm deploying lakeFS on Kubernetes.
  • a

    Akshar Barot

    10/22/2024, 7:59 AM
    Good day! Is it possible to use multiple Azure storage accounts? I can't find a way to specify this in values.yaml when using Helm.
  • v

    Vibhath

    10/22/2024, 1:43 PM
    Hello team, I have a lakeFS server running in an AWS ECS cluster behind an ALB. If two Lambda functions try to write the same file on the same branch in the same repository using the lakeFS Python client library, is lakeFS able to store the writes from both Lambdas? Or will I lose the data written by one Lambda when the other Lambda overwrites it?
  • b

    Benoit Putzeys

    10/23/2024, 9:29 AM
    Hello team, I want to install lakeFS on AWS. I followed the instructions you provided here, where I created an IAM role with the DynamoDB permissions you specified. I attached this role to a new EC2 instance, created the `config.yaml` and ran the `lakefs` command. However, I get an error:
    WARNING[2024-10-23T09:07:01Z]lakeFS/pkg/kv/dynamodb/store.go:199 pkg/kv/dynamodb.setupKeyValueDatabase Failed to create or detect KV table           error="operation error DynamoDB: CreateTable, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://dynamodb..amazonaws.com/\": dial tcp: lookup dynamodb..amazonaws.com: no such host" table_name=kvstore
    INFO   [2024-10-23T09:07:01Z]lakeFS/pkg/kv/dynamodb/store.go:165 pkg/kv/dynamodb.setupKeyValueDatabase.func1 Setup time                                    table_name=kvstore took=7.253785ms
    FATAL  [2024-10-23T09:07:01Z]lakeFS/cmd/lakefs/cmd/run.go:159 cmd/lakefs/cmd.init.func9 Failed to open KV store                       error="setup failed: operation error DynamoDB: CreateTable, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://dynamodb..amazonaws.com/\": dial tcp: lookup dynamodb..amazonaws.com: no such host"
    I wanted to ask if you can reproduce it and help me resolve this? Thanks in advance!
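    For comparison, a minimal sketch of what such a config.yaml typically looks like for the DynamoDB KV store (the table name and secret below are placeholders; the AWS region and credentials are resolved from the environment or the instance profile). Note that the endpoint in the error, dynamodb..amazonaws.com, has an empty region segment between the two dots, which usually means no AWS region was resolved.

    database:
      type: dynamodb
      dynamodb:
        table_name: kvstore        # placeholder

    blockstore:
      type: s3

    auth:
      encrypt:
        secret_key: "some-random-secret"   # placeholder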
  • v

    Vibhath

    10/23/2024, 7:39 PM
    Hello team, I noticed that lakeFS allows creating read-only repositories. I'm curious whether we can add data to the repository when creating a read-only repository. Further, can we convert an existing writable repository into a read-only one?
  • b

    Benoit Putzeys

    10/24/2024, 12:12 PM
    Hello again, I'm currently working with a setup where I use lakeFS to version control TileDB data stored in S3. Specifically, I'm working with TileDB-SOMA data arrays, which are designed for single-cell data analysis. I successfully created a main branch where I loaded TileDB data (which is multiple files in a particular format) on AWS EC2. But when I try to load this "experiment" (i.e. an instance of a TileDB dataset) via lakeFS in Python, I get an error that the data does not exist. I'm still new to lakeFS and am thus not sure if this is a solvable issue on my side or if an inherent incompatibility between TileDB and lakeFS might be the problem. I was wondering if you plan on supporting TileDB in the near future, or how I could go about solving this. Thank you in advance!
  • m

    mpn mbn

    10/24/2024, 6:41 PM
    Hello team, it's sad that you can't use '@' in tags for something like "model-name@v0.1.2". You can create such a tag, but it gives you "ref: invalid value: validation error" when you try to do anything with it.
  • h

    Haoming Jiang

    10/24/2024, 9:37 PM
    Hi team, I tried the latest 1.39.2 version (https://github.com/treeverse/lakeFS/releases/tag/v1.39.2). The admin site doesn't seem functional: I can't click on Users / Groups.
  • m

    mpn mbn

    10/25/2024, 8:03 AM
    Hello team, here is my X problem: I want to implement staging using branches (latest, devel, release, etc.) in lakeFS. Each branch contains some folders with models and a single YAML file describing where to find each model. When a new model or a new version of a model is pushed, it updates (1) the model file itself and (2) the YAML file (with metadata: the new file hash and version). For example, in my release stage I have model version v0.0.1, and I want to promote my devel model version v0.0.33 to release. I can neither use the devel version of the YAML file as-is (because it would potentially update other models' metadata), nor cherry-pick the latest change to the YAML file from devel to release (because there have been a lot of changes since devel and release were last the same). But for the model file itself, I could just update the file pointer in the release stage. So here is my Y problem: I want to simply update the model file pointer in the new stage, and update the YAML file manually (by downloading it, changing the model metadata and pushing it back in some script). The question is: can I simply update a file pointer in lakeFS? If yes, how? UPD: The reason is that I don't want to download and re-upload the models on every promotion, as they may be large. So I just want to change the file pointers.
  • v

    Vincent Caldwell

    10/26/2024, 5:13 AM
    Are there any video tutorials or resources for connecting a GCP bucket like there are for AWS? I don't necessarily want to use a Postgres DB (what the current GCP getting-started instructions use) for a variety of reasons. There is an AWS connection video on YouTube (great video btw: https://www.youtube.com/watch?app=desktop&v=lr6ou-Vvy_A) but nothing for GCP. Unfortunately, I don't know GCP well enough to figure out the finer points myself, but I need to learn ASAP. Can anyone help, even by pointing me to docs, sites, etc.? I would greatly appreciate it.
  • r

    Rudy Cortembert

    10/27/2024, 9:18 PM
    Hello, is there any plan to support Azurite? I am working on a .NET Aspire custom integration and I would like to test Azure Storage locally by running Azurite as an Azure Storage emulator. So far, it looks like lakeFS requires Azure-hosted blob storage and using the Azurite emulator is not possible. Thanks a lot in advance for your guidance!
  • p

    Parth Ghinaiya

    10/28/2024, 8:14 PM
    Hello team, is there any plan or focus on having a connection between lakeFS and DLTHub? I'm using Dremio as a query engine and lakeFS for data lake versioning. I want to load data using DLTHub. I have tried to find solutions but couldn't. Thank you
  • a

    Andrij David

    10/29/2024, 8:33 PM
    Hello, I know that it is possible to clone a repository using the `lakectl local clone` command. Are there any other ways to clone a given repository? For example, using the S3 endpoint, the Python library (lakefs), or lakefs-spec?
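    One option sketched below, assuming the lakeFS S3 gateway is reachable and your lakeFS access key/secret are used as the AWS credentials (repo, branch and endpoint are placeholders). This copies the contents of a branch rather than doing a full clone with refs:

    export AWS_ACCESS_KEY_ID=<lakefs-access-key>
    export AWS_SECRET_ACCESS_KEY=<lakefs-secret-key>
    # on the S3 gateway the repo name acts as the bucket and the branch is the top-level prefix
    aws s3 sync s3://example-repo/main/ ./example-repo-main --endpoint-url https://lakefs.example.com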
  • a

    Andrij David

    10/29/2024, 8:46 PM
    Also, is there any way to make a repository completely public?
  • h

    Haoming Jiang

    10/30/2024, 1:11 AM
    In the Lua hooks, is there any way we can write data? Based on the reference here: https://docs.lakefs.io/howto/hooks/lua.html#lua-library-reference, I see we can read data with `lakefs/get_object(repository_id, reference_id, path)`, but I don't see how to write data.
  • m

    mpn mbn

    10/31/2024, 12:05 PM
    Hello team, I want to upload datasets to lakeFS and version them. Each dataset is a separate folder with arbitrary files. For example, I have folders (datasets) A and B:
    Files in A: a1, aa1
    Files in B: b1, bb1
    datasets-versions.yaml: A: v0.0.1, B: v0.0.1
    I want to update dataset A by rewriting folder A's contents, so after uploading the new dataset:
    Files in A: a2, aa2, aaa2
    datasets-versions.yaml: A: v0.0.2, B: v0.0.1
    I can do this with the commands:
    lakectl fs rm -r lakefs://repo/branch/A
    lakectl fs upload -r lakefs://repo/branch/A -s A
    My question is: how can I do this using the Python lakefs package?
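    A rough sketch of the same flow using the high-level lakefs Python package (the method names are from memory of that package's Repository/Branch API; treat them as assumptions and double-check against the current docs):

    import lakefs

    branch = lakefs.repository("repo").branch("branch")

    # remove the current contents of the dataset folder (assumed API: objects() / delete_objects())
    branch.delete_objects(obj.path for obj in branch.objects(prefix="A/"))

    # upload the new files
    for name in ("a2", "aa2", "aaa2"):
        with open(f"A/{name}", "rb") as f:
            branch.object(f"A/{name}").upload(data=f.read())

    branch.commit(message="Update dataset A to v0.0.2")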
  • o

    Ocean Chang

    11/06/2024, 8:12 AM
    How do I set up a remote authenticator so that the lakeFS client passes additional values in the request headers and body? Here is my config.yaml:
    auth:
      remote_authenticator:
        enabled: true
        endpoint: https://testendpoint.com
        default_user_group: "Developers"
      ui_config:
        logout_url: /logout
        login_cookie_names:
          - Authorization
  • b

    Boris

    11/07/2024, 1:01 PM
    Hello! I am trying to make a POST request to listPullRequests via the lakeFS UI, but I get a 401 error: "insufficient permissions". I used the lakeFS demo environment and the standard repository. What am I missing?
  • o

    Ocean Chang

    11/08/2024, 2:23 AM
    Context: using the lakeFS API or Python SDK to fetch the list of repositories and other APIs. Problem: first, I make the `v1/auth/login` API call or create the `Client` from the SDK. These succeed with 200, and the login API call returns the `token` and `token_expiration`. However, when I subsequently try to call `/api/v1/repositories`, I get a 401 `error authenticating request`. Question: do I need to attach the login token being returned in order to make subsequent calls? If so, how?
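    A rough sketch of the two usual options, with a placeholder host and credentials taken from environment variables: either keep using the access key/secret as HTTP basic auth on every call (what the SDK client normally does), or send the JWT returned by /auth/login as a bearer token. Double-check against the lakeFS API reference for your version.

    BASE=https://lakefs.example.com/api/v1

    # Option 1: basic auth with the access key / secret on each request
    curl -u "$LAKEFS_ACCESS_KEY_ID:$LAKEFS_SECRET_ACCESS_KEY" "$BASE/repositories"

    # Option 2: exchange the key/secret for a JWT, then pass it as a bearer token
    TOKEN=$(curl -s -X POST "$BASE/auth/login" \
      -H 'Content-Type: application/json' \
      -d "{\"access_key_id\":\"$LAKEFS_ACCESS_KEY_ID\",\"secret_access_key\":\"$LAKEFS_SECRET_ACCESS_KEY\"}" \
      | jq -r .token)
    curl -H "Authorization: Bearer $TOKEN" "$BASE/repositories"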
  • m

    Mike Fang

    11/08/2024, 7:09 PM
    Is there a way to override the default authentication for all API requests from the lakefs_sdk Python package? I found this: ":param _request_auth: set to override the auth_settings for an a single request; this effectively ignores the authentication in the spec for a single request." But that only applies to a single API call. Is there a way to set it on the actual api_client? I am trying to do SigV4 auth on all requests from the lakeFS SDK, because I am proxying the lakeFS API through API Gateway with IAM authorization.
  • m

    Mike Fang

    11/09/2024, 1:42 AM
    When I try to create a repository from the UI I get this issue with S3:
    time="2024-11-09T01:33:57Z"
     level=warning msg="Could not access storage namespace" 
    func="pkg/api.(*Controller).CreateRepository" 
    file="lakeFS/pkg/api/controller.go:2016" error="operation error S3: 
    PutObject, https response error StatusCode: 400, RequestID: 
    GV2RCD8F49KSN5K3, HostID: 
    P2Te8QubRyKCczc2nt/cJ3YnGfIJFDD2vJRKYoKC7JuDkMkEgN6woYVtsfChFfRhkO2HvM10uYE=,
     api error InvalidRequest: Content-MD5 OR x-amz-checksum- HTTP header is
     required for Put Object requests with Object Lock parameters" 
    reason=unknown service=api_gateway 
    storage_namespace="<s3://nile-data-catalog-storefangmik-406016533510-dev/test-lakefs/>"
    Is there something I am missing with setting up S3 with lakeFS? I believe the bucket permissions are set up correctly. Object Lock is usually the default for S3 buckets; does it need to be turned off now for lakeFS?