Michael Gaebel
08/24/2023, 1:53 PM
An error occurred while calling o128.getDynamicFrame. io/lakefs/iceberg/LakeFSCatalog has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
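For reference, the class-file major version in this error maps directly to a Java release (52 is Java 8, 55 is Java 11). A minimal sketch for checking what a .class file targets by reading only its header bytes (not from the thread; any paths you feed it are your own):

```python
import struct

def class_file_java_version(data: bytes) -> int:
    """Return the Java release a compiled .class file targets."""
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a .class file")
    # Class-file major 52 == Java 8, 55 == Java 11, and so on.
    return major - 44
```

To inspect a jar, unzip it and pass the first 8 bytes of any .class inside to this function; if it reports 11 while your Spark/Glue runtime is on Java 8, you have exactly the mismatch above.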
I've used the same libraries listed in the setup link above (lakefs-iceberg:0.1.2, iceberg-spark-runtime-3.3_2.12:1.3.0), and I've also tried using the same Spark runtime version that was listed as a test dependency for io.lakefs:lakefs-iceberg:0.1.2 in Maven. Am I missing something obvious, or is there a version of the lakefs-iceberg lib that was compiled with Java runtime 52.0?

Rich McLaughlin
08/24/2023, 8:16 PM
HT
08/24/2023, 11:57 PMlakefs_client
python sdk:
File "/path/lakefs_helper.py", line 102, in merge
return self.lakefs.refs.merge_into_branch(
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api/refs_api.py", line 869, in merge_into_branch
return self.merge_into_branch_endpoint.call_with_http_info(**kwargs)
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 835, in call_with_http_info
return self.api_client.call_api(
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 409, in call_api
return self.__call_api(resource_path, method,
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 203, in __call_api
raise e
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 196, in __call_api
response_data = self.request(
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 455, in request
return self.rest_client.POST(url,
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/rest.py", line 267, in POST
return self.request("POST", url,
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/rest.py", line 224, in request
raise ServiceException(http_resp=r)
lakefs_client.exceptions.ServiceException: (503)
Reason: Service Unavailable
HTTP response headers: HTTPHeaderDict({'content-length': '118', 'content-type': 'text/plain', 'date': 'Thu, 24 Aug 2023 23:47:57 GMT'})
HTTP response body: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection termination
My helper function that does the merge is:
def merge(self, source_branch, dest_branch, metadata={}, message=None):
    try:
        # Check if there is a need to merge:
        diffs = self.diff(left_ref=dest_branch, right_ref=source_branch)
        if len(diffs) == 0:
            print(f"No changes detected. Skipping merge {source_branch} to {dest_branch}")
            return
        if message:
            merge = models.Merge(message=message, metadata=metadata)
        else:
            merge = models.Merge(metadata=metadata)
        return self.lakefs.refs.merge_into_branch(
            repository=self.repo_name,
            source_ref=source_branch,
            destination_branch=dest_branch,
            merge=merge,
        )
    except Exception as e:
        # traceback.print_exc()
        print(f"ERROR: Failed to merge {source_branch} to {dest_branch}")
        raise e
With self.lakefs defined as:
configuration = lakefs_client.Configuration()
configuration.username = conf["access_key_id"]
configuration.password = conf["secret_access_key"]
configuration.host = conf["endpoint"]
self.lakefs = LakeFSClient(configuration)
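One way to ride out transient 503s like the one above is a small retry wrapper around the merge call; a minimal sketch (the helper name, attempt count, and backoff values are illustrative, not part of the SDK):

```python
import time

def call_with_retry(fn, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on the given exception type(s) with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            sleep(base_delay * 2 ** attempt)
```

For example: `call_with_retry(lambda: self.lakefs.refs.merge_into_branch(...), retry_on=lakefs_client.exceptions.ServiceException)`. This only helps if the gateway resets are transient; a persistent 503 points at the server or the proxy in front of it.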
But then if I do the merge of those two exact same branches via the UI, it succeeds!
Our self-deployed server is running 107.0, and the lakefs_client is 0.107.0.

Bertrand Gallice
08/28/2023, 9:10 AM
$ lakectl import \
    --from s3://jaffle_shop_repo/raw_payments.csv \
    --to lakefs://jaffle-shop-repo/main/raw_payments.csv
Import failed: import error: error on ingest: NoSuchKey:
status code: 404, request id: tx00000000000000a72c046-0064ec5ed8-b6658b2f-fra1b, host id:
The lakeFS repo uses s3://jaffle_shop_repo as the storage namespace, and is located at the root of the S3 bucket.
It doesn't seem like a credentials problem, since the lakectl fs upload command works to add data to the repo, and writes this data in the S3 storage repo's folder.
I suspect that it's just a URI path problem; I've tried several variations but none seem to work.
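For what it's worth, a NoSuchKey 404 on ingest usually means the object store could not find the source key as given. It can help to double-check how the --from URI splits into bucket and key; a sketch of S3's own resolution (not lakectl's actual parser):

```python
from urllib.parse import urlparse

def split_s3_uri(uri: str):
    """Split an s3:// URI into (bucket, key), the way S3 itself resolves it."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")
```

So `s3://jaffle_shop_repo/raw_payments.csv` asks for key `raw_payments.csv` in bucket `jaffle_shop_repo`; if the repo's storage namespace is instead a prefix inside some other bucket, that key will not exist and S3 returns exactly this 404.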
Any idea on what could be wrong?

Taha Sadiki
08/29/2023, 12:27 PM
Dieu M. Nguyen
08/29/2023, 9:01 PM
The to_zarr() step is getting errors.
import dask.array as da
import xarray as xr
import numpy as np
import lakefs_client
from lakefs_client import models
from lakefs_client.client import LakeFSClient
configuration = lakefs_client.Configuration()
configuration.username = access_key_id
configuration.password = secret_access_key
configuration.host = endpoint_url
client = LakeFSClient(configuration)
repo = "zarr-test"
branch = "zarr-store"
client.branches.create_branch(
    repository=repo,
    branch_creation=models.BranchCreation(
        name=branch,
        source="main"))
# Create random array
state = da.random.RandomState(1234)
shape = (180, 360, 400)
chunk_shape = (36, 72, 200)
nlats, nlons, ntimes = shape
arr = state.random(shape, chunks=chunk_shape)
ds = xr.Dataset(
    data_vars={
        "precipitation": xr.DataArray(arr, dims=('lat', 'lon', 'time'))
    },
    coords={
        "lat": xr.DataArray(np.linspace(-90, 90, num=nlats, endpoint=False), dims='lat'),
        "lon": xr.DataArray(np.linspace(-180, 180, num=nlons, endpoint=False), dims='lon'),
        "time": xr.date_range(start="2000-06-01", freq="D", periods=ntimes)
    },
    attrs={
        "description": "GPM IMERG test dataset"
    }
)
# Write the first 200 time slices
ds_0 = ds.isel(time=slice(0, 200))
s3a_gateway_path = f's3a://{repo}/{branch}/precipitation_data.zarr'
task = ds_0.to_zarr(s3a_gateway_path,
                    # zarr_version=3,
                    mode='w',
                    compute=False)
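One hypothesis worth testing: the s3a:// scheme is Hadoop's convention, while xarray's to_zarr goes through fsspec/s3fs, which typically needs explicit storage_options (credentials plus the lakeFS endpoint) to reach the S3 gateway; without them, requests go to real AWS and fail with Access Denied. A small helper to build both pieces (the endpoint and credential names are placeholders):

```python
def lakefs_zarr_target(repo, branch, path, endpoint_url, key, secret):
    """Build a to_zarr URL plus fsspec storage_options for the lakeFS S3 gateway."""
    url = f"s3://{repo}/{branch}/{path}"
    storage_options = {
        "key": key,
        "secret": secret,
        "client_kwargs": {"endpoint_url": endpoint_url},
    }
    return url, storage_options
```

Then: `url, opts = lakefs_zarr_target(repo, branch, "precipitation_data.zarr", endpoint_url, access_key_id, secret_access_key)` followed by `ds_0.to_zarr(url, mode='w', storage_options=opts, compute=False)`.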
Without zarr_version, I get PermissionError: Access Denied. If I set zarr_version=3, I get KeyError: 'zarr.json'. Maybe I am setting the s3a_gateway_path incorrectly?

lingyu zhang
08/31/2023, 12:18 PM
…the sizeBytes and modifiedTime of a file when comparing local files with remote commits. However, I believe there may be some risks associated with this approach. For instance, if two clients have different system times and both edit the same file, let's say a.txt, the sizeBytes and modifiedTime could be the same on both clients, but the contents are different. Consequently, when I commit the changes on client A and then pull from the remote on client B, the modifications could be lost. So here are my questions:
1. Do you have any evidence or use cases to prove this situation is rare?
2. What is the probability of encountering this risk? Have any tests been conducted?
3. Why do we only use second-level precision for the modifiedTime and not something more precise like nanoseconds? (Unix time in nanoseconds cannot be represented by an int64 only for dates prior to the year 1678 or after 2262.)
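To make the collision concern concrete: size plus mtime is a heuristic, while a content digest distinguishes files that happen to share both. A sketch of a digest-based comparison (illustrative only; this is not how lakeFS diffs today):

```python
import hashlib

def same_content(path_a, path_b, chunk_size=1 << 20):
    """Compare two files by SHA-256 digest instead of sizeBytes + modifiedTime."""
    def digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                h.update(block)
        return h.digest()
    return digest(path_a) == digest(path_b)
```

The trade-off is cost: hashing reads every byte of every file, whereas stat-based comparison is a single metadata call, which is presumably why sync tools accept the small collision risk.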
Thanks a lot! :)

Justin Pottenger
09/02/2023, 5:52 PM
INFO [2023-09-02T17:23:06Z]build/pkg/auth/service.go:189 pkg/auth.NewAuthService initialized Auth service service=auth_service
WARNING[2023-09-02T17:23:06Z]build/pkg/cloud/aws/metadata.go:64 pkg/cloud/aws.(*MetadataProvider).GetMetadata.func1 Tried to to get AWS account ID for BI error="InvalidParameterValue: Unsupported action GetCallerIdentity\n\tstatus code: 400, request id: 178124CA4D3D18C0"
Digging into the code, it appears that the AWS package is trying to find the email of the AWS account, and since MinIO doesn't have one, it fails. Relevant code is here and here.
Before I fork LakeFS and try to dig in myself, I thought I would see if anyone here had any ideas on a more elegant (aka supported) solution?
TLDR: LakeFS + MinIO + headless = not supported?

Florentino Sainz
09/04/2023, 8:24 AM
Truce Wallace
09/04/2023, 9:47 PM
Oh my! An error occurred: Unauthorized
when trying to register

Truce Wallace
09/05/2023, 8:29 AM
Dieu M. Nguyen
09/05/2023, 8:37 PM
…the spark-submit command as directed by the documentation. As far as I can tell, the command ran and finished without errors. But in S3, I don't see the list of objects removed in _lakefs/retention/gc/unified/<RUN_ID>/deleted/, and my storage didn't go down, so I assume objects from old versions haven't been deleted. Do you have any ideas about this? Note: I just set the GC policy after already writing all my versions - is this why?

HT
09/06/2023, 1:31 AM
Yaphet Kebede
09/06/2023, 5:09 PM
Prakash Kumar
09/08/2023, 4:11 PM
Prakash Kumar
09/08/2023, 10:48 PM
Prakash Kumar
09/08/2023, 10:48 PM
Tongguo Pang
09/10/2023, 1:14 PM
Natanael de Sousa Neto
09/11/2023, 1:40 PM
Cristian Caloian
09/12/2023, 9:30 AM
aws --profile lakefs s3 ls s3://<bucket>/<repo> --recursive
I have configured the lakefs profile with credentials from lakeFS, and the corresponding endpoint URL. I also have AuthFullAccess, FSFullAccess, and RepoManagementFullAccess permissions. However, I still get the following error:
An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Forbidden
Do you have an idea what I am missing?

Nicolas Gibaud
09/17/2023, 3:09 PM
SELECT * FROM READ_PARQUET('lakefs://quickstart/main/lakes.parquet');
I got an error: Error: Invalid Input Error: No magic bytes found at end of file 'lakefs://quickstart/main/lakes.parquet'
Do you know what causes this and how to fix it?

HT
09/19/2023, 1:08 AM
…f1 in commit c1, then I delete f1 in commit c2. Further commits happen unrelated to f1. My branch head is at c8 (and without f1).
f1 is a committed file but not in the branch head. Does that mean it will be deleted if I run garbage collection one year later (e.g. beyond any retention rule)?

Neil Murray
09/19/2023, 10:53 AM
Sivan Bercovici
09/19/2023, 1:18 PM
Ariel Shaqed (Scolnicov)
09/19/2023, 1:41 PM
> Is there any limit to the size of a file uploaded via the web client?
There should be no limit. Obviously for more than a few hundred MiB you'll want to do something more reliable; lakeFS is also accessible using the S3 protocol, where you can even use multipart uploads.
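To illustrate the multipart route: an S3 client splits a large upload into fixed-size parts and uploads them independently. A sketch of the part arithmetic only (illustrative; real clients such as boto3, pointed at the lakeFS endpoint via endpoint_url, handle this automatically in upload_file):

```python
def part_ranges(total_size, part_size=8 * 1024 * 1024):
    """Byte ranges (start, end) covering a multipart upload of total_size bytes."""
    return [(start, min(start + part_size, total_size))
            for start in range(0, total_size, part_size)]
```

Each range becomes one UploadPart call, so a failed part can be retried on its own instead of restarting the whole transfer, which is what makes this more reliable than one large PUT.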
Ariel Shaqed (Scolnicov)
09/19/2023, 1:44 PM
> Is there any way to label or attach properties to data files (other than tagging)?
Yes, but! Objects support user metadata; this is a tested part of the API. However, there is no current UI support to access it. Currently it only gets used to store the MIME type. But if you need it, please make sure to ask us for it!
Ariel Shaqed (Scolnicov)
09/19/2023, 1:45 PM
> Any limit to number of concurrent users?
Nope. I don't think the lakeFS server even has a concept of "active users", other than just caching some properties.
Eirik Knævelsrud
09/20/2023, 9:59 AM
HT
09/21/2023, 1:33 AM
Gary Mclaughlin
09/21/2023, 2:36 PM