Florentino Sainz
08/11/2023, 7:33 AM

Kartikey Mullick
08/11/2023, 5:59 PM
index.yaml file
For reference, this is the relevant HCL script (I have removed the GKE and Cloud SQL modules).
I am sure I have specified either the chart or the repository argument wrongly, but I'm not sure which is the right one.
provider "helm" {
kubernetes {
host = "${google_container_cluster.primary.endpoint}"
token = "${data.google_client_config.current.access_token}"
client_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.client_certificate)}"
client_key = "${base64decode(google_container_cluster.primary.master_auth.0.client_key)}"
cluster_ca_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)}"
}
}
resource "helm_release" "my-lakefs" {
name = "my-lakefs"
chart = "charts/lakefs"
repository = "<https://github.com/treeverse/lakeFS>"
}
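
[Editor's note: for comparison, a minimal sketch of the helm_release pointing at the published lakeFS Helm chart repository rather than the GitHub source tree. This assumes the chart is served from https://charts.lakefs.io under the name "lakefs", as in the lakeFS install docs; it is a sketch, not a confirmed fix for the setup above.]

resource "helm_release" "my-lakefs" {
  name       = "my-lakefs"
  # Helm's repository argument expects a chart repository (one that serves an
  # index.yaml), not a Git URL; the chart name is then resolved inside that repo.
  repository = "https://charts.lakefs.io"
  chart      = "lakefs"
}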

Oliver Dain
08/15/2023, 7:40 PM
I'm trying to get rclone to work with the sample repo one gets by clicking "try without installing". I've set up my rclone config file like:
[lakefs]
type = s3
provider = AWS
env_auth = false
access_key_id = <redacted>
secret_access_key = <redacted>
endpoint = https://big-zebra.us-east-1.lakefscloud.io
no_check_bucket = true
where big-zebra.us-east-1... is, I think, the correct endpoint for that sample repo (it's what shows in my address bar). But I get certificate errors:
$ rclone -vv ls lakefs:sample-repo/testing/
2023/08/15 12:35:09 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "-vv" "ls" "lakefs:sample-repo/testing/"]
2023/08/15 12:35:09 DEBUG : Creating backend with remote "lakefs:sample-repo/testing/"
2023/08/15 12:35:09 DEBUG : Using config file from "/home/oliver/.config/rclone/rclone.conf"
2023/08/15 12:35:09 DEBUG : fs cache: renaming cache item "lakefs:sample-repo/testing/" to be canonical "lakefs:sample-repo/testing"
2023/08/15 12:35:49 DEBUG : 2 go routines active
2023/08/15 12:35:49 Failed to ls: RequestError: send request failed
caused by: Get "https://sample-repo.big-zebra.us-east-1.lakefscloud.io/?delimiter=&encoding-type=url&max-keys=1000&prefix=testing%2F": x509: certificate is valid for *.us-east-1.lakefscloud.io, us-east-1.lakefscloud.io, not sample-repo.big-zebra.us-east-1.lakefscloud.io
I see it's a wildcard cert for *.us-east-1.lakefscloud.io, which seems like it should work, but it also looks like rclone is adding the repo name (sample-repo) to the hostname, which seems incorrect. What am I doing wrong?
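
[Editor's note: one thing worth trying, as an assumption rather than a confirmed fix: the error shows rclone using virtual-hosted-style addressing (prepending the repo name to the hostname), which the one-level wildcard cert cannot cover. rclone's S3 backend has a force_path_style option that keeps the bucket in the URL path instead, e.g.:]

[lakefs]
type = s3
provider = AWS
env_auth = false
access_key_id = <redacted>
secret_access_key = <redacted>
endpoint = https://big-zebra.us-east-1.lakefscloud.io
no_check_bucket = true
# Keep the repo name in the URL path rather than the hostname, so requests go
# to big-zebra.us-east-1.lakefscloud.io and match the *.us-east-1.lakefscloud.io cert.
force_path_style = true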

Robin Moffatt
08/16/2023, 12:04 PM
Using the lakectl local command, I get an error when I try to commit - any ideas?
lakectl local commit . -m "Remove employee data and add ratings information" --pre-sign=false
Getting branch: main
diff 'local:///Users/rmoff/work/orion' <--> 'lakefs://quickstart/7eaeafd3fb90df419ca26eca69ba2b153eae13bd3dccf80ba47cbed1260cb0c1/data/OrionStar/'...
diff 'lakefs://quickstart/38df1480d835fe52dc6db31b87ef35b78403d0c9c621c74a6d032c686551abc5/data/OrionStar/' <--> 'lakefs://quickstart/main/data/OrionStar/'...
delete remote path: EMPLOYEE_ORGANIZATI~ ... done! [%0 in 8ms]
delete remote path: EMPLOYEE_ADDRESSES.~ ... done! [%0 in 8ms]
delete remote path: EMPLOYEE_DONATIONS.~ ... done! [%0 in 8ms]
delete remote path: EMPLOYEE_PAYROLL.csv ... done! [%0 in 8ms]
upload RATINGS.csv ... done! [6.86KB in 9ms]
upload RATINGS.csv failed: upload to backing store: MissingRegion: could not find region configuration
Error executing command.

yiming song
08/17/2023, 2:20 AM

yiming song
08/17/2023, 2:26 AM

Mẫn Phạm
08/17/2023, 8:55 AM

HT
08/21/2023, 2:44 AM
import s3fs
import smartfs  # local helper module; loads credentials/endpoint from the rclone profile

# Read the "lakefs" remote's endpoint and keys from the rclone config
conf = smartfs.load_rclone_profile("lakefs")
fs = s3fs.S3FileSystem(
    anon=False,
    endpoint_url=conf["endpoint"],
    key=conf["access_key_id"],
    secret=conf["secret_access_key"],
)
## Comment/uncomment the following line to trigger the bug.
fs.find("s3://test-repo/main/", maxdepth=None, withdirs=False)
print(fs.find("s3://test-repo/main/", maxdepth=1, withdirs=True))
The content looks like this:
$ rclone ls lakefs:test-repo/main/
0 dir1/fileB
0 fileA
When I run the code above, the second find should list 2 entries: fileA and dir1, but somehow it is missing dir1.
But if I comment out the first find, then the second find lists the 2 entries correctly.
I don't have an AWS account to check whether it's a lakeFS issue or an s3fs issue.
Can someone with a real AWS account test this?
Context:
Lakefs self hosted: 0.104.0
pip freeze:
aiobotocore==2.5.2
boto3==1.21.21
botocore==1.29.161
s3fs==2023.6.0
s3transfer==0.5.2
Edit: I was describing the expected result slightly wrong.
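
[Editor's note: a quick way to test whether this is s3fs's directory-listing cache rather than lakeFS - a guess on my part, not a confirmed diagnosis - is to clear the cache between the two calls:]

# If the second find() returns dir1 after clearing the cache, the earlier
# find(..., withdirs=False) likely cached a listing without directory entries,
# which would point at the s3fs/fsspec side rather than lakeFS.
fs.find("s3://test-repo/main/", maxdepth=None, withdirs=False)
fs.invalidate_cache()  # drop fsspec's cached directory listings
print(fs.find("s3://test-repo/main/", maxdepth=1, withdirs=True))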

暗号
08/21/2023, 8:07 AM
lakectl local init lakefs://datasetest/test1/ E:\WorkProjects\data
lakectl local list
+-----------+------------+---------------+
| DIRECTORY | REMOTE URI | SYNCED COMMIT |
+-----------+------------+---------------+
+-----------+------------+---------------+
D:\Develops\lakefs\lakeFS_0.107.0_Windows_x86_64>lakectl local clone lakefs://datasetest/test1/ E:\lakedata
EOF
Error executing command.

Jonas
08/22/2023, 8:53 AM
Is it possible to restrict fs:ListRepositories with something like arn:lakefs:fs:::repository/{prefix}*? We have an internal structure with different teams that don't need to see all the repos of all the other teams. The docs only state * as a possible resource.

Rich McLaughlin
08/22/2023, 1:59 PM
2023-08-22T13:54:49.349664594Z time="2023-08-22T13:54:49Z" level=info msg="lakeFS run" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:94" version=0.107.0
2023-08-22T13:54:49.349684195Z time="2023-08-22T13:54:49Z" level=fatal msg="Failed to open KV store" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:102" error="connect failed: failed to connect to `host=/tmp user=lakefs database=`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)"
The connection string works in psql - are we missing variables?
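
[Editor's note: the error shows lakeFS dialing a default Unix socket with an empty database name, which usually means the connection string never reached the process. A sketch of the environment variables that should map to database.type and database.postgres.connection_string in the lakeFS configuration reference; host and credentials below are placeholders for your setup:]

LAKEFS_DATABASE_TYPE=postgres
# Full DSN/URL form; <password> and your-postgres-host are placeholders
LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING=postgres://lakefs:<password>@your-postgres-host:5432/lakefs?sslmode=require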

Rich McLaughlin
08/23/2023, 8:23 AM
version: "3.5"
name: lakefs
services:
  lakefs:
    image: treeverse/lakefs:latest
    ports:
      - "8000:8000"
When I try to use the container on its own, it fails with the following errors:
2023-08-23T08:20:56.611Z ERROR - Container jigsaw-lakefs_0_61e8c142 for site jigsaw-lakefs has exited, failing site start
2023-08-23T08:20:56.620Z ERROR - Container jigsaw-lakefs_0_61e8c142 didn't respond to HTTP pings on port: 8000, failing site start. See container logs for debugging.
2023-08-23T08:20:56.641Z INFO - Stopping site jigsaw-lakefs because it failed during startup.
We have WEBSITES_PORT set to 8000 and have tried specifying the ports in the startup command, but it just will not work.
Unfortunately we cannot use the docker compose option, as we are not able to bind the website address to a vnet to secure the Postgres DB. Any help would be appreciated.
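
[Editor's note: when the image runs with no configuration at all, lakeFS typically exits before it ever listens on 8000, which would explain the failed HTTP pings. A sketch of minimal environment variables to try on the standalone container - names follow the lakeFS configuration reference, the secret is a placeholder, and local database/blockstore are only suitable for testing:]

LAKEFS_DATABASE_TYPE=local              # or "postgres" plus a connection string for real deployments
LAKEFS_BLOCKSTORE_TYPE=local            # swap for azure/s3 settings when pointing at real storage
LAKEFS_AUTH_ENCRYPT_SECRET_KEY=<some-random-secret>
LAKEFS_LISTEN_ADDRESS=0.0.0.0:8000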

Eirik Knævelsrud
08/23/2023, 10:58 AM

lingyu zhang
08/24/2023, 11:51 AM

Michael Gaebel
08/24/2023, 1:53 PM
An error occurred while calling o128.getDynamicFrame. io/lakefs/iceberg/LakeFSCatalog has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
I've used the same libraries listed in the setup link above (lakefs-iceberg:0.1.2, iceberg-spark-runtime-3.3_2.12:1.3.0), and I've also tried using the same Spark runtime version that was listed as a test dependency for io.lakefs:lakefs-iceberg:0.1.2 in Maven. Am I missing something obvious, or is there a version of the lakefs-iceberg lib that was compiled for class file version 52.0 (Java 8)?

Rich McLaughlin
08/24/2023, 8:16 PM

HT
08/24/2023, 11:57 PM
I'm hitting an error when merging via the lakefs_client Python SDK:
File "/path/lakefs_helper.py", line 102, in merge
return self.lakefs.refs.merge_into_branch(
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api/refs_api.py", line 869, in merge_into_branch
return self.merge_into_branch_endpoint.call_with_http_info(**kwargs)
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 835, in call_with_http_info
return self.api_client.call_api(
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 409, in call_api
return self.__call_api(resource_path, method,
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 203, in __call_api
raise e
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 196, in __call_api
response_data = self.request(
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 455, in request
return self.rest_client.POST(url,
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/rest.py", line 267, in POST
return self.request("POST", url,
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/rest.py", line 224, in request
raise ServiceException(http_resp=r)
lakefs_client.exceptions.ServiceException: (503)
Reason: Service Unavailable
HTTP response headers: HTTPHeaderDict({'content-length': '118', 'content-type': 'text/plain', 'date': 'Thu, 24 Aug 2023 23:47:57 GMT'})
HTTP response body: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection termination
My helper function that does the merge is:
def merge(self, source_branch, dest_branch, metadata={}, message=None):
    try:
        # Check if there is a need to merge:
        diffs = self.diff(left_ref=dest_branch, right_ref=source_branch)
        if len(diffs) == 0:
            print(f"No changes detected. Skipping merge {source_branch} to {dest_branch}")
            return
        if message:
            merge = models.Merge(message=message, metadata=metadata)
        else:
            merge = models.Merge(metadata=metadata)
        return self.lakefs.refs.merge_into_branch(
            repository=self.repo_name,
            source_ref=source_branch,
            destination_branch=dest_branch,
            merge=merge,
        )
    except Exception as e:
        # traceback.print_exc()
        print(f"ERROR: Failed to merge {source_branch} to {dest_branch}")
        raise e
With self.lakefs as:
configuration = lakefs_client.Configuration()
configuration.username = conf["access_key_id"]
configuration.password = conf["secret_access_key"]
configuration.host = conf["endpoint"]
self.lakefs = LakeFSClient(configuration)
But if I then do the merge of those two exact same branches via the UI, it succeeds!
Our self-deployed server is running 0.107.0, and the lakefs_client is 0.107.0.
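
[Editor's note: the 503 body ("upstream connect error or disconnect/reset before headers") reads like a proxy/load balancer dropping the connection rather than lakeFS rejecting the merge. Purely as a stopgap sketch - not a fix for the underlying issue - a client-side retry around the same SDK call might look like:]

import time
from lakefs_client.exceptions import ServiceException

def merge_with_retry(client, repo_name, source_branch, dest_branch, merge, attempts=3, backoff_s=2.0):
    """Retry merge_into_branch on ServiceException (assumes retrying the merge is safe here)."""
    for attempt in range(1, attempts + 1):
        try:
            return client.refs.merge_into_branch(
                repository=repo_name,
                source_ref=source_branch,
                destination_branch=dest_branch,
                merge=merge,
            )
        except ServiceException as e:
            if attempt == attempts:
                raise
            print(f"Merge attempt {attempt} failed with HTTP {e.status}; retrying in {backoff_s * attempt}s")
            time.sleep(backoff_s * attempt)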

Bertrand Gallice
08/28/2023, 9:10 AM
$ lakectl import \
    --from s3://jaffle_shop_repo/raw_payments.csv \
    --to lakefs://jaffle-shop-repo/main/raw_payments.csv
Import failed: import error: error on ingest: NoSuchKey:
	status code: 404, request id: tx00000000000000a72c046-0064ec5ed8-b6658b2f-fra1b, host id:
The lakeFS repo uses s3://jaffle_shop_repo as the storage namespace and is located at the root of the S3 bucket.
It doesn't seem like a credentials problem, since the lakectl fs upload command works to add data to the repo and writes this data into the repo's folder in S3 storage.
I suspect it's just a URI path problem; I tried several variations but none seem to work.
Any idea on what could be wrong?

Taha Sadiki
08/29/2023, 12:27 PM

Dieu M. Nguyen
08/29/2023, 9:01 PM
The to_zarr() step is getting errors.
import dask.array as da
import xarray as xr
import numpy as np
import lakefs_client
from lakefs_client import models
from lakefs_client.client import LakeFSClient

configuration = lakefs_client.Configuration()
configuration.username = access_key_id
configuration.password = secret_access_key
configuration.host = endpoint_url
client = LakeFSClient(configuration)

repo = "zarr-test"
branch = "zarr-store"
client.branches.create_branch(
    repository=repo,
    branch_creation=models.BranchCreation(
        name=branch,
        source="main"))

# Create random array
state = da.random.RandomState(1234)
shape = (180, 360, 400)
chunk_shape = (36, 72, 200)
nlats, nlons, ntimes = shape
arr = state.random(shape, chunks=chunk_shape)
arr = da.random.random(shape, chunks=chunk_shape)

ds = xr.Dataset(
    data_vars={
        "precipitation": xr.DataArray(arr, dims=('lat', 'lon', 'time'))
    },
    coords={
        "lat": xr.DataArray(np.linspace(-90, 90, num=nlats, endpoint=False), dims='lat'),
        "lon": xr.DataArray(np.linspace(-180, 180, num=nlons, endpoint=False), dims='lon'),
        "time": xr.date_range(start="2000-06-01", freq="D", periods=ntimes)
    },
    attrs={
        "description": "GPM IMERG test dataset"
    }
)

# Write the first 200 time slices
ds_0 = ds.isel(time=slice(0, 200))
s3a_gateway_path = f's3a://{repo}/{branch}/precipitation_data.zarr'
task = ds_0.to_zarr(s3a_gateway_path,
                    # zarr_version=3,
                    mode='w',
                    compute=False)
Without zarr_version, I get PermissionError: Access Denied. If I set zarr_version=3, I get KeyError: 'zarr.json'. Maybe I am setting the s3a_gateway_path incorrectly?
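
[Editor's note: one thing worth checking, as an assumption since the snippet doesn't show it: whether the s3a:// path is actually routed to the lakeFS S3 gateway with the lakeFS credentials, rather than to AWS S3 with default credentials - that would explain an Access Denied. With xarray/fsspec the gateway details can be passed via storage_options, roughly:]

# Reuses the same endpoint/keys given to LakeFSClient above (hypothetical wiring, not a confirmed fix)
task = ds_0.to_zarr(
    s3a_gateway_path,
    mode="w",
    compute=False,
    storage_options={
        "key": access_key_id,
        "secret": secret_access_key,
        "client_kwargs": {"endpoint_url": endpoint_url},  # point s3fs at the lakeFS S3 gateway
    },
)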

lingyu zhang
08/31/2023, 12:18 PM
It looks like lakectl local only checks the sizeBytes and modifiedTime of a file when comparing local files with remote commits. However, I believe there may be some risks associated with this approach. For instance, if two clients have different system times and both edit the same file, let's say a.txt, the sizeBytes and modifiedTime could be the same on both clients, but the contents are different. Consequently, when I commit the changes on client A and then pull from the remote on client B, the modifications could be lost. So here are my questions:
1. Do you have any evidence or user cases showing this situation is rare?
2. What is the probability of encountering this risk? Have any tests been conducted?
3. Why do we only use second-level precision for the modifiedTime and not something more precise like nanoseconds? (Unix time in nanoseconds cannot be represented by an int64 only for dates prior to the year 1678 or after 2262.)
Thanks a lot! :)

Justin Pottenger
09/02/2023, 5:52 PM
INFO [2023-09-02T17:23:06Z]build/pkg/auth/service.go:189 pkg/auth.NewAuthService initialized Auth service service=auth_service
WARNING[2023-09-02T17:23:06Z]build/pkg/cloud/aws/metadata.go:64 pkg/cloud/aws.(*MetadataProvider).GetMetadata.func1 Tried to to get AWS account ID for BI error="InvalidParameterValue: Unsupported action GetCallerIdentity\n\tstatus code: 400, request id: 178124CA4D3D18C0"
Digging into the code, it appears that the AWS package is trying to find the email of the AWS account, and since MinIO doesn't have one, it fails. Relevant code is here and here.
Before I fork lakeFS and try to dig in myself, I thought I would see if anyone here had any ideas on a more elegant (aka supported) solution?
TL;DR: lakeFS + MinIO + headless = not supported?

Florentino Sainz
09/04/2023, 8:24 AM

Truce Wallace
09/04/2023, 9:47 PM
I get "Oh my! An error occurred: Unauthorized" when trying to register.

Truce Wallace
09/05/2023, 8:29 AM

Dieu M. Nguyen
09/05/2023, 8:37 PM
I ran the garbage collection spark-submit command as directed by the documentation. As far as I can tell, the command ran and finished without errors. But in S3, I don't see the list of objects removed in _lakefs/retention/gc/unified/<RUN_ID>/deleted/, and my storage didn't go down, so I assume objects from old versions haven't been deleted. Do you have any ideas about this? Note: I only set the GC policy after already writing all my versions - is this why?

HT
09/06/2023, 1:31 AM

Yaphet Kebede
09/06/2023, 5:09 PM

Prakash Kumar
09/08/2023, 4:11 PM

Prakash Kumar
09/08/2023, 10:48 PM