Powered by Linen
lakefs-for-beginners
  • r

    Ronnie Ning

    11/23/2022, 6:28 PM
    If we run lakeFS with local settings, it creates a folder called lakefs under the current user. Can we customize where the lakefs folder is located?
    y
    i
    +2
    44 replies · 5 participants
  • f

    Fizza Abid

    11/25/2022, 8:12 AM
    Hello, when I give an s3 path as the storage namespace, it says "Can only create repository with storage type: local". How should I give the s3 path?
    y
    o
    +1
    4 replies · 4 participants
  • w

    Walter Johnson

    11/29/2022, 7:44 PM
    I have lakeFS instance running at https://lakefs.quecall.biz/ . I have created a repo but I can't figure out how to make files public. Can anyone help with that? It is my test instance so I have no issue providing credentials for it.
    b
    18 replies · 2 participants
  • o

    Omkar Patil

    11/30/2022, 1:01 PM
    Hi everyone! I want to bootstrap my lakeFS instance programmatically in Java. I don't want to install lakeFS beforehand; when I run my application it should bootstrap a lakeFS instance and I should be able to do all the operations. Is this possible in Java? Please advise. Thank you.
    b
    2 replies · 2 participants
  • r

    Ronnie Ning

    12/01/2022, 9:35 PM
    Is there any setting in config for sync? For example, if the repo is deleted in LakeFS, it's also removed from the storage.
    b
    6 replies · 2 participants
  • a

    Alessandro Mangone

    12/05/2022, 12:30 PM
    Hello, I am trying to use the Java API in a Scala codebase, and I am getting an error that was already reported by a user: https://lakefs.slack.com/archives/C02CV7MUV4G/p1666253775100359?thread_ts=1666251707.039089&cid=C02CV7MUV4G In my case I am using SBT instead of Maven and don’t have any other dependencies using okhttp3. What could be causing this issue? I am using the SBT Assembly plugin to create an uber-jar. I don’t know if I need to be careful with some merging rules, but I don’t see any dependency conflict.
    👀 1
    g
    b
    +1
    109 replies · 4 participants
  • s

    Selva

    12/06/2022, 5:20 AM
    Hi guys. I have two questions on LakeFS and thought you can help me. 1. How can I use the LakeFS url in matlab and C# to read the content? 2. Is there any GUI to checkout file, then edit (in notepad) and then check in. We are used to using sourcetree and would like to know if LakeFS has one similar.
    👀 1
    g
    l
    7 replies · 3 participants
  • a

    Alessandro Mangone

    12/06/2022, 10:59 AM
    Final question (I think, sorry 😅) It’s the first time that I have physical access to the data on s3 and I am seeing that lakefs is generating some binary files, I cannot see directly delta files, branch paths, etc. If that’s the case, does this mean that the only way to integrate with tools like athena and snowflake is by generating manifest files? Also, do the files referenced in the symlink files maintain the original columnar compression format (e.g. orc, parquet)?
    g
    1 reply · 2 participants
  • f

    Fizza Abid

    12/06/2022, 12:46 PM
    Hi, what do we pass here for the domain name?
    gateways:
        s3:
          domain_name: s3.lakefs.example.com
    g
    3 replies · 2 participants
  • r

    Raphaël

    12/20/2022, 4:20 PM
    Hi everyone, I'm running lakeFS locally with Docker and trying to make it communicate with a MinIO S3 bucket. The installation and the creation of the bucket went fine, but when I try to create a new repo on lakeFS to retrieve data I get the error: 'failed to create repository: failed to access storage'. Do you have any idea what may be causing this problem? Thanks in advance.
    j
    d
    +1
    18 replies · 4 participants
  • r

    Raphaël

    12/22/2022, 2:17 PM
    Hi everyone, I'm trying to launch the lakeFS server again after killing it. For the installation, I launched it locally with Docker and the --local-settings parameter at the end. So now I run: lakefs run --local-settings but I get the error below: "Tried to to get AWS account ID for BI". Does anyone have any idea how to solve this?
    j
    8 replies · 2 participants
  • c

    C. bon

    12/28/2022, 9:20 PM
    Hey, I need a way to use branches to access files on S3 and to be sure that files are shared between branches without having to do any copy. However, is there a way to do the same without having to create a commit every time I add a file? In short, could I do the same with a dirty work tree, like I would in git? I guess I would have to create a commit before I create any new branch, right?
    i
    o
    +2
    18 replies · 5 participants
  • e

    edwardlol Zhou

    01/13/2023, 2:03 AM
    Hi everyone, I'm new to lakeFS and I have a problem running the python-client quickstart. I deployed lakeFS on Kubernetes with MinIO as the object store and Postgres as the metadata store. When I try to create a repo using Python:
    repo = models.RepositoryCreation(name='example-repo', storage_namespace='s3://my-bucket/example-repo', default_branch='main')
    The request fails with error message:
    HTTP response body: {"message":"failed to create repository: failed to access storage"}
    And the log of lakefs pod shows:
    time="2023-01-13T01:56:54Z" level=error msg="failed to get S3 object bucket my-bucket key example-repo/dummy" func="pkg/logging.(*logrusEntryWrapper).Errorf" file="build/pkg/logging/logger.go:258" error="RequestError: send request failed\ncaused by: Get \"http://my-bucket.minio-service.kubeflow.svc.cluster.local:9000/example-repo/dummy\": dial tcp: lookup my-bucket.minio-service.kubeflow.svc.cluster.local: no such host" host="localhost:8000" method=POST operation=GetObject operation_id=CreateRepository path=/api/v1/repositories request_id=38ff0f56-99a6-4f4a-b6bd-bba8bf7e7c87 service_name=rest_api user=admin
    time="2023-01-13T01:56:54Z" level=warning msg="Could not access storage namespace" func="pkg/api.(*Controller).CreateRepository" file="build/pkg/api/controller.go:1393" error="RequestError: send request failed\ncaused by: Get \"http://my-bucket.minio-service.kubeflow.svc.cluster.local:9000/example-repo/dummy\": dial tcp: lookup my-bucket.minio-service.kubeflow.svc.cluster.local: no such host" reason=unknown service=api_gateway storage_namespace="s3://my-bucket/example-repo"
    It seems that 'my-bucket' was parsed as a prefix of my minio endpoint. Here's my lakefs deploy configuration:
    blockstore:
      type: s3
      s3:
        endpoint: http://minio-service.kubeflow.svc.cluster.local:9000
        credentials:
          access_key_id: access_key_id
          secret_access_key: secret_access_key
        discover_bucket_region: false
    a
    6 replies · 2 participants
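The log in the message above shows the bucket name prepended to the MinIO hostname, which is what S3 virtual-hosted-style addressing produces. The following is a minimal Python sketch of the two addressing styles, using the endpoint, bucket, and key from that message. The helper names are hypothetical and exist only for illustration; they are not part of lakeFS or any S3 SDK.

```python
from urllib.parse import urlsplit, urlunsplit


def virtual_hosted_url(endpoint: str, bucket: str, key: str) -> str:
    """Virtual-hosted style: the bucket becomes a subdomain of the endpoint host."""
    parts = urlsplit(endpoint)
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))


def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Path style: the bucket is the first path segment; the host is unchanged."""
    parts = urlsplit(endpoint)
    return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))


# Values taken from the message above.
endpoint = "http://minio-service.kubeflow.svc.cluster.local:9000"
bucket = "my-bucket"
key = "example-repo/dummy"

# This is the URL seen in the log; its host needs a DNS entry per bucket.
print(virtual_hosted_url(endpoint, bucket, key))
# This form only needs the MinIO service hostname to resolve.
print(path_style_url(endpoint, bucket, key))
```

The first URL matches the one in the error, whose host cannot be resolved inside the cluster. The lakeFS S3 blockstore configuration has a `force_path_style` setting that switches to the second form; whether that alone resolves this particular deployment is an assumption, since the replies to the thread are not shown here.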
  • b

    Beegee Alop

    01/15/2023, 1:05 PM
    I’ve been chatting with data engineers about the dead letter queue concept applied to data pipelines. Very related to @Adi Polak ‘s post about circuit breakers https://lakefs.slack.com/archives/C020N7X2Y0H/p1673514224119879. If data that’s being brought in has defects and I’d prefer it to be in the “penalty box” until inspection, is lakeFS a good option to hold it?
  • a

    Aapta Bhatt

    01/16/2023, 12:02 PM
    Hi folks, how do I install lakectl via the command line?
    e
    o
    6 replies · 3 participants
  • o

    Omkar Patil

    01/20/2023, 6:46 AM
    Hi Team, Can we upload or create a folder inside lakefs repo?
    e
    1 reply · 2 participants
  • c

    Conor Simmons

    01/27/2023, 4:55 PM
    Hey! I have a couple questions: 1. Does the export functionality work for zero-copy repositories? 2. Is there any way to export via a Python API? I was looking at the spark-submit pip package
    o
    i
    +3
    108 replies · 6 participants
  • c

    Conor Simmons

    01/31/2023, 3:03 PM
    Hey, I have another question: is there any way to get the branch associated with a commit number in the Python SDK? I was hoping get_commit might have this in the response data but I haven't seen it in there
    e
    9 replies · 2 participants
  • c

    Conor Simmons

    01/31/2023, 6:38 PM
    Hey, I have another more general question: do you have some recommendation for the most performant way to upload objects? Right now I am looping through the desired files and it's estimating COCO 2017 val set (with 5000 images, 5000 small JSON) to take 3 hours to upload
    i
    e
    +1
    76 replies · 4 participants
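Uploading files one at a time in a loop, as described in the message above, spends most of its time waiting on network round trips. One common approach is to run the uploads on a thread pool. The sketch below assumes the uploads are I/O-bound and that the upload call is safe to invoke from several threads at once; `upload_one` is a hypothetical stand-in for whatever client call performs a single upload, not a real lakeFS SDK function.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def upload_all(
    paths: Iterable[str],
    upload_one: Callable[[str], None],
    max_workers: int = 16,
) -> Dict[str, BaseException]:
    """Call upload_one(path) for every path on a thread pool.

    Returns a dict mapping each failed path to the exception it raised.
    """
    failures: Dict[str, BaseException] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its path so failures can be reported.
        futures = {pool.submit(upload_one, path): path for path in paths}
        for future in as_completed(futures):
            error = future.exception()
            if error is not None:
                failures[futures[future]] = error
    return failures


# Hypothetical stand-in for a real upload call, used only to exercise the helper.
uploaded = []


def fake_upload(path: str) -> None:
    if path.endswith(".bad"):
        raise IOError(f"cannot upload {path}")
    uploaded.append(path)


failures = upload_all(["a.jpg", "b.json", "c.bad"], fake_upload, max_workers=4)
print(sorted(uploaded), list(failures))
```

Collecting failures instead of raising on the first error lets a long batch finish and be retried selectively. The right `max_workers` value depends on the server and network, so it is worth measuring rather than assuming.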
  • a

    Anna Schooneveld

    02/17/2023, 5:34 PM
    Hello, I'm new to lakeFS, and trying it out for my work. The infrastructure team has set up an instance (I believe the production instance) of lakeFS but they couldn't get s3 to work for now so they said I should use local storage.
    i
    i
    13 replies · 3 participants
  • a

    Anna Schooneveld

    02/17/2023, 5:35 PM
    However, when I try to import from local into lakeFS I get this error: "creating object-store walker: no storage adapter found: for scheme: local". In the UI I see the warning: "Block adapter local not usable in production". What does this mean?
  • p

    Paul

    02/27/2023, 2:18 PM
    Hello, I need help configuring a KV store so that I don't have to set up the lakeFS server each time. I want to use Postgres, so I edited the config.yaml file with the corresponding environment variables. Are there any additional steps, perhaps on the Postgres side? It doesn't work at the moment.
    i
    2 replies · 2 participants
  • r

    Robert Angeli

    02/28/2023, 4:32 PM
    Hello. I wanted to understand the ADFS integration available for LakeFS.
  • r

    Robert Angeli

    02/28/2023, 4:32 PM
    I don't see much documentation around this
    i
    o
    9 replies · 3 participants
  • r

    Robin Moffatt

    03/01/2023, 9:59 AM
    So I'm starting to really dig into lakeFS now, so have probably a bunch of questions coming up 😅 The first is to check my understanding: there's no equivalent of git add, is that right? Once I've written files to a branch, I either commit them all, or none? e.g. here I can't just commit sensitive/flyers.csv and leave the drone-registrations/* uncommitted?
    diff lakefs://drones03/main
    Ref: lakefs://drones03/main
    + added drone-registrations/Registations-P107-Active-2016.parquet
    + added drone-registrations/Registations-P107-Active-2017.parquet
    + added drone-registrations/Registations-P107-Active-2018.parquet
    + added drone-registrations/Registations-P107-Active-2019.parquet
    + added drone-registrations/Registations-RecFlyer-Active-2020.parquet
    + added drone-registrations/Registations-RecFlyer-Active-2021.parquet
    + added sensitive/flyers.csv
    y
    3 replies · 2 participants
  • a

    Anna Schooneveld

    03/01/2023, 2:56 PM
    I have a question about lakeFS import. If data is imported/ingested into lakeFS from, say, an s3 bucket, and the data changes in that s3 bucket, does it also change in lakeFS? Or does it only change if we re-upload it?
    b
    o
    6 replies · 3 participants
  • m

    Mohammad Eslami

    03/05/2023, 5:18 PM
    Hi there. Looking at the lakeFS installation on EKS, it only mentions DynamoDB as the backend db. Can we use a Postgres RDS instance instead?
    👀 1
    i
    3 replies · 2 participants
  • b

    Brad

    03/10/2023, 3:23 AM
    Hi team, for a lakeFS Cloud deployment, how do I set up my auth for a new s3 bucket? I realise my organisation didn't go through the full setup procedure.
    g
    2 replies · 2 participants
  • a

    Aviator

    03/14/2023, 8:47 PM
    Having read previous posts, I still feel I need clarification on a particular use case: constant data change in an s3 bucket. I have data uploaded to my s3 bucket, and I used lakeFS to import this data into my repository. Connected to PySpark, I performed some data transformation on this data, which I finally committed back to my repository. Now there is a change in the data uploaded to my s3 bucket. How do I read this new data and compare it with the historical data already committed to my repository? Will lakeFS take note of this data change that took place in my s3 bucket?
    b
    8 replies · 2 participants
  • a

    Aviator

    03/14/2023, 8:49 PM
    Also, has anyone been able to implement dbt with lakeFS locally? Can you point me to a resource? I am going through the docs, but I have yet to understand a thing. Or is it a feature that comes with lakeFS Cloud? I think these are the two I need clarification on for now. Cheers
    g
    r
    3 replies · 3 participants
g

Guy Hardonag

03/14/2023, 9:21 PM
Hi @Aviator, the dbt feature is part of the open source. Running a local environment with dbt can be complicated (regardless of lakeFS): to run everything locally you will also need to run Trino or Spark locally. The documentation can be a bit confusing; https://docs.lakefs.io/integrations/dbt.html describes how to manage branches with dbt and lakeFS, assuming everything is already running. The lakeFS repository contains our “Everything Bagel”, which is a docker-compose that runs many tools on your Docker machine (including lakeFS, Spark, Trino, and dbt). It could be a good resource for understanding the configurations you need. Adding some resources about the Everything Bagel: https://lakefs.io/blog/the-docker-everything-bagel-spin-up-a-local-data-stack/ https://lakefs.io/blog/the-everything-bagel-ii-versioned-data-lake-tables-with-lakefs-and-trino/
❤️ 1
r

Robin Moffatt

03/14/2023, 10:02 PM
@Aviator the links that Guy provides are good ones. We should update the docs though to make all this easier! Are there any particular bits that you're struggling with that we can make sure to cover?
a

Aviator

03/15/2023, 6:25 AM
Thank you @Robin Moffatt. That's all for now.