• Harris Vijayagopal

    2 months ago
    In https://docs.lakefs.io/integrations/boto.html, I see that endpoint_url is 'https://lakefs.example.com'. If I am running lakeFS locally and it is connected to an AWS S3 bucket and an RDS database, I am wondering what my endpoint_url would be.
    1 reply
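    A minimal sketch of the setup asked about above, assuming lakeFS is running locally on its default port 8000; the S3 bucket and RDS database sit behind the lakeFS server, so endpoint_url still points at lakeFS itself (all keys and names below are placeholders):

    import boto3

    # endpoint_url is the local lakeFS server's S3 gateway, not AWS S3 itself.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:8000",          # local lakeFS server (assumed default port)
        aws_access_key_id="AKIAIOSFODNN7EXAMPLE",      # lakeFS access key (placeholder)
        aws_secret_access_key="<lakefs-secret-key>",   # lakeFS secret key (placeholder)
    )

    # Objects are addressed as <repository>/<branch>/<path> through the gateway.
    resp = s3.list_objects_v2(Bucket="my-repo", Prefix="main/")
    for obj in resp.get("Contents", []):
        print(obj["Key"])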
  • JuanY

    2 months ago
    Hello, could I create a policy that disallows a group from reading uncommitted changes?
    8 replies
  • JuanY

    2 months ago
    I tried to create a read policy, but somehow, instead of a new policy being created, the change was merged into an existing policy. Is this a known issue?
    11 replies
  • setu suyagya

    1 month ago
    Hi team, while setting up lakectl I am getting this error, even though I configured the .yaml file with the access key and secret key.
    54 replies
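    A minimal sketch of what lakectl typically expects in ~/.lakectl.yaml, assuming a lakeFS server on localhost:8000; the endpoint and key values are placeholders, and the actual error text above would be needed to diagnose further:

    # ~/.lakectl.yaml (sketch, placeholder values)
    credentials:
      access_key_id: AKIAIOSFODNN7EXAMPLE      # lakeFS access key
      secret_access_key: <lakefs-secret-key>   # lakeFS secret key
    server:
      endpoint_url: http://localhost:8000      # lakeFS server endpoint (assumed local)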
  • Georg Heiler

    1 month ago
    Hi, https://docs.lakefs.io/ suggests version-controlling files like:
    df = spark.read.parquet("s3a://my-repo/main-branch/collections/foo/")
    I wonder what the implications are of having one branch per asset (table) vs. one centralized prefix per branch. What would you recommend? And how does the branch prefix map to a database schema (for discoverability), e.g. when someone tries to read the data with plain Spark SQL, perhaps from Databricks's catalog?
    11 replies
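    For context on the path layout above, a minimal PySpark sketch assuming the lakeFS S3 gateway is addressed through s3a; the endpoint, keys, repository, and branch names are placeholders, and the branch is simply the first path element after the repository:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        # Point s3a at the lakeFS S3 gateway instead of AWS S3 (placeholder endpoint/keys).
        .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:8000")
        .config("spark.hadoop.fs.s3a.access.key", "AKIAIOSFODNN7EXAMPLE")
        .config("spark.hadoop.fs.s3a.secret.key", "<lakefs-secret-key>")
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .getOrCreate()
    )

    # The same table can be read from different branches by swapping that first path element.
    df_main = spark.read.parquet("s3a://my-repo/main/collections/foo/")
    df_dev = spark.read.parquet("s3a://my-repo/dev/collections/foo/")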
  • Isabela Angelo

    1 month ago
    Hello! In my company, we have a data lake with Hudi and tried to use LakeFS. However, we realized it doesn't work well with Apache Hudi because many files were duplicated in our tests. Has anyone tried LakeFS with Hudi?
    11 replies
  • setu suyagya

    1 month ago
    Can you please show how to set up lakeFS with AWS?
    11 replies
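    A hedged sketch of a minimal lakeFS server configuration on AWS, assuming an S3 bucket as the blockstore and a PostgreSQL (e.g. RDS) database; every value is a placeholder and the exact keys should be checked against the deployment docs for the lakeFS version in use:

    # lakefs config (sketch, placeholder values)
    database:
      connection_string: "postgres://user:password@rds-host:5432/lakefs"   # RDS/PostgreSQL
    auth:
      encrypt:
        secret_key: "<some-random-secret>"   # used to encrypt stored credentials
    blockstore:
      type: s3                               # keep objects in an S3 bucket
      s3:
        region: us-east-1                    # assumed bucket region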
  • 王麒詳

    1 month ago
    Hi there, I am a novice studying and trying to run an MLOps environment on my workstation. So far, for data version control, lakeFS works well on my server. I integrated it with MinIO (object storage) and Label Studio (for data annotation). Something like:
    data coming->|
    MINIO <--> lakeFS <--> LabelStudio <-- data annotation
                  |<--> User access data
    However, I have several questions about data access.
    1. What are the recommended ways to access data in lakeFS? Most of the data stored in my lakeFS server is image files. I'm currently using boto3 or the Python lakeFS API (client.object.get_object) to get bytes and then convert them to image files. I'm wondering whether there is a more efficient way to access data during the development stage.
    2. Another question is about cloning data from a repo (or a branch) to a local machine. I tried some tutorial examples from the official docs and ran them successfully. I'd like to know whether cloning a large data repo locally is common in practice, because when the data is larger than a hundred GB, cloning everything to a local machine does not seem like the best choice. Thanks for all of your kind help and patience in reading my questions 😃
    15 replies
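    A minimal sketch of the boto3 access pattern described above, assuming the lakeFS S3 gateway on localhost:8000 and placeholder repository, branch, key pair, and object names; reading the bytes through the gateway and decoding them in memory avoids writing temporary files:

    import io

    import boto3
    from PIL import Image

    # lakeFS exposes an S3-compatible gateway, so boto3 can fetch objects directly.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:8000",          # lakeFS server (assumed)
        aws_access_key_id="AKIAIOSFODNN7EXAMPLE",      # placeholder lakeFS key pair
        aws_secret_access_key="<lakefs-secret-key>",
    )

    # Bucket = repository, and the object key starts with the branch name.
    body = s3.get_object(Bucket="images-repo", Key="main/raw/cat_001.png")["Body"].read()
    img = Image.open(io.BytesIO(body))   # decode in memory instead of a temp file
    print(img.size)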
  • Farman Pirzada

    1 month ago
    Howdy folks, I'm working on deploying lakeFS to our GCP platform using Google Cloud Run. It will not run, so I needed to check via Docker what the issue was:
    PORT=8080 && docker run -p 9090:${PORT} -e PORT=${PORT} 4239d2f8608b
    This is the error message I am seeing:
    time="2022-08-19T04:04:47Z" level=info msg="Config loaded" func=cmd/lakefs/cmd.initConfig file="cmd/root.go:103" fields.file=/home/lakefs/.lakefs.yaml file="cmd/root.go:103" phase=startup
    time="2022-08-19T04:04:47Z" level=fatal msg="Invalid config" func=cmd/lakefs/cmd.initConfig file="cmd/root.go:108" error="bad configuration: missing required keys: [auth.encrypt.secret_key blockstore.type]" fields.file=/home/lakefs/.lakefs.yaml file="cmd/root.go:108" phase=startup
    I understand what I'm missing:
    auth.encrypt.secret_key
    blockstore.type
    However, how do I check these values? This is my Dockerfile:
    FROM treeverse/lakefs:latest
    I'm using Cloud Run and have my values in values.yaml like so, from the example:
    logging:
      format: json
      level: WARN
      output: "-"
    auth:
      encrypt:
        secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc"
    blockstore:
      type: gs
    If I run this as suggested in the docs as lakefs-config.yaml, I will not be able to proceed because I don't have a local SQL connection to create. I am newish to Docker, so I appreciate your patience with me here and can try to provide as much as possible, but I'm not sure how to get the .lakefs.yaml file to read the values I have in my values.yaml, if I'm reading the error message correctly.
    29 replies
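    One hedged way to check the two missing keys locally is to pass them as environment variables, which lakeFS maps to configuration keys under a LAKEFS_ prefix; the secret, connection string, and port below are placeholders for a quick local test rather than a Cloud Run deployment recipe:

    # Sketch only: supply the missing config via environment variables instead of a file.
    docker run -p 8000:8000 \
      -e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="<some-random-secret>" \
      -e LAKEFS_BLOCKSTORE_TYPE="gs" \
      -e LAKEFS_DATABASE_CONNECTION_STRING="postgres://user:password@db-host:5432/lakefs" \
      treeverse/lakefs:latest run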
  • Bart Keulen

    1 month ago
    Hi everyone! I just started playing around with LakeFS and first of all I want to say that I really like it. Currently I am trying to ingest data from LakeFS repo-a into LakeFS repo-b:
    lakectl ingest --s3-endpoint-url http://lakefs:8000 --from s3://repo-a/main --to lakefs://repo-b/main/ --dry-run
    But I get the following error:
    error walking object store: NoCredentialProviders: no valid providers in chain. Deprecated.
    	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
    Error executing command.
    Is there an easier/more direct way to ingest/import data from another lakefs repo? If not, I would like some help in getting this working.
    6 replies
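    A hedged sketch of one workaround for the NoCredentialProviders error above: the error comes from the AWS SDK credential chain that the ingest walker uses to read the source, so exporting a lakeFS key pair that can read repo-a as AWS credentials before running the command may satisfy it (key values and region are placeholders):

    # Sketch only: give the AWS SDK credential chain something to find.
    export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"      # lakeFS access key (placeholder)
    export AWS_SECRET_ACCESS_KEY="<lakefs-secret-key>"   # lakeFS secret key (placeholder)
    export AWS_REGION="us-east-1"                        # the SDK may require a region

    lakectl ingest --s3-endpoint-url http://lakefs:8000 \
      --from s3://repo-a/main --to lakefs://repo-b/main/ --dry-run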