• Chaim Turkel

    10 months ago
    Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.io.FileNotFoundException PUT 0-byte object on main/: com.amazonaws.services.s3.model.AmazonS3Exception: Not Found; request: PUT https://lakefs.k8s.yotpo.xyz dbt-chaim/main/ {} Hadoop 2.7.4, aws-sdk-java/1.11.655 Linux/5.4.0-1058-aws OpenJDK_64-Bit_Server_VM/25.275-b01 java/1.8.0_275 scala/2.12.10 kotlin/1.2.71 vendor/Azul_Systems,_Inc. com.amazonaws.services.s3.model.PutObjectRequest; Request ID: null, Extended Request ID: null, Cloud Provider: AWS, Instance ID: i-02fde599ab86b480c (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null:404 Not Found);
    Chaim Turkel
    Oz Katz
    +2
    44 replies
  • Murilo Mendonca

    10 months ago
    Hey! I want to build a monorepo that will be responsible for managing all of my data pipelines' code and DAG definitions. How "safe" is it to scale out the usage of lakeFS to 50-150 people branching from the same production container?
    Murilo Mendonca
    1 reply
  • Ananth Gundabattula

    10 months ago
    Hello All, I was wondering if there is a Golang client for lakeFS. Here is a little more information about our use case:
    • We are trying to use lakeFS as a foundation for MLOps on our platform (versioned data sets will solve a lot of problems for us in the near future).
    • We would also like to support multi-cloud capability (so our customers have flexibility in their choice of cloud provider). Our data pipelines are written in Go.
    • One of the workflows in our data pipelines ingests data from Pulsar and generates Parquet files into a particular repository in lakeFS. I see there is a Python client that supports object uploads as a first-class API. Is there a client SDK for Go? I would like to upload the Parquet files via the lakeFS API rather than importing from S3, as that seems a cleaner approach for multi-cloud support. Could you please let me know whether the HTTP API is the only pathway for Golang clients, or would you recommend another approach for our use case?
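    One way to do this without a dedicated Go SDK is to write through the lakeFS S3-compatible gateway using the standard AWS SDK for Go, since the gateway behaves the same regardless of the underlying cloud store. A minimal sketch, assuming a lakeFS endpoint at https://lakefs.example.com, a repository named my-repo, a branch named main, and lakeFS access keys; all of these values are placeholders, not taken from the thread:

    package main

    import (
        "log"
        "os"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/credentials"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    func main() {
        // lakeFS exposes an S3-compatible gateway: the repository acts as the
        // bucket and the object key starts with the branch name. All values
        // below are placeholders.
        sess, err := session.NewSession(&aws.Config{
            Endpoint:         aws.String("https://lakefs.example.com"),
            Region:           aws.String("us-east-1"),
            Credentials:      credentials.NewStaticCredentials("<lakefs-access-key>", "<lakefs-secret-key>", ""),
            S3ForcePathStyle: aws.Bool(true),
        })
        if err != nil {
            log.Fatal(err)
        }

        f, err := os.Open("events-00001.parquet")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // Upload the Parquet file to branch "main" of repository "my-repo".
        _, err = s3.New(sess).PutObject(&s3.PutObjectInput{
            Bucket: aws.String("my-repo"),
            Key:    aws.String("main/ingest/pulsar/events-00001.parquet"),
            Body:   f,
        })
        if err != nil {
            log.Fatal(err)
        }
    }

    The lakeFS repository also publishes a Go client generated from its OpenAPI spec, which is an alternative if calling the HTTP API directly is preferred.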
    Ananth Gundabattula
    Lynn Rozen
    +1
    4 replies
  • Yusuf Khan

    10 months ago
    Hey all, I don't want to set up a dedicated Postgres database in Azure just yet; instead, as a sandbox, I wanted to use an Azure VM and install Postgres and lakeFS on it. Any sizing suggestions? Temp storage, RAM, cores, etc.?
    Yusuf Khan
    Yoni Augarten
    +1
    9 replies
  • Nicola Corda

    10 months ago
    Ah, perfect, this is exactly what I was missing. Adding the --path will then be a fast operation, and when doing so the original table stays untouched.
    Nicola Corda
    Yoni Augarten
    +1
    7 replies
  • Jiao Yizheng

    9 months ago
    Hi, I got the error below when building the latest lakeFS code.
    Jiao Yizheng
    Tal Sofer
    17 replies
  • Nicola Corda

    9 months ago
    Hey all, I have data landing in S3 from AWS Firehose, where I cannot use lakeFS. What would be the best approach to continuously ingest that data into lakeFS?
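    One hedged sketch of such a pipeline: periodically list the new Firehose objects in the landing bucket with the regular AWS SDK and stream them into a lakeFS branch through the lakeFS S3 gateway. The bucket, prefix, repository, branch, and endpoint below are placeholders, not values from the thread:

    package main

    import (
        "fmt"
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/credentials"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
        "github.com/aws/aws-sdk-go/service/s3/s3manager"
    )

    // copyPrefix streams every object under a prefix in the Firehose landing
    // bucket into a lakeFS branch through the lakeFS S3 gateway.
    func copyPrefix(src, dst *s3.S3, landingBucket, prefix, repo, branch string) error {
        uploader := s3manager.NewUploaderWithClient(dst)
        return src.ListObjectsV2Pages(&s3.ListObjectsV2Input{
            Bucket: aws.String(landingBucket),
            Prefix: aws.String(prefix),
        }, func(page *s3.ListObjectsV2Output, last bool) bool {
            for _, obj := range page.Contents {
                body, err := src.GetObject(&s3.GetObjectInput{
                    Bucket: aws.String(landingBucket),
                    Key:    obj.Key,
                })
                if err != nil {
                    log.Printf("get %s: %v", *obj.Key, err)
                    continue
                }
                // In lakeFS the "bucket" is the repository and the key starts with the branch.
                _, err = uploader.Upload(&s3manager.UploadInput{
                    Bucket: aws.String(repo),
                    Key:    aws.String(fmt.Sprintf("%s/raw/%s", branch, *obj.Key)),
                    Body:   body.Body,
                })
                body.Body.Close()
                if err != nil {
                    log.Printf("put %s: %v", *obj.Key, err)
                }
            }
            return true // keep paging
        })
    }

    func main() {
        // Client for the real S3 bucket where Firehose delivers data.
        awsSess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
        // Client for the lakeFS S3 gateway, authenticated with lakeFS keys.
        lakefsSess := session.Must(session.NewSession(&aws.Config{
            Endpoint:         aws.String("https://lakefs.example.com"),
            Region:           aws.String("us-east-1"),
            Credentials:      credentials.NewStaticCredentials("<lakefs-access-key>", "<lakefs-secret-key>", ""),
            S3ForcePathStyle: aws.Bool(true),
        }))

        if err := copyPrefix(s3.New(awsSess), s3.New(lakefsSess), "firehose-landing-bucket", "2022/", "my-repo", "main"); err != nil {
            log.Fatal(err)
        }
    }

    Depending on the lakeFS version in use, importing existing S3 objects by metadata rather than copying the bytes may also be an option.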
    n
    Barak Amar
    18 replies
  • Yusuf Khan

    9 months ago
    I'm looking over some of the integrations for Databricks and Delta Lake. The documentation references some hardcoded S3 variables, such as:
    spark.hadoop.fs.s3a.bucket.<repo-name>.access.key AKIAIOSFODNN7EXAMPLE
    spark.hadoop.fs.s3a.bucket.<repo-name>.secret.key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    spark.hadoop.fs.s3a.bucket.<repo-name>.endpoint https://lakefs.example.com
    spark.hadoop.fs.s3a.path.style.access true
    If I'm using Azure Databricks, would it work to just change these to the Azure-specific references? And the same question for the Delta Lake integration.
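    For context, these properties point Spark's S3A client at the lakeFS S3-compatible gateway rather than at S3 itself, so one hedged reading is that on Azure Databricks the same S3A-style properties would still be used, with the lakeFS endpoint and lakeFS credentials filled in, while the Azure storage account is configured on the lakeFS server rather than in Spark. A sketch with placeholder values only:
    spark.hadoop.fs.s3a.bucket.<repo-name>.access.key <lakefs-access-key-id>
    spark.hadoop.fs.s3a.bucket.<repo-name>.secret.key <lakefs-secret-access-key>
    spark.hadoop.fs.s3a.bucket.<repo-name>.endpoint https://<your-lakefs-host>
    spark.hadoop.fs.s3a.path.style.access true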
    Yusuf Khan
    Barak Amar
    7 replies
  • Vibhor Gupta

    9 months ago
    Does lakeFS support versioning multiple buckets (GCS and AWS) / containers (Azure) as a single unit? I have multiple tables, with each table stored in a separate bucket to get better throughput.
    Vibhor Gupta
    Barak Amar
    3 replies
  • Yusuf Khan

    9 months ago
    Hey, I downloaded the lakeFS binary and Postgres v14 on an Azure VM. I created a database called lakefs db, set the connection string for Postgres, and added my storage account name and key to the config file. When I run lakefs --config config.yaml the program starts up, but on localhost:8000/setup I get an error when I click setup (see attached image). 'postgres' is just the default name of the superuser in the Postgres DB; I also tried other admin names but still got that error. I wonder if I have the config file wrong. Does anyone have a sample config file that works for Azure, minus the key of course?
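    For reference, a hedged sketch of the general shape a lakeFS config.yaml for Azure Blob Storage took around that period; every value is a placeholder, and the exact keys should be checked against the configuration reference for the lakeFS version in use:

    listen_address: "0.0.0.0:8000"

    database:
      connection_string: "postgres://<user>:<password>@localhost:5432/<database>?sslmode=disable"

    auth:
      encrypt:
        secret_key: "<some-long-random-string>"

    blockstore:
      type: azure
      azure:
        storage_account: "<storage-account-name>"
        storage_access_key: "<storage-account-key>"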
    Yusuf Khan
    Barak Amar
    7 replies