• Yoni Augarten

    2 months ago
    So if I understand correctly, some groups should be able to access specific branch names (corresponding to data domains) across different repositories?
    3 replies
  • Gideon Catz

    2 months ago
    Hi @Itai Admi, @Jonathan Rosenberg. Continuing our previous thread, here’s a status update: I’m currently getting a “Bad Request” Amazon S3 exception during a doesBucketExist request, as part of an attempt to write data to lakeFS:
    com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 5A4G0DYMDXP6H486, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: RlgVlqdoXIpa4ieQL/mUU4kaRthFB4HrwvS7RpYawg2MYG2laCbapsgmrEog7L5+YBOsjRL2QwE=
    	at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
    	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
    	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
    	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
    	at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
    	at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
    	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
    	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    	at io.lakefs.LakeFSFileSystem.initializeWithClient(LakeFSFileSystem.java:93)
    	at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:67)
    	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    ...
    Could there be a version incompatibility between the Hadoop/lakeFS/AWS jars? I’m using the pre-bundled environment with Spark 3.2.1 and Hadoop 2.7. Thanks
    29 replies
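A note for readers hitting the same trace: with Hadoop 2.7's S3A client, a 400 Bad Request from doesBucketExist is frequently a signing/endpoint mismatch rather than a jar conflict - the AWS SDK bundled with Hadoop 2.7 signs against the global endpoint with Signature V2, which V4-only AWS regions reject. A hedged sketch of the settings usually involved (the region below is an illustrative placeholder, not taken from this thread):

```python
def s3a_v4_conf(region):
    """Spark/Hadoop settings commonly used to make Hadoop 2.7's S3A client
    work against a V4-only AWS region; keys are standard S3A/Spark names."""
    return {
        # Point S3A at the bucket's regional endpoint instead of the global one:
        "spark.hadoop.fs.s3a.endpoint": f"s3.{region}.amazonaws.com",
        # Enable Signature V4 in the AWS SDK bundled with Hadoop 2.7:
        "spark.driver.extraJavaOptions": "-Dcom.amazonaws.services.s3.enableV4=true",
        "spark.executor.extraJavaOptions": "-Dcom.amazonaws.services.s3.enableV4=true",
    }

# Pass these via spark-submit --conf, or set them on the SparkConf before the
# session (and hence the S3AFileSystem) is created.
conf = s3a_v4_conf("eu-central-1")
```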
  • Harris Vijayagopal

    2 months ago
    Hey everyone, I am a little confused by the instructions for deploying lakeFS on AWS. I was able to follow the quickstart tutorial and get lakefs and lakectl running locally. When following https://docs.lakefs.io/deploy/aws.html#on-ec2, I am unsure what the connection_string and secret_key should be. Can the secret key be any randomly-generated string that I make? I am also a complete beginner on AWS and was wondering what it means to 'run the binary on the EC2 instance'. Would I have to connect to an EC2 instance and run the command there?
    9 replies
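For anyone with the same questions: in the lakeFS configuration, connection_string is the PostgreSQL connection string for the metadata database (on AWS typically an RDS instance), and auth.encrypt.secret_key can indeed be any randomly generated string you make - it encrypts stored credentials, so it only has to stay the same across restarts. A hypothetical config.yaml sketch (every value is a placeholder):

```yaml
database:
  connection_string: "postgres://lakefs:password@example-rds-host:5432/lakefs"
auth:
  encrypt:
    # Any randomly generated string works; keep it stable across restarts.
    secret_key: "replace-with-a-long-random-string"
blockstore:
  type: s3
```

And yes, "run the binary on the EC2 instance" means connecting to the instance (e.g. over SSH) and starting the server there, with lakefs --config config.yaml run.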
  • Yaphet Kebede

    2 months ago
    Hi, I recently started using lakeFS with the Python client. I was a little confused about how to do change detection. My use case: there are some data transformation tasks, and each task processes some files in a directory and should commit only the changed files. I tried globbing all files in a directory, uploading, and committing. Even though some files are unchanged, the commit still includes them, and this affects the downstream tasks. Is there a way to detect files that have changed from lakeFS, so that the last commit would only include changed files?
    15 replies
  • Harris Vijayagopal

    2 months ago
    I see on https://docs.lakefs.io/deploy/aws.html#on-ec2 that the command is
    lakefs --config config.yaml run
    However, you would use
    lakectl config
    to set up the CLI. I'm wondering what the key differences between these two commands are.
    1 reply
  • Harris Vijayagopal

    2 months ago
    I have created an S3 bucket to use for https://docs.lakefs.io/setup/create-repo.html#create-the-repository. I am wondering how I would find the address of the bucket.
    8 replies
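For context: the "address" lakeFS asks for when creating the repository is the bucket's S3 URI used as the storage namespace - s3://<bucket-name>, optionally followed by a prefix - not the HTTPS endpoint shown in the AWS console. A trivial sketch (the bucket name is a placeholder):

```python
def storage_namespace(bucket, prefix=""):
    """Build the s3:// URI lakeFS expects as a repository's storage namespace."""
    return f"s3://{bucket}/{prefix.strip('/')}" if prefix else f"s3://{bucket}"

# e.g. storage_namespace("example-bucket", "lakefs-data")
```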
  • Jude

    2 months ago
    I am running lakeFS version 0.63.0 on a Postgres DB, but we switched to Yugabyte DB for some reason. Now when I try to start lakeFS, I keep getting this error: "schema version not compatible with the latest version"
    17 replies
  • Harris Vijayagopal

    2 months ago
    When tagging a commit, I see that the tag is also applied to the parents of that commit ID. Is there any way to tag only the files in that commit, and not files modified in parent commits?
    10 replies
  • Harris Vijayagopal

    2 months ago
    I have some questions regarding the internals of lakeFS:
    1. I see that creating a branch does not perform data duplication; it is a metadata operation that creates a mutable pointer to the commit the branch is based on. I have a general understanding of the versioning internals, but am confused as to how these changes are stored in the metadata. Is it similar to Git, which does delta compression? I have a relatively basic understanding of how the Git storage system works, which may be the cause of my confusion, for which I apologize in advance.
    2. On my object store, I see that file names are random strings. I'm assuming this is some representation of my file, and I am confused as to what it represents and how it relates to the physical path, as objects in the physical store are never modified.
    3. How are file differences stored and retrieved when switching between branches?
    4. When committing, I can add a metadata key-value pair to the commit. How can this functionality help with tracking commit changes and/or grouping a certain subset of commits together?
    2 replies
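On question 4: commit metadata is a free-form string map stored with the commit, so a pipeline can stamp every commit it makes (e.g. with a run ID) and later group or filter commits by reading that map back from the commit log. A toy illustration over plain dicts - the layout mirrors a lakeFS commit's metadata map, but the key names here are made up:

```python
def commits_with_metadata(commits, key, value):
    """Select commits whose metadata map carries the given key/value pair."""
    return [c for c in commits if c.get("metadata", {}).get(key) == value]

# A pretend commit log where each commit was stamped with its pipeline run:
log = [
    {"id": "c1", "metadata": {"pipeline_run": "2024-05-01"}},
    {"id": "c2", "metadata": {"pipeline_run": "2024-05-02"}},
    {"id": "c3", "metadata": {"pipeline_run": "2024-05-01"}},
]
same_run = commits_with_metadata(log, "pipeline_run", "2024-05-01")
# same_run holds c1 and c3 - the two commits from the same pipeline run
```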
  • Gideon Catz

    2 months ago
    Hi guys, I would like to create a new branch from Scala (Spark) code, and later commit changes to it. Do I need to explicitly invoke these endpoints, or are there wrapper functions I can use?
    3 replies
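For reference: lakeFS exposes branch creation through its REST API (a POST to the repository's branches endpoint with a name and a source ref), which the generated API clients wrap; from Scala/Spark you would either use a generated client or issue the HTTP call directly, since the Hadoop filesystem integration itself does not create branches. A sketch of the request shape, with placeholder repo and branch names:

```python
def create_branch_request(repo, new_branch, source_ref):
    """Build the path and JSON body for lakeFS's create-branch endpoint
    (POST /api/v1/repositories/{repo}/branches)."""
    path = f"/api/v1/repositories/{repo}/branches"
    body = {"name": new_branch, "source": source_ref}
    return path, body

# e.g. create_branch_request("example-repo", "experiment-1", "main")
```

Committing afterwards goes through the commits endpoint of the same API in the same fashion.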