• Tal Sofer

    1 month ago
    Anyone have an idea why automated tests are not running on https://github.com/treeverse/lakeFS/pull/3779?
    Tal Sofer
    Ariel Shaqed (Scolnicov)
    +1
    6 replies
  • Guy Hardonag

    1 month ago
    Hi, I would like to release a new minor version of lakeFS. Any objections?
  • Ariel Shaqed (Scolnicov)

    1 month ago
    Do we need/want dependabot running on docs-lakefs? Every time it gets busy, it issues PRs for every version of the docs. This number alone grows linearly with releases.
    Ariel Shaqed (Scolnicov)
    1 reply
  • Ariel Shaqed (Scolnicov)

    1 month ago
    I'm having some difficulty bringing up DBT on the lakeFS Everything Bagel 🥯. I've updated some things to use up-to-date DBT, avoid docker-compose-prefixes, and bump ulimit -n. Now my hive-metastore container fails to start; AFAICT the issue is
    hive                      | MetaException(message:Required table missing : "`DBS`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables")
    (full logs are longer). It looks like Hive needs some startup script (or to set that value). Has anyone run into this?
    Ariel Shaqed (Scolnicov)
    Jonathan Rosenberg
    4 replies
  • Niro

    1 month ago
    Hi everyone, I have a question about the database storage sizing guide in the lakeFS documentation. According to the documentation, the storage requirements are about 150 MiB per 100,000 uncommitted writes, which is roughly 1,500 bytes per write. Looking at the code, I see that lakeFS writes the following Entry struct per write:
    ent := &Entry{
    		Address:      entry.PhysicalAddress,
    		AddressType:  addressTypeToProto(entry.AddressType),
    		Metadata:     entry.Metadata,
    		LastModified: timestamppb.New(entry.CreationDate),
    		ETag:         entry.Checksum,
    		Size:         entry.Size,
    		ContentType:  ContentTypeOrDefault(entry.ContentType),
    	}
    Doing a rough calculation that takes the field limits into account:
      Address - according to AWS guidelines does not exceed -- 1024 bytes
      AddressType - int32 -- 4 bytes
      Metadata - according to AWS limited to 2 KB of user data -- 2048 bytes
      LastModified - int64 -- 8 bytes
      ETag - AWS limitation -- 1024 bytes
      Size - int64 -- 8 bytes
      ContentType - let's use the worst-case scenario -- 1024 bytes
    Summing this up we get over 5000 bytes, which is far from the given estimate, and this is without taking into consideration other data that is saved, such as the entry key and checksum. Am I missing something here? (Keep in mind that these are general approximations; I'm not trying to do exact math here, but rather to get a sense of the size.)
    Niro
    Ariel Shaqed (Scolnicov)
    2 replies
  • Ariel Shaqed (Scolnicov)

    1 month ago
    Calling all Java experts 🙂 I'm working to support lakeFSFS on Hadoop3. This will be a distinct package on Maven central -- it possesses different semantics (hadoop3 FileSystem has a subtly different spec than hadoop2 FileSystem) and different (provided!) dependencies. I'm trying to figure out the Maven Way to name this package. Does anyone have any suggestions, or ideally examples of how to do this? This is particularly important if you are a developer who uses LakeFSFileSystem: it will directly impact how you build your code (and a chance to get a strange hard-to-debug error at runtime...). The only option that I can see so far is to incorporate a string (_hadoop2/_hadoop3, or maybe -hadoop2/-hadoop3) into the package name. This is similar to the way Scala packages incorporate a Scala language minor version into their name -- and for a similar reason: the Scala runtime library is an implicitly provided dependency to all Scala packages, and that changes incompatibly between Scala minor versions.
    Ariel Shaqed (Scolnicov)
    1 reply
  • Guy Hardonag

    1 month ago
    Planning to release a fix version with:
    Fix DB serialization error during multiple writes to the same key (#3862)
    Any objections?
  • Niro

    1 month ago
    Hi lakeFSers, Wanted to have your input on something: We are currently planning the KV migration procedure. As part of the migration, the lakeFS database configuration parameters change. We added a new section per driver type with its own configuration. The current flow requires the users to copy the postgres configuration into a designated 'postgres' section in the database configuration section, perform the migration and afterwards delete the old configuration parameters. A concern was raised that this might cause user friction, so it was suggested that we take the parameters from the old path and issue a deprecation warning, and remove these configuration paths in a future version. Another approach says that we already require the user to update the configuration (Need to specify the database 'type' in the configuration)
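    To make the "read from the old path and warn" option concrete, here is a minimal Go sketch. It is an illustration under stated assumptions, not the lakeFS implementation: the key names database.postgres.connection_string and database.connection_string, the flat map used as a config, and the function name are all hypothetical.

```go
package main

import "fmt"

// Hypothetical configuration keys, chosen for illustration only.
const (
	newKey = "database.postgres.connection_string" // assumed per-driver location
	oldKey = "database.connection_string"          // assumed legacy location
)

// resolveConnectionString prefers the new per-driver key. If only the legacy
// key is set, it returns that value together with a deprecation warning, so
// existing configurations keep working until the old path is removed.
func resolveConnectionString(cfg map[string]string) (value, warning string, err error) {
	if v := cfg[newKey]; v != "" {
		return v, "", nil
	}
	if v := cfg[oldKey]; v != "" {
		return v, fmt.Sprintf("%q is deprecated; move it to %q", oldKey, newKey), nil
	}
	return "", "", fmt.Errorf("no connection string configured: set %q", newKey)
}

func main() {
	legacy := map[string]string{oldKey: "postgres://localhost:5432/lakefs"}
	value, warning, err := resolveConnectionString(legacy)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if warning != "" {
		fmt.Println("WARNING:", warning)
	}
	fmt.Println("using:", value)
}
```

    The trade-off is the one raised in the message: the fallback avoids a forced edit now, at the cost of carrying two code paths until the deprecated keys are dropped.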
    Niro
    Oz Katz
    2 replies
  • Ariel Shaqed (Scolnicov)

    1 month ago
    TIL about Maven Central Search. We've been searching for a tool to find Java packages that provide a class. This does pretty much that; it could be useful for debugging shading failures. Example:
    ❯ mcs class-search -f org.apache.hadoop.fs.FileSystem
    Searching for artifacts containing org.apache.hadoop.fs.FileSystem...
    Found 743 results (showing 20)
    
      Coordinates                                                                            Last updated
      ===========                                                                            ============
      org.scray:scray-hdfs-writer:1.1.1                                                      13 Sep 2018 at 10:08 (IDT)
      org.scray:scray-hdfs-writer:1.1.0                                                      12 Sep 2018 at 11:45 (IDT)
      org.apache.hadoop:hadoop-common-instrumented:0.22.0                                    10 Dec 2011 at 04:33 (IST)
      org.apache.hadoop:hadoop-common:0.22.0                                                 10 Dec 2011 at 04:33 (IST)
      org.jvnet.hudson.hadoop:hadoop-core:0.19.1-hudson-3                                    03 Sep 2009 at 00:41 (IDT)
      org.apache.mahout.hadoop:hadoop-core:0.19.1                                            03 Apr 2009 at 09:53 (IDT)
      org.jvnet.hudson.hadoop:hadoop-core:0.19.1-hudson-2                                    16 Mar 2009 at 00:54 (IST)
      org.jvnet.hudson.hadoop:hadoop-core:0.19.1-hudson-1                                    15 Mar 2009 at 22:40 (IST)
      org.jvnet.hudson.hadoop:hadoop-core:0.19.1                                             12 Mar 2009 at 07:15 (IST)
      org.jvnet.hudson.hadoop:hadoop-core:0.19.0                                             11 Mar 2009 at 08:02 (IST)
      com.tencent.bk.base.datahub:hadoop-common:2.7.3-bkbase.1                               26 Oct 2021 at 06:51 (IDT)
      org.apache.hadoop:hadoop-common:0.23.11                                                19 Jun 2014 at 17:17 (IDT)
      org.apache.hadoop:hadoop-common:0.23.10                                                03 Dec 2013 at 07:46 (IST)
      org.apache.hadoop:hadoop-common:0.23.9                                                 01 Jul 2013 at 17:45 (IDT)
      org.apache.hadoop:hadoop-common:0.23.8                                                 28 May 2013 at 18:27 (IDT)
      org.apache.hadoop:hadoop-common:0.23.7                                                 11 Apr 2013 at 21:26 (IDT)
      com.google.code.maven-play-plugin.org.apache.hadoop:hadoop-core:0.20.2-with-200-826    11 Mar 2013 at 22:15 (IST)
      org.apache.hadoop:hadoop-common:0.23.6                                                 29 Jan 2013 at 05:53 (IST)
      org.apache.servicemix.bundles:org.apache.servicemix.bundles.hadoop-core:0.20.203.0_3   21 Jan 2013 at 04:37 (IST)
      org.apache.hadoop:hadoop-common:0.23.5                                                 20 Nov 2012 at 20:29 (IST)
    Ariel Shaqed (Scolnicov)
    Oz Katz
    3 replies
  • Ariel Shaqed (Scolnicov)

    1 month ago
    This blogpost makes me suspect that the generated Java API client does not reuse HTTP connections: we probably get a new SSL Socket Factory each time we create an API client, which prevents reuse.
    Ariel Shaqed (Scolnicov)
    4 replies