# dev
a
Interested in hearing opinions about https://github.com/treeverse/lakeFS/issues/1481 -- how to deliver protobuf changes to the Spark client and other customer code. So far I have 3 possibilities:

1. Do nothing, just copy proto files until we actually get tooling to do it. Might mean the Spark client never supports any new features (because we forget to copy them over) until a customer runs into a missing field and complains. (A minimal sketch of the copy-then-regenerate step follows this list.)
2. Move to a third git repo and use git submodules. Known to work, never seen it happily used. Probably lowest friction for Treeverse developers.
3. Build proto packages inside the lakeFS repo (or a new proto-only repo) for each language, publish to many different package repositories (Maven Central, Go via GitHub, probably PyPI and npmjs at some stage). Adds maximal friction to core development.

Happy to read your comments on the issue or here. Thanks!
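For option 1, the consumer-side workflow could look roughly like the Go sketch below, assuming a hypothetical catalog.proto copied out of lakeFS and the standard protoc-gen-go flags; the Spark client itself would use its own Scala tooling (e.g. ScalaPB) rather than protoc-gen-go, so this is only an illustration of the copy-then-regenerate step.

```go
// Package catalog holds bindings generated from proto files copied verbatim
// out of the lakeFS repo (option 1: plain file copy, no extra tooling).
//
// After refreshing the copied .proto files, run:  go generate ./...
package catalog

// catalog.proto is a hypothetical file name used for illustration;
// --go_out and --go_opt=paths=source_relative are standard protoc-gen-go flags.
//go:generate protoc --go_out=. --go_opt=paths=source_relative catalog.proto
```

The cost is exactly the risk called out in option 1: nothing forces the copied files to stay in sync with the server.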
y
It's a tough choice, really. I think in an ideal world #3 does not add a lot of friction: if the server is always backward compatible (not forever, just until a new client version is released), then one can develop server features, publish the relevant protobuf artifacts, and then work on the client version compatible with the new server version. In theory that's a good idea. Things start to get messy when you want to be able to work on the server and client together. Submodules will be a pain for contributors. So it is with sadness that I vote for #1, until we hit a problem.
b
I'll go with option 1: keep the proto file(s) in a public repository (lakeFS). Optionally, add the proto files as build artifacts when we release. At runtime we can also expose an endpoint that serves all the proto files we support, like we do with swagger (/swagger.json); see the sketch below.

Projects that use the proto files are clients. As long as the proto is backward and forward compatible, a copy in a client project is guaranteed to keep working. A newer version of the proto only affects the consumer if it accesses the new fields; it also brings new capabilities that the client will need to use, support and test, so there is usually no "auto-upgrade". Anything that consumes the proto dynamically can work out of the box without a real upgrade.

Each project has its own build system, its own way of loading the proto, and its own mapping of the generated structures to packages. Forcing a specific layout as common code in lakeFS (option 3) can have benefits for a quick start, but most projects already have a preferred way of working with interfaces, may keep their own proto repository, or have opinions about checking in generated code. Also, shipping a compiled version of the proto as a common library usually dictates the protobuf runtime version, which some projects would prefer not to use or want to upgrade on a different timeline.

Working with a submodule or subtree is a valid option, but I would try to avoid build complexity as much as possible. In that mode you usually end up with PRs where you need to merge a reference to the submodule and pick which version you need; with plain files, changes are easier to manage because they show up in the diff like any other source change.
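For the runtime-endpoint idea, here is a minimal sketch, assuming a Go server that embeds its proto sources; the /api/protos/ path, the proto/ directory, and the standalone main are invented for illustration and are not lakeFS's real layout.

```go
package main

import (
	"embed"
	"io/fs"
	"log"
	"net/http"
)

// The .proto sources are assumed to live under proto/ in the server repo and
// are compiled into the binary, so clients can always fetch the exact schema
// the running server was built with.
//
//go:embed proto/*.proto
var protoFiles embed.FS

func main() {
	sub, err := fs.Sub(protoFiles, "proto")
	if err != nil {
		log.Fatal(err)
	}
	// GET /api/protos/<name>.proto returns the raw proto source, much like
	// /swagger.json exposes the OpenAPI definition today.
	http.Handle("/api/protos/",
		http.StripPrefix("/api/protos/", http.FileServer(http.FS(sub))))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Like /swagger.json, this only lets a client discover the schema the running server speaks; it does not by itself solve how client code gets regenerated from it.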
a
These are all workable solutions. However, our decision to go multi-repo was predicated on the existence of tooling, and that tooling has so far failed to materialize (apart from buf.build, which is being built as we speak...). So going multi-repo added more friction than we expected, and we failed to gain one of its advantages: a clear separation of protocol buffers from client code. I set up a meeting today to discuss the remaining advantages of separation. We started paying the price when we decided to do it, so I call YAGNI.