# help
g
I've filed an issue about lakeFS failing to list one file among the staged files ready to be committed. I tried:
• recreating the repo from scratch
• removing all the other repos
• updating to the latest 1.12.1 (I had 1.10.0)
• deleting the local storage (with caches and so on) in `~/lakefs`, in case something got corrupted
• querying the API without the webhooks
Is there something else I can try?
solved: `maxAmount` defaults to 100.
a
As a rule, any API listing must consult the `has_more` field of the response. I believe that the current implementation always ends up returning the requested amount if possible, but there is no such API guarantee. A listing will return at least 1 item if there is an item to return, and the caller must be prepared to handle pagination. This is usually simpler than it appears at first sight.
👍 1
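To make that rule concrete, here is a minimal sketch of the kind of loop a caller would write, in the same Kotlin/Java client style that appears later in this thread. The `after(...)` builder method, the SDK package name, and the `pagination.hasMore` / `pagination.nextOffset` property names are assumptions based on the lakeFS API's pagination object, so verify them against your SDK version.

```kotlin
// Sketch only: drain a paginated diff listing by following has_more / next_offset.
// Assumes the generated lakeFS Java SDK (package name may differ between versions)
// and that the request builder exposes after(...) alongside amount(...).
import io.lakefs.clients.sdk.BranchesApi
import io.lakefs.clients.sdk.model.Diff

fun allStagedChanges(api: BranchesApi, repo: String, branch: String): List<Diff> {
    val all = mutableListOf<Diff>()
    var after: String? = null
    do {
        val page = api.diffBranch(repo, branch)
            .amount(1000)   // reading everything, so ask for a full page each time
            .after(after)   // null on the first call, next_offset afterwards
            .execute()
        all += page.results
        after = page.pagination.nextOffset
    } while (page.pagination.hasMore == true)   // stop once the server says there is no more
    return all
}
```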
g
will there be any drawback if I pass a big number (i.e. 1k), other than simply waiting longer when I perform sync calls to retrieve all the data?
a
The usual rules apply. In order from must-do to maybe:
• Be prepared to handle paging.
• Pass 1000 if you're going to read everything; let lakeFS do its job.
• Don't optimize before you have performance problems.
• Pass less than that if there's a good chance you'll want to stop sooner, obviously avoiding work on lakeFS.
👍 1
g
perfectly clear, thanks Ariel
👍 1
one curiosity though: have you (as a team) considered Google Protocol Buffers?
one idea: will passing `Int.MAX_VALUE` be safe enough for me as a client? If the results are still larger than that, will there be some problem/overflow within lakeFS at that point, outside my reach?
a
You will not get more than 1000 results; the server protects itself. However, I wouldn't pass more than 1000 or some reasonable constant: the client should protect itself too! Requesting an unlimited number means that some hypothetical future version of the server could suddenly cause your code to behave very differently. While I don't see this as likely, I would consider it a minor bug in the client code. I would pass nothing, and only look for a better value when actual performance suffers. Typically the default value works nicely.
g
I guess you forgot a 0, Ariel: it should be 10k according to this, but yeah, I got your message
a
Never trust someone who studied mathematics with numbers, I guess.
😁 1
g
so this prompts another question: if the results ever exceed `ListEntriesLimitMax`, do they simply get cut, or will you retrieve the next tranche by using the `after` field?
as far as I can tell from that code, I'd say they get limited/cut
a
You will receive however many entries the server gives you. You will receive `has_more` in `pagination`, and a value to pass as `after` in your next call.
👍 1
This code from lakectl is the type of loop you'd write. It's in Go, so fairly verbose but readable. There are other examples in our codebase of course.
👍 1
g
interesting
HTTP response body: {"message":"parameter \"amount\" in query has an error: number must be most 1000\nSchema:\n {\n \"default\": 100,\n \"maximum\": 1000,\n \"minimum\": -1,\n \"type\": \"integer\"\n }\n\nValue:\n 10000\n"}
I was querying the following:
```kotlin
api.diffBranch(repo, branch).amount(ListEntriesLimitMax).execute() // ListEntriesLimitMax == 10_000
```
i
We limit the `amount` argument in paging for the API clients to a maximum of 1000, so the error you are getting from the server comes from the OpenAPI middleware that parses the request.
👍 1
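For reference, a hedged sketch of the fix for the failing call above: cap the per-page `amount` at the server's documented maximum of 1000 and fetch the rest through pagination, as in the loop sketched earlier in the thread. `api`, `repo`, and `branch` stand for the same values used in the original snippet.

```kotlin
// Cap the per-page amount at the documented server maximum (1000);
// anything larger is rejected by the OpenAPI validation middleware.
val serverMaxAmount = 1000
val requested = 10_000
val firstPage = api.diffBranch(repo, branch)
    .amount(minOf(requested, serverMaxAmount))
    .execute()
```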