# help
g
I've filed an issue about lakeFS failing to list one file among the staged files ready to be committed. I tried:
• recreating the repo from scratch
• removing all the other repos
• updating to the latest 1.12.1 (I had 1.10.0)
• deleting the local storage (with caches and so on) in `~/lakefs`, in case something got corrupted
• querying the API without the webhooks
Is there something else I can try?
solved: `maxAmount` defaults to 100.
a
As a rule, any API listing must consult the `has_more` field of the response. I believe that the current implementation always ends up returning the requested amount if possible, but there is no such API guarantee. A listing will return at least 1 item if there is an item to return, and the caller must be prepared to handle pagination. This is usually simpler than it appears at first sight.
👍 1
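To make that rule concrete, here is a minimal sketch of the kind of loop a caller would write, in the same Kotlin/Java client style that appears later in this thread. The `after(...)` builder method, the SDK package name, and the `pagination.hasMore` / `pagination.nextOffset` property names are assumptions based on the lakeFS API's pagination object, so verify them against your SDK version.

```kotlin
// Sketch only: drain a paginated diff listing by following has_more / next_offset.
// Assumes the generated lakeFS Java SDK (package name may differ between versions)
// and that the request builder exposes after(...) alongside amount(...).
import io.lakefs.clients.sdk.BranchesApi
import io.lakefs.clients.sdk.model.Diff

fun allStagedChanges(api: BranchesApi, repo: String, branch: String): List<Diff> {
    val all = mutableListOf<Diff>()
    var after: String? = null
    do {
        val page = api.diffBranch(repo, branch)
            .amount(1000)   // reading everything, so ask for a full page each time
            .after(after)   // null on the first call, next_offset afterwards
            .execute()
        all += page.results
        after = page.pagination.nextOffset
    } while (page.pagination.hasMore == true)   // stop once the server says there is no more
    return all
}
```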
g
will there be any drawback if I pass a big number (i.e. 1k), other than simply waiting longer when I perform sync calls to retrieve all the data?
a
The usual rules apply. In order from must-do to maybe:
• Be prepared to handle paging.
• Pass 1000 if you're going to read everything; let lakeFS do its job.
• Don't optimize before you have performance problems.
• Pass less than that if there's a good chance you'll want to stop sooner, obviously avoiding work on lakeFS.
👍 1
g
perfectly clear, thanks Ariel
👍 1
one curiosity though: have you (as a team) considered Google Protocol Buffers?
one idea: will passing `Int.MAX_VALUE` be safe enough for me as a client? If the results are still larger than that, will there be some problem/overflow within lakeFS at that point, outside my reach?
a
You will not get more than 1000 results; the server protects itself. However, I wouldn't pass more than 1000 or some reasonable constant: the client should protect itself too! Requesting an unlimited number means that some hypothetical future version of the server could suddenly cause your code to behave very differently. While I don't see this as likely, I would consider it a minor bug in the client code. I would pass nothing, and only look for a better value when actual performance suffers. Typically the default value works nicely.
g
I guess you forgot a 0, Ariel: it should be 10k according to this, but yeah, I got your message
a
Never trust someone who studied mathematics with numbers, I guess.
😁 1
g
so this prompts another question: if the results ever exceed `ListEntriesLimitMax`, do they simply get cut, or will you retrieve the next tranche by using the `after` field?
as far as I can tell from that code, I'd say they get limited/cut
a
You will receive however many entries the server gives you. You will receive `has_more` in `pagination`, and a value to pass as `after` in your next call.
👍 1
This code from lakectl is the type of loop you'd write. It's in Go, so fairly verbose but readable. There are other examples in our codebase of course.
👍 1
g
interesting
HTTP response body: {"message":"parameter \"amount\" in query has an error: number must be most 1000\nSchema:\n {\n \"default\": 100,\n \"maximum\": 1000,\n \"minimum\": -1,\n \"type\": \"integer\"\n }\n\nValue:\n 10000\n"}
I was querying the following:
```kotlin
api.diffBranch(repo, branch).amount(ListEntriesLimitMax).execute() // ListEntriesLimitMax == 10_000
```
i
We limit the `amount` argument in paging for the API clients to a maximum of 1000, so the error you are getting from the server comes from the OpenAPI middleware that parses the request.
👍 1
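For reference, a hedged sketch of the fix for the failing call above: cap the per-page `amount` at the server's documented maximum of 1000 and fetch the rest through pagination, as in the loop sketched earlier in the thread. `api`, `repo`, and `branch` stand for the same values used in the original snippet.

```kotlin
// Cap the per-page amount at the documented server maximum (1000);
// anything larger is rejected by the OpenAPI validation middleware.
val serverMaxAmount = 1000
val requested = 10_000
val firstPage = api.diffBranch(repo, branch)
    .amount(minOf(requested, serverMaxAmount))
    .execute()
```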