# help
t
Is there an upper limit to how many files can be part of a merge? We’re hitting a timeout between lakeFS and AWS trying to merge a branch that has a few hundred thousand files. Or is there a way to configure the timeout between them? Currently it looks like it’s set to 30s. Adding and committing these changes worked without a problem.
time="2021-11-10T15:33:02Z" level=trace msg="HTTP call ended" func=pkg/httputil.TracingMiddleware.func1.1 file="build/pkg/httputil/tracing.go:149" host=xxxx<<http:///|http:///>> method=POST path=/api/v1/repositories/aif-xxxx-xxxx/refs/xxxx-xxxx-processing-a6c0f/merge/main request_body="[123 125]" request_id=da585880-e027-41bb-b3cc-b64bfa08a818 response_body="{\"message\":\"merge in CommitManager: apply ns=<<s3://lakefs-data/aif-xxxx-xxxx%3Cs3://lakefs-data/xxx%3E%7Cs3://lakefs-data/aif-xxxx-xxxx<s3://lakefs-data/xxx>>> id=3fe239d80f74e" response_headers="map[Content-Type:[application/json] X-Request-Id:[da585880-e027-41bb-b3cc-b64bfa08a818]]" sent_bytes=0 service_name=rest_api status_code=500 took=30.008900206s
g
Hey @Thomas Vander Wal, seems like the issue is the connection between the client and lakeFS. Can you please try running the merge command using lakectl directly on your lakeFS machine?
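If lakeFS runs in Kubernetes, a minimal sketch of what that could look like (pod name, repository, branch, and config path are placeholders):
```
# Hypothetical example: exec into the lakeFS pod and run the merge locally,
# bypassing any load balancer or ingress sitting in front of lakeFS.
kubectl exec -it <lakefs-pod> -- \
  lakectl merge lakefs://<repo>/<source-branch> lakefs://<repo>/main -c <path-to-.lakectl.yaml>
```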
t
I got the same error doing the merge from the UI
i
Hey @Thomas Vander Wal By “timeout between lakeFS and AWS” do you mean lakeFS & S3? If so, we would be surprised if that’s the case. During a commit/merge lakeFS uploads only metadata files called ranges and metaranges. Their size hardly ever exceeds 20MB, so hitting a 30-second timeout there is unlikely. I also don’t see anything in the lakeFS configuration that suggests requests are cancelled after 30 seconds. Is it possible that somewhere along the way (VPC rules, load balancers, etc.) there’s a component with that limit? The reason @Guy Hardonag asked you to check using lakectl on the running lakeFS machine is exactly that - bypassing everything and checking the merge directly against lakeFS.
t
Ok I’ll post the logs in a second, but I got the same 504 timeout. Other merges with fewer files seem to work fine though, fwiw. What makes me think it’s between the app and S3 is that the request seems to be going out to the backing bucket that I’m using.
Command:
```
/home/lakefs $ lakectl merge lakefs://aif-xxx-xxx/xxx-xxx-processing-a6c0f lakefs://aif-xxx-xxx/main -c /tmp/.lakectl.yaml --log-level trace
DEBU[0000]/build/cmd/lakectl/cmd/root.go:67 github.com/treeverse/lakefs/cmd/lakectl/cmd.glob..func66() loaded configuration from file fields.file=/tmp/.lakectl.yaml file=/tmp/.lakectl.yaml
Source: lakefs://aif-xxx-xxx/xxx-xxx-processing-a6c0f
Destination: lakefs://aif-xxx-xxx/main
504 Gateway Timeout
```
Logs from Pod:
time="2021-11-10T181939Z" level=trace msg="HTTP call ended" func=pkg/httputil.TracingMiddleware.func1.1 file="build/pkg/httputil/tracing.go:149" host=lakefs.ai.us.lmco.com method=POST path=/api/v1/repositories/aif-xxx-xxx/refs/xxx-xxx-processing-a6c0f/merge/main request_body="[123 125]" request_id=fc0348ba-0c85-4674-921f-253dd6839b66 response_body="{\"message\":\"merge in CommitManager: apply ns=s3://lakefs-data/aif-xxx-xxx id=3fe239d80f74e" response_headers="map[Content-Type:[application/json] X-Request-Id:[fc0348ba-0c85-4674-921f-253dd6839b66]]" sent_bytes=0 service_name=rest_api stat us_code=500 took=29.992150768s ```
Successful merge on the same cluster, just a smaller repo:
```
/home/lakefs $ lakectl merge lakefs://proxies-test/test lakefs://proxies-test/main -c /tmp/.lakectl.yaml
Source: lakefs://proxies-test/test
Destination: lakefs://proxies-test/main
Merged "test" into "main" to get "f29f5da746ae0da5248c316ff8963cdf3030f8618b160c11ae2ae9854b4b8699".
Added: 1
Changed: 0
Removed: 0
```
i
Thanks Thomas for checking. Looking through the code, I didn’t find anything obvious. We’ll continue to investigate that and track the progress under this issue.
t
Great, thanks! If we find anything else worth reporting we’ll update it there.
i
@Thomas Vander Wal just making sure - did you run the `lakectl merge` command from the same pod that lakeFS is running on?
t
@Itai Admi yes that’s correct
i
And the endpoint in the `.lakectl.yaml` is `127.0.0.1`? We want to avoid the LB routing.
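For reference, a minimal `.lakectl.yaml` pointing straight at the local lakeFS API rather than the load balancer might look roughly like this (credentials are placeholders; port 8000 assumes the default lakeFS listen address):
```
credentials:
  access_key_id: <ACCESS_KEY_ID>
  secret_access_key: <SECRET_ACCESS_KEY>
server:
  endpoint_url: http://127.0.0.1:8000
```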
t
My apologies, it was not! Switching it to avoid the load balancer did complete the merge.
g
Thanks @Thomas Vander Wal, happy to hear it worked! We are leaving the issue open and checking what caused the merge to take more than 30 seconds. We will keep you updated!
t
Ok, I added some comments to the ticket. I need to leave for the weekend, but I’m almost certain it’s a default timeout for OpenShift routes that I’ll need to configure.
I can test that out and report back if it was indeed the cause.
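For anyone hitting the same thing: OpenShift routes default to a 30-second server-side timeout, which lines up with the ~30s failures above. A hedged sketch of raising it via the per-route annotation (the route and service names and the 5m value are just examples):
```
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: lakefs
  annotations:
    # Raise the HAProxy route timeout above the 30s default
    haproxy.router.openshift.io/timeout: 5m
spec:
  to:
    kind: Service
    name: lakefs
  port:
    targetPort: http
```
The same annotation can also be applied to an existing route with `oc annotate route lakefs --overwrite haproxy.router.openshift.io/timeout=5m`.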
g
Thanks, that would be great. Enjoy your weekend :)