# help
t
Is there an upper limit to how many files can be part of a merge? We’re hitting a timeout between lakeFS and AWS trying to merge a branch that has a few hundred thousand files. Or is there a way to configure the timeout between them? Currently it looks like it’s set to 30s. Adding and committing these changes worked without a problem.
time="2021-11-10T15:33:02Z" level=trace msg="HTTP call ended" func=pkg/httputil.TracingMiddleware.func1.1 file="build/pkg/httputil/tracing.go:149" host=xxxx<<http:///|http:///>> method=POST path=/api/v1/repositories/aif-xxxx-xxxx/refs/xxxx-xxxx-processing-a6c0f/merge/main request_body="[123 125]" request_id=da585880-e027-41bb-b3cc-b64bfa08a818 response_body="{\"message\":\"merge in CommitManager: apply ns=<<s3://lakefs-data/aif-xxxx-xxxx%3Cs3://lakefs-data/xxx%3E%7Cs3://lakefs-data/aif-xxxx-xxxx<s3://lakefs-data/xxx>>> id=3fe239d80f74e" response_headers="map[Content-Type:[application/json] X-Request-Id:[da585880-e027-41bb-b3cc-b64bfa08a818]]" sent_bytes=0 service_name=rest_api status_code=500 took=30.008900206s
g
Hey @Thomas Vander Wal, seems like the issue is the connection between the client and lakeFS. Can you please try running the merge command using lakectl directly on your lakeFS machine?
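If lakeFS runs in Kubernetes, a minimal sketch of what that could look like (pod name, repository, branch, and config path are placeholders):
```
# Hypothetical example: exec into the lakeFS pod and run the merge locally,
# bypassing any load balancer or ingress sitting in front of lakeFS.
kubectl exec -it <lakefs-pod> -- \
  lakectl merge lakefs://<repo>/<source-branch> lakefs://<repo>/main -c <path-to-.lakectl.yaml>
```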
t
I got the same error doing the merge from the UI
i
Hey @Thomas Vander Wal By “timeout between lakeFS and AWS” do you mean lakeFS & S3? If so, we would be surprised if that’s the case. During a commit/merge lakeFS uploads only metadata files called ranges and metaranges. Their size hardly ever exceeds 20MB, so hitting a 30-second timeout there is unlikely. I also don’t see anything in the lakeFS configuration that suggests requests are cancelled after 30 seconds. Is it possible that somewhere along the way (VPC rules, load balancers, etc.) there’s a component with that limit? The reason @Guy Hardonag asked you to check using lakectl on the running lakeFS machine is exactly that - bypassing everything and checking the merge directly against lakeFS.
t
Ok I’ll post the logs in a second, but I got the same 504 timeout. Other merges with fewer files seem to work fine though, fwiw. What makes me think it’s between the app and S3 is that the request seems to be going out to the backing bucket that I’m using.
Command:
```
/home/lakefs $ lakectl merge lakefs://aif-xxx-xxx/xxx-xxx-processing-a6c0f lakefs://aif-xxx-xxx/main -c /tmp/.lakectl.yaml --log-level trace
DEBU[0000]/build/cmd/lakectl/cmd/root.go:67 github.com/treeverse/lakefs/cmd/lakectl/cmd.glob..func66() loaded configuration from file fields.file=/tmp/.lakectl.yaml file=/tmp/.lakectl.yaml
Source: lakefs://aif-xxx-xxx/xxx-xxx-processing-a6c0f
Destination: lakefs://aif-xxx-xxx/main
504 Gateway Timeout
```
Logs from Pod:
time="2021-11-10T181939Z" level=trace msg="HTTP call ended" func=pkg/httputil.TracingMiddleware.func1.1 file="build/pkg/httputil/tracing.go:149" host=lakefs.ai.us.lmco.com method=POST path=/api/v1/repositories/aif-xxx-xxx/refs/xxx-xxx-processing-a6c0f/merge/main request_body="[123 125]" request_id=fc0348ba-0c85-4674-921f-253dd6839b66 response_body="{\"message\":\"merge in CommitManager: apply ns=s3://lakefs-data/aif-xxx-xxx id=3fe239d80f74e" response_headers="map[Content-Type:[application/json] X-Request-Id:[fc0348ba-0c85-4674-921f-253dd6839b66]]" sent_bytes=0 service_name=rest_api stat us_code=500 took=29.992150768s ```
Successful merge on the same cluster, just a smaller repo:
```
/home/lakefs $ lakectl merge lakefs://proxies-test/test lakefs://proxies-test/main -c /tmp/.lakectl.yaml
Source: lakefs://proxies-test/test
Destination: lakefs://proxies-test/main
Merged "test" into "main" to get "f29f5da746ae0da5248c316ff8963cdf3030f8618b160c11ae2ae9854b4b8699".
Added: 1
Changed: 0
Removed: 0
```
i
Thanks Thomas for checking. Looking through the code, I didn’t find anything obvious. We’ll continue to investigate that and track the progress under this issue.
t
Great, thanks! If we find anything else worth reporting we’ll update it there.
i
@Thomas Vander Wal just making sure - did you run the `lakectl merge` command from the same pod that lakeFS is running on?
t
@Itai Admi yes that’s correct
i
And the endpoint in the `.lakectl.yaml` is `127.0.0.1`? We want to avoid the LB routing.
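For reference, a minimal `.lakectl.yaml` pointing straight at the local lakeFS API rather than the load balancer might look roughly like this (credentials are placeholders; port 8000 assumes the default lakeFS listen address):
```
credentials:
  access_key_id: <ACCESS_KEY_ID>
  secret_access_key: <SECRET_ACCESS_KEY>
server:
  endpoint_url: http://127.0.0.1:8000
```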
t
My apologies, it was not! Switching it to avoid the load balancer did complete the merge.
g
Thanks @Thomas Vander Wal, happy to hear it worked! We are leaving the issue open and checking what caused the merge to take more than 30 seconds. We will keep you updated!
t
Ok, I added some comments to the ticket. I need to leave for the weekend, but I’m almost certain it’s a default timeout for OpenShift routes that I’ll need to configure.
I can test that out and report back if it was indeed the cause.
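For anyone hitting the same thing: OpenShift routes default to a 30-second server-side timeout, which lines up with the ~30s failures above. A hedged sketch of raising it via the per-route annotation (the route and service names and the 5m value are just examples):
```
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: lakefs
  annotations:
    # Raise the HAProxy route timeout above the 30s default
    haproxy.router.openshift.io/timeout: 5m
spec:
  to:
    kind: Service
    name: lakefs
  port:
    targetPort: http
```
The same annotation can also be applied to an existing route with `oc annotate route lakefs --overwrite haproxy.router.openshift.io/timeout=5m`.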
g
Thanks, that would be great. Enjoy your weekend :)