# help
y
Hi all, i was getting
Connection broken: IncompleteRead
when trying to download a file using the lakeFS Python SDK. I also get a similar error for the same file
download rks_updated_date=2023-09-24_part_029.trig failed: could not write file '####/works/rks_updated_date=2023-09-24_part_029.trig': stream error: stream ID 245; INTERNAL_ERROR; received from peer
when using lakectl
a
Hi @Yaphet Kebede, I'm sorry, but that could be any number of things. I would ideally need you to attach logs from the lakeFS server, or find some other way to add information. Thanks!
y
lakefs seems ok
│
│ For support or any other question,                            >(._.)<
│     join our Slack channel <https://docs.lakefs.io/slack>         (  )_
│

Version 1.18.0

time="2024-04-12T13:48:14Z" level=info msg="initialized S3 Gateway handler" func=pkg/gateway.NewHandler file="build/pkg/gateway/handler.go:128" s3_bare_domain="[s3.local.lakefs.io]" s3_region=us-east-1
time="2024-04-12T13:48:14Z" level=info msg="starting HTTP server" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:328" listen_address="0.0.0.0:8000"
This is the last log since the server started, and I got the attached image log from the ingress.
I also tried going straight to the lakeFS server, avoiding any ingress, and it doesn't act much differently. It also doesn't log anything more.
This is lakeFS v1.18.0.
a
I don't understand: did lakeFS restart, or is the problem that the client didn't reach lakeFS?
y
No, lakeFS didn't restart, and my client is able to reach it. It's just on this file that I get errors on the client side, but the lakeFS server is silent ...
INFO:avalon:Starting downloads
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection broken: IncompleteRead(2653159424 bytes read, 178329620 more expected)', IncompleteRead(2653159424 bytes read, 178329620 more expected))': /api/v1/repositories/sem-open-alex-kg/refs/main/objects?path=works/rks_updated_date%3D2023-09-24_part_029.trig
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection broken: IncompleteRead(2653159424 bytes read, 178329620 more expected)', IncompleteRead(2653159424 bytes read, 178329620 more expected))': /api/v1/repositories/sem-open-alex-kg/refs/main/objects?path=works/rks_updated_date%3D2023-09-24_part_029.trig
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection broken: IncompleteRead(2653159424 bytes read, 178329620 more expected)', IncompleteRead(2653159424 bytes read, 178329620 more expected))': /api/v1/repositories/sem-open-alex-kg/refs/main/objects?path=works/rks_updated_date%3D2023-09-24_part_029.trig
That's the complete error from the Python client.
🤔 1
a
This is very strange to me, so sorry for any silly questions. Your client-side logs show that the download failed 3 times, each time after the exact same number of bytes. It's also a fairly round number: 0x9e240000 bytes downloaded. lakeFS does almost nothing on the data path. Timeouts cannot come into effect at a fixed number of bytes. Do you have space on the local disk?
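The "round number" observation can be checked with a few lines of arithmetic. This is only a sketch; the two byte counts are copied from the `IncompleteRead` warnings in the client log, and the variable names are made up for illustration.

```python
# Byte counts copied from the urllib3 IncompleteRead warnings in the client log.
bytes_read = 2653159424
bytes_remaining = 178329620

# Expected full size of the object, according to the client.
total_size = bytes_read + bytes_remaining

# The failure offset in hex: the low 16 bits are all zero.
print(hex(bytes_read))               # 0x9e240000

# So the offset is an exact multiple of 64 KiB.
print(bytes_read % (64 * 1024))      # 0

print(total_size)                    # 2831489044
print(round(bytes_read / 2**30, 2))  # 2.47 (GiB transferred before the failure)
```

All three retries stopped at the same offset, about 2.47 GiB into a file of roughly 2.64 GiB.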
y
Thank you for helping me on this, and no questions are ever silly to me 🙂. I was wondering about that too, so I tried using rclone to grab the same file
time="2024-04-12T15:33:28Z" level=error msg="could not write response body for object" func="pkg/gateway/operations.(*GetObject).Handle" file="build/pkg/gateway/operations/getobject.go:133" error="write tcp 10.11.242.96:8000->10.11.93.108:57778: write: connection reset by peer" host=frink-lakefs.apps.renci.org matched_host=false method=GET operation_id=get_object path="works/rks_updated_date=2023-09-24_part_029.trig" ref=main repository=sem-open-alex-kg request_id=d5edc854-93bf-4bd3-88d7-a4a55f0e160a service_name=s3_gateway user=admin
time="2024-04-12T15:33:28Z" level=error msg="could not write response body for object" func="pkg/gateway/operations.(*GetObject).Handle" file="build/pkg/gateway/operations/getobject.go:133" error="write tcp 10.11.242.96:8000->10.11.93.108:57794: write: connection reset by peer" host=frink-lakefs.apps.renci.org matched_host=false method=GET operation_id=get_object path="works/rks_updated_date=2023-09-24_part_029.trig" ref=main repository=sem-open-alex-kg request_id=aa433d9c-4d16-4fe6-b705-d417caa8e340 service_name=s3_gateway user=admin
time="2024-04-12T15:33:28Z" level=error msg="could not write response body for object" func="pkg/gateway/operations.(*GetObject).Handle" file="build/pkg/gateway/operations/getobject.go:133" error="write tcp 10.11.242.96:8000->10.11.93.108:57804: write: connection reset by peer" host=frink-lakefs.apps.renci.org matched_host=false method=GET operation_id=get_object path="works/rks_updated_date=2023-09-24_part_029.trig" ref=main repository=sem-open-alex-kg request_id=6d060386-9bea-42c0-8f0c-68435d58d2ea service_name=s3_gateway user=admin
time="2024-04-12T15:36:31Z" level=error msg="Removing file failed" func="pkg/pyramid.(*TierFS).removeFromLocalInternal" file="build/pkg/pyramid/tier_fs.go:129" error="remove /home/lakefs/lakefs/data/cache/meta-range/dream-kg/218ec452c7b70b2fc5490160315993c6fb0b239580caf9a2584ec7ad84431163: no such file or directory" module=pyramid path=/home/lakefs/lakefs/data/cache/meta-range/dream-kg/218ec452c7b70b2fc5490160315993c6fb0b239580caf9a2584ec7ad84431163
time="2024-04-12T15:36:31Z" level=error msg="Removing file failed" func="pkg/pyramid.(*TierFS).removeFromLocalInternal" file="build/pkg/pyramid/tier_fs.go:129" error="remove /home/lakefs/lakefs/data/cache/range/dream-kg/54d1782873408bca6337f4ce19feb45e1932be67e8fa4c0f9c2ff0f05bb2b4dd: no such file or directory" module=pyramid path=/home/lakefs/lakefs/data/cache/range/dream-kg/54d1782873408bca6337f4ce19feb45e1932be67e8fa4c0f9c2ff0f05bb2b4dd
time="2024-04-12T15:36:44Z" level=error msg="Removing file failed" func="pkg/pyramid.(*TierFS).removeFromLocalInternal" file="build/pkg/pyramid/tier_fs.go:129" error="remove /home/lakefs/lakefs/data/cache/range/soc-kg/0c0b018e8abb300a513ac9b82641341db123576230316ba308ff8cbe03a18c87: no such file or directory" module=pyramid path=/home/lakefs/lakefs/data/cache/range/soc-kg/0c0b018e8abb300a513ac9b82641341db123576230316ba308ff8cbe03a18c87
time="2024-04-12T15:37:29Z" level=error msg="could not write response body for object" func="pkg/gateway/operations.(*GetObject).Handle" file="build/pkg/gateway/operations/getobject.go:133" error="write tcp 10.11.242.96:8000->10.11.93.108:50244: write: connection reset by peer" host=frink-lakefs.apps.renci.org matched_host=false method=GET operation_id=get_object path="works/rks_updated_date=2023-09-24_part_029.trig" ref=main repository=sem-open-alex-kg request_id=a1b1876a-5a9a-4a7c-bba6-be817b98ffab service_name=s3_gateway user=admin
I see there is interaction with /home/lakefs/lakefs/data/cache/..., which is an ephemeral path that would not have a lot of space on it ... could that be related? Maybe I should mount that path on bigger storage?
(btw the above log is from the lakeFS server; it seems to log when using the S3 interface)
a
The cache error messages seem fine, they're minutes away from the download failures, and anyway innocuous - usually caused by a race to delete things from the cache. But "connection reset by peer" means lakeFS didn't close the connection. Either your connection is timing out, or some proxy doesn't like it, or you have an issue on the client machine. Can you try: 1. Downloading from another machine? 2. Measuring how long the download takes before the error message?
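One way to take the second measurement is to wrap whatever chunk stream the client exposes and record how many bytes arrived and how much time passed before it broke. This is a minimal sketch using only the standard library. `time_until_failure` and `flaky_stream` are hypothetical names, and the fake stream stands in for a real download.

```python
import time
from typing import Iterable, Optional, Tuple


def time_until_failure(chunks: Iterable[bytes]) -> Tuple[int, float, Optional[Exception]]:
    """Consume a stream of byte chunks.

    Returns (bytes received, seconds elapsed, the error that ended the
    stream, or None if the stream finished cleanly).
    """
    received = 0
    error: Optional[Exception] = None
    start = time.monotonic()
    try:
        for chunk in chunks:
            received += len(chunk)
    except Exception as exc:  # record any failure raised by the stream
        error = exc
    return received, time.monotonic() - start, error


def flaky_stream():
    # Hypothetical stand-in for a real download: two chunks, then a reset.
    yield b"x" * 1024
    yield b"x" * 1024
    raise ConnectionError("connection reset by peer")


received, elapsed, error = time_until_failure(flaky_stream())
print(f"{received} bytes in {elapsed:.3f}s, error: {error!r}")
```

With a real download, the same function could be fed any iterator of byte chunks, for example `response.iter_content(chunk_size=1 << 20)` from a `requests` call made with `stream=True`. If the elapsed time is the same on every attempt, that points to a timeout. If only the byte count repeats, a size limit is more likely.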
y
OK, I will do that and get back to you. FWIW this was reported by a colleague of mine, and I was able to reproduce it for the same file. I can time exactly how long it takes before the disconnection.
👍 1
a
Do you have a required proxy?
y
No, I used a port-forward to connect to the server, which is running in a k8s env.
😕 1