after testing locally, I'm moving to production, u...
# help
g
after testing locally, I'm moving to production, unfortunately it seems the webhook connection is getting refused
Error: Post "http://127.0.0.1:8081/webhooks/format": dial tcp 127.0.0.1:8081: connect: connection refused
I know this might not be related much to LakeFS itself, but I'm out of options
tried
sudo ss -tulpn
and seems fine
tcp LISTEN 0 4096 [::ffff:127.0.0.1]:8081 *:* users:(("java",pid=372597,fd=109))
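The refused connection can be reproduced outside lakeFS with a plain TCP dial. Below is a minimal Python sketch; the helper name `can_connect` is hypothetical and not part of lakeFS. It only reports whether something accepts connections on host:port from wherever the script runs, so it has to run in the same place as the lakeFS process. If that process is inside a container, 127.0.0.1 there usually refers to the container's own loopback rather than the host's, which `ss` on the host would not reveal.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # Same kind of TCP dial that fails with "connection refused" when
        # nothing is listening on host:port in the caller's network namespace.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and timeouts are both OSError subclasses.
        return False
```

For example, `can_connect("127.0.0.1", 8081)` run next to the lakeFS process should return `False` for as long as the error above persists.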
i
Hey, can you explain what you are trying to achieve? Maybe I can help, but I'm missing context when looking at this thread.
g
Sure, sorry. So, I have this webhook under
_lakefs_actions
name: Dataset
description: This webhook ensures that a DATASET.json should be present
on:
  pre-commit:
    branches:
      - "*"
hooks:
  - id: dataset_validator
    type: webhook
    description: Validate DATASET.json
    properties:
      url: "http://127.0.0.1:8082/webhooks/format"
and then I have a Netty server running on localhost with Ktor, which is listening on the given address
> 2024-03-05 14:05:37.410 [main] INFO ktor.application - Autoreload is disabled because the development mode is off.
> 2024-03-05 14:05:37.977 [main] INFO ktor.application - Application started in 0.626 seconds.
> 2024-03-05 14:05:38.190 [DefaultDispatcher-worker-1] INFO ktor.application - Responding at http://127.0.0.1:8082
this setup worked fine on my local machine (port 8080), but on the remote production server it doesn't (port != 8080 because of MinIO)
Also, my local machine runs Ubuntu 23.10, the remote Debian 12
on pre-commit, lakeFS should contact the server on that address, but it fails, as I can see when I go into "Actions" for more details
i
Got it, thanks. It's some sort of network-level issue, related to either an incorrect host or an incorrect port in the production system. I see that the port in the error message is
8081
but the action file is targeting port
8082
- so make sure the ports match. Regarding the host, maybe it's
0.0.0.0
or something else, depending on how the network interfaces are defined in your production environment. Or maybe there's no TCP/HTTP connectivity at all. I would test the following: get onto the same server/container that the lakeFS server is running on (e.g. run
ps aux | grep -i lakefs
to see the process of the binary and make sure you share the same network host). Then run
curl <target address>
and see what happens.
g
yeah, sorry, after trying 8081, I tried 8082, same result, (both are in sync)
👍 1
curl 127.0.0.1:8082
returns nothing
without the port
barbie15@lakefs:~$ curl 127.0.0.1
curl: (7) Failed to connect to 127.0.0.1 port 80 after 0 ms: Couldn't connect to server
o
is lakeFS and/or the webhook server running inside a container?
g
asking..
still podman container, sorry
thanks for helping 😛
and sorry for the noise 😕
i
1. Re curl: what is the response when you use
-v
?
curl -v http://127.0.0.1:8002
curl -v http://127.0.0.1:8082/webhooks/format
2. what do you get when running:
telnet 127.0.0.1 8002
g
sorry, now I know what you meant, yes, I can see from the logs that the server actually reacted
2024-03-05 14:26:32.830 [eventLoopGroupProxy-4-1] TRACE io.ktor.routing.Routing - Trace for []
/, segment:0 -> SUCCESS @ /
  /webhooks, segment:0 -> FAILURE "Selector didn't match" @ /webhooks
Matched routes:
  No results
Route resolve result:
  FAILURE "No matched subtrees found" @ /
2024-03-05 14:26:32.855 [eventLoopGroupProxy-4-1] TRACE i.k.s.p.c.ContentNegotiation - Skipping response body transformation from HttpStatusCode to OutgoingContent for the GET / request because the HttpStatusCode type is ignored. See [ContentNegotiationConfig::ignoreType].
2024-03-05 14:26:42.919 [eventLoopGroupProxy-4-2] TRACE io.ktor.routing.Routing - Trace for []
/, segment:0 -> SUCCESS @ /
  /webhooks, segment:0 -> FAILURE "Selector didn't match" @ /webhooks
Matched routes:
  No results
Route resolve result:
  FAILURE "No matched subtrees found" @ /
2024-03-05 14:26:42.920 [eventLoopGroupProxy-4-2] TRACE i.k.s.p.c.ContentNegotiation - Skipping response body transformation from HttpStatusCode to OutgoingContent for the GET / request because the HttpStatusCode type is ignored. See [ContentNegotiationConfig::ignoreType].
2024-03-05 15:39:22.833 [eventLoopGroupProxy-4-3] TRACE io.ktor.routing.Routing - Trace for [webhooks, format]
/, segment:0 -> SUCCESS @ /
  /webhooks, segment:1 -> SUCCESS @ /webhooks
    /webhooks/format, segment:2 -> SUCCESS @ /webhooks/format
      /webhooks/format/(method:POST), segment:2 -> FAILURE "Selector didn't match" @ /webhooks/format/(method:POST)
Matched routes:
  No results
Route resolve result:
  FAILURE "No matched subtrees found" @ /
2024-03-05 15:39:22.834 [eventLoopGroupProxy-4-3] TRACE i.k.s.p.c.ContentNegotiation - Skipping response body transformation from HttpStatusCode to OutgoingContent for the GET /webhooks/format request because the HttpStatusCode type is ignored. See [ContentNegotiationConfig::ignoreType].
barbie15@lakefs:~$ curl -v http://127.0.0.1:8082/webhooks/format
*   Trying 127.0.0.1:8082...
* Connected to 127.0.0.1 (127.0.0.1) port 8082 (#0)
> GET /webhooks/format HTTP/1.1
> Host: 127.0.0.1:8082
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 405 Method Not Allowed
< Content-Length: 0
< 
* Connection #0 to host 127.0.0.1 left intact
i
Sorry, that curl is a GET, hence the 405 error ^. What happens when you run the same with
-X POST
which is what the lakeFS webhook action does?
g
barbie15@lakefs:~$ curl -v -X POST http://127.0.0.1:8082/webhooks/format
*   Trying 127.0.0.1:8082...
* Connected to 127.0.0.1 (127.0.0.1) port 8082 (#0)
> POST /webhooks/format HTTP/1.1
> Host: 127.0.0.1:8082
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 415 Unsupported Media Type
< Content-Length: 0
< 
* Connection #0 to host 127.0.0.1 left intact
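The 405 and 415 above come from probing the route with requests that differ from what lakeFS sends: the lakeFS docs describe the webhook call as a POST with a JSON body of event information. The following Python sketch shows the difference. `StandInHook`, `start_stand_in` and `send` are hypothetical names, the handler is a stand-in and not the Ktor server from this thread, and both the `Content-Type: application/json` header and the 204 success status are assumptions for illustration.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StandInHook(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a POST-only JSON route at /webhooks/format."""

    def _reply(self, status: int) -> None:
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Only POST is routed, so GET is rejected (compare the 405 above).
        self._reply(405)

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # drain body
        if self.path != "/webhooks/format":
            self._reply(404)
            return
        is_json = self.headers.get("Content-Type", "").startswith("application/json")
        # Without a JSON content type the request is rejected (compare the 415).
        self._reply(204 if is_json else 415)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_stand_in() -> HTTPServer:
    """Start the stand-in on an ephemeral loopback port in a daemon thread."""
    server = HTTPServer(("127.0.0.1", 0), StandInHook)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send(method: str, url: str, payload=None) -> int:
    """Send a request, as JSON when payload is given; return the HTTP status."""
    data, headers = None, {}
    if payload is not None:
        data = json.dumps(payload).encode()
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Against the stand-in, a GET gets 405, a bare POST gets 415, and a POST with a JSON payload gets a 2xx status. If the real route behaves the same way, the 415 from `curl -X POST` says little about what lakeFS itself would receive.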
i
As you can see, the server you are using returns an error for that request.
Connection #0 to host 127.0.0.1 left intact
But what's strange is that in your examples you are now using port
8082
but in the action file you supplied it's
8002
, could it be related?
g
> but in the action file you supplied its
8002
is it? I see the right
8082
i
is it? I see the right
8082
my bad 🙂
g
np
ah, I copy pasted your suggestion, that's why
are you still interested in the
telnet
output, though?
i
nope, it seems that there is something listening on this address
👍 1
So why is it returning HTTP 415? How would lakeFS succeed if the curl failed?
When running
curl
are you able to see that request in the webhook server logs? If yes, when running the action in lakeFS, do you see it in the server logs?
BTW, somewhat related: lakeFS treats a webhook response as an error if statusCode < 200 || statusCode >= 300.
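The status rule quoted above can be written out as a small check. This is a Python sketch of the stated condition only; `is_hook_success` is a hypothetical name and not lakeFS code.

```python
def is_hook_success(status_code: int) -> bool:
    """True only for 2xx, i.e. not (statusCode < 200 || statusCode >= 300)."""
    return not (status_code < 200 or status_code >= 300)
```

Under this rule both the 405 and the 415 seen with curl would count as failures, while any 2xx response passes.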
g
How would lakeFS succeed if the curl failed?
No idea, but now I'm trying to stop the container running on a dedicated user
When running
curl
are you able to see that request in the webhook server logs?
yes, those that I pasted above
i
So to recap: even though lakeFS would treat the status code 415 you got in the curl response as an error, I don't think it's related to the original error message you shared. The message
Connection Refused
has two main causes:
1. Nothing is listening on the IP:port you are trying to connect to.
2. The port is blocked by a firewall.
From inspecting the ps aux gist you shared, I noticed that there are PIDs running lakeFS in a container
825, 783
while there's also a binary on the host (PID
1183
). That makes me wonder if you are running more than one lakeFS instance, and the lakeFS that returns the connection refused error is not the one you think it is.
• I would verify that it's the same lakeFS that's responding by directly inspecting the stdout/stderr logs of the lakeFS process, either with
tail
or other available tools. Then trigger an action and watch the logs.
• If it's inside a container, that really depends on your network setup in production.
• I would also look into the
ifconfig
settings, inspecting the network interfaces defined on the host of the lakeFS process.
• Finally, try changing the host address to something else that is available on the host, based on the network interfaces, e.g. it could be
0.0.0.0
or some internal VPC address like
10.x.y.z
- that really depends on your production environment.
👍 1
g
thanks Isan, that's a very nice catch, I'll follow your suggestions
🙏 1
so, 1183 is the bash script launching lakefs, pid 783, which spawns a child at 825
to forward the needed port, I just added a new option
#!/usr/bin/bash
podman run --replace --pull=newer\
  --name lakeFS \
  -p 8000:8000 \
  -p 8082:8082 \
however, when I start lakefs, the webserver crashes with the following
Exception in thread "main" java.net.BindException: Address already in use
if I stop lakefs, the server starts and listens as usual
as far as I understand,
-p incoming:outgoing
forwards the
incoming
port on the container to the
outgoing
port on the host
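For reference, podman documents the short form of this option as `-p hostPort:containerPort`, and publishing a port makes podman hold that port on the host. With `-p 8082:8082` in place, host port 8082 would already be taken by the time the Ktor server tries to bind it, which is consistent with the `BindException` above. A minimal Python sketch of that conflict follows; `port_in_use` is a hypothetical helper name.

```python
import errno
import socket


def port_in_use(host: str, port: int) -> bool:
    """Return True if binding host:port fails because the address is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # A second bind on an address that another socket already holds
            # fails with EADDRINUSE, which Java reports as
            # "BindException: Address already in use".
            probe.bind((host, port))
        except OSError as err:
            if err.errno == errno.EADDRINUSE:
                return True
            raise
        return False
```

Running `port_in_use("127.0.0.1", 8082)` on the host while the container is up, and again after stopping it, would show whether the published port is what blocks the webhook server.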
i
Looks like some sort of port conflict between servers in your environment (a Java error, not from lakeFS). Since I'm not familiar with your specific environment, it's hard to tell. I would recommend starting with the lakeFS chart if possible (k8s); either way, using the real addresses or domains of the server would make the setup easier to create.
g
I would recommend starting with lakeFS chart if possible (k8S)
what is that?
i
lakeFS Helm chart