# help
u
Hello, I'm struggling with using the AWS CLI to access a repository in lakeFS. I am quite certain that either I'm missing something or my setup/deployment is causing this, but I can't seem to figure it out and maybe you can help me. :)

I have deployed lakeFS in a Kubernetes cluster in GCP, configured the storage, created a repo, and uploaded a file. So far so good. The part where I've complicated things a bit is the networking in Kubernetes. I have registered two DNS names, let's call them lakefs.example.com and lakefs.portal.example.com. If you access lakefs.portal.example.com it authenticates you with GCP and then forwards you to the portal; it also upgrades the connection to HTTPS. Works well. If you go to lakefs.example.com it drops the connection if the URL doesn't contain "/api", since all portal access should go through the other endpoint. It also upgrades the connection to HTTPS.

I've configured the lakectl command and set the endpoint_url to https://lakefs.example.com/api/v1, and the command works. I can run

lakectl fs ls lakefs://testrepo/main/

and get the correct listing. Then I tried to use the AWS CLI. I created a profile called lakefs with the key and secret, and then I called it like this:

aws --profile lakefs --endpoint-url https://lakefs.example.com/ s3 ls s3://testrepo/main/

And I get the following error:

An error occurred (404) when calling the ListObjectsV2 operation: Not found

I figured it was the wrong endpoint, so I changed it to https://lakefs.example.com/api/v1 and then I got this error:

An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist

So my questions are:
• First of all, "should" this work? :)
• Second, what should the endpoint_url be?
◦ When I look at the architecture diagram I guess the S3 requests go to the S3 Gateway?
◦ And my setup right now is dropping those requests?

Please let me know if you have any suggestions or need any info. :)
//Mattias
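For reference, this is roughly what my client config looks like (keys and values below are placeholders, not my real setup):

# ~/.lakectl.yaml (excerpt)
server:
  endpoint_url: https://lakefs.example.com/api/v1
credentials:
  access_key_id: AKIA...
  secret_access_key: "..."

# ~/.aws/credentials (excerpt)
[lakefs]
aws_access_key_id = AKIA...
aws_secret_access_key = ...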
u
Hey @mwikstrom, thanks for the detailed question. First of all, yes, this should work, and the endpoint_url should be https://lakefs.example.com/
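That is, once requests without /api are no longer dropped, your original command should work as-is:

aws --profile lakefs --endpoint-url https://lakefs.example.com/ s3 ls s3://testrepo/main/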
u
One thing I don't understand is why lakefs.example.com drops the connection when the URL doesn't contain /api
u
lakeFS listens and serves:
• OpenAPI - under the /api route
• S3 gateway - identified by S3 headers
• lakeFS UI - if none of the above

By blocking requests without /api you are blocking all S3 gateway requests as well.
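You can see the dispatch from the outside. A rough illustration (assuming lakeFS is reachable on localhost:8000; the second request will fail signature validation, but the error response comes from the S3 gateway, which shows the routing):

# No /api prefix and no S3 signature headers -> the lakeFS UI answers (HTML)
curl -s http://localhost:8000/

# Same host and path style, but with a SigV4-style Authorization header
# -> the request is handled by the S3 gateway instead
curl -s -H "Authorization: AWS4-HMAC-SHA256 Credential=..." http://localhost:8000/testrepo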
u
You could use an additional DNS record for the S3 gateway (e.g. s3.lakefs.example.com and *.s3.lakefs.example.com): https://docs.lakefs.io/setup/virtual-host-addressing.html
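A minimal sketch of the corresponding server config, assuming the gateways.s3.domain_name key described on that docs page (check it for the authoritative reference):

# lakeFS config (excerpt) - tells the S3 gateway which host name
# it owns, so virtual-host-style bucket addressing can work
gateways:
  s3:
    domain_name: s3.lakefs.example.com

With that in place, the AWS CLI would point its --endpoint-url at https://s3.lakefs.example.com instead.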
u
Thank you for your answer. :) Like I said, I had the suspicion that the way I had it set up was my problem. 🙂 I hadn't thought about the S3 requests; I had just tested the lakectl CLI, and as I understand it that talks to the API. Dropping requests not going to /api was a way of "forcing" portal access through the other route (lakefs.portal.example.com), because I wanted to add the extra authentication for the UI. But you confirmed my thinking. Even though it's stated in the documentation, sometimes it's hard to get the full picture just by reading it. So thank you for that. 🙂 I'll see if/how I can modify the setup and get it working. 🙂
u
Happy to help. It sure is hard to get the full picture, especially with advanced topics like S3 virtual-host addressing. Feel free to ask any further questions here. 🙂
u
I'll look into the S3 virtual-host addressing. But just out of curiosity, which "S3 headers" does it use to identify the request? I just wanted to try filtering on that to see if it works. 🙂
u
Hi @mwikstrom, we verify that the request is signed with AWS Signature v4 or v2. There are a couple of steps to the verification, e.g. an X-Amz-Signature header, or an Authorization header whose value is prefixed with AWS4 (v4) or AWS (v2). There is also an option to pass the information (X-Amz-Signature) as a query parameter. lakeFS verifies the above before passing the request on for full processing by the S3 gateway.
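If you want to route on those signals instead of dropping everything outside /api, here is an illustrative nginx-style sketch (not a tested config; lakefs:8000 is a placeholder upstream, and your ingress may express this differently):

# Keep the existing API route as before
location /api/ {
    proxy_pass http://lakefs:8000;
}

# Forward anything that looks like an S3-signed request to lakeFS,
# based on the same signals lakeFS itself checks, and reject the rest
location / {
    set $s3_signed 0;
    # v4/v2 Authorization header values are prefixed "AWS4" / "AWS"
    if ($http_authorization ~ "^AWS") { set $s3_signed 1; }
    # The signature can also arrive as a header ...
    if ($http_x_amz_signature) { set $s3_signed 1; }
    # ... or as a query parameter (presigned-style requests)
    if ($args ~ "X-Amz-Signature=") { set $s3_signed 1; }
    if ($s3_signed = 0) { return 404; }
    proxy_pass http://lakefs:8000;
}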
u
If you enable path-style access and set a DNS record with the same address to access the lakeFS S3 gateway as suggested, it will probably simplify the check, as it will just match on the host.
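On the client side that's a one-liner (a real AWS CLI setting; s3.lakefs.example.com is the hypothetical gateway record from above):

# Make this profile use path-style S3 requests
aws configure set s3.addressing_style path --profile lakefs

# Then point the CLI at the dedicated gateway host
aws --profile lakefs --endpoint-url https://s3.lakefs.example.com s3 ls s3://testrepo/main/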
u
Awesome, thanks! 🙂