# help
r
I'm trying to use the S3 gateway from R, but not having much luck.

- It connects without error, but returns nothing when listing buckets or the objects within a specified bucket.
- The same lakeFS server works fine with boto and the AWS CLI doing the same thing.
- The lakeFS server log for the HTTP interaction is the same for R (doesn't work) and Python (does work).
- The same R code lists buckets and objects fine against MinIO.
- R library and S3 HTTP code: see below.

Any pointers on what to check next? Thanks. Full details in the thread.
## AWS CLI
```shell
aws s3api list-buckets --endpoint-url http://127.0.0.1:8000
```
Output:
```json
{
    "Buckets": [
        {
            "Name": "example",
            "CreationDate": "2023-03-07T14:08:35.111000+00:00"
        }
    ],
    "Owner": {
        "DisplayName": "",
        "ID": ""
    }
}
```
## List buckets from boto
```python
import boto3

AWS_ACCESS_KEY_ID = 'AKIAJNSOLJH5YUW4KLBQ'
AWS_SECRET_ACCESS_KEY = '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'

# lakeFS S3 gateway endpoint
endpoint = 'http://localhost:8000'

session = boto3.Session(
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)
s3 = session.resource('s3', endpoint_url=endpoint)

# List all buckets (lakeFS repositories)
print(list(s3.buckets.all()))
```
Output:
```
[s3.Bucket(name='test-repo')]
```
Server log:
```json
{
    "action": "list_repos",
    "file": "build/pkg/gateway/middleware.go:124",
    "func": "pkg/gateway.EnrichWithOperation.func1.1",
    "level": "debug",
    "message_type": "action",
    "msg": "performing S3 action",
    "ref": "",
    "repository": "",
    "time": "2023-03-07T13:47:17Z",
    "user_id": "admin"
}
{
    "client": "Boto3/1.26.78 Python/3.11.2 Darwin/22.2.0 Botocore/1.29.78 Resource",
    "file": "usr/local/go/src/net/http/server.go:2109",
    "func": "net/http.HandlerFunc.ServeHTTP",
    "host": "localhost:8000",
    "level": "debug",
    "log_audit": true,
    "method": "GET",
    "msg": "HTTP call ended",
    "operation_id": "list_buckets",
    "path": "/",
    "request_id": "81fb3769-dd20-468d-a889-a4cd8d43b7c2",
    "sent_bytes": 250,
    "service_name": "s3_gateway",
    "source_ip": "172.17.0.1:58854",
    "status_code": 200,
    "time": "2023-03-07T13:47:17Z",
    "took": 644584,
    "user": "admin"
}
```
## List buckets from R
```r
# Install packages (only needed once)
install.packages(c("aws.s3"))

# Load necessary libraries
library(aws.s3)
library(data.table)
library(ggplot2)

# Define AWS S3 credentials and the lakeFS S3 gateway endpoint
Sys.setenv("AWS_ACCESS_KEY_ID" = "AKIAIOSFODNN7EXAMPLE",
           "AWS_SECRET_ACCESS_KEY" = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
baseurl <- "localhost:8000"
b <- "drones03"

# List buckets
bucketlist(add_region = FALSE, base_url = baseurl, verbose = FALSE, use_https = FALSE)
# List objects in bucket
get_bucket_df(bucket = b, base_url = baseurl, add_region = FALSE, verbose = FALSE, use_https = FALSE)
```
Output (note that nothing is returned by either command):
```
> # List buckets
> bucketlist(add_region = FALSE,base_url=baseurl,verbose=FALSE,use_https=FALSE)
data frame with 0 columns and 0 rows

> # List objects in bucket
> get_bucket_df(bucket=b, base_url=baseurl,add_region=FALSE,verbose=FALSE,use_https=FALSE)
[1] Key               LastModified      ETag              Size             
[5] Owner_ID          Owner_DisplayName StorageClass      Bucket           
<0 rows> (or 0-length row.names)
```
Server log:
```json
{
    "action": "list_repos",
    "file": "build/pkg/gateway/middleware.go:124",
    "func": "pkg/gateway.EnrichWithOperation.func1.1",
    "level": "debug",
    "message_type": "action",
    "msg": "performing S3 action",
    "ref": "",
    "repository": "",
    "time": "2023-03-07T13:49:28Z",
    "user_id": "admin"
}
{
    "client": "libcurl/7.85.0 r-curl/5.0.0 httr/1.4.5",
    "file": "usr/local/go/src/net/http/server.go:2109",
    "func": "net/http.HandlerFunc.ServeHTTP",
    "host": "localhost:8000",
    "level": "debug",
    "log_audit": true,
    "method": "GET",
    "msg": "HTTP call ended",
    "operation_id": "list_buckets",
    "path": "/",
    "request_id": "3c5df6e2-0a0e-49bc-9d74-c3fd3dfcccde",
    "sent_bytes": 250,
    "service_name": "s3_gateway",
    "source_ip": "172.17.0.1:52904",
    "status_code": 200,
    "time": "2023-03-07T13:49:28Z",
    "took": 307500,
    "user": "admin"
}
```
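Both `list_buckets` log entries report `status_code: 200` and `sent_bytes: 250`, which suggests lakeFS sent R the same non-empty response it sent boto, and that the difference lies in how the client handles that response. As a rough sketch, assuming the body is a standard S3 `ListAllMyBucketsResult` document, this is the parsing a client has to do to turn it into the `Bucket`/`CreationDate` rows seen in the MinIO output. The XML and the `parse_bucket_list` helper are hypothetical illustrations, not captured from the server or taken from aws.s3.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# XML namespace used by S3 ListAllMyBucketsResult responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Hypothetical response body, for illustration only (not captured from lakeFS).
SAMPLE_BODY = b"""<?xml version="1.0" encoding="UTF-8"?>
<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner><ID></ID><DisplayName></DisplayName></Owner>
  <Buckets>
    <Bucket><Name>test-repo</Name><CreationDate>2023-03-07T13:00:00Z</CreationDate></Bucket>
  </Buckets>
</ListAllMyBucketsResult>"""


def parse_bucket_list(body: bytes) -> List[Dict[str, str]]:
    """Return one {"Bucket", "CreationDate"} row per <Bucket> element (hypothetical helper)."""
    root = ET.fromstring(body)
    rows = []
    for bucket in root.findall("s3:Buckets/s3:Bucket", S3_NS):
        rows.append({
            "Bucket": bucket.findtext("s3:Name", default="", namespaces=S3_NS),
            "CreationDate": bucket.findtext("s3:CreationDate", default="", namespaces=S3_NS),
        })
    return rows


print(parse_bucket_list(SAMPLE_BODY))
```

If a client skips this parsing step (for example because it decides the response is not XML), it ends up with no rows even though the server returned a valid listing.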
## Proof that the R code works against another S3 implementation (MinIO)
```r
# aws.s3 is already loaded by the previous snippet
# Define MinIO credentials and endpoint
Sys.setenv("AWS_ACCESS_KEY_ID" = "minioadmin",
           "AWS_SECRET_ACCESS_KEY" = "minioadmin")
baseurl <- "localhost:9000"
b <- "example"

# List buckets
bucketlist(add_region = FALSE, base_url = baseurl, verbose = FALSE, use_https = FALSE)
# List objects in bucket
get_bucket_df(bucket = b, base_url = baseurl, add_region = FALSE, verbose = FALSE, use_https = FALSE)
```
Output:
```
> # List buckets
> bucketlist(add_region = FALSE,base_url=baseurl,verbose=FALSE,use_https=FALSE)
   Bucket             CreationDate
1 example 2023-03-07T14:08:18.427Z

> # List objects in bucket
> get_bucket_df(bucket=b, base_url=baseurl,add_region=FALSE,verbose=FALSE,use_https=FALSE)
    Key             LastModified                               ETag Size
1 dummy 2023-03-07T14:08:35.103Z "18ab9e2980f7223750c5ed4833f45dab"   70
                                                          Owner_ID
1 02d6176db174dc93cb1b899f7c6078f08654445fe8cf1b6ce98d8855f66bdbf4
  Owner_DisplayName StorageClass  Bucket
1             minio     STANDARD example
```
Does anyone have any insights to share on this? It's not urgent, but it would be neat to figure out. Thanks.
b
I think we should open an issue for this one. It may be related to the fact that the R aws.s3 package uses ListObjects v1. We should support it, but we may have a bug there.
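On the wire, the two ListObjects versions are distinguished by the query string: a v2 request carries `list-type=2` and pages with `continuation-token`, while a v1 request omits `list-type` and pages with `marker`. A minimal sketch of the request paths for each version follows; the `list_objects_path` helper is hypothetical and only the query parameter names come from the S3 API. It may help when comparing what the R client and boto send to the gateway.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def list_objects_path(bucket: str, prefix: str = "", page_token: Optional[str] = None,
                      version: int = 2) -> str:
    """Build the path and query string of an S3 ListObjects request (hypothetical helper)."""
    params: Dict[str, str] = {}
    if version == 2:
        params["list-type"] = "2"  # marks the request as ListObjectsV2
    if prefix:
        params["prefix"] = prefix
    if page_token is not None:
        # v1 pages with `marker`; v2 pages with an opaque `continuation-token`
        params["continuation-token" if version == 2 else "marker"] = page_token
    query = urlencode(params)
    path = "/" + quote(bucket)
    return f"{path}?{query}" if query else path


print(list_objects_path("drones03", version=1))  # /drones03
print(list_objects_path("drones03", version=2))  # /drones03?list-type=2
```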
Interesting: it seems the issue is related to the content type we return for the S3 protocol. R checks for `application/xml`:
```r
ctype <- httr::headers(r)[["content-type"]]
    if (is.null(ctype) || ctype == "application/xml"){
```
But for this request we currently return `text/xml`.
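The check quoted above compares the entire `Content-Type` header with the exact string `application/xml`, so a `text/xml` response, or one carrying a parameter such as `; charset=utf-8`, is not treated as XML. That would be consistent with the empty data frames. Below is a sketch that mirrors that check in Python next to a more tolerant variant that strips parameters, ignores case, and accepts both XML media types. Both function names are hypothetical; this illustrates the comparison, not the aws.s3 or lakeFS code.

```python
from typing import Optional

# Media types commonly used for XML bodies.
XML_MEDIA_TYPES = {"application/xml", "text/xml"}


def strict_is_xml(content_type: Optional[str]) -> bool:
    """Mirror of the quoted R check: header missing, or exactly 'application/xml'."""
    return content_type is None or content_type == "application/xml"


def tolerant_is_xml(content_type: Optional[str]) -> bool:
    """Accept application/xml or text/xml, ignoring case and parameters like charset."""
    if content_type is None:
        return True  # keep the R behaviour of assuming XML when the header is absent
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in XML_MEDIA_TYPES


for ct in ["application/xml", "text/xml", "text/xml; charset=utf-8"]:
    print(ct, strict_is_xml(ct), tolerant_is_xml(ct))
```

Either side could close the gap: the server could return `application/xml` as S3 does, or the client could parse the media type instead of comparing whole strings.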
r
Thanks, Barak. Issue logged here: https://github.com/treeverse/lakeFS/issues/5441