# help
Hi guys, I want to query Parquet files on S3 using DuckDB. I used lakeFS to partition the files and write them to S3. I set up the secrets like this:
import duckdb

conn = duckdb.connect()
conn.execute("INSTALL httpfs")
conn.execute("LOAD httpfs")
conn.execute(f"SET s3_region='{aws_region}'")
conn.execute(f"SET s3_endpoint='{lakefs_endpoint}'")
conn.execute("SET s3_url_style='vhost'")
conn.execute(f"SET s3_access_key_id='{aws_access_key_id}'")
conn.execute(f"SET s3_secret_access_key='{aws_secret_access_key}'")
conn.execute("SET s3_use_ssl=false")
When I run something like:
query = "SELECT * FROM read_parquet('s3://mys3/main/**/*.snappy.parquet');"
resp = conn.execute(query).fetchall()
I get an error like this:
HTTP Error: HTTP GET error on '/mys3/?encoding-type=url&list-type=2&prefix=main%2F' (HTTP 403)
What could be the cause of this?
Hi @Nethsara, my guess is that you don't have the necessary permissions to perform the operation. If you deployed lakeFS locally, look at its logs to see which request returned the 403. You can also check the credentials: which user and group they belong to, and what permissions that user has.
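To see which operation was refused, the request target in the DuckDB error can be decoded. This is a minimal sketch, assuming path-style addressing and lakeFS's S3 gateway layout (bucket = repository, first key segment = branch); `describe_s3_request` is a hypothetical helper, not part of DuckDB or lakeFS, and it only recognizes the list call.

```python
from urllib.parse import parse_qs, urlsplit


def describe_s3_request(target: str) -> dict:
    """Split a path-style S3 request target into its main parts.

    Hypothetical helper. Assumes lakeFS's S3 gateway layout: the bucket is
    the repository and the first segment of the key/prefix is the branch.
    """
    parts = urlsplit(target)
    # Path-style URLs carry the bucket as the first path segment.
    bucket = parts.path.strip("/").split("/")[0]
    # parse_qs also percent-decodes values, so "main%2F" becomes "main/".
    query = parse_qs(parts.query)
    # list-type=2 identifies an S3 ListObjectsV2 request; anything else is
    # reported as "unknown" in this simplified sketch.
    operation = "ListObjectsV2" if query.get("list-type") == ["2"] else "unknown"
    prefix = query.get("prefix", [""])[0]
    return {
        "repository": bucket,
        "ref": prefix.split("/", 1)[0],
        "operation": operation,
        "prefix": prefix,
    }


print(describe_s3_request("/mys3/?encoding-type=url&list-type=2&prefix=main%2F"))
# → {'repository': 'mys3', 'ref': 'main', 'operation': 'ListObjectsV2', 'prefix': 'main/'}
```

For the error above, this points to a ListObjectsV2 call on repository `mys3` under branch `main`, which is DuckDB expanding the `**` glob by listing objects. So the key in use must be allowed to list that repository, not only to read objects from it.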
Also note that using lakeFS as the S3 endpoint means passing a lakeFS access key and secret. From what you shared, it looks like you may be using AWS credentials instead.
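Putting that together, here is a minimal sketch of the DuckDB settings when lakeFS is the S3 endpoint. `lakefs_s3_settings` is a hypothetical helper. The setting names come from the snippet in the question. Two values are my assumptions: `s3_url_style='path'` (instead of `'vhost'`), chosen because virtual-host style generally needs extra DNS setup for the lakeFS host, and the placeholder default region.

```python
def lakefs_s3_settings(endpoint, access_key_id, secret_access_key,
                       region="us-east-1", use_ssl=True):
    """Return DuckDB httpfs SET statements for reading through lakeFS.

    Hypothetical helper. endpoint is lakeFS's host[:port] without a scheme;
    the key pair must be a lakeFS access key, not an AWS one.
    """

    def quote(value):
        # Double any single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return [
        # Placeholder region (assumption); adjust if your setup expects another.
        f"SET s3_region={quote(region)}",
        f"SET s3_endpoint={quote(endpoint)}",
        # Assumption: path-style URLs, so the repository appears in the path.
        "SET s3_url_style='path'",
        # lakeFS credentials go here, not AWS ones.
        f"SET s3_access_key_id={quote(access_key_id)}",
        f"SET s3_secret_access_key={quote(secret_access_key)}",
        # false for a plain-HTTP endpoint such as a local lakeFS.
        f"SET s3_use_ssl={'true' if use_ssl else 'false'}",
    ]
```

Each returned statement would then be run with `conn.execute(stmt)` after `LOAD httpfs` and before the `read_parquet` query. Generating the statements in one place keeps the quoting consistent and makes it obvious which credential pair is being passed.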