Kevin Vasko
01/30/2023, 7:06 PM
Matija Teršek
02/02/2023, 3:32 PM
upload to backing store: MissingRegion: could not find region configuration
I believe this can simply be solved by setting `blockstore.s3.region`, but what should we set it to when we are not using AWS directly?
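(A hedged sketch of how this is commonly configured for S3-compatible stores: the region value is essentially a placeholder that just has to be present, while the endpoint does the real work. The MinIO endpoint below is a made-up example value.)
blockstore:
  type: s3
  s3:
    region: us-east-1             # placeholder; S3-compatible stores typically ignore it
    endpoint: http://minio:9000   # hypothetical S3-compatible endpoint
    force_path_style: true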
Yaphet Kebede
02/02/2023, 8:53 PM
Conor Simmons
02/03/2023, 8:48 PM
What is the expected behaviour of lakectl when performing multiple imports?
Let's say I have S3 bucket 1 with cat.jpg at s3://cat_bucket/cat.jpg and S3 bucket 2 with dog.jpg at s3://dog_bucket/dog.jpg. I then:
• lakectl import --from s3://cat_bucket/ --to lakefs://repo/main/ and manually merge _main_imported
• lakectl import --from s3://dog_bucket/ --to lakefs://repo/main/
Now, when I go to compare branches, I can see that cat.jpg is being removed if I merge.
Output (not expected): only dog.jpg is now in main
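(A possible workaround, sketched under the assumption that importing each bucket under its own prefix keeps the second import from superseding the first one's paths; cats/ and dogs/ are made-up prefixes, not anything from the thread:)
# hypothetical layout: one prefix per source bucket
lakectl import --from s3://cat_bucket/ --to lakefs://repo/main/cats/
lakectl import --from s3://dog_bucket/ --to lakefs://repo/main/dogs/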
Conor Simmons
02/03/2023, 10:10 PM
Temilola Onaneye
02/06/2023, 12:15 PM
23/02/06 12:13:28 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: lakefs://ragoldstandard/main/bronze_layer/sample.csv.
java.io.IOException: statObject
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:779)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:46)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1777)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:571)
at jdk.internal.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.lakefs.hadoop.shade.api.ApiException: Content type "text/html; charset=utf-8" is not supported for type: class io.lakefs.hadoop.shade.api.model.ObjectStats
at io.lakefs.hadoop.shade.api.ApiClient.deserialize(ApiClient.java:822)
at io.lakefs.hadoop.shade.api.ApiClient.handleResponse(ApiClient.java:1020)
at io.lakefs.hadoop.shade.api.ApiClient.execute(ApiClient.java:944)
at io.lakefs.hadoop.shade.api.ObjectsApi.statObjectWithHttpInfo(ObjectsApi.java:1478)
at io.lakefs.hadoop.shade.api.ObjectsApi.statObject(ObjectsApi.java:1451)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:775)
Please can someone help?
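(A hedged guess at the usual cause: a Content type "text/html" response from statObject generally means the client reached the lakeFS UI page rather than the API, e.g. an endpoint missing its /api/v1 suffix. A sketch of the relevant Spark configuration, with host and keys as placeholders:)
spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem
# the endpoint must point at the API, not the UI root
spark.hadoop.fs.lakefs.endpoint=http://<lakefs-host>:8000/api/v1
spark.hadoop.fs.lakefs.access.key=AKIA...
spark.hadoop.fs.lakefs.secret.key=...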
Γιάννης Μαντιός
02/10/2023, 5:53 PM
Matija Teršek
02/17/2023, 8:58 PM
time="2023-02-16T07:39:00Z" level=warning msg="error verifying credentials for key" func=pkg/gateway.AuthenticationHandler.func1 file="build/pkg/gateway/middleware.go:55" authenticator=sigv4 error=SignatureDoesNotMatch key=MASKED
Is it possible to see a more detailed error about why this happened? Any thoughts on what could cause it? CC @Conor Simmons for visibility
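(One hedged way to get more detail, assuming the standard lakeFS server logging setting; TRACE is very verbose and DEBUG may be enough:)
logging:
  level: TRACE   # assumption: server-side config; dial back to DEBUG once diagnosed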
Sanidhya Singh
02/22/2023, 4:18 AM
import pandas as pd
import os
import s3fs

class S3FileSystemPatched(s3fs.S3FileSystem):
    def __init__(self, *k, **kw):
        super(S3FileSystemPatched, self).__init__(
            *k,
            key=os.environ["AWS_ACCESS_KEY_ID"],
            secret=os.environ["AWS_SECRET_ACCESS_KEY"],
            client_kwargs={"endpoint_url": os.environ["AWS_S3_ENDPOINT"]},
            **kw,
        )
        print("S3FileSystem is patched")

s3fs.S3FileSystem = S3FileSystemPatched
data = pd.read_csv("s3://example/master/test.csv")
It throws FileNotFoundError: example/master/test.csv.
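(A hedged alternative sketch: fsspec caches filesystem instances, so a monkey-patch applied after the first S3 access may never be picked up; passing the same parameters per call through pandas' storage_options sidesteps the patch entirely. Same env vars as above; nothing else is assumed.)
import os
import pandas as pd

# same credentials/endpoint as the patch above, but passed per call,
# so there is no reliance on s3fs being patched before fsspec caches an instance
data = pd.read_csv(
    "s3://example/master/test.csv",
    storage_options={
        "key": os.environ["AWS_ACCESS_KEY_ID"],
        "secret": os.environ["AWS_SECRET_ACCESS_KEY"],
        "client_kwargs": {"endpoint_url": os.environ["AWS_S3_ENDPOINT"]},
    },
)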
Jonas
02/22/2023, 9:30 AM
Kevin Vasko
02/22/2023, 9:53 PM
Kevin Vasko
02/22/2023, 11:25 PM
Adrian Rumpold
02/28/2023, 9:01 AM
I've come across the gateways.s3.fallback_url setting, but haven't been able to figure out exactly how to use it. Is there any documentation besides the configuration reference guide?
In particular, I'm unsure how to pass authentication credentials to the fallback backend: if I specify credentials that don't exist in lakeFS but are valid on the fallback S3 endpoint, I get an InvalidAccessKeyId error back from the aws s3 command. If I specify the lakeFS credentials and try to access a bucket that would be handled by the fallback, I get an AccessDenied error.
Do the credentials need to be identical for both the lakeFS S3 endpoint and the fallback?
Thanks for your help! 🙏
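(For reference, a minimal sketch of the setting's shape only; whether the gateway forwards the caller's signature to the fallback or expects fallback-side credentials is precisely the open question here:)
gateways:
  s3:
    fallback_url: https://s3.us-east-1.amazonaws.com   # placeholder fallback endpoint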
Robin Moffatt
03/01/2023, 4:01 PM
Jonas
03/02/2023, 9:48 AM
I get
insufficient permissions: uM: insufficient permissions at ot (http://localhost:8090/assets/index-7bf1aa94.js:71:93127) at async T$.list (http://localhost:8090/assets/index-7bf1aa94.js:71:100206) at async http://localhost:8090/assets/index-7bf1aa94.js:71:111901
when accessing http://localhost:8090/repositories. What am I missing?
EDIT: I can restrict write access; my issue, though, is that I don't even want the user to see repos they have no access to.
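(A hedged sketch of the policy shape this would need, assuming the RBAC model where the repository list is filtered by what the user may read; the policy id, actions chosen, and team-a-* repository pattern are made up for illustration:)
{
  "id": "TeamAReadOnly",
  "statement": [
    {
      "action": ["fs:ReadRepository", "fs:ListBranches", "fs:ListObjects", "fs:ReadObject"],
      "effect": "allow",
      "resource": "arn:lakefs:fs:::repository/team-a-*"
    }
  ]
}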
Monde Sinxi
03/02/2023, 3:25 PM
[lakefs]
env_auth = false
type = s3
provider = Other
endpoint = http://lakefs:8000
region = use-east-1
secret_access_key = $SECRET_ACCESS_KEY
access_key_id = $ACCESS_KEY_ID
force_path_style = true
no_check_bucket = true
I have to use Other as the provider because rclone will override force_path_style with false if I use the AWS provider as suggested in the docs.
If I then run:
rclone ls -vv --dump=bodies lakefs:repo/branch/path
I get the following error
2023/03/02 17:11:13 Failed to lsd with 2 errors: last error was: MissingFields: Missing fields in request.
status code: 400, request id: , host id:
Anyone have any idea how I go about resolving this?
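(One hedged observation before anything else: region = use-east-1 in the config above looks like a typo for us-east-1, which may or may not be related to the MissingFields error. The same remote with only that fixed, everything else verbatim:)
[lakefs]
env_auth = false
type = s3
provider = Other
endpoint = http://lakefs:8000
region = us-east-1
secret_access_key = $SECRET_ACCESS_KEY
access_key_id = $ACCESS_KEY_ID
force_path_style = true
no_check_bucket = true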
Robin Moffatt
03/02/2023, 5:35 PM
drones.duckdb> SET s3_endpoint='rmoff-test.us-east-2.lakefscloud.io';
0 rows in set
Time: 0.000s
drones.duckdb> SELECT * FROM duckdb_settings() where name like 's3_e%';
+-------------+------------------+-------------------------------------------+------------+
| name        | value            | description                               | input_type |
+-------------+------------------+-------------------------------------------+------------+
| s3_endpoint | s3.amazonaws.com | S3 Endpoint (default 's3.amazonaws.com')  | VARCHAR    |
+-------------+------------------+-------------------------------------------+------------+
1 row in set
Time: 0.006s
Maybe this is more a DuckDB question than a lakeFS one, but I'm interested if anyone has seen this.
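(A hedged sanity check: DuckDB also exposes settings through the current_setting() scalar function, which makes it easy to confirm in the same session whether the SET actually took effect:)
SELECT current_setting('s3_endpoint');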
Robin Moffatt
03/03/2023, 6:02 PM
D export database 's3://drones03/main/drone-registrations/' (format parquet);
94% ▕████████████████████████████████████████████████████████▍ ▏ Error: Invalid Error: IO Error: Unexpected response during S3 multipart upload finalization
It works just fine writing to a branch. I'm just interested in where to follow up re. improving the error.
Robin Moffatt
03/06/2023, 10:24 AM
Using the aws s3 CLI, I can connect to my lakeFS just fine via the S3 gateway:
$ aws s3 --endpoint-url http://127.0.0.1:8000 ls s3://drones03/main/
2023-03-06 09:59:24 119663 Registations-P107-Active-2017.parquet
But if I use DuckDB it errors, I think because it is forcing the connection to HTTPS (which my server isn't using):
D SET s3_endpoint='http://127.0.0.1:8000';
D select * from read_parquet('s3://drones03/main/Registations-P107-Active-2017.parquet');
Error: Invalid Error: IO Error: Connection error for HTTP HEAD to 'https://http://127.0.0.1:8000/drones03/main/Registations-P107-Active-2017.parquet'
D SET s3_endpoint='127.0.0.1:8000';
D select * from read_parquet('s3://drones03/main/Registations-P107-Active-2017.parquet');
Error: Invalid Error: IO Error: SSLConnection error for HTTP HEAD to 'https://127.0.0.1:8000/drones03/main/Registations-P107-Active-2017.parquet'
Is there a way around this?
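(A hedged sketch of the settings that usually matter for a local, non-TLS, path-style S3 gateway; s3_use_ssl and s3_url_style are DuckDB httpfs settings, and the endpoint should carry no scheme:)
SET s3_endpoint='127.0.0.1:8000';
SET s3_use_ssl=false;      -- stop httpfs from defaulting to https
SET s3_url_style='path';   -- address buckets by path rather than virtual host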
Barak Amar
Robin Moffatt
03/06/2023, 12:46 PM
If I neither allow nor deny fs:* in a policy, is that the same as granting deny?
Put another way, if I only want to allow certain actions, do I need to deny the others, or can I simply allow the ones that I want?
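(A hedged reading, assuming lakeFS follows the IAM-style model its policies resemble: anything not explicitly allowed is implicitly denied, so an allow-list alone should suffice, and explicit deny exists to override a broader allow. E.g. a statement like:)
{ "action": ["fs:ReadRepository", "fs:ListObjects"], "effect": "allow", "resource": "*" }
(would, under that assumption, leave everything it doesn't name denied without any matching deny statement.)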
Robin Moffatt
03/07/2023, 2:46 PM
• The aws CLI doing the same thing
• The lakeFS server log for the HTTP interaction is the same for the R (doesn't work) and Python (does work)
• The R code works fine listing buckets and objects with MinIO.
• R library and S3 HTTP code
Any pointers on what to check next? Thanks.
Full details in thread.
Natasha Taylor
03/08/2023, 11:52 AM
Temilola Onaneye
03/08/2023, 7:05 PM
Miguel Rodríguez
03/09/2023, 6:46 AM
I'm hitting again a 500 Internal Server Error
that I reported as a bug some time ago and that was already supposed to be solved.
I have just added the new details with the error as a new comment in the related GitHub issue: https://github.com/treeverse/lakeFS/issues/4909
Can someone help me reopen the issue?
Alexander Reinthal
03/11/2023, 10:43 PM
sudo docker run \
--name lakefs \
-p 80:8000 \
-e LAKEFS_DATABASE_TYPE="postgres" \
-e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING=$LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING \
-e LAKEFS_AUTH_ENCRYPT_SECRET_KEY=$LAKEFS_AUTH_ENCRYPT_SECRET_KEY \
-e LAKEFS_BLOCKSTORE_TYPE="azure" \
-e LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCOUNT=$LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCOUNT \
-e LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCESS_KEY=$LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCESS_KEY \
and I get the following error
time="2023-03-11T22:41:44Z" level=error msg="failed to get azure blob from container &{%!s(*generated.ContainerClient=&{<https://lakefs11673f4e.blob.core.windows.net/mds> {[0x1400120 0xc008045c00 0xc0080b01c0 0xc008077770 0xc00804ff38 0x13ffda0 0x13ffaa0 {0xc000159350}]}}) %!s(*exported.SharedKeyCredential=<nil>)} key &{%!s(*generated.BlobClient=&{<https://lakefs11673f4e.blob.core.windows.net/mds/dummy> {[0x1400120 0xc008045c00 0xc0080b01c0 0xc008077770 0xc00804ff38 0x13ffda0 0x13ffaa0 {0xc000159350}]}}) %!s(*generated.BlockBlobClient=&{<https://lakefs11673f4e.blob.core.windows.net/mds/dummy> {[0x1400120 0xc008045c00 0xc0080b01c0 0xc008077770 0xc00804ff38 0x13ffda0 0x13ffaa0 {0xc000159350}]}}) %!s(*exported.SharedKeyCredential=<nil>)}" func="pkg/logging.(*logrusEntryWrapper).Errorf" file="build/pkg/logging/logger.go:262" error="DefaultAzureCredential: failed to acquire a token.\nAttempted credentials:\n\tEnvironmentCredential: missing environment variable AZURE_TENANT_ID\n\tManagedIdentityCredential: no default identity is assigned to this resource\n\tAzureCLICredential: Azure CLI not found on path" host=68.219.229.172 method=POST operation_id=CreateRepository path=/api/v1/repositories request_id=94bca5e0-f1e3-49e0-ab44-20313230bbdf service_name=rest_api user=alex
time="2023-03-11T22:41:44Z" level=warning msg="Could not access storage namespace" func="pkg/api.(*Controller).CreateRepository" file="build/pkg/api/controller.go:1406" error="DefaultAzureCredential: failed to acquire a token.\nAttempted credentials:\n\tEnvironmentCredential: missing environment variable AZURE_TENANT_ID\n\tManagedIdentityCredential: no default identity is assigned to this resource\n\tAzureCLICredential: Azure CLI not found on path" reason=unknown service=api_gateway storage_namespace="<https://lakefs11673f4e.blob.core.windows.net/mds>"
Reading the above, it looks like lakeFS tries to use my storage access key, then fails and falls back on service principal credentials.
I verified that the shared key works using Python. What should I do?
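(If the shared key keeps being ignored, a hedged fallback is to feed the service-principal route the log is already probing; these are the standard Azure EnvironmentCredential variables the error message itself names, with placeholder values:)
  -e AZURE_TENANT_ID=... \
  -e AZURE_CLIENT_ID=... \
  -e AZURE_CLIENT_SECRET=... \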
Adi Polak
03/12/2023, 1:44 PM
docker ps.
I've noticed that there is a lakectl command in the lakefs-setup entrypoint: && lakectl repo create lakefs://example s3://example
Any idea if these things are connected, or how to fix it?
lakefs-setup:
  image: treeverse/lakefs:latest
  container_name: lakefs-setup
  depends_on:
    - postgres
    - minio-setup
  environment:
    - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=some random secret string
    - LAKEFS_DATABASE_CONNECTION_STRING=postgres://lakefs:lakefs@postgres/postgres?sslmode=disable
    - LAKECTL_CREDENTIALS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
    - LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    - LAKECTL_SERVER_ENDPOINT_URL=http://lakefs:8000
    - LAKEFS_BLOCKSTORE_TYPE=s3
  entrypoint: ["/app/wait-for", "postgres:5432", "--", "sh", "-c",
    "lakefs setup --user-name docker --access-key-id AKIAIOSFODNN7EXAMPLE --secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY && lakectl repo create lakefs://example s3://example"
  ]
Robin Moffatt
03/15/2023, 6:58 AM
Robin Moffatt
03/15/2023, 12:30 PM
I'm following the branch revert docs and trying to do a revert, but hitting an unclear error.
I want to roll back the last commit (ideally without having to look up its specific ID). Is that possible?
$ lakectl branch revert lakefs://quickstart/main HEAD~1
Branch: lakefs://quickstart/main
Are you sure you want to revert the effect of commits HEAD~1: y
get commit from ref HEAD~1: not found
404 Not Found
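(A hedged sketch of what should work instead: lakeFS has no HEAD ref, but a branch name is itself a ref to its latest commit and accepts Git-style ~ modifiers, so, assuming the quickstart repo above:)
# revert the changes introduced by the most recent commit on main
lakectl branch revert lakefs://quickstart/main main
# or target the commit before the tip
lakectl branch revert lakefs://quickstart/main main~1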