GK Palem
10/30/2023, 4:08 AM

gang ye
10/30/2023, 6:04 AM
storage namespace already in use.
I thought that if we use the same Postgres database and S3 path to persist data and metadata, the lakeFS server should be able to restore its state from the previous deployment. It seems that is not the case.
Is there any setting that avoids having to configure the lakeFS server again during redeployment?
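A note for anyone redeploying: lakeFS keeps repository metadata in the database it is configured with, so pointing the new deployment at the same Postgres connection string (and leaving the existing storage namespace in place) is what preserves state; the auth encryption secret also has to stay the same, otherwise stored credentials cannot be decrypted and the setup step is required again. A rough sketch, with placeholder values:
# redeploying lakeFS against the same metadata database and storage namespace
docker run -p 8000:8000 \
  -e LAKEFS_DATABASE_TYPE=postgres \
  -e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING="postgres://user:pass@db-host:5432/lakefs_db" \
  -e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="<same value as the previous deployment>" \
  -e LAKEFS_BLOCKSTORE_TYPE=s3 \
  treeverse/lakefs:latest run
# repositories, branches and users from the previous deployment should then be visible again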
Maxim
10/30/2023, 2:24 PM

Nicholas Junge
10/30/2023, 3:17 PM
Any update on the "stat-like object on file up-/downloads containing basic version info" plan re: the lakeFS server?
I saw that the RFC was merged, and that you commented on the idea. For now, I can basically emulate this in the frontend by grabbing the object info as normal (via objects_api.stat_object), and then doing the equivalent of git rev-parse $ref with the requested revision.
The "is this object staged or committed" use case you asked for is e.g. integrity checking: am I working with properly tracked/committed data, or with a "dirty" untracked version staged on main by my colleague without my knowledge?
This would matter more when pulling data using raw branch names instead of commit SHAs, but from my experience that's quite common.
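For anyone wanting to reproduce that resolve-then-compare flow outside the frontend, here is a rough sketch against the lakeFS REST API; the repository name, branch, object path and credentials below are placeholders, and jq is only used for readability:
# resolve the branch to its current head commit (the "git rev-parse $ref" step)
HEAD=$(curl -s -u "$LAKEFS_ACCESS_KEY_ID:$LAKEFS_SECRET_ACCESS_KEY" \
  "http://localhost:8000/api/v1/repositories/example-repo/branches/main" | jq -r '.commit_id')

# stat the object as the branch sees it (committed plus staged changes) ...
curl -s -u "$LAKEFS_ACCESS_KEY_ID:$LAKEFS_SECRET_ACCESS_KEY" \
  "http://localhost:8000/api/v1/repositories/example-repo/refs/main/objects/stat?path=data/file.csv" | jq '.checksum'

# ... and as the head commit sees it (committed data only)
curl -s -u "$LAKEFS_ACCESS_KEY_ID:$LAKEFS_SECRET_ACCESS_KEY" \
  "http://localhost:8000/api/v1/repositories/example-repo/refs/$HEAD/objects/stat?path=data/file.csv" | jq '.checksum'

# a 404 on the second call, or differing checksums, means the branch version is staged/uncommitted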
HT
11/01/2023, 3:27 AM
Launching Chrome with web security disabled (google-chrome-stable --disable-web-security --user-data-dir='test') solves the issue: the images show up in the LabelStudio web UI, so we are confident that the credentials are correct.
I am not super familiar with CORS. From what I understand, the lakeFS S3 gateway needs to be configured in some way for LabelStudio to show images that do not come directly from LabelStudio itself?
Does anybody have any clue about this?
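For anyone debugging something similar: images loading only with --disable-web-security is the classic symptom of the browser blocking a cross-origin response that lacks CORS headers, rather than a credentials problem. A quick way to see what actually comes back is to copy the failing image URL from the browser's devtools Network tab and inspect its headers; the URL and hostnames below are placeholders:
# check whether the response carries CORS headers for the LabelStudio origin
curl -sI "http://lakefs-host:8000/example-repo/main/images/sample.png" \
  -H "Origin: http://label-studio-host:8080" | grep -i "access-control"
# no access-control-allow-origin header in the output means the browser will refuse the
# response, which matches the --disable-web-security observation above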
GK Palem
11/01/2023, 1:44 PM
The main branch should always point to the latest version of all these datasets. What is the best way to organize this with lakeFS?

Florentino Sainz
11/02/2023, 2:16 PM

GK Palem
11/05/2023, 1:19 PM
Is there an equivalent of git clone for lakeFS, to clone the data and manipulate it locally?
I have set up lakeFS in Docker with a MinIO S3 endpoint as the object store, and uploaded some files into lakeFS through its UI. Now I do not see any option in the lakeFS UI to organize the uploaded files (such as creating folders), nor any option to clone the data to my local machine, manipulate its hierarchy, and commit it back. What am I missing?
What is the recommended toolset or workflow for organizing data hierarchies in lakeFS? I have some files in the lakeFS UI and now want to be able to re-organize them into different folders and commit the result as a new branch.
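For what it's worth, recent lakectl releases include a lakectl local command group that gives a clone-like workflow: sync a branch prefix to a local directory, re-arrange it with normal filesystem tools, and commit the result back. A rough sketch with placeholder repository, branch and directory names (flags may differ slightly between versions, see lakectl local --help):
# create a branch to hold the re-organized layout
lakectl branch create lakefs://my-repo/reorganize --source lakefs://my-repo/main

# sync the branch contents into a local working directory
lakectl local clone lakefs://my-repo/reorganize/ ./my-repo-local

# re-arrange the files into folders with ordinary shell tools
mkdir -p ./my-repo-local/raw-data
mv ./my-repo-local/*.csv ./my-repo-local/raw-data/

# upload the changes and record them as a commit on the new branch
lakectl local commit ./my-repo-local -m "re-organize files into folders"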
Andreas Fred-Ojala
11/06/2023, 10:46 AM

Florentino Sainz
11/08/2023, 10:10 AM

Alex Treyvus
11/08/2023, 9:30 PM
When I run:
docker run --pull always -p 8000:8000 -e LAKEFS_BLOCKSTORE_TYPE='s3' -e AWS_ACCESS_KEY_ID='<My Access Key ID>' -e AWS_SECRET_ACCESS_KEY='<My Secret Access Key>' treeverse/lakefs run --local-settings
I get this:
Using local-settings parameters configuration. This is suitable only for testing! It is NOT SUPPORTED for production.
time="2023-11-08T20:59:25Z" level=info msg="lakeFS run" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:91" version=1.1.0
time="2023-11-08T20:59:25Z" level=info msg="initialized Auth service" func=pkg/auth.NewAuthService file="build/pkg/auth/service.go:188" service=auth_service
time="2023-11-08T20:59:25Z" level=warning msg="Tried to to get AWS account ID for BI" func="pkg/cloud/aws.(*MetadataProvider).GetMetadata.func1" file="build/pkg/cloud/aws/metadata.go:81" error="operation error STS: GetCallerIdentity, https response error StatusCode: 403, RequestID: bcfe8cca-10c1-4bee-b9c8-db359a3bf938, api error InvalidClientTokenId: The security token included in the request is invalid."
I'm 100% sure that the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY I'm passing as parameters to docker are correct, but I'm not sure how to include a security token.
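In case it helps: the AWS SDK used by lakeFS reads the standard AWS environment variables, so temporary credentials (the kind that come with a security/session token) can be passed by also setting AWS_SESSION_TOKEN. A sketch with placeholder values; if the keys are long-lived IAM user credentials, no token should be needed and the 403 points at the keys themselves:
# pass the session token alongside the access key pair
docker run --pull always -p 8000:8000 \
  -e LAKEFS_BLOCKSTORE_TYPE='s3' \
  -e AWS_ACCESS_KEY_ID='<My Access Key ID>' \
  -e AWS_SECRET_ACCESS_KEY='<My Secret Access Key>' \
  -e AWS_SESSION_TOKEN='<My Session Token>' \
  treeverse/lakefs run --local-settings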
Alex Treyvus
11/10/2023, 3:43 AM

Yaphet Kebede
11/15/2023, 8:34 PM

Yaphet Kebede
11/15/2023, 8:34 PM

Yaphet Kebede
11/15/2023, 8:37 PM

Amin
11/17/2023, 3:08 PM

성진영
11/20/2023, 12:44 AM

Nicholas Junge
11/21/2023, 8:16 AM

Angela Bovo
11/21/2023, 6:39 PM

Yoni Augarten
11/22/2023, 8:04 AM

Erin Aho
11/23/2023, 1:39 PM

Maaax Maaax
11/26/2023, 12:59 PM

Adrian Rumpold
11/27/2023, 9:12 AM

youtay
11/28/2023, 7:04 AM

성진영
11/28/2023, 7:58 AM
I have a NAS mounted (at /mnt) on the same server where the lakeFS Docker container is running.
sudo docker run \
-v /mnt:/home/lakefs/ext_data \
-p 8000:8000 \
-e LAKEFS_DATABASE_TYPE=postgres \
...
-e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="lakefs" \
-e LAKEFS_BLOCKSTORE_TYPE=local \
-e LAKEFS_BLOCKSTORE_LOCAL_PATH="~/lakefs" \
-e LAKEFS_BLOCKSTORE_LOCAL_IMPORT_ENABLED=true \
-e LAKEFS_BLOCKSTORE_LOCAL_ALLOWED_EXTERNAL_PREFIXES="/mnt" \
treeverse/lakefs:latest run
I can access all the files on the NAS via the Docker volume (/home/lakefs/ext_data), but I can't figure out how to import those files into the lakeFS repository.
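A note for anyone hitting the same wall: with the local blockstore, an import source has to be a path the lakeFS server itself can see under one of the allowed external prefixes. In the command above the NAS is visible inside the container at /home/lakefs/ext_data, while the allowed prefix is /mnt, so the two never line up. A rough, untested variant that keeps them consistent (paths and values are assumptions, not a verified fix):
# mount the NAS inside the container at the same path that is whitelisted for imports
sudo docker run \
  -v /mnt:/mnt \
  -p 8000:8000 \
  -e LAKEFS_DATABASE_TYPE=postgres \
  -e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="lakefs" \
  -e LAKEFS_BLOCKSTORE_TYPE=local \
  -e LAKEFS_BLOCKSTORE_LOCAL_PATH="/home/lakefs/lakefs" \
  -e LAKEFS_BLOCKSTORE_LOCAL_IMPORT_ENABLED=true \
  -e LAKEFS_BLOCKSTORE_LOCAL_ALLOWED_EXTERNAL_PREFIXES="/mnt" \
  treeverse/lakefs:latest run
# LAKEFS_BLOCKSTORE_LOCAL_PATH is written as an absolute path here because "~" is not
# guaranteed to expand inside the container; the import (UI dialog or lakectl import)
# can then point at a location under /mnt, the allowed external prefix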
Ryan Prasad
11/28/2023, 6:01 PM

Tal Sofer
11/29/2023, 7:49 AM

Giacomo Matrone
11/30/2023, 7:32 AM

Al
11/30/2023, 6:37 PM
lakefsEndPoint = '<https://rntlj-151-236-193-133.a.free.pinggy.link/api>'
spark.conf.set("fs.lakefs.access.mode", "presigned")
spark.conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
spark.conf.set("fs.lakefs.access.key", f"{lakefsAccessKey}")
spark.conf.set("fs.lakefs.secret.key", f"{lakefsSecretKey}")
spark.conf.set("fs.lakefs.endpoint", f"{lakefsEndPoint}")
spark.conf.set("spark.databricks.delta.logStore.crossCloud.fatal", "false")
Installed libraries in Databricks on Azure:
io.lakefs:hadoop-lakefs-assembly:0.2.1 Maven
io.lakefs:lakefs-spark-client-312-hadoop3_2.12:0.10.0 Maven
lakefs-client PyPI
Attempt to read file:
repo_name = "test-rep-1"
sourceBranch = "main"
dataPath = 'tst_3a.csv'
dataPath = f"lakefs://{repo_name}/{sourceBranch}/{dataPath}"
print(dataPath)
print(f"Reading CSV from {dataPath}")
df = spark.read.csv(dataPath)
df.show()
causes error:
java.io.IOException: statObject
Py4JJavaError: An error occurred while calling o480.csv.
: java.io.IOException: statObject
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:764)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:74)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1777)
...
Caused by: io.lakefs.hadoop.shade.sdk.ApiException: Message: Content type "text/html; charset=utf-8" is not supported for type: class io.lakefs.hadoop.shade.sdk.model.ObjectStats
HTTP response code: 200
HTTP response body: <!DOCTYPE html>
<html lang="en">
<head>
<!-- Generated with Vite-->
...
</html>
HTTP response headers: {accept-ranges=[bytes], cache-control=[no-cache, no-store, no-transform, must-revalidate, private, max-age=0], content-length=[480], content-type=[text/html; charset=utf-8], date=[Thu, 30 Nov 2023 12:52:33 GMT], expires=[Thu, 01 Jan 1970 00:00:00 GMT], pragma=[no-cache], x-accel-expires=[0], x-frame-options=[SAMEORIGIN]}
at io.lakefs.hadoop.shade.sdk.ApiClient.deserialize(ApiClient.java:925)
at io.lakefs.hadoop.shade.sdk.ApiClient.handleResponse(ApiClient.java:1127)
And when I try to write a file to the repository, I get the same error:
fileName = 'tst_3.csv'
dataPath = 'tst_3_df'
df = spark.read.csv(f'/{source_data}/{fileName}')
df.write.format("csv").save(f"lakefs://{repo_name}/{sourceBranch}/{dataPath}")
What could the mistake be? I followed the instructions at https://lakefs.io/blog/databricks-lakefs-integration-tutorial/ (for Azure storage), but it doesn't work for me.
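A note on the stack trace above: the client received an HTML page (the lakeFS web UI, judging by the Vite-generated markup) instead of a JSON API response, which usually means the endpoint URL is off. The lakeFS documentation configures fs.lakefs.endpoint with the full API base ending in /api/v1, while the endpoint above ends in /api, so it may be worth trying lakefsEndPoint = 'https://<host>/api/v1'. A quick way to check what a given base URL returns, using the tunnel hostname from the snippet above:
# an API base should answer the unauthenticated healthcheck with an empty 2xx response
curl -i "https://rntlj-151-236-193-133.a.free.pinggy.link/api/v1/healthcheck"

# a URL outside the API routes falls through to the web UI and returns the same HTML
# document that appears in the stack trace
curl -i "https://rntlj-151-236-193-133.a.free.pinggy.link/api"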