Michael Gaebel
08/24/2023, 1:53 PM
An error occurred while calling o128.getDynamicFrame. io/lakefs/iceberg/LakeFSCatalog has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
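For reference, the class-file major version in this error maps directly to a Java release (52 is Java 8, 55 is Java 11). A minimal sketch for checking what a .class file targets by reading only its header bytes (not from the thread; any paths you feed it are your own):

```python
import struct

def class_file_java_version(data: bytes) -> int:
    """Return the Java release a compiled .class file targets."""
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a .class file")
    # Class-file major 52 == Java 8, 55 == Java 11, and so on.
    return major - 44
```

To inspect a jar, unzip it and pass the first 8 bytes of any .class inside to this function; if it reports 11 while your Spark/Glue runtime is on Java 8, you have exactly the mismatch above.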
I've used the same libraries listed in the setup link above (lakefs-iceberg:0.1.2, iceberg-spark-runtime-3.3_2.12:1.3.0), and I've also tried using the same Spark runtime version that was listed as a test dependency for io.lakefs:lakefs-iceberg:0.1.2 in Maven. Am I missing something obvious, or is there a version of the lakefs-iceberg lib that was compiled with Java runtime 52.0?

Rich McLaughlin
08/24/2023, 8:16 PM
HT
08/24/2023, 11:57 PMlakefs_client
python sdk:
File "/path/lakefs_helper.py", line 102, in merge
return self.lakefs.refs.merge_into_branch(
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api/refs_api.py", line 869, in merge_into_branch
return self.merge_into_branch_endpoint.call_with_http_info(**kwargs)
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 835, in call_with_http_info
return self.api_client.call_api(
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 409, in call_api
return self.__call_api(resource_path, method,
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 203, in __call_api
raise e
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 196, in __call_api
response_data = self.request(
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/api_client.py", line 455, in request
return self.rest_client.POST(url,
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/rest.py", line 267, in POST
return self.request("POST", url,
File "/path/venv/lib64/python3.10/site-packages/lakefs_client/rest.py", line 224, in request
raise ServiceException(http_resp=r)
lakefs_client.exceptions.ServiceException: (503)
Reason: Service Unavailable
HTTP response headers: HTTPHeaderDict({'content-length': '118', 'content-type': 'text/plain', 'date': 'Thu, 24 Aug 2023 23:47:57 GMT'})
HTTP response body: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection termination
My helper function that does the merge is:
def merge(self, source_branch, dest_branch, metadata={}, message=None):
    try:
        # Check if there is a need to merge:
        diffs = self.diff(left_ref=dest_branch, right_ref=source_branch)
        if len(diffs) == 0:
            print(f"No changes detected. Skipping merge {source_branch} to {dest_branch}")
            return
        if message:
            merge = models.Merge(message=message, metadata=metadata)
        else:
            merge = models.Merge(metadata=metadata)
        return self.lakefs.refs.merge_into_branch(
            repository=self.repo_name,
            source_ref=source_branch,
            destination_branch=dest_branch,
            merge=merge,
        )
    except Exception as e:
        # traceback.print_exc()
        print(f"ERROR: Failed to merge {source_branch} to {dest_branch}")
        raise e
With self.lakefs defined as:
configuration = lakefs_client.Configuration()
configuration.username = conf["access_key_id"]
configuration.password = conf["secret_access_key"]
configuration.host = conf["endpoint"]
self.lakefs = LakeFSClient(configuration)
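One way to ride out transient 503s like the one above is a small retry wrapper around the merge call; a minimal sketch (the helper name, attempt count, and backoff values are illustrative, not part of the SDK):

```python
import time

def call_with_retry(fn, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on the given exception type(s) with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            sleep(base_delay * 2 ** attempt)
```

For example: `call_with_retry(lambda: self.lakefs.refs.merge_into_branch(...), retry_on=lakefs_client.exceptions.ServiceException)`. This only helps if the gateway resets are transient; a persistent 503 points at the server or the proxy in front of it.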
But then if I do the merge of those two exact same branches via the UI, it succeeds!
Our self-deployed server is running 107.0, and the lakefs_client is 0.107.0.

Bertrand Gallice
08/28/2023, 9:10 AM
$ lakectl import \
    --from s3://jaffle_shop_repo/raw_payments.csv \
    --to lakefs://jaffle-shop-repo/main/raw_payments.csv
Import failed: import error: error on ingest: NoSuchKey:
status code: 404, request id: tx00000000000000a72c046-0064ec5ed8-b6658b2f-fra1b, host id:
The lakeFS repo uses s3://jaffle_shop_repo as the storage namespace, and is located at the root of the S3 bucket.
It doesn't seem like a credentials problem, since the lakectl fs upload command works to add data to the repo, and writes this data in the S3 storage repo's folder.
I suspect that it's just a URI path problem; I've tried several variations but none seem to work.
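For what it's worth, a NoSuchKey 404 on ingest usually means the object store could not find the source key as given. It can help to double-check how the --from URI splits into bucket and key; a sketch of S3's own resolution (not lakectl's actual parser):

```python
from urllib.parse import urlparse

def split_s3_uri(uri: str):
    """Split an s3:// URI into (bucket, key), the way S3 itself resolves it."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")
```

So `s3://jaffle_shop_repo/raw_payments.csv` asks for key `raw_payments.csv` in bucket `jaffle_shop_repo`; if the repo's storage namespace is instead a prefix inside some other bucket, that key will not exist and S3 returns exactly this 404.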
Any idea on what could be wrong?

Taha Sadiki
08/29/2023, 12:27 PM
Dieu M. Nguyen
08/29/2023, 9:01 PM
The to_zarr() step is getting errors.
import dask.array as da
import xarray as xr
import numpy as np
import lakefs_client
from lakefs_client import models
from lakefs_client.client import LakeFSClient
configuration = lakefs_client.Configuration()
configuration.username = access_key_id
configuration.password = secret_access_key
configuration.host = endpoint_url
client = LakeFSClient(configuration)
repo = "zarr-test"
branch = "zarr-store"
client.branches.create_branch(
    repository=repo,
    branch_creation=models.BranchCreation(
        name=branch,
        source="main"))
# Create random array
state = da.random.RandomState(1234)
shape = (180, 360, 400)
chunk_shape = (36, 72, 200)
nlats, nlons, ntimes = shape
arr = state.random(shape, chunks=chunk_shape)
ds = xr.Dataset(
    data_vars={
        "precipitation": xr.DataArray(arr, dims=('lat', 'lon', 'time'))
    },
    coords={
        "lat": xr.DataArray(np.linspace(-90, 90, num=nlats, endpoint=False), dims='lat'),
        "lon": xr.DataArray(np.linspace(-180, 180, num=nlons, endpoint=False), dims='lon'),
        "time": xr.date_range(start="2000-06-01", freq="D", periods=ntimes)
    },
    attrs={
        "description": "GPM IMERG test dataset"
    }
)
# Write the first 200 time slices
ds_0 = ds.isel(time=slice(0, 200))
s3a_gateway_path = f's3a://{repo}/{branch}/precipitation_data.zarr'
task = ds_0.to_zarr(s3a_gateway_path,
                    # zarr_version=3,
                    mode='w',
                    compute=False)
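One hypothesis worth testing: the s3a:// scheme is Hadoop's convention, while xarray's to_zarr goes through fsspec/s3fs, which typically needs explicit storage_options (credentials plus the lakeFS endpoint) to reach the S3 gateway; without them, requests go to real AWS and fail with Access Denied. A small helper to build both pieces (the endpoint and credential names are placeholders):

```python
def lakefs_zarr_target(repo, branch, path, endpoint_url, key, secret):
    """Build a to_zarr URL plus fsspec storage_options for the lakeFS S3 gateway."""
    url = f"s3://{repo}/{branch}/{path}"
    storage_options = {
        "key": key,
        "secret": secret,
        "client_kwargs": {"endpoint_url": endpoint_url},
    }
    return url, storage_options
```

Then: `url, opts = lakefs_zarr_target(repo, branch, "precipitation_data.zarr", endpoint_url, access_key_id, secret_access_key)` followed by `ds_0.to_zarr(url, mode='w', storage_options=opts, compute=False)`.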
Without zarr_version, I get PermissionError: Access Denied. If I set zarr_version=3, I get KeyError: 'zarr.json'. Maybe I am setting the s3a_gateway_path incorrectly?

lingyu zhang
08/31/2023, 12:18 PM
…the sizeBytes and modifiedTime of a file when comparing local files with remote commits. However, I believe there may be some risks associated with this approach. For instance, if two clients have different system times and both edit the same file, let's say a.txt, the sizeBytes and modifiedTime could be the same on both clients, but the contents are different. Consequently, when I commit the changes on client A and then pull from the remote on client B, the modifications could be lost. So here are my questions:
1. Do you have any evidence or use cases to prove this situation is rare?
2. What is the probability of encountering this risk? Have any tests been conducted?
3. Why do we only use second-level precision for the modifiedTime and not something more precise like nanoseconds? (Unix time in nanoseconds cannot be represented by an int64 only for dates prior to the year 1678 or after 2262.)
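To make the collision concern concrete: size plus mtime is a heuristic, while a content digest distinguishes files that happen to share both. A sketch of a digest-based comparison (illustrative only; this is not how lakeFS diffs today):

```python
import hashlib

def same_content(path_a, path_b, chunk_size=1 << 20):
    """Compare two files by SHA-256 digest instead of sizeBytes + modifiedTime."""
    def digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                h.update(block)
        return h.digest()
    return digest(path_a) == digest(path_b)
```

The trade-off is cost: hashing reads every byte of every file, whereas stat-based comparison is a single metadata call, which is presumably why sync tools accept the small collision risk.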
Thanks a lot! :)

Justin Pottenger
09/02/2023, 5:52 PM
INFO [2023-09-02T17:23:06Z]build/pkg/auth/service.go:189 pkg/auth.NewAuthService initialized Auth service service=auth_service
WARNING[2023-09-02T17:23:06Z]build/pkg/cloud/aws/metadata.go:64 pkg/cloud/aws.(*MetadataProvider).GetMetadata.func1 Tried to to get AWS account ID for BI error="InvalidParameterValue: Unsupported action GetCallerIdentity\n\tstatus code: 400, request id: 178124CA4D3D18C0"
Digging into the code, it appears that the AWS package is trying to find the email of the AWS account, and since MinIO doesn't have one, it fails. Relevant code is here and here.
Before I fork LakeFS and try to dig in myself, I thought I would see if anyone here had any ideas on a more elegant (aka supported) solution?
TLDR: LakeFS + MinIO + headless = not supported?

Florentino Sainz
09/04/2023, 8:24 AM
Truce Wallace
09/04/2023, 9:47 PM
Oh my! An error occurred: Unauthorized
when trying to register

Truce Wallace
09/05/2023, 8:29 AM
Dieu M. Nguyen
09/05/2023, 8:37 PM
…the spark-submit command as directed by the documentation. As far as I can tell, the command ran and finished without errors. But in S3, I don't see the list of objects removed in _lakefs/retention/gc/unified/<RUN_ID>/deleted/, and my storage didn't go down, so I assume objects from old versions haven't been deleted. Do you have any ideas about this? Note: I just set the GC policy after already writing all my versions - is this why?

HT
09/06/2023, 1:31 AM
Yaphet Kebede
09/06/2023, 5:09 PM
Prakash Kumar
09/08/2023, 4:11 PM
Prakash Kumar
09/08/2023, 10:48 PM
Prakash Kumar
09/08/2023, 10:48 PM
Tongguo Pang
09/10/2023, 1:14 PM
Natanael de Sousa Neto
09/11/2023, 1:40 PM
Cristian Caloian
09/12/2023, 9:30 AM
aws --profile lakefs s3 ls s3://<bucket>/<repo> --recursive
I have configured the lakefs profile with credentials from lakeFS, and the corresponding endpoint URL. I also have AuthFullAccess, FSFullAccess, and RepoManagementFullAccess permissions. However, I still get the following error:
An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Forbidden
Do you have an idea what I am missing?

Nicolas Gibaud
09/17/2023, 3:09 PM
SELECT * FROM READ_PARQUET('lakefs://quickstart/main/lakes.parquet');
I got an error: Error: Invalid Input Error: No magic bytes found at end of file 'lakefs://quickstart/main/lakes.parquet'
Do you know what causes this and how to fix it?

HT
09/19/2023, 1:08 AM
…f1 in commit c1, then I delete f1 in commit c2. Further commits happen unrelated to f1. My branch head is at c8 (and without f1).
f1 is a committed file but not in the branch head. Does that mean it will be deleted if I run garbage collection one year later (e.g. beyond any retention rule)?

Neil Murray
09/19/2023, 10:53 AM
Sivan Bercovici
09/19/2023, 1:18 PM
Ariel Shaqed (Scolnicov)
09/19/2023, 1:41 PM
> Is there any limit to the size of a file uploaded via the web client?
There should be no limit. Obviously for more than a few hundred MiB you'll want to do something more reliable; lakeFS is also accessible using the S3 protocol, where you can even use multipart uploads.
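To illustrate the multipart route: an S3 client splits a large upload into fixed-size parts and uploads them independently. A sketch of the part arithmetic only (illustrative; real clients such as boto3, pointed at the lakeFS endpoint via endpoint_url, handle this automatically in upload_file):

```python
def part_ranges(total_size, part_size=8 * 1024 * 1024):
    """Byte ranges (start, end) covering a multipart upload of total_size bytes."""
    return [(start, min(start + part_size, total_size))
            for start in range(0, total_size, part_size)]
```

Each range becomes one UploadPart call, so a failed part can be retried on its own instead of restarting the whole transfer, which is what makes this more reliable than one large PUT.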
Ariel Shaqed (Scolnicov)
09/19/2023, 1:44 PM
> Is there any way to label or attach properties to data files (other than tagging)?
Yes, but! Objects support user metadata; this is a tested part of the API. However, there is no current UI support to access it. Currently it only gets used to store the MIME type. But if you need it, please make sure to ask us for it!
Ariel Shaqed (Scolnicov)
09/19/2023, 1:45 PM
> Any limit to number of concurrent users?
Nope. I don't think the lakeFS server even has a concept of "active users", other than just caching some properties.
Eirik Knævelsrud
09/20/2023, 9:59 AM
HT
09/21/2023, 1:33 AM
Gary Mclaughlin
09/21/2023, 2:36 PM