# help

jwshin

01/05/2024, 6:00 AM
Hi there, I encountered an issue and was wondering if you have a moment to help? The problem is as follows: I uploaded sample data of more than 5MB to a lakeFS repository and got an "s3 error: missing etag" message. It seems that exposing the ETag header in the CORS configuration should resolve this, but I'm not quite sure how to do it. Could you provide some guidance on how to address this?
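For context, the "missing etag" symptom on browser uploads is commonly tied to the object store's CORS rules not exposing the ETag response header. Below is a minimal sketch of adding such a rule through boto3 against the backing S3-compatible store; the endpoint, credentials, and bucket name are placeholders, and whether this alone resolves the error depends on the setup.
Copy code
import boto3

# S3 client pointed at the backing object store (placeholder endpoint and credentials)
s3 = boto3.client(
    's3',
    endpoint_url='http://10.3.1.153:8080',
    aws_access_key_id='<access-key>',
    aws_secret_access_key='<secret-key>',
)

# Expose the ETag header so browser-based multipart uploads can read it from responses
s3.put_bucket_cors(
    Bucket='example-bucket',
    CORSConfiguration={
        'CORSRules': [{
            'AllowedMethods': ['GET', 'PUT', 'POST'],
            'AllowedOrigins': ['*'],
            'AllowedHeaders': ['*'],
            'ExposeHeaders': ['ETag'],
        }]
    },
)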

Jonathan Rosenberg

01/05/2024, 7:26 AM
Hi @jwshin, let me have a look… In the meantime, did you enable the pre-signed URL option in lakeFS?

jwshin

01/05/2024, 7:28 AM
Are you asking whether I've enabled the pre-signed URL option described in this guide? https://docs.lakefs.io/reference/security/presigned-url.html

Jonathan Rosenberg

01/05/2024, 7:28 AM
If you enabled it at all
and also what is the client you use to upload the object?

jwshin

01/05/2024, 7:36 AM
"I didn't enable it separately. The configurations I've set are identical to the attached config file below. The client used to upload objects is Ceph."
Copy code
---
logging:
  format: json
  level: DEBUG
  output: "-"

database:
  type: "postgres"
  postgres:
    connection_string: "postgres://lakefs:pass@10.3.1.151:5432/lakefs"

blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: <http://10.3.1.153:8080>
    discover_bucket_region: false
    credentials:
      access_key_id: P4MSQ29IG3M5GZ9PA68C
      secret_access_key: eU179WNSgbAqMLbCVBG0pV1bRR7U4fOkgoRFGeiP
"Could I set
blockstore.s3.disable_pre_signed_ui=true
to enable
blockstore.blockstore-name.disable_pre_signed_ui
as recommended in the guide?"
Copy code
podman run --name lakefs-2 -d -p 8000:8000 \
           -e LAKEFS_BLOCKSTORE_TYPE=s3 \
           -e LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true \
           -e LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://10.3.1.153:8080 \
           -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=P4MSQ29IG3M5GZ9PA68C \
           -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=eU179WNSgbAqMLbCVBG0pV1bRR7U4fOkgoRFGeiP \
           -v /mnt/lakefs/lakefs.yaml:/etc/lakefs/config.yaml \
           treeverse/lakefs:latest \
           run --local-settings
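For reference, a sketch of where the pre-signed settings would sit in the config file, assuming the blockstore.<blockstore-name>.disable_pre_signed and blockstore.<blockstore-name>.disable_pre_signed_ui keys described in the pre-signed URL guide linked above; the values shown are illustrative rather than a confirmed fix.
Copy code
blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: http://10.3.1.153:8080
    discover_bucket_region: false
    # pre-signed URL behaviour (key names per the pre-signed URL guide)
    disable_pre_signed: false     # allow lakeFS to generate pre-signed URLs
    disable_pre_signed_ui: false  # allow the UI to use pre-signed URLs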

Jonathan Rosenberg

01/05/2024, 7:41 AM
blockstore.s3.disable_pre_signed_ui=true is the default value, so you don't need to set it explicitly. Ok, let me take a look.

jwshin

01/05/2024, 7:43 AM
Thank you. I'll wait for your response.

Jonathan Rosenberg

01/05/2024, 7:46 AM
👍

jwshin

01/05/2024, 8:03 AM
I have confirmed that the upload works correctly using the lakefs-client-upload.py script below. However, when using the lakeFS UI, I encounter the "s3 error: missing ETag" error, and the upload does not reach the lakeFS repository. Uploading through the UI stores data in the Ceph S3 bucket, but it does not show up in the lakeFS repository. Please review and confirm.
Copy code
import boto3

# lakeFS S3 endpoint and credentials
lakefs_endpoint = "http://10.3.1.151:8000"
lakefs_access_key = "AKIAJPMCBNMMSR4RIR2Q"
lakefs_secret_key = "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"

# Create the S3 client
s3 = boto3.client(
    's3',
    endpoint_url=lakefs_endpoint,
    aws_access_key_id=lakefs_access_key,
    aws_secret_access_key=lakefs_secret_key,
)

# File to upload
bucket_name = 'tes444'  # lakeFS repository name
object_key = 'main/test-file2.dmg'  # S3 object key for the file (branchName + / + filePath)
file_path = '/Users/jwshin/test-file2.dmg'  # local path of the file to upload

# Start the multipart upload
upload_id = s3.create_multipart_upload(Bucket=bucket_name, Key=object_key)['UploadId']

# Upload the file in multiple parts
part_number = 1
etags = []
with open(file_path, 'rb') as file:
    while True:
        data = file.read(5 * 1024 * 1024)  # read and upload the file in 5MB chunks
        if not data:
            break
        response = s3.upload_part(
            Bucket=bucket_name,
            Key=object_key,
            PartNumber=part_number,
            UploadId=upload_id,
            Body=data,
        )
        print(f"Uploaded Part {part_number}")
        etags.append(response['ETag'])  # an ETag is returned for each part
        part_number += 1

# Complete the multipart upload
parts = [{'PartNumber': i, 'ETag': etags[i-1]} for i in range(1, part_number)]

for obj in parts:
    print(f"- PartNumber: {obj['PartNumber']}, ETag: {obj['ETag']}")

s3.complete_multipart_upload(
    Bucket=bucket_name,
    Key=object_key,
    UploadId=upload_id,
    MultipartUpload={'Parts': parts},
)

print(f"File uploaded to {lakefs_endpoint}/{bucket_name}/{object_key}")

Barak Amar

01/05/2024, 8:25 AM
Hi @jwshin, the Python script uses lakeFS's S3 protocol support, while the UI uses the lakeFS API to upload the data. The lakeFS log and the UI network tab may hold more information about why the upload fails. From your description it looks like lakeFS manages to upload the data in this case, but doesn't get the expected response format from the storage and fails to extract the ETag when the upload completes. Can you write down what you see in the lakeFS log while trying to upload the file from the UI?

jwshin

01/05/2024, 8:35 AM
Hi @Barak Amar, where should I send the lakeFS logs?

Barak Amar

01/05/2024, 8:43 AM
Here is fine. Just a couple of lines from around the time you uploaded the file.
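A sketch of pulling the relevant lines from the container started with the podman command above; the container name comes from that command and the time window is illustrative.
Copy code
# Grab recent lakeFS log lines from around the failed UI upload
podman logs --since 10m lakefs-2 2>&1 | grep -i "etag"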