# help

jwshin

01/05/2024, 6:00 AM
Hi there, I encountered an issue and was wondering if you have a moment to help? The problem is as follows: I uploaded sample data of more than 5MB to a lakeFS repository and got an "s3 error: missing etag" message. It seems that exposing the ETag header in the CORS configuration should resolve this, but I'm not quite sure how to do it. Could you provide some guidance on how to address this?
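For context, the "missing etag" symptom on browser uploads is commonly tied to the object store's CORS rules not exposing the ETag response header. Below is a minimal sketch of adding such a rule through boto3 against the backing S3-compatible store; the endpoint, credentials, and bucket name are placeholders, and whether this alone resolves the error depends on the setup.
Copy code
import boto3

# S3 client pointed at the backing object store (placeholder endpoint and credentials)
s3 = boto3.client(
    's3',
    endpoint_url='http://10.3.1.153:8080',
    aws_access_key_id='<access-key>',
    aws_secret_access_key='<secret-key>',
)

# Expose the ETag header so browser-based multipart uploads can read it from responses
s3.put_bucket_cors(
    Bucket='example-bucket',
    CORSConfiguration={
        'CORSRules': [{
            'AllowedMethods': ['GET', 'PUT', 'POST'],
            'AllowedOrigins': ['*'],
            'AllowedHeaders': ['*'],
            'ExposeHeaders': ['ETag'],
        }]
    },
)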

Jonathan Rosenberg

01/05/2024, 7:26 AM
Hi @jwshin, let me have a look… In the meantime, did you enable the pre-signed URL option in lakeFS?

jwshin

01/05/2024, 7:28 AM
Are you asking whether I've enabled the pre-signed URL option described in this guide? https://docs.lakefs.io/reference/security/presigned-url.html

Jonathan Rosenberg

01/05/2024, 7:28 AM
If you enabled it at all
and also what is the client you use to upload the object?

jwshin

01/05/2024, 7:36 AM
"I didn't enable it separately. The configurations I've set are identical to the attached config file below. The client used to upload objects is Ceph."
Copy code
---
logging:
  format: json
  level: DEBUG
  output: "-"

database:
  type: "postgres"
  postgres:
    connection_string: "postgres://lakefs:pass@10.3.1.151:5432/lakefs"

blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: <http://10.3.1.153:8080>
    discover_bucket_region: false
    credentials:
      access_key_id: P4MSQ29IG3M5GZ9PA68C
      secret_access_key: eU179WNSgbAqMLbCVBG0pV1bRR7U4fOkgoRFGeiP
"Could I set
blockstore.s3.disable_pre_signed_ui=true
to enable
blockstore.blockstore-name.disable_pre_signed_ui
as recommended in the guide?"
Copy code
podman run --name lakefs-2 -d -p 8000:8000 \
           -e LAKEFS_BLOCKSTORE_TYPE=s3 \
           -e LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true \
           -e LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://10.3.1.153:8080 \
           -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=P4MSQ29IG3M5GZ9PA68C \
           -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=eU179WNSgbAqMLbCVBG0pV1bRR7U4fOkgoRFGeiP \
           -v /mnt/lakefs/lakefs.yaml:/etc/lakefs/config.yaml \
           treeverse/lakefs:latest \
           run --local-settings
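For reference, a sketch of where the pre-signed settings would sit in the config file, assuming the blockstore.<blockstore-name>.disable_pre_signed and blockstore.<blockstore-name>.disable_pre_signed_ui keys described in the pre-signed URL guide linked above; the values shown are illustrative rather than a confirmed fix.
Copy code
blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: http://10.3.1.153:8080
    discover_bucket_region: false
    # pre-signed URL behaviour (key names per the pre-signed URL guide)
    disable_pre_signed: false     # allow lakeFS to generate pre-signed URLs
    disable_pre_signed_ui: false  # allow the UI to use pre-signed URLs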

Jonathan Rosenberg

01/05/2024, 7:41 AM
blockstore.s3.disable_pre_signed_ui=true is the default value, so you don't need to set it explicitly. Ok, let me take a look.

jwshin

01/05/2024, 7:43 AM
Thank you. I'll wait for your response.

Jonathan Rosenberg

01/05/2024, 7:46 AM
👍

jwshin

01/05/2024, 8:03 AM
I have confirmed that the upload works correctly using the lakefs-client-upload.py script below. However, when using the lakeFS UI, I encounter the "s3 error: missing ETag" error, and the upload does not reach the lakeFS repository. Uploading through the UI stores data in the Ceph S3 bucket, but it does not show up in the lakeFS repository. Please review and confirm.
Copy code
import boto3

# lakeFS S3 endpoint and credentials
lakefs_endpoint = "http://10.3.1.151:8000"
lakefs_access_key = "AKIAJPMCBNMMSR4RIR2Q"
lakefs_secret_key = "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"

# Create the S3 client
s3 = boto3.client(
    's3',
    endpoint_url=lakefs_endpoint,
    aws_access_key_id=lakefs_access_key,
    aws_secret_access_key=lakefs_secret_key,
)

# File to upload
bucket_name = 'tes444'  # lakeFS repository name
object_key = 'main/test-file2.dmg'  # S3 object key for the file (branchName + / + filePath)
file_path = '/Users/jwshin/test-file2.dmg'  # local path of the file to upload

# Start the multipart upload
upload_id = s3.create_multipart_upload(Bucket=bucket_name, Key=object_key)['UploadId']

# Upload the file in multiple parts
part_number = 1
etags = []
with open(file_path, 'rb') as file:
    while True:
        data = file.read(5 * 1024 * 1024)  # read and upload the file in 5MB chunks
        if not data:
            break
        response = s3.upload_part(
            Bucket=bucket_name,
            Key=object_key,
            PartNumber=part_number,
            UploadId=upload_id,
            Body=data,
        )
        print(f"Uploaded Part {part_number}")
        etags.append(response['ETag'])  # an ETag is returned for each part
        part_number += 1

# Complete the multipart upload
parts = [{'PartNumber': i, 'ETag': etags[i-1]} for i in range(1, part_number)]

for obj in parts:
    print(f"- PartNumber: {obj['PartNumber']}, ETag: {obj['ETag']}")

s3.complete_multipart_upload(
    Bucket=bucket_name,
    Key=object_key,
    UploadId=upload_id,
    MultipartUpload={'Parts': parts},
)

print(f"File uploaded to {lakefs_endpoint}/{bucket_name}/{object_key}")

Barak Amar

01/05/2024, 8:25 AM
Hi @jwshin, the Python script uses lakeFS's S3 protocol support, while the UI uses the lakeFS API to upload the data. The lakeFS log and the UI network tab may hold more information about why the upload fails. From your description it looks like lakeFS manages to upload the data in this case, but doesn't get the expected response format from the storage and fails to extract the ETag when the upload completes. Can you write down what you see in the lakeFS log while trying to upload the file from the UI?

jwshin

01/05/2024, 8:35 AM
Hi @Barak Amar, where should I send the lakeFS logs?

Barak Amar

01/05/2024, 8:43 AM
Here is fine. Just a couple of lines from around the time you uploaded the file.
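A sketch of pulling the relevant lines from the container started with the podman command above; the container name comes from that command and the time window is illustrative.
Copy code
# Grab recent lakeFS log lines from around the failed UI upload
podman logs --since 10m lakefs-2 2>&1 | grep -i "etag"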