# help
j
Hi there, I've encountered an issue and was wondering if you have a moment to help. The problem I'm facing is as follows: I uploaded sample data larger than 5MB to a lakeFS repository and got an "s3 error: missing etag" message. It seems that adding the ETag option to the CORS header should resolve this, but I'm not quite sure how to do it. Could you provide some guidance or point me to a guide on how to address this?
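(For context, here is a minimal sketch of what "adding the ETag option to the CORS header" could look like on the backing storage bucket, applied with boto3. The endpoint, credentials, bucket name, and allowed origin below are placeholders rather than values from this thread, and this only matters when the browser talks to the storage directly, i.e. when pre-signed URL uploads are enabled.)

```python
import boto3

# Placeholder Ceph RGW endpoint, credentials, and bucket; replace with your own.
s3 = boto3.client(
    "s3",
    endpoint_url="http://ceph-rgw.example:8080",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Expose the ETag response header to browsers so multipart uploads issued from
# the UI via pre-signed URLs can read it from each part's upload response.
s3.put_bucket_cors(
    Bucket="lakefs-storage-bucket",
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["http://lakefs.example:8000"],
                "AllowedMethods": ["GET", "PUT", "POST"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)
```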
j
Hi @jwshin, let me have a look… In the meantime, did you enable the pre-signed URL option in lakeFS?
j
Are you asking whether I have enabled the pre-signed URL option described in this guide? https://docs.lakefs.io/reference/security/presigned-url.html
j
Whether you enabled it at all.
And also, which client do you use to upload the object?
j
"I didn't enable it separately. The configurations I've set are identical to the attached config file below. The client used to upload objects is Ceph."
---
logging:
  format: json
  level: DEBUG
  output: "-"

database:
  type: "postgres"
  postgres:
    connection_string: "postgres://lakefs:pass@10.3.1.151:5432/lakefs"

blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: http://10.3.1.153:8080
    discover_bucket_region: false
    credentials:
      access_key_id: P4MSQ29IG3M5GZ9PA68C
      secret_access_key: eU179WNSgbAqMLbCVBG0pV1bRR7U4fOkgoRFGeiP
"Could I set
blockstore.s3.disable_pre_signed_ui=true
to enable
blockstore.blockstore-name.disable_pre_signed_ui
as recommended in the guide?"
podman run --name lakefs-2 -d -p 8000:8000 \
           -e LAKEFS_BLOCKSTORE_TYPE=s3 \
           -e LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true \
           -e LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://10.3.1.153:8080 \
           -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=P4MSQ29IG3M5GZ9PA68C \
           -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=eU179WNSgbAqMLbCVBG0pV1bRR7U4fOkgoRFGeiP \
           -v /mnt/lakefs/lakefs.yaml:/etc/lakefs/config.yaml \
           treeverse/lakefs:latest \
           run --local-settings
j
blockstore.s3.disable_pre_signed_ui=true is the default value, so you don't need to set it explicitly. OK, let me take a look.
j
Thank you. I'll wait for your response.
j
👍
πŸ‘ 1
j
I have confirmed that the upload works correctly using the lakefs-client-upload.py script below. However, when using the lakeFS UI, I encounter the "s3 error: missing ETag" error and the upload does not reach the lakeFS repository. Uploading through the UI results in data being stored in the Ceph S3 bucket, but the object does not appear in the lakeFS repository. Please review and confirm.
import boto3

# lakeFS S3 endpoint and credentials
lakefs_endpoint = "http://10.3.1.151:8000"
lakefs_access_key = "AKIAJPMCBNMMSR4RIR2Q"
lakefs_secret_key = "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"

# Create the S3 client
s3 = boto3.client(
    's3',
    endpoint_url=lakefs_endpoint,
    aws_access_key_id=lakefs_access_key,
    aws_secret_access_key=lakefs_secret_key,
)

# File to upload
bucket_name = 'tes444'  # lakeFS repository name
object_key = 'main/test-file2.dmg'  # S3 object key of the upload (branchName + '/' + filePath)
file_path = '/Users/jwshin/test-file2.dmg'  # local path of the file to upload

# Start the multipart upload
upload_id = s3.create_multipart_upload(Bucket=bucket_name, Key=object_key)['UploadId']

# Split the file into parts and upload each one
part_number = 1
etags = []
with open(file_path, 'rb') as file:
    while True:
        data = file.read(5 * 1024 * 1024)  # upload in 5MB chunks
        if not data:
            break
        response = s3.upload_part(
            Bucket=bucket_name,
            Key=object_key,
            PartNumber=part_number,
            UploadId=upload_id,
            Body=data,
        )
        print(f"Uploaded Part {part_number}")
        etags.append(response['ETag'])  # an ETag is returned for each part
        part_number += 1

# Complete the multipart upload
parts = [{'PartNumber': i, 'ETag': etags[i-1]} for i in range(1, part_number)]

for obj in parts:
    print(f"- PartNumber: {obj['PartNumber']}, ETag: {obj['ETag']}")

s3.complete_multipart_upload(
    Bucket=bucket_name,
    Key=object_key,
    UploadId=upload_id,
    MultipartUpload={'Parts': parts},
)

print(f"File uploaded to {lakefs_endpoint}/{bucket_name}/{object_key}")
b
Hi @jwshin, the Python script uses lakeFS's S3 protocol support, while the UI uses the lakeFS API to upload the data. The lakeFS log and the UI network tab may hold more information about why the upload fails. From the description above, it looks like lakeFS manages to upload the data in this case, but doesn't get the expected response format from the storage and fails to extract the ETag when the upload completes. Can you share what you see in the lakeFS log while trying to upload the file from the UI?
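(One quick way to test that hypothesis is to check whether Ceph returns an ETag at all when addressed directly, bypassing lakeFS. A minimal sketch, reusing the Ceph endpoint and credentials from the blockstore config earlier in the thread; the bucket and key names are placeholders:)

```python
import boto3

# Talk to Ceph RGW directly, using the endpoint/credentials from the lakeFS blockstore config above.
ceph = boto3.client(
    "s3",
    endpoint_url="http://10.3.1.153:8080",
    aws_access_key_id="P4MSQ29IG3M5GZ9PA68C",
    aws_secret_access_key="eU179WNSgbAqMLbCVBG0pV1bRR7U4fOkgoRFGeiP",
)

# Placeholder bucket/key just for the check.
resp = ceph.put_object(Bucket="lakefs-data", Key="etag-check", Body=b"hello")
print("PUT ETag:", resp.get("ETag"))  # a missing or empty ETag here would explain the lakeFS error
```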
j
Hi @Barak Amar, where should I submit the lakeFS logs?
b
Here, just a couple of lines from around the time you uploaded the file.