jwshin
01/05/2024, 6:00 AM

Jonathan Rosenberg
01/05/2024, 7:26 AM

jwshin
01/05/2024, 7:28 AM

Jonathan Rosenberg
01/05/2024, 7:28 AM

Jonathan Rosenberg
01/05/2024, 7:33 AM

jwshin
01/05/2024, 7:36 AM
---
logging:
  format: json
  level: DEBUG
  output: "-"
database:
  type: "postgres"
  postgres:
    connection_string: "postgres://lakefs:pass@10.3.1.151:5432/lakefs"
blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: http://10.3.1.153:8080
    discover_bucket_region: false
    credentials:
      access_key_id: P4MSQ29IG3M5GZ9PA68C
      secret_access_key: eU179WNSgbAqMLbCVBG0pV1bRR7U4fOkgoRFGeiP
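(A minimal sketch, not from the thread, to sanity-check that the Ceph RGW endpoint and credentials in the config above are usable with boto3; the addressing_style option mirrors force_path_style: true.)

import boto3
from botocore.config import Config

# Endpoint and keys copied from the blockstore section of the YAML above.
s3 = boto3.client(
    "s3",
    endpoint_url="http://10.3.1.153:8080",
    aws_access_key_id="P4MSQ29IG3M5GZ9PA68C",
    aws_secret_access_key="eU179WNSgbAqMLbCVBG0pV1bRR7U4fOkgoRFGeiP",
    config=Config(s3={"addressing_style": "path"}),  # path-style, like force_path_style: true
)
print(s3.list_buckets()["Buckets"])  # should list the buckets RGW exposes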
jwshin
01/05/2024, 7:39 AM
Should I set blockstore.s3.disable_pre_signed_ui=true to enable blockstore.<blockstore-name>.disable_pre_signed_ui, as recommended in the guide?
podman run --name lakefs-2 -d -p 8000:8000 \
  -e LAKEFS_BLOCKSTORE_TYPE=s3 \
  -e LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true \
  -e LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://10.3.1.153:8080 \
  -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=P4MSQ29IG3M5GZ9PA68C \
  -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=eU179WNSgbAqMLbCVBG0pV1bRR7U4fOkgoRFGeiP \
  -v /mnt/lakefs/lakefs.yaml:/etc/lakefs/config.yaml \
  treeverse/lakefs:latest \
  run --local-settings
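(A minimal sketch to verify the container came up, assuming lakeFS's healthcheck endpoint at /api/v1/healthcheck, which is expected to return HTTP 204 when the server is healthy.)

import urllib.request

# Port 8000 is the one published by the podman command above.
with urllib.request.urlopen("http://localhost:8000/api/v1/healthcheck") as resp:
    print(resp.status)  # 204 means lakeFS is up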
Jonathan Rosenberg
01/05/2024, 7:41 AM
blockstore.s3.disable_pre_signed_ui=true is the default value, so you don't need to explicitly set it.
Ok, let me take a look

jwshin
01/05/2024, 7:43 AM

Jonathan Rosenberg
01/05/2024, 7:46 AM

jwshin
01/05/2024, 8:03 AM
Uploads work with the lakefs-client-upload.py script below. However, when using the lakeFS UI, I encounter the "s3 error: missing ETag" error, and the upload does not reach the lakeFS repository: uploading through the UI stores data in the Ceph S3 bucket, but the object never appears in the lakeFS repository. Please review and confirm.
import boto3

# lakeFS S3-gateway endpoint and credentials
lakefs_endpoint = "http://10.3.1.151:8000"
lakefs_access_key = "AKIAJPMCBNMMSR4RIR2Q"
lakefs_secret_key = "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"

# Create the S3 client
s3 = boto3.client(
    's3',
    endpoint_url=lakefs_endpoint,
    aws_access_key_id=lakefs_access_key,
    aws_secret_access_key=lakefs_secret_key,
)

# File to upload
bucket_name = 'tes444'                      # lakeFS bucket (repository) name
object_key = 'main/test-file2.dmg'          # S3 object key of the file to upload (branchName + '/' + filePath)
file_path = '/Users/jwshin/test-file2.dmg'  # local path of the file to upload

# Start the multipart upload
upload_id = s3.create_multipart_upload(Bucket=bucket_name, Key=object_key)['UploadId']

# Upload the file in multiple parts
part_number = 1
etags = []
with open(file_path, 'rb') as file:
    while True:
        data = file.read(5 * 1024 * 1024)  # upload in 5 MB chunks
        if not data:
            break
        response = s3.upload_part(
            Bucket=bucket_name,
            Key=object_key,
            PartNumber=part_number,
            UploadId=upload_id,
            Body=data,
        )
        print(f"Uploaded Part {part_number}")
        etags.append(response['ETag'])  # an ETag is returned per part
        part_number += 1

# Complete the multipart upload
parts = [{'PartNumber': i, 'ETag': etags[i-1]} for i in range(1, part_number)]
for obj in parts:
    print(f"- PartNumber: {obj['PartNumber']}, ETag: {obj['ETag']}")
s3.complete_multipart_upload(
    Bucket=bucket_name,
    Key=object_key,
    UploadId=upload_id,
    MultipartUpload={'Parts': parts},
)
print(f"File uploaded to {lakefs_endpoint}/{bucket_name}/{object_key}")
Barak Amar

jwshin
01/05/2024, 8:35 AM

Barak Amar