I have some problems with the Python SDK when I try to upload… (lakeFS #help)

Sebastian Haugg

04/25/2024, 12:39 PM

I have some problems with the Python SDK when I try to upload a file to lakeFS with metadata passed as a dict. When I check the metadata in the web UI, or read it back with the Python SDK, only the first entry of the metadata is present. I get the same result whether I use the `upload` or the `writer` method on the object. I'm not sure whether the problem is on my side or with lakeFS. Here is the code I'm using at the moment:

```python
# meta_data is a dict whose keys and values have already been converted to str
with open(filename, "r+b") as f:
    with self.branch_data.object(path=self._string_builder()).writer(mode="wb", metadata=meta_data) as writer:
        writer.write(f.read())
```

The dict's entries are converted to `str: str` (string keys and string values) beforehand.

Niro

04/25/2024, 12:44 PM

@Sebastian Haugg Hi! Why are you converting the dict to a string?

Sebastian Haugg

04/25/2024, 12:46 PM

No, sorry for not making that clear: I'm converting the entries to strings beforehand. For example, `{"a": 1, "b": 0.01, "c": "text"}` would be converted to `{"a": "1", "b": "0.01", "c": "text"}`.
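The conversion described above can be sketched as a single dict comprehension. `stringify_metadata` is a hypothetical helper name used only for illustration; it is not part of the lakeFS SDK.

```python
from typing import Any, Mapping


def stringify_metadata(meta: Mapping[str, Any]) -> dict[str, str]:
    """Return a copy of `meta` with every key and value converted to str."""
    return {str(key): str(value) for key, value in meta.items()}


print(stringify_metadata({"a": 1, "b": 0.01, "c": "text"}))
# → {'a': '1', 'b': '0.01', 'c': 'text'}
```

Booleans become `'True'`/`'False'` the same way, which matches the `'replace_zeros': 'True'` entry printed later in the thread.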

Niro

04/25/2024, 12:58 PM

Which SDK version are you currently using?

Niro

04/25/2024, 12:59 PM

Can you please try printing the `meta_data` parameter before running the `writer` line?

Sebastian Haugg

04/25/2024, 1:06 PM

lakefs: 0.4.1, lakefs-sdk: 1.13.0. I'll add the `meta_data` output as soon as the code has finished running.
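The installed versions of both distributions can also be read from inside Python with the standard library's `importlib.metadata`. This is a minimal sketch; `installed_versions` is a hypothetical helper, and the distribution names are the two quoted above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("lakefs", "lakefs-sdk")) -> dict:
    """Map each distribution name to its installed version, or None if it is not installed."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions())
```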

Elad Lachmi

04/25/2024, 1:25 PM

Thank you @Sebastian Haugg! Can you please try to print `meta_data` before calling `writer.write` and post it here?

Sebastian Haugg

04/25/2024, 3:01 PM

`print(meta_data)` output before calling the context manager:

{'Preprocessor': 'CDE', 'scaling_period': '35', 'n_target_days': '1', 'replace_zeros': 'True', 'remove_density': '0.01'}

Output after calling the context manager (just to be sure):

{'Preprocessor': 'CDE', 'scaling_period': '35', 'n_target_days': '1', 'replace_zeros': 'True', 'remove_density': '0.01'}

Niro

04/26/2024, 8:16 AM

How did you identify you only have the first key in the metadata?

Niro

04/26/2024, 8:16 AM

Can you try a stat object command after the write and look at the metadata returned?
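Writing and then immediately stat-ing the object can be wrapped in a small helper. This is a sketch under assumptions: `obj` is the object returned by `branch.object(path=...)`, its `writer(mode=..., metadata=...)` context manager behaves as in the snippet at the top of the thread, and its stat method is `stat()` returning something with a `.metadata` attribute (the thread also refers to it as `stats()`). `write_then_stat` is a hypothetical name.

```python
from typing import Any, Mapping


def write_then_stat(obj: Any, data: bytes, metadata: Mapping[str, str]) -> dict:
    """Write `data` with `metadata`, then stat the object and return the stored metadata."""
    # Same call shape as the thread's snippet: binary mode plus a str -> str metadata dict.
    with obj.writer(mode="wb", metadata=dict(metadata)) as writer:
        writer.write(data)
    # Assumption: stat() returns an object whose .metadata is a dict (or None).
    returned = obj.stat().metadata or {}
    print(f"sent {len(metadata)} keys, got back {len(returned)}: {returned}")
    return dict(returned)
```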

Sebastian Haugg

04/26/2024, 10:21 AM

Re 1) In the web UI, the object info only displays the first entry (see screenshot). Additionally, when I read the metadata elsewhere in the code with `obj.stats().metadata`, the resulting dictionary only contains the first entry of the "original" metadata.

Re 2) The `stats()` call right after writing the object returns:

{'path': 'gold/<customer>/File-Full.parquet', 'physical_address': 's3://<repo>/data/<uuid>/<uuid>', 'physical_address_expiry': None, 'checksum': 'abc123a068516cc2fcdd11bfff789xyz', 'size_bytes': 134440782, 'mtime': 1714126066, 'metadata': {'preprocessor': 'CDE'}, 'content_type': 'application/octet-stream'}

(I redacted it slightly because of my company's policy.)
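Comparing the dict printed earlier with the `metadata` field of the `stats()` output makes the gap explicit. The sketch below uses a hypothetical helper, `diff_metadata` (not part of any SDK), and matches keys case-insensitively, because the sent key is `Preprocessor` while the returned key is `preprocessor`.

```python
def diff_metadata(sent: dict, returned: dict) -> dict:
    """Compare sent vs. returned metadata, matching keys case-insensitively.

    Note: if two sent keys differ only by case, they collapse to one lowercase key here.
    """
    returned_lower = {key.lower(): value for key, value in returned.items()}
    missing = [k for k in sent if k.lower() not in returned_lower]
    case_changed = [k for k in sent if k not in returned and k.lower() in returned_lower]
    value_changed = [
        k for k in sent if k.lower() in returned_lower and returned_lower[k.lower()] != sent[k]
    ]
    return {"missing": missing, "case_changed": case_changed, "value_changed": value_changed}


# Values copied from the thread: the printed meta_data and the metadata from stats().
sent = {'Preprocessor': 'CDE', 'scaling_period': '35', 'n_target_days': '1',
        'replace_zeros': 'True', 'remove_density': '0.01'}
returned = {'preprocessor': 'CDE'}
print(diff_metadata(sent, returned))
# → {'missing': ['scaling_period', 'n_target_days', 'replace_zeros', 'remove_density'], 'case_changed': ['Preprocessor'], 'value_changed': []}
```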

Niro

04/26/2024, 10:50 AM

That's so strange... I'm not able to reproduce your issue in my setup. Can you please provide a minimal code snippet that reproduces the issue?
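A minimal reproduction does not need the private data: synthetic metadata with the same shape (five string keys, one capitalised, string values) is enough to test whether keys go missing. This is a sketch assuming `branch.object(path)` returns an object with `upload(data=..., mode=..., metadata=...)` and `stat()`, as in the high-level `lakefs` package; with a real client, `branch` might come from something like `lakefs.repository("<repo>").branch("<branch>")`, which is left out so the sketch stays self-contained. `SYNTHETIC_METADATA`, `reproduce`, and the default path are hypothetical.

```python
from typing import Any

# Synthetic, non-sensitive metadata shaped like the thread's dict.
SYNTHETIC_METADATA = {
    "Alpha": "x",
    "beta_value": "35",
    "gamma_days": "1",
    "delta_flag": "True",
    "epsilon_ratio": "0.01",
}


def reproduce(branch: Any, path: str = "repro/metadata-test.txt") -> bool:
    """Upload a tiny object with SYNTHETIC_METADATA; return True if every key comes back."""
    obj = branch.object(path)
    # Uses upload() rather than writer(); the thread reports the issue with both.
    obj.upload(data=b"hello", mode="wb", metadata=SYNTHETIC_METADATA)
    returned = obj.stat().metadata or {}
    # Compare case-insensitively: the thread's stat output returned 'preprocessor' for 'Preprocessor'.
    returned_keys = {key.lower() for key in returned}
    missing = [key for key in SYNTHETIC_METADATA if key.lower() not in returned_keys]
    print(f"sent {len(SYNTHETIC_METADATA)} keys, received {len(returned)}; missing: {missing}")
    return not missing
```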

Sebastian Haugg

04/26/2024, 11:10 AM

Since the code contains highly private information and I'm not sure what I can and cannot share, my supervisor will take this issue to our company's private help channel. All further discussion will be held there. Thank you for your time so far 😄 and have a nice weekend!

Niro

04/26/2024, 11:11 AM

Have a good weekend
