# help
s
I have some problems with the Python SDK when I try to upload a file to LakeFS with metadata as a dict. When I check the metadata online, or when it is read by the Python SDK, only the first entry from the metadata is present. I have the same problem whether I use the object.upload or the object.write method. I'm not sure if the problem is on my side or with LakeFS. Here is the code I'm using at the moment:
with open(filename, "r+b") as f:
  with self.branch_data.object(path=self._string_builder()).writer(mode="wb", metadata=meta_data) as writer:
    writer.write(f.read())
The dict is converted to type str:str beforehand
n
@Sebastian Haugg Hi! Why are you converting the dict to a string?
s
No, sorry for not making it clear, I'm converting the entries to strings beforehand. If the entries are e.g. {"a": 1, "b": 0.01, "c": "text"}, they would be converted to {"a": "1", "b": "0.01", "c": "text"}
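For illustration, the conversion described in this message could look like the sketch below. The helper name `stringify_metadata` is hypothetical and not part of the lakeFS SDK; it only shows one way to turn every entry into a str:str pair before passing the dict as metadata.

```python
def stringify_metadata(raw):
    """Return a copy of raw with every key and value converted to str."""
    # Hypothetical helper for illustration; not part of the lakeFS SDK.
    return {str(key): str(value) for key, value in raw.items()}


meta_data = stringify_metadata({"a": 1, "b": 0.01, "c": "text"})
print(meta_data)  # {'a': '1', 'b': '0.01', 'c': 'text'}
```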
n
Which SDK version are you currently using?
Can you please try to print the meta_data parameter before running the writer line?
s
lakefs: 0.4.1, lakefs-sdk: 1.13.0. I will add the meta_data output as soon as the code has finished running.
e
Thank you @Sebastian Haugg. Can you please try to print `meta_data` before calling `writer.write` and post it here?
s
`print(meta_data)` output before calling the context manager:
{'Preprocessor': 'CDE', 'scaling_period': '35', 'n_target_days': '1', 'replace_zeros': 'True', 'remove_density': '0.01'}
Output after calling the context manager (just to be sure):
{'Preprocessor': 'CDE', 'scaling_period': '35', 'n_target_days': '1', 'replace_zeros': 'True', 'remove_density': '0.01'}
n
How did you identify you only have the first key in the metadata?
Can you try a stat object command after the write and look at the metadata returned?
s
re 1) In the web UI, the object info only displays the first entry (see screenshot). Additionally, when I read the metadata at another place in the code with `obj.stats().metadata`, the resulting dictionary only has the first entry of the "original" metadata.
re 2) The `stats()` call right after writing the object returns:
{'path': 'gold/<customer>/File-Full.parquet', 'physical_address': 's3://<repo>/data/<uuid>/<uuid>', 'physical_address_expiry': None, 'checksum': 'abc123a068516cc2fcdd11bfff789xyz', 'size_bytes': 134440782, 'mtime': 1714126066, 'metadata': {'preprocessor': 'CDE'}, 'content_type': 'application/octet-stream'}
I redacted the output a little bit because of the policy at my company.
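To make the gap between what was sent and what came back explicit, a small comparison like the sketch below may help. The helper name `missing_metadata_keys` is hypothetical, and comparing keys case-insensitively is an assumption, made only because the one returned key above came back lowercased ('preprocessor' instead of 'Preprocessor').

```python
def missing_metadata_keys(expected, returned):
    """List keys in expected that are absent from returned, ignoring key case."""
    # Hypothetical helper; the case-insensitive match is an assumption
    # based on the lowercased key seen in the stat output.
    returned_lower = {key.lower() for key in returned}
    return sorted(key for key in expected if key.lower() not in returned_lower)


expected = {
    "Preprocessor": "CDE",
    "scaling_period": "35",
    "n_target_days": "1",
    "replace_zeros": "True",
    "remove_density": "0.01",
}
returned = {"preprocessor": "CDE"}  # metadata from the stat output above

print(missing_metadata_keys(expected, returned))
# ['n_target_days', 'remove_density', 'replace_zeros', 'scaling_period']
```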
n
That's so strange... I am not able to reproduce your issue in my setup. Can you please provide a minimal code snippet that reproduces the issue?
s
Since the code contains highly private information and I'm not sure what I can and cannot share, my supervisor will take this issue to the company's private help channel. All further discussion will be held there. Thank you for your time so far 😄 and have a nice weekend.
n
Have a good weekend