# help
f
I am testing out the new `lakefs_spec` library in Python and encountering some issues when trying to upload a pandas DataFrame to LakeFS. Local files are working as expected, but pandas DataFrames do not seem to upload anything. Is this a known problem? Here is the test code I am working on:
import lakefs
import pandas as pd

from lakefs_spec import LakeFSFileSystem

fs = LakeFSFileSystem(
    host="...",
    username="...",
    password="...",
)

REPO = "dev"
BRANCH = lakefs.Branch("dev", "testing", client=fs.client).create("main")

df = pd.read_csv("day-ahead.csv")

with fs.transaction(REPO, BRANCH) as tx:
    df.to_csv(f"lakefs://{REPO}/{tx.branch.id}/day-ahead.csv")
    fs.put("app.py", f"{REPO}/{tx.branch.id}/app.py")
    tx.commit(message="This is a test")
In this case, `app.py` is successfully uploaded, while `day-ahead.csv` is not uploaded/written to LakeFS. I've attached the CSV file/dataset.
n
Hi @Fredrik Bakken, lakefs-spec is a community contribution and is not maintained by lakeFS. I think you'll get better answers if you post an issue in their repo. FYI @Nicholas Junge
f
Great, thanks @Niro! Sorry, I saw it mentioned on the docs-page and assumed it was an official LakeFS package. I'll head over there to investigate further.
n
NP, I'm sure there are some people who can help here, but I think you'll get better results there 🙂
🙌 1
n
Hey! Thanks for trying out our package. Your repro suggests that something (again…) broke in the `fs.open()` codepath, which is used internally by pandas’ `DataFrame.to_csv()` method. I’d also like to see this issue on our GitHub for visibility, so if you’re up for it, could you raise this with the repro attached? Otherwise I can paste it over.
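For context, here is a rough sketch of why the two calls in the repro can behave differently. It is a toy stand-in using only the standard library, not lakefs-spec's actual code; `ToyRemoteFS`, `put_bytes` and `_UploadOnClose` are made-up names for illustration. A direct upload stores the bytes immediately, while a write through an opened handle only reaches storage if the handle's close logic uploads the buffer, so a bug in that path would leave the file missing without affecting direct uploads.

```python
import csv
import io


class ToyRemoteFS:
    """Hypothetical in-memory stand-in for a remote filesystem."""

    def __init__(self):
        self.objects = {}  # path -> bytes "stored" on the remote side

    def put_bytes(self, data, path):
        # Direct upload path: the bytes reach storage immediately.
        self.objects[path] = data

    def open(self, path):
        # Handle-based path: bytes are buffered and uploaded on close().
        return _UploadOnClose(self, path)


class _UploadOnClose(io.BytesIO):
    """Write buffer that uploads its contents when it is closed."""

    def __init__(self, fs, path):
        super().__init__()
        self._fs = fs
        self._path = path

    def close(self):
        if not self.closed:
            # If this step were skipped or broken, nothing would be stored.
            self._fs.put_bytes(self.getvalue(), self._path)
        super().close()


fs = ToyRemoteFS()

# Analogous to a direct file upload: stored right away.
fs.put_bytes(b"print('hi')\n", "dev/testing/app.py")

# Analogous to writing a CSV through an opened handle.
with fs.open("dev/testing/day-ahead.csv") as handle:
    text = io.StringIO()
    writer = csv.writer(text, lineterminator="\n")
    writer.writerows([["hour", "price"], [0, 41.5]])
    handle.write(text.getvalue().encode("utf-8"))
    # Still only in the local buffer at this point.
    stored_before_close = "dev/testing/day-ahead.csv" in fs.objects

stored_after_close = "dev/testing/day-ahead.csv" in fs.objects
```

In this toy version the CSV only appears in `fs.objects` once the `with` block exits and `close()` runs, which is the step a regression in the handle-based path would affect.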
🙏🏽 1
f
@Nicholas Junge, thanks for the reply! I'll try to post a detailed explanation of the issue on the project's GitHub page today - or over the weekend :)
🦜 1
👍 1