Fredrik Bakken
05/30/2024, 1:47 PM
I am using the lakefs_spec library in Python and encountering some issues when trying to upload a pandas DataFrame to LakeFS. Local files upload as expected, but pandas DataFrames do not seem to upload anything. Is this a known problem?
Here is the test code I am working on:
import lakefs
import pandas as pd
from lakefs_spec import LakeFSFileSystem

fs = LakeFSFileSystem(
    host="...",
    username="...",
    password="...",
)

REPO = "dev"
BRANCH = lakefs.Branch("dev", "testing", client=fs.client).create("main")

df = pd.read_csv("day-ahead.csv")

with fs.transaction(REPO, BRANCH) as tx:
    df.to_csv(f"lakefs://{REPO}/{tx.branch.id}/day-ahead.csv")
    fs.put("app.py", f"{REPO}/{tx.branch.id}/app.py")
    tx.commit(message="This is a test")
In this case, app.py is successfully uploaded, while day-ahead.csv is not uploaded/written to LakeFS.
I've attached the CSV file/dataset.
Niro
05/30/2024, 1:58 PM
Fredrik Bakken
05/30/2024, 2:01 PM
Niro
05/30/2024, 2:01 PM
Nicholas Junge
05/31/2024, 6:08 AM
… fs.open() codepath, which is used internally by pandas' DataFrame.to_csv() method.
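Since the issue is attributed to the fs.open() codepath, and fs.put() with a local file worked in the repro above, one possible interim workaround is to write the DataFrame to a temporary local file and upload that file instead. This is only a sketch and was not confirmed in this thread. The helper name put_dataframe_as_csv is hypothetical and is not part of lakefs_spec or pandas. It relies only on df.to_csv(path) and fs.put(local, remote), both of which appear in the repro.

```python
import os
import tempfile


def put_dataframe_as_csv(fs, df, rpath, **to_csv_kwargs):
    """Write `df` to a temporary local CSV, then upload it with `fs.put()`.

    Hypothetical helper (not a lakefs_spec or pandas API). It sidesteps the
    `fs.open()` codepath by reusing the local-file upload that worked in the
    repro. `fs` is any fsspec-style filesystem with a `put(lpath, rpath)`
    method; `df` is any object with a pandas-style `to_csv(path)` method.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # Keep the remote file name for the temporary local copy.
        local_path = os.path.join(tmpdir, os.path.basename(rpath))
        df.to_csv(local_path, **to_csv_kwargs)
        # Upload while the temporary file still exists.
        fs.put(local_path, rpath)
    return rpath


# Usage inside the transaction from the repro (assumes fs, df, REPO, BRANCH):
#
# with fs.transaction(REPO, BRANCH) as tx:
#     put_dataframe_as_csv(fs, df, f"{REPO}/{tx.branch.id}/day-ahead.csv")
#     tx.commit(message="This is a test")
```

The temporary directory is removed once the upload call returns, so nothing is left on local disk.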
I'd also like to see this issue on our GitHub for visibility, so if you're up for it, could you raise this with the repro attached? Otherwise I can paste it over.
Fredrik Bakken
05/31/2024, 6:29 AM