# help
Aaron:
We've been encountering an issue where lakeFS files that our system is creating end up being created as directories rather than files, causing issues when other processes try to create them. We've been able to reproduce the "consumer" side of the issue with `lakectl local checkout`, which produces an error of the following form (file paths edited):

```
$ lakectl local checkout --yes .
...
download path/to/example.jsonl failed: could not create file '/Users/aaron/repo/data/path/to/example.jsonl': open /Users/aaron/repo/data/path/to/example.jsonl failed: is a directory
```
The lakeFS location looks like this (paths changed, other things not):

```
$ lakectl fs ls -r lakefs://example/COMMIT/path/to/
object          2025-08-09 09:15:10 -0700 PDT    83.5 kB         path/to/example.jsonl
object          2025-08-01 12:06:13 -0700 PDT    86.6 kB         path/to/example.jsonl/9e0b1aabbf762a4494e47dd282e5c4cca1daaed40ac96f8ffcc61ecf38a47242
```
It appears that some lakeFS operation is partially failing and leaving the object in some sort of broken state? Any guidance on how best to debug this? We've written a script to clean these up and re-run things, but that's obviously not ideal! One theory is that this seems to happen when the lakeFS deployment is under higher load.
Ariel:
Hi Aaron, I'm sorry you're having issues with `lakectl local`. I've had an in-depth look. Firstly, I'd like to affirm that this is not a lakeFS bug involving data loss. You are running into a limitation of `lakectl local`. lakeFS is a versioned object store. Like S3, in an object store there are no directories and no special meaning to characters like `/` - they are just characters in a path. So your repository has 2 objects, whose names are:

```
path/to/example.jsonl
path/to/example.jsonl/9e0b1aabbf76...
```
This is allowed in object stores, and indeed some older applications take advantage of it. But `lakectl local` has to create a mirror of your repository on your local filesystem. Filesystems are not object stores: there are directories, and the character `/` really is special. There is no way for `lakectl local` to recreate this structure:
• `path/to/example.jsonl` needs to be a file of 83.5 kB.
• `path/to/example.jsonl` also needs to be a directory, which will contain a file `9e0b1aabbf76...`.
That is of course impossible on a filesystem. Fortunately, while the mapping from object stores to filesystems can fail, the reverse mapping from filesystems to object stores is always possible. Could you verify your intended directory structure after `lakectl local`, please? I think it will then be possible to figure out the paths to use on lakeFS. Hope this helps!
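To make the conflict concrete, here is a minimal sketch (plain Python against a local filesystem, reusing the paths from above; this is not lakectl's actual code) of why the two object names cannot both be materialized locally:

```python
# Minimal sketch: an object store happily holds both keys, but a local
# filesystem cannot have a file and a directory at the same path.
import os
import tempfile

root = tempfile.mkdtemp()

def materialize(relpath: str, data: bytes) -> None:
    """Write one object into the local mirror, creating parent directories."""
    full = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "wb") as f:
        f.write(data)

# The "inner" object forces path/to/example.jsonl to become a directory...
materialize("path/to/example.jsonl/9e0b1aabbf76", b"unexpected inner object")
# ...so writing the real file at the same path fails with IsADirectoryError.
materialize("path/to/example.jsonl", b"real file contents")
```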
Aaron:
Hi Ariel, thanks for looking into this! That analysis makes sense in terms of why this issue manifests the way it currently does from the lakectl command line. I suppose that now points to the root of the issue, which is still unknown to us. The intended directory structure is a single file called:

```
path/to/example.jsonl
```
The inner file is not something that our code ever creates. Our system uses https://lakefs-spec.org/latest/ from Python code to write files to lakeFS, and occasionally (we think when the lakeFS service we have is under particularly high load) this scenario occurs and causes an error in our pipeline. Is there anything within lakeFS or this library (I realize it's not directly owned by you all, though it is mentioned in the docs) that could cause such a scenario?
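For illustration, the write path looks roughly like this (a simplified sketch; the repository, branch, and path names are made up and the real pipeline code differs):

```python
# Simplified sketch of how the pipeline writes a file via lakefs-spec.
from lakefs_spec import LakeFSFileSystem

# Host and credentials are picked up from the local lakectl configuration.
fs = LakeFSFileSystem()

# fsspec-style path: <repository>/<branch>/<path within the repository>
with fs.open("example/main/path/to/example.jsonl", "wt") as f:
    f.write('{"key": "value"}\n')
```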
Ariel:
Hmmm... interesting. There's this concept of "directory markers" in object-store-as-filesystem layers. Unfortunately there is no standard way to do this. lakeFS-spec might be generating a directory marker which lakectl local does not understand. That would be a bug in one of them. Hopefully we still have @Nicholas Junge (or perhaps the other @Nicholas Junge) around - if so, please comment!
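In the meantime, a rough way to spot affected paths is to list everything under a prefix and look for objects nested "inside" another object. Here is a sketch using lakefs-spec's fsspec interface (the helper name is made up, and the prefix is the placeholder one from the listing above):

```python
# Rough sketch of a debugging helper: find objects that are shadowed by
# other objects nested beneath the same path.
from lakefs_spec import LakeFSFileSystem

def find_conflicting_paths(fs: LakeFSFileSystem, prefix: str) -> list[str]:
    paths = set(fs.find(prefix))  # all object paths under <repo>/<ref>/<prefix>
    return sorted(p for p in paths if any(o.startswith(p + "/") for o in paths))

fs = LakeFSFileSystem()
for path in find_conflicting_paths(fs, "example/COMMIT/path/to/"):
    print(path)  # each reported path exists both as a file and as a "directory"
```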
Nicholas Junge:
Thanks for the tag! I have not seen this before, either in tests or in our own user code (we developed lakefs-spec to solve a need in an internal project), so a reproducer would be really helpful. If you cannot provide one, the sequence of filesystem APIs you are calling would also help.
Ariel:
Hi again @Aaron Taylor, did you get a chance to look at this? Right now we have no real understanding of what went wrong.