# help
h
Hi team. I have a weird bug that I don't know if it's coming from lakeFS or fsspec/s3fs. The code:
import s3fs
import smartfs

conf = smartfs.load_rclone_profile("lakefs")

fs = s3fs.S3FileSystem(
    anon=False,
    endpoint_url=conf["endpoint"],
    key=conf["access_key_id"],
    secret=conf["secret_access_key"],
)

# Comment/uncomment the following to trigger the bug.
fs.find("s3://test-repo/main/", maxdepth=None, withdirs=False)

print(fs.find("s3://test-repo/main/", maxdepth=1, withdirs=True))
The content looks like this:
$ rclone ls lakefs:test-repo/main/
        0 dir1/fileB
        0 fileA
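Given that content, the two find calls should behave roughly like this (a sketch of the expected results; the returned path forms assume s3fs's usual convention of dropping the s3:// scheme):

fs.find("s3://test-repo/main/", maxdepth=None, withdirs=False)
# expected: ['test-repo/main/dir1/fileB', 'test-repo/main/fileA']

fs.find("s3://test-repo/main/", maxdepth=1, withdirs=True)
# expected: ['test-repo/main/dir1', 'test-repo/main/fileA']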
When I run the code above, the second find should list 2 entries, fileA and dir1, but somehow it is missing dir1. If I comment out the first find, the second find lists both entries correctly. I don't have an AWS account, so I can't check whether it's a lakeFS issue or an s3fs issue. Can someone with a real AWS account test this? Context: lakeFS self-hosted 0.104.0; pip freeze:
aiobotocore==2.5.2
boto3==1.21.21
botocore==1.29.161
s3fs==2023.6.0
s3transfer==0.5.2
Edit: I was describing the expected result slightly wrong.
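For anyone without an AWS account, here is a minimal sketch of how the same two calls could be tested against a mock S3 endpoint using moto's threaded server (assumptions: moto[server] is installed; the port, bucket name, and dummy credentials are made up for illustration):

import s3fs
from moto.server import ThreadedMotoServer

# Start an in-process mock S3 server; no AWS account needed.
server = ThreadedMotoServer(port=5555)
server.start()
try:
    fs = s3fs.S3FileSystem(
        anon=False,
        endpoint_url="http://127.0.0.1:5555",
        key="testing",       # moto accepts any credentials
        secret="testing",
    )
    # Recreate the layout from the rclone listing above.
    fs.mkdir("test-repo")
    fs.pipe("test-repo/main/fileA", b"")
    fs.pipe("test-repo/main/dir1/fileB", b"")

    # First call populates s3fs's directory cache.
    fs.find("test-repo/main/", maxdepth=None, withdirs=False)
    # Second call may be answered from that cache and miss dir1.
    print(fs.find("test-repo/main/", maxdepth=1, withdirs=True))
finally:
    server.stop()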
n
Hi @HT, there was a bug fix released recently that might help with your issue. Can you upgrade to the latest lakeFS version (v0.107.0) and try again?
👍 1
h
We just updated our server to 107 and the bug is still there ...
n
Thanks @HT - we'll try to reproduce it on our end
g
Hi @HT, I managed to reproduce this issue. The same thing happens on plain S3 as well, not just on lakeFS. I noticed that in both cases (with and without the comment) only one request is sent, so it looks like s3fs is caching the listing from the first request. Try adding
fs.invalidate_cache(path=None)
between the calls:
import s3fs
import smartfs

conf = smartfs.load_rclone_profile("lakefs")

fs = s3fs.S3FileSystem(
    anon=False,
    endpoint_url=conf["endpoint"],
    key=conf["access_key_id"],
    secret=conf["secret_access_key"],
)

# Comment/uncomment the following to trigger the bug.
fs.find("s3://test-repo/main/", maxdepth=None, withdirs=False)

# Clear s3fs's directory-listing cache so the second find issues
# a fresh request instead of reusing the cached listing.
fs.invalidate_cache(path=None)

print(fs.find("s3://test-repo/main/", maxdepth=1, withdirs=True))
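For reference, the cache in question is fsspec's directory-listings cache, which s3fs inherits. A hedged sketch of two related knobs (use_listings_cache is an fsspec constructor option; reading fs.dircache pokes at an implementation detail, not a public API):

# Option 1: clear the listings cache between calls, as above.
fs.invalidate_cache(path=None)

# Option 2: disable listings caching entirely when constructing
# the filesystem; fsspec's AbstractFileSystem handles this kwarg.
fs_nocache = s3fs.S3FileSystem(
    anon=False,
    endpoint_url=conf["endpoint"],
    key=conf["access_key_id"],
    secret=conf["secret_access_key"],
    use_listings_cache=False,
)

# For debugging, the cached listings can be inspected directly.
print(list(fs.dircache))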
h
Thank you very much for confirming this. I will file an issue with fsspec/s3fs.
👌 1