HT
08/21/2023, 2:44 AM
import s3fs
import smartfs
conf = smartfs.load_rclone_profile("lakefs")
fs = s3fs.S3FileSystem(
    anon=False,
    endpoint_url=conf["endpoint"],
    key=conf["access_key_id"],
    secret=conf["secret_access_key"],
)
# Comment/uncomment the following line to toggle the bug.
fs.find("s3://test-repo/main/", maxdepth=None, withdirs=False)
print(fs.find("s3://test-repo/main/", maxdepth=1, withdirs=True))
The content looks like this:
$ rclone ls lakefs:test-repo/main/
0 dir1/fileB
0 fileA
When I run the code above, the second find should list 2 entries: fileA and dir1, but somehow it is missing dir1.
But if I comment out the first find, then the second find lists the 2 entries correctly.
I don't have an AWS account, so I can't check whether this is a lakeFS issue or an s3fs issue.
Can someone with a real AWS account test this?
Context:
lakeFS self-hosted: 0.104.0
pip freeze:
aiobotocore==2.5.2
boto3==1.21.21
botocore==1.29.161
s3fs==2023.6.0
s3transfer==0.5.2
Edit: I was describing the expected result slightly wrong.
Niro
08/21/2023, 6:47 AM
HT
08/21/2023, 8:31 AM
Niro
08/21/2023, 8:32 AM
Guy Hardonag
08/21/2023, 4:26 PM
s3fs is caching the first request. Try calling fs.invalidate_cache(path=None) between calls:
import s3fs
import smartfs
conf = smartfs.load_rclone_profile("lakefs")
fs = s3fs.S3FileSystem(
    anon=False,
    endpoint_url=conf["endpoint"],
    key=conf["access_key_id"],
    secret=conf["secret_access_key"],
)
# Comment/uncomment the following line to toggle the bug.
fs.find("s3://test-repo/main/", maxdepth=None, withdirs=False)
# Drop all cached listings so the next find goes back to the server.
fs.invalidate_cache(path=None)
print(fs.find("s3://test-repo/main/", maxdepth=1, withdirs=True))
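To show how a cached listing can hide dir1, here is a toy model in plain Python. It is not s3fs code: ToyFS, its ls method, and the cache keyed only by path are all hypothetical, picked to reproduce the symptom. The real cause inside s3fs may well be different.

```python
class ToyFS:
    """Hypothetical filesystem with a listing cache; NOT the s3fs implementation."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.dircache = {}  # prefix -> cached listing

    def _scan(self, prefix, withdirs):
        """Scan every key and collect the names directly under prefix."""
        entries = set()
        for key in self.keys:
            if not key.startswith(prefix):
                continue
            head, sep, _ = key[len(prefix):].partition("/")
            if not sep:
                entries.add(head)  # a file directly under prefix
            elif withdirs:
                entries.add(head)  # a directory directly under prefix
        return sorted(entries)

    def ls(self, prefix, withdirs):
        # Deliberate flaw (an assumption for illustration): the cache key
        # ignores withdirs, so a files-only listing is reused later.
        if prefix not in self.dircache:
            self.dircache[prefix] = self._scan(prefix, withdirs)
        return list(self.dircache[prefix])

    def invalidate_cache(self):
        self.dircache.clear()


fs = ToyFS(["main/dir1/fileB", "main/fileA"])
fs.ls("main/", withdirs=False)          # first call fills the cache without dirs
stale = fs.ls("main/", withdirs=True)   # served from the cache, dir1 is missing
fs.invalidate_cache()
fresh = fs.ls("main/", withdirs=True)   # rescanned, dir1 is back
print(stale, fresh)
```

In this toy, clearing the cache between the two calls plays the same role as fs.invalidate_cache(path=None) in the snippet above.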
HT
08/21/2023, 10:05 PM