# help
g
I'm trying to import some local data into lakeFS, but I keep getting:

> raise lakefs_ex from e
E lakefs.exceptions.NotFoundException: code: 404, reason: Not Found, body: {'message': 'not found'}

../venv/lib/python3.11/site-packages/lakefs/exceptions.py:148: NotFoundException
lakefs_setup_branch.import_data(commit_message='added test data') \
        .object("<local://mnt/data/test_resources/test_pass/newest_pc2/flame_at_noctua2-naughty_subroutines.png>",
                "data/flame_at_noctua2-naughty_subroutines.png") \
        .run()
what's wrong?
n
@Giuseppe Barbieri please try:
local:///mnt/data/test_resources/test_pass/newest_pc2/flame_at_noctua2-naughty_subroutines.png
note the ///
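(A quick illustration of where the third slash comes from: the scheme is local:// and the path itself is absolute, so concatenating them gives local:///mnt/...)

# "local://" plus an absolute path starting with "/" yields three slashes in a row
path = "/mnt/data/test_resources/test_pass/newest_pc2/flame_at_noctua2-naughty_subroutines.png"
uri = "local://" + path
print(uri)  # local:///mnt/data/test_resources/test_pass/newest_pc2/flame_at_noctua2-naughty_subroutines.png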
g
nope, same
but it should work, right? Is that compatible with local files?
file is definitely there and can be read from code
n
Can you please attach logs from the lakeFS server
g
def test_pass(lakefs_setup_branch):
    print(os.path.exists("/mnt/data/test_resources/test_pass/newest_pc2/flame_at_noctua2-naughty_subroutines.png"))
    lakefs_setup_branch.import_data(commit_message='added test data') \
        .object("local:///mnt/data/test_resources/test_pass/newest_pc2/flame_at_noctua2-naughty_subroutines.png",
                "data/flame_at_noctua2-naughty_subroutines.png") \
        .run()
> Can you please attach logs from the lakeFS server
or did you mean some other logs?
n
Yes, I need the logs from the lakeFS server itself, not from the SDK client
g
could you help me get them?
n
How did you run lakeFS?
g
as a service
n
Are you running the binary locally?
g
● lakefs.service - LakeFS in podman container
     Loaded: loaded (/etc/systemd/system/lakefs.service; enabled; preset: enabled)
     Active: active (running) since Wed 2024-03-27 10:59:34 CET; 2h 43min ago
   Main PID: 21641 (lakefs.sh)
      Tasks: 14 (limit: 19123)
     Memory: 146.0M
        CPU: 5.739s
     CGroup: /system.slice/lakefs.service
             ├─21641 /usr/bin/bash /home/lakefs/lakefs.sh
             └─21642 ./lakefs --config config.yaml run
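(Since the unit above is managed by systemd, the server output should be reachable through the journal regardless of podman; a sketch, assuming the unit name from the status listing:)

# show the most recent lakeFS server log lines captured by systemd
journalctl -u lakefs.service --no-pager -n 200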
n
I'm not familiar with podman, so I don't know how to extract logs from it
I have a suspicion about what the problem might be. Can you try the following code:
importer = lakefs_setup_branch.import_data(commit_message='added test data')
importer.object("local:///mnt/data/test_resources/test_pass/newest_pc2/flame_at_noctua2-naughty_subroutines.png",
                "data/flame_at_noctua2-naughty_subroutines.png")

importer.start()
sleep(2)
importer.wait()
ps: I imported sleep from time
g
importer.start()
status = importer.start()
while not status.completed:
    time.sleep(3)  # or whatever interval you choose
    status = importer.status()
> raise ImportManagerException("Import in progress")
E lakefs.exceptions.ImportManagerException: Import in progress
../venv/lib/python3.11/site-packages/lakefs/import_manager.py:107: ImportManagerException
it should be status = importer.status() and not status = importer.start() (docs typo)
> raise lakefs_ex from e
E lakefs.exceptions.NotFoundException: code: 404, reason: Not Found, body: {'message': 'not found'}
../venv/lib/python3.11/site-packages/lakefs/exceptions.py:148: NotFoundException
n
great, that's progress. The error you're seeing stems from the fact that you didn't give lakeFS permission to access that local path. See the blockstore.local configuration reference
g
so, would blockstore.local.path("/mnt/data") do the trick?
n
No, please look at the configuration that pertains to import
g
sorry, I don't get it
I have no idea what a blockstore/adapter is or why I'd ever need one
n
The blockstore is the underlying storage that the lakeFS server works on top of. It can be S3, Azure, GCP, or local. In your case you are using your local filesystem (or the container's) to store the lakeFS data. For security reasons, with the local adapter you need to state explicitly in the configuration which external paths are allowed for import. You do that via the blockstore.local.import_enabled and blockstore.local.allowed_external_prefixes configuration variables.
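(For illustration, a minimal sketch of how those settings might look in the server's config.yaml; the blockstore path value is a placeholder, and whether allowed_external_prefixes takes a single string or a list should be checked against the configuration reference.)

blockstore:
  type: local
  local:
    path: /home/lakefs/data              # where lakeFS stores its own objects (placeholder)
    import_enabled: true
    allowed_external_prefixes:
      - /mnt/data                        # external paths allowed as import sources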
g
tried
blockstore:
  local:
    import_enabled: true
    allowed_external_prefixes: "/mnt/data"
in .config.yaml
but still
E lakefs.exceptions.NotFoundException: code: 404, reason: Not Found, body: {'message': 'not found'}
../venv/lib/python3.11/site-packages/lakefs/exceptions.py:148: NotFoundException
is there a way to check the current lakefs configuration?
ps: tried both /// and //
n
The lakeFS configuration is printed to the logs when the server initializes
is the mnt folder located inside the podman container or on your local system?
g
local, there is no podman involved right now
n
Are you using the code snippet I provided you?
g
yep
I can push, wait
the one you gave me is commented out, but I'll give it a try
n
Please do - also I don't have an account for that platform
g
sorry, I mean "I gave it a try"
same result
n
I'll need to see the stacktrace or lakeFS logs to help
g
n
No, it's not, because there is no sleep there...
and you are calling status
I suggest you use the snippet I gave you exactly as it is and let me know the outcome
g
ok
..it works
wow, how?
n
Magic 🙂
I assume that when using run we query the status before the import id is created. This is probably a bug when using import with a local blockstore
I'll open an issue for it
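(For reference, a consolidated sketch of the workaround that worked in this thread, wrapping the start/sleep/wait sequence in a helper; the function name and the settle delay are illustrative, not part of the SDK.)

import time

def import_local_object(branch, local_uri, dest_path, message, settle=2.0):
    # Start the import, pause briefly so the server registers the import id,
    # then block until the import completes.
    importer = branch.import_data(commit_message=message)
    importer.object(local_uri, dest_path)
    importer.start()
    time.sleep(settle)  # run() without this pause hit the 404 described above
    return importer.wait()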
g
I'm sorry, but I have bad news 😞 even though the files and the commit are there, whenever I try to open a text file from the web interface, I get:
operation error S3: GetObject, https response error StatusCode: 404, RequestID: 17C2841444BA5C58, HostID: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, api error NoSuchBucket: The specified bucket does not exist
(screenshot attached)
this is the webhook, for example