# help
o
Hi all, we're using the high-level Python SDK to interact with lakeFS and have tagged a release. Is there documentation on downloading the objects in the tag in parallel using the SDK? I saw this can be done with `lakectl fs download`; is there a similar option with the SDK? Any suggestions on the best approach? Second question: is mounting a remote lakeFS repo to a local directory with Everest only available for Enterprise users? Thanks!
At the moment, this is what we have:
import os
from tqdm import tqdm

# Download data (assumes `objects`, `tag`, and `local_training_data_storage`
# are defined earlier in our script)
with tqdm(total=len(objects), desc="Downloading files", unit="file") as pbar:
    for obj in objects:
        print(f"Downloading {obj.path}...")

        # Define the local file path
        local_file_path = os.path.join(local_training_data_storage, obj.path)

        # Create the directory if it does not exist
        os.makedirs(os.path.dirname(local_file_path), exist_ok=True)

        try:
            with (
                open(local_file_path, "wb") as local_file,
                tag.object(obj.path).reader("rb") as r,
            ):
                local_file.write(r.read())
        except Exception as e:
            print(f"Failed to download {obj.path}: {e}")
        pbar.update(1)
a
Hi @Oscar Wong, welcome! I think you've pretty much nailed down how to perform multiple concurrent downloads. There is no magic that "our" solutions perform. You might also use `lakectl local`, which provides an experience somewhat closer to "Everest". Everest itself is a feature of lakeFS Enterprise.
o
Thank you.
n
@Oscar Wong You can also use multiprocessing to run the downloads concurrently.
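Something along these lines, as an untested sketch: it reuses the `objects`, `tag`, and `local_training_data_storage` variables from your snippet above, and it uses a thread pool rather than multiprocessing since the downloads are I/O-bound (assuming the lakeFS client is fine to share across threads; a ProcessPoolExecutor would be the alternative if not).

import os
from concurrent.futures import ThreadPoolExecutor, as_completed

from tqdm import tqdm

def download_one(obj):
    # Mirror the remote path under the local storage directory
    local_file_path = os.path.join(local_training_data_storage, obj.path)
    os.makedirs(os.path.dirname(local_file_path), exist_ok=True)

    # Stream the object from the tag into the local file
    with (
        open(local_file_path, "wb") as local_file,
        tag.object(obj.path).reader("rb") as r,
    ):
        local_file.write(r.read())
    return obj.path

# Submit all downloads to a pool of worker threads and update the
# progress bar as each one completes
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(download_one, obj) for obj in objects]
    with tqdm(total=len(futures), desc="Downloading files", unit="file") as pbar:
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as e:
                print(f"Download failed: {e}")
            pbar.update(1)

Tune max_workers to your bandwidth and object sizes; for many small objects a larger pool usually helps.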
πŸ‘ 1
πŸ‘πŸΌ 1
gratitude thank you 1