# help
u
I have a question; if anyone knows, please help. Let me start with my current situation: we have TBs of data, and to manage it I want to implement lakeFS. The data resides on my local server, an Ubuntu server, and the file system is organized as directories. Is it possible to import it all at once? If you know of any documentation, please let me know. Thank you very much.
h
You should try ChatGPT for translation 😉
u
Thank you!! 🙂
h
The docs only mention importing from cloud storage: https://docs.lakefs.io/howto/import.html
For importing from a local filesystem... we'll need to wait for a lakeFS engineer.
u
Hmm... I would like to import the existing local data into lakeFS. Okay!
u
Oh! There is this: "rclone sync /home/myuser/path/ lakefs:example-repo/main/path"
Okay! Thank you, I will do that!
h
Just be aware that this is a "copy", not an import, meaning the TBs of data will be duplicated. Not sure that was your initial intention... but it may be the only solution in this case.
u
Okay, you are right. Copying is not what I intended.
a
lakeFS can import from an object store only, so you have 2 options:
1. Copy the data from the local server to an object store, import it from the object store into lakeFS, then delete it from the local server.
2. Use Rclone to copy the data from the local server to a lakeFS repo, then delete it from the local server once you start using lakeFS.
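A minimal sketch of option 2, wrapping the rclone command quoted earlier in the thread from Python. It assumes rclone is installed and that a remote named `lakefs` is already configured against the lakeFS S3-compatible endpoint; the paths and repo name are the placeholders from that command, and `build_rclone_sync_cmd` / `run_sync` are hypothetical helpers, not part of rclone or lakeFS. It defaults to `--dry-run` because `rclone sync` makes the destination match the source, deleting destination files that are missing locally (`rclone copy` does not delete).

```python
import shlex
import subprocess


def build_rclone_sync_cmd(local_dir, remote, repo, branch, dest_path, dry_run=True):
    """Build an `rclone sync` command that copies local_dir into a lakeFS branch.

    Hypothetical helper: assumes `remote` is an rclone remote pointing at the
    lakeFS S3 gateway, so the destination is <remote>:<repo>/<branch>/<path>.
    """
    # Join the lakeFS path pieces; rstrip avoids a trailing "/" when dest_path is empty.
    target = f"{remote}:{repo}/{branch}/{dest_path.strip('/')}".rstrip("/")
    cmd = ["rclone", "sync", local_dir, target]
    if dry_run:
        # --dry-run reports what would be transferred without copying anything.
        cmd.append("--dry-run")
    return cmd


def run_sync(cmd):
    """Run the command; check=True raises CalledProcessError on a non-zero exit."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Placeholder values taken from the command quoted in the thread.
cmd = build_rclone_sync_cmd("/home/myuser/path/", "lakefs", "example-repo", "main", "path")
print(shlex.join(cmd))
# run_sync(cmd)  # run only after reviewing the dry-run output
```

Once the dry-run output looks right, rebuild the command with `dry_run=False` and pass it to `run_sync`. As noted in the thread, this duplicates the data rather than importing it in place.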
u
Thank you for your kind response. I'd like a little more detail. For method 2, I think I can install Rclone and follow the steps in the link you sent. For method 1, could you go into a little more detail: where exactly is the object store? This may be a basic question, but I'd appreciate an answer.
a
lakeFS works with object stores such as S3, Azure Blob Storage, GCS, MinIO, etc. Do you use any of these?
u
No. Because my data are medical images, we need to store them only on a local server.
a
u
Thank you, I really appreciate it!! I'll look into those!
a
If you use lakeFS Samples for testing, it includes MinIO object storage (which is open source): https://github.com/treeverse/lakeFS-samples/tree/main?tab=readme-ov-file#containers
Once you are familiar with MinIO, you can install it on your local server and move your data into it.
h
@Amit Kesarwani Is it OK to use non-object storage as the backend storage for lakeFS in a "production" context? I know the example lakeFS Docker setup uses the local filesystem as its storage backend, but I'm not sure that scales well in production.
u
Actually, I am not building this for a production environment.
Good reference! Thank you for broadening my perspective.
@HT Where is the example of lakeFS with Docker using the local filesystem as its storage backend?
h
https://docs.lakefs.io/quickstart/launch.html
This will spin up a bunch of containers and give you a lakeFS server running locally. In this case, the lakeFS server uses a folder inside one of those containers as its storage backend.
Edit: crap! Google got me with that old doc again.
n
Hi Martin, if you are using a local filesystem as the backing store for lakeFS, you can use the import functionality to do what you need.
@HT The link you provided points to a very old version of the lakeFS documentation.
u
@HT Thank you!! @Niro Hi! Okay! So the import functionality supports the local filesystem?!
n
Yes, as long as the underlying storage for your lakeFS server is also the local filesystem. You will also need to enable imports and allow the import path in the blockstore configuration, since we added safeguards for security reasons. You can find the full configuration parameters here
u
Thank you, I will do that too. May I ask a question? In the setting below, is allowed_external_prefixes the location of the actual data I want to import?
blockstore:
  type: local
  local:
    path: /home/jhhan/02_dev/dvc/work/lakefs    # location where lakeFS keeps its data and metadata
    import_enabled: true                        # must be true to allow importing files
                                                # from the `allowed_external_prefixes` locations
    allowed_external_prefixes:
      - /home/jhhan/02_dev/dvc/src  # location of files that can be imported into lakeFS; lakeFS must be able to access it
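To illustrate the question above: the entries in `allowed_external_prefixes` are where the source files to be imported live (here `/home/jhhan/02_dev/dvc/src`), separate from `path`, where lakeFS keeps its own data and metadata. The sketch below is a hypothetical helper, `is_import_allowed`, that is not part of lakeFS; it assumes a source path is accepted when imports are enabled and the path, after collapsing `..` segments, sits under one of the prefixes. lakeFS performs its own server-side check, and its exact rules may differ.

```python
import posixpath
from pathlib import PurePosixPath


def is_import_allowed(source, allowed_prefixes, import_enabled=True):
    """Return True if `source` may be imported under the assumed rules.

    Illustrative only: mirrors the intent of `import_enabled` and
    `allowed_external_prefixes`, not lakeFS's actual implementation.
    """
    if not import_enabled:
        return False
    # normpath collapses ".." segments (it does not resolve symlinks),
    # so ".../src/../work" is not mistaken for a path under ".../src".
    src = PurePosixPath(posixpath.normpath(source))
    for prefix in allowed_prefixes:
        # is_relative_to (Python 3.9+) compares whole path components,
        # so ".../src2" is NOT treated as being under ".../src".
        if src.is_relative_to(PurePosixPath(posixpath.normpath(prefix))):
            return True
    return False


# Prefix taken from the config above; the file name is a made-up example.
allowed = ["/home/jhhan/02_dev/dvc/src"]
print(is_import_allowed("/home/jhhan/02_dev/dvc/src/ct/scan_001.dcm", allowed))
```

Comparing whole path components rather than raw string prefixes is a deliberate choice in this sketch: a plain `startswith` check would also accept a sibling directory such as `/home/jhhan/02_dev/dvc/src2`.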
Cool! It works! Thank you, everybody!