# help
I have a bucket with two billion files. What are some ways to speed up importing this data? At the current test speed (14,000 objects/s), it will take more than forty hours to import everything. My test machine has 8 cores and 16 GB of memory.
Hi @YUREN WANDER! Depending on your data structure, you may want to parallelize the import by importing different folders into different branches in lakeFS and then merging them all into one branch; see the sketch below.
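A minimal sketch of that branch-per-prefix approach, assuming the lakeFS HTTP API's createBranch, importStart, importStatus, and merge endpoints. The endpoint URL, credentials, repository, branch, and prefix names are all placeholders; verify the exact payload shapes against the API reference for your lakeFS version.

```python
# Hypothetical sketch: split one huge import across scratch branches, run the
# imports in parallel, then merge everything back into the target branch.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

LAKEFS = "http://localhost:8000/api/v1"    # assumption: local lakeFS server
AUTH = ("AKIA...", "secret...")            # assumption: basic-auth access keys
REPO, TARGET = "my-repo", "main"           # hypothetical repo/branch names
PREFIXES = [f"s3://my-bucket/part={i}/" for i in range(8)]  # hypothetical split

def import_prefix(i: int, prefix: str) -> str:
    branch = f"import-{i}"
    # 1. Create a scratch branch off the target branch.
    requests.post(f"{LAKEFS}/repositories/{REPO}/branches", auth=AUTH,
                  json={"name": branch, "source": TARGET}).raise_for_status()
    # 2. Start an async import of this prefix into the scratch branch.
    r = requests.post(
        f"{LAKEFS}/repositories/{REPO}/branches/{branch}/import", auth=AUTH,
        json={"paths": [{"path": prefix,
                         "destination": f"part={i}/",
                         "type": "common_prefix"}],
              "commit": {"message": f"import {prefix}"}})
    r.raise_for_status()
    import_id = r.json()["id"]
    # 3. Poll until this import completes.
    while True:
        s = requests.get(
            f"{LAKEFS}/repositories/{REPO}/branches/{branch}/import",
            auth=AUTH, params={"id": import_id})
        s.raise_for_status()
        if s.json().get("completed"):
            return branch
        time.sleep(5)

# Run one import per prefix concurrently.
with ThreadPoolExecutor(max_workers=len(PREFIXES)) as pool:
    branches = list(pool.map(lambda a: import_prefix(*a), enumerate(PREFIXES)))

# 4. Merge each scratch branch back into the target branch, one at a time,
#    to avoid concurrent merges into the same destination.
for branch in branches:
    requests.post(f"{LAKEFS}/repositories/{REPO}/refs/{branch}/merge/{TARGET}",
                  auth=AUTH, json={"message": f"merge {branch}"}).raise_for_status()
```

Because each prefix lands under its own destination path, the sequential merges should not conflict with one another.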
Hi @YUREN WANDER The import API allows providing several sources in a single import. So if your data is split into several prefixes, you can pass all of them as sources and the lakeFS server will parallelize the import process. At the moment this is available only through the API directly, not via the UI or CLI; see the sketch below.
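A minimal sketch of that multi-source call, again with hypothetical endpoint, credentials, repository, and prefix names. The paths/commit payload fields follow the importStart operation in the lakeFS API reference, but check them against your server version.

```python
# Hypothetical sketch: one importStart call with several sources, letting the
# lakeFS server fan the import work out across the prefixes itself.
import requests

LAKEFS = "http://localhost:8000/api/v1"   # assumption: local lakeFS server
AUTH = ("AKIA...", "secret...")           # assumption: basic-auth access keys

resp = requests.post(
    f"{LAKEFS}/repositories/my-repo/branches/main/import", auth=AUTH,
    json={
        # One entry per prefix; the server parallelizes across them.
        "paths": [
            {"path": f"s3://my-bucket/part={i}/",   # hypothetical prefixes
             "destination": f"part={i}/",
             "type": "common_prefix"}
            for i in range(8)
        ],
        "commit": {"message": "import all prefixes in one call"},
    })
resp.raise_for_status()
print("import id:", resp.json()["id"])    # poll import status with this id
```

The call returns immediately with an import id; poll the import status endpoint with that id (as in the previous sketch) until it reports completion.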
thx~