How can I create folder-like structure under main ...
# help
u
How can I create folder-like structure under main branch to manage the data? For example, lakefs://myproject/main/trainingset, lakefs://myproject/main/validationset, lakefs://myproject/main/testset
u
Hey @donald, welcome! Similar to S3, objects in lakeFS are stored under keys. You can create the folder-like structure you suggested by simply saving the objects under keys that begin with these prefixes. For example, an object uploaded as lakefs://myproject/main/trainingset/data.csv will appear as if it's contained in the directory "trainingset".
u
An example upload command would be:
lakectl fs upload lakefs://myproject/main/trainingset/data.csv -s /path/to/local/data.csv
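To illustrate how flat keys give a folder-like view, here is a minimal Python sketch of an S3-style listing that groups keys by prefix and a "/" delimiter. The helper `list_dir` and the sample keys are hypothetical and only model the idea; this is not lakeFS code or its API.

```python
def list_dir(keys, prefix="", delimiter="/"):
    """Return (subdirs, files) that sit directly under `prefix`.

    Keys are plain strings; "directories" are only implied by the delimiter.
    """
    subdirs, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows a delimiter, so `head` acts like a folder.
            subdirs.add(prefix + head + delimiter)
        else:
            files.append(key)
    return sorted(subdirs), sorted(files)


# Hypothetical keys on the main branch of the repository.
keys = [
    "trainingset/data.csv",
    "validationset/data.csv",
    "testset/data.csv",
]

print(list_dir(keys))
print(list_dir(keys, "trainingset/"))
```

Listing with an empty prefix shows the three "folders"; listing with the prefix "trainingset/" shows the single object stored under it.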
u
@Yoni Svechinsky Thanks for your reply. Yes, it creates a folder-like structure when uploading a single file. However, when I use the batch upload command, lakectl fs upload --recursive --source d:\data lakefs://myproject/main/trainingset/, the files appear as "trainingset\1.jpg", "trainingset\2.jpg", ..., directly under main, not as "1.jpg", "2.jpg", ... under main\trainingset. How can I fix this?
u
@donald, thank you for the details, let me check this
u
@donald you are right, it's a bug. The good news is a fix is expected very soon. See the issue here: https://github.com/treeverse/lakeFS/issues/2880
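The symptom reported above (keys such as "trainingset\1.jpg") is what you would see if a Windows path separator ended up inside the object key. As a rough Python sketch of the general idea, assuming object keys should always use "/", a local Windows path can be normalized before it is appended to the destination prefix. The helper `object_key` is hypothetical and is not how lakectl is implemented.

```python
from pathlib import PureWindowsPath


def object_key(dest_prefix, source_root, local_file):
    """Build an object key from a Windows file path, always using "/"."""
    # Path of the file relative to the directory being uploaded.
    rel = PureWindowsPath(local_file).relative_to(PureWindowsPath(source_root))
    # as_posix() converts backslashes to forward slashes.
    return dest_prefix.rstrip("/") + "/" + rel.as_posix()


print(object_key("trainingset/", r"d:\data", r"d:\data\1.jpg"))
print(object_key("trainingset/", r"d:\data", r"d:\data\cats\2.jpg"))
```

PureWindowsPath parses Windows-style paths on any operating system, so the sketch behaves the same on Linux and Windows.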
u
Thanks. I will try the Linux version and see whether it works.
u
Please let us know if it works. Also feel free to comment on the GitHub issue. We're right here if you need anything else
u
Yes, the Linux version of lakectl works as expected.
u
Thanks for the update! The Windows fix is expected by the end of the week
u
@Yoni Augarten One more question. I am currently evaluating different DataOps tools for version-controlling data, and I need a better understanding of how the version control works. I have looked through the lakeFS blog posts but could not find a useful explanation of how data version control works. Where is the metadata maintained? I cannot see it in the postgres database. Is it in the "_lakefs" subfolder under my local storage path? Are there any documents that explain in detail how the version control works?
u
Great question. You are right, the metadata describing committed objects resides under the _lakefs path of your storage namespace. You can read about how commits are structured in this great post: https://lakefs.io/concrete-graveler-committing-data-to-pebbledb-sstables/ Metadata about objects that are not committed yet is stored in postgres, in the graveler_staging_kv table.