Hi, I've seen this pull request that have been mer...
# help
s
Hi, I've seen this pull request that has been merged: https://github.com/treeverse/lakeFS/pull/1310 But I do not understand how I can disallow empty commits on a repo. Is this possible?
b
Hi @Samuel THIRIET, after the above PR you can't commit without changes. If you try, for example, to use lakectl to commit without having made any changes to your branch, the commit will fail.
s
I have run more tests: the commit fails when there is no change at all, but it does not fail if, for example, I upload the same file a second time and then try to commit:
root@74f1071332bb:/opt/bitnami/spark# lakectl fs upload lakefs://my-repo/main/imdb/title.episode.tsv.gz --source title.episode.tsv.gz
Path: imdb/title.episode.tsv.gz
Modified Time: 2021-11-30 13:04:33 +0000 UTC
Size: 28145618 bytes
Human Size: 28.1 MB
Physical Address: local://storage-location/a1fd8e689f794ef39939c074b282242a
Checksum: 990ca31cfa7cfecd36badecf08a4dec7
root@74f1071332bb:/opt/bitnami/spark# lakectl commit lakefs://my-repo/main --message "first commit"
Branch: lakefs://my-repo/main
Commit for branch "main" completed.

ID: 326bd71be9b17e09edbbf988c4a7ccf1253cfe6253df29231bd2e27b1daed603
Message: first commit
Timestamp: 2021-11-30 13:05:25 +0000 UTC
Parents: f742d9bb095fb689489cf7a2735ea7cfe7024513d648aae613bbcf7c23aa731d
root@74f1071332bb:/opt/bitnami/spark# lakectl fs upload lakefs://my-repo/main/imdb/title.episode.tsv.gz --source title.episode.tsv.gz
Path: imdb/title.episode.tsv.gz
Modified Time: 2021-11-30 13:06:44 +0000 UTC
Size: 28145618 bytes
Human Size: 28.1 MB
Physical Address: local://storage-location/202bfa1d035f442685709f223f195999
Checksum: 990ca31cfa7cfecd36badecf08a4dec7
root@74f1071332bb:/opt/bitnami/spark# lakectl commit lakefs://my-repo/main --message "second commit"
Branch: lakefs://my-repo/main
Commit for branch "main" completed.

ID: 5ad9c4118406bbdd1bee1452367c25f9b45505d15c15fcbc5541aad09f7f2fec
Message: second commit
Timestamp: 2021-11-30 13:06:49 +0000 UTC
Parents: 326bd71be9b17e09edbbf988c4a7ccf1253cfe6253df29231bd2e27b1daed603
root@74f1071332bb:/opt/bitnami/spark# lakectl diff lakefs://my-repo/5ad9c4118406bbdd1bee1452367c25f9b45505d15c15fcbc5541aad09f7f2fec lakefs://my-repo/326bd71be9b17e09edbbf988c4a7ccf1253cfe6253df29231bd2e27b1daed603
Left ref: lakefs://my-repo/5ad9c4118406bbdd1bee1452367c25f9b45505d15c15fcbc5541aad09f7f2fec
Right ref: lakefs://my-repo/326bd71be9b17e09edbbf988c4a7ccf1253cfe6253df29231bd2e27b1daed603
b
Just to verify: when you upload the same file a second time, it is listed as a change because the timestamp was updated, right?
Or do you not see any difference after the second upload?
s
Yes, it is listed as changed before I commit (because of the modification date metadata?)
root@74f1071332bb:/opt/bitnami/spark# lakectl fs upload lakefs://my-repo/main/imdb/title.episode.tsv.gz --source title.episode.tsv.gz
Path: imdb/title.episode.tsv.gz
Modified Time: 2021-11-30 13:19:48 +0000 UTC
Size: 28145618 bytes
Human Size: 28.1 MB
Physical Address: local://storage-location/e2808480d77b4a8f8956f7b6dd4c2c80
Checksum: 990ca31cfa7cfecd36badecf08a4dec7

root@74f1071332bb:/opt/bitnami/spark# lakectl diff lakefs://my-repo/main
Ref: lakefs://my-repo/main
~ modified imdb/title.episode.tsv.gz
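To make the observation above concrete, here is a small sketch that compares the metadata printed by the first two `lakectl fs upload` runs pasted earlier in this thread. The `parse_stat` helper is hypothetical (not part of lakectl), and which of these fields lakeFS actually uses to decide that an object is "modified" is an assumption here; the sketch only shows which printed fields differ between two uploads of identical content.

```python
def parse_stat(output):
    """Parse 'Key: value' lines (as printed by `lakectl fs upload`) into a dict.

    Hypothetical helper for illustration only.
    """
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that are not 'Key: value'
            fields[key.strip()] = value.strip()
    return fields


# Output of the first and second uploads, copied from the thread.
first = parse_stat("""Path: imdb/title.episode.tsv.gz
Modified Time: 2021-11-30 13:04:33 +0000 UTC
Size: 28145618 bytes
Human Size: 28.1 MB
Physical Address: local://storage-location/a1fd8e689f794ef39939c074b282242a
Checksum: 990ca31cfa7cfecd36badecf08a4dec7""")

second = parse_stat("""Path: imdb/title.episode.tsv.gz
Modified Time: 2021-11-30 13:06:44 +0000 UTC
Size: 28145618 bytes
Human Size: 28.1 MB
Physical Address: local://storage-location/202bfa1d035f442685709f223f195999
Checksum: 990ca31cfa7cfecd36badecf08a4dec7""")

# Fields whose values differ between the two uploads.
changed = sorted(k for k in first if first[k] != second.get(k))
print(changed)
```

The checksum and size are identical, while the modification time and the physical address differ, which is consistent with the second upload being reported as a change.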
b
yes
Are you looking for a way to ignore changes where the content itself has not changed?
s
Yes, exactly.
b
How do you see your workflow for this? As part of the commit, or as a flag when you upload the data? I ask because we would then keep the old metadata.
s
Perhaps on the upload action, by checking the checksum before the upload?
b
So, a lakectl option to skip the upload if the content checksum is the same? Note that if you still upload a file using our S3 interface or the API, a change will be made in that case. How do you use lakeFS? If the changes come from different clients, the above improvement will not help.
I will open an issue so we can capture the request and track it.
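Until such an option exists, the checksum check discussed above could be approximated on the client side. This is a minimal sketch, not lakectl behaviour: `local_md5` and `should_upload` are hypothetical helpers, and it assumes the `Checksum` value reported for the object is the hex MD5 of the file content (the 32-character value in the output above is consistent with that, but it is not confirmed in this thread). How the remote checksum is obtained is left to the caller.

```python
import hashlib


def local_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_upload(path, remote_checksum):
    """Decide whether a local file needs to be uploaded.

    remote_checksum: checksum currently stored for the object (assumed to be
    a hex MD5 of the content), or None if the object does not exist yet.
    """
    if remote_checksum is None:
        return True
    return local_md5(path) != remote_checksum.lower()


# Example (hypothetical values):
# if should_upload("title.episode.tsv.gz", "990ca31cfa7cfecd36badecf08a4dec7"):
#     ...run the upload...
```

As noted above, this only helps for clients that perform the check; uploads through the S3 interface or the API would still register as changes.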
s
Yes, you're right. It's not very consistent.
Thank you for your help. Now I understand why my commit is not empty 🙂. I will try to explain our workflow on the issue.
b
Thanks I'll open it soon
@Samuel THIRIET feel free to watch and add your input to this one