# help
d
Hello! I’ve been taking a look at lakeFS for a while now, and it seems it could solve a bunch of our issues and even replace some of our internal tooling. However, like many other companies, we have strict GDPR rules to follow, so I’ve read your “GDPR Best Practices Using lakeFS” document. In theory this fits my needs, but I’m struggling to understand how it is applied in practice. Could you point me to examples or documentation that implement parts of options 2 and/or 3, e.g. how to: • delete an object so that it’s no longer referenced by any commit • create a new version of the dataset for each commit Thank you 🙏
i
Hi @Damien Matias! There are a couple of ways to delete data in lakeFS. The first is to delete the data “regularly”, with one difference: do it for all commits (historical versions and across branches). Once you do that for a file, lakeFS garbage collection will make sure the file is deleted. The second is to physically delete the object from the object store. You can click the cogwheel next to each object, select “Object info”, and see the physical address of that file (or use the lakectl command line; see more info here). Of course, trying to access those files afterwards will have the same consequences as trying to reach any file on S3 that no longer exists. Creating a new dataset for each commit means re-running the ETLs on the new dataset (once the old data is deleted) to preserve reproducibility of the data. HTH. Also happy to jump on a conversation with our product team, since we’re investigating advanced use cases here ourselves. Would that be of interest to you?
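(For reference, a minimal sketch of the two deletion paths described above, using lakectl. The repository, branch, and object names here — `example-repo`, `main`, `dev`, `data/users.parquet` — are placeholders, and exact flags should be checked against your lakectl version and the lakeFS docs.)

```shell
# 1) Logical delete: remove the object on every branch that references it,
#    commit, and let lakeFS garbage collection reclaim the backing data.
lakectl fs rm lakefs://example-repo/main/data/users.parquet
lakectl fs rm lakefs://example-repo/dev/data/users.parquet
lakectl commit lakefs://example-repo/main -m "remove users.parquet (GDPR erasure)"

# 2) Physical delete: look up the object's physical address in the
#    underlying store, then delete that object directly (e.g. on S3).
lakectl fs stat lakefs://example-repo/main/data/users.parquet
# The stat output includes the object's physical address; delete it directly:
aws s3 rm s3://your-bucket/physical/address/from-stat-output
```

Note that after a physical delete, any commit still referencing the object will fail on read, just like reaching for a deleted S3 object.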
@Damien Matias - Helpful?
d
Sorry for the delayed response! Thank you for your answer, this is quite useful 🙂 Regarding that conversation with the product team, I would indeed be interested.
i
Thanks. Let me work on that. What time zone are you in?
d
Paris (GMT+1)