# help
I’ve been chatting with data engineers about the dead letter queue concept applied to data pipelines. It's closely related to @Adi Polak's post about circuit breakers https://lakefs.slack.com/archives/C020N7X2Y0H/p1673514224119879. If data being brought in has defects and I’d prefer to hold it in a “penalty box” until inspection, is lakeFS a good option for that?
Hi @Beegee Alop, Wow, that's a big question! I guess @Adi Polak would be the best person to answer. Personally, I'd probably use a branch as my penalty box. So:
• Data goes into branch `unchecked` as a new commit
• Some orchestration (maybe Airflow?):
    ◦ picks up the new commit
    ◦ validates it using my tool of choice
    ◦ IF validation passes, merges it into main
So now I have my latest "raw" data available for processing, but I also get to gate it before it reaches my main production branch. @Adi Polak, do you have better ways, please?
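The gate described in the bullets above could be sketched roughly like this. This is just a toy illustration of the control flow, not lakeFS code: the `validate` rule, the branch name `unchecked`, and the callback-style `merge_to_main`/`quarantine` hooks are all assumptions; a real pipeline would invoke lakeFS itself (e.g. via `lakectl` or the lakeFS SDK) at the commented points.

```python
# Sketch of the "penalty box" gate: data lands on an isolated branch,
# and only validated commits get merged into main. All names here are
# hypothetical illustrations, not lakeFS API calls.

def validate(records):
    """Toy quality check standing in for your validation tool of choice:
    here, every record must carry a non-null 'id'."""
    return all(r.get("id") is not None for r in records)

def gate(records, merge_to_main, quarantine):
    """Decide the fate of a new commit on the 'unchecked' branch."""
    if validate(records):
        # In practice, something like:
        #   lakectl merge lakefs://repo/unchecked lakefs://repo/main
        merge_to_main(records)
        return "merged"
    # Leave the commit on the branch for inspection -- the penalty box.
    quarantine(records)
    return "quarantined"

main_data, penalty_box = [], []
print(gate([{"id": 1}], main_data.extend, penalty_box.extend))
print(gate([{"id": None}], main_data.extend, penalty_box.extend))
```

The nice property of branching here, as opposed to a separate dead letter queue, is that the quarantined data stays queryable in place: nothing is copied out, and a later fix plus successful validation is just a normal merge.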