Adi Polak

02/06/2023, 2:00 PM
Late-arriving data (sometimes referred to as late-arriving dimensions) is an issue we often encounter in data warehouse solutions. The solution depends on the team, and there are many different strategies for tackling it. I read this article from 2021 that shows how to handle late-arriving dimensions while protecting data integrity, using a few different design approaches:
• Never Process Fact - discard the record instead of loading it into the fact table.
• Park and Retry - insert the unmatched transactional record into a landing table and retry loading it into the fact table in the next batch process.
• Inferred Flag - use a flag to mark that this dimension is not yet available.
Have you dealt with late-arriving dimensions? How did you make it work for you?
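To make the difference between the three approaches concrete, here is a minimal Python sketch. It is only an illustration, not the article's implementation: the names `load_batch`, `LoadResult`, `customer_id`, and `is_inferred` are hypothetical, the dimension is modeled as a plain dict keyed by ID, and the Inferred Flag branch assumes a placeholder dimension row is created so the fact can still be loaded.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    facts: list = field(default_factory=list)   # rows loaded into the fact table
    parked: list = field(default_factory=list)  # rows held in a landing table for retry


def load_batch(records, dimension, strategy):
    """Load fact records whose 'customer_id' may not exist in `dimension` yet.

    `dimension` is a dict keyed by customer_id (hypothetical schema).
    Note: the "inferred_flag" strategy mutates `dimension` in place.
    """
    result = LoadResult()
    for rec in records:
        key = rec["customer_id"]
        if key in dimension:
            # Dimension row already arrived: load the fact normally.
            result.facts.append(rec)
        elif strategy == "never_process":
            # Never Process Fact: drop the unmatched record.
            continue
        elif strategy == "park_and_retry":
            # Park and Retry: keep the record aside for the next batch.
            result.parked.append(rec)
        elif strategy == "inferred_flag":
            # Inferred Flag (assumed behavior): add a placeholder dimension
            # row marked as inferred, then load the fact against it.
            dimension[key] = {"customer_id": key, "is_inferred": True}
            result.facts.append(rec)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return result
```

With Park and Retry, the next batch would pass `result.parked` back in as `records` once the dimension has been refreshed. With Inferred Flag, a later dimension load would overwrite the placeholder row and clear `is_inferred`.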
Robin Moffatt

02/07/2023, 11:37 AM
that's a useful article - thanks for sharing
I wonder whether late-arriving dimensions would be so common if the data dependencies were understood correctly? As in, late-arriving facts will happen for lots of reasons (network problems, human problems, etc.), but for a fact to record the dimension's key as part of its data suggests that the dimension's key is already known and should therefore be available to the DW system. Just thinking out loud…
Adi Polak

02/08/2023, 6:28 PM
It might just be the difference between a DW and ETL on a data lake, which is more open and chaotic.