Tuesday, March 11, 2003

Data Warehouse Modeling guideline Part 3: Aggregate Data Modeling

Here's the third and final part of my guideline, ....this time I'm covering very briefly a complex subject : the aggregation strategy.


General

1. A first level of aggregation should be built on top of the atomic database level, mainly for 1) boosting query performance done on atomic data, 2) for storing pre-calculate KPI which are usually aggregated data, and optionally to archiving historical data not kept at atomic level.

2. The first level aggregation is best when in the same infrastructure as the atomic level (i.e. relational model). It can be accompanied with a second-level which uses OLAP independent Cube technologies.


Dimension Entity

1. Two roles dimension

1.1. In the context of a migration count aggregation, only the dimension of interest must have the dual roles (Origin and Destination), all other dimension functionally dependent on the migration i.e. are modified after the migration, must be defined and fixed as either Origin or Destination.



Fact Entity

1. Definition of the type of fact

1.1. In aggregation the rule of not mixing two types of facts into the table (instantaneous event-based vs. interval or end of period snapshot-based) can be violated. This is because aggregation usually needs to provide a wealth of information and because there is a need to limit the number of aggregate to build/maintain

1.2. (1.1 continued) The mix can only be possible under the condition that it does not affect the granularity. Granularity must always be fixed, i.e. a state-based EOP aggregation cannot accept event-based counts if these affect the subscriber-EOP by axis grain.

1.3. (1.1 continued) If any event measure affect the grain (may potentially require to have BOP by axis to be represented because number of subs that have cancelled some conditions may not exist in the original grain) then it can't be integrated. In this case, it is much more suited to use OLAP Cube which deal easily with the BOP-EOP comparison.

1.4. (1.1 continued) Furthermore, great care must be applied in doing this, because the occurrence of all axis is inconsistent, event should track occurrence of axis at the time of the event while EOP state number track them at the EOP snapshot time. This implies that computation between measures is not possible by axis.

2. Definition of Measures

2.1. In the aggregation, it is common practice to store explicitly independent measures (e.g # of active subs, # of ghost subs, or Nb of sms, Nb of normal calls) that correspond to various dimension condition. This helps decreasing the granularity of the aggregation table and ignores the dimensions only needed to identify the independent measures.

Martin

No comments: