Thursday, March 06, 2003

Data Warehouse Modeling guideline Part 2: Fact

Here's the second part of my small ad-hoc modeling guideline applicable to data warehouse using relational database.

Fact Entity

1. Definition of the type of fact
1.1. Never mix two types of facts into the same fact table (instantaneous event-based vs. interval or end of period snapshot-based). Exceptions to this rule are frequent in the aggregation layer (see rule in Aggregation section).

1.2. Dimension definition must restrict the use of this dimension to fact of same nature (e.g. a dimension defining transaction type must not be tied into snapshot fact type).

1.3. (1.2 continued) The nature of the dimension must be clear enough, in other words its definition should not depend on the fact it is tied to.



2. Definition of Measures

2.1. Store an attribute as measure into the fact table only when this detail info is necessary (e.g. for legal issues) but no end-user query will use this as constraint.

2.2. Every limited value space attribute should be designed as dimension and not measure.

2.3. Try avoiding the use of intelligent measure taking a contextual meaning linked to some business rule. All measures should be modeled denormalised even if this implies space wastage.

2.4. (2.3 continued) Because these are usually encapsulated into "hidden flag", it is subject to misunderstanding and misuse in the report.


3. Dimension choice to tie with fact

3.1. All dimensions that qualify the facts should be present in order to avoid any uncertainty in one fact (the goal is to have a fully qualified fact). When a single amount is described only partly by one or many dimensions, it must be corrected by adding every new dimension that will fully qualify the amount.

3.2. Limit the number of dimension for the first implementation of a star (especially for large fact table), so it can evolve with time.

3.3. Try not to tie dimensions that are not directly a characteristic of the fact (e.g. derived dimension dependent on different relationship, subscriber having a has-a relationship with other dimension), these relationships should be modeled in the aggregation strategy.



4. Snapshot-based fact

4.1. Although it seems possible to mix two interval periods (e.g. weeks, months) into the same fact, this will always result into difficulty with future requirements implying usually the addition of new facts. Note that this apply only when the same Date hierarchy is used for all intervals period, meaning that end of period that coincides on the same day will use the same surrogate key.


5. Exceptions in fact

5.1. Exception handling for a fact is necessary whenever the fact is not applicable throughout the dimension space. That is handled with foreign keys (usually negative keys) pointing to an exception for all the irrelevant dimensions.

5.2. (5.1 continued) The exceptions applied to the entire fact (fact-scope) require specific exceptions values stored over the whole hierarchy dimension structure.

5.3. (5.1 continued) These fact-scope exceptions are commonly of the following nature: 1- Error, 2- Unknown, 3- Not Applicable, 4-To be defined, 5- No history, 6- No reference, etc.

Martin

No comments: