Saturday, June 11, 2011

Conceptual modeling your star schema


Due to its simplicity and its general appeal, the dimensional data model (aka star schema) has probably reached a standard-like level in terms of its usage and acceptability in the BI world.  Arguably, this modeling techniques is not ideal when one’s goal is merely to store and integrate a number of heterogeneous source of data.   However, when one’s goal is to report and analyze on these source of data, then yes Dimensional modeling is your safest bet.   The normally architected EDW platform should hold at minimum three layers, each one having clear specifications and goals.   Ideally Dimensional modeling is restricted to layers closer to end-users :

Layers Dimension modeling  applicability
Staging area No –>  persistent & irrelevant.  Use same data model as source (typically 3NF)
Data Integration No –> lack of traceability and creates a dependency on what is now considered a myth:  single version of the truth.  Dimensional modeling applies business rules and transformation far too early to apply it onto this EDW layer.
Data Presentation / Datamart Yes –> Perfectly adapted as it exposes simple & understandable views well suited to different departmental business views

This post is not about the details of each layer and their recommended modeling approach, but rather presents a conceptual modeling technique that I often used in real world project as a tool for validating Datamart models with business stakeholder.
This technique is inspired by a representation methodology from the ad-hoc model tool called Dimensional Fact Model, and is well adapted for our presentation layers of the EDW.   This  dimensional fact model methodology was originally created by Matteo Golfarelli and Stefano Rizzi.   They came up with a model tool offering an easy and convenient way of communication between business stakeholders and IT technical teams.  Its main advantages include:
    • Specially aimed for multi-dimensional modeling
    • Simple graphical form showing constraints & functional dependencies
    • Recognized to be readable by non-expert
    • Ideal to support the dialogue between end-users and designer during design phase
I don’t usually make use of the full DFM representation, but rather on a set of limited basic concepts:
· Facts (the transaction, event, recording snapshot, .. ) occurring within the enterprise
· Measures: numerical property of a fact describing its quantitative aspect
· Dimension: axis of analyses of the fact that has a finite domain (property analyzed with the term : “By”)
· Dimension attribute: a dimension property with a finite domain and being part of at least one Hierarchy (as opposed to descriptive attributes)
· Hierarchy: a directed tree with nodes corresponding to dimension attribute and arcs to the many-to-one association between them
· Descriptive attribute: a dimension property not part of any Hierarchy, and typically being one-to-one with the dimension lowest members (can be amount-based or yet descriptive base)

The full DFM technique has richer and more complex elements, but not all of them are relevant in a ROLAP physical implementation (see link given above for more details).
Note that these model only present a Conceptual view of the date.  Each view normally represents one subject area where similar facts are grouped together.   Let’s also note that, these are not describing any logical/physical data model.   It may turn out that, during implementation, more than one physical star schema would be necessary to represent one Conceptual Design, or conversely, one physical star schemas may be used for more than one Conceptual Design. The role of these Conceptual Designs is to get the Business sign-off which is independent on the physical details implementation choice.

Below is a generic model diagram depicting the main concepts that I rely upon for my projects:

Design-Generic_v0.1

Martin