Context
This is my tentative to model an EDW Integrated layer leveraging the Data Vault approach. This is done for the industry I know best, i.e. the Mobile Communication industry. But first, here are a few remarks about the context of this doc:- DV modelling is often applied for the auditability of OLTP source systems, where one wishes to produce a system of record to guarantee full traceability, completeness and historical tracking of the data. I’m not interested in these goals for which there already exist tools/methods tp automatically populate what is often called Operational Datastores.
- Most transactional systems in big organisation (e.g. Billing, CRM, ERP, etc.) are overly complex- This is partly due to satisfying contradicting goals : being generic while providing detail completeness. And typically in BI, only a subset of their data contents is worthwhile for building the EDW.
- My main interest is Data Integration: how various data elements generated by business processes can be integrated into one common EDW layer using DV modelling approach independently of the OLTP data sources. This is in agreeing that EDW Integrated Layer should not contain any form of data transformation, cleansing, calculation and derivation- We leave these business rules transformation to downstream Layers closer to business user (this is not to say that EDW layer is simply storing data as-is without some form of integration).
- The focus is on the Business Entities stored as Hub along their natural keys, the Transaction/Association forming the Link. There is not much effort spent on the descriptive data content (i.e. the Satellite).
- The data model is much simplified without much concerns in terms of technical implementation details (more functional oriented than technical). Satellites data is just provided as examples to help better understanding the semantic of each model entities.
- The data model serves more as a way to provide validation of the DV modelling approach when used in the context of building an Integrated layer of EDW.
In this respect, I favourably adopt the position articulated by DataVault academy as shown in this training video or this high-level methodology guide. In summary, we should not only look at source system data/metadata for EDW modelling, but also look at the business side with aspects like :
- Business processes
- Enterprise initiatives (logical/semantic models, utilised datamart, ..)
- Business interviews and session
- Industry models
- Taxonomies
Modelling Methodology
1. Business Entity Identification and Modelling
- What are the business entities (core concepts like master data, events, or even transactions)
- Establish the EWBK -Enterprise Wide Business Keys- recognized by all business stakeholders. These are critical in integrating multiple data sources into a common EDW. Integration often requires the use of higher-level keys and not simply raw key found in the different OLTPs.
- Model the Hub around these keys
2. Business Relation Identification and Modelling
- What are the natural relationships between entities
- Analyze relationships involved during application of business process. These may even involve multiple atomic transactions (long transaction) often referred to as Unit of Work with multiple data sources (as opposed to a single-source transactional UoW).
- Model the Link around entities involved in these relationship
3. Context identification and Modelling
- What are the descriptive and context data associated to each Entity and Relationship
- Model the Satellite around these descriptive data using your choice of Satellite design criteria
In summary, whatever perspective you adopt, DV model should aim for this clear and simple objective :
"Separate business keys from relationships and from contextual data"I don't remember where I've read that statement but I find it elegant and quite coherent to what the goal of DV modelling is in comparison to other modelling approaches like Dimensional modelling.
Before explaining the end-result model, here's how it looks :
Fig 1. View using an organic layout of the overall DV model example (blue, yellow and red correspond to Hubs, Links and Satellite respectively ) |
You can also refer to the original PNG-file for all details.
The diagram has been formatted organically to better visualize the role of each entity type. Satellites are surrounding their Hubs/Links while Links tie these clusters together. Exception to that is the hanging cluster with missing Links. These corresponds to Hub_service and Hub_contract, and analysing additional business processes would be necessary to correct this situation.
Modification
My notes are often written for archival purposes without focusing too much on readability. As more people gets redirected here, it seems I should do a bit of effort in putting these long notes into smaller and more digest form.
So you can follow the next part here: Data Vault model: Mobile Telecom example part-2.
Martin
No comments:
Post a Comment