Thursday, February 28, 2013

Data Vault model: Mobile Telecom example (part-2)

This is the next part from a previous notes where I introduced "Data Vault model: Mobile Telecom example".  In this note, I discuss the model following each step of the methodology.  

Business Entities

Let’s start first with the identification of some of our key Business entities:

1. Customer

  • Definition: party that gets billed for the service/product offered by the Telecom
  • Characteristics
    • Business Key: BAN (billing account number)
    • History Evolution: quite stable as one BAN cannot change for the same account however, one subscriber can have different BAN when migrating account type (e.g. prepay to postpay).
    • Complexity: recursive relationship between BANs often needed to capture BAN hierarchies (BAN → parent BANs → root BAN) for large B2B customers.

2. Subscriber

  • Definition: individual attached to a particular subscription, i.e. MSISDN (mobile number that provides telecomunnication services).
  • Characteristics
    • Business Key: MSISDN, the telephone number given to the individual.
    • Key alternative: this Business key is not stable, as same subscriber typically changes MSISDN, so Billing systems usually provides an alternative key.  This alternate subscriber-ID must be used instead.
    • Complexity: Not all OLTPs have same identifiable key, any systems not so concerned with history usually tracks the subscriber with its MSISDN at given Timestamp, whereas others come up with more stable key like Subscriber-ID or technical key attached to the SIM card like IMSI.

3. Product

  • Definition:  In telco, the product does not correspond to your typical and tangible goods but rather associated to some networking features leveraged by your handset (e.g. SMS, MMS, call, voice mail, highspeed data, etc..).
  • Characteristics
    • Business Key: ProductCode (usually short text not so well standardized throughout the organization).
    • Complexity: Conformance and data integrity problems as each system may use its own, and usually involves some form of hierarchical representation.

4. Service

  • Definition:  Service is usually tightly coupled with the notion of product.  Subscribers purchase service (either inherited through some rating-plan or purchased individually) that include specific Product along with pricing and detail features.  (e.g. SMS option, Voice mail international, Local calls special rating, unlimited data volume, free calls number, etc..).
  • Characteristics
    • Business Key ServiceCode (again short text not so well standardized but mostly used by the Billing system).
    • Complexity: Usually involved some grouping or combination of more detail features related to network and Intelligent network layer (IN), and also linked to rating information used for billing purposes.

5. Handset

  • Definition:  Device through which the subscriber access network product.
  • Characteristics
    • Business Key TAC (standard code describing handset model manufacturer etc.).  Not to be confused with IMEI which uniquely identifies each physical device (handset id).

6. Invoice

  • Definition:  Bill statement sent regularly to the Customer for related subscription.
  • Characteristics
    • Business Key Invoice number (used mostly by billing system).
    • Complexity: typically applicable for a subset of the entire customer population (postpay), as prepay is billed in realtime according to usage
There are obviously many more entities that quickly come to mind (e.g. Sales, Call-Center, Agent, Contact, Campaign, Port-in/out, Visits, etc..), but let’s stop here for the sake of simplicity.

    Business Relationship Entities

    Again, let’s give out a few examples of relationships that should cover some different modeling scenario.  Telco are very complex organization with a large number of heterogenous systems, so these example are clearly over-simplified:

    1. Customer hierarchy

    • Definition: Large B2B customers have usually organizational hierarchy struture for managing employee payment responsibilities and complexity
    • Characteristics
      • Data Model name: LNK_CUSTOMER_HIER
      • Natural recursive relationship: Each Account is associated to a direct parent Account (a single node) until we reach the top account or Root BAN.  (BAN → parent BANs → .. → root BAN).
      • Links modelling: these recursive relationship are best modelled with a parent_key FK’s pointing back on same table.  However in DV, linkage structure stands on its own, so these Links must be stored separately.  The Child_key acts like the driving key of the relationship (is used to determine the end-dating of the link).

    2. Subscriber RatingPlan

    • Definition: The subscription is associated to some form of rating plan.
    • Characteristics
      • Data Model name: LNK_SUBS_RPLAN
      • Natural relationship: Any subscriber must have one and only one RatingPlan at any point in time.  This plan specifies the exact rating of all billable communication items.
      • Links modelling:  this is modelled using a typical Link structure between Subscriber & RatingPlan accompanied with lifespan From/To Date in Satellite (see discussion “Relationship of limited lifespan” ).

    3. Billing Invoice statement & Charge item

    • Definition:   Billing statement is invoiced to each customer account at each end of bill cycle.
    • Characteristics
      • Data Model name: LNK_INVOICE & LNK_INVOICE_RECHARGE
      • Natural relationship:  The Link relates Bill Statement with the customer, and also with the 1-to-many Subscribers attached to this Customer account.    There is also a Link which itemizes each charge by Product (and depending on RatingPlan) associated to one Invoice (LNK_INVOICE).   Things may vary here, but let’s keep it simple..
      • Links modelling:  The relationship are regular Link between Customer, Bill-Statement and Subscriber for the Invoice.  And for Invoice-charge, we tie it to the Product (item being billed).

    4. Recharge Activity

    • Definition:  Prepay subscriber must recharge (top-up) their balance as they go.
    • Characteristics
      • Data Model name: LNK_RECHARGE
      • Natural relationship:  This recharge activity is associated to various channel and payment method.
      • Links modelling:  A simple Link structure with FKs referring to Subscriber and Dealers (i.e. the channel through which the subscriber purchased new credit).

    5. Network Event (calls, sms, voicemail, internet)

    • Definition:  Transactions corresponding to network events which are consumed by the telco subscriber.
    • Characteristics
      • Data Model name: LNK_NETWORK_EVENT
      • Natural relationship Depending on the communication product (voice calls, sms, mms, gprs data, umts data, etc..), various entities are involved and can be linked.

      Contextual and Descriptive Data

      These are meant to convey the time-variant and historical part of each attribute captured by the EDW.   Modelling these as Satellites, we allow for timeline slicing and taking snapshot of how things (i.e. business state) were at that time.   The main advantage of Satellite, is we usually don't know in advance all features/attributes of interest needed... and even if you knew, things evolve at such pace that most likely you'll need to revisit them soon.    With Satellites, you can build them incrementally, bit by bit according to business needs and scope changes without affecting existing structure, as long as you keep them normalized and fine-grained.   No longer needed Satellite can simply stop being loaded, while new one get created.

      As my goal is not to capture a complete Mobile EDW,  I've only included a few Satellite here for
      illustration purposes.    These would be typical features/attributes associated to Hub/Link entities.  They are not meant to be exhaustive but rather convey the meaning of the different entities.

      The rest of the note is presented in the part-3, where some details and highlights are presented related to design decision and open issues that got raised during the modelling.

      See last part here: Data Vault model: Mobile Telecom example (part-3)


      Martin

      Data Vault model: Mobile Telecom example

      Context

      This is my tentative to model an EDW Integrated layer leveraging the Data Vault approach.   This is done for the industry I know best, i.e. the Mobile Communication industry.    But first, here are a few remarks about the context of this doc:
          • DV modelling is often applied for the auditability of OLTP source systems, where one wishes to produce a system of record to guarantee full traceability, completeness and historical tracking of the data.  I’m not interested in these goals for which there already exist tools/methods tp automatically populate what is often called Operational Datastores.
          • Most transactional systems in big organisation (e.g. Billing, CRM, ERP, etc.) are overly complex-  This is partly due to satisfying contradicting goals : being generic while providing detail completeness.  And typically in BI, only a subset of their data contents is worthwhile for building the EDW.
          • My main interest is Data Integration:  how various data elements generated by business processes can be integrated into one common EDW layer using DV modelling approach independently of the OLTP data sources.   This is in agreeing that EDW Integrated Layer should not contain any form of data transformation, cleansing, calculation and derivation-   We leave these business rules transformation to downstream Layers closer to business user  (this is not to say that EDW layer is simply storing data as-is without some form of integration).
          • The focus is on the Business Entities stored as Hub along their natural keys, the Transaction/Association forming the Link.   There is not much effort spent on the descriptive data content (i.e. the Satellite).
          • The data model is much simplified without much concerns in terms of technical implementation details (more functional oriented than technical).   Satellites data is just provided as examples to help better understanding the semantic of each model entities.
          • The data model serves more as a way to provide validation of the DV modelling approach when used in the context of building an Integrated layer of EDW.
      So the presented model is not based on reverse engineering of existing OLTPs, but rather aligned on functional analysis of business processes typical of Mobile communication organization.    I was arguing, in previous post, that not much integration is possible when DV modelling is simply used to store the OLTPs source data (100% of data 100% of the time).  This is a recognized issue in the context of the overall DW architecture, as people advocates the use of multi-layers Data Vault.  For the first level, we would then talk in terms of the Raw Vault, and for the second layer we would talk in terms of Business Data Vault.  This is broadly speaking and there is still confusion and misalignment about the exact terms and meaning (as found here).

      In this respect, I favourably adopt the position articulated by DataVault academy as shown in this training video or this high-level methodology guide.   In summary,  we should not only look at source system data/metadata for EDW modelling, but also look at the business side with aspects like :
        • Business processes
        • Enterprise initiatives (logical/semantic models, utilised datamart, ..)
        • Business interviews and session
        • Industry models
        • Taxonomies

      Modelling Methodology

      1. Business Entity Identification and Modelling

      1. What are the business entities (core concepts like master data, events, or even transactions)
      2. Establish the EWBK -Enterprise Wide Business Keys- recognized by all business stakeholders. These are critical in integrating multiple data sources into a common EDW.  Integration often requires  the use of higher-level keys and not simply raw key found in the different OLTPs.
      3. Model the Hub around these keys 

      2. Business Relation Identification and Modelling

      1. What are the natural relationships between entities
      2. Analyze relationships involved during application of business process.  These may even involve multiple atomic transactions (long transaction) often referred to as Unit of Work with multiple data sources (as opposed to a single-source transactional UoW).
      3. Model the Link around entities involved in these relationship

      3. Context identification and Modelling

      1. What are the descriptive and context data associated to each Entity and Relationship
      2. Model the Satellite around these descriptive data using your choice of Satellite design criteria  

      In summary, whatever perspective you adopt, DV model should aim for this clear and simple objective :
      "Separate business keys from relationships and from contextual data"
      I don't remember where I've read that statement but I find it elegant and quite coherent to what the goal of DV modelling is in comparison to other modelling approaches like Dimensional modelling.

      Before explaining the end-result model, here's how it looks  :

      Fig 1. View using an organic layout of the overall DV model example 
      (blue, yellow and red correspond to Hubs, Links and Satellite respectively )

      You can also refer to the original PNG-file for all details.

      The diagram has been formatted organically to better visualize the role of each entity type.    Satellites are surrounding their Hubs/Links while Links tie these clusters together.  Exception to that is the hanging cluster with missing Links.  These corresponds to Hub_service and Hub_contract, and  analysing additional business processes would be necessary to correct this situation.

      Modification
      My notes are often written for archival purposes without focusing too much on readability.  As more people gets redirected here, it seems I should do a bit of effort in putting these long notes into smaller and more digest form.   

      So you can follow the next part here: Data Vault model: Mobile Telecom example part-2.

      Martin