Friday, February 21, 2003

Data Warehouse

My professional work and interest have long been involving the creation and modeling of very large databases designed for reporting and analytical needs called, data warehouse. These systems also referred to as OLAP or On-line Analytical Processing (a term coined after its couterpart called OLTP, On-Line Transaction Processing in which the data is volatile and transaction-based) can be implemented in multiple ways:
  • ROLAP which uses a relational engine (exactly like DBMS used in OLTP, e.g. Oracle)
  • MOLAP using a multi-dimensional engine (e.g. Cognos PowerPlay)
  • HOLAP which is a combination of the two approach often keeping the atomic fact data in the relational database and build the aggregation for each dimension inside a MOLAP cube (e.g. Microsoft Analysis Services)

These systems complexity arise mostly from its data source characteristics (high volume, high heterogeneous, medium quality, etc.), from its performance expectation (should report millions of facts in split second! ), from its multi-department initiative and often misaligned requirement (in the case of corporate warehouse), from its challenging selling point (supposedly can report on any data produced within the company), and from its vast technology offer on the market.

For those interested more on this subject, I would recommend books from Ralph Kimball who has developed some interesting theory taking very pragmatic approach.

In the next few notes, I'll be sharing some Guidelines I've written after experimenting with modeling real-world datawarehouse using the relational paradigm (i.e. ROLAP). These notes will give unstructured tips and hints for building dimensional model into relational database. Although my notes are based on real experience done on projects implemented in Oracle, they are far from being exhaustive.

Martin

No comments: