Sunday, April 27, 2008

Decorating RPC server calls

In an RPC service layer, I often need special processing to occur before or after some of the call methods. For example, I may want to gather various access statistics related to some of the methods, compress the response when it exceeds some maximum threshold, or simply provide a cached response to speed things up!

So in essence, I need to intercept specific RPC calls to provide extra services to these calls and optionally even take responsibility for handling the call without reaching the real service layer (e.g. in the case of caching). The logical answer to this need would be an AOP (Aspect-Oriented Programming) solution; however, my service layer is already intercepted by aspects that manage orthogonal concerns such as transactional processing and security handling. And defining additional aspects around my service methods seems counter-intuitive, especially for aspects that are relatively sparse (i.e. that only apply to a few methods/services).

My solution to this problem is simply to leverage the Decorator design pattern. Conceptually, this pattern provides a way to attach additional responsibilities to an object dynamically, i.e. here I could add an auditing or caching ability to the service object. The next UML figure presents the class diagram involved in my solution. It is a slightly modified version of the classic pattern, because both my concrete service object (to be wrapped by decorators) and my decorators inherit from the same superclass. This is mostly a convenience to centralize the various steps originating from the GWT RPC handling mechanism with the template method pattern. The drawback is that all decorators have an annoying dependency on the GWT library. This is fine for my scenario, but I may have to refactor this in order to improve testability.

[UML class diagram: the concrete service and the decorators both inheriting from the same GWT servlet superclass]

I’m providing the skeleton source code for the main classes involved; feel free to adapt it to your needs.
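Here is a minimal sketch of the idea in plain Java (the class and method names are illustrative, not the original code, and the GWT servlet plumbing factored into the shared superclass is elided):

// Hypothetical service interface (names are illustrative).
interface QuoteService {
    String getQuote(String symbol);
}

// The real service: reached only when no decorator short-circuits the call.
class QuoteServiceImpl implements QuoteService {
    public String getQuote(String symbol) {
        // ... real back-end lookup would go here ...
        return "quote for " + symbol;
    }
}

// Abstract decorator: holds the wrapped service and delegates by default.
abstract class QuoteServiceDecorator implements QuoteService {
    protected final QuoteService delegate;
    protected QuoteServiceDecorator(QuoteService delegate) {
        this.delegate = delegate;
    }
    public String getQuote(String symbol) {
        return delegate.getQuote(symbol);
    }
}

// Concrete decorator: serves cached responses and only falls through to the
// wrapped service on a cache miss, so the call may never reach the real
// service layer (the caching case described above).
class CachingQuoteService extends QuoteServiceDecorator {
    private final java.util.concurrent.ConcurrentMap<String, String> cache =
            new java.util.concurrent.ConcurrentHashMap<String, String>();

    public CachingQuoteService(QuoteService delegate) {
        super(delegate);
    }

    @Override
    public String getQuote(String symbol) {
        String cached = cache.get(symbol);
        if (cached != null) {
            return cached; // short-circuit: the real service is not reached
        }
        String fresh = delegate.getQuote(symbol);
        cache.putIfAbsent(symbol, fresh);
        return fresh;
    }
}

// Wiring: decorators can be stacked in any order, e.g.
// QuoteService service = new CachingQuoteService(new QuoteServiceImpl());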


Martin

Thursday, April 24, 2008

BI and technology adoption

To follow up on a question raised by someone at my consulting firm... why is the BI group usually a late adopter of technology (compared to other groups inside the enterprise)?

In fact, I was asked this question by a manager, and at first I did not know exactly what to respond. My first reaction was to think that big organizations were reluctant and hesitant to add more technology to a platform already loaded (or bloated?) with tools, one that is difficult enough to manage and for which the ROI is certainly not obvious to calculate.

After giving it additional thought, I now feel that the reasons are found inside, not outside, BI. Traditionally, BI designers and architects come from a database background and culture, and these tend to be more skeptical about new technology (they may well be right on that one!), whereas the general software development crowd usually comes from a computer science background, very much inclined to look into the latest innovations...


In general, I have observed that it is rarely the people running the more operational systems, who are already used to adopting new technologies, who will turn down a proposal to connect to the BI platform in a less traditional way. It is rather the BI group itself that keeps proposing to send flat files to feed other external systems, or to give read-only access to its database (because these are the familiar communication channels coming from the familiar ETL process). And when you propose to expose this information as web services instead, you get a funny look!

The outcome is that we often end up with a BI platform that is very isolated (a silo-type application) from the rest of the integrated corporate architecture. This penalizes the potential usefulness of BI... which could be better leveraged and exploited in a more transparent and integrated way within existing corporate business processes.


Hence there is no doubt that BI would gain by adopting technologies like web services, and architectures such as SOA and ESB. And there are a lot of initiatives along these lines... this is inevitable.


Martin

Saturday, April 12, 2008

Are we settled now?

All the places I once called home!

I was recently counting the number of places I have lived in, and the number seemed completely unreasonable... so I decided to blog about it, mostly for my wife, so she can see that it may well be time to settle. Although things were quite stable at first (I actually lived in a single place for my first 18 years), it all changed once I moved out of my mom's place!

In case you are wondering, no, I'm not 65 but only 36 years old... ok, almost 37 now.

Here's the map:

[embedded Google Map of all these places]


Here's the list:


  1. Chicoutimi: My first home!
    Well, almost the first, but I did move there at the age of two months... and left at the age of 19! Year: 1971-1990
  2. Regina: My first summer trip
    Spent about two months there to learn our second official national language! Year: 1989
  3. Vancouver
    Supposed to spend a full summer there with a friend, but we actually returned home prematurely... I had to head back right after we found our first jobs. Year: 1991
  4. Montreal: My first apartment
    2510 Blvd Édouard-Montpetit, Montréal. Did my first two years of university there... with two other good friends. Year: 1990-91
  5. Worcester: WPI in the dorms
    I completed the 3rd year of my B.Eng. there. Year: 1992-93
  6. Montreal: Last apartment, where I completed my B.Eng. at École Polytechnique
    3210 Blvd Édouard-Montpetit. Year: 1993
  7. Montreal: First apartment with Kim
    2910 Av Linton. Year: 1994
  8. Montreal: Second apartment with Kim
    2910 Av Linton. Year: 1994
  9. Montreal: Third apartment with Kim
    Av Papineau. Nice view of the park! Year: 1995
  10. Montreal: Fourth apartment with Kim
    Av Papineau. Yes, believe it or not, we moved next door to have an extra room, sigh!! Year: 1996
  11. Montreal: Fifth apartment with Kim
    6654 de Lanaudière. Well, things are getting out of control... I have a solution! Year: 1997
  12. Iasi: For work
    I've been in and out of this place for work. It literally opened my eyes to the world, and I also met some quite interesting and fun people. Year: about 4 months during 1998-1999
  13. Montreal: The first place we owned
    6554 rue de la Roche. Here's the solution: we bought a nice duplex... things should stabilize now. Well, that was my plan! Year: 1998-2000
  14. Tunbridge Wells: my sister's place
    Stayed at my sister's place for a couple of months waiting for my Swiss work permit. Year: 2000
  15. Lausanne: Seventh place with Kim
    Saint-Roch. First apartment we rented in Lausanne. Small and... expensive (ok, this is Switzerland). Year: 2000
  16. Lausanne: Eighth place with Kim
    Cité-Devant. Our second apartment in Lausanne, much nicer and right in the old part of town. This is where my daughter was born. Year: 2001-02
  17. Vaudreuil-sur-le-lac: Ninth place with Kim
    26 des Caryers. The house we bought. Year: 2003-2007
  18. Laval: Tenth place with Kim
    Du Bouvreuil. The place we rented for 6 months before going back to Europe. Year: 2007
  19. Epalinges: Eleventh place with Kim
    De la Jaquiere. This is our current location! Year: 2008-?

Martin

Tuesday, April 01, 2008

SQL Analytic Functions


In my current consulting activity, I’m involved in a project relying on Oracle 10g for most of its data warehouse infrastructure. I thought it would be a nice opportunity to brush up on my SQL skills.

The OLAP SQL functions (also referred to as analytic functions) come in very handy when dealing with data reporting or analytical needs. So this post will summarize the SQL analytic functions available in Oracle (some of them since 8i and others since 9i).

Although the reports produced using these functions could very well be built with plain SQL or PL/SQL workarounds, those would be clumsier and potentially less optimal solutions (depending on how well the Oracle optimizer processes these functions).


These functions can be roughly classified into these categories:

1. Ranking (show the top 5 or bottom 5),

2. Reporting Aggregation (compare values at different levels of aggregation, e.g. regional sales versus national, market share),

3. Window Aggregation (moving averages, cumulative sums),

4. Row Comparison (study changes or variations over sequential rows with LAG/LEAD, or get the first and last values of sorted groups with FIRST/LAST).

It is important to note that these functions operate just before the ORDER BY clause of a query (which occurs last in the SQL processing order), or at the very last step when no such clause is present. Also important: these functions are not allowed within the WHERE, FROM and HAVING clauses; they are really meaningful within the result set of a query (i.e. in the SELECT clause) and, more rarely, in the ORDER BY clause.

They follow this generic form:

function(<arguments>) OVER(<analytic clause>)

where

+function is the analytical function (e.g. AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, VAR_SAMP, VARIANCE)

+arguments are optional, and there can be more than one (<argument1>, <argument2>, <argument3>)

+analytic clause can include one or more of the following: a Query Partition clause (to break the result set into multiple partitions according to some criteria), an Order-By clause (to sort the data within each partition), and a Windowing clause (to define a window of data); all three can be combined, as in the sketch below.
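For instance (a sketch against a hypothetical sales_history table):

SELECT region, t, sales,
       AVG(sales) OVER (PARTITION BY region
                        ORDER BY t
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS mov_avg
  FROM sales_history;

-- a three-row moving average of sales, computed independently within each region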


1. Ranking functions

The role of these functions is to assign ordering information to records based on some given attribute. The three basic functions available (i.e. ROW_NUMBER, RANK and DENSE_RANK) differ only in the way they manage ordering ties. Here’s a very simple sample to illustrate the differences:
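A sketch using the classic emp demo table:

SELECT ename, sal,
       ROW_NUMBER() OVER (ORDER BY sal DESC) AS row_num,
       RANK()       OVER (ORDER BY sal DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY sal DESC) AS dense_rnk
  FROM emp;

-- with salaries 3000, 2000, 2000, 1000:
-- row_num   = 1, 2, 3, 4  (ties numbered arbitrarily)
-- rnk       = 1, 2, 2, 4  (ties share a rank, the next rank is skipped)
-- dense_rnk = 1, 2, 2, 3  (ties share a rank, no gap)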


The RANK and DENSE_RANK tie rules allow for top-n queries that behave correctly, which is not the case with ROW_NUMBER since it numbers tied rows arbitrarily.


Note that we can also control the ordering of NULL values by specifying the self-explanatory clauses NULLS LAST or NULLS FIRST after the “ORDER BY attr” clause (by default, Oracle sorts NULLs last in ascending order and first in descending order).

Other functions can be classified within the ranking category: PERCENT_RANK and CUME_DIST. They operate very similarly to the previous ones (e.g. PERCENT_RANK() OVER(ORDER BY attr)), but compute the cumulative fractions typical of histograms.

In conclusion, the ranking functions offer a better alternative to the well-known hack (an ORDER BY inside a subselect combined with the ROWNUM pseudo-column), one that also handles ties correctly.
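For comparison, here is the classic hack next to its analytic replacement (again on the emp demo table):

-- the old top-n hack: ties at the cut-off are dropped arbitrarily
SELECT *
  FROM (SELECT ename, sal FROM emp ORDER BY sal DESC)
 WHERE ROWNUM <= 5;

-- the analytic version: all rows tied at rank 5 are kept
SELECT ename, sal
  FROM (SELECT ename, sal, RANK() OVER (ORDER BY sal DESC) AS rnk FROM emp)
 WHERE rnk <= 5;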


2. Reporting Aggregation functions

Most common SQL aggregation functions (like SUM, COUNT, AVG, MAX, MIN, VARIANCE, etc.) can be used as analytic functions as well, eliminating the constraint that the aggregate value be at the same level of detail as the row set (mandatory when using GROUP BY). We can then have a global aggregated value presented alongside all data rows, as well as an aggregated value at a different partitioned level, all in a single query statement. Here’s a self-explanatory example with AVG:
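A sketch against a hypothetical city_sales table:

SELECT region, city, sales,
       AVG(sales) OVER ()                    AS avg_overall,
       AVG(sales) OVER (PARTITION BY region) AS avg_region
  FROM city_sales;

-- each detail row carries both the global average and its own region's average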


Also belonging to this category is RATIO_TO_REPORT, which gives the ratio of a value to an aggregated total. Actually, this function is simply a shortcut: RATIO_TO_REPORT(value) OVER() == value / SUM(value) OVER().
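For example, to show each region's share of the total (hypothetical region_sales table):

SELECT region, sales,
       RATIO_TO_REPORT(sales) OVER () AS market_share
  FROM region_sales;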

We can also include in this category the GROUP BY extensions ROLLUP and CUBE, which both provide aggregated values at higher levels (the difference being that CUBE provides aggregate values for all combinations of attributes), as illustrated by this simple example:
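A sketch against a hypothetical sales_fact table:

SELECT region, product, SUM(sales) AS total_sales,
       GROUPING(region)  AS g_region,
       GROUPING(product) AS g_product
  FROM sales_fact
 GROUP BY CUBE(region, product);

-- returns subtotals per region, per product, per (region, product) pair,
-- plus a grand total; the g_* flags mark rows where a NULL means "all values"
-- rather than real data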


Notice the use of the GROUPING function (which returns 1 when the aggregation was done over the specified attribute and 0 otherwise) to help differentiate between NULL values present in the table and the artificial NULLs generated for reporting purposes.

Window Partitioning Operator

Often used in combination with the ranking and reporting aggregation functions is an operator known as the window partitioning operator. Its role is to provide a way to separate the data into groupings and apply the functions within each grouping. It relies on the PARTITION BY keyword placed in the analytic clause. Here’s a simple example illustrating window partitioning used with the RANK() function:
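A sketch (the grp column and the table are hypothetical):

SELECT grp, value,
       RANK() OVER (ORDER BY value DESC)                  AS overall_rank,
       RANK() OVER (PARTITION BY grp ORDER BY value DESC) AS rank_in_group
  FROM t;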


We can see that the second RANK function now operates within each defined grouping (here given by the grp column) and not on the overall set of rows.


3. Window Aggregation functions

These functions permit the calculation of moving aggregates, such as a moving average, over a set of ordered data records. For example, to get a moving average using a three-data-point window (one in the past, the current one, and one in the future), we could use:

AVG(value) OVER(ORDER BY t ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)

As found in the following sample:
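A sketch over a hypothetical series table holding the values 100, 200, 300 and 400 at t = 1 through 4:

SELECT t, value,
       AVG(value) OVER (ORDER BY t
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS mov_avg
  FROM series;

-- mov_avg: 150, 200, 300, 350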


Note that for the first and last records, the missing preceding and following rows simply fall outside the window and are thus not used by the AVG function (see the first moving AVG value, 150, which is calculated from only 100 and 200).

This follows the general rule applicable to NULL values in aggregation functions: they are always ignored in the calculation.

Window aggregate functions can also be used to calculate running or cumulative sums. This is possible using ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as the window boundary definition. This gives:
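Continuing with the same hypothetical series table:

SELECT t, value,
       SUM(value) OVER (ORDER BY t
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
  FROM series;

-- running_sum: 100, 300, 600, 1000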


As a final comment about this category of functions, let’s note that, as in the case of ranking and reporting aggregation, it is possible to partition their calculation using the PARTITION BY clause.


4. Row Comparison

The row comparison functions are very useful for showing data from previous and subsequent records on the same row as the current one, within an ordered sequence of records. These functions are LAG and LEAD. They are again very simple to use, as demonstrated in this small example:
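Once more against the hypothetical series table:

SELECT t, value,
       LAG(value)  OVER (ORDER BY t) AS prev_value,
       LEAD(value) OVER (ORDER BY t) AS next_value
  FROM series;

-- prev_value is NULL on the first row and next_value is NULL on the last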


The more generic form allows us to specify the offset and a default value to return instead of NULL, i.e. LAG(attr, offset, default).

Again, these functions can be partitioned just like the previous ones. Anyone who has hacked together a solution for this type of report by relying on self-join queries can certainly appreciate the clarity and compactness of these functions.

Martin