Monday, March 27, 2006

Data Mining now has its open interface: JDMAPI

I've always been interested in data mining because it mixes advanced statistical and mathematical methods with complex data-processing algorithms (typically developed for machine learning). On the negative side, its applications sometimes get bad press (sometimes well deserved) because of the potential abuse they can lead to... I will not dwell on identity and privacy issues here, but when the goal respects people's right to privacy, one can leverage data mining to bring ordinary analysis to a much higher level. This is achieved through data induction (let the data speak for itself...) as opposed to data deduction (drawing conclusions from specific reports), which is what you typically encounter in more classical BI applications.

Data mining functionality is now built into Oracle 10g through a standardized API, JSR-73 (http://www.jcp.org/en/jsr/detail?id=73). Although this first effort is limited in scope, it ensures that applications can be developed independently of vendor-proprietary APIs. The work is being continued through a more complete initiative (JSR-247, http://www.jcp.org/en/jsr/detail?id=247), which will bring more mining functionality and more advanced algorithms. Most big players contribute to this standardization effort: Oracle, SAS, Hyperion, SPSS, SAP, IBM, etc., with the exception, of course, of Microsoft.

The standard API also offers extensibility (each vendor can provide additional functionality not explicitly defined in the standard) and covers the use of Web Services, which should make client code independent of the platform and implementation language.

More info can be found by googling JDMAPI.

I'll try to analyse this API in more detail and let you know my discoveries...

Sunday, March 19, 2006

From ER to OO

In my previous posts I've been sharing knowledge useful to people dealing with the technology underlying relational database management systems (RDBMS for short). This technology is used to store practically any information held by organizations. I've been working with it since about 1996, and I continue to do so mainly because of its ubiquity in the IT world. Relational databases store information using set theory and implement transaction and concurrency control to handle a large number of simultaneous connections, and as such their scope is fairly limited (although most big players are trying to build more functionality and processing flexibility into their engines, e.g. Oracle experimenting with the inclusion of a JRE within its database...).
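To make the set-theory point a bit more concrete, here is a minimal sketch (not anything a database engine actually runs) showing how the SQL set operators UNION, INTERSECT and EXCEPT (MINUS in Oracle) map onto plain java.util.Set operations. The class name SetOperationsDemo and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo class: SQL set operators expressed with java.util.Set.
class SetOperationsDemo {

    // Like SQL UNION: elements found in either set, duplicates removed.
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Like SQL INTERSECT: elements found in both sets.
    static <T> Set<T> intersect(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Like SQL EXCEPT (MINUS in Oracle): elements of a that are not in b.
    static <T> Set<T> except(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> customers = new LinkedHashSet<>(Arrays.asList("Ann", "Bob", "Carl"));
        Set<String> suppliers = new LinkedHashSet<>(Arrays.asList("Bob", "Dina"));

        System.out.println(union(customers, suppliers));     // [Ann, Bob, Carl, Dina]
        System.out.println(intersect(customers, suppliers)); // [Bob]
        System.out.println(except(customers, suppliers));    // [Ann, Carl]
    }
}
```

The inputs are copied before each operation, so the original sets stay untouched, much like a query leaves its source tables unchanged.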


After doing database modeling and architecture design for some time, I started designing and developing in an object-oriented language environment (around 2002). At first this can be quite daunting, given all the flexibility OO programming offers... Compared to database modeling, where a rigid framework and theory guide your work, OO modeling seems to stimulate your artistic and creative abilities more than your analytical expertise.

To get comfortable with this new paradigm, here are the pragmatic steps I followed while learning Java, free of charge (or almost):
  1. Get and read good reference documentation, such as Bruce Eckel's free resource, Thinking in Java. This first step will only help you gain some knowledge; to be able to do it yourself in an elegant and flexible way, you'll definitely need more experience. After some practice you'll find yourself facing the same recurring problems over and over... this is where step 2 kicks in.

  2. Get a good reference on design patterns. It will teach you to write better-quality code (more flexible, more robust, more adaptable, less error prone, etc.) by following patterns developed by experienced developers. A good introductory book is Head First Design Patterns, but for the real reference you should go to the Gang of Four book (Design Patterns: Elements of Reusable Object-Oriented Software).

  3. If you're still shy and afraid of downloading a free copy of Eclipse to experiment and code yourself (at this point maybe you should simply reconsider coding ;-), then you can still read the millions of lines of quality code (mostly in Java) available in the best open source projects; more on that later. Most likely, though, you'll end up programming your own stuff relying on one or more open source components. At least that's how I did it.
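To give a flavour of what step 2 brings, here is a minimal sketch of the Strategy pattern, one of the Gang of Four patterns. All the names (PricingStrategy, RegularPricing, SalePricing, Order) are invented for this example, and prices are kept in cents to avoid floating-point surprises.

```java
import java.util.Arrays;
import java.util.List;

// The strategy: an interchangeable pricing rule (hypothetical example).
interface PricingStrategy {
    long priceFor(long baseCents);
}

// Charges the base price unchanged.
class RegularPricing implements PricingStrategy {
    public long priceFor(long baseCents) {
        return baseCents;
    }
}

// Applies a percentage discount to every item.
class SalePricing implements PricingStrategy {
    private final int percentOff;

    SalePricing(int percentOff) {
        this.percentOff = percentOff;
    }

    public long priceFor(long baseCents) {
        return baseCents * (100 - percentOff) / 100;
    }
}

// The context: it knows nothing about how prices are computed,
// it just delegates to whichever strategy it was given.
class Order {
    private final List<Long> itemCents;
    private final PricingStrategy pricing;

    Order(List<Long> itemCents, PricingStrategy pricing) {
        this.itemCents = itemCents;
        this.pricing = pricing;
    }

    long totalCents() {
        long total = 0;
        for (long cents : itemCents) {
            total += pricing.priceFor(cents);
        }
        return total;
    }
}

class StrategyDemo {
    public static void main(String[] args) {
        List<Long> items = Arrays.asList(1000L, 2500L);
        System.out.println(new Order(items, new RegularPricing()).totalCents()); // 3500
        System.out.println(new Order(items, new SalePricing(20)).totalCents());  // 2800
    }
}
```

The point is that adding a new pricing rule means writing one new class; Order itself does not change, which is the kind of flexibility the pattern books keep coming back to.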

Martin