Thursday, September 07, 2006

Handling persistence with ORM

In Java application development, using an Object Relational Mapper (ORM) to connect to a database typically offers many advantages:

  • avoiding the need to code against the lower-level JDBC API
  • handling the data persistence concern more transparently, in a way better aligned with the object-oriented paradigm
  • isolating database vendor specifics, allowing easy porting to a number of different DB backends
  • providing built-in support for additional services such as connection pooling, caching, etc.
  • reducing the need to be highly skilled in SQL, although ignoring relational concepts and SQL altogether is definitely not realistic
  • writing less code
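As a tiny illustration of the JDBC boilerplate an ORM removes, here is the kind of hand-written row-to-object mapping you end up repeating for every query. This is only a sketch: the Customer class, the column names, and the Map standing in for a JDBC ResultSet row are all invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: Customer and the Map standing in for a JDBC
// ResultSet row are invented names, not part of any real schema.
public class ManualRowMapping {

    static class Customer {
        long id;
        String name;
    }

    // With raw JDBC, every query needs code like this to turn a row
    // into a domain object; an ORM generates the equivalent for you.
    static Customer mapRow(Map<String, Object> row) {
        Customer c = new Customer();
        c.id = (Long) row.get("customer_id");
        c.name = (String) row.get("name");
        return c;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("customer_id", 42L);
        row.put("name", "Alice");
        Customer c = mapRow(row);
        System.out.println(c.id + ":" + c.name); // prints "42:Alice"
    }
}
```

Multiply this by every entity and every query in an application and the "writing less code" point becomes concrete.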

On the flip side, I've realized that there are drawbacks as well, such as:

  • providing least-common-denominator functionality to achieve DB neutrality
  • losing control of the SQL statements automatically generated for us
  • some performance degradation, no matter what the tool vendor claims (an ORM will always be one layer on top of JDBC...); however, a smart caching strategy can mitigate this
  • requiring additional knowledge of the ORM API (so less code to write, but more library code to understand and make use of)
  • failing when the application use case is focused on data reporting and aggregation over large data volumes rather than on transaction-based data entry
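The reporting point is easiest to see with a concrete query. A single aggregate statement does in one round-trip what an ORM would otherwise do by hydrating thousands of objects and summing in memory; the table and column names here are invented for illustration.

```sql
-- Illustrative only: orders and customer are made-up names.
SELECT c.region,
       COUNT(*)      AS order_count,
       SUM(o.amount) AS total_amount
FROM orders o
JOIN customer c ON o.customer_id = c.customer_id
GROUP BY c.region;
```

For this kind of workload, dropping down to plain SQL (or a reporting tool) is usually the better fit than the object graph.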

On the last project I built using Hibernate, I enjoyed spending more time on the design of a good domain model layer, since I spent less on the persistence logic concern. However, I later discovered, through more realistic usage and data volume testing, that it suffered some nasty performance degradation in specific use cases that were not caught by unit testing (unit testing is concerned with functional correctness, not performance scaling issues).

Without going into details, the problem had to do with the number of round-trips Hibernate was triggering to fetch the object data graph. I had designed some relations (1:1 or N:1) to be eagerly fetched (always fetch the related object) instead of using a lazy fetching strategy (hit the database only when necessary). This was good in some scenarios, since some data dependencies were always needed, and it avoided a second database call to get the dependent object's data. However, when fetching a collection, the effect was a separate DB call for every single element within the collection. So getting a long list of N items resulted in N+1 DB calls! Alternative solutions exist for this, but the recommendation is to model most (if not all) object relations using a lazy strategy and override this default by specifying a different fetch mode at run-time.
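In Hibernate's XML mappings this default is set per association. The fragment below is a sketch, with Item and Category as invented names: a lazy proxy with a select fetch means the related Category is loaded only on first access, instead of costing one extra SELECT per Item in a list.

```xml
<!-- Illustrative hbm.xml fragment: Item and Category are invented names. -->
<class name="Item" table="item">
    <id name="id" column="item_id"/>
    <!-- Lazy association: the Category proxy hits the database only on
         first access, avoiding an eager SELECT per Item in a collection. -->
    <many-to-one name="category" class="Category"
                 lazy="proxy" fetch="select"/>
</class>
```

With lazy as the mapped default, the few queries that really do need the related object eagerly can still opt in at run-time, for example with an HQL join fetch.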

Bottom line, there is no magic bullet, especially when it comes to database interaction. We need a good grasp of relational database concepts in order to build applications that interact with a database, no matter what tool or framework we're working with.

Martin
