2007-02-14

Apache Derby versus HSQLDB

So I tried to use HSQLDB for a little hobby AI project of mine, that is, mining information from Wikipedia. Now HSQLDB is so fast it is just silly. Thats all good. Now, I wanted to store a histogram in a table because the number of items was too great to store in RAM/HashMap. I convert the code to use the DB instead of the HashMap, and all is fine. Now I want to see the resulting topmost entries in this histogram thing. So I do something along these lines:

SELECT TOP 10 * FROM histo ORDER BY cnt;

Does this work? No. OutOfMemoryError. Why?? It turns out that HSQLDB does not use indexes for ORDER BY, and hence it tries to build a temporary result that consists of the entire database. I had a look at the source, and I was determined to fix this, however ugly the solution would be.

But then I found Apache Derby, and it looks to be all that I want. It does not seem to be as fast as HSQLDB, but on the other hand I should be mostly I/O bound anyway since my databases will be many times my RAM in size. Also, Derby seems to excel at the embedded and PreparedStatement corner, and that is exactly what I'm doing. 100% of my recurring SQL statements are already Prepared.

HSQLDB is not a very young project, but I must say Derby seems to be about 10 times more mature to me.

HSQLDB 0, Apache Derby 1.

2 comments:

Samir Savla said...

I am not sure if you can use limit or top n in Derby yet. I am trying to use it too and am getting an exception too.

sandos said...

I am fairly sure limit etc should work fine in derby! What exception are you getting ?