2008-07-28

Google Knol - Information without semantics

So I saw Knol mentioned on identi.ca. I checked it out, but oh boy, was I disappointed. Absolutely no semantics at all? Wikipedia is more semantic than that! Faviki is more semantic still. And Freebase!

Unless they fix that, I don't see how it will be successful in the long term. Other "semanticer" things will simply crush it.


Official Google Blog: Encouraging people to contribute knowledge

2008-07-06

Using joins to find missing data and blobs

I just realized, after having read about column-based storage, how potentially bad it is to have blobs in a metadata-rich table.

I am currently developing a small project in my spare time, and we have a table which is roughly: id int, mimetype varchar, data blob, version int, and so on, with a few more non-blob columns. What I noticed was that selects on this table that do not need the blobs, and that filter on the indexed mimetype field, were incredibly slow. It is relevant that I am running MySQL in development mode, so it never seems to use more than a few tens of MB of RAM, meaning almost no caching is happening here. This means that in a row-storage database like MySQL I will be doing a tremendous amount of either seeking or reading of unnecessary data, depending on how MySQL plans its disk reads, how big the blobs are, and the effect of any read-ahead in the OS or elsewhere.
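One way to keep the blobs out of the way is to move them into a table of their own, so that metadata queries only read small rows. Here is a minimal sketch of that idea. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and column names (doc_meta, doc_blob, doc_id) are made up for illustration, not the ones from my project. Whether this actually helps depends on the storage engine and on how big the blobs are.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doc_meta (
        id       INTEGER PRIMARY KEY,
        mimetype TEXT NOT NULL,
        version  INTEGER NOT NULL
    );
    CREATE INDEX idx_doc_meta_mimetype ON doc_meta (mimetype);

    -- Blobs live in their own table, keyed by the metadata row's id.
    CREATE TABLE doc_blob (
        doc_id INTEGER PRIMARY KEY REFERENCES doc_meta (id),
        data   BLOB NOT NULL
    );
""")

rows = [
    (1, "image/png", 1, b"\x89PNG" + b"\x00" * 1024),
    (2, "text/plain", 1, b"hello"),
    (3, "image/png", 2, b"\x89PNG" + b"\x01" * 2048),
]
for doc_id, mimetype, version, data in rows:
    conn.execute("INSERT INTO doc_meta VALUES (?, ?, ?)",
                 (doc_id, mimetype, version))
    conn.execute("INSERT INTO doc_blob VALUES (?, ?)", (doc_id, data))

# Metadata-only query: reads doc_meta and never touches the blob rows.
png_ids = [r[0] for r in conn.execute(
    "SELECT id FROM doc_meta WHERE mimetype = ? ORDER BY id",
    ("image/png",))]

# The blob is fetched by primary key only when it is actually needed.
blob = conn.execute(
    "SELECT data FROM doc_blob WHERE doc_id = ?", (2,)).fetchone()[0]
```

The cost is an extra lookup whenever the blob really is needed, which seems like a fair trade when most queries only want the metadata.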

The other part of this slow query is an outer join, something like this:

select c.id from c
left outer join b on b.c_id = c.id and b.name = 'something'
where b.name is null

This is apparently also fairly slow, so don't do these things too much. In this case, the join that checks whether data is missing from the joined table could easily be avoided by adding data to the db.
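For reference, the same "find the rows with no match" question can also be written with NOT EXISTS. The sketch below again uses sqlite3 instead of MySQL, with made-up sample rows and an index on (c_id, name) that is my own assumption. It only checks that both forms return the same ids; it says nothing about which one is faster on a real MySQL setup.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows and index are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c (id INTEGER PRIMARY KEY);
    CREATE TABLE b (c_id INTEGER NOT NULL, name TEXT NOT NULL);
    CREATE INDEX idx_b_c_id_name ON b (c_id, name);
    INSERT INTO c (id) VALUES (1), (2), (3);
    INSERT INTO b (c_id, name) VALUES (1, 'something'), (2, 'other');
""")

# The outer-join form: keep rows of c where no matching b row was found.
left_join = """
    SELECT c.id FROM c
    LEFT OUTER JOIN b ON b.c_id = c.id AND b.name = 'something'
    WHERE b.name IS NULL
    ORDER BY c.id
"""

# The same question asked with a correlated NOT EXISTS subquery.
not_exists = """
    SELECT c.id FROM c
    WHERE NOT EXISTS (
        SELECT 1 FROM b WHERE b.c_id = c.id AND b.name = 'something'
    )
    ORDER BY c.id
"""

missing_via_join = [r[0] for r in conn.execute(left_join)]
missing_via_not_exists = [r[0] for r in conn.execute(not_exists)]
```

Row 2 has a b row, but not one named 'something', so it counts as missing in both queries, just like row 3, which has no b rows at all.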

More funny Java bugs

So, I get this simple assignment at work. Install a new version of our software on a brand spanking new Intel Xeon quad-core machine. Nice.

Only not so very nice when I notice hard Java crashes. They are always in the hasNext method of the EDU.oswego...ConcurrentHashMap iterator. I try downgrading to JDK 1.5, but that does not help.

I might have to do yet another probing bug hunt in third-party software. Argh.


Also, I'm trying out identi.ca, even though blogging, and micro-blogging especially, is not really my thing. I'm finding it fairly good though, just like ping.fm.