2008-07-06

Using joins to find missing data and blobs

I just realized, after having read about column based storage, how potentially bad it is to have blobs in a metadata-rich table.

I am currently developing a small project during my sparetime, and we have a table which is roughly: id int, mimetype varchar, data blob, version int, and so on with a few more non-blob columns in it. What I noticed was that selects on this table that does not need the blobs using the indexed mimetype-field was incredibly slow. Relevant is that I am running MySQL in development mode, so it never seems to use more than a few tens of MB of RAM, meaning almost no caching is happening here. This means, in a row-storage database like MySQL I will be doing a tremendous amount of either seeking or reading of unneccesary data, depending on how MySQL plans its disk-reading, how big the blobs are and the effect of any read-ahead in OS or elsewhere.

The other part of this slow query is a outer join, something like this:

select id from c
left outer join b on b.c_id = c.id and b.name = "something"
where b.name is null

This is apparently also fairly slow. So don't do these things too much. In this case, the join to find if data is not in a joined table could be easily fixed by adding data to the db.

No comments: