So I saw gnol mentioned on identi.ca. I checked it out, but oh boy was I disappointed. Absolutely no semantics at all? Wikipedia is more semantic than that! Faviki is more semantic still. And freebase!
Unless they fix that I don't see how it will be succesfull in the the long term. Other "semanticer" things will simply crush it.
Official Google Blog: Encouraging people to contribute knowledge
2008-07-28
2008-07-06
Using joins to find missing data and blobs
I just realized, after having read about column based storage, how potentially bad it is to have blobs in a metadata-rich table.
I am currently developing a small project during my sparetime, and we have a table which is roughly: id int, mimetype varchar, data blob, version int, and so on with a few more non-blob columns in it. What I noticed was that selects on this table that does not need the blobs using the indexed mimetype-field was incredibly slow. Relevant is that I am running MySQL in development mode, so it never seems to use more than a few tens of MB of RAM, meaning almost no caching is happening here. This means, in a row-storage database like MySQL I will be doing a tremendous amount of either seeking or reading of unneccesary data, depending on how MySQL plans its disk-reading, how big the blobs are and the effect of any read-ahead in OS or elsewhere.
The other part of this slow query is a outer join, something like this:
select id from c
left outer join b on b.c_id = c.id and b.name = "something"
where b.name is null
This is apparently also fairly slow. So don't do these things too much. In this case, the join to find if data is not in a joined table could be easily fixed by adding data to the db.
I am currently developing a small project during my sparetime, and we have a table which is roughly: id int, mimetype varchar, data blob, version int, and so on with a few more non-blob columns in it. What I noticed was that selects on this table that does not need the blobs using the indexed mimetype-field was incredibly slow. Relevant is that I am running MySQL in development mode, so it never seems to use more than a few tens of MB of RAM, meaning almost no caching is happening here. This means, in a row-storage database like MySQL I will be doing a tremendous amount of either seeking or reading of unneccesary data, depending on how MySQL plans its disk-reading, how big the blobs are and the effect of any read-ahead in OS or elsewhere.
The other part of this slow query is a outer join, something like this:
select id from c
left outer join b on b.c_id = c.id and b.name = "something"
where b.name is null
This is apparently also fairly slow. So don't do these things too much. In this case, the join to find if data is not in a joined table could be easily fixed by adding data to the db.
More funny java bugs
So, I get this simple assignment at work. Install a new version of our software on a brand spanking new Intel Xeon quad-core machine. Nice.
Only not so very nice when I notice hard java crashes. This is in EDU.oswego...ConcurrentHashMap's iterators hasNext method, always. I try downgrading to JDK1.5, but that does not help.
I might have to do yet another probing bug hunt in third-party software. Argh.
Also, trying out identi.ca even though blogging, and micro-blogging especially is not really my thing. Finding it fairly good though, just as ping.fm.
Only not so very nice when I notice hard java crashes. This is in EDU.oswego...ConcurrentHashMap's iterators hasNext method, always. I try downgrading to JDK1.5, but that does not help.
I might have to do yet another probing bug hunt in third-party software. Argh.
Also, trying out identi.ca even though blogging, and micro-blogging especially is not really my thing. Finding it fairly good though, just as ping.fm.
Subscribe to:
Posts (Atom)