2006-12-28

"Wikipedia Distance"

I found the website I was talking about earlier; it's available at http://www.omnipelagos.com. It seems very rough around the edges, and its Wikipedia content is badly outdated. I might make my own version of the site soon, if I find the time.

Wikipedia seems incredibly well connected, since almost everything seems to be within a distance of 4 or 5 links. This means that measuring the quality of links becomes important. A very simple first step would be to treat language links as more strongly "connected" than arbitrary linked words in sentences. A next step could be simple template-based recognition of common high-value links, such as "X is a Y", "X is a sort of Y" and so on. Going much further than that, though, very quickly turns into an academic exercise in implementing general AI.
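A minimal Java sketch of the weighting idea follows. It assumes the link graph has already been extracted and that each link has been tagged with a type. The hypothetical `LinkType` costs make language links and "X is a Y" links cheaper than ordinary in-sentence links. Dijkstra's algorithm, swapped in here for the plain hop count, then prefers paths through those links. The class names and cost values are illustrative assumptions, not tuned numbers.

```java
import java.util.*;

// Hypothetical link categories; the costs are illustrative assumptions, not tuned values.
enum LinkType {
    IS_A(0.5),      // link recognized by a template such as "X is a Y"
    LANGUAGE(0.5),  // a language link, assumed to be more strongly "connected"
    PLAIN(1.0);     // an ordinary link from a word in running text

    final double cost;

    LinkType(double cost) { this.cost = cost; }
}

class WeightedLinkGraph {
    // Priority-queue item: an article and its tentative distance from the source.
    private static final class QueueItem {
        final String article;
        final double dist;
        QueueItem(String article, double dist) { this.article = article; this.dist = dist; }
    }

    // article title -> (linked article title -> type of that link)
    private final Map<String, Map<String, LinkType>> links = new HashMap<>();

    void addLink(String from, String to, LinkType type) {
        links.computeIfAbsent(from, k -> new HashMap<>()).put(to, type);
    }

    /** Dijkstra's algorithm; returns Double.POSITIVE_INFINITY if target is unreachable. */
    double distance(String source, String target) {
        Map<String, Double> best = new HashMap<>();
        PriorityQueue<QueueItem> queue = new PriorityQueue<>((a, b) -> Double.compare(a.dist, b.dist));
        best.put(source, 0.0);
        queue.add(new QueueItem(source, 0.0));
        while (!queue.isEmpty()) {
            QueueItem current = queue.poll();
            if (current.article.equals(target)) return current.dist;
            if (current.dist > best.get(current.article)) continue; // stale queue entry
            Map<String, LinkType> out = links.getOrDefault(current.article, Collections.emptyMap());
            for (Map.Entry<String, LinkType> link : out.entrySet()) {
                double candidate = current.dist + link.getValue().cost;
                if (candidate < best.getOrDefault(link.getKey(), Double.POSITIVE_INFINITY)) {
                    best.put(link.getKey(), candidate);
                    queue.add(new QueueItem(link.getKey(), candidate));
                }
            }
        }
        return Double.POSITIVE_INFINITY;
    }
}
```

Setting every cost to 1.0 reduces this to the ordinary hop count. Lowering the cost of a link type makes articles joined by that kind of link count as "closer".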

Oh, this would all be so much easier if Wikipedia started supporting semantic tags à la Semantic MediaWiki and ontoworld.
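Without such tags, the template-based recognition of "X is a Y" links could start out as a regular expression over plain sentence text. This minimal Java sketch pulls out (X, Y) pairs from sentences shaped like "X is a Y" or "X is a sort of Y". The class name `IsAExtractor` and the pattern are hypothetical. It only captures a one-word Y and knows nothing about wiki markup, so it is a starting point rather than a working extractor.

```java
import java.util.*;
import java.util.regex.*;

class IsAExtractor {
    // Sentence start (or ". "), optional article, subject X, "is a/an",
    // optional "sort of"/"kind of"/"type of", then a one-word Y.
    private static final Pattern IS_A = Pattern.compile(
        "(?:^|\\.\\s+)(?:(?:A|An|The)\\s+)?([\\w ]+?)\\s+is\\s+an?\\s+"
        + "(?:(?:sort|kind|type)\\s+of\\s+)?(\\w+)");

    /** Returns the [X, Y] pairs found in the text, in order of appearance. */
    static List<List<String>> extract(String text) {
        List<List<String>> pairs = new ArrayList<>();
        Matcher m = IS_A.matcher(text);
        while (m.find()) {
            pairs.add(Arrays.asList(m.group(1), m.group(2)));
        }
        return pairs;
    }
}
```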

2006-05-16

Exploiting Wikipedia for AI purposes

First, I'm pretty sure I've seen a reference to a service that could tell you the Wikipedia distance between two articles, but I can't for the life of me find it again. If anyone knows what I'm talking about, please leave a comment telling me what you know about it!
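The Wikipedia distance between two articles is the length of the shortest chain of links from one to the other, and a breadth-first search over the link graph finds it. This minimal Java sketch assumes the links have already been extracted into an in-memory adjacency list. The class and method names are hypothetical.

```java
import java.util.*;

class WikipediaDistance {
    // Hypothetical adjacency list: article title -> titles it links to.
    private final Map<String, List<String>> links = new HashMap<>();

    void addLink(String from, String to) {
        links.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Breadth-first search: fewest link hops from source to target, or -1 if unreachable. */
    int distance(String source, String target) {
        if (source.equals(target)) return 0;
        Map<String, Integer> hops = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        hops.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String article = queue.poll();
            int next = hops.get(article) + 1;
            for (String neighbour : links.getOrDefault(article, Collections.emptyList())) {
                if (hops.containsKey(neighbour)) continue; // already visited
                if (neighbour.equals(target)) return next;
                hops.put(neighbour, next);
                queue.add(neighbour);
            }
        }
        return -1;
    }
}
```

Links are directed, so the distance from A to B need not equal the distance from B to A.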

Second, I'm perplexed that nobody has yet parsed Wikipedia and used it for common-sense reasoning or other AI-related tasks such as spreading activation. Think Conceptnet, Wordnet and OpenMind: they all try to build some form of graph between concepts, and that is essentially what Wikipedia is. Wikipedia currently has over 1 million articles, easily besting both Wordnet and Conceptnet in number of nodes. I'm convinced that its internal links also outnumber the connections in those other resources, so I think Wikipedia could be a really useful AI resource. You could even follow external links to get finer-grained connections between Wikipedia articles, although I suspect that would help very little, since I doubt many articles reference the same external URLs.
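Spreading activation over Wikipedia's link graph could look roughly like the minimal Java sketch below. It assumes the links are already in memory. The decay factor, threshold and number of rounds are hypothetical parameters. Splitting a node's activation evenly over its outgoing links is just one common choice, not the only one.

```java
import java.util.*;

class SpreadingActivation {
    // article title -> titles it links to
    private final Map<String, List<String>> links = new HashMap<>();

    void addLink(String from, String to) {
        links.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /**
     * Spreads activation outward from the seed articles for a fixed number of rounds.
     * Each round, every frontier node holding at least `threshold` activation passes
     * activation * decay on, split evenly over its outgoing links. Activation accumulates.
     */
    Map<String, Double> spread(Map<String, Double> seeds, double decay, double threshold, int rounds) {
        Map<String, Double> activation = new HashMap<>(seeds);
        Map<String, Double> frontier = new HashMap<>(seeds);
        for (int round = 0; round < rounds && !frontier.isEmpty(); round++) {
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, Double> node : frontier.entrySet()) {
                List<String> out = links.getOrDefault(node.getKey(), Collections.emptyList());
                if (out.isEmpty() || node.getValue() < threshold) continue;
                double share = node.getValue() * decay / out.size();
                for (String neighbour : out) {
                    next.merge(neighbour, share, Double::sum);
                }
            }
            // Newly received activation is added to whatever each node already had.
            for (Map.Entry<String, Double> node : next.entrySet()) {
                activation.merge(node.getKey(), node.getValue(), Double::sum);
            }
            frontier = next;
        }
        return activation;
    }
}
```

Seeding two concepts and looking at which articles end up with the highest combined activation is one simple way to find what they have in common.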

If I had the opportunity, I would experiment with these ideas, but unfortunately it seems I won't have the time: I'm finishing up my thesis and then going straight into a job. Actually, I'm already looking for one.

2006-04-01

Groovy is slow!

I have been writing a prototype for my thesis work in Groovy, but it seemed a bit slow. Sure, this is to be expected from such a dynamic language, and an interpreted one at that. But when I wrote a few microbenchmarks, I noticed that interpretation didn't seem to be the culprit.

First, I tried adding things to a HashMap. This did not reveal any big difference between Java and Groovy: Groovy was about 100% slower (roughly twice as slow), which I find acceptable. So I moved on to object creation, and here Groovy seems to add substantial overhead:


// Loop 1: a new Integer is allocated on every iteration
val = new Integer("0")
for(i in 0..<1000000){
    Integer tmp2 = new Integer(43)
    val += tmp2
}


versus:


// Loop 2: the Integer is allocated once, outside the loop
val = new Integer("0")
Integer tmp2 = new Integer(43)
for(i in 0..<1000000){
    val += tmp2
}


Timings for the first and second loop respectively: Java 387 ms and 206 ms; Groovy 29873 ms and 4527 ms. Even without creating Integers in the loop, Groovy is about 22 times slower than Java, and adding the object creation makes it intolerably slow, about 77 times slower. The Groovy code was precompiled with groovyc and then run with the -server flag to gain some speed.
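The Java code behind these numbers isn't shown. Here is a hedged reconstruction of what an equivalent Java benchmark might look like: the same two loops, timed with `System.nanoTime()`. The class and method names are hypothetical. Timing a single run like this is crude, because JIT warm-up and escape analysis (which can optimize away a short-lived `Integer`) may skew the results considerably.

```java
class IntegerLoopBenchmark {
    /** Mirrors loop 1: a fresh Integer is allocated on every iteration. */
    @SuppressWarnings({"deprecation", "removal"}) // new Integer(...) is deprecated; kept to force an allocation
    static int withAllocation(int iterations) {
        Integer val = 0;
        for (int i = 0; i < iterations; i++) {
            Integer tmp2 = new Integer(43);
            val += tmp2; // unboxes, adds, re-boxes
        }
        return val;
    }

    /** Mirrors loop 2: the Integer is created once, outside the loop. */
    @SuppressWarnings({"deprecation", "removal"})
    static int withoutAllocation(int iterations) {
        Integer val = 0;
        Integer tmp2 = new Integer(43);
        for (int i = 0; i < iterations; i++) {
            val += tmp2;
        }
        return val;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long t0 = System.nanoTime();
        int a = withAllocation(n);
        long t1 = System.nanoTime();
        int b = withoutAllocation(n);
        long t2 = System.nanoTime();
        System.out.printf("with allocation:    %d ms (sum %d)%n", (t1 - t0) / 1_000_000, a);
        System.out.printf("without allocation: %d ms (sum %d)%n", (t2 - t1) / 1_000_000, b);
    }
}
```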


I am reading in and parsing the Wordnet database, and that alone takes some 400 seconds in the current version of my code, which is a bit too long to sit around waiting for every time I restart my app! Yes, there are ways around this, and I will use them eventually: for example, reloading only the changed code instead of restarting the entire JVM, so that the initialized structures stay in RAM, ready to use.
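For reference, here is a minimal Java sketch of parsing Wordnet's data.* files. It assumes the standard line layout:

- synset offset
- lexicographer file number
- part of speech
- a hexadecimal word count, followed by word/lex_id pairs
- a decimal pointer count, followed by pointers of four fields each
- ` | ` and then the gloss

Verb frames and adjective markers are ignored, and the class names are hypothetical. Keeping the resulting map in memory between code reloads is the kind of workaround described above.

```java
import java.util.*;

/** A simplified synset from a Wordnet data.* file (verb frames are ignored). */
class Synset {
    final String offset;
    final char pos;
    final List<String> words;
    final List<String[]> pointers; // each: {symbol, target offset, target pos, source/target}
    final String gloss;

    Synset(String offset, char pos, List<String> words, List<String[]> pointers, String gloss) {
        this.offset = offset;
        this.pos = pos;
        this.words = words;
        this.pointers = pointers;
        this.gloss = gloss;
    }
}

class WordNetDataParser {
    /** Parses one data line; assumes the standard Wordnet data file layout. */
    static Synset parseLine(String line) {
        int bar = line.indexOf(" | ");
        String gloss = bar >= 0 ? line.substring(bar + 3).trim() : "";
        String[] f = (bar >= 0 ? line.substring(0, bar) : line).trim().split("\\s+");
        int i = 0;
        String offset = f[i++];
        i++;                                          // lex_filenum, unused here
        char pos = f[i++].charAt(0);
        int wordCount = Integer.parseInt(f[i++], 16); // w_cnt is hexadecimal
        List<String> words = new ArrayList<>();
        for (int w = 0; w < wordCount; w++) {
            words.add(f[i].replace('_', ' '));        // underscores stand for spaces
            i += 2;                                   // skip the word and its lex_id
        }
        int pointerCount = Integer.parseInt(f[i++]);  // p_cnt is decimal
        List<String[]> pointers = new ArrayList<>();
        for (int p = 0; p < pointerCount; p++) {
            pointers.add(new String[] {f[i], f[i + 1], f[i + 2], f[i + 3]});
            i += 4;
        }
        return new Synset(offset, pos, words, pointers, gloss);
    }

    /** Parses a file's lines, skipping the license header (lines starting with two spaces). */
    static Map<String, Synset> parseAll(List<String> lines) {
        Map<String, Synset> byOffset = new HashMap<>();
        for (String line : lines) {
            if (line.startsWith("  ") || line.trim().isEmpty()) continue;
            Synset s = parseLine(line);
            byOffset.put(s.offset, s);
        }
        return byOffset;
    }
}
```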

2006-02-22

Rm my Mac

I came across this page, http://rm-my-mac.wideopenbsd.org/, which hands out free SSH accounts on a Mac mini so that people can try to hack it. Not my favorite pastime, for sure, but there are probably others out there who are interested.

2006-02-08

Thesis: Opennlp and Groovy

Just wanted to write about what my thesis ended up being about: question classification. No chatbot, no Loebner contest, I'm afraid. It seems I will use Opennlp and Java, but the problem is that I hate Java; I find it far too slow to prototype in. So I decided to use Groovy. That did not go well at first: it took me about a day to figure out that Groovy's error messages are far from perfect, and they had led me to believe that class loading was completely broken in Groovy.

Anyway, both Groovy and Opennlp seem like great projects. I've also downloaded the Stanford parser, which might be useful.
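Since the thesis is about question classification, here is a deliberately naive Java baseline for comparison. It assigns a coarse answer type based on the question's first word. The labels loosely follow the six coarse classes of Li and Roth's question taxonomy, and the class names are hypothetical. This is not the thesis approach, which plans to use Opennlp. A rule set like this misfires on questions such as "What is the capital of Sweden?" (really a location), which is exactly where a trained classifier should do better.

```java
import java.util.*;

// Coarse answer types, loosely following Li and Roth's coarse question classes.
enum QuestionClass { ABBR, DESC, ENTY, HUM, LOC, NUM }

class WhWordClassifier {
    /** Naive rule-based baseline: decides from the first (and sometimes second) word. */
    static QuestionClass classify(String question) {
        // Lower-case and drop punctuation so "Who," and "who" look the same.
        String q = question.toLowerCase(Locale.ROOT).replaceAll("[^a-z ]", " ").trim();
        String[] words = q.split("\\s+");
        String first = words[0];
        String second = words.length > 1 ? words[1] : "";
        switch (first) {
            case "who": case "whom": case "whose":
                return QuestionClass.HUM;
            case "where":
                return QuestionClass.LOC;
            case "when":
                return QuestionClass.NUM;
            case "why":
                return QuestionClass.DESC;
            case "how":
                return (second.equals("many") || second.equals("much")) ? QuestionClass.NUM : QuestionClass.DESC;
            case "what":
                if (q.endsWith("stand for")) return QuestionClass.ABBR;
                return QuestionClass.ENTY;
            default:
                return QuestionClass.ENTY; // fallback when no rule applies
        }
    }
}
```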

2006-01-02

Piratpartiet

Piratpartiet:
Piratpartiet aims to hold the balance of power after the 2006 election. There are between 800,000 and 1,100,000 active file sharers in Sweden, and they are all tired of being called criminals. We need 225,000 of them with us to get past the four-percent threshold and end up holding the balance of power. We then intend to use that role to abolish copyright.


Very interesting. I might end up voting for this party, but I'm not entirely convinced yet.

