2009-03-27

Using PuTTY (win32) to connect to Vino in Ubuntu 8.10

Vino, it seems, has a bug in Ubuntu 8.10: it only listens on the IPv6 localhost address, i.e. port 5900 on ::1. I've been trying to connect to this using PuTTY for quite some time, and googling turns up a few threads and a PuTTY bug, ipv6-literals, that is not yet fixed even in the snapshots. I finally found the solution, almost all by myself: add a hostname to use instead of the numeric IP, which has the problematic ":" in it. But actually, Ubuntu already has one: ip6-localhost. So adding the tunnel 4L5900:ip6-localhost:5900 works!

Oh, by the way: I did not want to disable ipv6 in Ubuntu since I occasionally use it.

2009-03-18

New sources of world knowledge

I am trying to develop a natural language parser using a "knowledge intensive" approach. For this I need knowledge for my application. I have a few plans for amassing it:



For pattern matching I will need to develop patterns. Others have already done this, but I believe I can do even better. In Wikipedia one can use links to other articles as tokens, which should simplify the structure of the patterns one develops or evolves to match against. Or at least I hope so.

More Wikipedia statistics

Doing some statistics on the WEX data dump, I am once again hitting memory limits. At least unless I filter uncommon stuff out, which is almost entirely possible, given that every common word should appear a number of times in any chunk of Wikipedia articles. Current stats are:

words: 29657715 sentences: 1795296 articles: 13801 unique words: 1182046

Words is the number of whitespace-separated strings I've seen. Sentences is the sentence count according to WEX, which by the way is way off for some articles such as Andre Agassi, where it believes a sentence ending with "No." is followed by a new sentence when it's Agassi's world ranking that comes next. Articles is self-explanatory, and unique words are unique words, this time with all characters allowed but everything lower-cased. So, from 14K articles the number of unique words is already more than 1M. This is very different from my older run, which ignored anything that was not a-zA-Z, where it took a couple of hundred K articles to reach that number.
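For what it's worth, the counting described above can be sketched roughly like this in Python. This is not the actual script I ran; the function names corpus_stats and prune_rare, the min_count threshold, and the way the a-zA-Z mode strips characters are all assumptions made for illustration.

```python
import re
from collections import Counter


def corpus_stats(articles, letters_only=False):
    """Count whitespace-separated words over an iterable of article texts.

    Returns (total_words, counts), where counts maps each lower-cased
    word to the number of times it was seen.
    With letters_only=True, every character outside a-zA-Z is stripped
    first and tokens that end up empty are not counted as unique words
    (an assumption about how the older run behaved).
    """
    counts = Counter()
    total = 0
    for text in articles:
        for token in text.split():
            total += 1
            if letters_only:
                token = re.sub(r"[^a-zA-Z]", "", token)
                if not token:
                    continue
            counts[token.lower()] += 1
    return total, counts


def prune_rare(counts, min_count=2):
    """Drop words seen fewer than min_count times to keep memory bounded.

    Meant to be called after each chunk of articles, on the assumption
    that every common word shows up several times within a chunk.
    """
    return Counter({w: c for w, c in counts.items() if c >= min_count})
```

With all characters allowed, "No." and "No" count as two different unique words, which is one reason the unique count grows so much faster than in the a-zA-Z run.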

2009-03-04

Making Firefox 3 play CNN videos

The fix that worked for me was to add cnn.com to the whitelist in flashblock.

Others have mentioned other fixes, such as installing mediawrap, cleaning the cache, moving from adblock to adblock plus, using ietab, not using ietab, etc. For me it was a bit simpler.