sandos

2009-05-26

LuxRender performance on AWS

Just playing around with LuxRender on a small Blender 2.48 test scene. On my laptop, which has a 2.2 GHz Core 2 Duo, I get about 22k samples/s. Using a "High-CPU Extra Large Instance" on AWS I get between 120k and 300k. The average seems to be about 10 times higher than on my laptop, which is nice enough. These cost $0.80 per instance-hour, which is pretty decent for shorter jobs. I'm not sure about the bandwidth needed; LuxRender does transmit the "film" every now and then, but the interval can be tweaked.

Scaling this up by using more instances is also very easy.
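To put the numbers above in cost terms, here is a back-of-the-envelope sketch. It assumes the instance sustains 10 times the laptop rate for the whole hour (only the rough average quoted above) and that partial instance-hours are billed as whole hours. The class and method names are made up for illustration.

```java
// Rough cost arithmetic for the figures quoted in the post.
// Assumption: the instance sustains 10x the laptop rate for a full hour.
class RenderCost {
    static final long LAPTOP_SAMPLES_PER_SEC = 22_000L;    // Core 2 Duo figure from the post
    static final long SPEEDUP = 10L;                        // "about 10 times higher"
    static final double DOLLARS_PER_INSTANCE_HOUR = 0.80;   // price quoted in the post

    /** Samples rendered by one instance in one hour. */
    static long samplesPerInstanceHour() {
        return LAPTOP_SAMPLES_PER_SEC * SPEEDUP * 3600L;
    }

    /** Samples bought per dollar on one instance. */
    static double samplesPerDollar() {
        return samplesPerInstanceHour() / DOLLARS_PER_INSTANCE_HOUR;
    }

    /** Instance-hours needed for a job, rounded up (assumes whole-hour billing). */
    static long instanceHoursFor(long totalSamples) {
        long perHour = samplesPerInstanceHour();
        return (totalSamples + perHour - 1) / perHour;
    }

    public static void main(String[] args) {
        System.out.println("samples per instance-hour: " + samplesPerInstanceHour());
        System.out.println("samples per dollar: " + samplesPerDollar());
        System.out.println("instance-hours for 2e9 samples: " + instanceHoursFor(2_000_000_000L));
    }
}
```

Since the instance-hours can be spread over several instances, the same job should finish faster at roughly the same price.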

2009-05-19

Lyrics for Fischerspooner - Megacolon

Just corrected a few errors in the versions that are around on the net:

Document on Google docs

2009-05-06

How to add implicit imports to an embedded GroovyConsole

I wanted to have a graphical GroovyConsole that had my own APIs automatically imported. This proved to take some time to fix up. This is how it ended up looking. I created this in my normal Eclipse project so that I can easily launch it with the correct classpath already set up for me.

TestGUI.java:

package groovy.ui;

import java.lang.reflect.Field;

// A GroovyConsole whose shell is swapped for one that adds our imports.
public class TestGUI extends Console
{
    public static void main(String[] args)
    {
        TestGUI gui = new TestGUI();

        gui.run();
    }

    public TestGUI()
    {
        super();
        try
        {
            setup();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    // Console keeps its GroovyShell in the "shell" field, so we
    // replace that field with our own subclass using reflection.
    public void setup() throws Exception
    {
        Field shell = 
           Console.class.getDeclaredField("shell");
        shell.setAccessible(true);
        shell.set(this, new OurGroovyShell());
    }
}

OurGroovyShell.java:

package groovy.ui;

import java.util.List;
import org.codehaus.groovy.control.CompilationFailedException;
import groovy.lang.GroovyShell;

public class OurGroovyShell extends GroovyShell
{
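    // Called by the console for every script it runs.
    public Object run(String scriptText, 
           String fileName, List list)
        throws CompilationFailedException
    {
        // Prepend the implicit import to whatever the user typed.
        scriptText = "import whatever.*\n" + scriptText;

        // Delegate to the String[] variant so we do not end up here again.
        String[] args = new String[list.size()];
        list.toArray(args);
        return run(scriptText, fileName, args);
    }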
}

Fairly simple in the end. I was fooling around a bit to get this to work at first. This was done with Groovy 1.6.2.
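One side effect of prepending "import whatever.*\n" is that every line number in a compiler error is off by one. A minimal sketch of a variant that keeps the imports on the user's first line is below. It assumes Groovy accepts an import followed by ";" and more code on the same line (I believe it does, but I have not checked this against 1.6.2), and that the script has no package declaration of its own. ImplicitImports and the "other.api" package are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: build the script text that the overridden run() hands to Groovy.
// Assumption: putting the imports on the first line, separated by ';',
// leaves the user's own line numbers unchanged.
class ImplicitImports {
    /** Returns scriptText prefixed with one star import per package, all on line one. */
    static String prepend(List<String> packages, String scriptText) {
        StringBuilder result = new StringBuilder();
        for (String pkg : packages) {
            result.append("import ").append(pkg).append(".*; ");
        }
        return result.append(scriptText).toString();
    }

    public static void main(String[] args) {
        String script = "println new Foo()\nprintln 'done'";
        // "whatever" is the placeholder package from the post; "other.api" is made up.
        System.out.println(prepend(Arrays.asList("whatever", "other.api"), script));
    }
}
```

In OurGroovyShell the first line of run() would then become something like scriptText = ImplicitImports.prepend(packages, scriptText).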

2009-03-27

Using putty (win32) to connect to vino in ubuntu 8.10

Vino, it seems, has a bug in that it only listens on the localhost IPv6 address in Ubuntu 8.10, i.e. ::1:5900. I've been trying to connect to this using PuTTY for quite some time, and googling turns up a few threads and a bug in PuTTY, ipv6-literals, that is not yet fixed even in the snapshots. I finally found the solution, almost all by myself: I should add a hostname to use instead of the numeric IP, which has the problematic ":" in it. But actually, there already is one in Ubuntu: ip6-localhost. So adding the tunnel 4L5900:ip6-localhost:5900 works!

Oh, by the way: I did not want to disable ipv6 in Ubuntu since I occasionally use it.

2009-03-18

New sources of world knowledge

I am trying to develop a natural language parser using a "knowledge intensive" approach. For this I need knowledge for my application. I have a few plans for amassing it:

Various ontologies: Freebase, UMBEL, WordNet, FrameNet, VerbNet, DbPedia etc

Using pattern matching on Wikipedia articles to extract info

Once it is minimally working, using the parser itself to extract more knowledge

For pattern matching I will need to develop patterns. Others have already done this, but I believe I can do even better. In Wikipedia one can use links to other articles as tokens, which should simplify the structure of the patterns one develops or evolves to match against. Or at least I hope so.
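To make the links-as-tokens idea concrete, here is a minimal sketch. It assumes raw wiki markup where links look like [[Target]] or [[Target|label]]; the "is the capital of" pattern and all names are made up for illustration, not patterns I have actually evolved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: treat each wiki link as a single token inside a larger text pattern.
class LinkPattern {
    // A wiki link: [[Target]] or [[Target|label]]; the capturing group is the target article.
    private static final String LINK = "\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]";

    // Hypothetical pattern: "<link> is the capital of <link>".
    private static final Pattern CAPITAL_OF =
            Pattern.compile(LINK + " is the capital of " + LINK);

    /** Returns {capital, country} pairs found in the wikitext. */
    static List<String[]> capitals(String wikitext) {
        List<String[]> facts = new ArrayList<>();
        Matcher m = CAPITAL_OF.matcher(wikitext);
        while (m.find()) {
            facts.add(new String[] { m.group(1), m.group(2) });
        }
        return facts;
    }

    public static void main(String[] args) {
        String text = "[[Stockholm]] is the capital of [[Sweden|the Kingdom of Sweden]].";
        for (String[] fact : capitals(text)) {
            System.out.println(fact[0] + " -> " + fact[1]);
        }
    }
}
```

The point is that the pattern never has to guess where a multi-word name starts or ends, because the link markup already delimits it.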

More Wikipedia statistics

Doing some statistics on the WEX data dump, and I am once again hitting memory limits. At least unless I filter uncommon stuff out, which is almost entirely possible, given that every common word should appear a number of times in any decent-sized chunk of Wikipedia articles. Current stats are:

words: 29657715, sentences: 1795296, articles: 13801, unique words: 1182046

Words is the number of whitespace-separated strings I've seen. Sentences is sentences according to WEX, which by the way is way off for some articles such as Andre Agassi, where it believes a sentence ending with "No." means a new sentence when it's Agassi's world ranking that comes next. Articles is self-explanatory, and unique words are unique words, this time with all characters allowed but everything lower-cased. So, from 14K articles the number of unique words is already more than 1M. This is very different from my older run which ignored anything that was not a-zA-Z, where it took a couple of hundred K articles to reach that number.
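The two counting modes, and the filtering of uncommon words, can be sketched roughly like this. It is not the code I ran; WordStats and its methods are made-up names, and it assumes the older run stripped non-letter characters from each token rather than dropping the whole token.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of the two word-counting modes described above.
class WordStats {
    /** Whitespace-separated tokens, lower-cased, all characters kept. */
    static Map<String, Integer> countAll(String text) {
        return count(text, false);
    }

    /** As above, but with everything outside a-z removed from each token. */
    static Map<String, Integer> countLettersOnly(String text) {
        return count(text, true);
    }

    private static Map<String, Integer> count(String text, boolean lettersOnly) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            String word = token.toLowerCase(Locale.ROOT);
            if (lettersOnly) {
                word = word.replaceAll("[^a-z]", "");
            }
            if (word.isEmpty()) {
                continue; // nothing left after stripping, or empty input
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    /** Keeps only words seen at least minCount times, to save memory. */
    static Map<String, Integer> prune(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> kept = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    /** Total number of tokens counted. */
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        String text = "Agassi was ranked No. 1 in 1995. Agassi won.";
        System.out.println("all characters: " + countAll(text));
        System.out.println("letters only:   " + countLettersOnly(text));
        System.out.println("seen twice:     " + prune(countAll(text), 2));
    }
}
```

Pruning periodically while streaming through the dump is what would keep the map from growing without bound, at the cost of losing words that are rare in one chunk but common overall.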

2009-03-04

Making Firefox 3 play CNN videos

The fix that worked for me was to add cnn.com to the whitelist in flashblock.

Others have mentioned other fixes, such as installing mediawrap, cleaning the cache, moving from adblock to adblock plus, using ietab, not using ietab, etc. For me it was a bit simpler.

2009-01-08

Official Google Reader Blog: Square is the new round.

I have to say the new Google Reader look is way better than before. I always hated round corners, too! I normally use the microfirefox theme in FF, so that tells you how anal I am about screenspace.

2008-12-18

The best way to browse Flickr to date

This must be the way to browse Flickr lazily: tag browser. That is the tag browser I've been waiting for, basically. I would guess they are using something like DbPedia?

Novint Falcon

This thing seems really cool. Not so much as a game controller, since I barely play games, but rather as a simple CNC machine ;)

Imagine building a foam sculptor or a 3D scanner from this. Should be easy, especially the scanner since there are buttons on the attachments already, at least if you dare to modify the official attachments. Apparently they have PICs in them, too, so there might be a real protocol between them and the Falcon, making building your own attachments that much more of a challenge.

2008-10-22

Android Developers Blog: Android is now Open Source

So, the Android source was released. That is awesome, and I never doubted it would happen. On the other hand, Symbian is also something I would like to see the source of, with me using it every day! I read this morning that the source would cost money to get at, which is really sad. This is what I found:

"The Symbian Foundation platform will be available to members under a royalty-free license during first half 2009. The Foundation will provide, manage and unify the platform, ultimately releasing it as open source. "

Hopefully this is just a temporary measure, as I understand releasing it in the end means I can look at it for free! We can hope anyway...

2008-08-19

Android signing signs..

I hope this is how signing will be done even in Android devices:

There are no requirements on the key used to sign .apk files;  locally-generated and self-signed keys are allowed.  There is no PKI, and developers will not be required to purchase certificates, or similar.   For developers who use the Eclipse/ADT plugin, application signing will be largely automatic.  Developers who do not use Eclipse/ADT can use the standard Java jarsigner tool to sign .apk files.

Taken from http://code.google.com/android/RELEASENOTES.html. This is the major problem with J2ME in my opinion: no way for developers to self-sign stuff.

2008-07-28

Google Knol - Information without semantics

So I saw Knol mentioned on identi.ca. I checked it out, but oh boy was I disappointed. Absolutely no semantics at all? Wikipedia is more semantic than that! Faviki is more semantic still. And Freebase!

Unless they fix that I don't see how it will be successful in the long term. Other "semanticer" things will simply crush it.

Official Google Blog: Encouraging people to contribute knowledge

2008-07-06

Using joins to find missing data and blobs

I just realized, after having read about column based storage, how potentially bad it is to have blobs in a metadata-rich table.

I am currently developing a small project in my spare time, and we have a table which is roughly: id int, mimetype varchar, data blob, version int, and so on with a few more non-blob columns in it. What I noticed was that selects on this table that do not need the blobs, using the indexed mimetype field, were incredibly slow. Relevant is that I am running MySQL in development mode, so it never seems to use more than a few tens of MB of RAM, meaning almost no caching is happening here. This means that in a row-storage database like MySQL I will be doing a tremendous amount of either seeking or reading of unnecessary data, depending on how MySQL plans its disk reads, how big the blobs are and the effect of any read-ahead in the OS or elsewhere.

The other part of this slow query is an outer join, something like this:

select c.id from c
left outer join b on b.c_id = c.id and b.name = 'something'
where b.name is null

This is apparently also fairly slow. So don't do these things too much. In this case, the join to find if data is not in a joined table could be easily fixed by adding data to the db.
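For reference, this is what that left outer join with "is null" actually computes, as a small in-memory sketch: the ids in c that have no row in b with the given name. The table and column names follow the query above; the class itself is just illustration, not how the database evaluates it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory sketch of the anti-join: ids in c with no matching row in b.
class MissingRows {
    /** One row of table b: the foreign key to c and the name column. */
    static final class B {
        final int cId;
        final String name;

        B(int cId, String name) {
            this.cId = cId;
            this.name = name;
        }
    }

    static List<Integer> idsWithout(List<Integer> cIds, List<B> bRows, String name) {
        // Collect the c ids that do have a b row with this name...
        Set<Integer> present = new HashSet<>();
        for (B b : bRows) {
            if (name.equals(b.name)) {
                present.add(b.cId);
            }
        }
        // ...and keep the ones that do not, which is what "where b.name is null" selects.
        List<Integer> missing = new ArrayList<>();
        for (Integer id : cIds) {
            if (!present.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Integer> c = Arrays.asList(1, 2, 3);
        List<B> b = Arrays.asList(new B(1, "something"), new B(2, "other"));
        System.out.println(idsWithout(c, b, "something"));
    }
}
```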

More funny java bugs

So, I get this simple assignment at work. Install a new version of our software on a brand spanking new Intel Xeon quad-core machine. Nice.

Only not so very nice when I notice hard Java crashes. This is in the hasNext method of EDU.oswego...ConcurrentHashMap's iterator, always. I try downgrading to JDK1.5, but that does not help.

I might have to do yet another probing bug hunt in third-party software. Argh.

Also, trying out identi.ca even though blogging, and micro-blogging especially, is not really my thing. Finding it fairly good though, just like ping.fm.

2008-06-10

HTMLUnit, MultiThreadedHttpConnectionManager and memory leaks

I have been having this wonderful time at my work. A small coding project involving use of mostly HtmlUnit was almost done, and working properly. But what happens? By chance I notice that it is leaking memory: Perm Gen space, even.

This was coded as a plugin for a larger product, and was dynamically reloaded at every invocation. I first had to remake the plugin ClassLoader into a post-delegating classloader so that I could override the version of HttpClient already in the product. I couldn't really be sure it was not my changes to the classloader that gave rise to the leak, but eventually I got to that conclusion. The next step was to narrow down where this happened. Long story short, I found that if I changed HtmlUnit to not use the MultiThreadedHttpConnectionManager from HttpClient, it did not leak. I did not really want to do this though, being unsure of how HtmlUnit actually used this, and also because we have multiple threads using HtmlUnit.

The thing that solved the issue was to call shutdownAll in the connection manager. I am not allowed to access that from my code as a user of HtmlUnit though, and I did want to avoid having to recompile anything, so I used reflection to subvert the access checks. Calling shutdown on the one manager did not work, however, nor did closing the connection, which HtmlUnit already did by the way.
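The reflection trick looks roughly like the sketch below. FakeConnectionManager is a stand-in I made up so the example runs on its own; in the real plugin the class handed to invokeStatic would be HttpClient's connection manager, and the helper name is hypothetical too.

```java
import java.lang.reflect.Method;

// Sketch: call a static no-argument method that normal code cannot reach.
class ReflectiveShutdown {
    /** Stand-in for the real connection manager (hypothetical). */
    static class FakeConnectionManager {
        static boolean shutDown = false;

        private static void shutdownAll() {
            shutDown = true;
        }
    }

    /** Looks up the method by name, lifts the access check, and invokes it. */
    static void invokeStatic(Class<?> type, String methodName) {
        try {
            Method method = type.getDeclaredMethod(methodName);
            method.setAccessible(true);   // subvert the access check
            method.invoke(null);          // null receiver because the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        invokeStatic(FakeConnectionManager.class, "shutdownAll");
        System.out.println("shut down: " + FakeConnectionManager.shutDown);
    }
}
```

Calling this when the plugin is unloaded is the natural place, since that is when the classloader needs to become collectable.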

I can only assume this is some obscure bug that nobody else ever trips, but now at least if somebody does, they might find this as a reference.

I could not use the latest HtmlUnit because of needing JDK1.4 compatibility, so this was done in HtmlUnit 1.13. Oh, and 1.14 needed CSS stuff that clashed with regular DOM libraries, making classloading not work. Not sure why this does not work when I can safely override HttpClient with a newer version.

2008-06-06

Semantic Web is freaking cool! (and on a roll it seems)

Been surfing around for semantic web websites to find ontologies or datasources to (ab)use for Yet Another AI-Project From Me. This is what I found:

True Knowledge. Incredibly cool question-answering frontend to an incredibly complex datamodel, with a moderately complex and severely boring input process. Not free data, I can not download a dump of their database.

Freebase. Took me some time to dig into this, actually, but I like what I am seeing. The data model seems a lot simpler than True Knowledge's, or at least that is what I think (subclassing, transitivity missing?). Inputting stuff is 2-10 times quicker/easier. For bulk stuff it is infinitely easier, since TK does not support that at all. Free data, but not RDF!

Faviki. Very nice and easy to use semantic social tagging/bookmarking service.

RDFScape. Visualizer for cytoscape. Very nice, have not had time to play with this yet.

Attempto Controlled English. Maybe the least exciting of the bunch, but is useful for my NLP-related project.

I also got access to twine. Oh my god what a bore. I just did not see the idea behind it, and the interface turned me off so much that after my third visit I never came back.

True Knowledge has some awesome NLP parsing going on, but it also fails miserably often. I have a simple idea to get me at least started; it pretty much builds upon AIML/patterns to extract meaning from stuff, specifically Wikipedia.

Freebase has a weak model in my mind, there does not seem to be a real inheritance hierarchy and the "upper ontology" is basically missing. The upper ontology not being there is not such a big deal though, I think. There should be a set of "uppermost" classes in Freebase that can be mapped to SUMO/YAGO/DBPedia/Wordnet or whatever to help with any inferencing/analogous thinking.

I can not help but think that five years from now, "semantic" will not really exist as a label. Everything will be semantic by then, or long since gone. AGI is not far behind, either. I predict a surge of NLP success in the coming few years, mainly with knowledge-intensive approaches. Common sense is still the missing piece of the puzzle; the above efforts do not concentrate on this at all, but rather on knowledge that is useful to humans. Remember, common sense is boring for humans to input and administer, since it is all so basic.

2008-04-28

Nokia E51 stability

I love my phone for several reasons. It is just the right size, fits very nicely in jeans-pockets without wearing them down in no time. My Motorola E398 was much worse in this regard due to being thicker.

I also love how it handles most J2ME apps, has email and so on. I've started using it during my commute every day to surf on my laptop, mostly via Bluetooth since USB cables are not that much fun to carry around. The phone is mostly stable. I've encountered crashes when, for example, running many apps and playing mp3s at the same time. Nicely enough it reboots automatically most of the time. I think I've found what it does not handle so well though: Bluetooth internet sharing. Firstly it gets very hot; that's fine though, I guess. But it also seems to become very unstable if I try to actually use it during this time.

Too bad, but I'm not surprised. I think that if I just don't touch anything while using it as a modem, it copes better.

2008-01-29

"not in" versus joins

I have been fighting this HQL query for probably 8 hours in total now. I have two tables, Entity and EntityAttribute. Every entity has zero or more attributes, so the EntityAttribute table has an Entity_ID column in it. Attributes have a name and value, each in its own column. EntityAttribute is mapped as a map collection from Entity, with the "name" as the index. I want to select entities based on whether they do or do not have an attribute with a particular name and value, although I know that in practice those that have an attribute with the right name always have the right value, at the moment anyway.

The end of the HQL looks something like this:


SELECT DISTINCT e FROM Entity e
WHERE ...
... AND
'type' in indices(e.attributes)

'attributes' is the collection of attributes.

I would have guessed this to work, but no. I eventually tried this in plain SQL, where it of course also does not work.

I also tried with this, for the other way around (exclusion):


SELECT DISTINCT e FROM Entity e
WHERE ...
... AND
e.attributes['type'] != 'animation'

I can sort of understand this last construct being wrong. The correct way to do things is apparently to swap these two, basically use "not in" for exclusion and use "join" to include things, which is what e.attributes['type'] = 'animation' uses.
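Why the != variant misbehaves is easier to see on a toy model. The sketch below assumes, as the behaviour above suggests, that e.attributes['type'] turns into a join, so an entity without a 'type' attribute has no joined row and can never satisfy the comparison. The classes are made up; this is plain Java, not Hibernate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy model of "join with !=" versus "not in" for excluding entities.
class AttributeFilter {
    static final class Entity {
        final String id;
        final Map<String, String> attributes;

        Entity(String id, Map<String, String> attributes) {
            this.id = id;
            this.attributes = attributes;
        }
    }

    /** Join semantics: only entities that have the attribute at all can match. */
    static List<String> joinNotEqual(List<Entity> entities, String name, String value) {
        List<String> ids = new ArrayList<>();
        for (Entity e : entities) {
            String actual = e.attributes.get(name);   // null models "no joined row"
            if (actual != null && !actual.equals(value)) {
                ids.add(e.id);
            }
        }
        return ids;
    }

    /** "not in" semantics: everything except the entities whose attribute equals value. */
    static List<String> notIn(List<Entity> entities, String name, String value) {
        List<String> ids = new ArrayList<>();
        for (Entity e : entities) {
            if (!value.equals(e.attributes.get(name))) {
                ids.add(e.id);
            }
        }
        return ids;
    }

    static List<Entity> sample() {
        return Arrays.asList(
                new Entity("a", Collections.singletonMap("type", "animation")),
                new Entity("b", Collections.singletonMap("type", "image")),
                new Entity("c", Collections.<String, String>emptyMap()));
    }

    public static void main(String[] args) {
        System.out.println("join !=: " + joinNotEqual(sample(), "type", "animation"));
        System.out.println("not in:  " + notIn(sample(), "type", "animation"));
    }
}
```

Entity "c" has no attributes at all, and it is exactly the kind of row that the join-based exclusion silently drops.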

2007-12-05

Domain Model - A Tale of Bad (J2EE) Design
(And a newbie developer trying to fix it)

Where I work we have a product that we can call ABC. Now ABC is what I would call "legacy": more than 10k loc (I think it is somewhere around 50-100k), and the project was started more than 5 years ago, in 2001 I think. I have been working on this for about 16 months now, and I am getting more and more comfortable with it every day due to actually getting assigned to develop it at the moment. I am the only developer since a couple of months back.

ABC is in need of a lot of things, this I saw the first week on the job. So, many obvious things are wrong. Where to start? Well, our "service layer" (Note: we do not have clearly defined layers, and nobody seems to have known what to call bigger parts of the system, so if I could say "hey, we have a problem with our service layer" nobody would understand what I was talking about) has code in it that is very, very long-winded, some made up code:


dto.Part p = DAOFactory.getPartsDAO().getPartById(new Integer(1));
validatePart(p);
if(p.getInventoryID() != null) {
 dto.Inventory inv = DAOFactory.getInventoryDAO().getInventoryById(p.getInventoryID());

 //Do more...
}

Hey, we use a factory! Great. But this code does very little for being so verbose, and note that this sort of code could go on for hundreds of lines. Can we spot any problems? I can, at least now; previously I really could not. I am not a very experienced J2EE developer, this is my first job with it, fresh from university:

I hate the fact that we have to explicitly do "DAO things". Why is this necessary? It is not, if we for example use hibernate a bit more correctly: ABC is currently not using any mappings between entities for example. Adding this, we could do away with fetching one object in a transaction and then accessing the collection mappings instead of all the "DAOing" as I call it.
(This DAO-ing also has severe performance penalties when done this way: it has a select n+1 problem. It can be solved by being "smart", but takes a lot of manual coding, for something that hibernate solves perfectly for us, practically for free. Logic is honestly more compact in a HQL query than in this sort of convoluted code. 20-lines of HQL did in some instances replace 100+ lines of Java for us, especially manually fetching objects, checking them instead of simply using a WHERE clause.)
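The select n+1 problem is easy to show with a fake database that only counts round trips. Everything in this sketch is hypothetical (FakeDb, the method names, the data); it is not ABC's code, just the shape of the problem.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of select n+1: one query for the parts, then one more per part.
class SelectNPlusOne {
    /** Hypothetical stand-in for the database; it only counts round trips. */
    static final class FakeDb {
        final Map<Integer, String> inventoryNameByPartId = new LinkedHashMap<>();
        int queries = 0;

        List<Integer> selectPartIds() {                     // one query for all parts
            queries++;
            return new ArrayList<>(inventoryNameByPartId.keySet());
        }

        String selectInventoryName(int partId) {            // one query per part
            queries++;
            return inventoryNameByPartId.get(partId);
        }

        Map<Integer, String> selectPartsWithInventory() {   // one joined query
            queries++;
            return new LinkedHashMap<>(inventoryNameByPartId);
        }
    }

    /** The "DAOing" style: fetch the parts, then fetch each inventory separately. */
    static int queriesWhenDaoing(FakeDb db) {
        db.queries = 0;
        for (int partId : db.selectPartIds()) {
            db.selectInventoryName(partId);
        }
        return db.queries;
    }

    /** The joined style: let one query bring back parts and inventories together. */
    static int queriesWhenJoined(FakeDb db) {
        db.queries = 0;
        db.selectPartsWithInventory();
        return db.queries;
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb();
        db.inventoryNameByPartId.put(1, "warehouse A");
        db.inventoryNameByPartId.put(2, "warehouse B");
        db.inventoryNameByPartId.put(3, "warehouse A");
        System.out.println("DAOing: " + queriesWhenDaoing(db) + " queries");
        System.out.println("joined: " + queriesWhenJoined(db) + " queries");
    }
}
```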

We are fetching DTOs from a database, by the looks of it. Is this really the intended use for the DTO pattern? No (as far as I know, anyway), and it turns out these are not DTOs, firstly, and secondly, they should not be. Our "DTO layer" here is misnamed, and misguided. It is actually our domain model, albeit an almost completely anemic one [1]. We do in fact have another DTO layer, that really is a DTO layer, and which is needed and should be a DTO layer. This is used for sending data from webservices, and decoupling this from our domain model is, on the contrary, not a bad idea.

So we have a, more or less, procedural service layer with 95% of our business logic in it, and a domain model layer with data and only trivial behavior in it. It seems straightforward enough to just move behavior into our domain model, and be done with it. One slight problem here is that I have introduced alternative "DTOs" (entity beans, domain objects) that are properly mapped using hibernate, with _Mapped appended to their class name, each inheriting from its non-collection-mapped hibernate class. For example we might have:

Part has a subclass PartMapped. PartMapped has collection mappings, Part does not.

It can also look like so: We have PartBase, PartMapped and Part. Part is the "original" DTO, with a foreign key in the form of an Integer. We don't really want this integer in our hibernate-collection-mapped variant, so we create a base class which does not have it, and extend that while adding the collections for hibernate to populate.

Modifying the domain model to fit hibernate is a bad idea, I realize that. We need to stop depending on hibernate. I have been looking at spring lately (which ABC of course does not use) and I think it could help with a lot of this. One reason for having alternate classes for collection-mapped entities was that I do not yet trust hibernate to actually save those back: we still always use the simple save() call for that on an entity with no collection mappings. Makes it very explicit what is happening. This is simply a matter of education though, I need to learn how hibernate works in this regard and with, say, filtered collections.

This inheritance structure makes it not hard, far from it, but a bit smelly to add behaviour to these objects. Mostly due to the structure being hibernate induced, but also because of the duplication arising from that: We would need two different implementations of a "public Inventory getInventory()" for a Part and a PartMapped, one using straight accessors (hibernate needs objects to use the accessors for its collection proxies to work) and the other one using old-style DAO-ing to fetch the object. We would have a lot of these methods to implement if we were to move them into the domain model proper.

I assume the right course of action is to abstract away the database stuff such as Integer ID; in all domain objects, and let "something" (not specifically hibernate) manage persistence for us, and then start modeling the domain properly. And then document it all and educate everyone that, now, ABC is much more agile and can withstand changes easily.
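What I have in mind is something like the sketch below: a Part that holds its Inventory as an object instead of an Integer foreign key, and that answers domain questions itself. The names and the isAvailable rule are invented for illustration; how the objects get loaded and saved is deliberately left to "something" outside the model.

```java
// Sketch: a small domain model with behaviour and no database ids in it.
class DomainSketch {
    static final class Inventory {
        private final int stock;

        Inventory(int stock) {
            this.stock = stock;
        }

        /** True if at least the requested quantity is in stock. */
        boolean has(int quantity) {
            return quantity <= stock;
        }
    }

    static final class Part {
        private final String number;
        private final Inventory inventory;   // an object reference, not an Integer id; may be null

        Part(String number, Inventory inventory) {
            this.number = number;
            this.inventory = inventory;
        }

        String getNumber() {
            return number;
        }

        /** Behaviour lives on the domain object, so the service layer does no "DAOing". */
        boolean isAvailable(int quantity) {
            return inventory != null && inventory.has(quantity);
        }
    }

    public static void main(String[] args) {
        Part stocked = new Part("P-1", new Inventory(5));
        Part unstocked = new Part("P-2", null);
        System.out.println(stocked.getNumber() + " available: " + stocked.isAvailable(3));
        System.out.println(unstocked.getNumber() + " available: " + unstocked.isAvailable(1));
    }
}
```

With this shape the Part/PartMapped split would no longer need two getInventory() implementations, since the object never fetches anything itself.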

It cannot today, that is for sure. I draw the conclusion that earlier developers have been "fooled" by EJB patterns or something similar, and were unable to see anything wrong with our anemic domain model. Design patterns are used, but those are localized, and in general there "is no design". No explicit design anyway; there is one that can be sort of inferred from looking at the code, but it is very vague and lots of its boundaries are often broken by code. It can still serve as a "new vision" for the overall design though.