2008-12-18

Novint Falcon

This thing seems really cool. Not so much as a game controller, since I barely play games, but rather as a simple CNC machine ;)

Imagine building a foam sculptor or a 3D scanner from this. Should be easy, especially the scanner, since there are buttons on the attachments already, at least if you dare to modify the official attachments. Apparently they have PICs in them, too, so there might be a real protocol between them and the Falcon, making building your own attachments that much more of a challenge.

2008-10-22

Android Developers Blog: Android is now Open Source

So, the Android source was released. That is awesome, and I never doubted it would happen. On the other hand, Symbian is also something I would like to see the source of, since I use it every day! I read this morning that the source would cost money to get at, which is really sad. This is what I found:

"The Symbian Foundation platform will be available to members under a royalty-free license during first half 2009. The Foundation will provide, manage and unify the platform, ultimately releasing it as open source. "

Hopefully this is just a temporary measure, as I understand releasing it in the end means I can look at it for free! We can hope anyway...

2008-08-19

Android signing signs..

I hope this is how signing will be done even in Android devices:

There are no requirements on the key used to sign .apk files; locally-generated and self-signed keys are allowed. There is no PKI, and developers will not be required to purchase certificates, or similar. For developers who use the Eclipse/ADT plugin, application signing will be largely automatic. Developers who do not use Eclipse/ADT can use the standard Java jarsigner tool to sign .apk files.

Taken from http://code.google.com/android/RELEASENOTES.html. This is the major problem with J2ME in my opinion: no way for developers to self-sign stuff.

2008-07-28

Google Knol - Information without semantics

So I saw Knol mentioned on identi.ca. I checked it out, but oh boy was I disappointed. Absolutely no semantics at all? Wikipedia is more semantic than that! Faviki is more semantic still. And Freebase!

Unless they fix that I don't see how it will be successful in the long term. Other "semanticer" things will simply crush it.


Official Google Blog: Encouraging people to contribute knowledge

2008-07-06

Using joins to find missing data and blobs

I just realized, after having read about column based storage, how potentially bad it is to have blobs in a metadata-rich table.

I am currently developing a small project in my spare time, and we have a table which is roughly: id int, mimetype varchar, data blob, version int, and so on with a few more non-blob columns in it. What I noticed was that selects on this table that do not need the blobs, using the indexed mimetype field, were incredibly slow. Relevant is that I am running MySQL in development mode, so it never seems to use more than a few tens of MB of RAM, meaning almost no caching is happening here. This means that in a row-storage database like MySQL I will be doing a tremendous amount of either seeking or reading of unnecessary data, depending on how MySQL plans its disk reading, how big the blobs are and the effect of any read-ahead in the OS or elsewhere.
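To put rough numbers on the paragraph above, here is a back-of-the-envelope sketch. It assumes the worst case where the blob is stored inline with the row and every candidate row is read in full; the row count and sizes are made up for illustration and are not measurements from my table.

```java
final class RowScanEstimate {
    /** Bytes touched when every candidate row has to be read in full. */
    static long bytesScanned(long rows, long metadataBytesPerRow, long blobBytesPerRow) {
        return rows * (metadataBytesPerRow + blobBytesPerRow);
    }

    public static void main(String[] args) {
        long rows = 100_000L;      // hypothetical row count
        long metadata = 64L;       // hypothetical size of the non-blob columns
        long blob = 50L * 1024L;   // hypothetical average blob size

        long inline = bytesScanned(rows, metadata, blob); // blobs in the same table
        long split = bytesScanned(rows, metadata, 0L);    // blobs moved to their own table

        System.out.println("blobs inline:    " + inline + " bytes");
        System.out.println("blobs split out: " + split + " bytes");
        System.out.println("ratio:           " + (inline / split));
    }
}
```

How close a real database gets to either extreme depends on the storage engine and on caching, so treat this as an upper bound on the damage rather than a prediction.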

The other part of this slow query is an outer join, something like this:

select c.id from c
left outer join b on b.c_id = c.id and b.name = 'something'
where b.name is null

This is apparently also fairly slow. So don't do these things too much. In this case, the join to find if data is not in a joined table could be easily fixed by adding data to the db.

More funny java bugs

So, I get this simple assignment at work. Install a new version of our software on a brand spanking new Intel Xeon quad-core machine. Nice.

Only not so very nice when I notice hard Java crashes. They always happen in the hasNext method of the iterator of EDU.oswego...ConcurrentHashMap. I tried downgrading to JDK1.5, but that did not help.

I might have to do yet another probing bug hunt in third-party software. Argh.


Also, trying out identi.ca even though blogging, and micro-blogging especially, is not really my thing. Finding it fairly good though, just like ping.fm.

2008-06-10

HTMLUnit, MultiThreadedHttpConnectionManager and memory leaks

I have been having this wonderful time at my work. A small coding project involving use of mostly HtmlUnit was almost done, and working properly. But what happens? By chance I notice that it is leaking memory: Perm Gen space, even.

This was coded as a plugin for a larger product, and was dynamically reloaded at every invocation. I first had to remake the plugin ClassLoader to become a post-delegating classloader so that I could override the version of HttpClient already in the product. I couldn't really be sure that it was not my changes to the classloader that gave rise to the leak, but eventually I got to that conclusion. The next step was to narrow down where this happened. Long story short, I found that if I changed HtmlUnit to not use the MultiThreadedHttpConnectionManager from HttpClient, it did not leak. I did not really want to do this though, being unsure of how HtmlUnit actually used this, and also because we have multiple threads using HtmlUnit.

The thing that solved the issue was to call shutdownAll in the connection manager. I am not allowed to access that from my code as a user of HtmlUnit though, and I did want to avoid having to recompile anything, so I used reflection to subvert the access checks. Calling shutdown on the one manager did not work, however, nor did closing the connection, which HtmlUnit already did by the way.
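The reflection trick described above looks roughly like the sketch below. FakeConnectionManager is a hypothetical stand-in I made up so the example is self-contained; it is not the real HttpClient class, and the helper name invokeHiddenStatic is mine as well. The only real API used is java.lang.reflect.

```java
import java.lang.reflect.Method;

final class ReflectiveShutdown {
    /** Hypothetical stand-in for a third-party manager with an inaccessible static method. */
    static final class FakeConnectionManager {
        private static int open = 3;

        private static void shutdownAll() {
            open = 0;
        }

        static int openManagers() {
            return open;
        }
    }

    /** Looks up a no-argument static method by name and calls it, bypassing access checks. */
    static void invokeHiddenStatic(Class<?> type, String methodName) {
        try {
            Method method = type.getDeclaredMethod(methodName);
            method.setAccessible(true); // subvert the normal access check
            method.invoke(null);        // null receiver because the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        invokeHiddenStatic(FakeConnectionManager.class, "shutdownAll");
        System.out.println("open managers: " + FakeConnectionManager.openManagers());
    }
}
```

This uses generics and multi-catch-era exception types, so it would need adjusting for a JDK1.4 target, and a security manager can forbid setAccessible altogether.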

I can only assume this is some obscure bug that nobody else ever trips, but now at least if somebody does, they might find this as a reference.

I could not use the latest HtmlUnit because of needing JDK1.4 compatibility, so this was done in HtmlUnit 1.13. Oh, and 1.14 needed CSS stuff that clashed with regular DOM libraries, making classloading not work. Not sure why this does not work when I can safely override HttpClient with a newer version.

2008-06-06

Semantic Web is freaking cool! (and on a roll it seems)

Been surfing around for semantic web websites to find ontologies or datasources to (ab)use for Yet Another AI-Project From Me. This is what I found:


  • True Knowledge. Incredibly cool question-answering frontend to an incredibly complex datamodel, with a moderately complex and severely boring input process. Not free data, I can not download a dump of their database.

  • Freebase. Took me some time to dig into this, actually, but I like what I am seeing. Data model seems a lot simpler than True Knowledge, or at least that is what I think (subclassing, transitivity missing?). Inputting stuff is from 2-10 times quicker/easier. For bulk stuff it is infinitely easier, since TK does not support that at all. Free data, but not RDF!

  • Faviki. Very nice and easy to use semantic social tagging/bookmarking service.

  • RDFScape. Visualizer for cytoscape. Very nice, have not had time to play with this yet.

  • Attempto Controlled English. Maybe the least exciting of the bunch, but is useful for my NLP-related project.




I also got access to twine. Oh my god what a bore. I just did not see the idea behind it, and the interface turned me off so much that after my third visit I never came back.

True Knowledge has some awesome NLP parsing going on, but it also fails miserably often. I have a simple idea to get me at least started; it pretty much builds upon AIML/patterns to extract meaning from stuff, specifically Wikipedia.

Freebase has a weak model in my mind, there does not seem to be a real inheritance hierarchy and the "upper ontology" is basically missing. The upper ontology not being there is not such a big deal though, I think. There should be a set of "uppermost" classes in Freebase that can be mapped to SUMO/YAGO/DBPedia/Wordnet or whatever to help with any inferencing/analogous thinking.

I can not help but think that five years from now, "semantic" will not really exist as a label. Everything will be semantic by then, or long gone. AGI is not far behind, either. I predict a surge of NLP success in the coming few years, mainly with knowledge-intensive approaches. Common sense is still the missing piece of the puzzle; the above efforts do not concentrate on this at all, but rather on knowledge that is useful to humans. Remember, common sense is boring for humans to input and administer, since it is all so basic.

2008-04-28

Nokia E51 stability

I love my phone for several reasons. It is just the right size, fits very nicely in jeans-pockets without wearing them down in no time. My Motorola E398 was much worse in this regard due to being thicker.

I also love how it handles most J2ME apps, has email and so on. I've started using it during my commute every day to surf on my laptop, mostly via bluetooth since USB cables are not that much fun to carry around. The phone is mostly stable. I've encountered crashes when, for example, running many apps and playing mp3s at the same time. Nicely enough it reboots automatically most of the time. I think I've found what it does not handle so well though: bluetooth internet sharing. Firstly it gets very hot; that's fine though, I guess. But it also seems to become very unstable if I try to actually use it during this time.

Too bad, but I'm not surprised. I think that if I just don't touch anything while using it as a modem, it copes better.

2008-01-29

"not in" versus joins

I have been fighting this HQL query for probably 8 hours in total now, where I have a table Entity and a table EntityAttribute. Every entity has zero or more attributes, so the EntityAttribute table has an Entity_ID column in it. Attributes have a name and a value, each in its own column. EntityAttribute is mapped as a map collection from Entity, with the "name" as the index. I want to select entities based on whether they do or do not have an attribute with a particular name and value, although I know that in practice those that have an attribute with the right name always have the right value too, at the moment anyway.

The end of the HQL looks something like this:


SELECT DISTINCT e FROM Entity e
WHERE ...
... AND
'type' in indices(e.attributes)


'attributes' is the collection of attributes.

I would have guessed this to work, but no. I eventually tried this in plain SQL, where it of course also does not work.

I also tried with this, for the other way around (exclusion):


SELECT DISTINCT e FROM Entity e
WHERE ...
... AND
e.attributes['type'] != 'animation'


I can sort of understand this last construct being wrong. The correct way to do things is apparently to swap these two, basically use "not in" for exclusion and use "join" to include things, which is what e.attributes['type'] = 'animation' uses.
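My understanding of why the join-style test is wrong for exclusion can be shown with a small in-memory model. This is only a sketch of the semantics as I understand them, not what Hibernate actually generates; the class, the method names and the sample data are made up. The point is that a join only ever sees entities that have the attribute, so entities with no 'type' attribute at all silently disappear.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AttributeFilters {
    /** Three hypothetical entities: one animation, one image, one without a 'type' attribute. */
    static Map<String, Map<String, String>> sample() {
        Map<String, Map<String, String>> entities = new LinkedHashMap<>();
        Map<String, String> a = new HashMap<>();
        a.put("type", "animation");
        Map<String, String> b = new HashMap<>();
        b.put("type", "image");
        entities.put("a", a);
        entities.put("b", b);
        entities.put("c", new HashMap<String, String>());
        return entities;
    }

    /** Join-style exclusion: only entities that HAVE the attribute can match. */
    static List<String> joinNotEqual(Map<String, Map<String, String>> entities, String name, String value) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : entities.entrySet()) {
            String actual = entry.getValue().get(name);
            if (actual != null && !actual.equals(value)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** "not in"-style exclusion: start from everything, drop the entities that match. */
    static List<String> notIn(Map<String, Map<String, String>> entities, String name, String value) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : entities.entrySet()) {
            if (!value.equals(entry.getValue().get(name))) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("join != : " + joinNotEqual(sample(), "type", "animation")); // [b]
        System.out.println("not in  : " + notIn(sample(), "type", "animation"));        // [b, c]
    }
}
```

Entity "c" is the interesting one: it is not an animation, yet only the "not in" variant returns it.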

2007-12-05

Domain Model - A Tale of Bad (J2EE) Design
(And a newbie developer trying to fix it)

Where I work we have a product that we can call ABC. Now ABC is what I would call "legacy": more than 10k loc (I think it is somewhere around 50-100k), and the project was started more than 5 years ago, in 2001 I think. I have been working on this for about 16 months now, and I am getting more and more comfortable with it every day due to actually getting assigned to develop it at the moment. I am the only developer since a couple of months back.

ABC is in need of a lot of things; this I saw the first week on the job. Many obvious things are wrong. Where to start? Well, our "service layer" has code in it that is very, very long-winded. (Note: we do not have clearly defined layers, and nobody seems to have known what to call the bigger parts of the system, so if I were to say "hey, we have a problem with our service layer" nobody would understand what I was talking about.) Some made-up code:


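// Fetch the part by primary key through its DAO, then validate it.
dto.Part p = DAOFactory.getPartsDAO().getPartById(new Integer(1));
validatePart(p);
if (p.getInventoryID() != null) {
    // The "foreign key" is a plain Integer, so a second DAO call is needed.
    dto.Inventory inv = DAOFactory.getInventoryDAO().getInventoryById(p.getInventoryID());

    // Do more...
}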


Hey, we use a factory! Great. But this code does very little for being so verbose, and note that this sort of code could go on for hundreds of lines. Can we spot any problems? I can, at least now; previously I really could not. I am not a very experienced J2EE developer; this is my first job with it, fresh from university:


  • I hate the fact that we have to explicitly do "DAO things". Why is this necessary? It is not, if we for example use hibernate a bit more correctly: ABC is currently not using any mappings between entities, for example. Adding this, we could get away with fetching one object in a transaction and then accessing the collection mappings, instead of all the "DAOing" as I call it.
    (This DAO-ing also has severe performance penalties when done this way: it has a select n+1 problem. It can be solved by being "smart", but that takes a lot of manual coding, for something that hibernate solves perfectly for us, practically for free. Logic is honestly more compact in a HQL query than in this sort of convoluted code. 20 lines of HQL did in some instances replace 100+ lines of Java for us, especially code that manually fetched objects and checked them instead of simply using a WHERE clause.)

  • We are fetching dtos from a database, by the looks of it. Is this really the intended use for the DTO pattern? No (as far as I know, anyway), and it turns out that, firstly, these are not DTOs and, secondly, they should not be. Our "DTO layer" here is misnamed, and misguided. It is actually our domain model, although an almost completely anemic one[1]. We do in fact have another DTO layer, that really is a DTO layer, and which is needed and should be a DTO layer. This is used for sending data from webservices, and decoupling this from our domain model is, on the contrary, not a bad idea.
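The select n+1 problem from the first point can be made concrete by counting round trips. The sketch below uses a made-up CountingStore in place of the real DAOs and database, so every name in it is hypothetical; it only shows how the number of queries grows with the DAO loop and stays flat with a single joined fetch.

```java
import java.util.ArrayList;
import java.util.List;

final class SelectNPlusOne {
    /** Hypothetical stand-in for the database; it only counts round trips. */
    static final class CountingStore {
        int queries = 0;

        List<Integer> findPartIds(int count) {
            queries++; // one query for the list of parts
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                ids.add(i);
            }
            return ids;
        }

        String findInventoryForPart(int partId) {
            queries++; // one extra query per part
            return "inventory-" + partId;
        }

        List<String> findPartsWithInventories(int count) {
            queries++; // a single joined query returns everything
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                rows.add("part-" + i + "/inventory-" + i);
            }
            return rows;
        }
    }

    /** The "DAOing" style: fetch the parts, then fetch each inventory separately. */
    static int queriesWithDaoLoop(int parts) {
        CountingStore store = new CountingStore();
        for (int id : store.findPartIds(parts)) {
            store.findInventoryForPart(id);
        }
        return store.queries;
    }

    /** The mapped style: let one joined query bring back parts and inventories together. */
    static int queriesWithJoinedFetch(int parts) {
        CountingStore store = new CountingStore();
        store.findPartsWithInventories(parts);
        return store.queries;
    }

    public static void main(String[] args) {
        System.out.println("DAO loop:     " + queriesWithDaoLoop(100) + " queries");     // 101
        System.out.println("joined fetch: " + queriesWithJoinedFetch(100) + " queries"); // 1
    }
}
```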



So we have a more or less procedural service layer with 95% of our business logic in it, and a domain model layer with data and only trivial behavior in it. It seems straightforward enough to just move behavior into our domain model, and be done with it. One slight problem here is that I have introduced alternative "DTOs" (entity beans, domain objects) that are properly mapped using hibernate. They have _Mapped added to their class name and inherit from the corresponding non-collection-mapped hibernate class. For example we might have:

Part has a subclass PartMapped. PartMapped has collection mappings, Part does not.

It can also look like so: We have PartBase, PartMapped and Part. Part is the "original" DTO, with a foreign key in the form of an Integer. We don't really want this integer in our hibernate-collection-mapped variant, so we create a base class which does not have it, and extend that while adding the collections for hibernate to populate.

Modifying the domain model to fit hibernate is a bad idea, I realize that. We need to stop depending on hibernate. I have been looking at spring lately (which ABC of course does not use) and I think it could help with a lot of this. One reason for having alternate classes for collection-mapped entities was that I do not yet trust hibernate to actually save those back: we still always use the simple save() call for that on an entity with no collection mappings. Makes it very explicit what is happening. This is simply a matter of education though; I need to learn how hibernate works in this regard and with, say, filtered collections.

This inheritance structure makes it not hard, far from it, but a bit smelly to add behaviour to these objects. Mostly due to the structure being hibernate induced, but also because of the duplication arising from that: We would need two different implementations of a "public Inventory getInventory()" for a Part and a PartMapped, one using straight accessors (hibernate needs objects to use the accessors for its collection proxies to work) and the other one using old-style DAO-ing to fetch the object. We have a lot of these methods to implement if we were to move them into the domain model proper.

I assume the right course of action is to abstract away the database stuff such as Integer ID; in all domain objects, and let "something" (not specifically hibernate) manage persistence for us, and then start modeling the domain properly. And then document it all and educate everyone that, now, ABC is much more agile and can withstand changes easily.
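As a very rough sketch of where that could lead: the domain objects hold references to each other instead of Integer keys, carry their own behaviour, and persistence hides behind an interface. Everything here is hypothetical (the class names, the stock rule, the PartRepository interface); it is not ABC's code and says nothing about which persistence tool sits behind the interface.

```java
final class RichDomainSketch {
    /** Hypothetical domain object with behaviour instead of bare getters and setters. */
    static final class Inventory {
        private int stock;

        Inventory(int stock) {
            this.stock = stock;
        }

        int stock() {
            return stock;
        }

        void remove(int quantity) {
            if (quantity <= 0 || quantity > stock) {
                throw new IllegalArgumentException("cannot remove " + quantity);
            }
            stock -= quantity;
        }
    }

    /** A Part knows its Inventory as an object reference, not as an Integer foreign key. */
    static final class Part {
        private final Inventory inventory;

        Part(Inventory inventory) {
            this.inventory = inventory;
        }

        Inventory getInventory() {
            return inventory;
        }

        boolean isInStock() {
            return inventory != null && inventory.stock() > 0;
        }

        void consume(int quantity) {
            if (inventory == null) {
                throw new IllegalStateException("part has no inventory");
            }
            inventory.remove(quantity);
        }
    }

    /** Persistence is "something" behind an interface; the domain does not care what. */
    interface PartRepository {
        Part byId(int id);

        void save(Part part);
    }

    public static void main(String[] args) {
        Part part = new Part(new Inventory(2));
        part.consume(2);
        System.out.println("in stock: " + part.isInStock());
    }
}
```

With this shape there is only one getInventory() to write, and the service layer shrinks to orchestrating calls like consume().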

It cannot today, that is for sure. I draw the conclusion that earlier developers have been "fooled" by EJB patterns or something similar, and were unable to see anything wrong with our anemic domain model. Design patterns are used, but those are localized, and in general there "is no design". No explicit design anyway; there is one that can be sort-of inferred from looking at the code, but it is very vague and lots of its boundaries are often broken by code. It can still serve as a "new vision" for the overall design though.

2007-07-02

Amazon EC2 and S3 performance...

I was playing around with an EC2 instance, and I got this great idea that I should benchmark the disk subsystem. I had expected pretty standard performance for the disk at /mnt, but I was wrong:

Benchmarking /dev/sda1 [1537MB], wait 30 seconds
Results: 1014 seeks/second, 0.99 ms random access time

Benchmarking /dev/sda2 [152704MB], wait 30 seconds
Results: 4494 seeks/second, 0.22 ms random access time

That's pretty damn fast. To get this sort of performance out of a RAID-5 set I think you need a whole lot of disks, maybe 20 or so? It sure as hell beats any disk I have at home, by a factor of somewhere between 15 and 60.
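The two figures on each result line are the same number seen from two sides: access time is just the inverse of seeks per second. The sketch below redoes that arithmetic and adds a naive disk-count estimate. The estimate assumes random reads scale linearly with the number of disks, and the per-disk rate of 100 seeks/second is a made-up round number, not something I measured.

```java
final class SeekMath {
    /** Average random access time in milliseconds for a given seek rate. */
    static double accessTimeMs(double seeksPerSecond) {
        return 1000.0 / seeksPerSecond;
    }

    /** Naive estimate: disks needed if random reads scale linearly with disk count. */
    static long disksNeeded(double targetSeeksPerSecond, double seeksPerSecondPerDisk) {
        return (long) Math.ceil(targetSeeksPerSecond / seeksPerSecondPerDisk);
    }

    public static void main(String[] args) {
        System.out.println("/dev/sda1: " + accessTimeMs(1014) + " ms");
        System.out.println("/dev/sda2: " + accessTimeMs(4494) + " ms");

        double assumedPerDisk = 100.0; // hypothetical seeks/second for one ordinary disk
        System.out.println("disks for 4494 seeks/s: " + disksNeeded(4494, assumedPerDisk));
    }
}
```

A faster assumed per-disk rate brings the count down quickly, so the answer is only as good as that assumption.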

I don't know if this could be a Xen problem with this particular benchmark, but if it is true I actually just found a use for EC2: to run my DB-intensive information extraction jobs on. Though they are still most likely better solved by just buying a new machine with 4 or 8GB of RAM. It is pretty cheap now.

Edit: Oh, yeah, I understand that this disk array is shared by a number of instances. A good question then is by how many? And how likely are people to actually use the disk for any stressful activity, with it being non-persistent and all?

Edit2: Oh yes, I am stupid. This is most likely due to Xen using a sparse Copy-on-Write format disk image? It would explain it all, hitting the same physical sector over and over....

2007-07-01

Flash random access write performance

I just bought a new 2GB USB stick, a Sandisk Cruzer Micro. I already had a Pretec Tiny 256MB. I benchmarked these using h2benchw in Windows (and seeker in Linux, but only for reads).

Now, these two are actually pretty close together in performance, h2benchw reports 0.77 ms access time for the 2GB stick, and 1.77 ms for the 256MB one. Raw read speeds are, according to linux hdparm -tT about 25MB/s for the smaller one, and slightly lower for the Sandisk.


But the really odd difference comes when I set h2benchw up to test random access _write_ performance. The Pretec manages ~10 ms access time here, whereas the Sandisk comes in at a whopping 132 ms!! Wooha. There faded my hope somewhat of using flash as a faster substitute for disks. X-bit labs says there are faster write access times though, specifically for Apacer HT202 sticks with a 28 ms access time for writes. That's unfortunately still in line with harddisks, so the only real benefit is the read access times. In other words, flash will only be great when the ratio of reads to writes is heavily biased towards reads. A typical database load might not perform so well.
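How heavily biased towards reads? A weighted average gives a rough answer. The flash numbers below are the Sandisk figures from this post; the 13 ms harddisk access time is an assumption I picked as a typical value, not something I benchmarked, and the model ignores caching and sequential access entirely.

```java
final class FlashVersusDisk {
    /** Average access time for a workload where readFraction of the operations are reads. */
    static double averageAccessMs(double readFraction, double readMs, double writeMs) {
        return readFraction * readMs + (1.0 - readFraction) * writeMs;
    }

    /** Read fraction at which flash and a disk with uniform access time break even. */
    static double breakEvenReadFraction(double flashReadMs, double flashWriteMs, double diskMs) {
        return (flashWriteMs - diskMs) / (flashWriteMs - flashReadMs);
    }

    public static void main(String[] args) {
        double flashReadMs = 0.77;   // Sandisk read access time from the benchmark
        double flashWriteMs = 132.0; // Sandisk write access time from the benchmark
        double diskMs = 13.0;        // assumed harddisk access time (hypothetical)

        System.out.println("50/50 mix on flash: "
                + averageAccessMs(0.5, flashReadMs, flashWriteMs) + " ms");
        System.out.println("break-even read fraction: "
                + breakEvenReadFraction(flashReadMs, flashWriteMs, diskMs));
    }
}
```

Under these assumptions the stick only wins once roughly nine out of ten operations are reads.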

2007-05-22

MIDP 2.0 and killing homebrew..

Why is it that MIDP 2.0 decided to completely lock homebrew developers out of J2ME development? If you read this first you'll notice some people are annoyed at the way root certificates are handled, mainly that there is no certificate that is guaranteed to be available on every device, and the way Java Verified is per-device. This is a big problem.

Another problem that I have been having lately is that of free software and J2ME. Now you may ask why can't you just put up with a few security prompts for your hobby projects? Sure, I am fine with that. As long as they are few. The problem here is of course that they are not few! If you want to, for example, read maps from a memory card you could be reading from hundreds of tiles.

I think the problem here is that the APIs are actually becoming useful, and the security prompts for this sort of application are becoming a big bottleneck, necessitating signing. But there is no way to sign anything for free! Not as far as I know, anyway. You are also not allowed to import your own root certificate.

I think security is a good thing, but locking out power users is also bad. This whole thing is putting me off my long-needed phone upgrade.

2007-04-10

KR tutorial...

I found this interesting and simple intro to KR and I must say it is good. I don't quite agree that description logics is the way of the future though.

I also fully agree that Google most likely is doing A.I. research. If they are not, I think they are totally misjudging how useful it is and how close we are to useful A.I. applications.

This book that was linked also looks very promising, I have to add it to my Amazon wish list for sure.

2007-04-08

Nokia E70 video playback resolution

I have been trying to find information on what the Nokia 5300 and Nokia E70 are capable of playing back in terms of resolution. I would love an E70 if it can play back full-res video, and my girlfriend wants a 5300. This thread seems to imply the 5300 is unable to play back full-res video, and I suspect the E70 has the same problem, given its huge resolution.

2007-04-05

SleekXMPP

Another nice XMPP library that aims to make implementing or testing XEPs easy, something I really long to do.

Some day I am going to start using XMPP "for real": I have been on jabber for a couple of months but I have also noticed that I never really IM anyone anymore now that I work all day long. I am an antisocial creature I guess. Also I only have bots and automated services on jabber, while every human in my roster is on MSN or ICQ.

Canon TX-1

I would much like a Canon TX1 when I go to Venezuela in December. Or why not for the summer vacation in Paris/Öland as well.

I see many people complaining about the MJPEG of Canon. I like it though: it's perfect for editing, with no temporal component that makes re-encoding necessary when you cut the movie. Also the quality is very high. 8GB SDHC cards are not expensive enough to make them unachievable in any way. A problem is where to store all the raw footage when you empty the cards though... this will be a serious filler of HDDs.

2007-04-04

Linkbacks and blogger/blogspot

Why, oh why, doesn't Google support pingback? As far as I have understood, blogger does not support any of the linkback protocols at all.

Thankfully there is, as is common, a hack around this. The problem is that the hack this time around is very ugly: Trackback with some help from Greasemonkey:


  • Manually add links

  • Trackback as opposed to pingback

I need to do... more m$ stuff.

Wow, I found this blog really interesting: Vista Smalltalk. It is, you might have guessed it, Smalltalk for Vista, sort of. It's using .NET and Microsoft's "AJAX" WPF/E to do some very cool stuff in the browser. I happen not to like Smalltalk much though; I did a presentation on it when I was at university and I was not impressed. It lacked anything special enough for it to be considered useful, by me anyway.

The future will be very interesting when it comes to portable applications. Seems Microsoft really has got something going here, and sadly the XUL approach of Mozilla will not be able to compete here, even though it is not really aimed at the same thing. I really need to learn some .NET stuff for real soon, or I fear I will be obsolete in a short time.