2008-06-10

HTMLUnit, MultiThreadedHttpConnectionManager and memory leaks

I have been having this wonderful time at my work. A small coding project involving use of mostly HtmlUnit was almost done, and working properly. But what happens? By chance I notice that it is leaking memory: Perm Gen space, even.

This was coded as a plugin for a larger product, and was dynamically reloaded at every invocation. I had to first remake the plugin CassLoader in this case to become a post-delegating classloader so that I could override the version of HttpClient already in the product. I couldn't really be sure it was not my changes leading to the classloader that gave rise to the leak, but eventually I got to that conclusion. The next step was to narrow down where this happened. Long story short, I found that if I changed HtmlUnit to not use the MultiThreadedHttpConnectionManager from HttpClient, it did not leak. I did not want to really do this though, being unsure of how HtmlUnit actually used this, and also because of the fact that we have multiple threads using HtmlUnit.

The thing that solved the issue was to call shutdownAll in the connection manager. I am not allowed to access that from my code as a user of HtmlUnit though, and I did want to avoid having to recompile anything, so I used reflection to subvert the access checks. Calling shutdown on the one manager did not work, however, nor did closing the connection, which HtmlUnit already did by the way.

I can only assume this is some obscure bug that nobody else ever trips, but now at least if somebody does, they might find this as a reference.

I could not use the latest HtmlUnit because of needing JDK1.4 compatibility, so this was done in HtmlUnit 1.13. Oh, and 1.14 needed CSS stuff that clashed with regular DOM libraries, making classloading not work. Not sure why this does not work when I can safely override HttpClient with a newer version.

No comments: