Less WebKit is More WebKit
… at least that is true in the embedded space, where your CPU footprint matters just as much as your memory footprint and your storage (“on-disk”) footprint.
As a follow-up to the Crank WebKit performance “shark week”, we decided that, since runtime performance was starting to look pretty good, we should turn our sights to the memory footprint of the WebKit port on our systems.
Because this was a brand new browser engine (WebKit) with a minimal custom front end (Crank Launcher), ported to a brand new graphics framework from QNX (AG/GF), there was really no “apples to apples” baseline to compare against (though Crank does have many internal versions of Firefox). This isn’t as bad as it seems … in the world of software, attempting to “match” what previous software generations did rarely helps to advance the state of the technology. Our goal was to take a fixed block of time and make the biggest impact that we could.
We took baseline measurements from many different sites, but here I’ll use plain old www.google.com as the reference. When we loaded that site, after all of the shared objects were loaded and the page had finished rendering, we often found memory usage close to 30M! That was a pretty outrageous number.
There are lots of different tools for memory analysis, but most of them don’t scale well or don’t provide accurate enough results when used on an application as heterogeneous as WebKit: it uses both C pointers and C++ smart pointers, it mixes custom allocators with the system allocator (malloc), and the way memory is allocated and used is completely different in each of the shared libraries it depends on (e.g. libc, libxml, freetype, icu, libjpeg, …).
Since we were coming off the week of CPU performance analysis, we had lots of trace files hanging around, so we decided to take a crank at them with the System Profiler. In addition to the new Path Manager event in 6.4.0, there are two system events for tracking mmap activity … one for named mappings and one for unnamed ones.
We set up some traces, using the same technique as with the Path Manager events, and used that information to get some insight into which major allocations (those routed to mmap()) were occurring and, correspondingly, which shared libraries, files and other objects WebKit was mmap’ing.
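The trace events themselves aren’t reproduced here, but the bookkeeping step is easy to picture. The sketch below is a minimal C illustration, assuming each mmap event has already been parsed out of the trace into a hypothetical `struct mmap_event` (object name, or NULL for an anonymous mapping, plus length in bytes). The structs and the `tally_mmaps()` / `largest_total()` helpers are illustrative names, not the System Profiler’s actual event format or API. It sums mapped bytes per object so the biggest consumers stand out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one mmap() event already parsed out of a trace.
 * This is NOT the real System Profiler event layout. */
struct mmap_event {
    const char *name;   /* mapped object, or NULL for an anonymous mapping */
    size_t      length; /* bytes requested from mmap() */
};

/* Running total of mapped bytes for one object name. */
struct mmap_total {
    const char *name;
    size_t      bytes;
};

/* Sum mapped bytes per object name. Anonymous mappings are grouped under
 * "(anonymous)". Returns the number of distinct names written to totals
 * (never more than max_totals; events for extra names are dropped). */
size_t tally_mmaps(const struct mmap_event *events, size_t n_events,
                   struct mmap_total *totals, size_t max_totals)
{
    size_t n_totals = 0;

    for (size_t i = 0; i < n_events; i++) {
        const char *name = events[i].name ? events[i].name : "(anonymous)";
        size_t j;

        /* Linear search for an existing entry with this name. */
        for (j = 0; j < n_totals; j++) {
            if (strcmp(totals[j].name, name) == 0)
                break;
        }
        if (j == n_totals) {
            if (n_totals == max_totals)
                continue; /* table full: drop this event */
            totals[n_totals].name  = name;
            totals[n_totals].bytes = 0;
            n_totals++;
        }
        totals[j].bytes += events[i].length;
    }
    return n_totals;
}

/* Index of the entry with the most mapped bytes (0 if n_totals is 0). */
size_t largest_total(const struct mmap_total *totals, size_t n_totals)
{
    size_t best = 0;

    for (size_t i = 1; i < n_totals; i++) {
        if (totals[i].bytes > totals[best].bytes)
            best = i;
    }
    return best;
}
```

Fed a 6M font file plus a couple of small anonymous mappings, this yields one total per object, and `largest_total()` points straight at the font. The linear search keeps the sketch short; for long traces a hash table would be the obvious replacement.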
This tuning once again pointed us to the font handling and to some less-than-optimal code in the way WebKit uses the FreeType and FontConfig APIs. After last week’s CPU tuning experience with the font configuration, I’m rapidly coming to the conclusion that fonts and internationalization are nothing but a work generator!
Just for confirmation, we dropped the entire application into the debugger, put a breakpoint on mmap() and stepped through the major WebKit operations, all while double-checking with pidin mem to see the effect on system memory.
To make a short story long, we concluded that there is no reason to map in Asian fonts, especially at 6M a pop, if no characters on screen require them! Not a revelation that would knock most people over, but its effect on this software was. With a little tuning around the fonts and a few other memory areas, our memory footprint rocketed down to a very respectable (and stable) 4M when we target www.google.com.
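The actual font changes in the port aren’t shown here. As a rough illustration of the idea only, the C sketch below defers the expensive CJK font load until the text to be drawn actually contains a CJK code point. It assumes text arrives as UTF-32 code points; `select_font()`, `struct font_cache` and the Unicode ranges checked are hypothetical choices, and the “load” is a placeholder counter rather than real FreeType or FontConfig calls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum font_choice { FONT_DEFAULT, FONT_CJK };

/* Hypothetical per-process font state. */
struct font_cache {
    bool     cjk_loaded;
    unsigned cjk_load_count; /* how many times the expensive load ran */
};

/* True if the code point falls in a few common CJK blocks.
 * Illustrative only, not an exhaustive list. */
static bool is_cjk(uint32_t cp)
{
    return (cp >= 0x3040 && cp <= 0x30FF)   /* Hiragana, Katakana */
        || (cp >= 0x3400 && cp <= 0x4DBF)   /* CJK Unified Ideographs Ext. A */
        || (cp >= 0x4E00 && cp <= 0x9FFF)   /* CJK Unified Ideographs */
        || (cp >= 0xAC00 && cp <= 0xD7AF);  /* Hangul Syllables */
}

/* Scan the text once; stop at the first character that needs a CJK font. */
bool text_needs_cjk_font(const uint32_t *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (is_cjk(text[i]))
            return true;
    }
    return false;
}

/* Load the CJK font at most once. */
static void ensure_cjk_font(struct font_cache *cache)
{
    if (!cache->cjk_loaded) {
        /* Placeholder: a real port would open/map the CJK font file here.
         * Only the bookkeeping is modelled in this sketch. */
        cache->cjk_loaded = true;
        cache->cjk_load_count++;
    }
}

/* Pick a font for a run of text, touching the CJK font only when needed. */
enum font_choice select_font(struct font_cache *cache,
                             const uint32_t *text, size_t len)
{
    if (!text_needs_cjk_font(text, len))
        return FONT_DEFAULT; /* the CJK font is never loaded for this text */
    ensure_cjk_font(cache);
    return FONT_CJK;
}
```

For a page that renders only Latin text, `select_font()` returns `FONT_DEFAULT` and the CJK font is never loaded; the first CJK character triggers a single load, and later calls reuse it. A real implementation would more likely query FontConfig for character coverage than hard-code ranges.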
… and of course, to speak to the title: when you drop from 32M to 4M (the less), you pick up a few extra CPU cycles (the more) in general operation. We’ve got a few more tricks up our sleeve, but we’re getting very close to our GF release point!