Thursday, August 21, 2008

Open source browser wars: Mozilla and WebKit raw-build numbers on Maemo (ARM)

The big picture

Bosses are only pleased when they get reports with numbers and charts on their desks. This time around, we (André, Diego, Fernanda and I) had to benchmark Mozilla and GTK-WebKit on ARM, and so we did ... The text below is an informal summary of the official report.

Scenario
  • Experiments were performed using Nokia N810 Internet tablets (400 MHz CPU and 128 MB of RAM) with Chinook installed.
  • Applications under test were the GTK+ embedding sample browsers from Mozilla (TestGtkEmbed, available from the mozilla-central repository) and WebKit (GtkLauncher, from the WebKit trunk), cross-compiled in Scratchbox 1.0.8 with gcc-2005q3-2.
About the tests

These were the items measured:

1. Page load speed

The page load test focuses on measuring the absolute and relative time needed by each browser to open a given set of locally stored webpages. The test is driven by a JavaScript script that fires the load of each page in the test set. Both the time each web page takes to load (relative time) and the total time taken to load the whole set (absolute time) are measured. The test set is formed by 37 real-world webpages fetched with the httrack crawler and served by an Apache web server over an ad-hoc Wi-Fi network.
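
For illustration, below is a minimal sketch of what such a driver page could look like. This is not the actual harness: the entries of the pages array, the iframe id and the use of document.title to report results are assumptions.

<!-- Minimal sketch of a pageload driver page; illustrative only, not the actual harness. -->
<html>
<head>
<script type="text/javascript">
// Locally hosted test pages, relative to the Apache document root (paths are assumed).
var pages = ["www.example.com/index.html", "www.example.org/index.html"];
var results = [];            // relative (per-page) load times, in ms
var index = 0;
var pageStart, suiteStart;

function loadNext() {
  if (index >= pages.length) {
    var total = new Date().getTime() - suiteStart;   // absolute time for the whole set
    document.title = "total=" + total + "ms per-page=" + results.join(",");
    return;
  }
  pageStart = new Date().getTime();
  document.getElementById("content").src = pages[index];
}

function pageLoaded() {
  results.push(new Date().getTime() - pageStart);    // relative time for this page
  index++;
  loadNext();
}

window.onload = function () {
  document.getElementById("content").onload = pageLoaded;
  suiteStart = new Date().getTime();
  loadNext();
};
</script>
</head>
<body>
<iframe id="content" width="100%" height="90%"></iframe>
</body>
</html>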

2. Memory and CPU consumption during 1)

The goal of this test is to evaluate how well the browsers manage system memory during the page load test (above): the virtual and physical memory allocated by each browser were monitored and plotted in a comparison chart. For this, a Bash script was developed to "watch" system memory and CPU numbers. It is basically a timer that polls, every second, the browser's virtual and physical memory values from /proc and its CPU usage from top (in batch mode).
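
A minimal sketch of such a watcher is shown below. It is not the exact script used: it assumes the browser's PID is passed as an argument, that VmSize/VmRSS from /proc/<pid>/status are the values of interest, and that %CPU is the ninth column of top's output (the exact column may differ on the device).

#!/bin/bash
# Minimal sketch of a memory/CPU watcher; illustrative, not the exact script used.
# Usage: ./watch.sh <browser-pid> > log.csv
PID=$1
echo "seconds,vsize_kb,rss_kb,cpu"
T=0
while [ -d /proc/$PID ]; do
    # VmSize = virtual memory, VmRSS = physical (resident) memory, both in kB
    VSZ=$(awk '/^VmSize/ {print $2}' /proc/$PID/status)
    RSS=$(awk '/^VmRSS/ {print $2}' /proc/$PID/status)
    # one iteration of top in batch mode; the %CPU column index may differ on the device
    CPU=$(top -b -n 1 | awk -v pid=$PID '$1 == pid {print $9}')
    echo "$T,$VSZ,$RSS,$CPU"
    T=$((T + 1))
    sleep 1   # poll every second
done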

3. JavaScript engine performance

Both browsers were run against the Dromaeo and SunSpider JavaScript test suites, and memory consumption during these tests was measured (in the same way as in 2).

4. CSS compliance

Both browsers were run against the Acid3 test suite.

Results and Charts

As there is no general-purpose browser benchmark suite (is this even possible?), we developed our own tools (Bash and JS) for the items above that do not have well-known public benchmarks available (page load and resource consumption).

ps: I tried Mozilla's Talos suite, which seems fine for testing Mozilla but is not portable to other browsers.

1. Page load speed

Mozilla's and WebKit's page load speeds against the given page set (see table below).

Individual pageload speeds (in ms).

DISCLAIMER
  • The original page set was formed by 85 webpages from the Mozilla Talos test suite. Although TestGtkEmbed ran well through the entire original test set, GtkLauncher always got OOM-killed partway through the page load test due to lack of memory (see the memory chart in the Memory Consumption section). So, from the original 85 webpages, 37 were chosen so that GtkLauncher could finish the test.
  • 10 of the 37 remaining webpages in the page set contain non-UTF-8 characters (Russian, Japanese, Chinese, ...). While WebKit misrendered most of these fonts, Mozilla rendered them all fine. Example shown below:
www.3721.com in TestGtkEmbed

www.3721.com in GtkLauncher

2. Memory and CPU consumption during 1)

Memory consumption while doing page load test in 1).

UPDATE: THE VIRTUAL AND PHYSICAL MEMORY LABELS ARE SWAPPED IN THIS CHART.

CPU load while doing page load test in 1).

3. JavaScript engine performance

Dromaeo numbers.
SunSpider numbers.

Browsers memory use while doing Dromaeo.

Browsers memory use while doing SunSpider.

4. CSS compliance

Mozilla on Acid3.

WebKit on Acid3.

Conclusion

Some highlights from the numbers:
  • Mozilla managed memory better during the page load tests, although WebKit was faster. This was probably affected by the fact that Mozilla rendered all non-Western characters well, while WebKit failed to.
  • WebKit was faster and used less memory during both the Dromaeo and SunSpider tests.
UPDATE: Some things that should be pointed out about my Mozilla build:
  • I did not have jemalloc enabled, but would love to.
  • The Mozilla guys are doing an amazing job speeding up their JavaScript engine: TraceMonkey will probably make things much (5x at least?) faster.
ps: I personally would not mind doing a second round of tests and charts with these two items above enabled in my Mozilla build.


--Antonio Gomes
tonikitoo at gmail dot com

21 comments:

blizzard said...

You should definitely re-run these tests with jemalloc on since that will affect both performance and memory size. TestGtkEmbed should build with it and includes the proper link line to have it added in.

Also, what did you use to build the Mozilla engine? Did you change any optimizations on the configure line?

blizzard said...

Also re-reading this post I'm a little surprised that you didn't open with the fact that while WK ran these tests a little bit faster it did so while running out of memory and mis-rendering pages on the web. It would seem that being able to render the web would be the first goal and doing so quickly would be second?

Unknown said...

IMO the pages that were not rendered properly by WebKit should be excluded from the test. A web engine spends a significant amount of time on non-UTF-8 font rendering on pango/gtk platforms. Since WebKit skips rendering these fonts properly, it may get some extra speed.

blizzard said...

It seems like you can't really compare the two engines in this case as it's clear that WebKit isn't really ready to render the web as it exists. Lots of tiny browsers have this problem - they go faster but they don't try to deal with all of the edge cases or fully support everything that's needed to actually render the whole web. And that's part of what makes a full-fledged engine a bit slower and bigger - it actually supports everything.

blizzard said...

Also, can you tell us what prefs you were using for TestGtkEmbed? When I did testing I made sure to set the memory cache to the same size as the hard-coded WebKit value (8MB, iirc) and did things like disabling the disk cache because WebKit doesn't have one of those either.

blizzard said...

Also can you tell us what HTTP stack you were using in WebKit (I think that it supports libsoup and/or libcurl?) We've been doing a lot of optimizations to our pipelining and http code in Fennec and I'm wondering if you set up your prefs to those new values? It would also help knowing what kind of connection you were running under. Was it WiFi or was it Bluetooth to GPRS or something else? We've done a lot of profiling with various types of networks and that data is starting to turn into code changes.

Thanks!

Anonymous said...

@leonid - what exactly is the point of removing pages from the test that webkit fails to render? shouldn't correctly rendering webpages be part of the test?

blizzard said...

We're a little confused about your use of "physical memory" and "virtual memory." Virtual memory should always be higher than physical memory but in your graphs they aren't. Have you tried using the ps_mem.py script? It does a pretty good job of actually telling what your heap + shared memory numbers are through the lifetime of a process.

Unknown said...

@Mark - It depends on what you really want to test. If the goal is to evaluate page loading speed, then all the pages in the test set should be supported by both engines in an identical manner. If one of the engines does not support some particular features of the provided pages, like tags, fonts or embedded content, then it will not spend any time on this unsupported content and the page loading speed will be faster for this engine.

Anonymous said...

@leonid - yeah, I see your point

Anonymous said...

The fonts not being rendered properly in WebKit is obviously either a bug or a misconfiguration, not a limitation of the engine.

Antonio said...

Blizzard_1, I would really love to (and will) re-run the tests w/ jemalloc enabled. Any working patch for bug 451193 ?

fwiw, this is the mozconfig I used to build:
# Options for client.mk.
mk_add_options MOZ_BUILD_PROJECTS="xulrunner prism mobile"
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir-chinook

# Global options
ac_add_options --disable-debug
ac_add_options --enable-optimize
ac_add_options --disable-tests
ac_add_options --disable-logging
ac_add_options --enable-strip
ac_add_options --disable-jemalloc
ac_add_options --disable-xmlextras
ac_add_options --disable-javaxpcom
#ac_cv_visibility_pragma=no
NS_OSSO=1

# XULRunner options
ac_add_app_options xulrunner --enable-application=xulrunner
ac_add_app_options xulrunner --enable-extensions="softkb"
ac_add_app_options xulrunner --disable-installer
# Enabling --with-arm-kuser implies Linux on ARM and enables kernel
# optimizations for that platform
#ac_add_app_options xulrunner --with-arm-kuser

# prism options
ac_add_app_options prism --enable-application=prism
ac_add_app_options prism --with-libxul-sdk=../xulrunner/dist

# mobile options
ac_add_app_options mobile --enable-application=mobile
ac_add_app_options mobile --with-libxul-sdk=../xulrunner/dist

# configure will be automatically generated using the 'autoconf-2.13'
# command. If autoconf-2.13 isn't the right name for your system, as
# is the case on OS X using MacPorts, use the real command name as
# demonstrated below.
mk_add_options AUTOCONF=autoconf2.13

Antonio said...

Blizzard_2: "Also re-reading this post I'm a little surprised that you didn't open with the fact that while WK ran these tests a little bit faster it did so while running out of memory and mis-rendering pages on the web."

We considered that; it can explain WebKit's faster results. I briefly mentioned something about it in the blog post (see the disclaimer and conclusion sections), but should have elaborated more ... Font misrendering is probably *the* problem here, and it makes it harder to say which is best or faster. We did not even want to point one out as the winner, just to put out the numbers we got from raw builds.

Antonio said...

Leoz, Mark, Johan: I will check how to get the fonts working (a locale or encoding issue), so the tests should be fairer. Johan, could you help here?

Antonio said...

Blizzard_4: I will post the prefs used soon...

Antonio said...

Blizzard_5: WebKit used libcurl as its HTTP stack. I would really love to use the Fennec network/pipelining optimizations while re-running the tests. Could you please point them out?
As said in the post, all webpages in the page load test, as well as the Acid3 and JS suite tests, were stored locally on a Dell laptop and accessible to the device via an ad-hoc network.

Antonio said...

Blizzard_6: Hmm, you are right. The labels are swapped; the higher one is the virtual memory. My bad, I will fix it and upload them again. Thanks.

Unknown said...

@ Johan - Perhaps, this is the bug:
https://bugs.webkit.org/show_bug.cgi?id=18546
IMO bug IS a limitation. Or the other way around - there are no limitations, just bugs.
;-)

blizzard said...

Configure line looks fine, except for the jemalloc stuff. I'll look into that. I thought that we were building fine on ARM with that now.

As for prefs, you can see the posts in about:mobile that have information on what to set. Also, have a look at the prefs in fennec that already have the right http pipeline depth. (Note those are being optimized for wireless access instead of desktop, so it's not clear what happens with a super-high bandwidth connection.)

I would also point out that we spend a _tremendous_ amount of our rendering and measured time as a result of supporting pango. So if your current WK builds aren't using pango to render i18n pages, I would love to hear how you plan on getting around that (we have a backend that works pretty well and doesn't use pango at this point). But I would also say that it's not even slightly fair to compare anything from page load time to memory usage, because pango plays such a _huge_ part in those measurements.

Thanks for the followups, Antonio! And good luck with your current project! :)

romaxa said...

Here is a simple script which shows the free memory in the system:
.............................
#!/usr/bin/perl -w
$summ = 0;
open(FILE, "/proc/meminfo");
while (my $line = <FILE>)
{
$summ += $1 if ($line=~/MemFree\:\s+(\d+)\s/);
$summ += $1 if ($line=~/Buffers\:\s+(\d+)\s/);
$summ += $1 if ($line=~/Cached\:\s+(\d+)\s/);
}
close(FILE);
print ("SUMM = $summ\n");
.............................

I was testing with the following steps:
0) swap disabled, preferences for cache the same as in webkit.
1) open target browser about:blank
2) check free memory
3) load target page
4) check free memory
5) close browser
6) memory usage per page = 2) - 4)

This way of measuring will show very interesting results...

especially for pages like hs.fi, iltalehti.fi ...

Vinay H said...

Antonio, the published results are very interesting and informative.
Any plans to use jemalloc with WK?
Good luck..