This report analyses performance measurements from the browsertime test harness across the Firefox TV vehicle powered by different rendering engines.

This report was commissioned to try to replicate similar measurements performed using a different test harness and methodology, the nimbledroid test suite. As data was collected, the focus of the project shifted away from validating the nimbledroid test suite, and in fact the tested site corpus is almost entirely disjoint from the site corpus tested by the nimbledroid test suite. Prior to shifting the corpus, however, the browsertime measurements were consonant with the nimbledroid measurements.

Conclusions

browsertime reports that GeckoView is slower than WebView

When running with Turbo Mode disabled, GeckoView is perhaps 10% slower than WebView on the more powerful Fire TV Stick 4k and Fire TV Pendant, and perhaps 50% slower than WebView on the underpowered Fire TV Cube. GeckoView also appears to be much more variable than WebView, perhaps by as much as 200%.

The performance gap does decrease with Turbo Mode enabled: GeckoView is perhaps level with WebView on the Fire TV Stick 4k and Fire TV Pendant, and perhaps 40% slower than WebView on the Fire TV Cube. However, GeckoView becomes significantly more variable than WebView. This is very surprising, because the set of processed resources should be smaller than with Turbo Mode disabled, and one would expect a smaller set of resources to translate into less variability.

browsertime reports that Turbo Mode improves performance at the expense of variability

With GeckoView, enabling Turbo Mode produces a performance increase, which is larger on the underpowered Fire TV Cube, and a significant increase in noise.

We see the same general behaviour with WebView: a performance increase on the underpowered Fire TV Cube and an increase in noise.

Recommendations for Future Work

  1. The value of the load event, as measured by the loadEventStart timestamp captured by pageLoadTime, is questionable. The next step is to capture similar numbers for the various visual metrics that the Performance Team believes will provide more valuable measurements.

  2. GeckoView generally fires the load event after the system WebView does. This effect is larger on the Fire TV Cube, which we consider to be less powerful than the Fire TV Pendant and the Fire TV Stick 4k. This suggests that Gecko has significant room to improve on low-end devices.

  3. The impact of Turbo Mode is surprising: it is counter-intuitive that reducing the total set of resources to process increases variability. More investigation into how the Content Blocking system is working in Firefox TV and in GeckoView itself is needed. It may also be the case that more valuable visual metrics improve while the Web Navigation Performance API metrics are stable.

  4. Quantifying the differences in the content served to GeckoView and WebView for the corpus under test could let us be more confident that measured differences are truly attributable to the underlying engines.

Per-device, per-site engine comparisons

The following graphs give some insight into how GeckoView and WebView compare on the sites in the test corpus.

Turbo Mode enabled

Turbo Mode disabled

Methodology

Vehicles tested

The data were collected from the following vehicle configurations:

vehicle              engine     turbo mode
Firefox for Fire TV  GeckoView  enabled
Firefox for Fire TV  GeckoView  disabled
Firefox for Fire TV  WebView    enabled
Firefox for Fire TV  WebView    disabled

All vehicle configurations shared a single User Agent string. See this issue.

Sites tested

22 of the 25 sites in the product mobile corpus were tested. The sites not tested were:

site                                                        reason
https://m.facebook.com/Cristiano                            requires a login
https://hubs.mozilla.com/spES8RP/treasured-spirited-huddle  Web Sockets break record and replay proxy
https://www.allrecipes.com                                  fails to render in WebView due to invalid protocol error

Some sites witnessed transient network errors: in these cases the number of recorded measurements is fewer than expected. In any individual run, no site was measured fewer than 4 times on the Fire TV Pendant and Cube, or fewer than 8 times on the Fire TV Stick 4k.

The entire corpus was tested end-to-end twice in succession on the Fire TV Pendant and Cube, and end-to-end once on the Fire TV Stick 4k.

Single site test

For each site, the four vehicle configurations were tested as follows:

  1. An initial recording of the live site was captured. The record and replay proxy was started in recording mode, and browsertime with --iterations 1 launched the vehicle and (cold-)loaded the site under test. The replay proxy was stopped and an archive of the network activity captured.

  2. The record and replay proxy was started in replay mode, backed by the archive of captured network activity. browsertime, with --iterations 5 on the Fire TV Pendant and Cube and --iterations 9 on the Fire TV Stick 4k, launched the vehicle and cold-loaded the site under test the specified number of times. Between each cold-load the vehicle was force-stopped and its on-device package data cleared.

  3. For each cold-load, browsertime reports a wide range of timings, mostly from the Navigation Timing API.
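
The reset between cold-loads described in step 2 can be sketched as follows. This is a minimal illustration rather than the harness used for this report: the package name, the device serial, and the helper names are assumptions. Only `adb shell am force-stop` and `adb shell pm clear` are standard Android tooling.

```python
import subprocess

# Assumed package name for the vehicle; substitute the real one.
PACKAGE = "org.mozilla.tv.firefox"

# Iteration counts from the text: 5 on the Pendant and Cube, 9 on the Stick 4k.
ITERATIONS = {
    "Fire TV Pendant": 5,
    "Fire TV Cube": 5,
    "Fire TV Stick 4k": 9,
}


def reset_commands(serial, package=PACKAGE):
    """Return the adb invocations that force-stop the vehicle and clear its data."""
    adb = ["adb", "-s", serial, "shell"]
    return [
        adb + ["am", "force-stop", package],
        adb + ["pm", "clear", package],
    ]


def reset_vehicle(serial, package=PACKAGE, run=subprocess.run):
    """Run the reset commands in order, raising if adb reports a failure."""
    for command in reset_commands(serial, package):
        run(command, check=True)
```

Passing `run` as a parameter keeps the sketch easy to dry-run without a device attached: a stub can record the commands instead of executing them.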

Results

Processed data

The data from the three test runs above can be found in the following CSV file. The columns of the CSV are as follows:

column        description
device        the target device, one of “Fire TV Stick 4k”, “Fire TV Cube”, or “Fire TV Pendant”
run           the test run number
site          the URL of the page being loaded
engine        the tested engine, either “GeckoView” or “WebView”
turbo         whether Turbo Mode was enabled, either “true” (meaning Turbo Mode was enabled) or “false” (meaning Turbo Mode was disabled)
proxy         the proxy state, either “record” (meaning the pageload was from the live network and the proxy was recording) or “replay” (meaning the pageload was from the replaying proxy and not from the live network)
timestamp     the local timestamp when the pageload was initiated
pageLoadTime  the loadEventStart timestamp reported by the engine under test, as captured by this JavaScript code
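
A file with these columns could be summarised along the following lines. This is a hedged sketch, not the analysis behind this report: it assumes a comma-separated file with a header row, it uses the median and the sample standard deviation as the measures of speed and variability (the report does not state which statistics were used), and the rows in `SAMPLE_CSV` are made up purely for illustration.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up rows for illustration only; these are not measurements from the report.
SAMPLE_CSV = """device,run,site,engine,turbo,proxy,timestamp,pageLoadTime
Fire TV Cube,1,https://example.com,GeckoView,false,replay,0,1500
Fire TV Cube,1,https://example.com,GeckoView,false,replay,1,1700
Fire TV Cube,1,https://example.com,GeckoView,false,replay,2,1600
Fire TV Cube,1,https://example.com,WebView,false,replay,3,1000
Fire TV Cube,1,https://example.com,WebView,false,replay,4,1100
Fire TV Cube,1,https://example.com,WebView,false,replay,5,900
Fire TV Cube,1,https://example.com,WebView,false,record,6,5000
"""


def summarise(csv_file):
    """Map (device, turbo, engine) to (median, stdev) of replayed page loads."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["proxy"] != "replay":  # skip the initial live-network recording
            continue
        key = (row["device"], row["turbo"], row["engine"])
        groups[key].append(float(row["pageLoadTime"]))
    return {
        key: (statistics.median(values), statistics.stdev(values))
        for key, values in groups.items()
        if len(values) >= 2  # stdev needs at least two samples
    }


def relative_slowdown(summary, device, turbo):
    """Return how much slower GeckoView's median is than WebView's, as a fraction."""
    gecko_median, _ = summary[(device, turbo, "GeckoView")]
    webview_median, _ = summary[(device, turbo, "WebView")]
    return gecko_median / webview_median - 1.0


summary = summarise(io.StringIO(SAMPLE_CSV))
print(relative_slowdown(summary, "Fire TV Cube", "false"))
```

Filtering on the proxy column keeps the single live-network recording pageload out of the replayed measurements.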

Inter-vehicle reliability

Test harness

The data were collected using an ad-hoc Python harness driving the browsertime testing suite. browsertime drives the underlying vehicles using WebDriver automation: for WebView this means chromedriver driving the engine via the Chrome DevTools Protocol, and for GeckoView this means geckodriver driving the engine over the Marionette protocol.

The version of browsertime used was lightly modified to support Android-specific WebView engine configuration and to support the GeckoView engine. None of these modifications are believed to impact engine performance.

The version of geckodriver was heavily modified to support the GeckoView engine over the adb TCP/IP protocol. These modifications principally concern launching the target vehicle and connecting to the underlying protocol handler; any impact on engine performance has to do with servicing the underlying protocol and ambient engine configuration (for example, custom profiles in GeckoView).

Network weather

Both mitmproxy and Web Page Replay Go (wpr-go) were used to minimize the impact of network weather. Because older versions of adb do not support reverse port-forwarding over TCP/IP [link], the test host and the target device were always on the same network. Because Web Page Replay Go is not a true HTTP proxy [link] but instead requires transparent port-mapping [link], and because Gecko does not support such port-mapping [link], mitmproxy was used to perform the port-mapping [link to script]. Record and replay were provided by wpr-go, although it is likely that mitmproxy could provide this function.
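
The port-mapping idea can be illustrated with a small mitmproxy script. This is a sketch under stated assumptions, not the script used for this report: the wpr-go address and ports below are placeholders, and the handling of the Host header is one plausible way to let the replay server match requests against its archive.

```python
# Assumed address of the Web Page Replay Go instance; not taken from the report.
WPR_HOST = "127.0.0.1"
WPR_HTTP_PORT = 8080
WPR_HTTPS_PORT = 8081


def request(flow):
    """mitmproxy hook: send every request to wpr-go, keeping the original Host header."""
    # Remember which site was requested before the destination is rewritten.
    original_host = flow.request.host_header or flow.request.host
    flow.request.host = WPR_HOST
    if flow.request.scheme == "https":
        flow.request.port = WPR_HTTPS_PORT
    else:
        flow.request.port = WPR_HTTP_PORT
    # Restore the Host header so the replay server still sees the original site.
    flow.request.host_header = original_host
```

A script of this shape would be loaded with mitmproxy's `-s` option, for example `mitmdump -s port_map.py`, where the file name is hypothetical.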

Using a proxy and a custom CA certificate for both WebView and GeckoView sacrifices real-world characteristics for cross-engine consistency. GeckoView requires a true HTTP proxy for this type of record and replay, and such a proxy requires either a custom CA certificate or for the engine to allow insecure connections. Allowing insecure connections is decidedly not real-world, hence we took the lesser of two evils.

Record and replay differences

The Turbo Mode option should change the network activity captured by the record and replay proxy. However, it is also possible that the two engines witness different network activity, for example on sites that perform User Agent sniffing. This means that each individual site and vehicle configuration should have stable network activity, but between vehicle configurations there could be network activity differences.

Live site differences

Some of the sites serve dynamic content and/or advertisements. This means that between the first and second whole-corpus iteration, the underlying network archives may have changed significantly.

Gecko profile conditioning

It is well known that the Gecko profile significantly impacts the performance of the Gecko engine: preferences, certificate databases, and the network cache itself can have major impacts on measurements.

To minimize volatility, for each GeckoView-based vehicle configuration, i.e., for both turbo enabled and turbo disabled, a Gecko profile was conditioned as follows. First, a profile template with cert9.db and key4.db containing the custom CA certificate used by the record and replay proxy was produced. Second, this template was copied to the target device, and the vehicle was started from a cleared state with this profile. The single page http://example.com was visited and then the vehicle was left idle for 2 minutes. The vehicle was then force-stopped and the conditioned profile retrieved from the device.

This conditioned profile was then copied to the device at the beginning of every test run: that is, every cold pageload started with exactly the same Gecko profile.
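
The first conditioning step, producing a profile template whose cert9.db and key4.db contain the custom CA certificate, could be done with the NSS `certutil` tool. The sketch below only shows the shape of that step: the report does not say which tool was used, and the certificate nickname, paths, and helper names are assumptions.

```python
import subprocess
from pathlib import Path


def template_commands(profile_dir, ca_cert, nickname="replay-proxy-ca"):
    """Return certutil invocations that create cert9.db/key4.db and trust the CA."""
    database = "sql:" + str(profile_dir)
    return [
        # Create an empty SQLite-format NSS database (cert9.db and key4.db).
        ["certutil", "-N", "-d", database, "--empty-password"],
        # Add the CA certificate, trusted as a CA for issuing server certificates.
        ["certutil", "-A", "-d", database, "-n", nickname, "-t", "C,,", "-i", str(ca_cert)],
    ]


def build_template(profile_dir, ca_cert, run=subprocess.run):
    """Create the template directory and run the certutil commands in order."""
    Path(profile_dir).mkdir(parents=True, exist_ok=True)
    for command in template_commands(profile_dir, ca_cert):
        run(command, check=True)
```

The remaining steps (copying the template to the device, visiting http://example.com, idling, and retrieving the profile) depend on device paths that are not given here, so they are left out of the sketch.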

Versions

Fire TV versions

The output of adb shell getprop for each device is available:

device
Fire TV Cube
Fire TV Pendant
Fire TV Stick 4k

Software versions

package         version  link
mitmproxy       4.0.4
wpr-go          XXX      ede50ff4d
browsertime     XXX
chromedriver    2.32
geckodriver     XXX
firefox-tv      xxx
GeckoView       XXX
system WebView  59.XXX