Monday, December 28, 2009

ABCL on Google App Engine (2)

Over the past month further work was done to improve ABCL startup times. This effort is especially directed at supporting ABCL on Google App Engine (GAE).

As mentioned in an earlier blog item, the trivial GAE example application in ABCL's source tree takes 19 seconds to start up. Although this is only a "Google theoretical time" (the page is actually served in 12 seconds), it is clearly a lot. I have heard that many GAE applications have startup times between 5 and 10 seconds. It would surely be nice if our trivial application could get closer to that.

A number of different solutions have been evaluated:
  1. Reducing the number of classes loaded at startup by making better use of ABCL's auto-loader facility
  2. Supporting binary FASLs
  3. Creating a system for finer-grained auto-loading
These particular scenarios were evaluated because I asked for help with this performance issue on IRC (irc://irc.freenode.net/#...); the answer was "you're doing too much during setup or at the first request". My first reaction was that we didn't have any options: that's how ABCL is designed. After some discussion, however, these scenarios came up.

The first and third scenarios resulted from many profiling sessions of ABCL startup time. The conclusion was that 35 to 45% of ABCL startup time is spent in Java reflection: when loading function classes, ABCL needs to look up each class's constructor in order to instantiate an object of that class.
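To make the cost concrete, here is an illustrative sketch (not ABCL's actual code; the class and method names are invented) of the kind of reflective lookup that loading a compiled function class involves:

```java
import java.lang.reflect.Constructor;

public class ReflectiveLoad {
    // Stand-in for a compiled Lisp function class.
    public static class Fun_1 {
        public Object execute() { return 42; }
    }

    // Instantiate a function class by name, the way a FASL loader would.
    public static Object instantiate(String className) {
        try {
            Class<?> c = Class.forName(className);
            // This reflective constructor lookup and invocation is the
            // kind of work where the profiled 35-45% of startup time went.
            Constructor<?> ctor = c.getConstructor();
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Fun_1 f = (Fun_1) instantiate("ReflectiveLoad$Fun_1");
        System.out.println(f.execute()); // prints 42
    }
}
```

Doing this once is cheap; doing it for every function class in every FASL loaded at startup adds up.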

Scenario 1 is about delaying loading of FASLs until a function in them is required. Scenario 3 goes into more detail about the use of a function: even when a FASL is loaded, not all functions in it will be used (immediately or ever). The idea behind scenario 3 is to delay reflection API access until a function is actually used.

With scenario (1), startup times could be reduced somewhat, especially for our minimal servlet application: it uses relatively few Lisp functions, and the ones it does use are related to printing and streams. Those are concentrated in a limited number of FASLs.
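A hypothetical sketch of FASL-granularity autoloading (scenario 1); the names here are invented, not ABCL's actual classes. A symbol whose definition lives in a not-yet-loaded FASL holds an autoload object in its function slot; the first call loads the whole FASL, which installs the real definitions, and then retries the call:

```java
import java.util.HashMap;
import java.util.Map;

public class AutoloadDemo {
    public interface LispObject { Object execute(); }

    // Global function table: symbol name -> function slot.
    public static final Map<String, LispObject> FUNCTIONS = new HashMap<>();

    public static class Autoload implements LispObject {
        private final String symbolName;
        private final Runnable loadFasl; // stands in for LOADing the FASL

        public Autoload(String symbolName, Runnable loadFasl) {
            this.symbolName = symbolName;
            this.loadFasl = loadFasl;
        }

        public Object execute() {
            loadFasl.run();          // loads *all* definitions in the FASL
            LispObject real = FUNCTIONS.get(symbolName);
            return real.execute();   // retry with the real definition
        }
    }

    public static void main(String[] args) {
        // "Loading the FASL" installs the real function, replacing the autoloader.
        Runnable loadPrintFasl = () ->
            FUNCTIONS.put("PRINT-GREETING", () -> "hello from the FASL");

        FUNCTIONS.put("PRINT-GREETING",
                      new Autoload("PRINT-GREETING", loadPrintFasl));
        System.out.println(FUNCTIONS.get("PRINT-GREETING").execute());
    }
}
```

The limitation is the granularity: using any one function in the FASL pays the cost of loading all of them, which is exactly what scenario 3 addresses.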

Implementing scenario (3) required quite a bit more effort. The basic idea, as explained above, is that many functions in a FASL won't be used until a later stage of the application, if at all. In order to delay resolution of a function's bytecode, we introduced an object which, like the auto-loader, acts as a proxy for the unresolved function. This proxy doesn't exhibit the auto-loader's repeated overhead, because it is resolved only once.

Upon the first call to the function, its bytecode gets resolved and the proxy in the function slot gets replaced with the actual function. The first call is then forwarded to the real function, as if it had been called directly. The actual implementation is a bit more complex, to account for the loading of nested functions, but that's basically it.
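The mechanism can be sketched roughly as follows (again with invented names, not ABCL's real classes): the symbol's function slot initially holds a proxy; the first call resolves the real function class via reflection, installs it in the slot, and forwards the call.

```java
public class ProxyDemo {
    public interface LispObject { Object execute(); }

    // The symbol, with its function slot.
    public static class Symbol {
        public LispObject function;
    }

    // Proxy standing in for a not-yet-resolved function.
    public static class FunctionProxy implements LispObject {
        private final Symbol owner;
        private final String className;

        public FunctionProxy(Symbol owner, String className) {
            this.owner = owner;
            this.className = className;
        }

        public Object execute() {
            try {
                // Resolve the bytecode only now, on first call.
                LispObject real =
                    (LispObject) Class.forName(className)
                                      .getConstructor().newInstance();
                owner.function = real;  // replace the proxy in the slot
                return real.execute();  // forward the first call
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
    }

    // Stand-in for a compiled function class.
    public static class Hello implements LispObject {
        public Object execute() { return "hello"; }
    }

    public static void main(String[] args) {
        Symbol s = new Symbol();
        s.function = new FunctionProxy(s, "ProxyDemo$Hello");
        System.out.println(s.function.execute()); // resolves, then calls
        System.out.println(s.function instanceof Hello); // slot now holds the real function
    }
}
```

Subsequent calls go straight to the real function object, so the reflection cost is paid at most once per function actually used, rather than once per function loaded.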

With scenario (3) applied to function definitions only, we were able to reduce the startup time of the first request on GAE from 19 seconds to 11 seconds (roughly 40%). Today, we started to apply the same strategy to macro functions too. The result, measured on my local PC rather than on GAE, is a further saving of roughly 13%. Assuming that this carries over to GAE (as the earlier 40% did), we've realized a total saving of about 50% in startup time!

Binary FASLs, scenario 2, were an attempt to reduce the amount of work done at startup: because the normal FASL loading process is driven by a text file containing Lisp code, that could have been one of the causes. We didn't remove support for them, but they didn't turn out to be a big saver. That can be explained by the fact that a binary FASL is just another ABCL function object, which still needs to be loaded using reflection.

All in all, we saved roughly 50% of startup time. Let this be an invitation to start experimenting with ABCL on GAE.