The trivial GAE example application in ABCL's source tree takes 19 seconds to start up, as mentioned in an earlier blog item. Although this is only a "Google theoretical time", because the page is actually served in only 12 seconds, this is clearly a lot. I've heard that many GAE applications have startup times between 5 and 10 seconds. It would surely be nice if our trivial application could get closer to that.
A number of different solutions have been evaluated:
- Reduction of the number of classes loaded at startup; making better use of ABCL's auto-loader facility
- Supporting binary fasls
- Creating a system for finer-grained auto-loading support
The first and third scenarios are the result of many sessions profiling ABCL startup time. The conclusion was that 35 to 45% of ABCL startup time is spent in Java reflection: when loading function classes, ABCL needs to look up the class constructors to instantiate an object of the given class.
Scenario 1 is about delaying loading of FASLs until a function in them is required. Scenario 3 goes into more detail about the use of a function: even when a FASL is loaded, not all functions in it will be used (immediately or ever). The idea behind scenario 3 is to delay reflection API access until a function is actually used.
Using scenario (1), startup times could be reduced somewhat, especially in the case of our minimal servlet application: it uses relatively few Lisp functions and the ones it does use are related to printing and streams. Those are concentrated in a limited number of fasls.
Implementing scenario (3) required quite a bit more effort. The basic idea - as explained above - is that many functions in a FASL won't be used until a later stage in the application. In order to be able to delay resolution of the bytecode of the function, we introduced an object which - like the auto-loader - acts as a proxy for the unresolved function. The proxy doesn't incur the same overhead, because the proxy class itself needs to be resolved only once.
Upon the first call to the function, its bytecode gets resolved and the proxy in the function slot gets replaced with the actual function. After that, the first call is forwarded to the real function, as if it had been called directly. Although the actual implementation is a bit more complex to account for the loading of nested functions, that's basically it.
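The proxy idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not ABCL's actual implementation: the names `LazyFunctionDemo`, `LispFunction`, `Symbol`, `Greet` and `LazyFunctionProxy` are hypothetical stand-ins, and the sketch ignores nested functions and thread safety.

```java
import java.lang.reflect.Constructor;

public class LazyFunctionDemo {
    /** Hypothetical stand-in for a Lisp function object. */
    interface LispFunction {
        Object call(Object arg) throws Exception;
    }

    /** Hypothetical stand-in for a symbol with a function slot. */
    static final class Symbol {
        final String name;
        LispFunction function;

        Symbol(String name) {
            this.name = name;
        }
    }

    /** Plays the role of a compiled function class; counts instantiations. */
    public static final class Greet implements LispFunction {
        static int instances = 0;

        public Greet() {
            instances++;
        }

        @Override
        public Object call(Object arg) {
            return "Hello, " + arg;
        }
    }

    /** Proxy that defers the reflective constructor lookup until the first call. */
    static final class LazyFunctionProxy implements LispFunction {
        private final Symbol symbol;
        private final String className;

        LazyFunctionProxy(Symbol symbol, String className) {
            this.symbol = symbol;
            this.className = className;
        }

        @Override
        public Object call(Object arg) throws Exception {
            // The expensive part: class resolution and constructor lookup via reflection.
            Class<?> cls = Class.forName(className);
            Constructor<?> ctor = cls.getDeclaredConstructor();
            LispFunction real = (LispFunction) ctor.newInstance();
            // Replace the proxy in the function slot with the real function...
            symbol.function = real;
            // ...and forward this first call as if it had been made directly.
            return real.call(arg);
        }
    }

    public static void main(String[] args) throws Exception {
        Symbol greet = new Symbol("GREET");
        // "Loading" the function only installs the cheap proxy.
        greet.function = new LazyFunctionProxy(greet, Greet.class.getName());
        System.out.println("instances before first call: " + Greet.instances);
        System.out.println(greet.function.call("GAE"));
        System.out.println("slot now holds real function: " + (greet.function instanceof Greet));
    }
}
```

In this sketch the reflective lookup happens only for functions that are actually called; a function that is loaded but never used costs no more than one proxy object.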
With scenario (3) applied to function definitions only, we were able to reduce startup time of the first request on GAE from 19 seconds to 11 seconds (roughly 40%). Today, we started to apply the same strategy to macro functions too. The result is - measured on my local PC, not GAE - a further saving of roughly 13%. Assuming that the same applies to GAE (as it did with the other 40%), we've realized a saving of roughly 50% of startup time!
Binary fasls - scenario 2 - were an attempt to reduce the amount of work that needed to be done at startup: because the normal fasl loading process is driven by a text file containing Lisp code, that could have been one of the causes of the slow startup. We didn't remove support for them, but they didn't turn out to be a big saver; that can be explained by the fact that a binary fasl is just another ABCL function object, which itself needs to be loaded using reflection.
All in all, we saved roughly 50% of startup time. Let this be an invitation to start experimenting with ABCL on GAE.
Nice to see constant progress of ABCL. Do you have any plans to improve performance of the slime fuzzy completion in the near future?
Hi Anton,
We don't have plans to improve fuzzy completion performance as a goal itself. However, if you can help analyse where the main performance bottlenecks are - presumably in our CLOS implementation - we're happy to help find out what we can do about it.
For the short term I'm more focussed on CLOS completion (DEFINE-METHOD-COMBINATION) and correctness (Gray streams, argument checking, etc.) than on performance. We appreciate that others might have different needs and want to be supportive of helping to achieve them.
Hello Erik,
The bottleneck I was able to find was not difficult to fix myself [1] and fixing it improved the fuzzy-completion speed substantially, but something else still remains that I wasn't able to track down. So the problem is exactly in identifying it... I used the ABCL profiler, but the cause of the remaining slowness didn't become apparent.
I care about fuzzy completion because, in the cases I have in mind - using embedded ABCL as an interactive shell to explore/customize existing Java systems - its speed has a great influence on the user experience: it's used for almost every identifier I type, while I don't use advanced CLOS features.
So this post is to draw your attention to fuzzy completion.
Unfortunately I can't work on this in the near future either - I'm overloaded with other tasks.
And I also appreciate that you may have a different and maybe better-grounded perspective and priorities in the work on ABCL. Anyway, I believe the fuzzy-completion speed will be fixed sooner or later.
Best regards,
- Anton
1 - http://common-lisp.net/pipermail/armedbear-devel/2009-July/000062.html
Hi Anton,
I've filed a ticket so as not to lose track of this issue: http://trac.common-lisp.net/armedbear/ticket/76
I also quickly ran a profile on the call you described [note that the profiler is in better shape now than it was when you tried to use it]. I found an easy optimization regarding FORMAT, however, after I applied it, it seems that 72% of the time is spent in FUZZY-CONVERT-MATCHING-FOR-EMACS. Maybe that gives someone a place to start looking.
Can we have any followup discussion on armedbear-devel at common dash lisp dot net, please? I find these comment sections hard to use, when compared with e.g. GMail.
Yes, sure. And thanks.