Sunday, March 31, 2013

Pitfalls of shared structure, or: fixing XPATH compilation

In recent blog posts, Hans Hübner mentions that ABCL is becoming usable by the average Common Lisp programmer. Yet he also ran into the long-standing issue that the XPATH library can't be compiled by ABCL 1.1.1. (This is where his remark that the CXML-STP library doesn't work comes from.)

Last weekend I finally found out what underlying problem XPATH triggered: shared structure in forms being compiled. Even though it's quite easy to generate code with shared structure, apparently few projects use it in a way that triggers incorrect behaviour in ABCL.

The case in point was a compiler macro expansion containing a literal list. After expansion, the literal became part of the code being compiled, and the compiler modified the literal in place. Because every later expansion returned that same, now modified, list, all subsequent expansions ran into problems.

The cause of the issue dates back to before I started working on ABCL. Back then the compiler used to modify the CARs and CDRs of the forms it was compiling. Much of this behaviour had already been replaced with nice side-effect-free functional code before last week, as my intuition told me that in-place modification was undesirable. The reason for the original behaviour isn't documented, but I assume it was to reduce consing and thereby the pressure on the JVM's garbage collector. With today's garbage collectors and much improved JIT compilers in the JVM, that concern no longer applies.
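To make the failure mode concrete, here is a minimal sketch in Python rather than Lisp. It is a toy model, not ABCL's compiler: forms are nested lists, symbols are strings, and the names `make_expander`, `rewrite_in_place`, `rewrite_functional` and the `load-constant` operator are all made up for illustration. The only point is that a pass which mutates its input corrupts a literal shared between expansions, while a pass that builds a new form does not.

```python
# Toy model (assumption): Lisp forms are nested Python lists, symbols are strings.

def make_expander():
    """Return a hypothetical macro expander that embeds one shared literal list."""
    literal = ["quote", ["a", "b", "c"]]  # created once, reused by every expansion

    def expand():
        # Each call builds a new outer form, but the literal inside is shared.
        return ["member", "x", literal]

    return expand


def rewrite_in_place(form):
    """Destructive pass: turns (quote X) into (load-constant X) by mutating form."""
    if not isinstance(form, list) or not form:
        return form
    if form[0] == "quote":
        form[0] = "load-constant"  # overwrites the caller's (shared) list
        return form
    for sub in form[1:]:
        rewrite_in_place(sub)
    return form


def rewrite_functional(form):
    """Same pass, but it builds a fresh form and leaves the input untouched."""
    if not isinstance(form, list) or not form:
        return form
    if form[0] == "quote":
        return ["load-constant"] + form[1:]
    return [form[0]] + [rewrite_functional(sub) for sub in form[1:]]


if __name__ == "__main__":
    source = ["member", "x", ["quote", ["a", "b", "c"]]]

    expand = make_expander()
    rewrite_functional(expand())
    print(expand() == source)  # the next expansion still matches the source form

    rewrite_in_place(expand())
    print(expand() == source)  # the next expansion has been silently altered
```

The functional version allocates a few extra lists per pass, which is the consing cost the old code presumably tried to avoid.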

To cut a long story short: XPATH compilation fixed, compiler changed to functional style and libraries depending on XPATH (like CXML-STP) also fixed.

On to the next cl-test-grid failure...

Friday, March 22, 2013

M$FT Excel format from Common Lisp.

In light of Hans Hübner's recent spike into the "no more Java the language" railhead, I'll certainly share recipes for using the monkeys' libraries to manipulate various historical Microsoft Excel formats. The dirty secret of computing in the modern enterprise, trapped in the black iron prison of non-free document formats, is that current versions of Office don't always handle previous versions very well. Since Apache POI has been around for the better part of a decade, it provides fairly reasonable DWIM fallbacks for reading such formats. Somehow, once whatever you are working with is touchable with ANSI Common Lisp, all seems to be so much better…

Two examples that come to mind in swigging the Apache POI libraries over to a syntax that the Bear can paw through:
  1. Alan Ruttenberg used JSS to work with Apache POI as read-ms-docs, which was released as part of the lsw Semantic Web toolkit wrapping RDF(S) and OWL inference implementations in Java.
  2. Mark Evenson extended Alan's work as part of reading to/from SQL to be able to roundtrip the data within Excel spreadsheets to Oracle databases.  The database reflected the pure data of the Excel document, using the assorted Microsoft "macros" to sanitize input.