Tuesday 12 June 2007

XML position at NAG

NAG is looking for someone to work in the XML Technologies Group (that is, "my" group). If you are interested, please contact the address given in the posting above.

Tuesday 5 June 2007

The Big Switch

There was a thread a while ago on xsl-list discussing when would be a good time to switch from XSLT1 to XSLT2. The consensus seemed to be, "it depends..."

For NAG, the answer is NOW!

After a lot of regression testing, staring at diff files and crossing of fingers, we have just switched the processor for our main stylesheets over from Saxon 6 to Saxon 8.

This is quite a large set of stylesheets (around 26K lines of XSLT), so it is a fairly substantial test of XSLT2's compatibility with XSLT1. The results were pretty good, although that's not exactly surprising: an earlier version of these stylesheets was used to request improvements to backward compatibility mode as defined in an earlier draft, and all the main changes requested in that report were made.

We are taking a multi-stage approach to switching to version 2:

  1. Import an XSLT2 stylesheet that uses xsl:function and standard function calls to define a few functions in the Saxon 6 extension namespace: saxon:node-set as the identity function, saxon:tokenize using XPath2 tokenize, saxon:distinct using XPath2 distinct-values, etc.
  2. Run the stylesheets with Saxon 8 over the NAG Library documentation for Fortran and C, and check that the results are the same as before.
  3. The only significant difference was due less to the change of language than to the change of processor: Saxon has changed its default ordering for xsl:sort. As documented, adding lang="en" to xsl:sort restored the old behaviour.
  4. Start using XSLT2 features in the stylesheets, and start removing the existing calls to Saxon 6 extension functions. (This is where we are now.) This will eventually remove the need for the functions defined in step 1.
  5. (Some time soon.) Change the specified version on the stylesheets from 1.0 to 2.0. This will turn off BC mode and will require another round of regression testing. As noted in the old email above, the stylesheets often pass parameters to named templates that do not define those parameters, which will become an error. These will be easy to find, as the error is fatal, so you just fix each one as it is reported. Harder will be finding all the places that rely on XSLT1's implicit "first node in document order" semantics. Fortunately, in our case we have a large but relatively stable document collection to process, so it's feasible to make this change and then run automated comparisons over the results of processing the entire collection in the two modes.
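
The automated comparison mentioned in steps 2 and 5 could look something like the following. This is only a minimal sketch in Python, not the tooling actually used at NAG: the function name compare_trees and the idea of two output directories (one per processor or mode) are assumptions made for illustration, and it assumes the generated files are UTF-8 text.

```python
import difflib
from pathlib import Path


def compare_trees(old_dir, new_dir):
    """Compare two output trees file by file.

    Returns a dict mapping each relative path that differs (or exists in
    only one tree) to its unified diff, as a list of lines. Hypothetical
    helper for illustration; assumes every file is UTF-8 text.
    """
    old_root, new_root = Path(old_dir), Path(new_dir)
    old_files = {p.relative_to(old_root) for p in old_root.rglob("*") if p.is_file()}
    new_files = {p.relative_to(new_root) for p in new_root.rglob("*") if p.is_file()}

    report = {}
    for rel in sorted(old_files | new_files):
        # A file missing from one tree is treated as empty, so it still
        # shows up in the report as a difference.
        old_lines = (
            (old_root / rel).read_text(encoding="utf-8").splitlines()
            if rel in old_files
            else []
        )
        new_lines = (
            (new_root / rel).read_text(encoding="utf-8").splitlines()
            if rel in new_files
            else []
        )
        diff = list(
            difflib.unified_diff(
                old_lines,
                new_lines,
                fromfile=f"old/{rel.as_posix()}",
                tofile=f"new/{rel.as_posix()}",
                lineterm="",
            )
        )
        if diff:
            report[rel.as_posix()] = diff
    return report
```

Calling compare_trees("out-1.0", "out-2.0") (directory names made up here) after processing the collection once in each mode gives the list of files that still need staring at. An empty result means the two runs agree line for line.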

This is far from being the first use of XSLT2 at NAG, but it's the first time we've switched a stylesheet collection from version 1 to version 2. So far things have gone pretty smoothly...