Friday 8 July 2011

Converting to Strict Content MathML

Some OpenMath History

MathML and OpenMath have always had a shared history and more or less documented ways of converting between them. Conceptually the conversion is very simple: an OpenMath symbol abc can be expressed as the Content MathML symbol <csymbol>abc</csymbol>. Successive versions of MathML have in fact added features that made this conversion simpler and better at preserving information. MathML2 added csymbol, which is a better fit for OpenMath symbols than ci. In MathML1 and MathML2 any further information about the OpenMath symbol had to be packaged into the definitionURL attribute. In MathML3 we added explicit support for recording Content Dictionaries by adding a cd attribute to csymbol.
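As a rough illustration of the two encodings, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The helper name oms_to_csymbol is hypothetical, MathML namespaces are omitted, and the definitionURL form assumes the conventional OpenMath cdbase http://www.openmath.org/cd with a cdbase/cd#name layout; real MathML2 documents may well have used other URIs.

```python
import xml.etree.ElementTree as ET

# Assumption: the default OpenMath cdbase; documents may override it.
DEFAULT_CDBASE = "http://www.openmath.org/cd"


def oms_to_csymbol(oms: ET.Element, mathml3: bool = True) -> ET.Element:
    """Map an OpenMath <OMS cd=".." name=".."/> to a Content MathML csymbol.

    Hypothetical helper; MathML namespaces are ignored for brevity.
    """
    cd, name = oms.get("cd"), oms.get("name")
    if mathml3:
        # MathML3 style: record the Content Dictionary in a cd attribute.
        cs = ET.Element("csymbol", {"cd": cd})
    else:
        # MathML2 style (assumed URI layout): pack cd and name into definitionURL.
        cs = ET.Element("csymbol", {"definitionURL": f"{DEFAULT_CDBASE}/{cd}#{name}"})
    cs.text = name
    return cs


oms = ET.fromstring('<OMS cd="transc1" name="sin"/>')
print(ET.tostring(oms_to_csymbol(oms), encoding="unicode"))
# <csymbol cd="transc1">sin</csymbol>
print(ET.tostring(oms_to_csymbol(oms, mathml3=False), encoding="unicode"))
# <csymbol definitionURL="http://www.openmath.org/cd/transc1#sin">sin</csymbol>
```

The MathML3 form keeps the Content Dictionary as separate, directly readable data, whereas the older form has to be parsed back out of a URI.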

Although the basic idea was simple, the details of the transformation were complicated by a desire to map to the predefined MathML elements where available. So <OMS name="sin" cd="transc1"/> should map to the MathML element <sin/> rather than <csymbol>sin</csymbol>. The relationship between the predefined Content MathML forms and the simpler, more regular, but much more verbose OpenMath syntax was not formally specified by MathML, and so had to be specified as part of the transformation to OpenMath. An early version of such a transformation description is a ten-year-old document still available from the OpenMath site: Conversion between MathML and OpenMath. Around the same time, the conversions were implemented in XSLT. The original versions predated XSLT 1, although the versions currently available from the OpenMath site use XSLT 2: om2cmml converts from OpenMath to MathML, and cmml2om from MathML to OpenMath.
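The special-casing can be sketched as a lookup table consulted before falling back to a generic csymbol. This is a hypothetical Python illustration of the idea, not the logic of om2cmml itself: the table holds only a handful of entries from the arith1 and transc1 Content Dictionaries, and the example1/frobnicate symbol is made up to exercise the fallback.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a special-case table: OpenMath (cd, name) pairs
# that have a predefined Content MathML element.
PREDEFINED = {
    ("transc1", "sin"): "sin",
    ("transc1", "cos"): "cos",
    ("arith1", "plus"): "plus",
    ("arith1", "times"): "times",
}


def om_symbol_to_mathml(oms: ET.Element) -> ET.Element:
    """Prefer a predefined MathML element; otherwise fall back to csymbol."""
    key = (oms.get("cd"), oms.get("name"))
    if key in PREDEFINED:
        # Special case: emit the empty predefined element, e.g. <sin/>.
        return ET.Element(PREDEFINED[key])
    # General case: a MathML3 csymbol that records the Content Dictionary.
    cs = ET.Element("csymbol", {"cd": key[0]})
    cs.text = key[1]
    return cs


# The second symbol is made up, so it has no predefined element.
for src in ('<OMS cd="transc1" name="sin"/>', '<OMS cd="example1" name="frobnicate"/>'):
    print(ET.tostring(om_symbol_to_mathml(ET.fromstring(src)), encoding="unicode"))
```

Every entry in such a table is an extra rule that has to be specified and maintained, which is why the full mapping became complicated.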

MathML3 Strict Content MathML

The relationship between <sin/> and <csymbol>sin</csymbol> may rightly be seen as a purely MathML issue, not something that should be a by-product of converting to the OpenMath form <OMS name="sin"/>. So MathML3 introduced, for each of its non-strict Content MathML constructs, an explicit rewrite rule expressing the construct in Strict Content MathML, a restricted form that uses csymbol in place of the predefined elements. Section 4.6 of the MathML 3 spec, The Strict Content MathML Transformation, specifies an explicit multi-pass algorithm that applies these rewrite rules to convert any valid Content MathML expression into one using only the restricted Strict Content MathML vocabulary.
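To make the idea of a rewrite rule concrete, the following Python sketch applies one simple class of rule: an empty predefined operator element becomes a csymbol naming its Content Dictionary. It is an assumption-laden illustration, not Section 4.6 itself: the table is a small hypothetical excerpt, namespaces are ignored, and the spec's rules for qualifiers, bound variables, containers and so on are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt: predefined operator elements and the Content
# Dictionary of the csymbol each becomes in Strict Content MathML.
STRICT_CD = {
    "plus": "arith1",
    "times": "arith1",
    "sin": "transc1",
    "cos": "transc1",
}


def rewrite_operators(elem: ET.Element) -> None:
    """One pass over the tree, in place: <sin/> -> <csymbol cd="transc1">sin</csymbol>."""
    for child in elem:
        if child.tag in STRICT_CD and len(child) == 0:
            name = child.tag
            child.tag = "csymbol"
            child.attrib.clear()
            child.set("cd", STRICT_CD[name])
            child.text = name
        else:
            # Not a rewritable operator: look for operators further down.
            rewrite_operators(child)


expr = ET.fromstring("<apply><plus/><apply><sin/><ci>x</ci></apply><cn>1</cn></apply>")
rewrite_operators(expr)
print(ET.tostring(expr, encoding="unicode"))
# <apply><csymbol cd="arith1">plus</csymbol><apply><csymbol cd="transc1">sin</csymbol><ci>x</ci></apply><cn>1</cn></apply>
```

Because the rule mentions only MathML elements, it can be stated and checked without any reference to OpenMath.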

Conceptually, a conversion to Strict Content MathML could be made by first converting to OpenMath, then converting back to MathML using a stylesheet without any of the special-case rules originally added to om2cmml and documented in the OpenMath report referenced above. This was in fact implemented, and is how the majority of the Strict Content MathML examples in Chapter 4 of the MathML spec were constructed. However, there were some choices to be made in the mapping specification, and in some cases the old OpenMath stylesheets made different choices. Where feasible I updated the stylesheets to match, but one essential structural difference remained. cmml2om implements a typical XSLT depth-first walk over the input document, applying whichever template matches at each point. The working group, however, felt that the exposition of the algorithm was clearer if it was expressed as a multi-pass algorithm, where each rule is applied in order over the whole tree and the result is passed on to the next stage to be rewritten by the next rule. These two approaches usually produce the same result, but in edge cases where the order of transformations matters they produce results that are (barring bugs) mathematically equivalent but structurally different.
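The following toy Python sketch shows how the two strategies can diverge. Both rules are made up for illustration and are not taken from the MathML 3 spec, and expressions are plain tuples rather than XML. R1 folds a negated integer literal; R2 rewrites x - (-y) as x + y. multi_pass runs R1 over the whole tree and then R2; single_walk visits each node once, top down, trying the rules in the same order, so a rule at a parent sees the children before they have been rewritten (loosely like a template-driven walk).

```python
from typing import Callable, Optional, Union

# Expressions: an int, a variable name, or a tuple (operator, *arguments).
Expr = Union[int, str, tuple]
Rule = Callable[[Expr], Optional[Expr]]


def r1_fold_neg_literal(e: Expr) -> Optional[Expr]:
    # R1 (made up): ("neg", n) with an integer n  ->  the literal -n
    if isinstance(e, tuple) and len(e) == 2 and e[0] == "neg" and isinstance(e[1], int):
        return -e[1]
    return None


def r2_minus_neg(e: Expr) -> Optional[Expr]:
    # R2 (made up): ("minus", a, ("neg", b))  ->  ("plus", a, b)
    if (isinstance(e, tuple) and len(e) == 3 and e[0] == "minus"
            and isinstance(e[2], tuple) and len(e[2]) == 2 and e[2][0] == "neg"):
        return ("plus", e[1], e[2][1])
    return None


RULES = [r1_fold_neg_literal, r2_minus_neg]


def apply_rule_everywhere(rule: Rule, e: Expr) -> Expr:
    # One pass of a single rule over the whole tree, top down.
    new = rule(e)
    if new is not None:
        e = new
    if isinstance(e, tuple):
        return (e[0],) + tuple(apply_rule_everywhere(rule, a) for a in e[1:])
    return e


def multi_pass(e: Expr) -> Expr:
    # Each rule runs over the whole tree before the next rule starts.
    for rule in RULES:
        e = apply_rule_everywhere(rule, e)
    return e


def single_walk(e: Expr) -> Expr:
    # One top-down walk: at each node apply the first matching rule,
    # then continue into the children of the result.
    for rule in RULES:
        new = rule(e)
        if new is not None:
            e = new
            break
    if isinstance(e, tuple):
        return (e[0],) + tuple(single_walk(a) for a in e[1:])
    return e


def evaluate(e: Expr, env: dict) -> int:
    # Tiny evaluator, used only to check that results are equivalent.
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return env[e]
    op, *args = e
    vals = [evaluate(a, env) for a in args]
    if op == "plus":
        return vals[0] + vals[1]
    if op == "minus":
        return vals[0] - vals[1]
    if op == "neg":
        return -vals[0]
    raise ValueError(f"unknown operator {op!r}")


expr = ("minus", "x", ("neg", 2))  # x - (-2)
a, b = multi_pass(expr), single_walk(expr)
print(a)  # ('minus', 'x', -2)
print(b)  # ('plus', 'x', 2)
print(evaluate(a, {"x": 5}) == evaluate(b, {"x": 5}))  # True
```

On x - (-2), the multi-pass version folds the literal first, so R2 never matches and the result stays a subtraction; the single walk fires R2 at the root before it ever reaches the literal. Both trees evaluate to the same value, but they are structurally different.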

As a requirement for MathML3 to proceed to W3C Recommendation status, we needed to show that the algorithm in Section 4.6 was implementable and did the right thing. It was clear that, while my conversion via OpenMath was reasonable, it did not implement the algorithm as specified and did not produce the specified results in all cases.

C2S Implementation

Fortunately Robert Miner (co-chair of the Math WG) stepped up and offered to implement the algorithm as specified. It was good that he did: inevitably, implementation experience exposed some gaps and inconsistencies in the first drafts of the algorithm, and the final published form is much improved as a result.

The new stylesheet was initially kept in the W3C member area of the W3C CVS repository. Recently Robert suggested that we make it public, and asked if I'd host it at my Google Code web-xslt project site, since that project already hosts other MathML-related XSLT stylesheets.

So Robert's implementation is now available (under a W3C or MIT licen[cs]e) from Google Code: c2s.

Comments on the stylesheet are probably best addressed to the www-math mailing list, but comments may also be dropped here on this blog or on the google code wiki pages.
