Sunday, 7 November 2010

Unicode 6: XML Entities draft

Unicode 6.0 was published last month, and the proposals for Unicode 6.1 are firming up. Both of these releases have significant new characters for mathematical use, so I have updated the Editors' draft of “XML Entity Definitions for Characters”.

The main source file, unicode.xml, has been updated to contain information for all characters in Unicode 6.0, and the provisional allocations for the Arabic Math Alphabets in Unicode 6.1. There is no change to the set of entity names or to the MathML or HTML DTDs derived from these sources.
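For readers who have not met these entity sets before, the following minimal Python sketch shows what a single entity definition does once a parser sees it. The inline DTD subset is a hand-written stand-in declaring just one entity (`alpha`, assumed here to map to U+03B1); it is not the generated DTD, and the function name `expand` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the generated entity sets: one declaration
# mapping the name "alpha" to U+03B1 (an assumption for illustration).
DOC = """<!DOCTYPE math [
  <!ENTITY alpha "&#x3B1;">
]>
<math><mi>&alpha;</mi></math>"""


def expand(doc):
    """Parse the document and return the text of its first <mi> element."""
    root = ET.fromstring(doc)
    return root.find("mi").text


if __name__ == "__main__":
    # The parser replaces &alpha; with the character the entity names.
    print(expand(DOC) == "\u03b1")
```

The point of the sketch is only that the name is resolved by the DTD, not by the document itself, which is why the set of names can stay fixed while the underlying Unicode data is updated.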

Although this document is styled as an editors' draft for an update to the current recommendation, there are no immediate plans to publish a formal update to the W3C recommendation. However, I hope to track changes to Unicode in this editors' draft, and perhaps, once the proposals to add Arabic mathematical characters to Unicode are all processed, we may try to submit this for formal review as a Proposed Edited Recommendation.

Unicode 6.0

Most of the new characters in Unicode 6.0 are not directly related to Mathematics, although the large collection of “emoji” derived from characters used in the Japanese mobile phone industry provides some interesting characters that I'm sure could be used for mathematical operators (U+1F4A9 perhaps?). However, there are some specifically mathematical characters, including new heavy (ultra bold) plus and minus (U+2795 and U+2796), which may find use either in display contexts or as additional operators distinct from the usual plus and minus.
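The code points mentioned above can be looked up by name with a short Python sketch. It assumes the Unicode database bundled with the Python build covers Unicode 6.0 or later; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(cp):
    """Return 'U+XXXX NAME' for a code point, using Python's Unicode database."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


if __name__ == "__main__":
    # Heavy plus, heavy minus, and the emoji suggested in the text.
    for cp in (0x2795, 0x2796, 0x1F4A9):
        print(describe(cp))
```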

Unicode 6.1 (proposals)

The mathematical alphabets (bold, fraktur, double-struck, etc.) that are in Unicode, and available as values in MathML 2's mathvariant attribute, fit well with the mathematical traditions using the Roman and Greek alphabets but don't really work with other alphabets, notably Arabic.

Azzeddine Lazrek proposed that MathML and Unicode be extended with additional math alphabets (initial, tailed, looped, stretched) corresponding to conventions used in Arabic typeset mathematics. These were added to MathML in the recently finalised MathML 3.0, and the corresponding code points have been allocated in Unicode (all in the block 1EE??) and are planned to be standardised in Unicode 6.1. I have provisionally added this data to unicode.xml, and added a table showing the characters to the entities draft.
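As a rough illustration of how these alphabets show up in the character data, the Python sketch below walks the range U+1EE00 to U+1EEFF and groups the named characters by style word. It relies on the Unicode database shipped with Python (6.1 or later), so it reflects the block as eventually published, which may differ from the provisional allocations described here. The function name and the "OTHER" bucket are hypothetical, and the "DOUBLE-STRUCK" style is included on the assumption that the published block also contains it.

```python
import unicodedata
from collections import Counter

# Style words expected as the third word of the character name.
STYLES = ("INITIAL", "TAILED", "STRETCHED", "LOOPED", "DOUBLE-STRUCK")


def arabic_math_styles():
    """Count named characters in U+1EE00..U+1EEFF, grouped by style word."""
    counts = Counter()
    for cp in range(0x1EE00, 0x1EF00):
        name = unicodedata.name(chr(cp), None)
        if name is None:
            continue  # the block has unassigned gaps
        words = name.split()
        style = words[2] if len(words) > 2 and words[2] in STYLES else "OTHER"
        counts[style] += 1
    return counts


if __name__ == "__main__":
    for style, n in sorted(arabic_math_styles().items()):
        print(f"{style:14} {n}")
```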

Thursday, 21 October 2010

MathML 3.0 Recommendation

I'm pleased to report that MathML 3.0 is today published as a W3C Recommendation. That is seven years to the day since the last MathML Recommendation, MathML 2.0 2nd edition, was published on 21st October 2003.

So, what have we been doing for seven years?

My Working Group colleague, Neil Soiffer, has just posted a blog entry describing some of the main features and linking to a nice summary of what is new in MathML3, so I won't list them all in detail here. However, some of the headline features are listed below.

Additions to control bi-directional layout, for Arabic styles in particular

The Arabic Note detailed some extensions to MathML 2.0 that would enable a richer variety of right-to-left layouts as used in typesetting Arabic mathematics. In MathML 3.0 this bi-directional control is fully integrated into the language, allowing the effect of each of the presentation layout elements to be specified in RTL and LTR modes.

Elementary math layouts (long division, etc)

Previously, long division, long multiplication, etc. could be typeset using table layout and a lot of spacing adjustment, but this was difficult to produce and very hard to process in an accessible way by, for example, a speech renderer. MathML 3 introduces new elements (mstack and mlongdiv, principally) which enable a much more natural and accessible encoding of these layout schemes.
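To give a feel for the new encoding, here is a small Python sketch that builds an mstack for a column addition. It uses only the mstack, msrow and msline elements together with the usual mn and mo tokens; the function name `stacked_addition` and the choice of example are hypothetical, and no alignment or carry markup is attempted.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def q(local_name):
    """Qualify a local element name with the MathML namespace."""
    return f"{{{MATHML_NS}}}{local_name}"


def stacked_addition(top, bottom):
    """Return MathML for 'top + bottom' laid out as a column addition."""
    ET.register_namespace("", MATHML_NS)  # serialise with a default namespace
    math = ET.Element(q("math"))
    mstack = ET.SubElement(math, q("mstack"))
    ET.SubElement(mstack, q("mn")).text = str(top)
    row = ET.SubElement(mstack, q("msrow"))  # operator and second operand
    ET.SubElement(row, q("mo")).text = "+"
    ET.SubElement(row, q("mn")).text = str(bottom)
    ET.SubElement(mstack, q("msline"))  # the rule above the result
    ET.SubElement(mstack, q("mn")).text = str(top + bottom)
    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(stacked_addition(424, 33))
```

Each row of the sum is a child of mstack, so a speech renderer can read the operands and the result directly instead of reverse-engineering a table.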

Linebreaking of mathematics

For the first time MathML 3 specifies control over automatic linebreaking and provides improved features for manual forced linebreaking and alignment of expressions.

Officially registered MIME Types

The MIME type application/mathml+xml has been unofficially used for some time, but it is now officially registered (along with two more MIME types specific to Presentation and Content MathML). These MIME types might prove particularly useful in systems that use MIME to label clipboard fragments.
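A system that labels fragments might choose between the three types along the lines of the Python sketch below. The two specific type names are given as they are believed to appear in the MathML 3 registration; the element-name samples are deliberately tiny, and the heuristic and the function name `guess_media_type` are hypothetical rather than anything the specification prescribes.

```python
import xml.etree.ElementTree as ET

GENERIC = "application/mathml+xml"
PRESENTATION = "application/mathml-presentation+xml"
CONTENT = "application/mathml-content+xml"

# Small, deliberately incomplete samples of element names (illustrative only).
PRESENTATION_NAMES = {"mi", "mn", "mo", "mrow", "mfrac", "msup", "msub", "msqrt"}
CONTENT_NAMES = {"apply", "ci", "cn", "csymbol", "bind", "bvar"}


def guess_media_type(fragment):
    """Pick a MIME label for a MathML fragment using a crude name check."""
    root = ET.fromstring(fragment)
    # Strip any '{namespace}' prefix to get local element names.
    names = {el.tag.rsplit("}", 1)[-1] for el in root.iter()}
    has_presentation = bool(names & PRESENTATION_NAMES)
    has_content = bool(names & CONTENT_NAMES)
    if has_presentation and not has_content:
        return PRESENTATION
    if has_content and not has_presentation:
        return CONTENT
    return GENERIC  # mixed or unrecognised markup gets the general type


if __name__ == "__main__":
    ns = 'xmlns="http://www.w3.org/1998/Math/MathML"'
    print(guess_media_type(f"<math {ns}><msup><mi>x</mi><mn>2</mn></msup></math>"))
```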

Closer alignment between Content MathML and OpenMath

Chapter 4, describing Content MathML has been totally rewritten to provide a direct, explicit alignment with OpenMath. This alignment was always in the background (the two languages being developed at roughly the same time, by overlapping groups of people) but making the alignment explicit, and making small changes on both the MathML and OpenMath side has allowed many rough edges to be removed, and I think gives a much clearer presentation of the “semantics” underlying the Content MathML elements.

RelaxNG Schema

The normative DTD used for MathML 1 and 2 is replaced in MathML 3 by a Relax NG schema. This is much more expressive than a DTD, and allows many of the constraints that previously could only be expressed in words to be built into the grammar. A DTD and an XSD schema are still provided in the Math Working Group pages, as a convenience.

Many clarifications and improvements throughout the spec

If you stare at the MathML2 spec for 7 years you may notice that some parts are clearer than others.

Integration with HTML5

This is briefly mentioned in Chapter 6, but mainly specified in the HTML5 draft specification. One of the main difficulties of using MathML on the web has always been that it was designed as an XML application to fit with XHTML, and using XHTML has proved to be far more difficult than envisaged in 1998 when XML started. Some notable browsers are only now starting to support XHTML in beta releases, and even in browsers with good XML/XHTML support, it is difficult to integrate XHTML with HTML-based document systems (such as the blogger system hosting this blog). HTML5, by allowing MathML (and SVG) in text/html systems, will be a massive boost to getting Mathematics into web-based systems.
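The practical difference is that the math element can sit directly in a text/html page with no namespace declarations or XML well-formedness requirements on the rest of the page. The Python sketch below picks the MathML tags out of such a page. Note that Python's html.parser is a lenient tokenizer rather than a full HTML5 parser, so this only approximates what a browser does; the class name, function name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class MathTagCollector(HTMLParser):
    """Record the start tags that occur inside <math> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # nesting depth inside the current <math>
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "math" or self.depth:
            self.tags.append(tag)
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1


# A text/html page with inline MathML and no namespace declarations.
PAGE = "<!DOCTYPE html><p>Area: <math><msup><mi>r</mi><mn>2</mn></msup></math></p>"


def math_tags(html):
    collector = MathTagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(math_tags(PAGE))
```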

A MathML for CSS Profile 2nd PR

A MathML for CSS Profile has today been edited in place to confirm that it has passed all the ballots and implementation requirements to proceed to full W3C Recommendation status.

The final hurdle is that it has a normative dependency on CSS 2.1 which is currently at Candidate Recommendation stage, so MathML for CSS is blocked at Proposed Recommendation stage until CSS 2.1 reaches Recommendation. Hopefully early next year.

There are apparently several W3C Working drafts similarly blocked by CSS 2.1 (which has been at Candidate Recommendation stage for over a year). Hopefully the CSS Working Group will get the remaining issues resolved and testing reports done this year!

Tuesday, 10 August 2010

MathML3 PR

The W3C today published MathML3 and the MathML for CSS Profile as proposed recommendations.

PR is almost the final hurdle, and assuming none of the W3C members object at this stage, then we should be on course to reach Recommendation status in September.

As always, comments are welcome on, or via this blog.

Thursday, 10 June 2010

MathML3 2nd Last Call

A new draft of MathML3 has been released:

This is (again) a Last Call draft. There were sufficient changes and clarifications resulting from the Candidate Recommendation review draft that we decided to issue a Last Call again, to allow people to comment on the changes before (hopefully) we progress to Proposed Recommendation status.

We have provided a Diff marked version and a List of changes to make it easier to review the changes since the CR draft.

Friday, 28 May 2010

STIX Fonts 1.0

15 years in the making, they finally released the fonts!

Thursday, 1 April 2010

XML Entity Definitions for Characters

The W3C today published XML Entity Definitions for Characters as a Recommendation (i.e., Web standard). It's taken a long time (over a decade) but I hope you find them useful...

Thursday, 11 March 2010


Mostly these days I suppose I'm known for MathML and XSLT stuff. Before that I used TeX rather a lot (I must check one of these days whether I have yet posted as often to xsl-list as I have to comp.text.tex). Around the time I started to use TeX (1987 or thereabouts) I was mainly programming in the functional programming language Standard ML. I haven't done so much with ML since, but the ML family of languages has been having something of a renaissance recently, especially with Microsoft's F# language.

So I've been returning to my roots and doing a bit of functional programming over the last few days, writing an article hosted at NAG on calling the NAG library from F#.

Sunday, 21 February 2010

XML Entities: W3C Vote

As noted in a previous post, XML Entity Definitions for Characters is now a Proposed Recommendation.

To help the W3C make the decision to make the final transition to Recommendation status the Advisory Committee members are asked to submit comments. If you are in a W3C member company, please ask your AC representative to comment via the form by the 11th of March!

Thursday, 11 February 2010

Entities: Proposed Recommendation

I'm pleased to be able to report that today the W3C has published XML Entity Definitions for Characters as a Proposed Recommendation.

It's only really a list of names and numbers, but it seems to have taken up a large chunk of my life since I accidentally fell into supporting these things.

Thursday, 21 January 2010

MathML on the Clipboard

I got a new machine at work with Windows 7 on it.

One of the more interesting applications coming with Windows 7 is the Math Input Panel. This is designed for pen input on a tablet-style device and performs pretty impressively accurate recognition of mathematical expressions. While designed for a tablet, it also works pretty well if you are just “writing” the expression with a finger on a small laptop trackpad, which is how I have been using it.

The Math Input Panel is designed with a very simple interface with virtually no customisation options. It offers no way of saving the expressions generated and just offers a simple insert button that tries to insert the math expression at the insertion point in a currently open application. This works well for Word 2007, which accepts MathML from the clipboard and transparently converts it to its internal form and renders it, but other more generic tools, such as XML editors that could use the MathML, do not accept MathML from the clipboard in this way. Unlike MathPlayer or Word, the Math Input Panel doesn't offer fallback text representations of the XML markup on the clipboard. Marko Panic, the program manager for the development of this tool, confirmed to me that this was a design decision as they didn't want the end user to be faced with raw XML. This is not unreasonable but not what I wanted personally (I like to see my XML raw :-). Marko confirmed that the MathML is on the clipboard and it should be possible to extract it with a few lines of code, or if I wanted a more extensive customisation there was documentation of the API offered by the underlying DLL available at

I decided to brush up my C# forms programming and produced a small form that shows any MathML on the clipboard. The main code (everything apart from the boilerplate Visual Studio files) is available on Google Code. While it's particularly useful to see the MathML generated by the Math Input Panel, it also works with other applications, notably MathPlayer and Word, that place MathML on the clipboard.
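The form itself is written in C# and is not reproduced here. As a platform-neutral illustration of the checking step only, the Python sketch below takes a string that has already been read from the clipboard (how that is done is Windows-specific and not shown) and returns it re-serialised if its root is a math element in the MathML namespace. The function name `mathml_or_none` and the choice of the `mml` prefix are hypothetical and do not describe the actual form's code.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def mathml_or_none(clipboard_text):
    """Return serialised MathML if the text is a MathML document, else None."""
    try:
        root = ET.fromstring(clipboard_text)
    except ET.ParseError:
        return None  # not well-formed XML at all
    if root.tag != f"{{{MATHML_NS}}}math":
        return None  # XML, but not a MathML <math> root
    ET.register_namespace("mml", MATHML_NS)  # readable prefix on output
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = f'<math xmlns="{MATHML_NS}"><mi>x</mi><mo>+</mo><mn>1</mn></math>'
    print(mathml_or_none(sample))
    print(mathml_or_none("just some text"))
```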

While looking via Google for some programming tips on my form, I came across a very similar blog posting from last year. That form had some differences though (displaying the IE folding tree view of the XML) so I completed my form here. The screenshot shows the Math Input Panel interpreting my appalling handwriting, and the mmlclipboard form displaying the generated MathML.