Sunday, 7 November 2010

Unicode 6: XML Entities draft

Unicode 6.0 was published last month, and the proposals for Unicode 6.1 are firming up. Both of these releases have significant new characters for mathematical use, so I have updated the Editors' draft of “XML Entity Definitions for Characters”.

The main source file, unicode.xml, has been updated to contain information for all characters in Unicode 6.0, and the provisional allocations for the Arabic Math Alphabets in Unicode 6.1. There is no change to the set of entity names or to the MathML or HTML DTDs derived from these sources.

Although this document is styled as an editors' draft for an update to the current recommendation, there are no immediate plans to publish a formal update to the W3C recommendation. However, I hope to track changes to Unicode in this editors' draft, and perhaps, once the proposals to add Arabic mathematical characters to Unicode are all processed, we may try to submit this for formal review as a Proposed Edited Recommendation.

Unicode 6.0

Most of the new characters in Unicode 6.0 are not directly related to Mathematics, although the large collection of “emoji” derived from characters used in the Japanese mobile phone industry provides some interesting characters that I'm sure could be used for mathematical operators (U+1F4A9 perhaps?). However, there are some specifically mathematical characters, including new heavy (ultra bold) plus and minus signs (U+2795 and U+2796), which may find use either in display contexts or as additional operators distinct from the usual plus and minus.
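
As a small illustration (a sketch only, not part of unicode.xml or the entities draft), the two heavy operators can be looked up with Python's standard unicodedata module. This assumes the interpreter's Unicode database is at version 6.0 or later; the describe helper is a hypothetical name introduced here.

```python
import unicodedata

# Code points taken from the paragraph above.
HEAVY_OPERATORS = [0x2795, 0x2796]


def describe(code_point):
    """Return 'U+XXXX NAME' using the Unicode database bundled with Python."""
    char = chr(code_point)
    # The second argument is a fallback for code points the database
    # does not know (for example, if it predates Unicode 6.0).
    return "U+{:04X} {}".format(code_point, unicodedata.name(char, "<unnamed>"))


for cp in HEAVY_OPERATORS:
    print(describe(cp))
```

On a sufficiently recent Python this should report HEAVY PLUS SIGN and HEAVY MINUS SIGN, which are distinct characters from the ordinary U+002B and U+2212.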

Unicode 6.1 (proposals)

The mathematical alphabets (bold, fraktur, double-struck, etc.) that are in Unicode, and available as values of MathML 2's mathvariant attribute, fit well with the mathematical traditions using the Roman and Greek alphabets, but don't really work with other alphabets, notably Arabic.
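
To make the limitation concrete, here is a minimal Python sketch of how a "bold" variant maps onto the existing Unicode mathematical alphabets. It is an illustration under stated assumptions, not how MathML processors are required to work: to_math_bold is a hypothetical helper, and it covers only the basic Latin letters, using the fact that Mathematical Bold capitals start at U+1D400 and small letters at U+1D41A.

```python
# Start of the contiguous Mathematical Bold Latin ranges in Unicode.
BOLD_CAPITAL_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_SMALL_A = 0x1D41A    # MATHEMATICAL BOLD SMALL A


def to_math_bold(text):
    """Map basic Latin letters to Mathematical Bold; leave everything else alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_SMALL_A + ord(ch) - ord("a")))
        else:
            # No bold mathematical counterpart is handled here; Arabic
            # letters, for instance, pass through unchanged.
            out.append(ch)
    return "".join(out)


print(to_math_bold("x"))
print(to_math_bold("\u0628"))  # ARABIC LETTER BEH is returned as-is
```

A Latin letter gets a dedicated styled code point, while an Arabic letter has nothing comparable to map to in these alphabets.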

Azzeddine Lazrek proposed that MathML and Unicode be extended with additional math alphabets corresponding to conventions used in Arabic typeset mathematics (initial, tailed, looped, stretched). These were added to MathML in the recently finalised MathML 3.0, and the corresponding code points have been allocated in Unicode (all in the block 1EE??) and are planned to be standardised in Unicode 6.1. I have provisionally added this data to unicode.xml, and added a table showing the characters to the entities draft.