FUD - OOXML Protection of Old Word Documents

There has been a lot of commentary over the last few days about whether OOXML is a vehicle for protecting the huge body of old binary-format documents. In other words, whether our ability to accurately represent our historical archives in the future depends on OOXML becoming a standard.

Let's be very clear about this. OOXML does not provide this capability. Indeed, if this were Microsoft's key concern there would be much better ways of achieving that goal.

The issue of parsing the billions of legacy binary documents is addressed outside of the OOXML ECMA standard, in binary specifications provided by Microsoft under various licences (which are “open depending your definition of open” as Gray Knowlton, MS Office Product Manager, said at the StandardsNZ workshop on OOXML). This particular issue is not open for interpretation: there is no information on parsing binary formats in the current OOXML spec, and there is no stated plan anywhere for legacy binary parsing to be added to OOXML. If that were the plan, OOXML would need to be resubmitted to the standards process for complete technical reevaluation.
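To make the distinction concrete, here is a minimal sketch of how software tells the two kinds of file apart. It assumes only two well-known facts: legacy Office binaries are stored in an OLE2 compound file, which starts with a fixed eight-byte signature, and OOXML documents are ZIP packages. The function name `classify_container` and its return labels are made up for this illustration. Note that a ZIP signature alone does not prove a file is OOXML (OpenDocument files are ZIP packages too), and an OLE2 signature alone does not prove it is a Word document.

```python
import io
import zipfile

# Signature at the start of an OLE2 compound file (the container used by legacy .doc/.xls/.ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature at the start of a non-empty ZIP archive (the container used by OOXML).
ZIP_MAGIC = b"PK\x03\x04"


def classify_container(data: bytes) -> str:
    """Return a rough label for the container format, based only on the leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "legacy-binary"
    if data.startswith(ZIP_MAGIC):
        return "zip-package"
    return "unknown"


if __name__ == "__main__":
    # Build a tiny ZIP in memory to stand in for an OOXML package (illustrative content only).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
    print(classify_container(buf.getvalue()))
    # A fake legacy header: the OLE2 signature followed by padding.
    print(classify_container(OLE2_MAGIC + b"\x00" * 504))
```

Everything after that first check is where the two worlds part ways: reading the ZIP package is covered by the OOXML spec, while reading the contents of the OLE2 file needs the separate binary specifications.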

Do not confuse this issue with the legacy-format quirks that OOXML carries through XML nodes like autoSpaceLikeWord95. OOXML does not currently provide enough detail to implement these, although Gray Knowlton has said this will be addressed in a later revision.
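For readers who have not seen one of these nodes, the sketch below lists the compatibility flags in a hand-written settings fragment. The fragment is illustrative, not taken from a real document, and it assumes the flag sits inside a `w:compat` element in the WordprocessingML main namespace; `compat_flags` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Assumed WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written, illustrative fragment of a document settings part.
SETTINGS_FRAGMENT = f"""
<w:settings xmlns:w="{W_NS}">
  <w:compat>
    <w:autoSpaceLikeWord95/>
  </w:compat>
</w:settings>
""".strip()


def compat_flags(settings_xml: str) -> list:
    """Return the local names of the elements found under w:compat, in document order."""
    root = ET.fromstring(settings_xml)
    compat = root.find(f"{{{W_NS}}}compat")
    if compat is None:
        return []
    # Tags look like "{namespace}localName"; keep only the local name.
    return [child.tag.split("}", 1)[1] for child in compat]


if __name__ == "__main__":
    print(compat_flags(SETTINGS_FRAGMENT))
```

Finding the flag is trivial. The hard part is that the flag only names a behaviour of an old product; an implementer still needs a description of what that behaviour was.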

In fact, the whole approach from Microsoft is to accept that the OOXML standard is technically very deficient, but just to promise to fix it. That's asking us to buy a pig in a poke - just endorse a standard full of problems and we'll sort it out later. If Microsoft, through ECMA, does address all the technical problems that we have identified - and that Microsoft agreed with at the meeting - we would end up with a very different standard, which again should be subject to full technical reevaluation.

Microsoft are being very cynical about the way they are presenting this issue. The fact that they have failed to convince National Archives and a whole swag of Government agencies speaks volumes.

Note - For those looking to convert legacy binary formats into OpenDocument there are mature conversion software libraries like Sun’s ODF Toolkit (whose code base goes back through 20 years of erratic behaviour!), and locally we have Matthew Cruikshank's Docvert.org software. There’s a long list here.