
Comparison of OpenDocument and Microsoft XML formats




 

Both OpenDocument and Open XML package XML files and other (binary) media files together in compressed archive files (ZIP/JAR). Both use the XML files as the office document's data source and markup language, and both support common media types such as GIF, JPEG and PNG, as well as W3C MathML and Dublin Core metadata. The two formats differ significantly in the details.
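The shared packaging idea can be sketched in a few lines of Python. This is a minimal illustration, not a valid document of either format: the member names (`content.xml`, `meta.xml`, `Pictures/logo.png`) and their contents are placeholders chosen for the example.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Build a tiny in-memory ZIP laid out like an office document package.

    Member names and contents are illustrative placeholders only.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", "<doc><p>Hello</p></doc>")
        zf.writestr("meta.xml", "<meta><title>Sample</title></meta>")
        # Only the 8-byte PNG signature, standing in for a real image.
        zf.writestr("Pictures/logo.png", b"\x89PNG\r\n\x1a\n")
    return buf.getvalue()


def split_members(package: bytes) -> tuple[list[str], list[str]]:
    """Return (xml_parts, other_parts) for a ZIP-based document package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        names = zf.namelist()
    xml_parts = sorted(n for n in names if n.endswith(".xml"))
    other_parts = sorted(n for n in names if not n.endswith(".xml"))
    return xml_parts, other_parts


xml_parts, other_parts = split_members(build_sample_package())
print("XML parts:  ", xml_parts)
print("Other parts:", other_parts)
```

Because both formats are ordinary ZIP archives, a generic archive library is enough to list or extract the parts; the formats differ in what the XML inside those parts looks like.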

Advantages of OpenDocument over Microsoft XML formats

Alex Hudson, J. David Eisenberg, Bruce D'Arcus and Daniel Carrera of the OpenDocument Fellowship argue that OpenDocument has several technical advantages over Microsoft XML (Hudson, 2005):

  1. OpenDocument uses a mixed content model, whereas the MS XML format does not. "Non-mixed documents usually represent structured data; mixed documents are usually used to represent narrative. MS XML uses the non-mixed model to represent narrative (word processing). This sort of mismatch leads to awkward markup... The mixed-content model makes more sense, and is closer to what a developer will be familiar to." (http://opendocumentfellowship.org/introduction/odf_vs_oxml_part_II)
  2. OpenDocument is similar to XHTML, while MS XML is not. It uses mixed content and marks styles in a similar way. This makes it easier to transform data accurately between OpenDocument and XHTML, and also simplifies the reuse of existing skills.
  3. OpenDocument gives better separation between content and presentation. "Both formats give you some separation, and neither format gives you perfect separation. But OpenDocument goes much further in that direction." (http://opendocumentfellowship.org/introduction/odf_vs_oxml_part_II)
  4. OpenDocument hyperlinks are designed to be easy to process: they use attributes from the XLink namespace and do not require processing a separate file. [citation needed]
  5. OpenDocument reuses existing standards whenever possible. It uses SVG for drawings, MathML for equations, XLink for linking, Dublin Core for metadata, etc. "This makes the format infinitely more transparent to someone familiar with XML technologies. It also allows you to reuse existing tools that understand these standards." [citation needed]

In contrast, critics say that the Microsoft XML formats do not use appropriate standards but reinvent almost everything, imposing significant additional costs to translate them to standard formats. However, there are two standards that Microsoft's Office Open XML formats do use: MathML and Dublin Core.
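The point about inline XLink hyperlinks can be illustrated with a short Python sketch. The sample paragraph is a hand-written fragment, not taken from a real file, and the URLs are placeholders; the namespace URIs are the OpenDocument text namespace and the W3C XLink namespace.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hand-written sample paragraph; the URLs are placeholders.
SAMPLE = f"""
<text:p xmlns:text="{TEXT_NS}" xmlns:xlink="{XLINK_NS}">
  See the <text:a xlink:type="simple" xlink:href="http://example.org/">example site</text:a>
  and <text:a xlink:type="simple" xlink:href="http://example.org/docs">its docs</text:a>.
</text:p>
"""


def extract_links(xml_text: str) -> list[tuple[str, str]]:
    """Return (link text, target) pairs found inline in the content XML."""
    root = ET.fromstring(xml_text.strip())
    links = []
    for anchor in root.iter(f"{{{TEXT_NS}}}a"):
        href = anchor.get(f"{{{XLINK_NS}}}href")
        links.append(("".join(anchor.itertext()), href))
    return links


links = extract_links(SAMPLE)
for label, target in links:
    print(f"{label!r} -> {target}")
```

The link target sits on the same element as the link text, so a single pass over the content XML recovers both.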

Advantages of Microsoft XML formats over OpenDocument

  1. Microsoft has stated that a design goal for its formats was 100% compatibility with the existing base of documents and formatting used by its customers. This opens up legacy documents for better archiving.
  2. Microsoft states that the OpenDocument format lacks support for the complete set of functionality in Microsoft Office applications (such as VBA and OLE support, support for highlighting, and other features), so any converter that saved into OpenDocument format would necessarily be lossy. [citation needed] No technical analysis is publicly available that confirms or refutes the claim.
  3. Microsoft Excel has a well-defined, well-known formula language that has been brought in its entirety into the new XML formats, whereas OpenDocument provides no such specification. MS Office program manager Brian Jones notes on his Open XML blog that the Open XML draft specification has about 200 pages on the subject, whereas the OpenDocument specification has a few lines.
  4. OpenXML supports several non-Western languages better than ODF. More specifically, it has good Arabicization (a weakness in ODF) and internationalization.
  5. The OpenXML spreadsheet format appears to be much faster than the ODF spreadsheet format. ZDNet tested both XML formats in their native applications, OpenOffice.org 2.0 and MS Office 2003. Although the difference could be due to a poor implementation in OpenOffice.org, it seems relevant that Microsoft has long experience in tuning the performance of its spreadsheet application, Excel. Microsoft's Brian Jones has also added some information on this subject. Note also that the native proprietary Excel XLS binary format appears to be much faster than both XML implementations.
  6. All external references, such as hyperlinks or linked files, reside in a single relationships XML file contained in the document archive. This allows easy access to all external references in the document, which makes link fix-up much easier if you are moving files from one server to another. If you want to remove all external references for security reasons, you only need to edit the relationships file.
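The link fix-up scenario in item 6 can be sketched as follows. This is a minimal illustration under stated assumptions: the relationships namespace and `Type` URIs are those of the later published Open Packaging Conventions (the drafts discussed here may have used different URIs), and the server names and the `retarget_external` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace from the published Open Packaging Conventions.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hand-written sample; server names are placeholders.
SAMPLE_RELS = f"""
<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{OFFICE_REL}/styles" Target="styles.xml"/>
  <Relationship Id="rId2" Type="{OFFICE_REL}/hyperlink"
                Target="http://old-server.example/report" TargetMode="External"/>
  <Relationship Id="rId3" Type="{OFFICE_REL}/hyperlink"
                Target="http://other.example/page" TargetMode="External"/>
</Relationships>
"""


def external_targets(xml_text: str) -> dict[str, str]:
    """Map relationship Id -> Target for every external relationship."""
    root = ET.fromstring(xml_text.strip())
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.iter(f"{{{REL_NS}}}Relationship")
        if rel.get("TargetMode") == "External"
    }


def retarget_external(xml_text: str, old_prefix: str, new_prefix: str) -> str:
    """Rewrite external targets that start with old_prefix (hypothetical helper)."""
    ET.register_namespace("", REL_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(xml_text.strip())
    for rel in root.iter(f"{{{REL_NS}}}Relationship"):
        target = rel.get("Target", "")
        if rel.get("TargetMode") == "External" and target.startswith(old_prefix):
            rel.set("Target", new_prefix + target[len(old_prefix):])
    return ET.tostring(root, encoding="unicode")


moved = retarget_external(
    SAMPLE_RELS, "http://old-server.example/", "http://new-server.example/"
)
print(external_targets(moved))
```

Only the relationships part is read and rewritten; the document's content XML is never opened, which is the convenience the item describes.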

Custom XML schema definitions (XSDs)

The Valoris Report noted that Microsoft's XML format supported custom XML schema definitions (XSDs), while OpenDocument did not. XSDs made it "possible to attach one or more custom schemas to a given Word document. It allows the users to annotate the document with the elements found in the attached schemas."

In response to this, the OASIS OpenDocument TC added XForms to the OpenDocument specification. This is meant to provide equivalent functionality while reusing relevant standards and hence reducing the potential loss of interoperability.

It is extremely controversial, however, whether these embedded XSDs are actually an advantage or not. Proponents of OpenDocument claim that it is not valuable and possibly harmful, while some users of Microsoft's formats find it highly beneficial:

In many eyes, combining custom XML with the presentation XML impedes rather than aids interoperability, which would be a serious problem since the whole point of these standards is to promote interoperability. [citation needed]

A contrasting opinion is that such markup allows documents to take on semantic meaning that is not possible with "presentation-only" approaches. In this view, for example, an invoice can have both presentation as well as meaning for each of its elements such as quantity, item description, part number, total, etc. Such a document can be consumed by a process that wants to glean information from a database full of such documents. Since such data is often unique to a single entity such as an individual corporation with a specific database design, the XML needs to be free to be defined by the user and cannot be a priori determined by a standards body. On the other hand, this functionality is essentially "metadata" (data about data) and others believe it makes more sense to use a standard designed for metadata, like RDF.
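The invoice scenario can be made concrete with a small Python sketch. Everything specific here is an assumption for illustration: the `urn:example:invoice` namespace, the element names, and the surrounding `doc`/`p` presentation tags are hypothetical and do not come from either format.

```python
import xml.etree.ElementTree as ET

INVOICE_NS = "urn:example:invoice"  # hypothetical user-defined schema

# Hypothetical document mixing presentation markup with custom-schema elements.
SAMPLE = f"""
<doc xmlns:inv="{INVOICE_NS}">
  <p>Part <inv:partNumber>A-100</inv:partNumber>: <inv:description>Widget</inv:description></p>
  <p>Quantity: <inv:quantity>3</inv:quantity></p>
  <p>Total: <inv:total>29.85</inv:total></p>
</doc>
"""


def extract_fields(xml_text: str) -> dict[str, str]:
    """Collect the text of every element in the custom invoice namespace."""
    prefix = f"{{{INVOICE_NS}}}"
    root = ET.fromstring(xml_text.strip())
    return {
        elem.tag[len(prefix):]: (elem.text or "").strip()
        for elem in root.iter()
        if elem.tag.startswith(prefix)
    }


fields = extract_fields(SAMPLE)
print(fields)
```

A consuming process looks only at the elements in the custom namespace and ignores the presentation markup around them, which is the kind of machine-readable business data the proponents describe.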

The OASIS OpenDocument committee had considered adding this direct capability before public review of OpenDocument began, but decided not to, saying that this was "not essential for the current version of the specification." Conversely, others have claimed that this is a primary value of an XML format - to carry business data that is machine readable in context, rather than divorce it from the document or keep documents purely about generic presentation.

Instead, the OpenDocument Technical Committee developers embedded the ability to support XForms. XForms is a W3C standard that supports custom schemas but adds constraints that are believed to counter the perceived weaknesses of custom schemas. In addition, the OASIS OpenDocument committee has created a metadata sub-committee to expand metadata support (e.g. RDF) in OpenDocument. Adding RDF to OpenDocument would satisfy the use scenario above while avoiding the loss of interoperability that custom schemas present.

The above discussion of Microsoft's XML formats covers primarily the Office 2003 formats (and more specifically the Word 2003 format). The formats produced by the upcoming "Office 2007" (currently in beta) are significantly different in approach. For example, a major difference is that the custom XML is stored as its own "part" in a ZIP container, and the presentation XML contains references to that custom schema so that values can be mapped to the document when it is loaded.

Cross-platform interoperability

The Valoris Report notes that using XML does not guarantee that the result is portable across heterogeneous platforms with full preservation of semantics. The report points out that OpenDocument was designed to support cross-platform interoperability. In contrast, the Valoris Report had reservations about the ability of Microsoft's XML formats to support true cross-platform interoperability, which many view as a fundamental requirement of any exchange format. Microsoft's XML format has not yet been fully reviewed by independent parties for its ability to support interoperability, and it was designed to support only one product, which runs on a single platform. Nevertheless, OpenOffice.org 2.0 currently supports the Word 2003 "WordprocessingML" format.

Microsoft submitted the "Office 2007" formats to the ECMA standards body for full documentation and has indicated it will take them to ISO once the ECMA process is complete. Microsoft has also indicated that Macintosh versions of its applications will support these new formats, and Corel has expressed plans to support these formats in its applications.

The report noted that the Microsoft schemas can contain proprietary objects; they may be encoded in a standard-compliant fashion, but if some of them can only be executed in a Microsoft environment (e.g., OLE), the result is not interoperable. It was also reported that the "spreadsheet macros are spread within the content XML elements. It is therefore very difficult to isolate the code from the text by a third-party program. Furthermore, these macros cannot be executed outside the MS-Office environment." The same criticism applies to OLE objects in OpenDocument files, which are also necessarily encoded in binary rather than XML and suffer from the same issues, primarily because there is no other sensible implementation. [citation needed]

Supporters of OpenDocument point to the independent implementations in the KOffice and OpenOffice.org codebases as evidence that the format is inherently interoperable. KOffice, which was implemented independently by KDE developers, was the first office suite to provide broad support for OpenDocument (KOffice 1.4 Announcement, 2005). Supporters of OpenDocument also claim that although OpenOffice.org 2.0 has already implemented Word 2003's XML format, Microsoft's Open XML must necessarily have interoperability weaknesses, and that the depth of those weaknesses is not yet fully known.

Example XML comparisons

First, an example of mixed versus non-mixed content, as provided in the Groklaw comparison of the two formats. Non-mixed documents usually represent structured data; mixed documents are usually used to represent narrative. MS XML uses the non-mixed model to represent narrative (word processing).

Non-mixed (Open XML):

This is a
very basic
 document
with some
 formatting, and a
hyperlink

Mixed (ODF):

This is a
very basic document with some
formatting, and a hyperlink
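The element markup of the two samples is not reproduced here, so the difference between the two content models is sketched below in Python. The tag names are simplified, namespace-free stand-ins chosen for illustration (loosely modelled on run/text elements and on inline spans); they are assumptions, not the markup of either specification.

```python
import xml.etree.ElementTree as ET

# Non-mixed: every piece of text lives in its own leaf element (<t>),
# and formatting is a sibling property element (<rPr>).
NON_MIXED = """
<p>
  <r><t>This is a </t></r>
  <r><rPr><b/></rPr><t>very basic</t></r>
  <r><t> document.</t></r>
</p>
"""

# Mixed: text and inline elements are interleaved inside the paragraph.
MIXED = "<p>This is a <b>very basic</b> document.</p>"


def has_mixed_content(root: ET.Element) -> bool:
    """True if any element holds both child elements and non-whitespace text."""
    for elem in root.iter():
        children = list(elem)
        if not children:
            continue
        pieces = [elem.text] + [child.tail for child in children]
        if any(piece and piece.strip() for piece in pieces):
            return True
    return False


def non_mixed_text(root: ET.Element) -> str:
    """Join the text of the <t> leaves; whitespace between elements is ignored."""
    return "".join(t.text or "" for t in root.iter("t"))


def mixed_text(root: ET.Element) -> str:
    """Join all text nodes in document order."""
    return "".join(root.itertext())


non_mixed_root = ET.fromstring(NON_MIXED.strip())
mixed_root = ET.fromstring(MIXED)

print(has_mixed_content(non_mixed_root), non_mixed_text(non_mixed_root))
print(has_mixed_content(mixed_root), mixed_text(mixed_root))
```

Both snippets carry the same sentence; the non-mixed form needs more elements to do so, while the mixed form reads much like XHTML.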

Second, an example (provided on Brian Jones's weblog) to support Microsoft's choice of shorter tag names. The first example uses SpreadsheetML from the Ecma Office Open XML format; the second uses the OpenDocument format.

Short tag example (Open XML):

123
456

Long tag example (ODF):

12 3
45 6
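The tag-length argument can be explored with a rough Python sketch. The row markup below is an assumption modelled loosely on short SpreadsheetML-style tags and longer OpenDocument-style tags, with made-up cell values; it is not the weblog's example and says nothing about load or save speed in any application. It only measures text size before and after DEFLATE compression.

```python
import zlib


def short_row(values: list[int]) -> str:
    """One row with short tag names (illustrative SpreadsheetML-like markup)."""
    cells = "".join(f"<c><v>{v}</v></c>" for v in values)
    return f"<row>{cells}</row>"


def long_row(values: list[int]) -> str:
    """One row with long tag names (illustrative OpenDocument-like markup)."""
    cells = "".join(
        f'<table:table-cell office:value-type="float" office:value="{v}">'
        f"<text:p>{v}</text:p></table:table-cell>"
        for v in values
    )
    return f"<table:table-row>{cells}</table:table-row>"


def measure(rows: int) -> dict[str, int]:
    """Return raw and DEFLATE-compressed byte sizes for both styles."""
    data = [[r * 3 + 1, r * 3 + 2, r * 3 + 3] for r in range(rows)]
    short = "".join(short_row(v) for v in data).encode("utf-8")
    long_ = "".join(long_row(v) for v in data).encode("utf-8")
    return {
        "short_raw": len(short),
        "long_raw": len(long_),
        "short_zip": len(zlib.compress(short)),
        "long_zip": len(zlib.compress(long_)),
    }


sizes = measure(1000)
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Since both formats store their XML inside a ZIP archive, the compressed figures are the ones closer to on-disk size, while the raw figures are closer to what a parser has to read.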



From Wikipedia, the Free Encyclopedia.
All text is available under the terms of the GNU Free Documentation License. See Wikipedia Copyrights for details.

