Open XML Deep Dive
Satisfy Your Technical Curiosity

Doug Mahugh
Technical Evangelist, Microsoft
http://blogs.msdn.com/dmahugh

Application type: Document Assembly
Server environment: Linux, Java, Apache, MySQL
Desktop environment: Office 2007

Session Objectives

Satisfy your curiosity about Open XML:
• Architecture
• The three main Open XML schemas
• Development options
• Custom XML support
• Development scenarios

Today is the tip of the iceberg

A comprehensive 2-day Open XML Developer workshop is scheduled for Belgium on May 21. Contact Imma Verheyen, Partner Development Manager.

Diverse Environments
All you need is ZIP and XML support:

• Linux: ZIP library Minizip or zLib; XML library Apache Xerces
• Java: ZIP library J2SE java.util.zip; XML library JAXP
• Microsoft .NET: ZIP library .NET Framework 3.0 System.IO.Packaging* or Xceed .NET controls; XML library .NET Framework 3.0 System.Xml
• Microsoft COM: ZIP library Xceed ActiveX controls; XML library MSXML

* Also includes abstractions for OPC (Open Packaging Convention) concepts

Development Scenarios

• Document Assembly: server-based or user-assisted construction of documents from archived content or database content. Example: create sales reports from financial and forecast data stored in a CRM system.
• Integration & Content Reuse: content is much easier to move between documents, including documents of different types. Example: quickly and efficiently apply content stored in Word documents to Web pages.
• Document Sanitization: remove unwanted content such as comments, embedded code, or potentially sensitive items from a document when appropriate. Example: remove all tracked changes and comments from a Word document before it is published.
• Document Interrogation: query document repositories based on custom data, content types, or document metadata. Example: search for all documents containing a specific company name or sales contact.
• Content Tagging: adding a tagging schema to content can dramatically improve content searches and the value of the data stored in documents. Example: organizations can create their own smart tags, then use them as the basis for searches.
• Document Archival: ensure that document formats can be consumed long into the future without vendor-specific clients or applications. Example: XML-based document archives include both the data and the presentation information.

XML in Office: the last 10 years

• Office 97: existing binary file formats, designed in 1994 and launched in Office 97
• Office 2000: early innovation; XML document properties
• Office XP: first XML formats; Spreadsheet XML
• Office 2003: breakthrough XML support; WordprocessingML, SpreadsheetML, custom-defined schemas
• 2007 Office system: new XML-based formats; XML file format as the default; XML PowerPoint format

Open XML Architecture

• Markup languages: WordprocessingML, SpreadsheetML, PresentationML
• Shared vocabularies: DrawingML, Custom XML, Bibliography, Equations, VML (legacy)
• Open Packaging Convention: Content Types, Relationships, Metadata, Digital Signatures
• Core technologies: ZIP, XML + Unicode

Open Packaging Convention
Low-level conventions that define the structure of an Office Open XML document

• Also used by XPS; some third-party implementations are under development
• Key concepts: package, parts, relationships, and content types

Parts

• Stored inside the package in a specific location
• Reachable via a URI
• Associated with a specific content type
• Often XML, but can be of any defined content type (including custom types)

Content Types

• Every part must have a content type
• Most Open XML parts have an XML content type
• Consumers support a specific set of content types
• You can define custom content types, and consumers will preserve them; this is a key area of opportunity for developer innovation
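A minimal sketch of how a consumer might resolve a part's content type, using only Python's standard library. It assumes [Content_Types].xml has already been read from the package as a string; the sample below is a hand-written, trimmed-down stand-in rather than a file saved by Office. An Override entry for the exact part name takes precedence over a Default entry for the file extension.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CT = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# A trimmed-down, hypothetical [Content_Types].xml for illustration.
CONTENT_TYPES_XML = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="jpg" ContentType="image/jpeg"/>
  <Override PartName="/word/document.xml"
      ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def content_type_of(part_name: str, types_xml: str) -> Optional[str]:
    """Return the content type of a part, or None if none is declared."""
    types = ET.fromstring(types_xml)
    # An Override for the exact part name wins; part names compare
    # case-insensitively, so both sides are lowercased.
    for override in types.findall(CT + "Override"):
        if override.get("PartName").lower() == part_name.lower():
            return override.get("ContentType")
    # Otherwise fall back to a Default keyed on the file extension.
    last_segment = part_name.rsplit("/", 1)[-1]
    extension = last_segment.rsplit(".", 1)[-1].lower() if "." in last_segment else ""
    for default in types.findall(CT + "Default"):
        if default.get("Extension").lower() == extension:
            return default.get("ContentType")
    return None


if __name__ == "__main__":
    for name in ("/word/document.xml", "/customXml/item1.xml", "/word/media/image1.jpg"):
        print(name, "->", content_type_of(name, CONTENT_TYPES_XML))
```

A part whose extension has no Default and which has no Override (for example a .png here) gets None, which a consumer would treat as an invalid package.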

Relationships

• Tie elements inside the package to each other
• Allow you to step through the document without parsing parts
• Are required: a part without a relationship is not part of the package, and may be discarded

OPC is a Logical Structure

• Files and folders? NO! These details may vary.
• Parts should be referenced by their relationship type.
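To illustrate referencing parts by relationship type rather than by path, here is a sketch that opens a package with Python's zipfile module and follows the officeDocument relationship in _rels/.rels. The demo package, including its deliberately unusual content/main.xml location, is hypothetical and omits parts (such as [Content_Types].xml) that a real package needs.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                   "relationships/officeDocument")


def find_main_part(package: zipfile.ZipFile) -> str:
    """Return the ZIP entry name of the main part, located by relationship type."""
    rels = ET.fromstring(package.read("_rels/.rels"))
    for rel in rels.findall(REL + "Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT:
            # Package-level targets are relative to the package root;
            # ZIP entry names carry no leading slash.
            return posixpath.normpath(rel.get("Target")).lstrip("/")
    raise KeyError("package has no officeDocument relationship")


def build_demo_package() -> bytes:
    """Build a tiny hypothetical package whose main part is in a non-standard folder."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("_rels/.rels",
                   '<Relationships xmlns="http://schemas.openxmlformats.org/'
                   'package/2006/relationships">'
                   f'<Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" '
                   'Target="content/main.xml"/>'
                   '</Relationships>')
        z.writestr("content/main.xml", "<document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    with zipfile.ZipFile(io.BytesIO(build_demo_package())) as pkg:
        print(find_main_part(pkg))  # content/main.xml
```

Because the lookup never assumes word/document.xml, it keeps working when a producer lays out folders differently.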

Types of Interoperability

• Reference schemas: display-oriented; enable technical interoperability
• Custom-defined schemas: data-oriented; enable semantic interoperability

Source: Brian Jones, ODC 2006

WordprocessingML
Document architecture

The document part (body and properties) and its related parts:
• fontTable
• headers/footers
• images
• numberingDefinitions
• styles
• customXML
• footnotes/endnotes
• comments

Paragraphs, Runs and Text
How text is stored in WordprocessingML

• The document element contains a body element
• The body contains paragraphs
• Paragraphs contain runs
• Runs contain text elements

<document>
  <body>
    <p>
      <r>
        <t>HELLO!</t>
      </r>
    </p>
  </body>
</document>

Direct Formatting Example
Simple formatting at the paragraph and run levels:

<w:p>
  <w:pPr>
    <w:b/>
  </w:pPr>
  <w:r>
    <w:t xml:space="preserve">The quick </w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:i/>
    </w:rPr>
    <w:t>brown</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve"> fox.</w:t>
  </w:r>
</w:p>

• Paragraph properties specify bold (the default for the entire paragraph)
• Run properties specify italics (an override for this run)
• xml:space="preserve" keeps the leading and trailing spaces that separate the words

Paragraph Properties

• Can be set directly or in a paragraph style
• 24 total property settings

<w:p>
  <w:pPr>
    <w:keepNext/>
    <w:keepLines/>
    <w:pageBreakBefore/>
    <w:widowControl w:val="on"/>
    <w:suppressLineNumbers/>
    <w:suppressAutoHyphens/>
    <w:textboxTightWrap/>
  </w:pPr>
  … runs, paragraph content …
</w:p>

Run Properties

• Define formatting for individual characters: font attributes, size/position, and other settings
• 24 total properties
• w:sz is measured in half-points, so w:val="11" below means 5.5 points

<w:r>
  <w:rPr>
    <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
    <w:b/>
    <w:i/>
    <w:dstrike w:val="true"/>
    <w:sz w:val="11"/>
  </w:rPr>
  … run content …
</w:r>

Text <w:t>

• The only element in the main story that contains displayed text; everything else is carried in attributes
• Three other types of text are allowed in runs:
  • Deleted text <w:delText>
  • Field code <w:instrText>
  • Deleted field codes <w:delInstrText>
• By looking at <w:t> nodes, you can be sure you're seeing only displayed text
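A sketch of collecting displayed text per paragraph by reading only w:t elements. The sample document.xml fragment is hypothetical: it is hand-written, with a tracked deletion and a field instruction added to show what gets skipped. A production reader would also need to handle elements such as w:tab and w:br, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from typing import List

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical fragment of word/document.xml, for illustration only.
DOCUMENT_XML = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r><w:t xml:space="preserve">The quick </w:t></w:r>
      <w:del w:id="1" w:author="A" w:date="2007-01-01T00:00:00Z">
        <w:r><w:delText>slow </w:delText></w:r>
      </w:del>
      <w:r><w:t>fox.</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:instrText xml:space="preserve"> PAGE </w:instrText></w:r>
      <w:r><w:t>HELLO!</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def displayed_text(document_xml: str) -> List[str]:
    """Return the displayed text of each paragraph, one string per w:p."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for p in root.iter(W + "p"):
        # Only w:t holds displayed text; w:delText, w:instrText and
        # w:delInstrText have different element names, so they are skipped.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs


if __name__ == "__main__":
    print(displayed_text(DOCUMENT_XML))
```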

Revision IDs (RSIDs)

• RSID values identify a set of changes that were made during the same editing session
• Found in many elements:
  • Paragraphs, runs, sections, styles
  • Table rows, table properties, charts, diagrams
• Allow revisions to be merged without the privacy and security issues involved in tracking who changed what
• Optional, but recommended for applications that modify existing documents

Images

• An image is a w:pict element inside a run <w:r>
• The v:imagedata element is defined in VML: xmlns:v="urn:schemas-microsoft-com:vml"
• The actual image is referenced via a relationship:

<w:pict>
  <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:250; height:200">
    <v:imagedata r:id="rId4"/>
  </v:shape>
</w:pict>

• The relationship points to an image part in the package:

<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="image1.jpg"/>
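The r:id lookup described above can be sketched as follows: collect the relationships of the document part into a dictionary, then map each v:imagedata r:id to its target. The namespace declarations are assumptions added so the fragments parse, and the sketch assumes the relationships belong to the main document part at word/document.xml, whose relative targets resolve against the word/ folder.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import List

R_URI = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
V_URI = "urn:schemas-microsoft-com:vml"
PKG_REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"

# The slide's fragments, with namespace declarations added so they parse.
PICT_XML = f"""<w:pict xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:v="{V_URI}" xmlns:r="{R_URI}">
  <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:250;height:200">
    <v:imagedata r:id="rId4"/>
  </v:shape>
</w:pict>"""
RELS_XML = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId4"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
      Target="image1.jpg"/>
</Relationships>"""


def image_parts(pict_xml: str, rels_xml: str,
                source_part: str = "word/document.xml") -> List[str]:
    """Resolve each v:imagedata r:id to the ZIP entry name of its image part."""
    targets = {rel.get("Id"): rel.get("Target")
               for rel in ET.fromstring(rels_xml).iter(PKG_REL + "Relationship")}
    # Relative targets resolve against the folder of the source part.
    base = posixpath.dirname(source_part)
    return [posixpath.normpath(posixpath.join(base, targets[img.get("{" + R_URI + "}id")]))
            for img in ET.fromstring(pict_xml).iter("{" + V_URI + "}imagedata")]


if __name__ == "__main__":
    print(image_parts(PICT_XML, RELS_XML))
```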

Tables

• Tables are a set of paragraphs arranged into rows and columns
• In WordprocessingML, tables are block-level content and are specified using the tbl element
• Analogous to the HTML <table> element

What’s in a table?

Properties, grid, rows, cells:

<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="TableGrid"/>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblLook w:val="01E0"/>
  </w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="2952"/>
    <w:gridCol w:w="2952"/>
    <w:gridCol w:w="2952"/>
  </w:tblGrid>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2952" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>1,1</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2952" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>1,2</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
</w:tbl>
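A sketch of reading a w:tbl into a list of rows, each a list of cell strings. The two-row table is a hypothetical, stripped-down sample (properties and grid omitted); nested tables and merged cells are not handled.

```python
import xml.etree.ElementTree as ET
from typing import List

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical table with the properties and grid left out.
TABLE_XML = """<w:tbl xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:tr>
    <w:tc><w:p><w:r><w:t>1,1</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>1,2</w:t></w:r></w:p></w:tc>
  </w:tr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>2,1</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>2,2</w:t></w:r></w:p></w:tc>
  </w:tr>
</w:tbl>"""


def table_cells(tbl_xml: str) -> List[List[str]]:
    """Return the text of each cell, row by row."""
    tbl = ET.fromstring(tbl_xml)
    rows = []
    # findall() looks only at direct children, so rows of nested tables are not mixed in.
    for tr in tbl.findall(W + "tr"):
        row = []
        for tc in tr.findall(W + "tc"):
            # A cell holds one or more paragraphs; join them with newlines.
            paragraphs = ["".join(t.text or "" for t in p.iter(W + "t"))
                          for p in tc.findall(W + "p")]
            row.append("\n".join(paragraphs))
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(table_cells(TABLE_XML))
```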

Styles

A style defines a specific set of values for formatting properties that may be applied as a single logical unit.

For example, the Normal style in Word 2007 defines these formatting properties:
• Font = Calibri (body)
• Font size = 11 point
• Font language = Word default (as configured by the user)
• Justification = Left
• Line spacing = Single
• Widow/orphan control

Style Types

WordprocessingML supports six style types:
• Paragraph styles
• Character styles
• Linked styles
• Table styles
• List styles
• Default style (linked type, but applies when no style is specified)

Paragraph Styles Example
Step 1: define a paragraph style

Styles are defined in the styles part. A style definition contains common properties, paragraph properties, and character (run) properties:

<w:style w:type="paragraph" w:styleId="TestParagraphStyle">
  <!-- Common properties -->
  <w:name w:val="Test Paragraph Style"/>
  <w:qFormat/>
  <w:rsid w:val="009E253E"/>
  <!-- Paragraph properties -->
  <w:pPr>
    <w:pStyle w:val="TestParagraphStyle"/>
    <w:spacing w:line="480" w:lineRule="auto"/>
    <w:ind w:firstLine="1440"/>
  </w:pPr>
  <!-- Character (run) properties -->
  <w:rPr>
    <w:rFonts w:ascii="Algerian" w:hAnsi="Algerian"/>
    <w:b/>
    <w:color w:val="ED1C24"/>
    <w:sz w:val="40"/>
  </w:rPr>
</w:style>

Paragraph Styles Example
Step 2: apply the style to a paragraph

The pStyle element associates a style with a paragraph:

<w:p>
  <w:pPr>
    <w:pStyle w:val="TestParagraphStyle"/>
  </w:pPr>
  <w:r>
    <w:t>Text</w:t>
  </w:r>
</w:p>

The paragraph is then displayed with the style applied.
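A sketch of looking up the style a paragraph refers to: read the w:pStyle value, then find the w:style with the matching w:styleId in the styles part. The fragments are trimmed-down, hypothetical versions of the two examples above; inheritance through w:basedOn is not followed.

```python
import xml.etree.ElementTree as ET
from typing import Optional

W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_URI + "}"

# Trimmed-down, hypothetical styles part and paragraph.
STYLES_XML = f"""<w:styles xmlns:w="{W_URI}">
  <w:style w:type="paragraph" w:styleId="TestParagraphStyle">
    <w:name w:val="Test Paragraph Style"/>
    <w:rPr><w:b/><w:sz w:val="40"/></w:rPr>
  </w:style>
</w:styles>"""
PARAGRAPH_XML = f"""<w:p xmlns:w="{W_URI}">
  <w:pPr><w:pStyle w:val="TestParagraphStyle"/></w:pPr>
  <w:r><w:t>Text</w:t></w:r>
</w:p>"""


def paragraph_style_name(p_xml: str, styles_xml: str) -> Optional[str]:
    """Return the w:name of the paragraph style applied to a w:p, if any."""
    p_style = ET.fromstring(p_xml).find(f"{W}pPr/{W}pStyle")
    if p_style is None:
        return None  # no explicit style: the default paragraph style applies
    style_id = p_style.get(W + "val")
    for style in ET.fromstring(styles_xml).findall(W + "style"):
        if style.get(W + "type") == "paragraph" and style.get(W + "styleId") == style_id:
            return style.find(W + "name").get(W + "val")
    return None


if __name__ == "__main__":
    print(paragraph_style_name(PARAGRAPH_XML, STYLES_XML))
```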

Numbering Styles
Flexible hierarchical definition

• Numbering styles are styles that define the structure of a multi-level numbering format
• Numbering definition instances are based on an abstract numbering definition
• Abstract numbering definitions define paragraph properties for up to 9 hierarchical levels
• NOTE: items in a list are simply paragraphs. There is no list "container" as in HTML.

Table Styles

A table style is associated with a table via the tblStyle element in the table properties:

<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="Style20"/>
    <w:tblW w:w="5000" w:type="pct"/>
    <w:tblLook w:val="0220"/>
  </w:tblPr>
  … tblGrid, table rows and cells …
</w:tbl>

Table style Style20 is applied to the table.

Style Application Hierarchy
Direct formatting overrides style settings

Formatting is applied in this order, with each layer overriding the ones before it:
• Document defaults
• Table styles
• Numbering
• Paragraph styles
• Character styles
• Direct formatting
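One way to picture the hierarchy is as successive dictionary updates, lowest precedence first. This is a deliberate simplification with hypothetical property names: the real rules include toggle properties (such as bold and italic in styles), which can switch a value off rather than simply overriding it.

```python
from typing import Any, Dict

# Application order from the list above, lowest precedence first.
LAYER_ORDER = ["document_defaults", "table", "numbering",
               "paragraph", "character", "direct"]


def effective_properties(layers: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge formatting layers so that later layers override earlier ones."""
    result: Dict[str, Any] = {}
    for name in LAYER_ORDER:
        result.update(layers.get(name, {}))
    return result


# Hypothetical property values for a single run of text.
EXAMPLE = {
    "document_defaults": {"font": "Calibri", "size_pt": 11},
    "paragraph": {"bold": True, "size_pt": 20},
    "direct": {"bold": False, "italic": True},
}

if __name__ == "__main__":
    print(effective_properties(EXAMPLE))
```

Here the paragraph style's size replaces the document default, and direct formatting turns bold off and adds italics, while the font falls through from the document defaults.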

Subdocuments
Mechanism for "rolling up" documents

• Subdocuments are well-formed Open XML documents and can be edited independently
• Subdocuments don’t know they’re part of something bigger; they’re just stand-alone documents

Subdocuments
Implementation details

• The main document part contains subDoc elements that indicate where to insert subdocuments
• The subdocument’s location is stored in a relationship

Main document part:

<w:body>
  <w:subDoc r:id="rId1"/>
  <w:subDoc r:id="rId2"/>
  <w:subDoc r:id="rId3"/>
  …
</w:body>

Relationships:

<Relationship Id="rId1" Type="…/subDocument" Target="Part1.docx" TargetMode="External"/>
<Relationship Id="rId2" Type="…/subDocument" Target="Part2.docx" TargetMode="External"/>
<Relationship Id="rId3" Type="…/subDocument" Target="Part3.docx" TargetMode="External"/>

Document Sections

A document may be divided into sections, which allow formatting at a higher level than paragraphs:
• Landscape/portrait orientation
• Page margins, etc.

Section properties are defined in sectPr:

<w:sectPr>
  <w:pgSz w:w="12240" w:h="15840"/>
  <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
  <w:cols w:space="720"/>
  <w:docGrid w:linePitch="360"/>
</w:sectPr>

Section Properties
Example

In Word, section properties are specified in the Page Setup dialog:

<w:sectPr>
  <w:pgSz w:w="12240" w:h="15840"/>
  <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
  <w:cols w:space="720"/>
  <w:docGrid w:linePitch="360"/>
</w:sectPr>
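Page size and margin values in sectPr are measured in twentieths of a point (twips), so 1440 equals one inch. A sketch that converts the example above to inches; the namespace declaration is added so the fragment parses, and only pgSz and pgMar are read.

```python
import xml.etree.ElementTree as ET
from typing import Dict

W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_URI + "}"
TWIPS_PER_INCH = 1440  # 20 twips per point * 72 points per inch

SECT_PR_XML = f"""<w:sectPr xmlns:w="{W_URI}">
  <w:pgSz w:w="12240" w:h="15840"/>
  <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"
           w:header="720" w:footer="720" w:gutter="0"/>
</w:sectPr>"""


def page_setup_inches(sect_pr_xml: str) -> Dict[str, float]:
    """Return page width/height and the four margins, converted from twips to inches."""
    sect_pr = ET.fromstring(sect_pr_xml)
    size = sect_pr.find(W + "pgSz")
    margins = sect_pr.find(W + "pgMar")
    result = {
        "width": int(size.get(W + "w")) / TWIPS_PER_INCH,
        "height": int(size.get(W + "h")) / TWIPS_PER_INCH,
    }
    for side in ("top", "right", "bottom", "left"):
        result[side] = int(margins.get(W + side)) / TWIPS_PER_INCH
    return result


if __name__ == "__main__":
    print(page_setup_inches(SECT_PR_XML))
```

So 12240 x 15840 is an 8.5 x 11 inch page with 1-inch margins; the 1800-twip side margins in the previous example work out to 1.25 inches.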

Custom XML Support
Merging the worlds of documents and data

Why Custom XML?
Enables semantic interoperability

• Documents can provide a rich view of back-end data
• Documents can update back-end data sources
• Exposes business data within documents to heterogeneous systems
• Business-specific semantics can be applied to document data
• Separates presentation and data

Custom XML schema support was a key design objective for Open XML: any schema can be used in Open XML documents.

Custom XML
Developer options for custom XML support

• Custom-defined XML is stored in its own discrete part
• Any XML can be stored, with or without a schema
• Only one requirement: it must be well-formed XML
• External applications (client or server) can process the store or populate it

Diagram: a document template made up of visual document parts and XML data parts, connected to an external system.

Custom XML Properties

• Information about a custom XML part is stored in a custom XML properties part
• Stored via an implicit customXmlProps relationship from the custom XML part
• Contains two types of information:
  • Part ID: uniquely identifies a part within a document; maintained through editing sessions
  • XML Schema references

Structured Document Tags
Known as "content controls" in Microsoft Office

• Smart tags and custom XML markup add semantics, but have no effect on presentation
• Sometimes you want to affect presentation: data-entry restrictions, multi-select, etc.
• Solution: the structured document tag <sdt>

Types of Content Controls

• Plain text
• Combo box
• Drop-down list
• Document building block
• Date picker
• Rich text
• Picture

Data Binding

Two-way synchronization between:
• Content controls (structured document tags)
• Custom XML nodes (data in your schema)

Data Binding Basics
How to bind XML nodes to structured document tags

• Add a <dataBinding> element to the structured document tag properties <sdtPr>
• <dataBinding> specifies a custom XML part (by its Custom XML Data Identifier) and an XPath to a specific node within that part
• Custom XML Data Identifier? What’s that?
  • The custom XML part has a properties part
  • Implicit relationship in customXmlPart.xml.rels
  • The properties part specifies a Custom XML Data Identifier
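A sketch of resolving a binding: match the dataBinding storeItemID against the itemID in each custom XML properties part, then follow the XPath into that custom XML part. Everything here is hypothetical sample data, and the path handling covers only plain, unprefixed absolute paths such as /customer/city; Word typically writes namespace-prefixed XPaths with position predicates, which need a real XPath engine.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DS_URI = "http://schemas.openxmlformats.org/officeDocument/2006/customXml"
W = "{" + W_URI + "}"
DS = "{" + DS_URI + "}"

ITEM_ID = "{11111111-2222-3333-4444-555555555555}"  # hypothetical GUID

# Hypothetical custom XML part, its properties part, and a content
# control's properties carrying a data binding.
CUSTOM_XML = "<customer><name>Contoso</name><city>Seattle</city></customer>"
ITEM_PROPS_XML = f'<ds:datastoreItem xmlns:ds="{DS_URI}" ds:itemID="{ITEM_ID}"/>'
SDT_PR_XML = (f'<w:sdtPr xmlns:w="{W_URI}">'
              f'<w:dataBinding w:xpath="/customer/city" w:storeItemID="{ITEM_ID}"/>'
              '</w:sdtPr>')


def bound_node(sdt_pr_xml: str,
               parts: List[Tuple[str, ET.Element]]) -> Optional[ET.Element]:
    """Return the custom XML node a content control is bound to, if any.

    `parts` pairs each properties-part XML string with the parsed root of
    its custom XML part: a hypothetical in-memory stand-in for the package.
    """
    binding = ET.fromstring(sdt_pr_xml).find(W + "dataBinding")
    store_id = binding.get(W + "storeItemID")
    steps = binding.get(W + "xpath").strip("/").split("/")
    for props_xml, root in parts:
        if ET.fromstring(props_xml).get(DS + "itemID") != store_id:
            continue
        if steps[0] != root.tag:
            return None
        # ElementTree cannot evaluate an absolute path on an element, so the
        # remaining steps are evaluated relative to the root element.
        return root.find("/".join(steps[1:])) if len(steps) > 1 else root
    return None


if __name__ == "__main__":
    customer = ET.fromstring(CUSTOM_XML)
    node = bound_node(SDT_PR_XML, [(ITEM_PROPS_XML, customer)])
    print(node.text)        # data -> control: Seattle
    node.text = "Brussels"  # control -> data: the edit lands in the XML part
    print(ET.tostring(customer, encoding="unicode"))
```

Because the content control and the custom XML part share one node, reading it shows the data in the document and writing it updates the data, which is the two-way synchronization described earlier.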

Content Control Toolkit

• Open-source developer tool: http://www.codeplex.com/Wiki/View.aspx?ProjectName=dbe
• Automatically generates parts, relationships, and markup to bind custom XML parts to content controls

Custom XML Markup
Tagging document content with custom semantics

• Allows embedding the structure from any XML schema into a WordprocessingML document
• Schema not required
• XML doesn’t have to validate against your schema
• Custom XML elements may have custom attributes
• Consumers and producers preserve your attributes

Custom XML Markup
Example

XML Mapping in SpreadsheetML

• XML elements and attributes may be mapped to cells and tables
• A copy of the schema is stored in the workbook
• Data is in an external XML file

SpreadsheetML
Document architecture

Parts shown in the architecture diagram:
• Workbook (properties)
• styles
• calcChain
• sharedStrings
• sheet1..N
• table, chart, drawing

SpreadsheetML
Performance optimizations

SpreadsheetML has been optimized based on analysis of typical spreadsheet usage patterns:
• Small tag size (often a single character)
• Shared strings
• Shared formulas
• Sparse table markup allowed
• Optional r="A1" attribute for faster loading

SpreadsheetML Strings
Two alternatives for storing text strings

1. Inline strings
   • Provided for ease of translation/conversion
   • Useful in XSLT scenarios
   • Excel and other consumers may convert them to shared strings on document save
2. An entry in the shared-strings table
   • May be either a simple string or formatted text

These approaches may be mixed and combined.

Shared Strings
Repetitive strings are common in typical spreadsheets

Strings are stored in a shared-strings part:
• Each unique string is stored once
• Cells store the (0-based) index of the string

Benefits:
• Users: reduced file size, improved performance
• Developers: all strings are in one part, simplifying search, localization, and other common string-handling tasks

Shared Strings
Sample shared-strings table

<sst xmlns="..." count="6" uniqueCount="4">
  <si><t>Paris</t></si>
  <si><t>Seattle</t></si>
  <si><t>London</t></si>
  <si><t>Copenhagen</t></si>
</sst>

6 string references, 4 unique strings. Paris is string 0:

<row r="1" spans="1:1">
  <c r="A1" t="s">
    <v>0</v>
  </c>
</row>
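A sketch of building a shared-strings table from a column of cell values, using the six city names from the inline-string example that follows. It stores each unique string once, records the 0-based index each cell would hold, and sets count and uniqueCount as in the sample above. The function name is hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

SML_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Six string references, four unique strings.
CITIES = ["Paris", "Seattle", "London", "Copenhagen", "Paris", "London"]


def build_shared_strings(values: List[str]) -> Tuple[str, List[int]]:
    """Return (sst_xml, indexes) where indexes[i] is the 0-based index for values[i]."""
    index_of: Dict[str, int] = {}
    indexes: List[int] = []
    for value in values:
        # The first occurrence of a string gets the next free index;
        # later occurrences reuse it.
        indexes.append(index_of.setdefault(value, len(index_of)))
    sst = ET.Element("sst", {"xmlns": SML_URI,
                             "count": str(len(values)),
                             "uniqueCount": str(len(index_of))})
    for text in index_of:  # dicts keep insertion order, so order matches indexes
        ET.SubElement(ET.SubElement(sst, "si"), "t").text = text
    return ET.tostring(sst, encoding="unicode"), indexes


if __name__ == "__main__":
    sst_xml, indexes = build_shared_strings(CITIES)
    print(sst_xml)
    print(indexes)
```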

Inline Strings

• No shared-strings part required
• Especially useful in XSLT scenarios
• If you’re consuming Open XML documents, you must handle both cases: inline strings and shared strings
• Excel 2007 converts inline strings to shared strings on save

<sheetData>
  <row><c t="inlineStr"><is><t>Paris</t></is></c></row>
  <row><c t="inlineStr"><is><t>Seattle</t></is></c></row>
  <row><c t="inlineStr"><is><t>London</t></is></c></row>
  <row><c t="inlineStr"><is><t>Copenhagen</t></is></c></row>
  <row><c t="inlineStr"><is><t>Paris</t></is></c></row>
  <row><c t="inlineStr"><is><t>London</t></is></c></row>
</sheetData>
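Because a consumer must handle both cases, a cell reader might branch on the cell's t attribute, as in this sketch. The sheet and shared-strings fragments are hypothetical; cell types other than shared and inline strings are returned as the raw <v> text, and phonetic runs are ignored.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

S_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
S = "{" + S_URI + "}"

# Hypothetical shared-strings part and sheet data.
SST_XML = f'<sst xmlns="{S_URI}"><si><t>Paris</t></si><si><t>Seattle</t></si></sst>'
SHEET_DATA_XML = f"""<sheetData xmlns="{S_URI}">
  <row><c t="s"><v>1</v></c></row>
  <row><c t="inlineStr"><is><t>Paris</t></is></c></row>
  <row><c><v>42</v></c></row>
</sheetData>"""


def string_item_text(item: ET.Element) -> str:
    """Text of an <si> or <is>: a plain <t>, or the <t> of each rich-text run <r>."""
    texts = item.findall(S + "t") + item.findall(f"{S}r/{S}t")
    return "".join(t.text or "" for t in texts)


def cell_values(sheet_data_xml: str, sst_xml: str) -> List[Optional[str]]:
    shared = [string_item_text(si) for si in ET.fromstring(sst_xml).findall(S + "si")]
    values: List[Optional[str]] = []
    for c in ET.fromstring(sheet_data_xml).iter(S + "c"):
        kind = c.get("t")
        if kind == "s":            # shared string: <v> holds a 0-based index
            values.append(shared[int(c.find(S + "v").text)])
        elif kind == "inlineStr":  # inline string: the text lives in <is>
            values.append(string_item_text(c.find(S + "is")))
        else:                      # numbers and other types: keep the raw <v> text
            v = c.find(S + "v")
            values.append(None if v is None else v.text)
    return values


if __name__ == "__main__":
    print(cell_values(SHEET_DATA_XML, SST_XML))
```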

SpreadsheetML Tables

Design goals for SpreadsheetML tables:
1. Separate presentation and data
   • Data stays in the worksheet
   • The table definition is in a separate part (referenced via a relationship)
2. Lightweight but extensible cell definition
   • Complex type with future storage capabilities
   • Named ranges written in their own collection instead of on each cell

Open XML has different types of tables for each document type, optimized for different scenarios:
• WordprocessingML has its tbl element
• SpreadsheetML has its table element
• PresentationML uses DrawingML tables (tbl inside graphicData)

Page 62: Open XML Deep Dive


SpreadsheetML Table Example

Worksheet part (headings are shared strings):

<sheetData>
  <row r="1" spans="1:2">
    <c r="A1" t="s"><v>0</v></c>
    <c r="B1" t="s"><v>1</v></c>
  </row>
  <row r="2" spans="1:2">
    <c r="A2"><v>1</v></c>
    <c r="B2"><v>4</v></c>
  </row>
  <row r="3" spans="1:2">
    <c r="A3"><v>2</v></c>
    <c r="B3"><v>5</v></c>
  </row>
  <row r="4" spans="1:2">
    <c r="A4"><v>3</v></c>
    <c r="B4"><v>6</v></c>
  </row>
</sheetData>
...
<tableParts count="1">
  <tablePart r:id="rId2"/>
</tableParts>

Table-definition part:

<table … ref="A1:B4" …>
  <autoFilter ref="A1:B4"/>
  <tableColumns count="2">
    <tableColumn id="1" name="Column1"/>
    <tableColumn id="2" name="Column2"/>
  </tableColumns>
  <tableStyleInfo …/>
</table>
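To read a table-definition part like the one above, a sketch such as the following collects the ref range and the tableColumn names, then checks that the range is exactly as wide as the column list (true for the two-column A1:B4 example). It uses the JDK's DOM parser; the class, field, and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: read the ref range and column names from a SpreadsheetML table-definition part.
final class TableDefinition {
    final String ref;
    final List<String> columnNames = new ArrayList<>();

    private TableDefinition(String ref) {
        this.ref = ref;
    }

    static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    static TableDefinition fromXml(String tableXml) {
        Element table = parseXml(tableXml).getDocumentElement();
        TableDefinition def = new TableDefinition(table.getAttribute("ref"));
        NodeList cols = table.getElementsByTagNameNS("*", "tableColumn");
        for (int i = 0; i < cols.getLength(); i++) {
            def.columnNames.add(((Element) cols.item(i)).getAttribute("name"));
        }
        return def;
    }

    // Column letters of an A1-style reference as a 1-based index: "A1" -> 1, "AA7" -> 27.
    static int columnIndex(String cellRef) {
        int n = 0;
        for (char ch : cellRef.toCharArray()) {
            if (!Character.isLetter(ch)) {
                break; // the row number starts here
            }
            n = n * 26 + (Character.toUpperCase(ch) - 'A' + 1);
        }
        return n;
    }

    // Number of columns covered by ref, e.g. "A1:B4" -> 2.
    int width() {
        String[] ends = ref.split(":");
        return columnIndex(ends[ends.length - 1]) - columnIndex(ends[0]) + 1;
    }
}
```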

Page 63: Open XML Deep Dive


AutoFilter Example

Page 64: Open XML Deep Dive


Formulas

Stored as plain text

Formula syntax is documented in the spec so that implementations interoperate predictably

<row><c><v>1</v></c></row>
<row><c><v>2</v></c></row>
<row><c><v>3</v></c></row>
<row><c><f>SUM(A1:A3)</f></c></row>
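Since a formula is just the text of an f element, a computed result has to come either from a cached v value written by the producing application or from the consumer evaluating the formula itself. The sketch below extracts formulas per cell and includes a deliberately tiny evaluator that understands only a single-column SUM(A1:A3)-style range; it is an illustration, not a formula engine. It assumes each c element carries its r reference, which the snippet above omits for brevity; all names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: formulas are stored as text, so results must be cached (<v>) or recomputed.
final class Formulas {
    // Toy pattern: SUM over one column, e.g. SUM(A1:A3). Anything else is rejected.
    private static final Pattern SUM = Pattern.compile("SUM\\(([A-Z]+)(\\d+):\\1(\\d+)\\)");

    static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Cell reference -> formula text, for every cell that has an <f> child.
    static Map<String, String> formulas(String sheetXml) {
        Map<String, String> out = new LinkedHashMap<>();
        NodeList cells = parseXml(sheetXml).getElementsByTagNameNS("*", "c");
        for (int i = 0; i < cells.getLength(); i++) {
            Element c = (Element) cells.item(i);
            NodeList f = c.getElementsByTagNameNS("*", "f");
            if (f.getLength() > 0) {
                out.put(c.getAttribute("r"), f.item(0).getTextContent().trim());
            }
        }
        return out;
    }

    // Cell reference -> value, for numeric cells (no t attribute, or t="n") that have a <v>.
    static Map<String, Double> numbers(String sheetXml) {
        Map<String, Double> out = new LinkedHashMap<>();
        NodeList cells = parseXml(sheetXml).getElementsByTagNameNS("*", "c");
        for (int i = 0; i < cells.getLength(); i++) {
            Element c = (Element) cells.item(i);
            String t = c.getAttribute("t");
            NodeList v = c.getElementsByTagNameNS("*", "v");
            if (v.getLength() > 0 && (t.isEmpty() || t.equals("n"))) {
                out.put(c.getAttribute("r"), Double.parseDouble(v.item(0).getTextContent().trim()));
            }
        }
        return out;
    }

    static double evalSum(String formula, Map<String, Double> numbers) {
        Matcher m = SUM.matcher(formula.trim());
        if (!m.matches()) {
            throw new UnsupportedOperationException("toy evaluator cannot handle: " + formula);
        }
        int first = Integer.parseInt(m.group(2));
        int last = Integer.parseInt(m.group(3));
        double total = 0;
        for (int row = first; row <= last; row++) {
            total += numbers.getOrDefault(m.group(1) + row, 0.0); // empty cells count as 0
        }
        return total;
    }
}
```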

Page 65: Open XML Deep Dive


DrawingML

Page 66: Open XML Deep Dive


DrawingML vs. VML
Per the Ecma spec: “VML should be considered a deprecated format included in Office Open XML for legacy reasons only.”
VML was not entirely replaced by DrawingML before submission to Ecma

Main remaining uses of VML:
WordprocessingML: OfficeArt shapes, textboxes
SpreadsheetML/PresentationML: comments, embedded OLE objects
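To check whether a given package still carries VML, one rough approach is to look for VML parts inside the ZIP container. The sketch below uses only java.util.zip (one of the ZIP libraries listed for Java earlier in this deck); matching on the .vml extension is a heuristic assumption, since the authoritative part types are declared in [Content_Types].xml. The class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Sketch: list legacy VML parts in an Open XML package by scanning ZIP entry names.
// Extension matching is a heuristic; [Content_Types].xml is the authoritative source.
final class VmlScan {
    static List<String> vmlParts(byte[] packageBytes) {
        List<String> found = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(packageBytes))) {
            // getNextEntry() skips any unread data of the previous entry
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().toLowerCase(Locale.ROOT).endsWith(".vml")) {
                    found.add(e.getName());
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return found;
    }
}
```

To scan a file on disk, pass the result of java.nio.file.Files.readAllBytes for that file's Path.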

Page 67: Open XML Deep Dive


3-D Effects

(Figure panels: Before; Apply 3-D Scene; 3-D Scene Definition; Apply 3-D Bevels; Adjust Material Types)

Page 68: Open XML Deep Dive


DrawingML
Implementation varies for each document type
Location varies (main body, drawing part, slide)
Packaging (“shim”) varies

(Figure panels: WordprocessingML in Word; SpreadsheetML in Excel; PresentationML in PowerPoint)

Page 69: Open XML Deep Dive


WordprocessingML
DrawingML is stored in the document body

Shim defines graphic frame and locked canvas

Shape definition is DrawingML

Page 70: Open XML Deep Dive


SpreadsheetML
Drawing is in a separate drawing part

Shim defines anchor position and type

Shape definition uses spreadsheetDrawing namespace for non-visual properties
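The anchor recorded by the SpreadsheetML shim ties a drawing to worksheet cells. The sketch below assumes two-cell anchors (xdr:twoCellAnchor), whose xdr:from and xdr:to markers hold zero-based col and row values; it ignores the EMU offsets and the other anchor types (oneCellAnchor, absoluteAnchor). For each anchor it reports the shape's name, taken from the spreadsheetDrawing cNvPr element, and the cell range it spans. Class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: describe two-cell anchors in a SpreadsheetML drawing part as "name: from-to".
// Offsets (colOff/rowOff) and oneCellAnchor/absoluteAnchor are deliberately ignored.
final class DrawingAnchors {

    static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Zero-based column index to letters: 0 -> "A", 25 -> "Z", 26 -> "AA".
    static String columnLetters(int col) {
        StringBuilder sb = new StringBuilder();
        for (int n = col + 1; n > 0; n = (n - 1) / 26) {
            sb.insert(0, (char) ('A' + (n - 1) % 26));
        }
        return sb.toString();
    }

    // Integer value of the first descendant with the given local name (e.g. "col" in <xdr:from>).
    static int intChild(Element parent, String localName) {
        NodeList l = parent.getElementsByTagNameNS("*", localName);
        return Integer.parseInt(l.item(0).getTextContent().trim());
    }

    // Markers are zero-based, so add 1 to the row for A1 notation.
    static String cellOf(Element marker) {
        return columnLetters(intChild(marker, "col")) + (intChild(marker, "row") + 1);
    }

    static List<String> describe(String drawingXml) {
        List<String> out = new ArrayList<>();
        NodeList anchors = parseXml(drawingXml).getElementsByTagNameNS("*", "twoCellAnchor");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            Element from = (Element) a.getElementsByTagNameNS("*", "from").item(0);
            Element to = (Element) a.getElementsByTagNameNS("*", "to").item(0);
            NodeList names = a.getElementsByTagNameNS("*", "cNvPr"); // the shape's non-visual properties
            String name = names.getLength() > 0 ? ((Element) names.item(0)).getAttribute("name") : "";
            out.add(name + ": " + cellOf(from) + "-" + cellOf(to));
        }
        return out;
    }
}
```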

Page 71: Open XML Deep Dive


PresentationML
DrawingML is stored in the slide part

No shim – the shape is in the shape tree

Shape definition is DrawingML

Page 72: Open XML Deep Dive


PresentationML
Document architecture

(Diagram of parts: Presentation; Slides; Slide Layouts; Slide Masters; Notes Slides; Notes Masters; Handout Masters; Themes; Fonts; Code; View Properties; Presentation Properties)

Page 73: Open XML Deep Dive


Sample Slide
Typical PresentationML content

(Figure: a slide containing a shape, a chart, and a textbox)

Page 74: Open XML Deep Dive


Slide Part
Shape tree contains slide content definitions

<p:sld xmlns:p="…/presentationml/2006/main"
       xmlns:a="…/drawingml/2006/main" …>
  <p:cSld>
    <p:spTree>
      <p:sp>
        <p:nvSpPr>
          <p:cNvPr id="2" name="7-Point Star 1"/>
          …
      <p:sp>
        <p:nvSpPr>
          <p:cNvPr id="3" name="TextBox 2"/>
          …
      <p:graphicFrame>
        <p:nvGraphicFramePr>
          <p:cNvPr id="4" name="Chart 3"/>
          …
    </p:spTree>
  </p:cSld>
  <p:clrMapOvr>
    <a:masterClrMapping/>
  </p:clrMapOvr>
</p:sld>

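Because every shape on the slide sits in the shape tree, listing a slide's content amounts to walking spTree. A sketch with hypothetical names: for each cNvPr element it reports the local name of the enclosing shape element (sp, graphicFrame, and so on), the id, and the name, skipping the cNvPr that describes the shape tree's own group properties.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: list the shapes in a slide's shape tree as "kind #id name".
final class ShapeTree {

    static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    static List<String> shapes(String slideXml) {
        List<String> out = new ArrayList<>();
        NodeList props = parseXml(slideXml).getElementsByTagNameNS("*", "cNvPr");
        for (int i = 0; i < props.getLength(); i++) {
            Element cNvPr = (Element) props.item(i);
            // cNvPr -> nvSpPr / nvGraphicFramePr / ... -> the shape element itself
            Node nv = cNvPr.getParentNode();
            Node shape = nv == null ? null : nv.getParentNode();
            if (shape == null || "spTree".equals(shape.getLocalName())) {
                continue; // group properties of the shape tree itself, not a shape on the slide
            }
            out.add(shape.getLocalName() + " #" + cNvPr.getAttribute("id") + " " + cNvPr.getAttribute("name"));
        }
        return out;
    }
}
```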

Page 75: Open XML Deep Dive


(Figure: the sample slide's shape, chart, and textbox, with the chart part (chart1.xml) and its data source)

Page 76: Open XML Deep Dive


PresentationML Tables
Slide part contains table definition
In a graphicFrame element
All DrawingML is in the slide – no separate “table part”

(Figure: annotated slide XML showing table position, table definition, header-row formatting, banded-row formatting, and TableStyleID = GUID)
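The header-row and banded-row formatting called out above is driven by a style GUID plus flags on the table properties. This sketch (hypothetical names, JDK DOM parser) summarizes each DrawingML a:tbl in a slide: the row count from a:tr, the column count from a:gridCol, the a:tableStyleId GUID, and the firstRow (header row) and bandRow (banded rows) attributes of a:tblPr, which are xsd:boolean values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: one summary line per DrawingML table found in a slide part.
final class SlideTables {

    static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // xsd:boolean accepts "1"/"true"; an absent attribute ("") counts as false here.
    static boolean isTrue(String value) {
        return value.equals("1") || value.equals("true");
    }

    static List<String> describe(String slideXml) {
        List<String> out = new ArrayList<>();
        NodeList tables = parseXml(slideXml).getElementsByTagNameNS("*", "tbl");
        for (int i = 0; i < tables.getLength(); i++) {
            Element tbl = (Element) tables.item(i);
            int rows = tbl.getElementsByTagNameNS("*", "tr").getLength();
            int cols = tbl.getElementsByTagNameNS("*", "gridCol").getLength();
            NodeList props = tbl.getElementsByTagNameNS("*", "tblPr");
            String style = "";
            boolean firstRow = false;
            boolean bandRow = false;
            if (props.getLength() > 0) {
                Element tblPr = (Element) props.item(0);
                NodeList ids = tblPr.getElementsByTagNameNS("*", "tableStyleId");
                style = ids.getLength() > 0 ? ids.item(0).getTextContent().trim() : "";
                firstRow = isTrue(tblPr.getAttribute("firstRow")); // header-row formatting
                bandRow = isTrue(tblPr.getAttribute("bandRow"));   // banded-row formatting
            }
            out.add(rows + "x" + cols + " style=" + style + " firstRow=" + firstRow + " bandRow=" + bandRow);
        }
        return out;
    }
}
```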

Page 77: Open XML Deep Dive


Page 78: Open XML Deep Dive


OpenXmlDeveloper.org
Formed by 40 companies to share developer information about the Office Open XML file formats
Articles with source code for C#, VB, Java, PHP, XSLT
Forums for posting technical questions

Page 79: Open XML Deep Dive


The Ecma Spec
1. Fundamentals
2. Open Packaging Convention
3. Primer (start here)
4. Markup Language Reference (huge!)
5. Markup Compatibility and Extensibility
Reference Schemas (XSD, RelaxNG)

Tips:
• Start with part 3, Primer
• Use the PDF version of part 4 to look up elements/attributes

Page 80: Open XML Deep Dive


Open XML Blogs

Brian Jones: http://blogs.msdn.com/brian_jones
Doug Mahugh: http://blogs.msdn.com/dmahugh
Kevin Boske: http://blogs.msdn.com/kboske
Wouter Van Vugt: http://blogs.infosupport.com/wouterv
Erika Ehrli: http://blogs.msdn.com/erikaehrli

See complete list on www.OpenXmlDeveloper.org

Page 81: Open XML Deep Dive
