Case Study on Redlining at the ISO
Chandi [email protected]
ISO: International Organization for Standardization
World's largest developer and publisher of International Standards
164 national standard bodies
100,000 experts in 3,484 technical bodies
Collaboration with 649 international organizations
Total of 19,977 International Standards
Single Source Publishing
Redline Publications
• Traditionally done manually
• Only for the most popular 50-60 standards documents
• Always for adjacent versions only
• High cost of production
• Delayed release
Automation Approach
• Identify the changes
• Render the changes
Track Changes Approach
Track Changes Approach: Pros
• Changes can be seen by committee members as they author or update a standard.
• Side notes can be applied to the content to explain changes.
• Supported in Microsoft Word
Track Changes Approach: Cons
• There is a requirement for editors and authors to keep track changes enabled.
• Any author or editor can “hide” changes by accepting tracked changes.
• There is a likelihood of lots of false changes being captured during intensive editing stages.
• The Word document becomes cluttered if there are lots of changes.
• No ability to redline across non-adjacent versions.
• Not everyone uses Microsoft Word, and copying and pasting from another application marks the whole section as changed.
XML Differencing
• Compare two versions of the same XML document and get an intelligent result
<someTag attributeA="ValueA" attributeB="ValueB"/>
<someTag attributeB="ValueB" attributeA="ValueA"></someTag>
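The two elements above differ only in attribute order and in empty-element syntax, neither of which is significant in XML. A minimal sketch of what an "intelligent" comparison means here, using canonicalization from the Python standard library (Python 3.8 or later). This is an illustration only, not the differencing tool used in the case study; the helper name `same_xml` is hypothetical.

```python
from xml.etree.ElementTree import canonicalize

a = '<someTag attributeA="ValueA" attributeB="ValueB"/>'
b = '<someTag attributeB="ValueB" attributeA="ValueA"></someTag>'


def same_xml(left: str, right: str) -> bool:
    """Compare two XML fragments after C14N 2.0 canonicalization.

    Canonicalization sorts attributes and writes empty elements as
    start/end tag pairs, so purely lexical differences disappear.
    """
    return canonicalize(left) == canonicalize(right)


print(a == b)          # a plain string comparison sees a difference
print(same_xml(a, b))  # the canonical forms are identical
```

A plain text diff would flag these two lines as changed; an XML-aware comparison would not.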
XML Differencing: Pros
• Not dependent on author behavior or applications
• No need to retrain contributors
• Only comparing final documents
• Can compare non-adjacent versions
XML Differencing: Cons
• More complex to implement
• Requires additional software
XML Differencing: How it works
• Output XML contains both versions
Markup                     Meaning
deltaxml:deltaV2="A=B"     The content of that element (and its children) has not changed
deltaxml:deltaV2="A"       Content is unique to document A
deltaxml:deltaV2="B"       Content is unique to document B
deltaxml:deltaV2="A!=B"    That element (or one or more of its descendants) has changed between documents
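As a rough illustration of how the markup above can be consumed, the sketch below walks a delta document and counts elements by their `deltaV2` value. The namespace URI, the sample document, and the names `MEANING` and `summarize` are assumptions made for this example, not details taken from the case study.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace URI for the deltaxml prefix; adjust to match the real delta file.
DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_V2 = f"{{{DELTA_NS}}}deltaV2"

# Labels follow the Markup/Meaning table above.
MEANING = {
    "A=B": "unchanged",
    "A": "only in A",
    "B": "only in B",
    "A!=B": "changed",
}

# Hypothetical delta document, for illustration only.
SAMPLE = f"""<doc xmlns:deltaxml="{DELTA_NS}" deltaxml:deltaV2="A!=B">
  <p deltaxml:deltaV2="A=B">Same in both</p>
  <p deltaxml:deltaV2="A">Removed paragraph</p>
  <p deltaxml:deltaV2="B">Added paragraph</p>
</doc>"""


def summarize(delta_xml: str) -> Counter:
    """Count elements per change category in a delta document."""
    counts = Counter()
    for element in ET.fromstring(delta_xml).iter():
        value = element.get(DELTA_V2)
        if value is not None:
            counts[MEANING.get(value, "unknown")] += 1
    return counts


print(summarize(SAMPLE))
```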
Creating the Redline
Redline Workflow
Rendering the Changes
• Text
• Images
• Tables
• Mathematics
Rendering: Text
<p>Unchanged text
  <deltaxml:textGroup deltaxml:deltaV2="A!=B">
    <deltaxml:text deltaxml:deltaV2="A">old/deleted text</deltaxml:text>
    <deltaxml:text deltaxml:deltaV2="B">new/added text</deltaxml:text>
  </deltaxml:textGroup>
</p>
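One way such a text group could be turned into a visible redline is to map document A text to `<del>` and document B text to `<ins>`. The sketch below assumes HTML output, an assumed namespace URI, and a hypothetical helper `render_paragraph`; the actual rendering pipeline used at ISO is not described in the slides.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed namespace URI for the deltaxml prefix.
DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"


def q(name: str) -> str:
    """Qualified name in the delta namespace, in ElementTree notation."""
    return f"{{{DELTA_NS}}}{name}"


SAMPLE = (
    f'<p xmlns:deltaxml="{DELTA_NS}">Unchanged text '
    '<deltaxml:textGroup deltaxml:deltaV2="A!=B">'
    '<deltaxml:text deltaxml:deltaV2="A">old/deleted text</deltaxml:text>'
    '<deltaxml:text deltaxml:deltaV2="B">new/added text</deltaxml:text>'
    '</deltaxml:textGroup></p>'
)


def render_paragraph(p_xml: str) -> str:
    """Render one delta paragraph as HTML with <del>/<ins> markers."""
    p = ET.fromstring(p_xml)
    parts = [escape(p.text or "")]
    for child in p:
        if child.tag == q("textGroup"):
            for text in child.findall(q("text")):
                # "A" is the old version (deleted); anything else is treated as new.
                tag = "del" if text.get(q("deltaV2")) == "A" else "ins"
                parts.append(f"<{tag}>{escape(text.text or '')}</{tag}>")
        else:
            # Other inline elements are flattened to plain text in this sketch.
            parts.append(escape("".join(child.itertext())))
        parts.append(escape(child.tail or ""))
    return "<p>" + "".join(parts) + "</p>"


print(render_paragraph(SAMPLE))
```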
Rendering: Images
Detecting Change: Images
• Change in the figure filename/file path
• Change in the image file signature
• Change to figure XMP metadata
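The three checks above could be combined along these lines. The sketch treats the "file signature" as a SHA-256 digest of the file bytes and assumes the XMP packet has already been extracted as a string; both are assumptions, and the `Figure` class and `figure_changes` function are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Figure:
    path: str      # filename or file path referenced from the XML
    data: bytes    # raw image bytes
    xmp: str = ""  # XMP metadata, assumed to be extracted elsewhere


def signature(data: bytes) -> str:
    """Content digest used here as the file signature (an assumption)."""
    return hashlib.sha256(data).hexdigest()


def figure_changes(old: Figure, new: Figure) -> list[str]:
    """Return the reasons a figure counts as changed; empty if none apply."""
    reasons = []
    if old.path != new.path:
        reasons.append("path")
    if signature(old.data) != signature(new.data):
        reasons.append("signature")
    if old.xmp != new.xmp:
        reasons.append("xmp")
    return reasons


before = Figure("fig1.png", b"abc")
after = Figure("fig1.png", b"abd")
print(figure_changes(before, after))
```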
Rendering: Tables
• Content change only: Render the same as text changes
• Structure changes: Render the same as figures
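A simple way to tell the two cases apart is to compare the shape of the table before comparing its cell text. This sketch assumes XHTML-style tables (`tr`, `td`, `th`) without namespaces and ignores row and column spans; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

CELL_TAGS = ("td", "th")


def shape(table_xml: str) -> list[int]:
    """Number of cells in each row, in document order."""
    table = ET.fromstring(table_xml)
    return [sum(1 for c in row if c.tag in CELL_TAGS) for row in table.iter("tr")]


def cells(table_xml: str) -> list[str]:
    """Text of every cell, in document order."""
    table = ET.fromstring(table_xml)
    return [
        "".join(c.itertext()).strip()
        for row in table.iter("tr")
        for c in row
        if c.tag in CELL_TAGS
    ]


def classify_table_change(old: str, new: str) -> str:
    """Return 'structure', 'content', or 'none'."""
    if shape(old) != shape(new):
        return "structure"  # render like a figure
    if cells(old) != cells(new):
        return "content"    # render like a text change
    return "none"


old = "<table><tr><td>a</td><td>b</td></tr></table>"
print(classify_table_change(old, "<table><tr><td>a</td><td>c</td></tr></table>"))
```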
BUT!!!!
• Sometimes tables are used improperly
[Figure: table example, with panels labeled "New edition" and "Change"]
Solution
• Address the editorial process and style
• Change table structure only when necessary
Rendering: Mathematics
• Compare the MathML
• Render changed equations as images
• Can lead to false positives between different versions of MathML or different ways of expressing the same equation
Rendering: Reducing Clutter
• Do not render changes that do not alter the meaning
Rendering: Rules
• addition or deletion of semi-colons
• change from hyphen to non-breaking hyphen
• change from hyphen to en rule and vice versa
• change from hyphen to em rule and vice versa
• addition or deletion of white space
• change from one type of white space to another (e.g. from space to non-breaking space)
• change from apostrophe to prime and vice versa
• change from hyphen to minus and vice versa
• change from flat text to hyperlinked text
• change from “equation” to “formula”
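Most of the rules listed above can be approximated by normalizing both versions of a text run before comparing them. The sketch below is one possible reading of those rules, not ISO's implementation: it strips all white space for comparison purposes, does not handle hyperlink markup (which lives in the XML rather than the text), and uses hypothetical names `normalize` and `is_trivial_change`.

```python
import re

# Characters treated as equivalent for comparison purposes.
_EQUIVALENTS = str.maketrans({
    "\u2011": "-",  # non-breaking hyphen
    "\u2013": "-",  # en rule
    "\u2014": "-",  # em rule
    "\u2212": "-",  # minus sign
    "\u2032": "'",  # prime
})


def _formula(match: re.Match) -> str:
    """Replace equation/Equation with formula/Formula, keeping the case."""
    return ("F" if match.group(1) == "E" else "f") + "ormula"


def normalize(text: str) -> str:
    """Collapse differences that the rules say should not be rendered."""
    text = text.translate(_EQUIVALENTS)
    text = text.replace(";", "")                     # semi-colons added or deleted
    text = re.sub(r"\b([Ee])quation", _formula, text)
    return re.sub(r"\s+", "", text)                  # any white space change


def is_trivial_change(old: str, new: str) -> bool:
    """True if the two strings are equal once the rules are applied."""
    return normalize(old) == normalize(new)


print(is_trivial_change("see equation 3; then", "see formula\u00a03 then"))
print(is_trivial_change("a, b", "a b"))
```

Commas, capitalization, and italics are deliberately left untouched, so changes of that kind still show up in the redline.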
BUT!!! Be Careful
• Addition and removal of commas
• Capitalization changes
• Changes from italic to roman or back
Challenges: Inconsistent Markup
Challenges: Lack of @id
• Because the XML is generated from scratch each time, there are no consistent id attributes
• Consistent @id values would significantly assist the redline process
Conclusions
• Automated creation of Redline Publications is possible
• If you cannot control the authoring process, the track changes approach will not work
• ISO-STS (and, by extension, JATS) provides reasonable support for XML differencing and redlining
• Consistency of tagging between versions is critical
QUESTIONS
Chandi [email protected]