36
Static Validation of XSL Transformations Static Validation of XSL Transformations Anders Møller Mads Østerby Olesen Michael I. Schwartzbach http://www.brics.dk/~amoeller/talks/xslt.pdf University of Aarhus

Static Validation of XSL Transformationscs.au.dk/~amoeller/talks/xslt.pdf · Static Validation of XSL Transformations 3 XSLT 1.0 XSLT (XSL Transformations) is designed for stylesheet

  • Upload
    others

  • View
    50

  • Download
    0

Embed Size (px)

Citation preview

Static Validation of XSL TransformationsStatic Validation of XSL Transformations

Anders MøllerMads Østerby Olesen

Michael I. Schwartzbach

http://www.brics.dk/~amoeller/talks/xslt.pdf

University of Aarhus

2Static Validation of XSL Transformations

PlanPlan

Brief summary of XSLT (1.0)

Stylesheet mining

Type checking XSLT stylesheets

3Static Validation of XSL Transformations

XSLT 1.0XSLT 1.0

XSLT (XSL Transformations) is designed for stylesheet transformations for document-centric XML languages

A declarative domain-specific languagebased on templates and pattern matchingusing XPath

An XSLT program consists of template rules, each having a pattern and a template

4Static Validation of XSL Transformations

Processing ModelProcessing Model

A source XML tree is transformed by processing its root node

A single node is processed by• finding the template rule with the best

matching pattern• instantiating its template

• may create result fragments• may select other nodes for processing

A node list is processed by processing each node and concatenating the results

5Static Validation of XSL Transformations

ExampleExample

<xsl:stylesheet version=”1.0” xmlns:xsl=”http://www.w3.org/1999/XSL/Transform”

xmlns:b=”http://businesscard.org” xmlns="http://www.w3.org/1999/xhtml">

<xsl:template match="b:card">

<html><head><title><xsl:value-of select="b:name/text()"/></title></head>

<body bgcolor="#ffffff"><table border="3">

<tr><td>

<xsl:apply-templates select="b:name"/><br/>

<xsl:apply-templates select="b:title"/><p/>

<tt><xsl:apply-templates select="b:email"/></tt><br/>

<xsl:if test="b:phone">

Phone: <xsl:apply-templates select="b:phone"/><br/>

</xsl:if>

</td><td>

<xsl:if test="b:logo"><img src="{b:logo/@uri}"/></xsl:if>

</td></tr>

</table></body>

</html>

</xsl:template>

<xsl:template match="b:name | b:title | b:email | b:phone">

<xsl:value-of select="text()"/>

</xsl:template>

</xsl:stylesheet>

6Static Validation of XSL Transformations

TemplatesTemplates

Main template constructs:literal result fragments • character data, non-XSLT elements

recursive processing• apply-templates, call-template, for-each, copy, copy-of

computed result fragments• element, attribute, value-of, ...

conditional processing• if, choose

variables and parameters• variable, param, with-param

use XPath for computing values

7Static Validation of XSL Transformations

Pattern MatchingPattern Matching

Patterns are simple XPath 1.0 expressions evaluating to node sets• they are (unions of) location paths• only child (default), attribute (@), and

descendant-or-self (//) axes are permitted

A given node N matches a given pattern P iff

∃context C: N ∈ eval(P,C)

8Static Validation of XSL Transformations

Processing ModesProcessing Modes

mode attribute on template and apply-templates

Allows a node to be processed multiple times in different ways

9Static Validation of XSL Transformations

Variables and ParametersVariables and Parameters

Allow reuse of computations and parameterization of template rules and of the entire stylesheet• values of type string, number, boolean, node-set, or

result tree fragment• static scope rules, declared globally or locally• purely declarative

Declaration: variable / paramUse: $x

Actual parameter: with-param

10Static Validation of XSL Transformations

The ChallengeThe Challenge

Given • an XSLT stylesheet S, • two DTD schemas, Din and Dout,

assuming that X is valid relative to Din,

is S applied to X always valid relative to Dout?

11Static Validation of XSL Transformations

Stylesheet MiningStylesheet Mining

How are the many features of XSLT being used?• Typical stylesheet size?• Complexity of select expressions?• Complexity of match expressions?

Obtained via Google: 499 stylesheets with a total of 186,726 lines of code

12Static Validation of XSL Transformations

Stylesheet SizesStylesheet Sizes

0

20

40

60

80

100

120

100 200 300 400 500 600 700 800 900 1K 2K 3K 4K 5K 6K 7K

lines of code

number of stylesheets

13Static Validation of XSL Transformations

Complexity of Complexity of selectselect ExpressionsExpressions

100.0%10,962Total

1.1%120nasty

0.1%9parent known

0.1%11set of parents known

0.3%31sibling axis

0.6%69set of names known

1.7%190parent and name known

2.3%250name known

3.3%365$x

0.1%8/

0.3%32..

0.4%43/a[...]/b[...]/c[...]

0.6%68@a

0.7%82a[...]/b[...]/c[...]

1.0%110/a/b/c

2.0%223a[...]

2.1%235text()

4.3%473a | b | c

6.8%740*

10.5%1,153a/b/c

30.4%3,335a

31.2%3,415default

FractionNumberCategory

14Static Validation of XSL Transformations

Complexity of Complexity of matchmatch ExpressionsExpressions

100.0%8,739Total

1.1%97nasty

0.0%1.../a:*

0.1%11.../text()

2.7%24.../@a

1.2%108.../a

2.6%225.../a[...]

2.6%225a[...]

0.0%4@n:*

0.1%11processing-instruction()

0.1%12a:*

0.2%16@*

0.3%24@a

0.6%52text()

2.0%177a | b | c

2.5%217*

2.9%256/

4.8%423a/b/c

5.3%467a[@b=‘...’]

6.0%523a/b

15.7%1,369absent

53.9%4,710a

FractionNumberCategory

15Static Validation of XSL Transformations

The XSLT Validation AlgorithmThe XSLT Validation Algorithm

Our strategy:1. reduce to core features of XSLT2. analyze flow

– apply-templates → template ?– possible context nodes when templates

are instantiated?3. construct summary graph (using Din)4. validate summary graph relative to Dout

16Static Validation of XSL Transformations

LimitationsLimitations

Not supported:text output method, disable-output-escaping

implementation-specific extensionsnamespace nodes with for-each and variables/parameters

17Static Validation of XSL Transformations

Semantics Preserving SimplificationsSemantics Preserving Simplifications

22 steps – some highlights:make defaults explicit (built-in template rules, default select, default axes, coercions, ...)insert imported/included stylesheetsconvert literal elements and attributes to element/attribute instructionsconvert text to text instructionsexpand variable uses (not parameters)reduce if to choosereduce for-each, call-template, and copy to apply-templates instructions and new template rulesmove nested templates (in when/otherwise) to new template rules

18Static Validation of XSL Transformations

Validity Preserving SimplificationsValidity Preserving Simplifications

remove all processing-instruction and comment instructions

(we can’t ignore text since DTD can constrain attribute values)

19Static Validation of XSL Transformations

Approximating SimplificationsApproximating Simplifications

12 steps – some highlights:replace each number by a value-of with xslv:unknownString()

replace each value-of expression by xslv:unknownString(), except for string(self::node()) and string(attribute::a)replace when conditions by xslv:unknownBoolean()replace name attributes in attribute and elementinstructions by {xslv:unknownString()}, except for constants and {name()}

(Note: we want to handle almost-identity transformations precisely!)

20Static Validation of XSL Transformations

Reduced XSLTReduced XSLT

The only features left:template rules with match, priority, mode, paramapply-templates with select, mode, sort, with-paramchoose where each condition is xslv:unknownBoolean()and each branch template is an apply-templatescopy-of with a parameter as argumentattribute and element whose name is a constant, {name()} or {xslv:unknownString()} and the contents of attribute is a value-ofvalue-of where the argument is xslv:unknownString(), string(self::node()) or string(attribute::a)top-level param declarations (no variables)

21Static Validation of XSL Transformations

Flow AnalysisFlow Analysis

For each apply-templates instruction, what are the possible target template rules?Which templates may be instantiated when the document root is processed?For each template rule, what are the possible types and names of context nodes when the template is instantiated?

Goal: conservative approximations (“too large” is OK)Algorithm sketch: • find entry nodes (easy)• for each apply-templates, find outgoing edges

and context sets (this is the difficult part!)• iterate until fixed point

22Static Validation of XSL Transformations

SelectSelect--Match CompatibilityMatch Compatibility

A necessary compatibility condition:There exists an XML document X valid relative to Din with nodes a, b, c, d such that

a b, b c, d c

and b is a node of type σ

If this condition is satisfied for σ∈context(n),then add an edge from the i’th apply-templatesinstruction in template rule n to template rule m

matchn selectin matchm

23Static Validation of XSL Transformations

SelectSelect--Match CompatibilityMatch Compatibility

A necessary compatibility condition (simplified):There exists an XML document X valid relative to Din with nodes a, c, d such that

a c, d c

(assumes that selectin does not start with ‘/’)

matchn / type(σ) / selectin matchm

24Static Validation of XSL Transformations

DecidabilityDecidability

Everything has been reduced to regular tree languages and regular expressions on trees, so select-match compatibility is decidable

However, building an algorithm on this presumably wouldn’t be efficient...

25Static Validation of XSL Transformations

A Pragmatic Approach, Part 1A Pragmatic Approach, Part 1

More than 90% of all select expressions are“downwards only”!!!

The set of valid downwards paths relative to Din is a regular (string) language – and making a DFA is easy!Downwards XPath locations paths can be encoded as simple regular expressions!

Select-Match compatibility test:

REGEXP(matchn / type(σ) / selectin) ∩ REGEXP(matchm) ∩ DFA(Din) ≠ Ø

26Static Validation of XSL Transformations

A Pragmatic Approach, Part 2A Pragmatic Approach, Part 2

The remaining 10%? Approximate!

Example:

s1/s2/... /si /.../snwhere si is the right-most step with a non-downwards axis

is rewritten to//si+1 /.../sn

27Static Validation of XSL Transformations

Another Pragmatic ApproachAnother Pragmatic Approach

Simulate select/match expressions on DinAdd edge if non-empty intersection of results

Compared to the first approach,• more precise on non-downwards axes

example:select = ../following-sibling::*match = b/*

• less precise on correlated downwards expressionsexample:

select = a/b/cmatch = d/c

Only add flow edges if both approaches say so

28Static Validation of XSL Transformations

Propagating Context InformationPropagating Context Information

Which contexts flows along the edges?

For the first approach: just check the incoming edges of the accept states of the resulting automaton

For the second approach:just take the intersection of the sets that result from the select/match simulation

29Static Validation of XSL Transformations

RefinementsRefinements

Modes: only add edges/context if modes match

Priorities: skip edge if always overridden

Predicates: use primitive theorem prover to avoid impossible edges

Parameters: global flow-insensitive (weak updates) – this also eliminates all copy-of instructions

30Static Validation of XSL Transformations

StatusStatus

We now have:• flow edges (apply-templates → template)

• initial template rules• context set for each template rule

Next:• construction of summary graph• validation relative to Dout

31Static Validation of XSL Transformations

Summary GraphsSummary Graphs

Also used in JWIG / XACT program analyses

Nodes ~ elements/attributes/gaps• root nodes ~ outermost

Edges ~ potential “plug” operations• template edges ~ template plugs• string edges ~ string plugs, labeled with regular languages

A summary graph can theoretically be unfolded to a (potentially infinite) set of concrete XML documents

32Static Validation of XSL Transformations

Construction of Summary GraphsConstruction of Summary Graphs

Convert each template to a summary graph fragment,relative to a context:

• element → an element node with appropriate contents• attribute → an attribute node• value-of → a gap node and a string edge• choose → a gap node with a template edge for each branch• apply-templates → ???

Elements/attributes whose name is xslv:unknownString()immediately result in validity errors being reported

Initial template rules become root nodes

33Static Validation of XSL Transformations

Connecting Summary Graph FragmentsConnecting Summary Graph Fragments

Converting apply-templates:we have the outgoing flow edges!

if select is a children-only step:• build a summary graph fragment corresponding to the

content model of the current context,• connect with flow edge targets• if sort is used, scramble order

sequence of children-only steps: handled similarly...parent::node(), /, self::node():

template edge to each flow edge targetotherwise: any order and any number of occurrences

of each flow edge target

34Static Validation of XSL Transformations

Validating Summary GraphsValidating Summary Graphs

Input: a summary graph and DoutOutput: valid?

Solution: use algorithm from JWIG / XACT !

35Static Validation of XSL Transformations

ExperimentsExperiments

17 benchmarks consisting of (stylesheet, Din, Dout),all written by others

Real errors detected in 10 triples! (total: 44 errors)• misplaced elements• undefined elements / attributes / attribute values• missing elements / attributes

Spurious errors detected in 6 triples• 90% caused by inadequate string analysis (e.g. NMTOKENs)

Soundness ensures that no errors are missed!Efficiency: 4 minutes on a 2,528 line stylesheet with

2,561 + 1,198 line DTDs, generates summary graph with 12,182 nodes

36Static Validation of XSL Transformations

ConclusionConclusion

Program analyzer for statically checking validity of output of XSLT 1.0 transformations

Main ideas:• reduce to core features• pragmatic flow analysis• exploit summary graph formalism

from JWIG / XACT