Streaming Processing of Large XML Data
Jana Dvořáková, Filip Zavoral
• processing of large XML data using XSLT with optimal memory complexity
• formal model / implementation framework
• analyzer, SSXT / BUXT transformer
SSXT - streaming transducer
• Simple Streaming XML Transducer
• no backward axis, no predicates, no variables
• order-preserving
• branch-disjoint stack / document depth
• BUXT - Buffering Transducer
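The memory behaviour of an order-preserving streaming transformation can be illustrated with a minimal sketch. This is not the SSXT implementation: it is a plain SAX handler that renames elements, and the names `RenamingHandler`, `transform` and the `rename` mapping are hypothetical. The only state it keeps is a stack of open elements, so memory follows document depth rather than document size. Attributes are dropped to keep the sketch short.

```python
import io
import xml.sax
from xml.sax.saxutils import escape


class RenamingHandler(xml.sax.ContentHandler):
    """Copies input to output in document order, renaming elements.

    The only state kept is a stack of open output element names, so memory
    grows with document depth, not with document size.
    """

    def __init__(self, rename, out):
        super().__init__()
        self.rename = rename   # hypothetical mapping: input name -> output name
        self.out = out         # any object with a write() method
        self.stack = []        # names of currently open output elements
        self.max_depth = 0     # deepest stack seen, kept for illustration

    def startElement(self, name, attrs):
        # Attributes are ignored in this sketch.
        new_name = self.rename.get(name, name)
        self.stack.append(new_name)
        self.max_depth = max(self.max_depth, len(self.stack))
        self.out.write(f"<{new_name}>")

    def endElement(self, name):
        # Close whatever output element was opened for this input element.
        self.out.write(f"</{self.stack.pop()}>")

    def characters(self, content):
        self.out.write(escape(content))


def transform(xml_text, rename):
    """Returns (output XML, maximum stack depth reached)."""
    out = io.StringIO()
    handler = RenamingHandler(rename, out)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return out.getvalue(), handler.max_depth
```

For example, `transform("<a><b>x</b><b>y</b></a>", {"b": "c"})` writes each output tag as soon as the matching input tag is read and never holds more than two stack entries.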
Xord framework - Analyzer
• Analyzer
– XSLT & XSD: virtually applies templates to schema
– all possible node sequences are processed
• regexp
– all possible node sequences selected by XPath expressions
– possible reading orders of the elements
• names
– sequence of element names in the order they are called
– represents the processing order of the elements
SSXT Transformer
• Polymorphic stack
– two types of transformation states - DFA & CC
– related to current document level
• sequence of deterministic finite automata states
– concurrent evaluation of XPath expressions
– single DFA for each expression
– start-tag → DFA transition
– final state → template call
• cycle configuration
– template and template call being processed
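The DFA part of the stack can be sketched as follows. This is a simplified illustration under stated assumptions, not the SSXT transformer itself: expressions are restricted to absolute child-axis paths such as `/a/b`, the cycle configuration (CC) states are not modelled, and `PathDFA`, `DFAStackHandler` and `run` are hypothetical names. Each stack entry holds one state per expression, a start-tag triggers a transition in every DFA, and reaching a final state calls the associated template (here a plain callback).

```python
import xml.sax

DEAD = -1  # state reached when an expression can no longer match


class PathDFA:
    """DFA for an absolute child-axis path such as /a/b (an assumed XPath subset)."""

    def __init__(self, path):
        self.steps = path.strip("/").split("/")

    def step(self, state, name):
        # State i means "the first i steps of the path have matched".
        if state != DEAD and state < len(self.steps) and self.steps[state] in (name, "*"):
            return state + 1
        return DEAD

    def is_final(self, state):
        return state == len(self.steps)


class DFAStackHandler(xml.sax.ContentHandler):
    def __init__(self, templates):
        super().__init__()
        # templates: hypothetical mapping from path expression to callback
        self.dfas = [(PathDFA(path), cb) for path, cb in templates.items()]
        # One stack entry per open element; each entry holds one state per DFA.
        self.stack = [tuple(0 for _ in self.dfas)]

    def startElement(self, name, attrs):
        current = self.stack[-1]
        # start-tag -> one transition in every DFA, evaluated concurrently
        nxt = tuple(dfa.step(s, name) for (dfa, _), s in zip(self.dfas, current))
        self.stack.append(nxt)
        # final state -> template call
        for (dfa, callback), state in zip(self.dfas, nxt):
            if dfa.is_final(state):
                callback(name)

    def endElement(self, name):
        # Leaving the element restores the states of the parent level.
        self.stack.pop()


def run(xml_text, paths):
    """Returns the (path, element name) pairs in the order templates are called."""
    calls = []
    templates = {p: (lambda name, p=p: calls.append((p, name))) for p in paths}
    xml.sax.parseString(xml_text.encode("utf-8"), DFAStackHandler(templates))
    return calls
```

Calling `run("<a><b/><c><b/></c><b/></a>", ["/a/b", "/a/c/b"])` reports the matches in document order; the stack never grows beyond the depth of the input plus the initial entry.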
Evaluation & Comparison
Figure: Memory consumption (MB) of the SSXT algorithm and tree-based XSLT processors for input XML data of different sizes (DBLP.xml ≈ 700 MB). Axes: input size in elements (10K, 30K, 100K, 300K, 1M, 10M) against memory in MB (0 to 20). Series: Saxon, Xerces, LibXslt, SSXT. The value labels 92 and 168 also appear in the plot.
Future work
• buffering transformer optimizations and evaluation
• multipass streaming algorithms
• overcoming some restrictions on XSLT constructs