XML Processing in Scala
William Narmontas
Dino Fancellu
www.scala.contractors
XML LONDON 2014
Dino Fancellu
35 years IT
Scala • Java • XML
William Narmontas
10 years IT
Scala • XML • Web
What is Scala?
Scala processes XML fast
It is powerful
Concise
Functional
Object-oriented
Modular
Statically-typed
Strongly-typed
Type-safe
Performant
Java-interoperable
Composable
Unopinionated
First-class XML
Who uses Scala?
Apple
Bank of America
Barclays
BBC
BSkyB
Cisco
Citigroup
Credit Suisse
Morgan Stanley
Netflix
Novell
Rackspace
Sky
Sony
Springer
eBay
eHarmony
EDF
FourSquare
Gawker
HSBC
ITV
Klout
The Guardian
TomTom
Trafigura
Tumblr
UBS
VMware
Xerox
Projects in Scala
- Less code to write = less to maintain
- Communication clearer
- Testing easier
- Software robust
- Time to market: fast
- Happier developers
Scala language: Intro
Values
Scala
val conferenceName = "XML London 2014"
XQuery
let $conferenceName := "XML London 2014"
Scala (Mutable)
var conferenceName = "XML London 2014"
conferenceName = "XML London 2015"
Strings
val language = "Scala"
s"XML Processing in $language"
| XML Processing in Scala
s"""An introduction to:
|The "$language" programming language""".stripMargin
| An introduction to:
| The "Scala" programming language
s"$language has ${language.length} chars in its name"
| Scala has 5 chars in its name
Functions
Scala
def fun(x: Int, y: Double) =
s"$x: $y"
XQuery
declare function local:fun(
$x as xs:integer, $y as xs:double
) as xs:string {
concat($x, ": ", $y)
};
Everything is an expression
val trainSpeed =
if ( train.speed.mph >= 60 ) "Fast"
else "Slow"
def divide(numerator: Int, denominator: Int) =
try {
s"${numerator/denominator}"
} catch {
case _: java.lang.ArithmeticException =>
s"Cannot divide $numerator by $denominator"
}
Types: Explicit
def withTitle(name: String, title: String): String =
s"$title. $name"
val x: Int = {
val y = 1000
100 + y
}
| x: Int = 1100
Functions: named parameters
Further clarity in method calls:
def makeLink(url: String, text: String) =
s"""<a href="$url">$text</a>"""
makeLink(text = "XML London 2014", url = "http://www.xmllondon.com")
| <a href="http://www.xmllondon.com">XML London 2014</a>
Functions: default parameters
Reduce repetition in method calls:
def withTitle(name: String, title: String = "Mr") =
s"$title. $name"
withTitle("John Smith")
| Mr. John Smith
withTitle("Mary Smith", "Miss")
| Miss. Mary Smith
Functional
def incrementedByOne(x: Int) = x + 1
(1 to 5).map(incrementedByOne)
| Vector(2, 3, 4, 5, 6)
Lambdas
(1 to 5).map(x => x + 1)
| Vector(2, 3, 4, 5, 6)
(1 to 5).map(_ + 1)
| Vector(2, 3, 4, 5, 6)
For comprehensions
for { x <- (1 to 5) }
yield x + 1
| Vector(2, 3, 4, 5, 6)
Implicit classes: Enrich types
implicit class stringWrapper(str: String) {
def wrapWithParens = s"($str)"
}
"Text".wrapWithParens
| (Text)
Powerful features for scalability
- Case classes
- Traits
- Partial functions
- Pattern matching
- Implicits
- Flexible Syntax
- Generics
- User defined operators
- Call-by-name
- Macros
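A few of the features listed above can be sketched together. This is a minimal illustration only: the `Shape` model and every name in it are hypothetical and do not come from the talk. It shows a trait, two case classes, pattern matching, and a partial function used with `collect`.

```scala
object FeaturesSketch {
  // Hypothetical domain model: a trait with two case class implementations.
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rect(width: Double, height: Double) extends Shape

  // Pattern matching deconstructs each case class into its fields.
  def area(shape: Shape): Double = shape match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  // A partial function is defined only for some inputs (here: rectangles).
  val rectWidth: PartialFunction[Shape, Double] = {
    case Rect(w, _) => w
  }

  // collect keeps only the elements for which the partial function is defined.
  val widths: List[Double] =
    List[Shape](Circle(1.0), Rect(2.0, 3.0), Rect(4.0, 1.0)).collect(rectWidth)
}
```

Because `Shape` is sealed, the compiler can warn when a `match` forgets one of the cases.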
Scala & XML
Values: Inline XML
val url = "http://www.xmllondon.com"
val title = "XML London 2014"
val xmlTree = <div>
<p>Welcome to <a href={url}>{title}</a>!</p>
</div>
| xmlTree: scala.xml.Elem =
| <div>
| <p>Welcome to <a href="http://www.xmllondon.com">XML London 2014</a>!</p>
| </div>
XML Lookups
val listOfPeople = <people>
<person>Fred</person>
<person>Ron</person>
<person>Nigel</person>
</people>
listOfPeople \ "person"
| NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)
listOfPeople \ "_"
| NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)
XML Lookups
val fact = <fact type="universal">
<variable>A</variable> = <variable>A</variable>
</fact>
fact \\ "variable"
| NodeSeq(<variable>A</variable>, <variable>A</variable>)
fact \ "@type"
| : scala.xml.NodeSeq = universal
fact \@ "type"
| : String = universal
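The difference between the lookup operators is easiest to see on a nested document. The sketch below assumes Scala 2 with the scala-xml module on the classpath; the `library` document is hypothetical and not from the talk.

```scala
import scala.xml._

object LookupSketch {
  // Hypothetical document: one book nested in a shelf, one directly under the root.
  val library =
    <library name="city">
      <shelf><book id="b1">Dune</book></shelf>
      <book id="b2">Emma</book>
    </library>

  // \ looks at direct children only, so it finds just the second book.
  val directBooks: NodeSeq = library \ "book"

  // \\ searches all descendants, in document order.
  val allBooks: NodeSeq = library \\ "book"

  // \@ returns the attribute value as a String.
  val ids: List[String] = allBooks.map(_ \@ "id").toList

  // A missing attribute yields an empty String rather than an error.
  val missing: String = library \@ "nope"
}
```

In short: `\` is closest to the XPath child axis, `\\` to the descendant axis, and `\@` is a shortcut for reading an attribute as text.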
XML Loading
val pun = """<pun rating="extreme">
| <question>Why do CompSci students need glasses?</question>
| <answer>To C#<!-- C# is Microsoft's programming language -->.</answer>
|</pun>""".stripMargin
scala.xml.XML.loadString(pun)
| <pun rating="extreme">
| <question>Why do CompSci students need glasses?</question>
| <answer>To C#.</answer>
| </pun>
Collections: expressive
val root = <numbers>
{for {i <- 1 to 10} yield
<number>{i}</number>}
</numbers>
val numbers = root \ "number"
numbers(0)
| <number>1</number>
numbers.head
| <number>1</number>
numbers.last
| <number>10</number>
numbers take 3
| NodeSeq(<number>1</number>, <number>2</number>, <number>3</number>)
Collections: expressive
numbers filter (_.text.toInt > 6)
| NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)
numbers(_.text.toInt > 6)
| NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)
numbers maxBy (_.text)
| <number>9</number>
numbers maxBy (_.text.toInt)
| <number>10</number>
numbers.reverse
| NodeSeq(<number>10</number>, <number>9</number>, <number>8</number>, <number>7</number>, <number>6</number>,
| <number>5</number>, <number>4</number>, <number>3</number>, <number>2</number>, <number>1</number>)
numbers.groupBy(_.text.toInt % 3)
| Map(
| 2 -> NodeSeq(<number>2</number>, <number>5</number>, <number>8</number>),
| 1 -> NodeSeq(<number>1</number>, <number>4</number>, <number>7</number>, <number>10</number>),
| 0 -> NodeSeq(<number>3</number>, <number>6</number>, <number>9</number>))
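Because a NodeSeq behaves like any other Scala collection, aggregation needs no special XML API. The following is a small sketch under the assumption of Scala 2 with scala-xml; the `order` document and its attribute names are hypothetical.

```scala
import scala.xml._

object CollectionsSketch {
  // Hypothetical order document.
  val order =
    <order>
      <item price="2.50" qty="4">Tea</item>
      <item price="10.00" qty="1">Pot</item>
      <item price="0.75" qty="2">Spoon</item>
    </order>

  val items: NodeSeq = order \ "item"

  // Ordinary map and sum: price times quantity for each item, then the total.
  val total: BigDecimal =
    items.map(i => BigDecimal(i \@ "price") * (i \@ "qty").toInt).sum

  // Ordinary map and sorted on the text content.
  val names: List[String] = items.map(_.text).toList.sorted

  // The results can be spliced straight back into new XML.
  val summary: Elem =
    <summary total={total.toString}>{names.map(n => <name>{n}</name>)}</summary>
}
```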
XML Methods: a rich API
%  ++  ++:  +:  /:  /:\
:+  :\  \  \@  \\  addString
aggregate  andThen  apply  applyOrElse  asInstanceOf  attribute
attributes  buildString  canEqual  child  collect  collectFirst
combinations  companion  compose  contains  containsSlice  copy
copyToArray  copyToBuffer  corresponds  count  descendant  descendant_or_self
diff  distinct  doCollectNamespaces  doTransform  drop  dropRight
dropWhile  endsWith  exists  filter  filterNot  find
flatMap  flatten  fold  foldLeft  foldRight  forall
foreach  genericBuilder  getNamespace  groupBy  grouped  hasDefiniteSize
head  headOption  indexOf  indexOfSlice  indexWhere  indices
init  inits  intersect  isAtom  isDefinedAt  isEmpty
isInstanceOf  isTraversableAgain  iterator  label  last  lastIndexOf
lastIndexOfSlice  lastIndexWhere  lastOption  length  lengthCompare  lift
map  max  maxBy  min  minBy  minimizeEmpty
mkString  nameToString  namespace  nonEmpty  nonEmptyChildren  orElse
padTo  par  partition  patch  permutations  prefix
prefixLength  product  reduce  reduceLeft  reduceLeftOption  reduceOption
reduceRight  reduceRightOption  repr  reverse  reverseIterator  reverseMap
runWith  sameElements  scan  scanLeft  scanRight  scope
segmentLength  seq  size  slice  sliding  sortBy
sortWith  sorted  span  splitAt  startsWith  strict_!=
strict_==  stringPrefix  sum  tail  tails  take
takeRight  takeWhile  text  theSeq  to  toArray
toBuffer  toIndexedSeq  toIterable  toIterator  toList  toMap
toSeq  toSet  toStream  toString  toTraversable  toVector
transpose  union  unzip  unzip3  updated  view
withFilter  xmlType  xml_!=  xml_==  xml_sameElements  zip
zipAll  zipWithIndex
For-comprehensions: similar to XQuery
... yet is general purpose
Nice!
<bib>{
for $b in $xml/book
let $year := $b/@year
where $b/publisher = "Addison-Wesley" and
$year > 1991
return <book year="{ $year }">
{ $b/title }
</book>
}</bib>
<bib>{ for {
b <- xml \ "book"
year = b \@ "year"
if b \ "publisher" === "Addison-Wesley" &&
year > 1991
} yield <book year={ year }>
{ b \ "title" }
</book>
}</bib>
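The Scala version on the slide appears to rely on helpers that are not shown: `===` is not part of the standard scala-xml API as far as we know, and `\@` returns a String, so `year > 1991` needs a conversion. The sketch below is one way to make the same query self-contained, assuming Scala 2 with scala-xml; the input document and its book titles are hypothetical.

```scala
import scala.xml._

object BibSketch {
  // Hypothetical input in the shape the query expects.
  val xml =
    <bib>
      <book year="1994"><title>Book A</title><publisher>Addison-Wesley</publisher></book>
      <book year="1992"><title>Book B</title><publisher>Addison-Wesley</publisher></book>
      <book year="2000"><title>Book C</title><publisher>Other Press</publisher></book>
      <book year="1990"><title>Book D</title><publisher>Addison-Wesley</publisher></book>
    </bib>

  val result: Elem =
    <bib>{
      for {
        b <- xml \ "book"
        // Convert the attribute text to Int so it can be compared numerically.
        year = (b \@ "year").toInt
        // Plain text comparison stands in for the slide's === helper.
        if (b \ "publisher").text == "Addison-Wesley" && year > 1991
      } yield <book year={year.toString}>{b \ "title"}</book>
    }</bib>
}
```

Only Book A and Book B satisfy both conditions, so `result` contains two `book` elements, in document order.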
Hybrid XML
- XQuery for Scala
- javax.xml.* for free
- Look up: XPath
- Transform: XSLT
- Stream: StAX
XQuery for Scala (XQS)
- Wraps XQuery API for Java (javax.xml.xquery)
- Scala access to XQuery in:
- MarkLogic, BaseX, Saxon, Sedna, eXist, …
- Converts DOM to Scala XML & vice versa
- http://github.com/fancellu/xqs
XQuery via XQS
val widgets = <widgets>
<widget>Menu</widget>
<widget>Status bar</widget>
<widget id="panel-1">Panel</widget>
<widget id="panel-2">Panel</widget>
</widgets>
import com.felstar.xqs.XQS._
val conn = new net.xqj.basex.local.BaseXXQDataSource().getConnection
val nodes: NodeSeq = conn("for $w in /widgets/widget order by $w return $w", widgets)
| NodeSeq(<widget>Menu</widget>, <widget id="panel-1">Panel</widget>,
| <widget id="panel-2">Panel</widget>, <widget>Status bar</widget>)
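XPath
import com.felstar.xqs.XQS._
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import scala.xml.NodeSeq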
val widgets = <widgets>
<widget>Menu</widget>
<widget>Status bar</widget>
<widget id="panel-1">Panel</widget>
<widget id="panel-2">Panel</widget>
</widgets>
val xpath = XPathFactory.newInstance().newXPath()
val nodes = xpath.evaluate("/widgets/widget[not(@id)]", toDom(widgets),
XPathConstants.NODESET).asInstanceOf[NodeList]
(nodes: NodeSeq)
| NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
Natively in Scala:
(widgets \ "widget")(widget => (widget \ "@id").isEmpty)
| NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
XSLT
val stylesheet = <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="john">
<xsl:copy>Hello, John.</xsl:copy>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
val peopleXml = <people>
<john>Hello, John.</john>
<smith>Smith is here.</smith>
<another>Hello.</another>
</people>
import com.felstar.xqs.XQS._
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.StreamResult
val xmlResultResource = new java.io.StringWriter()
val xmlTransformer = TransformerFactory.newInstance().newTransformer(stylesheet)
xmlTransformer.transform(peopleXml, new StreamResult(xmlResultResource))
xmlResultResource.getBuffer
| <?xml version="1.0" encoding="UTF-8"?><people>
| <john>Hello, John.</john>
| <smith>Smith is here.</smith>
| <another>Hello.</another>
| </people>
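XML Stream Processing
// 4GB file, comes back in a second
import scala.io.Source
import javax.xml.stream.{XMLEventReader, XMLInputFactory}
import javax.xml.stream.events.XMLEvent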
val src = Source.fromURL("http://dumps.wikimedia.org/enwiki/20140402/enwiki-20140402-abstract.xml")
val er = XMLInputFactory.newInstance().createXMLEventReader(src.reader)
implicit class XMLEventIterator(ev:XMLEventReader) extends scala.collection.Iterator[XMLEvent]{
def hasNext = ev.hasNext
def next = ev.nextEvent()
}
er.dropWhile(!_.isStartElement).take(10).zipWithIndex.foreach {
case (ev, idx) => println(s"${idx+1}:\t$ev") }
src.close()
| 1: <feed>
| 2:
|
| 3: <doc>
| 4:
|
| 5: <title>
| 6: Wikipedia: Anarchism
| 7: </title>
| 8:
|
| 9: <url>
| 10:
http://en.wikipedia.org/wiki/Anarchism
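The same iterator idea can be tried without downloading anything. The sketch below uses only the StAX classes that ship with the JDK and reads a small in-memory string; the `feed` content and the `titles` helper are hypothetical, written for illustration. It stops pulling events as soon as enough titles have been found, which is what makes the approach suitable for very large files.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLEventReader, XMLInputFactory}
import javax.xml.stream.events.XMLEvent

object StaxSketch {
  // Adapts a StAX event reader to a Scala Iterator, as on the slide.
  class XMLEventIterator(ev: XMLEventReader) extends Iterator[XMLEvent] {
    def hasNext: Boolean = ev.hasNext
    def next(): XMLEvent = ev.nextEvent()
  }

  // Hypothetical miniature feed standing in for a large file.
  val feed: String =
    "<feed><doc><title>One</title></doc><doc><title>Two</title></doc>" +
      "<doc><title>Three</title></doc></feed>"

  // Returns the text of the first `limit` title elements, reading lazily.
  def titles(xml: String, limit: Int): List[String] = {
    val factory = XMLInputFactory.newInstance()
    // Ask the parser to deliver adjacent character data as a single event.
    factory.setProperty(XMLInputFactory.IS_COALESCING, java.lang.Boolean.TRUE)
    val reader = factory.createXMLEventReader(new StringReader(xml))
    try {
      new XMLEventIterator(reader)
        .sliding(2) // pair each event with the one that follows it
        .collect {
          case Seq(start, text)
              if start.isStartElement &&
                start.asStartElement.getName.getLocalPart == "title" &&
                text.isCharacters =>
            text.asCharacters.getData
        }
        .take(limit)
        .toList
    } finally reader.close()
  }
}
```

For example, `StaxSketch.titles(StaxSketch.feed, 2)` reads only as far as the second title and never parses the third `doc`.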
Use Cases
- Data extraction
- Serving XML via REST
- Dynamically generated XSLT
- Interfacing with XML databases
- Flexibility to choose the best tool for the job
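As an illustration of the "dynamically generated XSLT" use case, a stylesheet can be built with an ordinary Scala function and then run through the JDK's javax.xml.transform API. This is only a sketch under stated assumptions: Scala 2 with scala-xml, the XSLT 1.0 processor bundled with the JDK, and hypothetical helper names (`renameStylesheet`, `transform`). It converts via strings rather than the XQS implicit conversions used earlier.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}
import scala.xml._

object DynamicXsltSketch {
  // Generates a stylesheet that renames element `from` to `to`
  // and copies everything else unchanged.
  def renameStylesheet(from: String, to: String): Elem =
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match={from}>
        <xsl:element name={to}><xsl:apply-templates select="node()|@*"/></xsl:element>
      </xsl:template>
      <xsl:template match="node()|@*">
        <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

  // Applies a stylesheet to an input document, going through strings.
  def transform(stylesheet: Elem, input: Elem): Elem = {
    val out = new StringWriter()
    val transformer = TransformerFactory.newInstance()
      .newTransformer(new StreamSource(new StringReader(stylesheet.toString)))
    transformer.transform(
      new StreamSource(new StringReader(input.toString)),
      new StreamResult(out))
    XML.loadString(out.toString)
  }
}
```

Calling `transform(renameStylesheet("john", "person"), <people><john>Hi</john></people>)` should yield a `people` element whose child is now named `person`.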
Excellent Ecosystem
SBT
ScalaTest
scala-xml
macro-paradise
Akka
Spray
scalaz
shapeless
JVM
scala-maven-plugin
Spark
Scaladin
Specs
Conclusion
- Practical
- Practical for XML processing
Where do I start?
- atomicscala.com
- typesafe.com/activator
- scala-lang.org
- scala-ide.org
- IntelliJ
Matt Stephens Charles Foster
Open to consulting
www.scala.contractors
Follow us on Twitter:
@DinoFancellu
@ScalaWilliam
@MaffStephens