XML Processing in Scala, William Narmontas and Dino Fancellu, www.scala.contractors, XML London 2014

XML Processing in Scala (XML London 2014)


Page 1: XML Processing in Scala (XML London 2014)

XML Processing in Scala

William Narmontas

Dino Fancellu

www.scala.contractors

XML LONDON 2014

Page 2: XML Processing in Scala (XML London 2014)

Dino Fancellu

35 years IT

Scala • Java • XML

William Narmontas

10 years IT

Scala • XML • Web

Page 3: XML Processing in Scala (XML London 2014)

What is Scala?

Page 4: XML Processing in Scala (XML London 2014)

Scala processes XML fast

Page 5: XML Processing in Scala (XML London 2014)
Page 6: XML Processing in Scala (XML London 2014)
Page 7: XML Processing in Scala (XML London 2014)

It is powerful

Page 8: XML Processing in Scala (XML London 2014)

Concise

Functional

Object-oriented

Modular

Statically-typed

Strongly-typed

Type-safe

Performant

Java-interoperable

Composable

Unopinionated

First-class XML

Page 9: XML Processing in Scala (XML London 2014)

Who uses Scala?

Apple

Bank of America

Barclays

BBC

BSkyB

Cisco

Citigroup

Credit Suisse

LinkedIn

Morgan Stanley

Netflix

Novell

Rackspace

Sky

Sony

Springer

eBay

eHarmony

EDF

FourSquare

Gawker

HSBC

ITV

Klout

The Guardian

TomTom

Trafigura

Tumblr

Twitter

UBS

VMware

Xerox

Page 10: XML Processing in Scala (XML London 2014)

Projects in Scala

- Less code to write = less to maintain

- Communication clearer

- Testing easier

- Software robust

- Time to market: fast

- Happier developers

Page 11: XML Processing in Scala (XML London 2014)

Scala language: Intro

Page 12: XML Processing in Scala (XML London 2014)

Values

Scala

val conferenceName = "XML London 2014"

XQuery

let $conferenceName := "XML London 2014"

Scala (Mutable)

var conferenceName = "XML London 2014"

conferenceName = "XML London 2015"

Page 13: XML Processing in Scala (XML London 2014)

Strings

val language = "Scala"

s"XML Processing in $language"

| XML Processing in Scala

s"""An introduction to:

|The "$language" programming language""".stripMargin

| An introduction to:

| The "Scala" programming language

s"$language has ${language.length} chars in its name"

| Scala has 5 chars in its name

Page 14: XML Processing in Scala (XML London 2014)

Functions

Scala

def fun(x: Int, y: Double) =

s"$x: $y"

XQuery

declare function local:fun(

$x as xs:integer, $y as xs:double

) as xs:string {

concat($x, ": ", $y)

};

Page 15: XML Processing in Scala (XML London 2014)

Everything is an expression

val trainSpeed =

if ( train.speed.mph >= 60 ) "Fast"

else "Slow"

def divide(numerator: Int, denominator: Int) =

try {

s"${numerator/denominator}"

} catch {

case _: java.lang.ArithmeticException =>

s"Cannot divide $numerator by $denominator"

}

Page 16: XML Processing in Scala (XML London 2014)

Types: Explicit

def withTitle(name: String, title: String): String =

s"$title. $name"

val x: Int = {

val y = 1000

100 + y

}

| x: Int = 1100

Page 17: XML Processing in Scala (XML London 2014)

Functions: named parameters

Further clarity in method calls:

def makeLink(url: String, text: String) =

s"""<a href="$url">$text</a>"""

makeLink(text = "XML London 2014", url = "http://www.xmllondon.com")

| <a href="http://www.xmllondon.com">XML London 2014</a>

Page 18: XML Processing in Scala (XML London 2014)

Functions: default parameters

Reduce repetition in method calls:

def withTitle(name: String, title: String = "Mr") =

s"$title. $name"

withTitle("John Smith")

| Mr. John Smith

withTitle("Mary Smith", "Miss")

| Miss. Mary Smith

Page 19: XML Processing in Scala (XML London 2014)

Functional

def incrementedByOne(x: Int) = x + 1

(1 to 5).map(incrementedByOne)

| Vector(2, 3, 4, 5, 6)

Page 20: XML Processing in Scala (XML London 2014)

Lambdas

(1 to 5).map(x => x + 1)

| Vector(2, 3, 4, 5, 6)

(1 to 5).map(_ + 1)

| Vector(2, 3, 4, 5, 6)

Page 21: XML Processing in Scala (XML London 2014)

For comprehensions

for { x <- (1 to 5) }

yield x + 1

| Vector(2, 3, 4, 5, 6)

Page 22: XML Processing in Scala (XML London 2014)

Implicit classes: Enrich types

implicit class stringWrapper(str: String) {

def wrapWithParens = s"($str)"

}

"Text".wrapWithParens

| (Text)

Page 23: XML Processing in Scala (XML London 2014)

Powerful features for scalability

- Case classes

- Traits

- Partial functions

- Pattern matching

- Implicits

- Flexible Syntax

- Generics

- User defined operators

- Call-by-name

- Macros
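
A few of the features listed above can be sketched together in a handful of lines. The example below is illustrative only and is not taken from the talk: the names `Shape`, `Circle`, `Rect`, `area` and `rectAreas` are hypothetical. It shows a sealed trait with case classes, pattern matching as an expression, and a partial function used with `collect`.

```scala
object FeatureSketch {
  // A sealed trait plus case classes gives a closed, pattern-matchable hierarchy.
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rect(width: Double, height: Double) extends Shape

  // Pattern matching is an expression, so it can be the whole method body.
  def area(shape: Shape): Double = shape match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  // A partial function is defined only for some inputs (here: rectangles).
  val rectArea: PartialFunction[Shape, Double] = {
    case Rect(w, h) => w * h
  }

  val shapes: List[Shape] = List(Circle(1.0), Rect(2.0, 3.0))

  // collect applies the partial function where it is defined and skips the rest.
  val rectAreas: List[Double] = shapes.collect(rectArea)
}
```

Because `Shape` is sealed, the compiler can warn when a `match` forgets one of the cases.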

Page 24: XML Processing in Scala (XML London 2014)

Scala & XML

Page 25: XML Processing in Scala (XML London 2014)

Values: Inline XML

val url = "http://www.xmllondon.com"

val title = "XML London 2014"

val xmlTree = <div>

<p>Welcome to <a href={url}>{title}</a>!</p>

</div>

| xmlTree: scala.xml.Elem =

| <div>

| <p>Welcome to <a href="http://www.xmllondon.com">XML London 2014</a>!</p>

| </div>

Page 26: XML Processing in Scala (XML London 2014)

XML Lookups

val listOfPeople = <people>

<person>Fred</person>

<person>Ron</person>

<person>Nigel</person>

</people>

listOfPeople \ "person"

| NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)

listOfPeople \ "_"

| NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)

Page 27: XML Processing in Scala (XML London 2014)

XML Lookups

val fact = <fact type="universal">

<variable>A</variable> = <variable>A</variable>

</fact>

fact \\ "variable"

| NodeSeq(<variable>A</variable>, <variable>A</variable>)

fact \ "@type"

| : scala.xml.NodeSeq = universal

fact \@ "type"

| : String = universal
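
The lookup operators above differ in how deep they search and in what they return when nothing matches. The sketch below uses a made-up `library` document (an assumption, not from the slides) to contrast `\` (direct children), `\\` (descendants at any depth), `\@` (attribute value as a `String`) and `attribute` (an `Option`).

```scala
import scala.xml.Elem

object LookupSketch {
  // Hypothetical sample document used only for this illustration.
  val library: Elem =
    <library><shelf id="s1"><book>Scala</book></shelf><book>XQuery</book></library>

  // \ looks at direct children only, so the nested book is not found.
  val childBooks: Seq[String] = (library \ "book").map(_.text)

  // \\ searches all descendants, in document order.
  val allBooks: Seq[String] = (library \\ "book").map(_.text)

  // \@ yields an empty string when the attribute is absent.
  val missingId: String = library \@ "id"

  // attribute returns an Option, which makes absence explicit.
  val shelfId: Option[String] =
    (library \ "shelf").headOption.flatMap(_.attribute("id")).map(_.text)
}
```

When a missing attribute must be told apart from an empty one, the `Option`-returning `attribute` is the safer of the two.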

Page 28: XML Processing in Scala (XML London 2014)

XML Loading

val pun = """<pun rating="extreme">

| <question>Why do CompSci students need glasses?</question>

| <answer>To C#<!-- C# is Microsoft's programming language -->.</answer>

|</pun>""".stripMargin

scala.xml.XML.loadString(pun)

| <pun rating="extreme">

| <question>Why do CompSci students need glasses?</question>

| <answer>To C#.</answer>

| </pun>
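
`scala.xml.XML.loadString` throws an exception on malformed input. One possible way to handle that, sketched here as an assumption rather than as the approach used in the talk, is to wrap the call in `scala.util.Try`. The helper name `parse` and both sample strings are hypothetical.

```scala
import scala.util.Try
import scala.xml.{Elem, XML}

object LoadSketch {
  // Wrap parsing so that a malformed document becomes a Failure value
  // instead of an exception escaping to the caller.
  def parse(text: String): Try[Elem] = Try(XML.loadString(text))

  // Well-formed input parses successfully.
  val ok: Try[Elem] = parse("""<pun rating="extreme"><answer>To C#.</answer></pun>""")

  // The <answer> element is never closed, so the parser rejects this input.
  val broken: Try[Elem] = parse("<pun><answer>To C#.</pun>")

  // Values can be read out of the successful result with map / getOrElse.
  val rating: String = ok.map(_ \@ "rating").getOrElse("unknown")
}
```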

Page 29: XML Processing in Scala (XML London 2014)

Collections: expressive

val root = <numbers>

{for {i <- 1 to 10} yield

<number>{i}</number>}

</numbers>

val numbers = root \ "number"

numbers(0)

| <number>1</number>

numbers.head

| <number>1</number>

numbers.last

| <number>10</number>

numbers take 3

| NodeSeq(<number>1</number>, <number>2</number>, <number>3</number>)

Page 30: XML Processing in Scala (XML London 2014)

Collections: expressive

numbers filter (_.text.toInt > 6)

| NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)

numbers(_.text.toInt > 6)

| NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)

numbers maxBy (_.text)

| <number>9</number>

numbers maxBy (_.text.toInt)

| <number>10</number>

numbers.reverse

| NodeSeq(<number>10</number>, <number>9</number>, <number>8</number>, <number>7</number>, <number>6</number>,

<number>5</number>, <number>4</number>, <number>3</number>, <number>2</number>, <number>1</number>)

numbers.groupBy(_.text.toInt % 3)

| Map(

| 2 -> NodeSeq(<number>2</number>, <number>5</number>, <number>8</number>),

| 1 -> NodeSeq(<number>1</number>, <number>4</number>, <number>7</number>, <number>10</number>),

| 0 -> NodeSeq(<number>3</number>, <number>6</number>, <number>9</number>))
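
Since a `NodeSeq` behaves like an ordinary Scala sequence, the node text can be converted to plain values and processed with the usual collection methods. The following is a small sketch along the lines of the slides; the names `values`, `total` and `evens` are illustrative.

```scala
object NumbersSketch {
  // Same document shape as on the slide: <number>1</number> ... <number>10</number>.
  val root = <numbers>{ for (i <- 1 to 10) yield <number>{i}</number> }</numbers>

  // Convert each <number> element to an Int once, then work with plain values.
  val values: Seq[Int] = (root \ "number").map(_.text.toInt)

  // Standard collection operations apply from here on.
  val total: Int = values.sum
  val evens: Seq[Int] = values.filter(_ % 2 == 0)
}
```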

Page 31: XML Processing in Scala (XML London 2014)

XML Methods: a rich API

%  ++  ++:  +:  /:  /:\

:+  :\  \  \@  \\  addString

aggregate  andThen  apply  applyOrElse  asInstanceOf  attribute

attributes  buildString  canEqual  child  collect  collectFirst

combinations  companion  compose  contains  containsSlice  copy

copyToArray  copyToBuffer  corresponds  count  descendant  descendant_or_self

diff  distinct  doCollectNamespaces  doTransform  drop  dropRight

dropWhile  endsWith  exists  filter  filterNot  find

flatMap  flatten  fold  foldLeft  foldRight  forall

foreach  genericBuilder  getNamespace  groupBy  grouped  hasDefiniteSize

head  headOption  indexOf  indexOfSlice  indexWhere  indices

init  inits  intersect  isAtom  isDefinedAt  isEmpty

isInstanceOf  isTraversableAgain  iterator  label  last  lastIndexOf

lastIndexOfSlice  lastIndexWhere  lastOption  length  lengthCompare  lift

map  max  maxBy  min  minBy  minimizeEmpty

mkString  nameToString  namespace  nonEmpty  nonEmptyChildren  orElse

padTo  par  partition  patch  permutations  prefix

prefixLength  product  reduce  reduceLeft  reduceLeftOption  reduceOption

reduceRight  reduceRightOption  repr  reverse  reverseIterator  reverseMap

runWith  sameElements  scan  scanLeft  scanRight  scope

segmentLength  seq  size  slice  sliding  sortBy

sortWith  sorted  span  splitAt  startsWith  strict_!=

strict_==  stringPrefix  sum  tail  tails  take

takeRight  takeWhile  text  theSeq  to  toArray

toBuffer  toIndexedSeq  toIterable  toIterator  toList  toMap

toSeq  toSet  toStream  toString  toTraversable  toVector

transpose  union  unzip  unzip3  updated  view

withFilter  xmlType  xml_!=  xml_==  xml_sameElements  zip

zipAll  zipWithIndex

Page 32: XML Processing in Scala (XML London 2014)

For-comprehensions: similar to XQuery

<bib>{

for $b in $xml/book

let $year := $b/@year

where $b/publisher = "Addison-Wesley" and

$year > 1991

return <book year="{ $year }">

{ $b/title }

</book>

}</bib>

<bib>{ for {

b <- xml \ "book"

year = (b \@ "year").toInt

if (b \ "publisher").exists(_.text == "Addison-Wesley") &&

year > 1991

} yield <book year={ year.toString }>

{ b \ "title" }

</book>

}</bib>
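
To try the comparison end to end, the query needs some input. The sketch below supplies a small, made-up `bib` document (the three books are sample data, not from the talk) and uses only scala-xml: the attribute is converted to an `Int` before the numeric comparison, and the publisher test mirrors XQuery's "any publisher equals" semantics with `exists`.

```scala
import scala.xml.Elem

object BibSketch {
  // Hypothetical sample data for illustration.
  val xml: Elem =
    <bib>
      <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
      <book year="1990"><title>An Older Book</title><publisher>Addison-Wesley</publisher></book>
      <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann</publisher></book>
    </bib>

  val result: Elem =
    <bib>{
      for {
        b <- xml \ "book"
        year = (b \@ "year").toInt // attribute text converted to a number
        if (b \ "publisher").exists(_.text == "Addison-Wesley") && year > 1991
      } yield <book year={ year.toString }>{ b \ "title" }</book>
    }</bib>

  // Titles of the books that survived the filter.
  val titles: Seq[String] = (result \ "book" \ "title").map(_.text)
}
```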


... yet Scala is general-purpose.

Nice!


Page 38: XML Processing in Scala (XML London 2014)

Hybrid XML

- XQuery for Scala

- javax.xml.* for free

- Look up: XPath

- Transform: XSLT

- Stream: StAX

Page 39: XML Processing in Scala (XML London 2014)

XQuery for Scala (XQS)

- Wraps XQuery API for Java (javax.xml.xquery)

- Scala access to XQuery in:

- MarkLogic, BaseX, Saxon, Sedna, eXist, …

- Converts DOM to Scala XML & vice versa

- http://github.com/fancellu/xqs

Page 40: XML Processing in Scala (XML London 2014)

XQuery via XQS

val widgets = <widgets>

<widget>Menu</widget>

<widget>Status bar</widget>

<widget id="panel-1">Panel</widget>

<widget id="panel-2">Panel</widget>

</widgets>

import com.felstar.xqs.XQS._

val conn = new net.xqj.basex.local.BaseXXQDataSource().getConnection

val nodes: NodeSeq = conn("for $w in /widgets/widget order by $w return $w", widgets)

| NodeSeq(<widget>Menu</widget>, <widget id="panel-1">Panel</widget>,

| <widget id="panel-2">Panel</widget>, <widget>Status bar</widget>)

Page 41: XML Processing in Scala (XML London 2014)

XPath

import com.felstar.xqs.XQS._

import javax.xml.xpath.{XPathConstants, XPathFactory}

import org.w3c.dom.NodeList

val widgets = <widgets>

<widget>Menu</widget>

<widget>Status bar</widget>

<widget id="panel-1">Panel</widget>

<widget id="panel-2">Panel</widget>

</widgets>

val xpath = XPathFactory.newInstance().newXPath()

val nodes = xpath.evaluate("/widgets/widget[not(@id)]", toDom(widgets),

XPathConstants.NODESET).asInstanceOf[NodeList]

(nodes: NodeSeq)

| NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)

Natively in Scala:

(widgets \ "widget")(widget => (widget \ "@id").isEmpty)

| NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
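
The same lookup can also be sketched without XQS, using only the JDK's `javax.xml` classes. This is an alternative under stated assumptions, not the method shown in the talk: the Scala XML tree is serialised to a string and re-parsed into a W3C DOM, which is simple but copies the document. The object and value names are hypothetical.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

object XPathSketch {
  val widgets =
    <widgets>
      <widget>Menu</widget>
      <widget>Status bar</widget>
      <widget id="panel-1">Panel</widget>
    </widgets>

  // Serialise the Scala XML tree and re-parse it as a W3C DOM document.
  private val dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    .parse(new InputSource(new StringReader(widgets.toString)))

  // Evaluate the XPath expression; the untyped result is cast to a NodeList.
  private val nodeList = XPathFactory.newInstance().newXPath()
    .evaluate("/widgets/widget[not(@id)]", dom, XPathConstants.NODESET)
    .asInstanceOf[NodeList]

  // Text of each matching node, collected into a Scala List.
  val viaXPath: List[String] =
    (0 until nodeList.getLength).map(i => nodeList.item(i).getTextContent).toList

  // The equivalent lookup written directly against scala.xml.
  val viaScala: List[String] =
    (widgets \ "widget").filter(w => (w \ "@id").isEmpty).map(_.text).toList
}
```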

Page 42: XML Processing in Scala (XML London 2014)

XSLT

val stylesheet = <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:template match="john">

<xsl:copy>Hello, John.</xsl:copy>

</xsl:template>

<xsl:template match="node()|@*">

<xsl:copy>

<xsl:apply-templates select="node()|@*"/>

</xsl:copy>

</xsl:template>

</xsl:stylesheet>

import com.felstar.xqs.XQS._

import javax.xml.transform.TransformerFactory

import javax.xml.transform.stream.StreamResult

// Input document, defined before the transformation that uses it

val peopleXml = <people>

<john>Hello, John.</john>

<smith>Smith is here.</smith>

<another>Hello.</another>

</people>

val xmlResultResource = new java.io.StringWriter()

val xmlTransformer = TransformerFactory.newInstance().newTransformer(stylesheet)

xmlTransformer.transform(peopleXml, new StreamResult(xmlResultResource))

xmlResultResource.getBuffer

| <?xml version="1.0" encoding="UTF-8"?><people>

| <john>Hello, John.</john>

| <smith>Smith is here.</smith>

| <another>Hello.</another>

| </people>
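
For readers without XQS on the classpath, a similar transformation can be sketched with the JDK alone by serialising the Scala XML values and feeding them to `javax.xml.transform` as `StreamSource`s. This is an assumption-laden variant, not the talk's code: the stylesheet is declared as XSLT 1.0 because the JDK's built-in processor targets 1.0, the input document is made up, and the helper `source` is hypothetical.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.transform.{OutputKeys, TransformerFactory}
import javax.xml.transform.stream.{StreamResult, StreamSource}

object XsltSketch {
  // Identity transform with one override for <john>, written as inline XML.
  val stylesheet =
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="john">
        <xsl:copy>Hello, John.</xsl:copy>
      </xsl:template>
      <xsl:template match="node()|@*">
        <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

  // Hypothetical input document.
  val people = <people><john/><smith>Smith is here.</smith></people>

  // Serialise a Scala XML node so the JDK transformer can read it.
  private def source(node: scala.xml.Node): StreamSource =
    new StreamSource(new StringReader(node.toString))

  val output: String = {
    val transformer = TransformerFactory.newInstance().newTransformer(source(stylesheet))
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")
    val writer = new StringWriter()
    transformer.transform(source(people), new StreamResult(writer))
    writer.toString
  }
}
```

Because the stylesheet is an ordinary Scala value, parts of it can be computed at runtime, which is one way to approach dynamically generated XSLT.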

Page 43: XML Processing in Scala (XML London 2014)

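XML Stream Processing

import scala.io.Source

import javax.xml.stream.{XMLEventReader, XMLInputFactory}

import javax.xml.stream.events.XMLEvent

// 4GB file, comes back in a second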

val src = Source.fromURL("http://dumps.wikimedia.org/enwiki/20140402/enwiki-20140402-abstract.xml")

val er = XMLInputFactory.newInstance().createXMLEventReader(src.reader)

implicit class XMLEventIterator(ev:XMLEventReader) extends scala.collection.Iterator[XMLEvent]{

def hasNext = ev.hasNext

def next = ev.nextEvent()

}

er.dropWhile(!_.isStartElement).take(10).zipWithIndex.foreach {

case (ev, idx) => println(s"${idx+1}:\t$ev") }

src.close()

| 1: <feed>

| 2:

|

| 3: <doc>

| 4:

|

| 5: <title>

| 6: Wikipedia: Anarchism

| 7: </title>

| 8:

|

| 9: <url>

| 10: http://en.wikipedia.org/wiki/Anarchism
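
The streaming idea can be tried offline with a small in-memory document instead of the Wikipedia dump. The sketch below is illustrative: the sample string and the helper `startElementNames` are assumptions. It wraps the StAX `XMLEventReader` in a Scala `Iterator`, so events are pulled lazily and ordinary collection methods apply.

```scala
import java.io.StringReader
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.events.XMLEvent

object StaxSketch {
  // Tiny stand-in for a large feed; hypothetical sample data.
  val sample = "<feed><doc><title>A</title></doc><doc><title>B</title></doc></feed>"

  // Names of all start elements, in the order the parser encounters them.
  def startElementNames(xml: String): List[String] = {
    val reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml))
    // Adapt the Java reader to a Scala Iterator; events are read on demand.
    val events = new Iterator[XMLEvent] {
      def hasNext: Boolean = reader.hasNext()
      def next(): XMLEvent = reader.nextEvent()
    }
    try {
      events
        .filter(_.isStartElement)
        .map(_.asStartElement.getName.getLocalPart)
        .toList // force evaluation before the reader is closed
    } finally reader.close()
  }

  val names: List[String] = startElementNames(sample)
}
```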

Page 44: XML Processing in Scala (XML London 2014)

Use Cases

- Data extraction

- Serving XML via REST

- Dynamically generated XSLT

- Interfacing with XML databases

- Flexibility to choose the best tool for the job

Page 45: XML Processing in Scala (XML London 2014)

Excellent Ecosystem

SBT

ScalaTest

scala-xml

macro-paradise

Akka

Spray

scalaz

shapeless

JVM

scala-maven-plugin

Spark

Scaladin

Specs

Page 46: XML Processing in Scala (XML London 2014)

Conclusion

- Practical

- Practical for XML processing

Page 47: XML Processing in Scala (XML London 2014)

Where do I start?

- atomicscala.com

- typesafe.com/activator

- scala-lang.org

- scala-ide.org

- IntelliJ

Page 48: XML Processing in Scala (XML London 2014)

Matt Stephens Charles Foster

Page 49: XML Processing in Scala (XML London 2014)

Open to consulting

www.scala.contractors

Follow us on Twitter:

@DinoFancellu

@ScalaWilliam

@MaffStephens