
https://www.flickr.com/photos/jdhancock/5307754233; License: CC BY 2.0

Crossmedia Publishing with NoSQL Techniques: Possibilities, Usage Scenarios, Assessment

Christian Kohl, De Gruyter
23 June 2015
Cross-Media-Forum, Munich

1) Brief explanation of NoSQL + XML

2) NoSQL + XML in crossmedia publishing at De Gruyter

Source: http://www.flickr.com/photos/ravescuritiba/773032554/

A very very short history of DB technology

1960s Hierarchical Era

Application- and hardware-specific data storage
e.g. IBM mainframes

1970s+ Relational Era

Granular access to highly structured data
Tables: columns/rows
IBM, MS, Oracle, …
+ SQL

2000s+ Any Structure Era

Schema agnostic, massive scale, query and search, heterogeneous data, unstructured, faster time-to-results
Amazon, Google, Facebook, LinkedIn, MarkLogic, …
+ XQuery, SPARQL, Gremlin, …

Image Source: https://www.flickr.com/photos/infocux/8450190120; License: CC BY-NC 2.0

Information Continuum

[Figure: continuum from structured to semi- or unstructured information. Labels: RDBMS, relational, hierarchical, semi-structured, time-varying, XML, metadata, content, geospatial, sparse, graph, emails, documents, free text, search engine. Axis: volume of information.]

Data landscape today

Source: Frank Föge, MarkLogic Corporation, 2014.

Data volume

Linking

Semi-/unstructured data

Distributed, horizontally scalable architectures

[Figure: performance plotted against data complexity/heterogeneity. Labels: payroll, majority of web applications, social network, semantic trading?, relational DB, application requirements, RDBMS performance.]

Source: Sam Bisbee, http://www.ibmbigdatahub.com/blog/exploring-nosql-family-tree.

Image Source: http://h5inc.files.wordpress.com/2011/04/warning-brain-explosion-zone.png

(Overly) simple NoSQL taxonomy

• Key/Value: Riak, Dynamo, Voldemort, …
• Column Oriented: Cassandra, HBase, BigTable, …
• Document Store: MarkLogic, CouchDB, MongoDB, …
• Graph: Neo4j, InfiniteGraph, …

Image Source: https://steenschledermann.files.wordpress.com/2014/05/no-thanks-were-too-busy1.jpg?w=611

NoSQL enables …

• Faster app development
• Heterogeneous data types
• Rapid deployment
• Strong horizontal scalability with respect to
  • size
  • complexity

Source: http://media.gamemanx.com/flv/sf4-ehonda-sagat.jpg

© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 10

Developer Journey == Agile Process

1. Load data sources "as-is" (XML, JSON, binary)
2. Search, transform, combine data
3. Define indexes for analytics
4. Data access: web application, user interface
5. Iterate
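The journey above can be illustrated with a toy example. The following is a minimal sketch, not MarkLogic's API: `TinyStore`, its methods, the URIs, and the second sample document are all hypothetical and exist only to show the order of the steps (load as-is, search, then define an index afterwards).

```python
import json
import re
import xml.etree.ElementTree as ET


class TinyStore:
    """Hypothetical in-memory store: keeps raw documents plus a word index."""

    def __init__(self):
        self.docs = {}         # uri -> raw text, stored unchanged
        self.words = {}        # word -> set of uris
        self.range_index = {}  # field name -> {uri: value}, defined later

    @staticmethod
    def _text(raw):
        """Pull searchable text out of XML or flat JSON; otherwise use raw text."""
        stripped = raw.lstrip()
        if stripped.startswith("<"):
            return " ".join(ET.fromstring(stripped).itertext())
        if stripped.startswith("{"):
            return " ".join(str(v) for v in json.loads(stripped).values())
        return raw

    def load(self, uri, raw):
        """Step 1: load the source as-is, without declaring a schema first."""
        self.docs[uri] = raw
        for word in re.findall(r"\w+", self._text(raw).lower()):
            self.words.setdefault(word, set()).add(uri)

    def search(self, word):
        """Step 2: search across all loaded formats."""
        return sorted(self.words.get(word.lower(), set()))

    def define_index(self, field, extract):
        """Step 3: add an index after loading; extract(raw) returns a value or None."""
        index = {}
        for uri, raw in self.docs.items():
            value = extract(raw)
            if value is not None:
                index[uri] = value
        self.range_index[field] = index


def year_of(raw):
    """Hypothetical extractor that finds a four-digit year in XML or JSON."""
    match = re.search(r'(?:<year>|"year":\s*)(\d{4})', raw)
    return int(match.group(1)) if match else None


store = TinyStore()
store.load("/books/1.xml", "<book><title>I Love Penguins</title><year>2015</year></book>")
store.load("/books/2.json", '{"title": "Penguins at Sea", "year": 2014}')
hits = store.search("penguins")
store.define_index("year", year_of)
```

The point of the sketch is the ordering: nothing about `year` had to be known when the documents were loaded, so the index can be added in a later iteration.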

Image Source: http://www.flickr.com/photos/rs-foto/1242024959/

DOXMLDBs


A book table looks like this…??

Book
  Info
    Title = "I Love Penguins"
    Author = "S. Lion"
  Section
    • Chapter
        Page
          Paragraph = "I love penguins because…"
        Page
          Paragraph = "On the subject of food…"
    • Chapter
        Page
          • Paragraph
          • Paragraph
  Section
    • Chapter
    • Chapter
    • Chapter

title             author    section   …
I Love Penguins   S. Lion

Issues with Sections? How many columns?

Option: model the hierarchy with relations (foreign keys), but this is not efficient.

DB Schema mapping

Shredding

Foreign Keys & Joins

Performance Overhead

Maintenance Overhead
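To make the overhead concrete, here is a minimal sketch of shredding the book from the slide into relational tables, using Python's standard `sqlite3` module. The table and column names, and the `shred` and `paragraphs_of` helpers, are assumptions made for illustration; the talk does not prescribe a schema.

```python
import sqlite3

# The nested book from the slide, reduced to one section with one chapter.
BOOK = {
    "title": "I Love Penguins",
    "author": "S. Lion",
    "sections": [
        {"chapters": [
            {"pages": [
                {"paragraphs": ["I love penguins because…"]},
                {"paragraphs": ["On the subject of food…"]},
            ]},
        ]},
    ],
}

# One table per hierarchy level, each pointing at its parent (assumed schema).
SCHEMA = """
CREATE TABLE book(id INTEGER PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE section(id INTEGER PRIMARY KEY, book_id INTEGER REFERENCES book(id), pos INTEGER);
CREATE TABLE chapter(id INTEGER PRIMARY KEY, section_id INTEGER REFERENCES section(id), pos INTEGER);
CREATE TABLE page(id INTEGER PRIMARY KEY, chapter_id INTEGER REFERENCES chapter(id), pos INTEGER);
CREATE TABLE paragraph(id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES page(id), pos INTEGER, text TEXT);
"""


def shred(conn, book):
    """Split one nested book into rows of five tables linked by foreign keys."""
    conn.executescript(SCHEMA)
    cur = conn.cursor()
    cur.execute("INSERT INTO book(title, author) VALUES (?, ?)",
                (book["title"], book["author"]))
    book_id = cur.lastrowid
    for s_pos, section in enumerate(book["sections"]):
        cur.execute("INSERT INTO section(book_id, pos) VALUES (?, ?)", (book_id, s_pos))
        section_id = cur.lastrowid
        for c_pos, chapter in enumerate(section["chapters"]):
            cur.execute("INSERT INTO chapter(section_id, pos) VALUES (?, ?)",
                        (section_id, c_pos))
            chapter_id = cur.lastrowid
            for p_pos, page in enumerate(chapter["pages"]):
                cur.execute("INSERT INTO page(chapter_id, pos) VALUES (?, ?)",
                            (chapter_id, p_pos))
                page_id = cur.lastrowid
                for t_pos, text in enumerate(page["paragraphs"]):
                    cur.execute(
                        "INSERT INTO paragraph(page_id, pos, text) VALUES (?, ?, ?)",
                        (page_id, t_pos, text))
    conn.commit()


def paragraphs_of(conn, title):
    """Reassemble the text of a book: four joins just to read it back in order."""
    rows = conn.execute("""
        SELECT paragraph.text
        FROM book
        JOIN section   ON section.book_id = book.id
        JOIN chapter   ON chapter.section_id = section.id
        JOIN page      ON page.chapter_id = chapter.id
        JOIN paragraph ON paragraph.page_id = page.id
        WHERE book.title = ?
        ORDER BY section.pos, chapter.pos, page.pos, paragraph.pos
    """, (title,)).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
shred(conn, BOOK)
texts = paragraphs_of(conn, "I Love Penguins")
```

One book becomes rows in five tables, and every structural change to the book (say, a new level between section and chapter) means a schema change plus a rewrite of both functions. A document store keeps the hierarchy in one piece instead.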

<meta>
  <URI>http://thewobbitaparody.blogspot.de</URI>
  <title>The Superfriends Of The Ring</title>
  <author>Paul Erickson</author>
</meta>
<body>
  (…)
  <section nr="11" title="Promo's Afterparty">
    <paragraph>Promo came in soon afterwards. He glanced about the condo and then quietly asked "Is Uncle Bulbo gone yet?" "Yes, at last," said Pantsoff. "I thought he'd never leave. Oh, he left something for you." He handed Promo the inter-office envelope. "Don't bother unwinding the string. Inside is his will, his trust documents, and his tax records. I think he left you his ring, too." "Oh, great," said Promo. "How long do I have to keep that stuff? Five years? Seven years? Forever? I hate filing." He stopped complaining for a moment. "You said his magic ring is in there too? Cool! I'll never have to pay a cover charge to enter a nightclub again!" "Promo, you've inherited Bulbo's fortune, so stop thinking small for a change. Actually, don't think about the ring at all. Just put it away. Keep it secret, and keep it safe!"</paragraph>
  (…)
</body>


Document as information container

<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <triple>
    <subject>IRIID</subject>
    <predicate>isa</predicate>
    <object>license-plate</object>
  </triple>
  <triple>
    <subject>IRIID</subject>
    <predicate>value</predicate>
    <object>ABC 123</object>
  </triple>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
</SAR>

Metadata, data, relationships, and content
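The container idea can be tried out with Python's standard `xml.etree.ElementTree`. This is a minimal sketch: the element layout is a reading of the slide's SAR document, and the helper names (`metadata`, `location`, `triples`, `mentions`) are hypothetical. A real XML database would answer these queries from indexes rather than by walking the tree.

```python
import xml.etree.ElementTree as ET

# The SAR document from the slide, as a single well-formed XML string.
SAR_XML = """
<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location><lat>37.497075</lat><long>-122.363319</long></location>
  <triple>
    <subject>IRIID</subject><predicate>isa</predicate><object>license-plate</object>
  </triple>
  <triple>
    <subject>IRIID</subject><predicate>value</predicate><object>ABC 123</object>
  </triple>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
</SAR>
"""

doc = ET.fromstring(SAR_XML.strip())


def metadata(doc):
    """Metadata: simple values stored as direct children of the root."""
    return {name: doc.findtext(name) for name in ("title", "date", "type")}


def location(doc):
    """Data: the geospatial point as a (lat, long) pair of floats."""
    return float(doc.findtext("location/lat")), float(doc.findtext("location/long"))


def triples(doc):
    """Relationships: each <triple> as a (subject, predicate, object) tuple."""
    return [
        (t.findtext("subject"), t.findtext("predicate"), t.findtext("object"))
        for t in doc.findall("triple")
    ]


def mentions(doc, phrase):
    """Content: case-insensitive phrase match in the free-text description."""
    return phrase.lower() in doc.findtext("description", default="").lower()
```

All four kinds of information come out of the same document without a join, which is the point the slide makes.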


Document as information container (annotated)

[Figure: the same <SAR> document, annotated with the kinds of information it holds: values, geospatial data, semantic (RDF) triples, and unstructured full text.]

XML is for publishers

Source: http://www.flickr.com/photos/scotthudson/3448785931/

NoSQL at DG

• De Gruyter Online
• De Gruyter CMS
• Maybe asset management?
• Maybe data warehouse?

De Gruyter Online

Documents

Metadata

Assets

Entitlements

Strong growth

Highly diverse data

De Gruyter CMS

Documents

Metadata

Triples

Assets

Frequent re-arrangement of the data: changes in structure and linking

Semantics