GRDDL: The Why, What, How, and Where

Chimezie Ogbuji, Cleveland Clinic Foundation

Page 1: GRDDL: The Why, What, How, and Where

GRDDL: The Why, What, How, and Where

Chimezie Ogbuji, Cleveland Clinic Foundation

Page 2: GRDDL: The Why, What, How, and Where

GRDDL: The Acronym

• Gleaning Resource Descriptions (from) Dialects (of) Languages

• Rather long and intimidating

Page 3: GRDDL: The Why, What, How, and Where

GRDDL: By Deconstruction

• WordNet definition of "glean":
◦ Gather, as of natural products
◦ Synonyms: reap, harvest

• Resource Description Framework (RDF)
◦ Logical assertions

• Dialects of Language
◦ XML document families (XHTML, for instance)

Page 4: GRDDL: The Why, What, How, and Where

GRDDL: By Analogy

GRDDL can be thought of as a protocol for sowing semantics in web content for later harvest.

Page 5: GRDDL: The Why, What, How, and Where

The Why

• Vast amount of latent semantics in markup

• Web content today is primarily built for human consumption

• Text indexing will only get you so far for document retrieval

• If machines are meant to harvest RDF from documents, reproducible protocols are needed

<span>Chimezie Ogbuji</span>

Page 6: GRDDL: The Why, What, How, and Where

The Why (Cont.)

• Microformats, eRDF, and RDFa
◦ Specific to a particular family of documents
◦ XHTML and HTML

• If the goal is machine consumption, the bar needs to be raised beyond XHTML

Page 7: GRDDL: The Why, What, How, and Where

The Why (Cont.)

• It seems easy to forget that XHTML is indeed an XML dialect
◦ You would think the (X) would make that obvious

• What was needed was a standard way to harvest RDF that is applicable to all XML dialects

Page 8: GRDDL: The Why, What, How, and Where

The What

• Faithful rendition
• Transformations
• GRDDL result
• Source documents
• GRDDL-aware agents

Page 9: GRDDL: The Why, What, How, and Where

Faithful Rendition

“By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”

• Licenses an author-certified interpretation of an XML document

• A powerful paradigm for messaging
◦ See David Booth's “RDF and SOA”: http://www.w3.org/2007/01/wos-papers/booth

Page 10: GRDDL: The Why, What, How, and Where

GRDDL Transformations

• Functions that take an XML document and return an RDF graph

• Transformations can be written in any particular language

• The “reference” transformation language is XSLT
◦ “[XSLT1] is the format most widely supported by GRDDL-aware agents as of this writing […] is specifically designed to express XML to XML transformations and has some good safety characteristics”

Page 11: GRDDL: The Why, What, How, and Where

Other Transformation Languages

• “.. technically Javascript, C, or virtually any other programming language may be used to express transformations for GRDDL”

• However, these transformations need to be deterministic in order to ensure the result is a faithful rendition

• Hence, they must be functions
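The "must be functions" requirement can be sketched in Python. This is only an illustration: the `people` dialect, the `EX` vocabulary, and the `transform` name are hypothetical, and a real GRDDL transformation would more typically be an XSLT stylesheet that emits RDF/XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary, used only for this illustration.
EX = "http://example.org/vocab#"


def transform(xml_text):
    """Map a made-up <people> dialect to a set of (subject, predicate, object) triples.

    The output depends only on the input text: no clock, randomness, or
    network access, so repeated calls yield the same graph.
    """
    root = ET.fromstring(xml_text)
    triples = set()
    for person in root.findall("person"):
        subject = person.get("id")
        name = person.findtext("name")
        if subject and name:
            triples.add((subject, EX + "name", name))
    return frozenset(triples)


doc = """<people>
  <person id="http://example.org/people#chime"><name>Chimezie Ogbuji</name></person>
</people>"""

first = transform(doc)
second = transform(doc)
print(first == second)  # the same input yields the same graph
```

Modelling the graph as a frozen set of triples makes the determinism easy to check by plain equality.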

Page 12: GRDDL: The Why, What, How, and Where

GRDDL Result

• The result of applying the transformation is an RDF serialization

• The RDF graph that corresponds to the serialization is a GRDDL result of the original document

• The “reference” result format is RDF/XML
◦ Other formats can be used (Turtle, N3, etc.)

Page 13: GRDDL: The Why, What, How, and Where

GRDDL Source Documents

The class of documents for which GRDDL defines a way to extract a result graph:

• XML documents
• XML namespace documents
• Valid XHTML
• XHTML profiles

Page 14: GRDDL: The Why, What, How, and Where

GRDDL Source Documents

Page 15: GRDDL: The Why, What, How, and Where

GRDDL: XML Documents

• GRDDL namespace (grddl prefix): http://www.w3.org/2003/g/data-view#

• transformation attribute

<?xml version="1.0" encoding="UTF-8"?>
<root
    xmlns:grddl="http://www.w3.org/2003/g/data-view#"
    grddl:transformation=".. path to transform ..">
  … XML content ..
</root>
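A GRDDL-aware agent's first step for a plain XML document is to read this attribute from the root element. The following is a minimal sketch using only the Python standard library. The function name, the sample document, and the `example.org` URIs are assumptions made for illustration. The attribute value is treated as a whitespace-separated list of URI references, each resolved against the document's own URI.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def transformation_uris(xml_text, base_uri):
    """Return absolute URIs listed in grddl:transformation on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced attribute names as "{namespace}local".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    # Resolve each whitespace-separated reference against the document URI.
    return [urljoin(base_uri, ref) for ref in value.split()]


# Hypothetical source document and location, for illustration only.
doc = """<?xml version="1.0"?>
<root xmlns:grddl="http://www.w3.org/2003/g/data-view#"
      grddl:transformation="to-rdf.xsl http://example.org/other.xsl">
  <item/>
</root>"""

uris = transformation_uris(doc, "http://example.org/docs/sample.xml")
print(uris)
```

A document without the attribute simply yields an empty list, so the caller can fall through to the namespace-document lookup.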

Page 16: GRDDL: The Why, What, How, and Where

Namespace Documents

“Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace”

• A GRDDL source document lives at the location of the namespace URI of the root element (the namespace document)

• The GRDDL result of the namespace document has a statement of the form:

?nsDoc grddl:namespaceTransformation ?txDoc

◦ txDoc is the location of a transformation applicable to such XML documents
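The namespace lookup can be sketched in two steps: find the root element's namespace URI, then select the `?txDoc` values from the namespace document's GRDDL result. In this sketch the result is assumed to be already available as an iterable of string triples. Fetching and gleaning the namespace document is left out, and the helper names and `example.org` URIs are hypothetical. It also assumes the namespace URI and the namespace document's URI are the same string, which need not hold when the namespace URI carries a fragment.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"
NAMESPACE_TRANSFORMATION = GRDDL_NS + "namespaceTransformation"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    # ElementTree encodes qualified names as "{namespace}local".
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def namespace_transformations(ns_uri, ns_doc_result):
    """Select ?txDoc from triples of the form (nsDoc, namespaceTransformation, txDoc)."""
    return sorted(
        o for (s, p, o) in ns_doc_result
        if s == ns_uri and p == NAMESPACE_TRANSFORMATION
    )


# Hypothetical dialect and namespace-document result, for illustration only.
doc = '<po:order xmlns:po="http://example.org/ns/po"/>'
ns = root_namespace(doc)
result = {
    (ns, NAMESPACE_TRANSFORMATION, "http://example.org/ns/po2rdf.xsl"),
    (ns, "http://www.w3.org/2000/01/rdf-schema#label", "Purchase orders"),
}
tx_docs = namespace_transformations(ns, result)
print(ns, tx_docs)
```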

Page 17: GRDDL: The Why, What, How, and Where

Valid XHTML Documents

<html xmlns="http://www.w3.org/1999/xhtml">
  <head
      profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
          href=".. path to transformation .." />
    ...
  </head>
</html>

• Refers to the GRDDL XHTML profile

• Licenses the interpretation of rel="transformation" links
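The XHTML case can be sketched the same way: check that `head/@profile` names the GRDDL profile, then collect the `href` of each `link` whose `rel` contains the `transformation` token. This is a simplified illustration, not a full implementation. It only looks at `link` elements, as on the slide, and assumes the page parses as well-formed XML without named entities. The function name and sample page are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def xhtml_transformations(xhtml_text, base_uri):
    """Return absolute transformation URIs named by rel="transformation" links."""
    root = ET.fromstring(xhtml_text)
    head = root.find("{%s}head" % XHTML_NS)
    if head is None:
        return []
    # profile is a whitespace-separated list; without the GRDDL profile,
    # rel="transformation" carries no licensed meaning.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    found = []
    for link in root.iter("{%s}link" % XHTML_NS):
        rel_tokens = link.get("rel", "").split()
        href = link.get("href")
        if "transformation" in rel_tokens and href:
            found.append(urljoin(base_uri, href))
    return found


# Hypothetical page and location, for illustration only.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation" href="glean.xsl" />
    <link rel="stylesheet" href="style.css" />
  </head>
  <body/>
</html>"""

links = xhtml_transformations(page, "http://example.org/page.html")
print(links)
```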

Page 18: GRDDL: The Why, What, How, and Where

XHTML Profiles

“Adding a GRDDL profileTransformation assertion to a profile document is much like adding a namespaceTransformation assertion to a namespace document”

• A GRDDL source document lives at the location of the profile URI of an XHTML document

• The GRDDL result of the profile document has a statement of the form:

?profileDoc grddl:profileTransformation ?txDoc

◦ txDoc is the location of a transformation applicable to such XML documents
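The profile lookup mirrors the namespace case. The sketch below lists the profile URIs on `head` and selects matching `?txDoc` values. It assumes the GRDDL result of each profile document has already been obtained and is passed in as a dictionary of string triples. The function names, that dictionary, and the `example.org` URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
PROFILE_TRANSFORMATION = "http://www.w3.org/2003/g/data-view#profileTransformation"


def profile_uris(xhtml_text):
    """Return the profile URIs listed on the head element (possibly empty)."""
    head = ET.fromstring(xhtml_text).find("{%s}head" % XHTML_NS)
    return head.get("profile", "").split() if head is not None else []


def profile_transformations(xhtml_text, results_by_profile):
    """Collect ?txDoc from (profileDoc, profileTransformation, txDoc) triples.

    results_by_profile maps a profile URI to the triples of that profile
    document's GRDDL result, assumed to have been gleaned beforehand.
    """
    found = []
    for profile in profile_uris(xhtml_text):
        for (s, p, o) in results_by_profile.get(profile, ()):
            if s == profile and p == PROFILE_TRANSFORMATION:
                found.append(o)
    return found


# Hypothetical page, profile, and profile-document result.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://example.org/profiles/events">
    <title>Schedule</title>
  </head>
  <body/>
</html>"""
results = {
    "http://example.org/profiles/events": [
        ("http://example.org/profiles/events", PROFILE_TRANSFORMATION,
         "http://example.org/profiles/events2rdf.xsl"),
    ],
}
tx = profile_transformations(page, results)
print(tx)
```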

Page 19: GRDDL: The Why, What, How, and Where

The How

• GRDDL builds on existing XML & RDF standards

• An implementation mostly needs to orchestrate:
◦ Parsing of data representations
◦ Resolving representations from web locations
◦ The necessary XML processing to peek into and harvest RDF from the various sources
◦ The highly recursive nature of GRDDL

Page 20: GRDDL: The Why, What, How, and Where

Technological Overlap

Page 21: GRDDL: The Why, What, How, and Where

Anatomy of a GRDDL Implementation: GRDDL.py

• A reference implementation from scratch
◦ 650 LOC
◦ RDFLib, 4Suite-XML, and Python control logic

• A layered approach
◦ Core module that handles transformations
◦ One module per source type stacked on top of the core
◦ A top layer that orchestrates the recursion and identification of which ‘class’ a source document belongs to

Page 22: GRDDL: The Why, What, How, and Where

GRDDL.py Core

Page 23: GRDDL: The Why, What, How, and Where

Component Stack

Page 24: GRDDL: The Why, What, How, and Where

The Where

• GRDDL services online:
◦ http://triplr.org/ (Stuff in, triples out)
◦ http://www.w3.org/2007/08/grddl/ (W3C GRDDL Service)

• Primary GRDDL implementations:
◦ Redland
◦ GRDDL.py
◦ Virtuoso
◦ GRDDL Reader for Jena

• RDFa is the most common GRDDL source content format in the wild

Page 25: GRDDL: The Why, What, How, and Where

Hidden Value Proposition

• Supports separation of concerns:
◦ XML for messaging, data collection, structural validation
◦ RDF for expressive assertions, inference, etc.

• A way to invest in data richness and accessibility

Page 26: GRDDL: The Why, What, How, and Where

GRDDL Use Cases

• Embedding scheduling assertions on personal pages

• Using GRDDL for extracting RDF from XML medical record documents
◦ Cleveland Clinic use case (clinical research)

• Aggregating web-based product reviews

• Embedding web service descriptions

• Adding semantic assertions to XML schemas

• Embedding semantic assertions in wikis