57
David De Roure rom SALAMI to Social Machines: usic Information Retrieval as an xemplar of Digital Research rom SALAMI to Social Machines: usic Information Retrieval as an xemplar of Digital Research, or ourth Quadrant Semantic Media

Fourth Quadrant Semantic Media

Embed Size (px)

DESCRIPTION

"From SALAMI to Social Machines: Music Information Retrieval as an Exemplar of Digital Research, or Fourth Quadrant Research". Keynote by David De Roure at Semantic Media Launch, Barbican, 3 October 2012

Citation preview

David De Roure

From SALAMI to Social Machines:

Music Information Retrieval as an

Exemplar of Digital Research

From SALAMI to Social Machines:

Music Information Retrieval as an

Exemplar of Digital Research, or

Fourth Quadrant Semantic Media

• Digital Research

• SALAMI as an exemplar

• Reconstruct / Reuse / Repurpose

• Social Machines

Overview

...the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors and satellites

Source: CERN, CERN-EX-0712023, http://cdsweb.cern.ch/record/1203203

Bio

Ess

ays,

, 26

(1):

99–

105,

Jan

uary

200

4

http://research.microsoft.com/en-us/collaboration/fourthparadigm/

www.einfrastructuresouth.ac.uk

More people

Mor

e m

achi

nes

A Big Picture

Big DataBig Compute

Conventional Computation

The Future!

SocialNetworking

e-infrastructure

onlineR&D

The Fourth Quadrant

• Data Analysis Pipelines• Workflows are the new

rock and roll• Machinery for

coordinating the execution of services and linking together resources

• Repetitive and mundane boring stuff made easier

E. Science laboris

Carole Goble

“Automating the Audio Production Process”

“Blink”

• Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle

• Paul meets Jo. Jo is investigating Whipworm in mouse.

• Jo reuses one of Paul’s workflow without change.• Jo identifies the biological pathways involved in

sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.

• Previously a manual two year study by Jo had failed to do this.

Reuse, Recycling, Repurposing

Carole Goble

“A biologist would rathershare their toothbrush than

their gene name”

“Data mining: mydata’s mine and

your data’s mine”

http://www.myexperiment.org/

Results

Logs

Results

Metadata PaperSlides

Feeds into

produces

Included in

produces Published in

produces

Included in

Included in Included in

Published in

Workflow 16

Workflow 13

Common pathways

QTLPaul’s PackPaul’s Research

Object

Research Objects Reproducibility, Integrated Publishing

• Workflow– Provenance– Conservation & Preservation– Executable Publication

• Human– Credit Tracking– Unit of Scholarship– Crowd management

• Semantics– Acquisition & Publishing– Encoding, Encapsulation & Annotation: OAI-ORE, AO…

Distributed Third Party Tenancy

Alien Store

Technical Objects Social Objects

Carole Goble

1. Use URIs as names for things

2. Use HTTP URIs so that people

can look up those names

3. When someone looks up

a URI, provide useful

information, using the

standards (RDF*, SPARQL)

4. Include links to other URIs

so that they can discover

more things

Linked Data support rdf.myexperiment.org

SELECT?pack ?contribWHERE { ?pack rdf:type mepack:Pack. ?pack ore:aggregates ?contrib.}

SELECT?wf ?uriWHERE { ?wf mebase:has-current-version ?v. ?v mecomp:executes-dataflow ?d. ?d mecomp:has-component ?c. ?c rdf:type mecomp:WSDLProcessor. ?c mecomp:processor-uri ?uri.}

Sean Bechhofer

“Publication @ Source”

Jeremy Frey

A digital lab book

replacement that chemists were able to

use, and liked.

1 1 2 2 1 3 1 4

Sample of 4-flourinatedbiphenyl

Add CoolReflux

Butanone Sample ofK2CO3Powder

Weigh

grammes0.9031

Measure

40 ml

Add

Weigh

2.0719 g

text

3 5

Add

g

Sample ofBr11OCB

2 6

Reflux

2 7

Cool

Water

Measure

30 ml

9

Liquid-liquid

extraction

DCM

Measure

3 of 40 ml

10

Dry

MgSO4

11

Filter(Buchner)

12

RemoveSolvent

by RotaryEvaporation

13

Fuse

Silica

14

ColumnChromatography

Ether/PetrolRatio

Butanone dried via silica column andmeasured into 100ml RB flask.

Used 1ml extra solvent to wash outcontainer.

Started reflux at 13.30. (Had tochange heater stirrer) Only reflux

for 45min, next step 14:15.

Inorganics dissolve 2layers. Added brine

~20ml.

Organics are yellowsolution

Washed MgSO4 withDCM ~ 50ml

Measure

excess

Observation Types

weight - grammes

measure - ml, drops

annotate - text

temperature - K, °C

Key

Process

Input

Literal

Observation

Add CoolRefluxAddAdd Reflux Cool Dry Filter Remove

Solventby Rotary

Evaporation

Fuse ColumnChromatography

Dissolve 4-flourinatedbiphenyl inbutanone

Add K2CO3powder

Heat at refluxfor 1.5 hours

Cool and addBr11OCB

Heat atreflux untilcompletion

Cool and addwater (30ml)

Combine organics,dry over MgSO4 &filter

Removesolvent invacuo

Liquid-liquid

extraction

Extract withDCM(3x40ml)

Fuse compound to silica &column in ether/petrol

4 8

Add

Add

text

Annotate

Annotate

text

Weigh

Annotate

g

Annotate Annotate

text text

Future Questions

Whether to have many subclasses of processes or fewer with annotations

How to depict destructive processes

How to depict taking lots of samples

What is the observation/process boundary? e.g. MRI scan

1.5918

Combechem

30 January 2004gvh, hrm, gms

Ingredient List

Fluorinated biphenyl 0.9 gBr11OCB 1.59 gPotassium Carbonate 2.07 gButanone 40 ml

image

To

Do

Lis

tP

lan

Pro

ce

ss

Re

co

rd

Jeremy Frey

• Digital Research

• SALAMI as an exemplar

• Reconstruct / Reuse / Repurpose

• Social Machines

Overview

The Problem Ichiro Fujinaga

• Structural Analysis of Large amounts of Music Information

• Musical analysis has traditionally been conducted by individuals and on a small scale

• Computational approach, combined with the huge volume of data now available, will 1. Deliver substantive corpus of musical analyses in

common framework for music scholars and students2. Establish a methodology and tooling so that

community can sustain and enhance this resource

SALAMI

www.diggingintodata.org

Digital Music Collections

Student-sourced ground truth

Community Software

Linked Data Repositories

Supercomputer

23,000 hours ofrecorded music

Music InformationRetrieval Community

Structural Analysis of Large Amounts of Music Information

Ashley Burgoyne

Ground Truth

AnnotationExample

Lead          

Function

Large scale

Small scale

class structure

Ontology models properties from musicological domain• Independent of Music Information Retrieval research and

signal processing foundations• Maintains an accurate and complete description of

relationships that link them

Segment Ontology

Kevin Page and Ben Fields

http://www.music-ir.org/mirex/

Meandre

http://seasr.org/meandre/

Stephen Downie

ABCDB'C

“Signal”Digital Audio

“Ground Truth”

Community

It’s web-like!

StructuralAnalysis

http://musicnet.mspace.fm/blog/music-linked-data-workshop/

Digital Music Collections

ground truth

Evaluation Infrastructure

(sociotechnical)

ExpertiseExpertiseExpertiseExpertise

Community SoftwareCommunity

SoftwareCommunity

Software

Digital Music CollectionsDigital Music CollectionsDigital Music Collections

ground truthground truth

Evaluation Infrastructure

(sociotechnical)

ResultsResultsResultsResults

EvaluationsEvaluationsEvaluations

paperspaperspapersPapers

• Digital Research

• SALAMI as an exemplar

• Reconstruct / Reuse / Repurpose

• Social Machines

Overview

http://force11.org/

data

method

Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes. Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object. Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later. Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed.

Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years.Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable.Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results.Respectful. Explicit representations of the provenance, lineage and flow of intellectual property.

The R dimensions

Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.nature.com/eresearch/

Machine

Machine

repeat

Machine

repeat

REPRODUCE

MachineMachine

softwarepaper

paper

ResearchRecord

software

SoftwareREPRODUCE OR REPEAT?

software

workflowpaper

Software

wf

Machine

software

workflow

software

blogs.nature.com/eresearch/

The Executable Thesis

new data

new results

executablethesis

PhD Student

Research Objects that are1. The research record for repeatable, reproducible, ... etc2. Describe process (method) for enactment/execution3. Usable by machines as well as humans

– Social Objects– Semantically described– Programmatically accessible– Designed for assistance and automation– Designed for scale and heterogeneity

4. Composable with a distributed computational model?

Computational Research Objects

Notifications and automatic re-runs

Machines are users too

Autonomic

Curation

Self-repair

New research?

• Digital Research

• SALAMI as an exemplar

• Reconstruct / Reuse / Repurpose

• Social Machines

Overview

http://www.bodleian.ox.ac.uk/bodley/library/special/projects/whats-the-score

SOCIAMThe Theory and Practice of

Social Machines

Real life is and must be full of all kinds of social constraint – the very processes from which society arises. Computers can help if we use them to create abstract social machines on the Web: processes in which the people do the creative work and the machine does the administration… The stage is set for an evolutionary growth of new social engines. Berners-Lee, Weaving the Web, 1999

The Order of Social Machines

Some other machines?

• Number of people• Number of machines• Scale of data• Varieties of data• Type of machine

problem solving• Type of human

problem solving

• Empowering of individuals, groups, crowds

• Time criticality• Extent of wide area

communication• Need for urgent

mobilization • Specification of goal state

Dimensions

SOCIAM – The Theory and Practice of Social Machines – commences October 2012, led by Nigel Shadbolt at University of Southampton.

Physical World(people and devices)

Building a Social Machine

Design andComposition

Participation andData supply

Model of social interaction

Virtual World(Network of social interactions) Dave Robertson

The users of a website, the website, and the interactions between them, together form our fundamental notion of a “machine”

More people

Mor

e m

achi

nes

That Big Picture

Big DataBig Compute

Conventional Computation

The Future!

SocialNetworking

e-infrastructure

onlineR&D

The Fourth Quadrant

An Agenda

1. Science has much to learn from an industry / R&D that is already digital end-to-end

• Insights into ICT challenges2. What can we learn from (e-)Science?

• Metadata capture at source, end-to-end semantics• Social objects, semantic objects, audio objects? • Reproducible/reconstructable/machine-assisted

analysis/production• Interactivity, intersection of digital and physical

3. Designing the Social Machines of music*• Human generated metadata / music?

* And also the music of Social Machines!

[email protected]/people/dder

www.scilogs.com/eresearch

@dder

Slide credits: Christine Borgman, Carole Goble, Faith Lawrence & Mike Jewell, Sean Bechhofer, Jeremy Frey, Ichiro Fujinaga, Stephen Downie, Ashley Burgoyne, Kevin Page, Ben Fields, Nigel Shadbolt, Dave Robertson

www.myexperiment.org/packs/337http://www.slideshare.net/davidderoure/fourth-quadrant-semantic-media

• Semantic Mediahttp://semanticmedia.org.uk/

• myExperiment project wikihttp://wiki.myexperiment.org/

• Workflow Forever project (Wf4Ever)http://www.wf4ever-project.org/

• Future of Research Communication (FORCE11)http://force11.org/

• Theory and Practice of Social Machines (SOCIAM)http://sociam.org/

Links

• D. De Roure, C. Goble and R. Stevens. The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows Future Generation Computer Systems 25, pp. 561-567.

• S. Bechhofer, I. Buchan, D De Roure et al. Why linked data is not enough for scientists, Future Generation Computer Systems

• D. De Roure, David and C. Goble, Anchors in Shifting Sand: the Primacy of Method in the Web of Data. WebSci10, April 26-27th, 2010, Raleigh, NC, US.

• D. De Roure, S. Bechhofer, C. Goble and D. Newman, Scientific Social Objects, 1st International Workshop on Social Object Networks (SocialObjects 2011).

• D. De Roure, K. Belhajjame, P. Missier, P. et al Towards the preservation of scientific workflows. 8th International Conference on Preservation of Digital Objects (iPRES 2011).

• Carole A. Goble, David De Roure and Sean Bechhofer Accelerating scientists’ knowledge turns. Will be available at www.springerlink.com

• Khalid Belhajjame, Oscar Corcho, Daniel Garijo et al Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse, SePublica2012 at ESWC2012, Greece, May 2012

• Kevin R. Page, Ben Fields, David De Roure et al Reuse, Remix, Repeat: The Workflows of MIR, 13th International Society for Music Information Retrieval Conference (ISMIR 2012) Porto, Portugal, October 8th-12th, 2012

• Jun Zhao, Jose Manuel Gomez-Perezy, Khalid Belhajjame et al, Why Workflows Break - Understanding and Combating Decay in Taverna Workflows, eScience 2012, Chicago, October 2012