The Virtual Data Grid: A New Model and Architecture for Data-Intensive Collaboration
Summer Grid 2004, UT Brownsville South Padre Island Center, 24 June 2004
Mike Wilde, Argonne National Laboratory, Mathematics and Computer Science Division

Summer Grid 2004, www.griphyn.org/chimera, 24 June, UTB/SPI

GriPhyN: Grid Physics Network Mission

Enhance scientific productivity through discovery and processing of datasets, using the grid as a scientific workstation

Virtual Data enables this approach by creating datasets from workflow “recipes” and recording their provenance.

GriPhyN works to “cross the chasm”: application and computer scientists create and field-test paradigms and toolkits together

Acknowledgements: Virtual Data is a Large Team Effort

The Chimera Virtual Data System is the work of Ian Foster, Jens Voeckler, Mike Wilde and Yong Zhao

The Pegasus Planner is the work of Ewa Deelman, Gaurang Mehta, and Karan Vahi

Applications described are the work of many people, including: James Annis, Rick Cavanaugh, Dan Engh, Rob Gardner, Albert Lazzarini, Natalia Maltsev, Marge Bardeen, and their wonderful teams

Virtual Data Scenario

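[Figure: workflow graph. simulate -t 10 … produces file1 and file2; reformat -f fz … turns file2 into file3, file4 and file5; conv -I esd -o aod turns file2 into file6; summarize -t 10 … turns file6 into file7; psearch -t 10 … produces file8]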

On-demand data generation

Update workflow following changes

Manage workflow

Explain provenance, e.g. for file8:
  psearch -t 10 -i file3 file4 file5 -o file8
  summarize -t 10 -i file6 -o file7
  reformat -f fz -i file2 -o file3 file4 file5
  conv -I esd -o aod -i file2 -o file6
  simulate -t 10 -o file1 file2
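“Update workflow following changes” depends on knowing everything downstream of a modified file. The following Python sketch computes that from recorded derivations. The RECORDS list and the `stale_after_change` helper are hypothetical and are not part of Chimera. The input lists follow the recipe on the next slide (8 < 1,3,4,5,7; 7 < 6; 3,4,5,6 < 2) rather than the shorter psearch command line above.

```python
# Hypothetical provenance records: (program, input files, output files).
# Dependencies follow the recorded recipe: 8 < (1,3,4,5,7), 7 < 6, (3,4,5,6) < 2.
RECORDS = [
    ("simulate", [], ["file1", "file2"]),
    ("reformat", ["file2"], ["file3", "file4", "file5"]),
    ("conv", ["file2"], ["file6"]),
    ("summarize", ["file6"], ["file7"]),
    ("psearch", ["file1", "file3", "file4", "file5", "file7"], ["file8"]),
]


def stale_after_change(changed):
    """Return every file that must be regenerated after `changed` is modified."""
    stale = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for _program, inputs, outputs in RECORDS:
            if current in inputs:
                for out in outputs:
                    if out not in stale:  # visit each derived file only once
                        stale.add(out)
                        frontier.append(out)
    return stale


print(sorted(stale_after_change("file6")))  # ['file7', 'file8']
```

This is a plain forward traversal. Only files reachable from the changed one are marked, so file1 through file5 survive a change to file6.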

Virtual Data describes analysis workflow

The recorded virtual data “recipe” here is (read “<” as “is derived from” for files and “is produced by” for programs):

– Files: 8 < (1,3,4,5,7), 7 < 6, (3,4,5,6) < 2

– Programs: 8 < psearch, 7 < summarize, (3,4,5) < reformat, 6 < conv, (1,2) < simulate

[Figure: the Virtual Data Scenario workflow graph, with file8 marked as the requested dataset]

Virtual Data describes analysis workflow

To re-create file8: step 1

– simulate > file1, file2


Virtual Data describes analysis workflow

To re-create file8: step 2

– files 3, 4, 5, 6 derived from file2

– reformat > file3, file4, file5

– conv > file6


Virtual Data describes analysis workflow

To re-create file8: step 3

– file7 depends on file6

– summarize > file7


Virtual Data describes analysis workflow

To re-create file8: final step

– file8 depends on files 1, 3, 4, 5, 7

– psearch < file1, file3, file4, file5, file7 > file8

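The re-creation steps above amount to two actions: walk the recipe backwards from the requested file, then run the producers in dependency order. Here is a minimal Python sketch under that reading. The RECIPE table, `producer_of` and `plan` are illustrative names, not the Chimera planner. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# The recorded recipe: each program, the files it reads, and the files it writes.
RECIPE = {
    "simulate": {"inputs": [], "outputs": ["file1", "file2"]},
    "reformat": {"inputs": ["file2"], "outputs": ["file3", "file4", "file5"]},
    "conv": {"inputs": ["file2"], "outputs": ["file6"]},
    "summarize": {"inputs": ["file6"], "outputs": ["file7"]},
    "psearch": {"inputs": ["file1", "file3", "file4", "file5", "file7"], "outputs": ["file8"]},
}


def producer_of(filename):
    """Name of the program that produces `filename`, or None for raw inputs."""
    for program, step in RECIPE.items():
        if filename in step["outputs"]:
            return program
    return None


def plan(requested):
    """Programs needed to re-create `requested`, in a valid execution order."""
    needs = {}  # program -> set of programs whose outputs it reads
    pending = [producer_of(requested)]
    while pending:
        program = pending.pop()
        if program is None or program in needs:
            continue
        parents = {producer_of(f) for f in RECIPE[program]["inputs"]} - {None}
        needs[program] = parents
        pending.extend(parents)
    return list(TopologicalSorter(needs).static_order())


print(plan("file7"))  # ['simulate', 'conv', 'summarize']
```

Reformat and conv each depend only on simulate. The sorter may therefore emit them in either order, which matches step 2 grouping them together.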

Grid3 – The Laboratory

Supported by the National Science Foundation and the Department of Energy.

VDL: Virtual Data Language Describes Data Transformations

Transformation
– Abstract template of program invocation
– Similar to “function definition”

Derivation
– “Function call” to a Transformation
– Stores past and future:
  > A record of how data products were generated
  > A recipe of how data products can be generated

Invocation
– Record of a Derivation execution

These XML documents reside in a “virtual data catalog” (VDC), a relational database.
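The following in-memory Python model makes the three object kinds concrete. It binds tr1 and x1 from the next slide. The class and field names are assumptions for illustration only. They do not reproduce the VDL XML schema or the VDC's relational tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    """Abstract template of a program invocation (a 'function definition')."""
    name: str
    formals: tuple  # (direction, parameter name) pairs, e.g. ("in", "a1")


@dataclass
class Derivation:
    """A 'function call' to a Transformation: binds every formal to a value."""
    name: str
    tr: Transformation
    actuals: dict

    def __post_init__(self):
        expected = {param for _direction, param in self.tr.formals}
        if set(self.actuals) != expected:
            raise ValueError(f"{self.name}: expected arguments {sorted(expected)}")

    def files(self, direction):
        """Logical file names bound to formals of the given direction."""
        return [self.actuals[p] for d, p in self.tr.formals if d == direction]


@dataclass(frozen=True)
class Invocation:
    """Record of one execution of a Derivation."""
    derivation: str
    host: str
    exit_code: int
    seconds: float


tr1 = Transformation("tr1", (("in", "a1"), ("out", "a2")))
x1 = Derivation("x1", tr1, {"a1": "file1", "a2": "file2"})
run = Invocation("x1", host="example-host", exit_code=0, seconds=1.5)  # made-up values
print(x1.files("in"), x1.files("out"))  # ['file1'] ['file2']
```

A derivation that omits an argument is rejected when it is created, mirroring the rule that a call must match its definition.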

VDL Describes Workflow via Data Dependencies

TR tr1(in a1, out a2) {
  argument stdin = ${a1};
  argument stdout = ${a2};
}

TR tr2(in a1, out a2) {
  argument stdin = ${a1};
  argument stdout = ${a2};
}

DV x1->tr1(a1=@{in:file1}, a2=@{out:file2});
DV x2->tr2(a1=@{in:file2}, a2=@{out:file3});

[Figure: file1 → x1 → file2 → x2 → file3]
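The VDL above contains no explicit ordering. x2 runs after x1 only because x2 reads file2, which x1 writes. Here is a Python sketch of that inference. The DERIVATIONS table and `edges` are hypothetical helpers, not the Chimera implementation.

```python
# The two derivations from the slide, reduced to the logical files they read and write.
DERIVATIONS = {
    "x1": {"in": ["file1"], "out": ["file2"]},
    "x2": {"in": ["file2"], "out": ["file3"]},
}


def edges(derivations):
    """Sorted, de-duplicated (producer, consumer) pairs implied by shared file names."""
    producer = {f: dv for dv, io in derivations.items() for f in io["out"]}
    return sorted({
        (producer[f], dv)
        for dv, io in derivations.items()
        for f in io["in"]
        if f in producer and producer[f] != dv
    })


print(edges(DERIVATIONS))  # [('x1', 'x2')]
```

The result does not depend on the order in which the derivations are declared. This is what lets VDL describe workflow purely through data dependencies.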

Workflow example

Graph structure
– Fan-in
– Fan-out
– "left" and "right" can run in parallel

Needs external input file
– Located via replica catalog

Data file dependencies
– Form graph structure

[Figure: diamond workflow. preprocess fans out to two findrange jobs, which fan in to analyze]

Complete VDL workflow

Generate appropriate derivations:

DV top->preprocess( b=[ @{out:"f.b1"}, @{out:"f.b2"} ], a=@{in:"f.a"} );
DV left->findrange( b=@{out:"f.c1"}, a2=@{in:"f.b2"}, a1=@{in:"f.b1"}, name="left", p="0.5" );
DV right->findrange( b=@{out:"f.c2"}, a2=@{in:"f.b2"}, a1=@{in:"f.b1"}, name="right" );
DV bottom->analyze( b=@{out:"f.d"}, a=[ @{in:"f.c1"}, @{in:"f.c2"} ] );
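The parallelism noted on the previous slide falls out of the file dependencies. Once each derivation's producers are known, derivations whose inputs are all available form a stage that can run concurrently. Here is a Python sketch using the file names above. WORKFLOW and `stages` are hypothetical. External inputs such as f.a are treated as already present, standing in for the replica catalog lookup.

```python
# The diamond workflow above: each derivation and the logical files it reads and writes.
WORKFLOW = {
    "top": {"in": ["f.a"], "out": ["f.b1", "f.b2"]},
    "left": {"in": ["f.b1", "f.b2"], "out": ["f.c1"]},
    "right": {"in": ["f.b1", "f.b2"], "out": ["f.c2"]},
    "bottom": {"in": ["f.c1", "f.c2"], "out": ["f.d"]},
}


def stages(workflow):
    """Group derivations into stages whose members can run in parallel."""
    produced_by = {f: dv for dv, io in workflow.items() for f in io["out"]}
    # Files nobody produces (such as f.a) are external inputs, assumed to be
    # locatable through a replica catalog, so they impose no ordering.
    deps = {
        dv: {produced_by[f] for f in io["in"] if f in produced_by}
        for dv, io in workflow.items()
    }
    done, result = set(), []
    while len(done) < len(deps):
        ready = sorted(dv for dv in deps if dv not in done and deps[dv] <= done)
        if not ready:
            raise ValueError("workflow contains a cycle")
        result.append(ready)
        done.update(ready)
    return result


print(stages(WORKFLOW))  # [['top'], ['left', 'right'], ['bottom']]
```

The fan-out after top and the fan-in before bottom appear directly as stage boundaries.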

Compound Transformations Enable Functional Abstractions

Compound TR encapsulates an entire sub-graph:

TR rangeAnalysis (in fa, p1, p2, out fd, io fc1, io fc2, io fb1, io fb2) {
  call preprocess( a=${fa}, b=[ ${out:fb1}, ${out:fb2} ] );
  call findrange( a1=${in:fb1}, a2=${in:fb2}, name="LEFT", p=${p1}, b=${out:fc1} );
  call findrange( a1=${in:fb1}, a2=${in:fb2}, name="RIGHT", p=${p2}, b=${out:fc2} );
  call analyze( a=[ ${in:fc1}, ${in:fc2} ], b=${fd} );
}
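A compound transformation behaves like a function whose body is a list of calls. The Python sketch below expands rangeAnalysis into its four simple calls. It uses the argument values of derivation d1 from the next slide, assuming (as the matching parameter names suggest) that “diamond” has the same signature as rangeAnalysis. The function and its tuple layout are illustrative, not Chimera's expansion code.

```python
def range_analysis(fa, p1, p2, fd, fc1, fc2, fb1, fb2):
    """Expand the compound TR into the simple calls it encapsulates.

    Returns (transformation, arguments) pairs in the order of the VDL body.
    fb1, fb2, fc1 and fc2 are the intermediate ("io") files: each is written
    by one call and read by a later one.
    """
    return [
        ("preprocess", {"a": fa, "b": [fb1, fb2]}),
        ("findrange", {"a1": fb1, "a2": fb2, "name": "LEFT", "p": p1, "b": fc1}),
        ("findrange", {"a1": fb1, "a2": fb2, "name": "RIGHT", "p": p2, "b": fc2}),
        ("analyze", {"a": [fc1, fc2], "b": fd}),
    ]


# Argument values of derivation d1 in the derivation scripts.
calls = range_analysis(
    fa="f.00000", fb2="f.00001", fb1="f.00002", fc2="f.00003", fc1="f.00004",
    fd="f.00005", p2="100", p1="0",
)
for tr, args in calls:
    print(tr, args)
```

A single DV line therefore stands for a four-node sub-graph. This explains why seventy short derivation lines can describe a much larger workflow.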

Derivation scripts: a representation of virtual data provenance:

DV d1->diamond( fd=@{out:"f.00005"}, fc1=@{io:"f.00004"}, fc2=@{io:"f.00003"}, fb1=@{io:"f.00002"}, fb2=@{io:"f.00001"}, fa=@{io:"f.00000"}, p2="100", p1="0" );

DV d2->diamond( fd=@{out:"f.0000B"}, fc1=@{io:"f.0000A"}, fc2=@{io:"f.00009"}, fb1=@{io:"f.00008"}, fb2=@{io:"f.00007"}, fa=@{io:"f.00006"}, p2="141.42135623731", p1="0" );

...

DV d70->diamond( fd=@{out:"f.001A3"}, fc1=@{io:"f.001A2"}, fc2=@{io:"f.001A1"}, fb1=@{io:"f.001A0"}, fb2=@{io:"f.0019F"}, fa=@{io:"f.0019E"}, p2="800", p1="18" );
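The file names in these scripts follow a visible pattern. Each derivation gets six consecutive hexadecimal file numbers, with fa lowest and fd highest, so d70 starts at 69 × 6 = 414 = 0x19E. The hypothetical Python generator below reproduces the lines shown. Because the full parameter sweep is elided, p1 and p2 are passed in rather than computed.

```python
def diamond_dv(index, p1, p2):
    """Format derivation d<index> in the style of the derivation scripts.

    Derivation n uses six consecutive file numbers starting at (n - 1) * 6,
    printed as five upper-case hex digits: fa gets the lowest, fd the highest.
    """
    base = (index - 1) * 6

    def lfn(offset):
        return f'"f.{base + offset:05X}"'

    return (
        f"DV d{index}->diamond( fd=@{{out:{lfn(5)}}}, fc1=@{{io:{lfn(4)}}}, "
        f"fc2=@{{io:{lfn(3)}}}, fb1=@{{io:{lfn(2)}}}, fb2=@{{io:{lfn(1)}}}, "
        f'fa=@{{io:{lfn(0)}}}, p2="{p2}", p1="{p1}" );'
    )


print(diamond_dv(1, p1="0", p2="100"))
print(diamond_dv(70, p1="18", p2="800"))
```

Generating derivations this way keeps file names unique across the sweep without any central bookkeeping beyond the derivation index.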

Invocation Provenance

Completion status and resource usage

Attributes of executable transformation

Attributes of input and output files
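Here is a Python sketch of capturing such a record, assuming a wrapper that runs the job itself. It stores the exit status and wall-clock time, plus the size and SHA-256 digest of the executable and of the input and output files. The record layout is hypothetical. CPU and memory usage, which a production launcher would also report, are left out.

```python
import hashlib
import os
import shutil
import subprocess
import sys
import tempfile
import time


def file_attributes(path):
    """Size and SHA-256 digest of a file, or None if it does not exist."""
    if path is None or not os.path.isfile(path):
        return None
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return {"size": os.path.getsize(path), "sha256": digest}


def record_invocation(argv, inputs, outputs):
    """Run argv and return a provenance record of that execution."""
    record = {
        "argv": list(argv),
        "executable": file_attributes(shutil.which(argv[0]) or argv[0]),
        "inputs": {p: file_attributes(p) for p in inputs},  # captured before the run
    }
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True)
    record["wall_seconds"] = time.monotonic() - start
    record["exit_code"] = proc.returncode
    record["outputs"] = {p: file_attributes(p) for p in outputs}  # captured after
    return record


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "out.txt")
    script = f"open({out!r}, 'w').write('hi')"
    rec = record_invocation([sys.executable, "-c", script], [], [out])
    print(rec["exit_code"], rec["outputs"][out]["size"])  # 0 2
```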

Executing VDL Workflows

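[Figure: an abstract workflow is turned into a concrete DAG either by a local planner or by the global planner “Pegasus”, which uses Grid info; the concrete DAG is executed by DAGman / Condor-G; a “jit” planner is shown as research]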
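The last hop in this picture, from a concrete DAG to DAGman, can be sketched as writing a DAGMan input file made of JOB lines and PARENT … CHILD lines. The sketch below assumes one Condor submit file per job named <job>.sub, which is a hypothetical convention. It omits everything a planner such as Pegasus decides first, such as site selection, data transfer and registration.

```python
def to_dagman(jobs):
    """Render {job: set of parent jobs} as DAGMan input-file text.

    Assumes (hypothetically) one Condor submit file per job, named <job>.sub.
    """
    lines = [f"JOB {job} {job}.sub" for job in jobs]
    for job, parents in jobs.items():
        if parents:
            lines.append(f"PARENT {' '.join(sorted(parents))} CHILD {job}")
    return "\n".join(lines) + "\n"


diamond = {"top": set(), "left": {"top"}, "right": {"top"}, "bottom": {"left", "right"}}
print(to_dagman(diamond), end="")
```

For the diamond, this prints four JOB lines followed by three dependency lines, the last being `PARENT left right CHILD bottom`.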

GriPhyN-iVDGL Applications to date

ATLAS, BTeV, CMS – HEP event simulation
Argonne Computational Biology – sequence comparison and result capture
LIGO – Pulsar search
Sloan Digital Sky Survey – cluster finding; near-earth object search planned
Quarknet – science education – cosmic rays, HEP analysis

Genome Analysis Database Update

[Figure: GADU - GServer. Automatic workflows (jobs A, B, C, D) created as per user request or project run on Jazz/ANL, Grid3 and UofWisc. End users (hit-and-run users, public and registered groups, collaborators) reach the server through a Jetspeed interface. Data flow and storage at various levels are handled by Chimera, Condor and Globus.]

Application work by Alex Rodriguez, Dina Sulakhe, Natalia Maltsev, Argonne MCS

Described in GGF10 workshop paper.

Virtual Data Example: Galaxy Cluster Search

Sloan Data

[Figures: galaxy cluster size distribution (number of clusters vs. number of galaxies); workflow DAG]

Jim Annis, Steve Kent, Vijay Sehkri, Fermilab, Michael Milligan, Yong Zhao, University of Chicago. Described in SC2002 paper

Cluster Search Workflow Graph and Execution Trace

[Figure: workflow jobs vs. time]

Virtual Data Application: High Energy Physics Data Analysis

[Figure: tree of derived datasets, each node labeled by the parameters that define it:
mass = 200
mass = 200, decay = WW
mass = 200, decay = ZZ
mass = 200, decay = bb
mass = 200, plot = 1
mass = 200, event = 8
mass = 200, decay = WW, stability = 1
mass = 200, decay = WW, stability = 3
mass = 200, decay = WW, plot = 1
mass = 200, decay = WW, event = 8
mass = 200, decay = WW, stability = 1, plot = 1
mass = 200, decay = WW, stability = 1, event = 8
mass = 200, decay = WW, stability = 1, LowPt = 20, HighPt = 10000]

Work and slide by Rick Cavanaugh and Dimitri Bourilkov, University of Florida. Ref: CHEP 2002 paper
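Each node in this tree is a derived dataset named by the parameters that produced it. A new request can therefore start from the most specific existing node whose parameters it already satisfies. The following is a hypothetical Python sketch of that lookup. EXISTING transcribes the node labels, while the matching rule and `best_starting_point` are assumptions, not the Florida group's implementation.

```python
# Existing derived datasets, identified by their defining parameters (from the figure).
EXISTING = [
    {"mass": "200"},
    {"mass": "200", "decay": "WW"},
    {"mass": "200", "decay": "ZZ"},
    {"mass": "200", "decay": "bb"},
    {"mass": "200", "plot": "1"},
    {"mass": "200", "event": "8"},
    {"mass": "200", "decay": "WW", "stability": "1"},
    {"mass": "200", "decay": "WW", "stability": "3"},
    {"mass": "200", "decay": "WW", "plot": "1"},
    {"mass": "200", "decay": "WW", "event": "8"},
    {"mass": "200", "decay": "WW", "stability": "1", "plot": "1"},
    {"mass": "200", "decay": "WW", "stability": "1", "event": "8"},
    {"mass": "200", "decay": "WW", "stability": "1", "LowPt": "20", "HighPt": "10000"},
]


def best_starting_point(request):
    """Most specific existing dataset whose parameters all appear in the request."""
    candidates = [d for d in EXISTING if d.items() <= request.items()]
    return max(candidates, key=len, default=None)


print(best_starting_point({"mass": "200", "decay": "WW", "stability": "1", "event": "9"}))
# {'mass': '200', 'decay': 'WW', 'stability': '1'}
```

Only the parameters that differ from the chosen node (here event = 9) would then need new derivations.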

Using Virtual Data for Science Education

The QuarkNet-Trillium collaboration is using Grid virtual data tools and methods to enrich science education

It's an experiment to give students the means to:
– discover and apply datasets, algorithms, and data analysis methods
– collaborate by developing new ones and sharing results and observations
– learn data analysis methods that will ready and excite them for a scientific career

And in later steps, we may actually use the Grid!

Quarknet Virtual Data Project

[Figure: Quarknet Virtual Data Portal, reached via standard Web access, holding student data, algorithms, results, notes, and communications, with the Virtual Data Toolkit and Virtual Data Catalog. Connected student/teacher teams, each with a cosmic ray detector and locally collected data: Central High School, Reston, Virginia; Yale / Middletown High Collaboration, Hartford, Connecticut; Foothills High School, Great Falls, Montana.]

Student teacher teams sharing data, methods, programs, and knowledge

Enabling collaboration-intensive science discovery with virtual data tools and methods

Detector Performance Study

Example: BTeV Event Simulation

Support for Search and Discovery

Goal: make it as easy to use as Google
More advanced capabilities lie below the surface (as with Google)
Understand the structure and meaning of the datasets and their fields
Advanced search, using SQL-like queries
Find both DATA and TRANSFORMATIONS
Create datasets from queries
Perform calculations on datasets, filtering results to look for patterns
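“Find both DATA and TRANSFORMATIONS” suggests a single metadata catalog that can be queried for either kind of object. The following is a minimal sketch using Python's built-in sqlite3. The table layout and all attribute values are made up for illustration. Only the object names come from the Quarknet provenance excerpt later in the talk.

```python
import sqlite3

# Hypothetical catalog: one row per (object kind, object name, attribute, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (kind TEXT, name TEXT, attr TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?, ?, ?)",
    [  # made-up metadata attached to names from the Quarknet example
        ("dataset", "run1a.ecal", "subsystem", "ecal"),
        ("dataset", "run1a.hcal", "subsystem", "hcal"),
        ("transformation", "ECalEnergySum", "subsystem", "ecal"),
        ("transformation", "ReconTotalEnergy", "subsystem", "all"),
    ],
)


def search(attr, value):
    """Return (kind, name) for every dataset or transformation with attr = value."""
    rows = conn.execute(
        "SELECT kind, name FROM annotation WHERE attr = ? AND value = ? ORDER BY kind, name",
        (attr, value),
    )
    return rows.fetchall()


print(search("subsystem", "ecal"))
# [('dataset', 'run1a.ecal'), ('transformation', 'ECalEnergySum')]
```

Keeping data and code in one attribute table means a single query returns both the dataset and the transformation that can process it.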

Search by Metadata

Deriving a new dataset

…to find the mass of the “z” particle:

Workflow for missing energy calculations

Virtual Provenance: list of derivations and files

<job id="ID000001" namespace="Quarknet.HEPSRCH" name="ECalEnergySum" level="5" dv-namespace="Quarknet.HEPSRCH" dv-name="run1aesum">
  <argument><filename file="run1a.event"/> <filename file="run1a.esm"/></argument>
  <uses file="run1a.esm" link="output" dontRegister="false" dontTransfer="false"/>
  <uses file="run1a.event" link="input" dontRegister="false" dontTransfer="false"/>
</job>
<job id="ID000002" namespace="Quarknet.HEPSRCH" name="ECalEnergySum" level="7" dv-namespace="Quarknet.HEPSRCH" …
  <argument><filename file="electron10GeV.event"/> <filename file="electron10GeV.sum"/></argument>
  …
</job>
<job id="ID000014" namespace="Quarknet.HEPSRCH" name="ReconTotalEnergy" level="3"…
  <argument><filename file="run1a.mis"/> <filename file="run1a.ecal"/> …
  <uses file="run1a.muon" link="input" dontRegister="false" dontTransfer="false"/>
  <uses file="run1a.total" link="output" dontRegister="false" dontTransfer="false"/>
  <uses file="run1a.ecal" link="input" dontRegister="false" dontTransfer="false"/>
  <uses file="run1a.hcal" link="input" dontRegister="false" dontTransfer="false"/>
  <uses file="run1a.mis" link="input" dontRegister="false" dontTransfer="false"/>
</job>

<!--list of all files used -->
<filename file="ecal.pct" link="inout"/>
<filename file="electron10GeV.avg" link="inout"/>
<filename file="electron10GeV.sum" link="inout"/>
<filename file="hcal.pct" link="inout"/>
….(excerpted for display)
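The <uses> elements carry the data-flow facts: each names a logical file and states whether the job reads or writes it. The following Python sketch uses xml.etree.ElementTree on a trimmed, well-formed transcription of two jobs from the excerpt, wrapped in a placeholder <workflow> root. If the real document declares an XML namespace, the element names in the searches would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Two jobs transcribed from the excerpt, trimmed to the attributes used here.
FRAGMENT = """
<workflow>
  <job id="ID000001" name="ECalEnergySum">
    <uses file="run1a.esm" link="output"/>
    <uses file="run1a.event" link="input"/>
  </job>
  <job id="ID000014" name="ReconTotalEnergy">
    <uses file="run1a.muon" link="input"/>
    <uses file="run1a.total" link="output"/>
    <uses file="run1a.ecal" link="input"/>
    <uses file="run1a.hcal" link="input"/>
    <uses file="run1a.mis" link="input"/>
  </job>
</workflow>
"""


def job_files(xml_text):
    """Map each job id to (transformation name, input files, output files)."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for job in root.iter("job"):
        links = [(u.get("file"), u.get("link")) for u in job.findall("uses")]
        inputs = [f for f, link in links if link in ("input", "inout")]
        outputs = [f for f, link in links if link in ("output", "inout")]
        result[job.get("id")] = (job.get("name"), inputs, outputs)
    return result


for job_id, (name, inputs, outputs) in job_files(FRAGMENT).items():
    print(job_id, name, "reads", inputs, "writes", outputs)
```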

Virtual Provenance in XML: control flow graph

<child ref="ID000003"> <parent ref="ID000002"/> </child>
<child ref="ID000004"> <parent ref="ID000003"/> </child>
<child ref="ID000005"> <parent ref="ID000004"/> <parent ref="ID000001"/>…
<child ref="ID000009"> <parent ref="ID000008"/> </child>
<child ref="ID000010"> <parent ref="ID000009"/> <parent ref="ID000006"/>…
<child ref="ID000012"> <parent ref="ID000011"/> </child>
<child ref="ID000013"> <parent ref="ID000011"/> </child>
<child ref="ID000014"> <parent ref="ID000010"/> <parent ref="ID000012"/>… <parent ref="ID000013"/>… </child>…

(excerpted for display…)
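Each <child> element lists the jobs it must wait for, so the excerpt can be read directly into a dependency map and ordered. The following Python sketch runs on the first three entries of the excerpt, wrapped in a placeholder root element. `parents_of` is a hypothetical helper, and graphlib requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# First three <child> entries of the excerpt, wrapped in a placeholder root.
CONTROL_FLOW = """
<workflow>
  <child ref="ID000003"><parent ref="ID000002"/></child>
  <child ref="ID000004"><parent ref="ID000003"/></child>
  <child ref="ID000005"><parent ref="ID000004"/><parent ref="ID000001"/></child>
</workflow>
"""


def parents_of(xml_text):
    """Map each child job id to the set of job ids it depends on."""
    root = ET.fromstring(xml_text.strip())
    return {
        child.get("ref"): {parent.get("ref") for parent in child.findall("parent")}
        for child in root.iter("child")
    }


deps = parents_of(CONTROL_FLOW)
order = list(TopologicalSorter(deps).static_order())
print(order)
```

ID000001 and ID000002 have no recorded parents. The sorter may therefore place them in either order, but both always precede the jobs that depend on them.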

And writing the results up in a “poster”

Poster describing analysis

Using active data from Web Services

Levels of Interaction

“Skins” – use it like a calculator, experiment with scenarios and settings, use virtual data like a log book to document, assess, and share parameter values.

“Blocks” – re-assemble workflow pipelines using existing ones as patterns and pre-developed transforms as building blocks

“Code” – write new transforms in a variety of languages and data models

Observations

A provenance approach based on interface definition and data flow declaration fits well with Grid requirements for code and data transportability and heterogeneity

Working in a provenance-managed system has many fringe benefits: uniformity, precision, structure, communication, documentation

The real world is messy – finding the right abstractions is hard, and handling “legacy” applications is even harder

Vision for Provenance in the Large

Universal knowledge management and production systems

Vendors integrate the provenance tracking protocol into data processing products

Ability to run anywhere “in the Grid”

Virtual Data Grid Vision

[Figure: Virtual Data Grid architecture. Actors: Production Manager, Researcher, Science Review, Grid Operations. Data Grid: storage elements, replica location service, data transport, storage resource management, virtual data catalogs and a virtual data index. Computing Grid: workflow planner, request planner, workflow executor (DAGman), request executor (Condor-G, GRAM), request predictor (Prophesy), Grid Monitor. Activities shown: raw data from the detector, simulation data, planning, discovery, composition, simulation, analysis, derivation, sharing.]

Planned Dataset Model

File

Set of files

Relational query or spreadsheet range

XML element, e.g. <FORM <Title…>/FORM>

Set of files with relational index

Object closure

New user-defined dataset type

Speculative model described in CIDR 2003 paper by Foster, Voeckler, Wilde and Zhao

Planned Dataset Type Model

[Figure: dataset type hierarchy, split into Representational and Logical types (nonleaf types are superclasses). Types shown: FileDataset, File, FileSet, MultiFileSet, TarFileSet, EventCollection, RawEventSet, SimulatedEventSet, MonteCarloSimulation, DiscreteEventSimulation.]

Provenance Server Plans

OGSA-based Grid services
– Discovery, security, resource management

Supports code and data discovery and workflow management

Object names (TR, DS, TY, DV, IV) can be used as global cross-server links

Derivations can reference remote transformations and datasets

Structured object namespaces & object-level access control enable large VO collaboration

Generalize transforms to describe service calls, database queries and language interpreters

Provenance Hyperlinks

[Figure: a Collaboration VDS, a Group VDS and two Personal VDS catalogs holding transformations (TR), derivations (DV) and datasets (DS), linked to one another by provenance hyperlinks]

Indexing Servers to Support Discovery

[Figure: Collaboration, Group and Personal VDS catalogs of TR, DV and DS objects, indexed by a collaboration-wide index, a collaboration-level index, a group index and three personal indexes]

For Information and Software

Virtual Data System
– www.griphyn.org/chimera - Chimera Virtual Data System: Overview, papers, software

Grids and Grid Software
– www.ivdgl.org/grid2003 - Using Grid3
– www.griphyn.org/vdt - Virtual Data Toolkit
– www.globus.org - The Globus Toolkit
– www.cs.wisc.edu/condor - The Condor Project
– www.ppdg.net - Particle Physics Data Grid

Acknowledgements

GriPhyN, iVDGL, and QuarkNet (in part) are supported by the National Science Foundation.

The Globus Alliance, PPDG, and QuarkNet are supported in part by the US Department of Energy, Office of Science; by the NASA Information Power Grid program; and by IBM.