Searching within Large Grid Infrastructures
Marios D. Dikaiakos, University of Cyprus & CoreGRID
Slide 2
Acknowledgements
• Wei Xing, University of Cyprus
• Rizos Sakellariou, U. Manchester, UK
• Yannis Ioannidis, U. Athens, GR
• Salvatore Orlando, ISTI-CNR, IT
• Domenico Laforenza, ISTI-CNR, IT
Slide 3
Outline
• Context and Motivation
• Limitations of Grid Information Services
• Semantic Grid and Ontologies
• A Core Grid Ontology
• Conclusions and Future Work
Slide 4
The Grid
• A wide-scale, distributed computing infrastructure to support resource sharing and coordinated problem solving in dynamic, multi-institutional Virtual Organizations.
  – Computational Grid: Provides the raw computing power, high-speed bandwidth interconnection and associated data storage.
  – Data & Information Grid: Allows easily accessible connections to major sources of information and tools for its analysis and visualisation.
  – Knowledge & Semantic Grid: Gives added value to the information; provides intelligent guidance for decision-makers; facilitates the generation, diffusion and support of knowledge.
Slide 5
Near-future Scenarios for the Grid
Slide 6
Near-future Scenarios for the Grid
• The Grid as a Wide-Scale Distributed System:
  – Millions of resources of different kinds.
  – Services and Policies in place.
  – Relationships (permanent and transient) between organizations, software, data, services, applications…
  – Different middleware platforms.
  – Common (?) protocols, standards and APIs.
• The hope is that the Grid will grow larger and gain acceptance as wide as that of the Web.
Slide 7
Slide 8
Problem Statement: Searching the Grid
• How are individuals and organizations going to harness the capabilities of a fully deployed Grid, with a massive and ever-expanding base of computing and storage nodes, network resources, and a huge corpus of available programs, services, and data?
• To this end, users need to identify “resources” that are:
  – Interesting (discovery)
  – Relevant (classification)
  – Accessible and available under known policies of use, cost (inquiry)
• Emphasis on “summary” information, in terms of granularity and timing.
Slide 9
Searching the Grid
• Computing, Storage, Network Resources
• Software and Data-sets
• Policies
• Relationships
• Best-practices
Slide 10
Examples of search queries
• Hardware resources on the Grid, their attributes, and applicable policies of their use:
  – Find a VO providing exclusive access to a shared-memory multiprocessor system with at least 16 processors, 8 GB of main memory, and a usage charge of not more than 100 euros per CPU time.
• Application services, software, and data-sets:
  – Find services running Quantum Chromo-Dynamics (QCD) calculations using F90 and MPI.
• Hardware-software combinations, Grid usage and best-practices:
  – Find the pricing and prior clientele of Grid services that provide access to the XYZ workflow for high-performance oil refinery simulations.
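The first example query can be read as a conjunction of attribute constraints over summary records. The sketch below is only an illustration of that reading: the `ComputeOffer` record, its field names, and the sample VOs are assumptions made for this example and do not come from any Grid information schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeOffer:
    """Hypothetical summary record for one resource offered by a VO."""
    vo: str
    architecture: str       # e.g. "shared-memory" or "cluster"
    processors: int
    memory_gb: int
    exclusive_access: bool
    charge_eur: float       # usage charge, in whatever unit the VO publishes


def find_offers(offers, min_processors=16, min_memory_gb=8, max_charge_eur=100.0):
    """Return the offers that satisfy every constraint of the first example query."""
    return [
        o for o in offers
        if o.architecture == "shared-memory"
        and o.exclusive_access
        and o.processors >= min_processors
        and o.memory_gb >= min_memory_gb
        and o.charge_eur <= max_charge_eur
    ]


# Made-up sample data, one record per VO.
offers = [
    ComputeOffer("vo-a", "shared-memory", 32, 16, True, 80.0),
    ComputeOffer("vo-b", "shared-memory", 8, 16, True, 40.0),    # too few processors
    ComputeOffer("vo-c", "cluster", 64, 128, True, 20.0),        # wrong architecture
    ComputeOffer("vo-d", "shared-memory", 16, 8, False, 90.0),   # access is not exclusive
]
matching_vos = [o.vo for o in find_offers(offers)]
```

The difficulty discussed in the following slides is that no single information service holds records of this shape for hardware, policy, and pricing at once.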
Slide 11
Outline
• Context and Motivation
• Grid Information Services and Limitations
• Semantic Grid and Ontologies
• A Core Grid Ontology
• Conclusions and Future Work
Slide 12
Grid Information Services
• Established to help users answer questions on the status of individual resources and the Grid.
• Support the discovery and ongoing monitoring of the existence and characteristics of resources, services, computations and other entities of value to the Grid.
• Examples:
  – GLOBUS, EDG: Metacomputing Directory Service (MDS)
  – UNICORE Gateway and Network Job Supervisor (NJS)
  – EGEE: Relational Grid Monitoring Architecture (R-GMA), GridICE
  – Condor Matchmaker
Slide 13
MDS: Grid Info Services in Globus
[Figure: MDS architecture diagram. Labels: Resources, “Info. Providers”, LDIF, GRIS, GRRP, GIIS, GRIP, Users, Info. Retrieval, Discovery/Inquiry/Retrieval.]
Slide 14
Relational Grid Monitoring Architecture
[Figure: R-GMA architecture diagram. Labels: Application, Consumer API, Consumer Servlet, Sensor Code, Producer API, Producer Servlet, Registry API, Registry Service.]
Slide 15
What information is out there?
• Virtual Organizations:
  – Resources
  – Policies
  – People
• Software:
  – Codes
  – Specs
  – Location
• Data-sets:
  – Data
  – Metadata
  – Replicas
• Services:
  – Interface
  – Metadata
• Applications:
  – Descriptions
  – I/O requirements
  – Metadata
  – Workflows
• Summary & Statistics:
  – Logs
  – Associations
  – Statistics of use
• Resource Specifications:
  – Descriptions & Types
  – Names
  – Capacity
  – Configuration
• Resource status:
  – Resource use
  – Availability
  – Monitoring data
Slide 16
Resource Specification info. (examples)
| Source | Information provided | Schema | System |
| --- | --- | --- | --- |
| Info. Provider (Unix sys-call) | Mds-computer-platform, Mds-Cpu-model, Mds-Host-hn | Hierarchical | MDS-Globus, LDAP |
| Info. Provider (Unix sys-call), Static info. | GlueCEName, GlueHostName, GlueHostArchitecture, GlueHostProcessorClockSpeed, GlueSEAccessProtocolType, GlueCESEBindGroup, GlueHostFileLatency | Hierarchical | MDS-EDG, LDAP |
| Sensors (Unix sys call) | StorageElementProtocol, NetworkTCPThroughput, NetworkRTT | Relational | RGMA-EDG, HTTP |
Slide 17
Resource status information (examples)
| Source | Information provided | Schema | System |
| --- | --- | --- | --- |
| Info. Provider (Unix sys-call) | Mds-Memory-Ram-freeMB, Mds-FS-Total-freeMB, cpuload5 | Hierarchical | MDS-Globus, LDAP |
| Info. Provider (Unix sys-call) | GlueCEStateRunningJobs, GlueCEJobLocalID, GlueHostProcessorLoadLast1Min | Hierarchical | MDS-EDG, LDAP |
| Sensors (Unix sys call) | StorageElementStatus, NetworkUDPPacketLoss, NetworkFileTransferThroughput | Relational | RGMA-EDG, HTTP |
| Condor’s Sensor modules | DiskSpace, MemoryUsed, SystemLoad | ClassAds | Hawkeye, Condor |
| NWS probes, Traceroute | End-to-end bandwidth, End-to-end latency, End-to-end path | XML | GridLab’s TopoMon, GMA arch. |
Slide 18
VO information (examples)
| Source | Information provided | Schema | System |
| --- | --- | --- | --- |
| Static info. | Cert (info. about local certificate policy), MdsHostContact | Hierarchical | MDS-Globus, LDAP |
| Static info. | GlueCEPolicyMaxWallClockTime, GlueCEPolicyMaxCPUTime, GlueSAPolicyMaxFileSize | Hierarchical | MDS-EDG, LDAP |
Slide 19
Software & Dataset information (examples)
| Source | Information provided | Schema | System |
| --- | --- | --- | --- |
| Info. Provider | Mds-Application-Group-config, Mds-Application-name, Mds-Application-location, Mds-Application-info | Hierarchical | MDS-Globus, LDAP |
| Info. Provider | GlueSLFileName, GlueSLFileSize, GlueSLFilePath | Hierarchical | MDS-EDG, LDAP |
| GDMP producer | ExportCatalogue | RGMA | Replica Catalogue Service, GDMP-EDG |
Slide 20
Application & Logging Information
| Source | Information provided | Schema | System |
| --- | --- | --- | --- |
| TRIANA | Workflow information & Metadata | XML | TRIANA - GridLab |
| Condor submission | DAGMan input file (DAG specification and metadata) | Condor-specific | Condor meta-scheduler |
| Workload Management System | BrokerInfo file | Hierarchical | Resource Broker (EDG), LDAP |
| LDAP queries to JSS, RB. | Logging information; Bookkeeping information (transient); UserID, JobID, Job State, JobDescription, etc. | Attribute=value | LB Server (EDG); Events, exported API for queries |
Slide 21
Limitations of Current Approaches
• Remarks extracted from the description of a Grid-application development effort:
  – “Jobs typically need to access hundreds of files, and each site has a different subset of the files.”
  – “Our data system knows what portion of a user's data may be at each site, but does not know how to submit grid jobs.”
  – “Our job submission system required users to choose grid sites and gave them no assistance in choosing.”
  – “…jobs requesting thousands of files and sites having hundreds of thousands of files are not uncommon in production.”
  – “…it would not be scalable to explicitly publish all the properties of jobs and resources in ...”
Slide 22
Limitations and Challenges
• Scalability in the context of Millions of Resources:
  – Infrastructure intrusiveness.
  – Resource Discovery, Retrieval and Classification.
• Expressiveness of Data Models in terms of:
  – Types of captured information.
  – Expressing semantic relationships between represented entities.
  – Amenability to Indexing, Query Optimization.
• Complexity:
  – Different protocols for discovery & inquiry, registration, invocation.
  – Lack of interoperability between different platforms.
  – Information Standardization.
• Missing Functionalities:
  – Transient and Historical information.
  – Policies.
  – Complex Queries.
Slide 23
Revisiting the problem
• Very large number of sources.
• Independent.
• No common schema.
• Various, partly unknown semantics.
• Subject to change, birth, or silence.
Slide 24
Revisiting the problem
• A federated warehouse approach:
  – “Wrap” the various sources to extract their information.
  – Store data in a warehouse.
  – Monitor sources and propagate updates to the warehouse.
  – Ask queries to the warehouse.
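The four steps of the warehouse approach can be sketched in a few lines. This is a minimal in-memory illustration, not a proposed implementation: the `Warehouse` class, both wrapper functions, the common attribute name `free_ram_mb`, and the second source with its `host` and `mem_free_kb` keys are assumptions made for this example. Only the attribute names `Mds-Host-hn` and `Mds-Memory-Ram-freeMB` are taken from the MDS examples shown earlier.

```python
class Warehouse:
    """In-memory store of records keyed by (source name, entity id)."""

    def __init__(self):
        self.records = {}

    def upsert(self, source, entity_id, attributes):
        """Store or overwrite the record one source holds about one entity."""
        self.records[(source, entity_id)] = dict(attributes)

    def query(self, predicate):
        """Return the sorted ids of entities whose attributes satisfy the predicate."""
        return sorted(eid for (_, eid), attrs in self.records.items() if predicate(attrs))


def wrap_mds(entry):
    """Hypothetical wrapper: map an MDS-style entry onto a common attribute name."""
    return entry["Mds-Host-hn"], {"free_ram_mb": int(entry["Mds-Memory-Ram-freeMB"])}


def wrap_site_b(entry):
    """Hypothetical wrapper for a second source that reports memory in KB."""
    return entry["host"], {"free_ram_mb": entry["mem_free_kb"] // 1024}


def refresh(warehouse, source, wrapper, raw_entries):
    """Monitor step: re-read a source and propagate its current state."""
    for raw in raw_entries:
        entity_id, attributes = wrapper(raw)
        warehouse.upsert(source, entity_id, attributes)


warehouse = Warehouse()
refresh(warehouse, "mds", wrap_mds,
        [{"Mds-Host-hn": "node1", "Mds-Memory-Ram-freeMB": "512"}])
refresh(warehouse, "site_b", wrap_site_b,
        [{"host": "node2", "mem_free_kb": 4194304}])
before = warehouse.query(lambda a: a["free_ram_mb"] >= 1024)

# The MDS source later reports more free memory; the update reaches the warehouse.
refresh(warehouse, "mds", wrap_mds,
        [{"Mds-Host-hn": "node1", "Mds-Memory-Ram-freeMB": "2048"}])
after = warehouse.query(lambda a: a["free_ram_mb"] >= 1024)
```

The wrappers are where the lack of a common schema is absorbed: each one translates source-specific names and units into the vocabulary the warehouse is queried in.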
Slide 25
Requirements for Searching the Grid
• Global/Common naming scheme for Grid entities.
• Resolution mechanism for discovery and retrieval of entity-related information/meta-data.
• Type and representation of retrieved entity-related information.
• Mining and representation of relationships and summary data.
• Complexity of queries and query interpretation.
Slide 26
Research Issues
• Metadata Consolidation: Definition & local creation of metadata about Grid entities.
• Information Source Discovery: Algorithms for Search and Discovery, Management of Updates.
• Metadata Retrieval and Integration: Protocols for retrieval; Data structures and algorithms for integration.
• Management of metadata: Analysis to build proper indexes; Extrapolation of semantic relationships.
• Query mechanisms and interface: Query language definition; Intelligent-agent interface to help users formulate queries.
Slide 27
Outline
• Context and Motivation
• Limitations of Grid Information Services
• Semantic Grid and Ontologies
• A Core Grid Ontology
• Conclusions and Future Work
Slide 28
Looking for answers: Semantic Grid
An extension of the current Grid in which information and services are given well-defined and explicitly represented meaning, so that it can be shared and used by humans and machines, better enabling them to work in cooperation.
Source: Goble, Bechhofer, DeRoure, Semantic Grid 101, GGF16, Athens, 2/2005
Slide 29
Ontologies and the Semantic Grid
• Ontologies are among the key building blocks of the Semantic Grid.
  – They define the concepts/terms of Grid entities, resources, capabilities and the relationships between them.
• We develop Grid ontologies to:
  – Merge the information from different sources;
  – Build a knowledge base for Grid infrastructures;
  – Construct a Grid information system;
  – Support co-operation with semantics-enabled Grid services, such as Resource Broker, Information Service, etc.
Slide 30
Ontologies in Computer Science
• An ontology is an engineering artifact:
  – It is constituted by a specific vocabulary used to describe a certain reality, plus
  – a set of explicit assumptions regarding the intended meaning of the vocabulary.
    • Almost always including how concepts should be classified
• Thus, an ontology describes a formal specification of a certain domain:
  – Shared understanding of a domain of interest
  – Formal and machine manipulable model of a domain of interest
Source: Goble, Bechhofer, DeRoure, Semantic Grid 101, GGF16, Athens, 2/2005
Slide 31
Languages
• Work on Semantic Web has concentrated on the definition of a collection or “stack” of languages.
  – These languages are then used to support the representation and use of metadata.
• The languages provide basic machinery that can be used to represent the extra semantic information needed for the Semantic Web
  – XML
  – RDF
  – RDF(S)
  – OWL
  – …
[Figure: language stack, top to bottom: OWL, RDF(S), RDF, XML; side labels: Annotation, Integration, Inference.]
Source: Goble, Bechhofer, DeRoure, Semantic Grid 101, GGF16, Athens, 2/2005
Slide 32
“W3C” Stack
• XML provides a surface syntax for structured documents
• XML Schema is a language for restricting the structure of XML documents.
• RDF is a data-model for objects ("resources") and relations between them; it provides simple semantics for this data-model.
• RDF Schema is a vocabulary for describing properties and classes of RDF resources, with semantics for generalization and hierarchies of such properties and classes.
• OWL adds more vocabulary for describing properties and classes.
Slide 33
Outline
• Context and Motivation
• Limitations of Grid Information Services
• Semantic Grid and Ontologies
• A Core Grid Ontology
• Conclusions and Future Work
Slide 34
Towards a general Ontology for Grids
• Currently, there are several Grid architectures and Grid implementations.
• Different views of Grid entities and their properties.
• It is practically impossible for one ontology to include all aspects of Grids or of many types of Grid entities.
• A Core Grid Ontology (CGO):
  – A core “framework” for representing a Grid.
  – Open and extensible for all kinds of Grid architectures and Grid implementations.
Slide 35
Building a Core Ontology
• The most difficult task in developing an ontology:
  – Capture a “right” model for the Grid;
• Our view of a Grid:
  – Users & Applications + {Middleware/Services} + Resources within VOs;
• A layer-structured model consisting of three layers:
  – Users/Applications
  – Middleware/Services
  – Resources.
GGF 16, 2/2006
Slide 36
A Grid Model
Slide 37
CGO Classes Overview
Slide 38
Defining properties
Based on the Constraints of the CGO Classes.
Slide 39
Representing a Grid Entity
Slide 40
Representing a Grid Entity using OWL
<owl:Class rdf:ID="ComputingElement">
<rdfs:subClassOf>
<owl:Restriction>
<owl:someValuesFrom>
<owl:Class>
<owl:unionOf rdf:parseType="Collection">
<owl:Class rdf:about="#Jobmanager"/>
<owl:Class rdf:about="#JobScheduler"/>
</owl:unionOf>
</owl:Class>
</owl:someValuesFrom>
<owl:onProperty rdf:resource="#runningSevice"/>
</owl:Restriction>
</rdfs:subClassOf>
……
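To make the structure of this fragment explicit, the sketch below reads it with Python's standard XML parser and pulls out the restricted property and the classes named in the union. The slide's fragment is elided, so the example closes the `owl:Class` element and wraps it in an `rdf:RDF` root with the standard namespace declarations; that wrapping is an assumption made only so the text parses, and nothing is added in place of the elided part. A real ontology toolkit would interpret the OWL semantics, which plain XML parsing does not.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# The fragment from the slide, closed and wrapped so that it is well-formed XML.
DOC = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="ComputingElement">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:someValuesFrom>
          <owl:Class>
            <owl:unionOf rdf:parseType="Collection">
              <owl:Class rdf:about="#Jobmanager"/>
              <owl:Class rdf:about="#JobScheduler"/>
            </owl:unionOf>
          </owl:Class>
        </owl:someValuesFrom>
        <owl:onProperty rdf:resource="#runningSevice"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
"""

NS = {"rdf": RDF, "rdfs": RDFS, "owl": OWL}
root = ET.fromstring(DOC)
cls = root.find("owl:Class", NS)
restriction = cls.find("rdfs:subClassOf/owl:Restriction", NS)

class_name = cls.get(f"{{{RDF}}}ID")
# Property the restriction applies to, spelled exactly as on the slide.
on_property = restriction.find("owl:onProperty", NS).get(f"{{{RDF}}}resource")
# Classes listed in the union: at least one property value must belong to one of them.
fillers = [
    c.get(f"{{{RDF}}}about")
    for c in restriction.findall("owl:someValuesFrom/owl:Class/owl:unionOf/owl:Class", NS)
]
```

Read this way, the fragment states that every ComputingElement has at least one value of the property that is a Jobmanager or a JobScheduler.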
Slide 41
Generating Instances
Slide 42
Conclusions
• The CGO can be used as a common, extensible language for:
  – Expressing the basic concepts of a Grid infrastructure and the relationships thereof.
  – Encoding and storing Grid metadata.
  – Integrating grid-related information extracted from different sources.
  – Expressing queries.
Slide 43
Next steps
• Automate the knowledge-base construction and maintenance process:
  – Information-source discovery
  – Metadata wrapping
  – Metadata integration
  – Consistency updates
• Investigate mechanisms for efficient knowledge-base query implementation.
Slide 44
Thank you for your attention!
• Questions?
• Comments?
Slide 45
References
• "A Core Grid Ontology for the Semantic Grid." Wei Xing, M. D. Dikaiakos, and R. Sakellariou. 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), Singapore, May 2006 (to appear).
• "Information Services for Large-scale Grids: A Case for a Grid Search Engine." M. D. Dikaiakos, R. Sakellariou, and Y. Ioannidis. In Engineering the Grid: status and perspectives, Jack Dongarra, Hans Zima, Adolfy Hoisie, Laurence Yang, Beniamino DiMartino (Editors), American Scientific Publishers, January 2006, ISBN: 1-58883-038-1.
• "Building a Distributed Digital Library for Natural Disasters Metadata with Grid Services and RDF." W. Xing, M. D. Dikaiakos, Hua Yang, A. Sphyris, G. Eftychidis. Library Management Journal (Special Issue on Digital Libraries in the Knowledge Era: Knowledge Management and Semantic Web Technology). Vol. 26, No. 4-5, May 2005.
• "Search Engines for the Grid: A Research Agenda." M. D. Dikaiakos, Y. Ioannidis, R. Sakellariou. In Grid Computing. First European AcrossGrids Conference, Santiago de Compostela, Spain, February 2003, Revised Papers, Lecture Notes in Computer Science series, vol. 2970, pages 49-58, Springer, 2004.
Slide 46
The RDF Data Model
• Statements are <subject, predicate, object> triples:
  – <Sean, hasColleague, Ian>
• Can be represented as a graph:
  [Figure: Sean --hasColleague--> Ian]
• Statements describe properties of resources
• A resource is any object that can be pointed to by a URI:
  – The generic set of all names/addresses that are short strings that refer to resources
  – a document, a picture, a paragraph on the Web, http://www.cs.man.ac.uk/index.html, a book in the library, a real person (?), isbn://0141184280
• Properties themselves are also resources (URIs)
Source: Goble, Bechhofer, DeRoure, Semantic Grid 101, GGF16, Athens, 2/2005
Slide 47
Linking Statements
• The subject of one statement can be the object of another
• Such collections of statements form a directed, labeled graph
• The object of a triple can also be a “literal” (a string)
[Figure: graph with edges Sean --hasColleague--> Ian, Carole --hasColleague--> Ian, Ian --hasHomePage--> http://www.cs.man.ac.uk/~horrocks, Sean --hasName--> “Sean K. Bechhofer”]
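The graph on this slide can be held as a plain set of triples, and questions about it become pattern matches that chain from the object of one statement to the subject of the next. The sketch below is an illustration in plain Python rather than an RDF library; the helper names `match` and `colleagues_home_pages` are made up for this example, and short names stand in for full URIs.

```python
# The statements from the slide, with short names standing in for URIs.
triples = {
    ("Sean", "hasColleague", "Ian"),
    ("Carole", "hasColleague", "Ian"),
    ("Ian", "hasHomePage", "http://www.cs.man.ac.uk/~horrocks"),
    ("Sean", "hasName", "Sean K. Bechhofer"),   # the object here is a literal
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def colleagues_home_pages(triples, person):
    """Follow hasColleague, then hasHomePage: a two-step walk over the graph."""
    pages = []
    for _, _, colleague in match(triples, subject=person, predicate="hasColleague"):
        pages += [o for _, _, o in match(triples, subject=colleague, predicate="hasHomePage")]
    return pages


sean_pages = colleagues_home_pages(triples, "Sean")
colleague_statements = match(triples, predicate="hasColleague")
```

The two-step walk works because "Ian" is the object of one statement and the subject of another, which is the linking described in the first bullet.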
Slide 48
RDF Syntax
• RDF has an XML syntax that has a specific meaning:
• Every Description element describes a resource
• Every attribute or nested element inside a Description is a property of that Resource
• We can refer to resources by URIs

<rdf:Description rdf:about="some.uri/person/sean_bechhofer">
  <o:hasColleague rdf:resource="some.uri/person/ian_horrocks"/>
  <o:hasName rdf:datatype="&xsd;string">Sean K. Bechhofer</o:hasName>
</rdf:Description>
<rdf:Description rdf:about="some.uri/person/ian_horrocks">
  <o:hasHomePage>http://www.cs.man.ac.uk/~horrocks</o:hasHomePage>
</rdf:Description>
<rdf:Description rdf:about="some.uri/person/carole_goble">
  <o:hasColleague rdf:resource="some.uri/person/ian_horrocks"/>
</rdf:Description>
Slide 49
What does RDF give us?
• A mechanism for annotating data and resources.
• Single (simple) data model.
• Syntactic consistency between names (URIs).
• Low level integration of data.
Source: Goble, Bechhofer, DeRoure, Semantic Grid 101, GGF16, Athens, 2/2005
Slide 50
RDF(S): RDF Schema
• RDF gives a formalism for metadata annotation, and a way to write it down in XML, but it does not give any special meaning to vocabulary such as subClassOf or type (supporting OO-style modelling)
  – Interpretation is an arbitrary binary relation
• RDF Schema extends RDF with a schema vocabulary that allows you to define basic vocabulary terms and the relations between those terms
  – Class, type, subClassOf
  – Property, subPropertyOf, range, domain
  – it gives “extra meaning” to particular RDF predicates and resources
  – this “extra meaning”, or semantics, specifies how a term should be interpreted
Source: Goble, Bechhofer, DeRoure, Semantic Grid 101, GGF16, Athens, 2/2005
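The “extra meaning” of subClassOf and type can be illustrated as two inference rules applied until nothing new follows: subClassOf is transitive, and an instance of a class is also an instance of its superclasses. The sketch below covers only these two rules, not the full RDFS semantics. The class names `Resource` and `GridEntity` and the instance `ce01` are hypothetical; only `ComputingElement` appears elsewhere in these slides.

```python
def rdfs_closure(triples):
    """Apply two RDFS-style rules until no new triple appears.

    Rule 1: (A subClassOf B) and (B subClassOf C) give (A subClassOf C).
    Rule 2: (x type A) and (A subClassOf B) give (x type B).
    Both rules join on a subClassOf statement whose subject is the current object.
    """
    closure = set(triples)
    while True:
        new = {
            (s, p, o2)
            for s, p, o in closure if p in ("subClassOf", "type")
            for s2, p2, o2 in closure if p2 == "subClassOf" and s2 == o
        }
        if new <= closure:
            return closure
        closure |= new


facts = {
    ("ComputingElement", "subClassOf", "Resource"),   # hypothetical hierarchy
    ("Resource", "subClassOf", "GridEntity"),
    ("ce01", "type", "ComputingElement"),
}
entailed = rdfs_closure(facts) - facts
```

Under plain RDF the three statements in `facts` are just labelled edges; the statements in `entailed` follow only once the schema vocabulary is given its semantics.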
Slide 51
Problems with RDFS
• RDFS is too weak to describe resources in sufficient detail
  – No localised range and domain constraints
    • Can’t say that the range of hasChild is person when applied to persons and elephant when applied to elephants
  – No existence/cardinality constraints
    • Can’t say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents
  – No transitive, inverse or symmetrical properties
    • Can’t say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf or that touches is symmetrical
• It can be difficult to provide reasoning support
  – No “native” reasoners for non-standard semantics
  – May be possible to reason via FO axiomatisation
Source: Goble, Bechhofer, DeRoure, Semantic Grid 101, GGF16, Athens, 2/2005
Slide 52
Web Ontology Language Requirements
• Desirable features identified for Web Ontology Language:
• Extends existing Web standards
  – Such as XML, RDF, RDFS
• Easy to understand and use
  – Should be based on familiar KR idioms (e.g. OO-style, frames etc).
• Formally specified
• Of “adequate” expressive power
• Possible to provide automated reasoning support
Slide 53
OWL
• W3C Recommendation (February 2004)
• Well defined RDF/XML serializations
• A family of Languages
  – OWL Full
  – OWL DL
  – OWL Lite
• Formal semantics
  – First Order (DL/Lite)
  – Relationship with RDF
• Comprehensive test cases for tools/implementations
• Growing industrial takeup.
Slide 54
OWL Basics
• Set of constructors for concept expressions
  – Booleans: and/or/not
  – Quantification: some/all
• Axioms for expressing constraints
  – Necessary and Sufficient conditions on classes
  – Disjointness
  – Property characteristics: transitivity, inverse
• Facts
  – Assertions about individuals
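Property characteristics are what let a reasoner derive facts that were never asserted. The sketch below illustrates this for the three characteristics named in the RDFS slide (transitive, inverse, symmetric) using the property names from that slide: isPartOf, hasPart, touches. It is a naive fixed-point computation in plain Python, not an OWL reasoner, and the individuals `cpu0`, `node1`, `cluster1`, `rackA` and `rackB` are made up for this example.

```python
def apply_property_axioms(facts, transitive=(), symmetric=(), inverse_of=None):
    """Close a set of triples under transitive, symmetric and inverse property axioms."""
    inverse_of = dict(inverse_of or {})
    # inverseOf holds in both directions: hasPart/isPartOf implies isPartOf/hasPart.
    inverse_of.update({q: p for p, q in list(inverse_of.items())})
    closure = set(facts)
    while True:
        new = set()
        for s, p, o in closure:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p in transitive:
                new.update((s, p, o2) for s2, p2, o2 in closure if p2 == p and s2 == o)
        if new <= closure:
            return closure
        closure |= new


facts = {
    ("cpu0", "isPartOf", "node1"),
    ("node1", "isPartOf", "cluster1"),
    ("rackA", "touches", "rackB"),
}
derived = apply_property_axioms(
    facts,
    transitive={"isPartOf"},
    symmetric={"touches"},
    inverse_of={"hasPart": "isPartOf"},
)
```

With these axioms a query for everything that `cluster1` has as a part returns both `node1` and `cpu0`, although no hasPart statement was asserted.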
Slide 55
Metacomputing Directory Service (MDS)
• Distributed Directory approach: collection of LDAP servers.
• Simple LDAP Information Schemas describe resource information.
• Servers:
  – Grid Resource Information Server (GRIS): Running on each resource and supplying information about it. Supports multiple resources as well.
  – Grid Index Information Server (GIIS): Collects information from multiple GRIS servers. Supports particular queries for information spread across multiple GRIS servers.
• Protocols (LDAP based) for:
  – Discovery and Inquiry (GRIP).
  – “Soft-state” Registration (GRRP).
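The idea behind “soft-state” registration is that an index keeps a provider only while the provider keeps refreshing its registration, so providers that fall silent disappear without an explicit unregister step. The sketch below illustrates that idea only; it is not the GRRP message format or any MDS API. The class `SoftStateIndex`, the 60-second lifetime, and the provider names are assumptions made for this example, and time is passed in explicitly so the behaviour is easy to follow.

```python
class SoftStateIndex:
    """Index that forgets a registration unless it is refreshed before its lifetime ends."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.expiry = {}   # provider name -> time at which its registration lapses

    def register(self, provider, now):
        """Create or refresh a registration; providers are expected to call this periodically."""
        self.expiry[provider] = now + self.ttl

    def live_providers(self, now):
        """Drop lapsed registrations, then return the names still registered, sorted."""
        self.expiry = {p: t for p, t in self.expiry.items() if t > now}
        return sorted(self.expiry)


index = SoftStateIndex(ttl_seconds=60)
index.register("gris-a", now=0)
index.register("gris-b", now=0)
at_30 = index.live_providers(now=30)   # both registrations are still fresh

index.register("gris-a", now=50)       # gris-a refreshes; gris-b goes silent
at_90 = index.live_providers(now=90)   # gris-b lapsed at time 60
```

This matches the earlier observation that sources are subject to change, birth, or silence: silence is handled by expiry rather than by an explicit deregistration.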