Ed Barkmeyer, NIST. Ontolog Forum, April 2007. Information Models as a Basis for Ontologies.
04/12/07 Next Generation Info Models 1
Information Models
as a Basis for Ontologies
Ed Barkmeyer, NIST
Ontolog Forum, April, 2007

Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions

History
• Linked record models (1968)
  – CODASYL standard (1974), Navigational Data Model (1980)
• E.F. Codd: Relational Algebra (1970)
• Peter Chen: Entity Attribute Relationship Models (1976)
• ISO TR 8002: 1984
  – the conceptual schema and the information base
• 1980s information modeling technologies
  – IDEF1-X, SDM, NIAM/ORM, SSADM, EXPRESS, etc.
• 1990s object modeling technologies (UML)
• Frame-based logics (1975-1995)
• Description logics (1985-present): DAML, OWL

Differences in Nature
• Navigational and relational models
  – relate data to data
  – relational normal forms model functions of keys
• Information models
  – relate things (entities) to other things
  – relate things to information about them
  – use classifiers to collect properties
• Ontologies
  – relate things to things
  – relate things to information about them
  – use information to classify things

Differences in Purpose
• Data models
  – support software implementations of business processes
  – organize information for access
  – describe instances
• Information models
  – support sets of business processes
  – organize information for comprehension
  – support design of databases and messages
  – use classifications to describe instances
• Ontologies
  – support retrieval of information using inferencing
  – organize information for relevance
  – describe subjects and categories by classifications

Differences in Concept
• Information models
  – universe is things used by the business processes
  – classification/axioms are as used by the business: business rules, not accepted scientific truth
  – distinguish the conceptual schema (= invariants, quantified assertions) from the information base (= current assertions about individual things)
• Ontologies
  – universe is all things that may be encountered in a domain
  – classification/axioms are accepted truth in the domain
  – primarily quantified assertions with a few ground facts
  – distinguished from an information base for some practical uses

Common Ideas
• Universe is a set of things of interest
• Classification enables understanding of the universe
• Axioms (invariants, necessities), but with a different concept of truth
• Ground facts = axiomatic truths about instances
• Conceptual schema is “nearly monotonic”: current/transient facts are restricted to the information base

Information Modeling: Classifiers
• Entity type classifies things in the universe
  – a template for capturing (current) information about things
  – a model of the state of a thing
  – identity is distinct from state
  – domain of properties
• Value type classifies information about things
  – instance is an information unit, a data element
  – can be a structure of component data elements
  – identity is state (state is invariant)
  – only range of properties (its properties proceed from its identity)
• Data type represents Value types
  – instance is a computational data value

Information Modeling: Subtypes
• Subtype relationships among classifiers
  – S is a subtype (subclass) of E iff every s in class S is also an instance of E
  – multiple supertypes: S is a subtype of E1, ..., En
• Exclusion relationships
  – if t is an instance of E then t is not an instance of D
• Covering relationships
  – E is covered by S1, ..., Sn iff e in E implies there exists at least 1 k such that e is in Sk
  – Mutually exclusive coverings are “partitions”
  – “abstract type” = a type that is covered by some set of subtypes
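
The subtype, exclusion, covering, and partition definitions above can be read as checks on the current extents of the classifiers. The following is a minimal sketch under that assumption; the function names and the sample extents (vehicle, car, truck, rental) are hypothetical and are not part of the slides.

```python
from itertools import combinations

def is_subtype(s: set, e: set) -> bool:
    """S is a subtype of E iff every instance of S is also an instance of E."""
    return s <= e

def excludes(e: set, d: set) -> bool:
    """E excludes D iff no thing is an instance of both."""
    return e.isdisjoint(d)

def is_covered_by(e: set, subtypes: list) -> bool:
    """E is covered by S1..Sn iff every e in E is in at least one Sk."""
    return e <= set().union(*subtypes)

def is_partition(e: set, subtypes: list) -> bool:
    """A partition is a covering whose subtypes are mutually exclusive."""
    return is_covered_by(e, subtypes) and all(
        a.isdisjoint(b) for a, b in combinations(subtypes, 2)
    )

# Hypothetical extents: the things currently classified under each type.
vehicle = {"v1", "v2", "v3"}
car = {"v1", "v2"}
truck = {"v3"}
rental = {"v2", "v3"}

print(is_partition(vehicle, [car, truck]))   # car and truck cover vehicle and do not overlap
print(is_partition(vehicle, [car, rental]))  # a covering, but car and rental overlap on v2
```

Checking extents only shows that a constraint holds for the present population; a schema-level subtype or covering declaration is a statement about every possible population.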

Information Modeling: Class definition
• Union (“choice”, “select”) types
  – Class E is the union of classes F and G and ...
    E(x) == F(x) OR G(x)
  – Union types are “abstract” by construction
• Intersection
  – Class E is the intersection of classes F and G
    E(x) == F(x) AND G(x)
• Relative complement
  – if S is a subtype of E, C is the relative complement iff C = E – S
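
The three class-definition forms can be sketched as combinators over membership predicates, mirroring E(x) == F(x) OR G(x) and E(x) == F(x) AND G(x). This is an illustrative sketch only: the combinator names and the sample classifiers (is_person, is_company, is_customer) are assumptions introduced here.

```python
from typing import Callable

Classifier = Callable[[dict], bool]  # a class is modeled as a membership test

def union(*classes: Classifier) -> Classifier:
    """E(x) == F(x) OR G(x) OR ..."""
    return lambda x: any(c(x) for c in classes)

def intersection(*classes: Classifier) -> Classifier:
    """E(x) == F(x) AND G(x) AND ..."""
    return lambda x: all(c(x) for c in classes)

def relative_complement(e: Classifier, s: Classifier) -> Classifier:
    """C = E - S: instances of E that are not instances of the subtype S."""
    return lambda x: e(x) and not s(x)

# Hypothetical classifiers over simple records.
is_person = lambda t: t["kind"] == "person"
is_company = lambda t: t["kind"] == "company"
is_customer = lambda t: t.get("customer", False)

party = union(is_person, is_company)                     # abstract by construction
corporate_customer = intersection(is_company, is_customer)
non_person_party = relative_complement(party, is_person)

acme = {"kind": "company", "customer": True}
alice = {"kind": "person"}
print(party(acme), corporate_customer(alice), non_person_party(acme))
```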

Classification
• Entity classes can represent roles or states of things
  – no notion of intrinsic properties
  – models contain intrinsic classifiers, e.g., maximal superclasses, but languages don’t identify them
• A thing can be an instance of multiple entity types
  – the entity types need not be explicitly related
• Default relationship among subtypes is “overlaps”
  – a thing can be an instance of both
• A thing can change classification over time
  – “thing is instance of class” is just part of the state of the thing
• Most of these concepts are not supported by object models

Aside: Value Types
• Value type = conceptual classifier for information unit
• Categories
  – name (referencer, supports equal/unequal)
    • enumerated lists
    • codes/identifiers taken from registries
    • strings intended to identify things
  – quantity
    • includes numbers and values with “dimensions”
  – quantitative name (names that support quantitative operations)
    • ordinal, date, time, time period, temperature, etc.
  – truth value
  – text (structured and unstructured)
    • a body of information interpreted by a specific agent

Information Modeling: Properties
• Attributes (data type properties)
  – domain is entity, range is value
• Relationships (object properties, associations)
  – domain is entity, range is entity
• Inverse relationship
  – same relationship, nominal domain and range reversed
  – different “reading” (spelling of the relationship name)
• Multiplicity/cardinality of attributes and relationships
  – one entity can have the same property (type) 0, 1, n, or unbounded times
  – distinguish a set of the same property from a property whose range is a set

Property Domains
• Domain and range of a property must be a single class
  – Name of a property implicitly qualified by the domain
• Ad hoc supertypes (“union type”) may be created to be domain or range
  – enumerate the entity types constituting the domain, or
  – enumerate the entity types constituting the range, or
  – (rarely) enumerate the value types constituting the range
• Mutable and immutable properties
  – a property P(e,v) is “mutable” if the value v associated with a given e may change over time
  – P(e,v) is “immutable” if P(e,x) implies x=v over all time

Property Relationships
• Property implies property
  – (there exists v such that P(d,v)) implies (there exists x such that Q(d,x))
• Property excludes property
  – (there exists v such that P(d,v)) implies NOT (there exists x such that Q(d,x))
• Properties P1, ..., Pn cover entity type
  – For every instance e of E there exists some i such that there exists v such that Pi(e,v)

Relationship Relationships
• Relationship implies/subsets relationship (pairwise)
  – P(x,y) implies Q(x,y)
  – every pair (x,y) that satisfies P also satisfies Q
• Relationship excludes relationship (pairwise)
  – P(x,y) implies NOT Q(x,y)
• Relationship refines/subtypes relationship
  – property P is a specialization of property Q
  – every instance of P is an instance of Q
  – not just implication

Examples
• Property implies property
  – x is an officer of ship S implies there exists officer y such that x reports to y
• Property excludes property
  – x is employee of G implies NOT x is eligible for prize p
• Relationship implies/subsets relationship (pairwise)
  – x is an officer of ship S implies x has cabin on S
• Relationship excludes relationship (pairwise)
  – x is an officer of ship S implies NOT x is passenger on S
• Relationship refines/subtypes relationship
  – x is captain of ship S refines x is officer of ship S
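
The difference between the property-level and the pairwise relationship-level constraints can be made concrete by treating each property as a set of (subject, object) pairs. The sketch below assumes that representation; the function names and the sample crew data are hypothetical.

```python
def subjects(p: set) -> set:
    """The things d for which some v exists with P(d, v)."""
    return {d for d, _ in p}

def property_implies(p: set, q: set) -> bool:
    """Every d that has some P value also has some Q value."""
    return subjects(p) <= subjects(q)

def property_excludes(p: set, q: set) -> bool:
    """No d has both a P value and a Q value."""
    return subjects(p).isdisjoint(subjects(q))

def relationship_implies(p: set, q: set) -> bool:
    """Pairwise: every pair (x, y) that satisfies P also satisfies Q."""
    return p <= q

def relationship_excludes(p: set, q: set) -> bool:
    """Pairwise: no pair (x, y) satisfies both P and Q."""
    return p.isdisjoint(q)

# Hypothetical ground facts in the spirit of the ship examples.
officer_of = {("ann", "S1"), ("bob", "S1")}
reports_to = {("ann", "bob"), ("bob", "cap")}
has_cabin_on = {("ann", "S1"), ("bob", "S1"), ("zed", "S1")}
passenger_on = {("zed", "S1")}

print(property_implies(officer_of, reports_to))         # officers report to someone
print(relationship_implies(officer_of, has_cabin_on))   # officer of S has cabin on S
print(relationship_excludes(officer_of, passenger_on))  # officer of S is not passenger on S
```

Refinement is deliberately left out: the slides state that it is more than implication, so a subset test on pairs would not capture it.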

Qualifying Properties
• Qualifying property
  – a property whose existence or value determines membership in a given subtype
  – existence: If there exists y such that Q(d,y) then d is an instance of S
  – value: If Q(d, ‘red’) then d is an instance of S
  – functional value: Let y = Q(d); if Greater(y, 1) then d is an instance of S
  – the domain (D) of property Q must be a supertype of S
  – Q may be optional (cardinality 0..<something>) on D
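
The three kinds of qualification (by existence, by value, by a function of the value) can be sketched as membership tests over a record of property values. The record layout and the property names ("tows", "color", "axles") are assumptions made for illustration only.

```python
def qualifies_by_existence(thing: dict, prop: str) -> bool:
    """d is an instance of S if some y exists such that Q(d, y)."""
    return prop in thing

def qualifies_by_value(thing: dict, prop: str, value) -> bool:
    """d is an instance of S if Q(d, value) holds."""
    return prop in thing and thing[prop] == value

def qualifies_by_function(thing: dict, prop: str, test) -> bool:
    """d is an instance of S if test(Q(d)) holds; Q may be absent (optional)."""
    return prop in thing and test(thing[prop])

# Hypothetical instances of a supertype D; "tows" is optional on D.
car = {"color": "red", "axles": 2}
tractor = {"color": "blue", "axles": 3, "tows": "trailer-9"}

print(qualifies_by_existence(tractor, "tows"))                # existence
print(qualifies_by_value(car, "color", "red"))                # value
print(qualifies_by_function(car, "axles", lambda y: y > 1))   # Greater(y, 1)
```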

Derived Properties
• Derived Property: a property created by “joining” relationships
  – represented by a “path through the semantic network”
• Example:
  – vehicle and model are entity types
  – weight is a value type (a quantity)
  – attribute: model-has-gross-weight(model, weight)
  – relationship: vehicle-has-model(vehicle, model)
  – derived property: vehicle-has-gross-weight(vehicle, weight)
    = vehicle.vehicle-has-model[model].model-has-gross-weight[weight]
    = { (vehicle, weight) : (exists m) (and vehicle-has-model(vehicle,m) model-has-gross-weight(m,weight)) }
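
The set-builder definition of vehicle-has-gross-weight is a relational join on the shared model. A minimal sketch, assuming each property is stored as a set of pairs; the sample identifiers and weights are made up for illustration.

```python
def join(r1: set, r2: set) -> set:
    """{(a, c) : exists b such that r1(a, b) and r2(b, c)}"""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}

# Hypothetical ground facts.
vehicle_has_model = {
    ("vin-001", "model-A"),
    ("vin-002", "model-A"),
    ("vin-003", "model-B"),
}
model_has_gross_weight = {("model-A", 1500), ("model-B", 2200)}

# Derived property: follow the path vehicle -> model -> weight.
vehicle_has_gross_weight = join(vehicle_has_model, model_has_gross_weight)
print(sorted(vehicle_has_gross_weight))
```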

Information Modeling: Identifiers
• Identifiers/keys distinguish instances of an entity class
  – simple key: a property whose inverse is “functional”
    • for each v in the range, there exists at most 1 d in the domain such that P(d,v)
    • almost always an attribute (value type)
  – relative uniqueness
    • property P is unique within property Q
    • for each p in the range of P and each q in the range of Q, there exists at most 1 d in the domain such that P(d,p) AND Q(d,q)
    • p is usually a value, and q is usually an entity such that for each d there exists exactly 1 q such that Q(d,q)
    • selection of a key for q gives rise to a “composite key” for d by “concatenating” (making a tuple of) the keys
  – a key property must apply to all things in the class
  – a given entity class may have multiple identifier/key properties
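
Both uniqueness conditions can be checked against ground facts. The sketch below assumes properties are sets of (d, value) pairs; the function names and the section/course data are hypothetical and only illustrate the definitions.

```python
from collections import Counter

def is_simple_key(p: set) -> bool:
    """Inverse of P is functional: each value v identifies at most one d."""
    counts = Counter(v for _, v in p)
    return all(n == 1 for n in counts.values())

def is_unique_within(p: set, q: set) -> bool:
    """For each (p_val, q_val) there is at most one d with P(d, p_val) and Q(d, q_val)."""
    owner = {}
    for d, p_val in p:
        for d2, q_val in q:
            if d == d2 and owner.setdefault((p_val, q_val), d) != d:
                return False
    return True

# Hypothetical facts: section numbers repeat across courses.
section_number = {("sec-a", 1), ("sec-b", 2), ("sec-c", 1)}
section_of_course = {
    ("sec-a", "MATH101"),
    ("sec-b", "MATH101"),
    ("sec-c", "PHYS200"),
}

print(is_simple_key(section_number))                        # number alone is not a key
print(is_unique_within(section_number, section_of_course))  # number is unique within course
```

When the second check holds, the tuple (key of course, section number) serves as the composite key described above.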

Dependencies
• Entity type E is “dependent on” property P(e,x) iff (exists e) E(e) implies (exists x) P(e,x)
  – that is, the e cannot exist unless the x exists
  – a meta-property of a relationship between instances
    • sometimes modeled as “dependent on class X”
    • in IDEF1-X, E is a “weak entity type” and P “supports” E
  – not all “mandatory” properties are dependencies
  – dependency is an “intrinsic” property
  – dependency is an invariant property: the x never changes
• Example
  – course-has-section(course, section) has inverse section-of-course(section, course)
  – section is dependent on section-of-course: the section cannot meaningfully exist without the course

Aggregates
• Entity type E “aggregates” property P(e,m) iff every instance e of E is a “collection” and P(e,m) is the relationship of e to its members
  – aggregate is a metaproperty of E that is based on P
  – P is a “logical” or “virtual” “part of” relationship
• Problem: e is only instantaneously a “set”
  – the identity of e does not change if a member is deleted
  – no axiom is associated with this metaproperty
• Example:
  – Entity type Convoy, with property convoy-includes-ship(c,s)
    • Convoy aggregates convoy-includes-ship
    • by extension, Convoy “is aggregation of” Ship

Composition
• Entity type E “is composed by” properties Pi(e,ci) iff
  – each instance e of E is constructed from the ci such that Pi(e,ci)
  – each Pi relates an instance e to one (or more) of its components
  – for each i, there are n distinct ci such that Pi(e,ci), where n is the minimum cardinality of Pi (otherwise e is not an instance of E)
  – for each ci such that Pi(e,ci), if Pi(x,ci) then x = e (a ci belongs to at most one e)
  – some models make the ci dependent on the inverse of Pi
  – “composite” is a metaproperty of E that is based on the Pi
  – each Pi is a “physical” “part of” relationship
• Example
  – entity type Book is composed by book-has-chapter(b, c)
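
Two of the composition conditions lend themselves to a direct check: the minimum cardinality of each Pi and the rule that a component belongs to at most one whole. The following is a sketch under the assumption that one composing property is stored as a mapping from each whole to its set of components; the function name and the book data are hypothetical.

```python
def composition_violations(parts_of: dict, min_cardinality: int) -> list:
    """Report wholes with too few components and components shared by two wholes."""
    problems = []
    owner = {}  # component -> the first whole seen that contains it
    for whole, parts in parts_of.items():
        if len(parts) < min_cardinality:
            problems.append(f"{whole} has fewer than {min_cardinality} component(s)")
        for part in sorted(parts):
            if part in owner:
                problems.append(f"{part} belongs to both {owner[part]} and {whole}")
            else:
                owner[part] = whole
    return problems

# Hypothetical book-has-chapter facts with two deliberate violations.
book_has_chapter = {
    "book-1": {"ch-1", "ch-2"},
    "book-2": {"ch-2"},   # ch-2 already belongs to book-1
    "book-3": set(),      # no chapters at all
}

for problem in composition_violations(book_has_chapter, min_cardinality=1):
    print(problem)
```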

Validity Rules
• Validity Rule = arbitrary first-order logic expression involving instances, classifiers and properties that must hold in a “valid” information base
• Languages have limitations on expressibility
  – instance references
  – existentials
  – “special functions”
  – nature of comparisons
• NOT inferencing rules
  – cannot conclude x should be classified as an instance of E
  – conclusion E(x) means invalid information base if NOT E(x)
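
The contrast with inferencing can be shown in a few lines: a validity rule reports that the information base is invalid, and it never adds the missing classification. This is a minimal sketch; the information-base layout, the rule, and the names (Employee, has_badge) are assumptions chosen for illustration.

```python
def check_validity(info_base: dict, rules: dict) -> list:
    """Return the names of the rules that the information base violates."""
    return [name for name, rule in rules.items() if not rule(info_base)]

# Hypothetical information base: bob holds a badge but is not classified as Employee.
info_base = {
    "Employee": {"ann"},
    "has_badge": {("ann", "B-17"), ("bob", "B-18")},
}

rules = {
    # "x has a badge implies Employee(x)" read as a validity rule, not an inference rule.
    "badge holders are employees": lambda ib: all(
        d in ib["Employee"] for d, _ in ib["has_badge"]
    ),
}

violated = check_validity(info_base, rules)
print(violated)               # the rule is reported as violated
print(info_base["Employee"])  # unchanged: bob is not inferred to be an Employee
```

An inferencing reading of the same rule would instead add bob to Employee and report nothing.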

Aside: Object Modeling
– Ad hoc models of state
  • properties needed for some set of software applications
  • Object is to design software programs
– Object templates (class models)
– Attributes, Relationships (associations, pointers)
– Superclasses and “inheritance”
– Validity rules
– ‘Operations’ = actions on the object state
  • No real association to process
– No keys, no qualifiers

Some Known Issues
• Diverse keys for union types
  – identity of individuals determined by type and type-specific keys
• Variance of cardinality constraints over time/state
  – can be stated as validity rules (only)
• Intermediate states (transactions)
  – validity rules don’t apply while the info base is in transition during certain times in a process
• Localization of properties
  – subtype A always has property P, subtypes B and C never do
  – model property P local to A?
  – model optional property P to common supertype S, and use its existence to define (“qualify”) subtype A

OWL Features – Classification
• Classification
  – Entity type: Class
  – Value type: Class
    • enumeration: Y (all values from)
    • name: N (datatype string)
    • text: N (datatype string)
    • quantities: N (numeric datatypes)
    • truth values: Y
  – Data type: Y
  – Multiple classification: Y
  – Default overlap: Y
  – Classification change: not applicable

OWL Features – Type relationships
• Type relationships
  – subtype: Y
  – multiple supertypes: Y
  – exclusion: Y
  – covering: Y
  – relative complement: Complement, Difference
  – choice/union: Y
  – intersection: Y

OWL Features – Properties
• Properties
  – Attributes: Datatype property
  – Relationships: Object property
  – Inverse: Y
  – Multiplicity/Cardinality: Y
  – Set of property instances: Y
  – Single domain, range: Y
  – Mutable property: not applicable

OWL Features – Metaproperties
• Property relationships
  – Property implies property: Y
  – Property excludes property: Y
  – Properties cover entity type: N?
  – Relationship implies relationship: Y
  – Relationship excludes relationship: Y
  – Relationship refines relationship: N (only implies)
• Derived properties: some
• Identifiers: functional property
• Dependencies: N
• “Part of”, Aggregates, Composites: N

OWL Features – Definitions and Rules
• Qualifying properties: Class definition
  – based on presence: Y
  – based on value equal: Y
  – based on function of value: N
• Validity rules: N
• Inferencing rules: N

OWL as Info Modeling Language
• OWL has all the major features
• OWL is formally defined
  – other information modeling languages have formal models ascribed to them after the fact (not standard interpretations)
• OWL has formal classification inferencing
  – but it is not much stronger than languages like ORM
  – not even strong in “datatype reasoning”
• OWL needs:
  – Identifier/Key metaproperties (identification of individuals)
  – Relative uniqueness rules
  – Validity rules

Information Analysis Approach
• Interview
  – obtain initial information from the experts
• Formalize
  – formally capture what the experts said
• Design
  – reorganize the formal model to provide insight
• Review
  – walk the experts through the designed model
  – examine one or more use cases
  – solicit questions, concerns, variants
• Revise
  – correct the design to accommodate the clarifications

Information Analysis Method
• Identify the processes to be supported
• Identify the principal business classifications of things used/modified by the processes
• Identify the properties of those things that are used/modified by the processes
• Identify types, specializations and generalizations that collect uses and properties
• Determine type-to-type relationships
• Associate properties with the classifications
• Determine cardinality constraints
• Distinguish entity types from value types
• Identify the keys for individuals
• Specify validity rules
[Slide side labels grouping the steps by phase: Interview, Formalize, Design]

Process Modeling
• Business Process Modeling
  – Activities and control flows
  – Decision points and rules
  – Process decomposition
  – Data/Message/Material flows
  – Information as ‘documents’
  – Languages: BPMN, ARIS, METIS, ...

Binding Process to Information
• Actions of process on entities
  – creating an entity instance
  – creating a relationship instance between entity instances, usually as a property having a “domain” (or “subject”) and a “range” (or “object”)
  – changing one or more properties of an entity instance or relationship
  – destroying an entity instance
  – destroying a relationship instance
  – using a property of an entity instance

Relating Process to Info Requirements
• USE defines an information requirement
• All other actions define EVENTS
  – Process models can/should represent impact of events
• Use and Events can be aggregated or decomposed
  – Entity/Class level (UML)
  – Specific instance
  – Aspect (a collection of properties)
  – Property
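
The split between USE and EVENTS can be sketched as a classification over the six process actions listed on the previous slide. The enumeration names and the classify helper below are hypothetical; they only restate the rule that using a property is an information requirement and every other action is an event.

```python
from enum import Enum

class Action(Enum):
    CREATE_ENTITY = "create an entity instance"
    CREATE_RELATIONSHIP = "create a relationship instance"
    CHANGE_PROPERTY = "change a property of an entity instance or relationship"
    DESTROY_ENTITY = "destroy an entity instance"
    DESTROY_RELATIONSHIP = "destroy a relationship instance"
    USE_PROPERTY = "use a property of an entity instance"

def classify(action: Action) -> str:
    """USE defines an information requirement; all other actions define events."""
    return "information requirement" if action is Action.USE_PROPERTY else "event"

for action in Action:
    print(f"{action.value}: {classify(action)}")
```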

Conclusions
• Emphasis on supported processes as driver
  – scopes the model in breadth and depth
  – orthogonal to semantic web concerns
• Model for understanding
  – model must be meaningful to the domain experts
  – correct formal interpretation is important
  – implementation is a separate engineering activity
• OWL language is strong
  – formal logic basis
  – almost all known features (necessary and optional)
  – identifiers are a critical concern
  – validity rules will be required