Chapter 6 - Basic Similarity Topics
Case-based reasoning
Introduction
• Common term in everyday language, where two objects usually are considered similar if they look or sound similar
• Similarity is a core concept within CBR
• From a CBR perspective: «Two problems are similar if they have similar solutions»
• Not as clearly defined as the term equality
• Accepted that similarity is subjective and requires approximate rather than exact reasoning
Similarity and case representation
• Similarity measures are defined to compare objects (cases)
• The measures operate on the case representation
• Similarity is the essential function used for retrieval and the link between case representation and retrieval
• Only attribute-value case representations and attribute-based similarity measures are considered here
The mathematics of similarity
• Two influencing factors:
- Fuzzy sets offer a background for modelling inexact expressions. They do not deal with classical yes-or-no answers, but rather with ones that have a vague character
- Metrics are used in mathematics whenever approximations (rather than exact solutions) are involved. This makes them suitable for modelling similarity
• Similarity measures may inherit and benefit from properties of these two factors. Examples of such properties are symmetry, transitivity, etc.
Two mathematical models of similarity
• Similarity as a relation:
- Qualitative measure comparing different similarities
- Example: two objects are more similar to each other than two other objects
R(x,y,z) ⇔ «x is at least as similar to y as x is to z»
- Allows the definition of the nearest neighbour concept
➡ The nearest neighbour of x is the y for which the R-relation above holds for all z
Example of k-NN where k=3
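The relational view can be sketched in a few lines of Python. This is only an illustration: the objects are assumed to be plain numbers, and `R` is a hypothetical relation defined through closeness on the number line. The point is that `k_nearest` ranks candidates using the qualitative relation alone and never looks at a similarity score.

```python
from functools import cmp_to_key

def R(x, y, z):
    """'x is at least as similar to y as x is to z'.
    Assumption for this sketch: objects are numbers and closer means more similar."""
    return abs(x - y) <= abs(x - z)

def nearest_neighbour(x, candidates):
    # The nearest neighbour is a y for which R(x, y, z) holds for all z.
    return next(y for y in candidates if all(R(x, y, z) for z in candidates))

def k_nearest(x, candidates, k):
    # Rank candidates with the relation R only (no numeric scores are compared).
    def compare(y, z):
        if R(x, y, z) and R(x, z, y):
            return 0
        return -1 if R(x, y, z) else 1
    return sorted(candidates, key=cmp_to_key(compare))[:k]

cases = [1, 4, 9, 12, 20]
print(nearest_neighbour(10, cases))  # 9
print(k_nearest(10, cases, 3))       # [9, 12, 4]
```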
• Similarity as a function:
- Makes similarity quantitative by expressing how similar two objects are
- Assigns a number (a degree of similarity) to each pair of objects
- Def.: A similarity measure for a problem space P is a function
sim: P × P → [0,1]
- Example of similarity functions and how they may be compared
sim(x,y) ≥ sim(x,z) ⇔ «x is at least as similar to y as x is to z»
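A minimal sketch of the functional view, assuming the problem space P is a numeric interval. The bounds `lo` and `hi` and the linear form of `sim` are assumptions chosen for illustration and are not taken from the slides. It also shows how a similarity function induces the relation from the previous model.

```python
def sim(x, y, lo=0.0, hi=100.0):
    """Similarity in [0, 1] for two numbers from the assumed range [lo, hi]."""
    return 1.0 - abs(x - y) / (hi - lo)

def at_least_as_similar(x, y, z):
    # The relation R(x, y, z) induced by the function sim.
    return sim(x, y) >= sim(x, z)

print(sim(50, 60))                      # 0.9
print(sim(50, 90))                      # 0.6
print(at_least_as_similar(50, 60, 90))  # True
```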
Distances
• A proxy for similarities; both look at the same objects from different points of view
• In most situations we can freely choose between distances and similarities
• It is possible to convert between similarities and distances. However, such a transformation may not necessarily conserve the exact numerical similarity/distance values
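The conversion can be illustrated with two common transformations. Both are assumptions of this sketch and not necessarily the ones the chapter uses. They turn the same distances into different similarity values, yet both preserve the ranking of the objects.

```python
def sim_reciprocal(d):
    # Maps a distance in [0, inf) to a similarity in (0, 1].
    return 1.0 / (1.0 + d)

def sim_linear(d, d_max):
    # Maps a distance in [0, d_max] to a similarity in [0, 1].
    return 1.0 - d / d_max

# Hypothetical distances from a query to three cases.
distances = {"a": 0.5, "b": 2.0, "c": 8.0}
d_max = 10.0  # assumed upper bound on distances

by_distance = sorted(distances, key=distances.get)
by_reciprocal = sorted(distances, key=lambda k: sim_reciprocal(distances[k]), reverse=True)
by_linear = sorted(distances, key=lambda k: sim_linear(distances[k], d_max), reverse=True)

print(by_distance, by_reciprocal, by_linear)  # the same order three times
print(sim_reciprocal(2.0), sim_linear(2.0, d_max))  # different numerical values
```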
Types of similarity measures
• Counting similarities
• Metric similarities
• Transformation similarities
• Structure-oriented similarities
• Information-oriented similarities
• Relevance-oriented similarities
• Dynamic-oriented similarities
Counting similarities
‣ Measures similarity by counting certain occurrences in the representation
➡ Count the number of family members for tax purposes
‣ Example: Hamming measures
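A Hamming measure can be sketched as follows, assuming cases are attribute vectors of equal length. Normalising the count to [0, 1] is an assumption of this sketch, and the example attribute values are made up.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length attribute vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def hamming_similarity(a, b):
    """Fraction of matching positions (assumed normalisation to [0, 1])."""
    if not a:
        raise ValueError("vectors must not be empty")
    return 1.0 - hamming_distance(a, b) / len(a)

car_1 = ("red", "suv", "diesel", "manual")
car_2 = ("red", "suv", "petrol", "automatic")
print(hamming_distance(car_1, car_2))    # 2
print(hamming_similarity(car_1, car_2))  # 0.5
```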
Metric similarities
‣ Applicable to attributes with numerical values
‣ Arise as variations of Euclidean metrics
➡ Typically distance functions that represent a travel view
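One family of such variations is the Minkowski distance, sketched below. Picking this family is an assumption made for illustration: p = 2 gives the Euclidean distance, and p = 1 gives the city-block distance, which can be read as the distance travelled on a grid.

```python
def minkowski(a, b, p=2):
    """Minkowski distance between two numeric vectors of equal length.
    p=2 is the Euclidean distance, p=1 the city-block distance."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

origin, point = (0, 0), (3, 4)
print(minkowski(origin, point, p=2))  # 5.0 (straight line)
print(minkowski(origin, point, p=1))  # 7.0 (along the grid)
```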
Transformation similarities
‣ The measure counts the number of operations required to transform one object into another
‣ Example: Levenshtein distance. Uses insertion, deletion and modification as possible change actions and counts the number of changes required
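The Levenshtein distance can be computed with the standard dynamic-programming scheme. The sketch below assumes unit cost for each insertion, deletion and modification, and keeps only one previous row of the table.

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions and modifications
    needed to turn string s into string t (unit costs assumed)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # modification (or match)
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

To obtain a similarity in [0, 1], one option (again an assumption) is to divide the distance by the length of the longer string and subtract the result from 1.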
Structure-oriented similarities
‣ The structure in which the knowledge is represented plays a role, e.g. an object-oriented representation
‣ Refers mainly to attributes that have symbolic attribute values from which the attribute-based structure is built
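One way to make structure count for symbolic values is to arrange them in a taxonomy and let similarity depend on where two values meet in the tree. The sketch below is an assumption made for illustration, not a measure from the slides: the taxonomy `PARENT` is invented, and similarity is taken as the depth of the deepest common ancestor divided by the depth of the deeper value.

```python
# Hypothetical taxonomy: child -> parent (the root has parent None).
PARENT = {"vehicle": None, "car": "vehicle", "truck": "vehicle",
          "sedan": "car", "suv": "car"}

def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path  # node first, root last

def depth(node):
    return len(path_to_root(node)) - 1

def taxonomy_sim(a, b):
    ancestors_b = set(path_to_root(b))
    # The first ancestor of a that is also an ancestor of b is the deepest common one.
    common = next(n for n in path_to_root(a) if n in ancestors_b)
    deepest = max(depth(a), depth(b))
    return 1.0 if deepest == 0 else depth(common) / deepest

print(taxonomy_sim("sedan", "suv"))    # 0.5 (both are cars)
print(taxonomy_sim("sedan", "truck"))  # 0.0 (only the root in common)
```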
Information-oriented similarities
‣ Information and knowledge play an essential role
‣ Often used for texts; two texts are considered similar if they provide similar information to the user
Relevance-oriented similarities
‣ Weight the importance of different aspects contributing to similarity
‣ Not a type in itself, but rather used in combination with the other types
Dynamic-oriented similarities
‣ Consider and compare dynamic processes
Local-global principle of similarity
• Useful when dealing with complex structures
• The principle: Each object is constructed from atomic parts, by some construction process.
• Possible to compare the atomic parts by using local measures, before comparing the more complex structure.
• Determine the influence each local part should have on the global measure by assigning a weight to each part
• Determining the weights is a difficult problem
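A minimal sketch of the local-global principle, assuming cases are attribute-value dictionaries and the global measure is a weighted average of the local ones. The attributes, value ranges and weights below are all invented for illustration.

```python
def sim_numeric(a, b, value_range):
    # Local measure for numeric attributes, clipped to [0, 1].
    return max(0.0, 1.0 - abs(a - b) / value_range)

def sim_symbolic(a, b):
    # Local measure for symbolic attributes: exact match or nothing.
    return 1.0 if a == b else 0.0

# Hypothetical local measures and weights, one per attribute.
LOCAL = {
    "price": lambda a, b: sim_numeric(a, b, 50000),
    "doors": lambda a, b: sim_numeric(a, b, 4),
    "colour": sim_symbolic,
}
WEIGHTS = {"price": 0.5, "doors": 0.2, "colour": 0.3}

def global_sim(query, case):
    # Global measure: weighted average of the local similarities.
    total = sum(WEIGHTS.values())
    return sum(w * LOCAL[attr](query[attr], case[attr])
               for attr, w in WEIGHTS.items()) / total

query = {"price": 20000, "doors": 4, "colour": "red"}
case = {"price": 25000, "doors": 2, "colour": "red"}
print(round(global_sim(query, case), 2))  # 0.85
```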
Virtual attributes
• A problem with the local-global principle arises when there are dependencies between the attributes that influence similarity
• Example: bank loans
Reliability for getting a loan depends on both income and spending
• Assigning weights to such attributes independently of each other makes little sense
• Introduce additional attributes that reflect the dependencies explicitly
• Such attributes are defined in terms of the given attributes and are called virtual attributes
• Allows simpler similarity measures
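The bank loan example can be sketched with a single virtual attribute. The attribute name `disposable` (income minus spending), the value range and the applicant figures are assumptions made for illustration. Once the dependency is captured in the virtual attribute, one simple local measure is enough.

```python
def with_virtual(case):
    # Add the virtual attribute, defined in terms of the given attributes.
    return {**case, "disposable": case["income"] - case["spending"]}

def loan_sim(a, b, value_range=5000.0):
    # A simple local measure on the virtual attribute only.
    d = abs(with_virtual(a)["disposable"] - with_virtual(b)["disposable"])
    return max(0.0, 1.0 - d / value_range)

query = {"income": 5000, "spending": 2500}
applicant_a = {"income": 5000, "spending": 4800}  # same income, almost nothing left
applicant_b = {"income": 3000, "spending": 1000}  # lower income, similar surplus

print(loan_sim(query, applicant_a))  # 0.54
print(loan_sim(query, applicant_b))  # 0.9
```

Comparing income alone would rank applicant A as the perfect match, although A has very little money left each month; the virtual attribute makes B the closer case.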
Which similarity measure should be used?
• Some influencing factors for the choice are:
- Case representation
- Size of case base
- Efficiency needed for retrieval
- Number of values in the domain of the attributes
• Useful guidelines:
- Try to ensure compatibility between case representation and the similarity measure
- If possible, apply the local-global principle for complex structures
Summary
• Similarity is the link between case representation and retrieval
• There is no clear definition of the concept, and there exists a variety of different types of measures
• Similarity measures are heavily influenced by mathematics. Two mathematical ways to represent similarity are as a function or as a relation
• The local-global principle may also apply to similarity measures
• Which type of similarity measure should be used depends on the objects to be compared
Comments
• Few comparisons, missing an overview of the differences between the different types of similarity measures
- Mainly descriptive presentation, making it difficult to distinguish between the different measures
• What are the implications of choosing one type of measure over another?
- In a later chapter?