Object Fusion in Geographic Information Systems
Catriel Beeri, Yaron Kanza,
Eliyahu Safra, Yehoshua Sagiv
Hebrew University
Jerusalem Israel
The Goal: Fusing Objects that Represent the Same Real-World Entity
Example: three data sources that provide information about hotels in Tel-Aviv
MAPI: the Survey of Israel
MAPA: a commercial corporation
MUNI: the Municipality of Tel-Aviv
The Goal: Fusing Objects that Represent the Same Real-World Entity
Each data source provides data that the other sources do not provide
Hotel rank
Is there a nearby parking lot?
polygon
points
MAPI: cadastral and building information
MAPA: tourist information
MUNI: Municipal information
The Goal: Fusing Objects that Represent the Same Real-World Entity
Object fusion enables us to utilize the different perspectives of the data sources
MAPI: cadastral and building information
MAPA: tourist information
Radison Moria
MUNI: Municipal information
Why Are Locations Used for Fusion?
• There are no global keys to identify objects that should be fused
• Names cannot be used
– Change often
– May be missing
– May be in different languages
• It seems that locations are keys:
– Each spatial object includes location attributes
– In a “perfect world,” two objects that represent the same entity have the same location
Why is it Difficult to use Locations?
• In real maps, locations are inaccurate
• The map on the left is an overlay of the three data sources about hotels in Tel-Aviv
• For example, the Basel Hotel has three different locations in the three data sources
Inaccuracy Makes It Difficult to Use Locations
• It is difficult to distinguish between:
1. A pair of objects that represent close entities
2. A pair of objects that represent the same entity
• Partial coverage complicates the problem
[Figure: objects 1, a, and 2; which objects correspond?]
Fusion methods
Assumptions
• There are only two data sources
• Each data source has at most one object for each real-world entity – i.e., the matching is one-to-one
Corresponding Objects
• Objects from two distinct sources that represent the same real-world entity
Fusion Sets
• A fusion algorithm creates two types of fusion sets:
– A set with a single object
– A set with a pair of objects, one from each data source
Confidence
• Our methods are heuristics and may produce incorrect fusion sets
• A confidence value between 0 and 1 is attached to each fusion set
• It indicates the degree of certainty in the correctness of the fusion set
[Figure: fusion sets with high confidence and fusion sets with low confidence]
The Mutually-Nearest Method
• The result includes
– All mutually-nearest pairs
– All singletons, when an object is not part of a pair
[Figure, three panels over objects 1, a, and 2: input; finding nearest objects; fusion sets]
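The mutually-nearest rule can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes each object is reduced to a single (x, y) point, uses Euclidean distance, and breaks ties by taking the first candidate. The function names are hypothetical.

```python
from math import hypot

def nearest(p, candidates):
    """Index of the candidate closest to point p (Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda j: hypot(p[0] - candidates[j][0], p[1] - candidates[j][1]),
    )

def mutually_nearest(A, B):
    """Return (pairs, singletons of A, singletons of B) as index lists.

    A pair (i, j) is produced only when B[j] is the nearest object to A[i]
    and A[i] is the nearest object to B[j].
    """
    pairs = []
    if A and B:
        for i, a in enumerate(A):
            j = nearest(a, B)
            if nearest(B[j], A) == i:
                pairs.append((i, j))
    matched_a = {i for i, _ in pairs}
    matched_b = {j for _, j in pairs}
    singles_a = [i for i in range(len(A)) if i not in matched_a]
    singles_b = [j for j in range(len(B)) if j not in matched_b]
    return pairs, singles_a, singles_b

# Objects 1 and 2 come from one source, object a from the other.
source_1 = [(0.0, 0.0), (10.0, 0.0)]   # objects 1 and 2
source_2 = [(1.0, 0.0)]                # object a
pairs, singles_1, singles_2 = mutually_nearest(source_1, source_2)
```

In this toy input, object a is the nearest neighbor of both 1 and 2, but a itself is nearest to 1, so only that pair is fused and object 2 is reported as a singleton.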
The Probabilistic Method
• An object from one dataset has a probability of choosing an object from the other dataset
• The probability is inversely proportional to the distance
• Confidence of a pair: the probability of the mutual choice
• Confidence of a singleton: the probability that the object is not chosen by any object of the other dataset
• A threshold value is used to discard fusion sets with low confidence
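The probabilistic idea can be illustrated with a small sketch. Several details here are assumptions that the slides do not state: choice probabilities are taken as 1/distance normalized over the other dataset, choices are treated as independent so that confidences are simple products, and the default threshold of 0.5 is arbitrary. The function names are hypothetical.

```python
from math import hypot

def choice_probabilities(src, dst, eps=1e-9):
    """table[i][j]: probability that src[i] chooses dst[j].

    Assumption: weight = 1 / distance, normalized so each row sums to 1.
    eps avoids division by zero for coincident points.
    """
    table = []
    for s in src:
        weights = [1.0 / (hypot(s[0] - d[0], s[1] - d[1]) + eps) for d in dst]
        total = sum(weights)
        table.append([w / total for w in weights])
    return table

def probabilistic_fusion(A, B, threshold=0.5):
    """Return [(fusion_set, confidence)] with confidence >= threshold.

    Fusion sets are (i, j) for a pair, (i, None) or (None, j) for a singleton.
    """
    a_to_b = choice_probabilities(A, B)
    b_to_a = choice_probabilities(B, A)
    sets = []
    # Pair confidence: probability of the mutual choice (independence assumed).
    for i in range(len(A)):
        for j in range(len(B)):
            sets.append(((i, j), a_to_b[i][j] * b_to_a[j][i]))
    # Singleton confidence: probability that no object of the other dataset chooses it.
    for i in range(len(A)):
        conf = 1.0
        for j in range(len(B)):
            conf *= 1.0 - b_to_a[j][i]
        sets.append(((i, None), conf))
    for j in range(len(B)):
        conf = 1.0
        for i in range(len(A)):
            conf *= 1.0 - a_to_b[i][j]
        sets.append(((None, j), conf))
    return [(s, c) for s, c in sets if c >= threshold]

source_1 = [(0.0, 0.0), (10.0, 0.0)]   # objects 1 and 2
source_2 = [(1.0, 0.0)]                # object a
result = probabilistic_fusion(source_1, source_2)
```

With these inputs, object a chooses object 1 with probability of about 0.9 and object 2 with about 0.1, so the pair (1, a) and the singleton for object 2 survive the threshold. Note that this naive normalization gives an object with a single candidate a choice probability of 1 regardless of distance, which is one reason to treat it only as a sketch.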
Mutual Influences Between Probabilities
[Figure: two cases (Case I, Case II) over objects 1, 2, a, and b, annotated with the values 0.3, 0.2, 0.05, and 0.8]
The Normalized-Weights Method
• Normalization captures mutual influence
• Iteration brings the weights to equilibrium
• Results are superior to those of the previous two methods, at the cost of only a small increase in computation time
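The slides do not spell out the procedure, so the following is only one possible reading of "normalization" and "iteration": alternately rescale the rows and the columns of a weight matrix until the values stop changing. This alternating scheme, the function names, and the fixed iteration count are all assumptions made for illustration, and the sketch omits any handling of unmatched objects.

```python
def normalize_rows(w):
    """Scale each row so that it sums to 1."""
    return [[x / sum(row) for x in row] for row in w]

def normalize_cols(w):
    """Scale each column so that it sums to 1."""
    col_sums = [sum(col) for col in zip(*w)]
    return [[x / col_sums[j] for j, x in enumerate(row)] for row in w]

def balance(w, iterations=100):
    """Alternate row and column normalization (an assumed scheme).

    w[i][j] is a positive weight linking object i of one source to
    object j of the other, e.g. derived from inverse distance.
    """
    for _ in range(iterations):
        w = normalize_rows(w)
        w = normalize_cols(w)
    return w

# Hypothetical weights for two objects in each source.
weights = [[1.0, 2.0],
           [3.0, 4.0]]
balanced = balance(weights)
row_sums = [sum(row) for row in balanced]
col_sums = [sum(col) for col in zip(*balanced)]
```

For a square matrix of positive weights, the rows and columns both end up summing to approximately 1, so a weight that is large in its row but competes with a larger weight in its column is pushed down. That is the kind of mutual influence the slide refers to, under the assumptions stated above.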
Measuring the Quality of the Result
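E: entities in the world
R: fusion sets in the result
C: correct fusion sets in the result

\[ \text{Precision} = \frac{|C|}{|R|} = \frac{\#\ \text{correct sets in the result}}{\#\ \text{all sets in the result}} \]

\[ \text{Recall} = \frac{|C|}{|E|} = \frac{\#\ \text{correct sets in the result}}{\#\ \text{entities}} \]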
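As a small worked example of these two measures, the sketch below computes precision as the number of correct fusion sets divided by the number of sets in the result, and recall as the number of correct fusion sets divided by the number of real-world entities. The object identifiers and the representation of a fusion set as a frozenset are hypothetical choices for illustration.

```python
def precision_recall(result_sets, correct_sets, num_entities):
    """Precision = |C| / |R| and recall = |C| / |E|.

    result_sets:  fusion sets produced by the algorithm (R)
    correct_sets: the true fusion sets, one per real-world entity
    num_entities: number of real-world entities (|E|)
    """
    result = set(result_sets)
    correct_in_result = result & set(correct_sets)   # C
    precision = len(correct_in_result) / len(result) if result else 0.0
    recall = len(correct_in_result) / num_entities if num_entities else 0.0
    return precision, recall

fs = frozenset
truth = [fs({"a1", "b1"}), fs({"a2", "b2"}), fs({"a3"}), fs({"b3"})]
result = [fs({"a1", "b1"}), fs({"a2", "b3"}), fs({"a3"})]
precision, recall = precision_recall(result, truth, num_entities=len(truth))
```

Here two of the three reported sets are correct and there are four entities, so precision is 2/3 and recall is 1/2.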
A Case Study: Hotels in Tel-Aviv
Method                                             Group               Recall   Precision
The traditional nearest neighbor (best results)    State of the art    0.48     0.56
Mutually nearest                                   Our three methods   0.77     0.85
Probabilistic method                               Our three methods   0.80     0.80
Normalized-weights method                          Our three methods   0.85     0.90

All three methods perform much better than the nearest-neighbor method
Extensive tests on synthesized data are
described in the paper
Conclusions
The novelty of our approach is in developing efficient
methods that find fusion sets with high recall and
precision, using only the locations of objects.
You are invited to visit our poster
And our web site
http://gis.cs.huji.ac.il/
Thank you!