1
The Model based Similarity Metric Lionel Gueguen + , Mihai Datcu *+ + GET-T´ el´ ecom Paris, 46 rue Barrault, 75013 Paris, France * German Aerospace Center DLR, Oberpfaffenhofen, D-82234 Wessling, Germany This paper addresses the problem of building a compact representation of objects which enables to define a similarity measure. The key idea is to provide a mean to extract and represent the relevant information of an object. By integrating the Mini- mum Length Description to an informational similarity metric based on Kolmogorov complexity [1], we propose a new informational measure based on models. First, we make the strong assumption that for any object x there exists a model M x such that K (x)= K (x |M x )+ K (M x ). So, the Li’s similarity metric between two objects x and y is reformulated, in the case when K (x) K (y), by: d(x, y)= α K (x, y |M x,y ) - K (x |M x ) K (y |M y ) + (1 - α) K (M x,y ) - K (M x ) K (M y ) (1) α = K (y |M y ) K (y |M y )+ K (M y ) = K (y |M y ) K (y) (2) One can notice that the similarity metric is divided in two parts as in the MDL principle. The right part measures the similarity between objects expressed in their models, while the second one measures the similarity between models. Moreover, when α goes to 0, the similarity metric is more based on the models complexity. On the contrary, when α tends to 1, the similarity metric takes into account mostly the similarity between the randomness of objects. The object y is highly random when α 1, while it is highly deterministic when α 0. Consequently, it appears natural that the similarity metric takes into account the randomness of an object to compare it to others. Secondly, we make the assumption that an object y is independent of the randomness of an object x such that K (y | x)= K (y |{x |M x , M x })= K (y |M x ). Thus, if we consider y | M x as a new object z with its associated model M z , the similarity metric becomes: δ (x, y)= α K (z |M z ) K (y |M y ) + (1 - α) K (M z ) K (M y ) (3) In conclusion, representing an object by its model enables to compare it to another using the proposed similarity. In addition, the model constitutes a compact repre- sentation of objects. References [1] Ming Li, Xin Chen, Xin Li, Bin Ma, and P.M.B Vitanyi, “The Similarity Metric,” IEEE Transactions on Information Theory, vol. 50, no. 12, pp. 3250–3264, Dec. 2004. 2007 Data Compression Conference (DCC'07) 0-7695-2791-4/07 $20.00 © 2007

[IEEE 2007 Data Compression Conference (DCC'07) - Snowbird, UT, USA (2007.03.27-2007.03.29)] 2007 Data Compression Conference (DCC'07) - The Model based Similarity Metric

  • Upload
    mihai

  • View
    215

  • Download
    1

Embed Size (px)

Citation preview

Page 1: [IEEE 2007 Data Compression Conference (DCC'07) - Snowbird, UT, USA (2007.03.27-2007.03.29)] 2007 Data Compression Conference (DCC'07) - The Model based Similarity Metric

The Model based Similarity Metric

Lionel Gueguen+, Mihai Datcu∗+

+GET-Telecom Paris, 46 rue Barrault, 75013 Paris, France∗German Aerospace Center DLR, Oberpfaffenhofen, D-82234 Wessling, Germany

This paper addresses the problem of building a compact representation of objectswhich enables to define a similarity measure. The key idea is to provide a mean toextract and represent the relevant information of an object. By integrating the Mini-mum Length Description to an informational similarity metric based on Kolmogorovcomplexity [1], we propose a new informational measure based on models. First, wemake the strong assumption that for any object x there exists a model Mx such thatK(x) = K(x | Mx) + K(Mx). So, the Li’s similarity metric between two objects x

and y is reformulated, in the case when K(x) ≤ K(y), by:

d(x, y) = αK(x, y | Mx,y) − K(x | Mx)

K(y | My)+ (1 − α)

K(Mx,y) − K(Mx)

K(My)(1)

α =K(y | My)

K(y | My) + K(My)=

K(y | My)

K(y)(2)

One can notice that the similarity metric is divided in two parts as in the MDLprinciple. The right part measures the similarity between objects expressed in theirmodels, while the second one measures the similarity between models. Moreover,when α goes to 0, the similarity metric is more based on the models complexity. Onthe contrary, when α tends to 1, the similarity metric takes into account mostly thesimilarity between the randomness of objects. The object y is highly random whenα ≈ 1, while it is highly deterministic when α ≈ 0. Consequently, it appears naturalthat the similarity metric takes into account the randomness of an object to compareit to others. Secondly, we make the assumption that an object y is independent of therandomness of an object x such that K(y | x) = K(y | {x | Mx,Mx}) = K(y | Mx).Thus, if we consider y | Mx as a new object z with its associated model Mz, thesimilarity metric becomes:

δ(x, y) = αK(z | Mz)

K(y | My)+ (1 − α)

K(Mz)

K(My)(3)

In conclusion, representing an object by its model enables to compare it to anotherusing the proposed similarity. In addition, the model constitutes a compact repre-sentation of objects.

References

[1] Ming Li, Xin Chen, Xin Li, Bin Ma, and P.M.B Vitanyi, “The Similarity Metric,” IEEE Transactionson Information Theory, vol. 50, no. 12, pp. 3250–3264, Dec. 2004.

2007 Data Compression Conference (DCC'07)0-7695-2791-4/07 $20.00 © 2007