Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval

Sunil Kumar Gupta, Dinh Phung, Brett Adams, Tran The Truyen, Svetha Venkatesh

Institute for Multi-sensor Processing & Content Analysis (IMPCA)

Curtin University of Technology, Perth, Australia

KDD 2010, Washington DC, 28th July, 2010

Outline

Introduction
Motivation
Shared Subspace Learning
Social Media Retrieval
Experimental Results
Conclusion

Introduction

Social tags have the potential to improve search and personal organization, and have been instrumental in the rising popularity of social sharing sites such as Del.icio.us, Flickr and YouTube.

However, these tags are often very subjective, ambiguous and incomplete [17, 14] due to the lack of constraints during their creation.

Tag quality should be improved for better retrieval performance.

Problem

Aim

To improve tag-based search performance in social media by transferring knowledge across related auxiliary sources.

Motivation

Tags in some tagging systems are cleaner.

Why? Because they are created with a controlled vocabulary for a different purpose (e.g. object detection).

Can we do “knowledge-transfer” from these cleaner tagging systems to improve search in noisy tagging systems?

Figure: a Flickr image with its tags (hawaii, maui, hdr) and a LabelMe image with its tags (tree, building, person, woman, tree, bench, window, roof, sidewalk, road, sky, cloud)

Related Works

Marlow et al. [17] study user tagging behaviour.

Li et al. [14, 15] present a method to learn tag relevance.

Wang et al. [24] perform content-based processing and fuse it with text-based retrieval results.

Text Mining: NMF

NMF aims to factorize a nonnegative data matrix X as

$$X \approx FH, \qquad F \ge 0, \; H \ge 0,$$

where X is an M×N matrix, F is an M×R matrix, H is an R×N matrix and, usually, $R < \min(M, N)$.

NMF is widely used in text mining applications due to its ability to find part-based and intuitive representations.
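A minimal sketch of the factorization above, assuming the squared Frobenius error as the objective and the multiplicative updates of Lee and Seung [13] as the solver (the slides do not state which solver is used). The function names `nmf` and `relative_error`, the iteration count and the toy matrix are illustrative assumptions.

```python
import numpy as np

def nmf(X, R, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative M x N matrix X as F (M x R) times H (R x N)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    F = rng.random((M, R)) + eps
    H = rng.random((R, N)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: factors stay nonnegative because every
        # term in the numerators and denominators is nonnegative.
        H *= (F.T @ X) / (F.T @ F @ H + eps)
        F *= (X @ H.T) / (F @ (H @ H.T) + eps)
    return F, H

def relative_error(X, F, H):
    """Frobenius reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - F @ H) / np.linalg.norm(X)

# Toy nonnegative data with an exact rank-2 factorization (assumed example).
rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 30))
F, H = nmf(X, R=2)
err = relative_error(X, F, H)
```

Here R plays the role of the reduced dimensionality, chosen smaller than both M and N.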

Nonnegative Shared Subspace Learning (JSNMF)

Let us represent the two datasets by X and Y, with dimensions $M \times N_1$ and $M \times N_2$ respectively, and write the decomposition as:

$$X \approx [W \mid U]\,H = FH, \qquad Y \approx [W \mid V]\,L = GL,$$

with $F = [W \mid U]$ and $G = [W \mid V]$.

Optimize the cost function

$$\min_{W, U, V, H, L \ge 0} \; \frac{\|X - [W \mid U]\,H\|_F^2}{\|X\|_F^2} + \frac{\|Y - [W \mid V]\,L\|_F^2}{\|Y\|_F^2}$$

Figure: shared basis W and individual bases U and V for the two datasets (LabelMe and Flickr)
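The cost function above can be minimized in several ways; the sketch below assumes multiplicative updates derived from that cost in the usual NMF manner, which is one plausible choice and not necessarily the authors' exact procedure. The names `jsnmf` and `jsnmf_cost`, the reading of R1 and R2 as the total subspace dimensionalities for X and Y, and the toy data are all assumptions.

```python
import numpy as np

def jsnmf_cost(X, Y, W, U, V, H, L):
    """Sum of the two normalized squared Frobenius reconstruction errors."""
    F = np.hstack([W, U])  # basis for X: shared part | individual part
    G = np.hstack([W, V])  # basis for Y: shared part | individual part
    cost_x = np.linalg.norm(X - F @ H) ** 2 / np.linalg.norm(X) ** 2
    cost_y = np.linalg.norm(Y - G @ L) ** 2 / np.linalg.norm(Y) ** 2
    return cost_x + cost_y

def jsnmf(X, Y, K, R1, R2, n_iter=300, eps=1e-9, seed=0):
    """X is M x N1, Y is M x N2; K shared basis vectors out of R1 and R2."""
    rng = np.random.default_rng(seed)
    M = X.shape[0]
    W = rng.random((M, K)) + eps
    U = rng.random((M, R1 - K)) + eps
    V = rng.random((M, R2 - K)) + eps
    H = rng.random((R1, X.shape[1])) + eps
    L = rng.random((R2, Y.shape[1])) + eps
    lx = 1.0 / np.linalg.norm(X) ** 2  # normalization weight for X
    ly = 1.0 / np.linalg.norm(Y) ** 2  # normalization weight for Y
    history = [jsnmf_cost(X, Y, W, U, V, H, L)]
    for _ in range(n_iter):
        F, G = np.hstack([W, U]), np.hstack([W, V])
        H *= (F.T @ X) / (F.T @ F @ H + eps)
        L *= (G.T @ Y) / (G.T @ G @ L + eps)
        FH, GL = F @ H, G @ L
        Hw, Hu, Lw, Lv = H[:K], H[K:], L[:K], L[K:]
        # The shared basis W is pulled by both datasets; U and V by one each.
        W_new = W * (lx * X @ Hw.T + ly * Y @ Lw.T) / (
            lx * FH @ Hw.T + ly * GL @ Lw.T + eps)
        U_new = U * (X @ Hu.T) / (FH @ Hu.T + eps)
        V_new = V * (Y @ Lv.T) / (GL @ Lv.T + eps)
        W, U, V = W_new, U_new, V_new
        history.append(jsnmf_cost(X, Y, W, U, V, H, L))
    return W, U, V, H, L, history

# Toy data: two datasets that share two basis vectors (assumed example).
rng = np.random.default_rng(1)
W0, U0, V0 = rng.random((30, 2)), rng.random((30, 2)), rng.random((30, 2))
X = np.hstack([W0, U0]) @ rng.random((4, 40))
Y = np.hstack([W0, V0]) @ rng.random((4, 50))
W, U, V, H, L, history = jsnmf(X, Y, K=2, R1=4, R2=4)
```

Setting K = 0 in this sketch gives two independent factorizations (no sharing), and K = R1 = R2 gives a fully shared basis, which are the two extremes that partial sharing sits between.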

Illustration of NMF and JS-NMF

Consider toy datasets X1 (shown in red) and X2 (shown in blue), each having 2 clusters.

Apply standard NMF to determine 2 basis vectors for each dataset.

Treat both datasets alike by concatenating them and apply NMF with K = 3.

Use the JSNMF framework with one shared basis vector.

Figure labels: Individual Basis Vectors, Common Basis Vector

Social Media Retrieval

JSNMF based retrieval algorithm

Inputs: W, U, H; query set (SQ); vocabulary (D); no. of items (N)

Construct query vector $q_x$ using vocabulary D and SQ

Project $q_x$ on the subspace ($q_h$): $q_x \approx [W \mid U]\, q_h$

Compute cosine similarity between the query vector and the items in the subspace

Rank the similarities in decreasing order

Output: {Retrieved items}
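The retrieval steps above can be sketched as follows. Several details are assumptions, since the slides do not specify them: the query vector uses binary tag indicators, the projection is solved as a nonnegative least-squares problem with multiplicative updates, and the function names and toy matrices are purely illustrative.

```python
import numpy as np

def build_query_vector(query_tags, vocabulary):
    """Binary indicator vector q_x over the vocabulary D (assumed weighting)."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    q_x = np.zeros(len(vocabulary))
    for tag in query_tags:
        if tag in index:  # tags outside the vocabulary are ignored
            q_x[index[tag]] = 1.0
    return q_x

def project(q_x, F, n_iter=200, eps=1e-9):
    """Find a nonnegative q_h with q_x ~= F @ q_h, keeping F = [W | U] fixed."""
    q_h = np.full(F.shape[1], 1.0 / F.shape[1])
    FtF, Ftq = F.T @ F, F.T @ q_x
    for _ in range(n_iter):
        q_h *= Ftq / (FtF @ q_h + eps)
    return q_h

def retrieve(query_tags, vocabulary, W, U, H, n_items):
    """Rank the columns of H (items in the subspace) by cosine similarity."""
    F = np.hstack([W, U])
    q_h = project(build_query_vector(query_tags, vocabulary), F)
    norms = np.linalg.norm(H, axis=0) * np.linalg.norm(q_h) + 1e-12
    sims = (H.T @ q_h) / norms
    order = np.argsort(-sims)  # decreasing similarity
    return order[:n_items], sims

# Toy example: one shared and one individual basis vector, three items.
vocabulary = ["beach", "sky", "tree", "car"]
W = np.array([[1.0], [1.0], [0.0], [0.0]])  # "beach/sky" basis vector
U = np.array([[0.0], [0.0], [1.0], [1.0]])  # "tree/car" basis vector
H = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])
top, sims = retrieve(["beach"], vocabulary, W, U, H, n_items=2)
```

In this toy case the query "beach" ranks item 0 first, then item 2, because item 0 lies entirely along the basis vector that covers "beach".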

Experiments

Data collection

We created our dataset by crawling metadata for 50000 images (Flickr) and 12000 videos (YouTube), and used 7000 images (LabelMe).

To download data, we used a variety of concepts:
Indoor (‘chair’, ‘computer’, ‘cup’, ‘door’, ‘desk’, ‘microwave’)
Outdoor (‘beach’, ‘boat’, ‘building’, ‘plane’, ‘ship’, ‘sky’, ‘tree’)
Generic (‘book’, ‘car’, ‘pen’, ‘person’, ‘phone’, ‘picture’, ‘window’)

Choice of Shared Subspace Dimensionality (K)

Find the number of common features (tags in our case) between the two datasets, say $M_{xy}$.

Use “the rule of thumb” suggested by [K.V. Mardia et al. 1979, Multivariate Analysis]:

$$K \approx \sqrt{M_{xy}/2}$$

Figure: Sharing Configuration (labels: K, $R_1$, $R_2$; W, U, V)

Another way to estimate K: supposedly, if the subspaces spanned by W, U and V are mutually orthogonal, then

$$K = \operatorname{rank}(X^T Y)$$

However, in our case, W, U and V are only approximately mutually orthogonal, suggesting that

$$K \approx \operatorname{rank}(X^T Y)$$
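Both ways of estimating K can be sketched in a few lines. The rounding to the nearest integer, the function names and the toy tag sets and matrices are assumptions made for illustration. On real, noisy data the numerical rank of $X^T Y$ depends on the tolerance passed to `numpy.linalg.matrix_rank`, so the second estimate should be read as a rough guide.

```python
import math
import numpy as np

def rule_of_thumb_k(tags_x, tags_y):
    """Count the common tags M_xy and return (M_xy, sqrt(M_xy / 2) rounded)."""
    m_xy = len(set(tags_x) & set(tags_y))
    return m_xy, round(math.sqrt(m_xy / 2))

def rank_based_k(X, Y, tol=None):
    """Numerical rank of X^T Y; X and Y share the same M feature rows."""
    return int(np.linalg.matrix_rank(X.T @ Y, tol=tol))

# Rule of thumb on two small, made-up tag vocabularies with 8 common tags.
tags_flickr = ["beach", "sky", "tree", "car", "boat", "person", "window",
               "book", "hawaii", "hdr"]
tags_labelme = ["beach", "sky", "tree", "car", "boat", "person", "window",
                "book", "roof", "sidewalk"]
m_xy, k_thumb = rule_of_thumb_k(tags_flickr, tags_labelme)

# Rank estimate on toy data whose bases W0, U0, V0 are exactly orthogonal.
I = np.eye(6)
W0, U0, V0 = I[:, :2], I[:, 2:4], I[:, 4:6]
rng = np.random.default_rng(0)
X = np.hstack([W0, U0]) @ rng.random((4, 10))
Y = np.hstack([W0, V0]) @ rng.random((4, 12))
k_rank = rank_based_k(X, Y)
```

In the orthogonal toy case only the two shared directions contribute to $X^T Y$, so the rank recovers the number of shared basis vectors.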

Effect of Shared Subspace Dimensionality (K)

BASELINES

Baseline-I: NMF (no sharing)
Baseline-II: JSNMF with full sharing (Lin et al. [16])

RESULTS SUMMARY

Dataset    Baseline-I    Baseline-II    JSNMF (with LabelMe)
Flickr     50%           46%            58%
YouTube    38%           36.5%          48%

Figure labels: No Sharing, Full Sharing

Flickr Retrieval Results

P@N, MAP and 11-point interpolated precision-recall results

(a) Precision-Scope and MAP results for JSNMF, baseline-I (NMF) and baseline-II (Fully Shared)

(b) 11-point interpolated precision-recall for JSNMF, baseline-I (NMF) and baseline-II (Fully Shared)

YouTube Retrieval Results

P@N, MAP and 11-point interpolated precision-recall results

(a) Precision-Scope and MAP results for JSNMF, baseline-I (NMF) and baseline-II (Fully Shared)

(b) 11-point interpolated precision-recall for JSNMF, baseline-I (NMF) and baseline-II (Fully Shared)
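The three measures named in the results (P@N, MAP and 11-point interpolated precision-recall) follow standard information-retrieval definitions. The sketch below uses those textbook definitions; the function names and the example relevance list are assumptions, and the slides do not show how the authors computed their figures.

```python
import numpy as np

def precision_at_n(relevant, n):
    """relevant: list of 0/1 judgments in ranked order; precision in top n."""
    return sum(relevant[:n]) / n

def average_precision(relevant, n_relevant_total=None):
    """Mean of the precision values at the rank of each relevant item."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / rank
    denom = n_relevant_total or hits
    return total / denom if denom else 0.0

def mean_average_precision(relevance_lists):
    """MAP: average precision averaged over all queries."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))

def eleven_point_interpolated(relevant, n_relevant_total=None):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    total_rel = n_relevant_total or sum(relevant)
    if total_rel == 0:
        return [0.0] * 11
    precisions, recalls, hits = [], [], 0
    for rank, rel in enumerate(relevant, start=1):
        hits += rel
        precisions.append(hits / rank)
        recalls.append(hits / total_rel)
    points = []
    for level in np.linspace(0, 1, 11):
        # Interpolated precision: best precision at any recall >= level.
        candidates = [p for p, r in zip(precisions, recalls)
                      if r >= level - 1e-12]
        points.append(max(candidates) if candidates else 0.0)
    return points

# One ranked list with relevant items at ranks 1 and 3 (assumed example).
ranked = [1, 0, 1, 0, 0]
p_at_3 = precision_at_n(ranked, 3)
ap = average_precision(ranked)
curve = eleven_point_interpolated(ranked)
```

For this list, precision at 3 is 2/3 and average precision is (1/1 + 2/3) / 2 = 5/6.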

Conclusion

We presented a novel nonnegative shared subspace learning framework.

We demonstrated its application to improve tag-based image and video retrieval in Flickr and YouTube respectively.

We empirically demonstrated that controlled sharing is crucial to avoid any negative knowledge-transfer from auxiliary data sources.

Our JSNMF framework is generic and can be applied widely to carry out flexible knowledge transfer from related data sources.

References

[1] http://code.google.com/apis/youtube/overview.html. Accessed in Oct, 2009.
[2] http://www.flickr.com/services/api/. Accessed in July, 2009.
[3] H.D. Abdulla, M. Polovincak, and V. Snasel. Search Results Clustering using Nonnegative Matrix Factorization (NMF). ASONAM ’09, pages 320–323, July 2009.
[4] M.W. Berry and M. Browne. Email Surveillance using Non-negative Matrix Factorization. Computational & Mathematical Organization Theory, 11(3):249–264, 2005.
[5] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
[6] A.P. Dempster, N.M. Laird, D.B. Rubin, et al. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38, 1977.
[7] L. Fei-Fei, R. Fergus, and P. Perona. One-shot Learning of Object Categories. PAMI, 28(4):594–611, 2006.
[8] S.A. Golder and B.A. Huberman. Usage Patterns of Collaborative Tagging Systems. Journal of Information Science, 32(2):198, 2006.
[9] D.R. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical Correlation Analysis: An Overview With Application To Learning Methods. Neural Computation, 16(12):2639–2664, 2004.
[10] P.O. Hoyer. Non-negative Matrix Factorization with Sparseness Constraints. The Journal of Machine Learning Research, 5:1457–1469, 2004.
[11] M.S. Kankanhalli and Y. Rui. Application Potential of Multimedia Information Retrieval. Proceedings of the IEEE, 96(4):712, 2008.
[12] J.R. Kettenring. Canonical Analysis of Several Sets of Variables. Biometrika, 58(3):433–451, 1971.
[13] D.D. Lee and H.S. Seung. Algorithms for Non-negative Matrix Factorization. In Advances in Neural Information Processing, 2000.
[14] X. Li, C.G.M. Snoek, and M. Worring. Learning Social Tag Relevance by Neighbour Voting. IEEE Transactions on Multimedia, in press, 2009.
[15] X. Li, C.G.M. Snoek, and M. Worring. Annotating Images by Harnessing Worldwide User-tagged Photos. ICASSP. Taipei, Taiwan, 2009.
[16] Y.R. Lin, H. Sundaram, M. De Choudhury, and A. Kelliher. Temporal Patterns in Social Media Streams: Theme Discovery And Evolution Using Joint Analysis Of Content and Context. In ICME 2009, pages 1456–1459, 2009.
[17] C. Marlow, M. Naaman, D. Boyd, and M. Davis. Ht06, Tagging Paper, Taxonomy, Flickr, Academic Article, Toread. Proceedings Of The Seventeenth Conference On Hypertext And Hypermedia, pages 31–40, 2006.
[18] S.J. Pan and Q. Yang. A Survey on Transfer Learning. Technical Report HKUST-CS08-08, Department of Computer Science and Engineering, HKUST, Hong Kong, China, 2008.
[19] R. Raina, A. Battle, H. Lee, B. Packer, and A.Y. Ng. Self-taught Learning: Transfer Learning from Unlabeled Data. Proceedings of the 24th International Conference on Machine Learning, page 766, 2007.
[20] B.C. Russell, A. Torralba, K.P. Murphy, and W.T. Freeman. Labelme: A Database and Web-based Tool for Image Annotation. International Journal of Computer Vision, 77(1):157–173, 2008.
[21] G. Salton and C. Buckley. Term-weighting Approaches in Automatic Text Retrieval. Information Processing & Management, 24(5):513–523, 1988.
[22] F. Shahnaz, M.W. Berry, V.P. Pauca, and R.J. Plemmons. Document Clustering using Nonnegative Matrix Factorization. Information Processing and Management, 42(2):373–386, 2006.
[23] B. Sigurbjörnsson and R. Van Zwol. Flickr Tag Recommendation based on Collective Knowledge. Proceeding of ACM International World Wide Web Conference, 2008.
[24] C. Wang, F. Jing, L. Zhang, and H.J. Zhang. Scalable Search-based Image Annotation. Multimedia Systems, 14(4):205–220, 2008.
[25] X. Wang, C. Pal, and A. McCallum. Generalized Component Analysis for Text with Heterogeneous Attributes. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 803, 2007.
[26] L. Wu, L. Yang, N. Yu, and X.S. Hua. Learning to Tag. Proceedings of the 18th International Conference on World Wide Web, pages 361–370, 2009.
[27] Z. Wu, C.W. Cheng, and C. Li. Social and Semantics Analysis via Nonnegative Matrix Factorization. Proceedings of the 17th International Conference on World Wide Web, 2008.
