Computing & Information Sciences Kansas State University Laboratory for Knowledge Discovery in Databases PhD Research Proficiency Exam Jing Xia Laboratory for Knowledge Discovery in Databases Department of Computing and Information Sciences Kansas State University http://www.kddresearch.org http://www.cis.ksu.edu/~xiajing Social Network Analysis using Link Mining

PhD Research Proficiency Exam


Page 1: PhD Research Proficiency Exam

Computing & Information Sciences, Kansas State University

Laboratory for Knowledge Discovery in Databases

PhD Research Proficiency Exam

Jing Xia

Laboratory for Knowledge Discovery in Databases

Department of Computing and Information Sciences

Kansas State University

http://www.kddresearch.org

http://www.cis.ksu.edu/~xiajing

Social Network Analysis using Link Mining

Page 2: PhD Research Proficiency Exam

Outline

Social Network Introduction

Networks in Biological Systems

Mining on Social Networks
- Link Mining
- Multi-Relational Mining

Problem Specification

Proposed Approach

Page 3: PhD Research Proficiency Exam

Social Network Introduction

What is a social network?
- A social network is a heterogeneous, multi-relational data set represented by a graph.

Characteristics of social networks
- “Natural” networks and universality
- Quantitative measures

Mining social networks
- Link mining: tasks and challenges

Page 4: PhD Research Proficiency Exam

Society

Nodes: individuals

Links: social relationships (family, work, friendship, etc.)

S. Milgram (1967): “six degrees of separation” appears to be a universal property of “natural” networks.

Social networks: many individuals with diverse social interactions between them.

April 20, 2023, Data Mining: Concepts and Techniques

Page 5: PhD Research Proficiency Exam

Communication

The Earth is developing an electronic system: a network whose diverse nodes and links are

-computers

-routers

-satellites

-phone lines

-TV cables

-EM waves

Communication networks: Many non-identical components with diverse connections between them.

Page 6: PhD Research Proficiency Exam

Epidemiology

Nodes: doctors, patients, geographical locations

Links: contact relationships (direct or indirect infection)

Page 7: PhD Research Proficiency Exam

Characteristics of Social Networks

Consider many kinds of networks: social, technological, business, economic, content, …

These networks tend to share certain informal properties:
- multi-relational interaction
- temporal (time-evolving)
- large scale; continual growth
- distributed, organic growth: vertices “decide” who to link to
- mixture of local and long-distance connections
- abstract notions of distance: geographical, content, social, …

Page 8: PhD Research Proficiency Exam

Social Network Theory

Do natural networks share more quantitative universals?
- What would these “universals” be?
- How can we make them precise and measure them?
- How can we explain their universality?

This is the domain of social network theory
- Sometimes also referred to as link analysis

Page 9: PhD Research Proficiency Exam

Quantitative Measures

Connected components:
- how many, and how large?

Network diameter:
- maximum (worst-case) or average?
- exclude infinite distances? (disconnected components)
- the small-world phenomenon
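
The two measures above can be sketched in a few lines of Python. This is an illustrative sketch, not the presenter's code: the toy graph and all function names are assumptions. It resolves the "infinite distances" question by averaging over connected pairs only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    """List of node sets, one per connected component."""
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            comp = set(bfs_distances(adj, node))
            seen |= comp
            components.append(comp)
    return components


def diameter_and_average(adj):
    """Maximum and mean shortest-path length over connected pairs only."""
    lengths = []
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if s < t:  # count each unordered pair once
                lengths.append(d)
    return max(lengths), sum(lengths) / len(lengths)


# Hypothetical toy graph: a path 1-2-3-4 plus a separate edge 5-6.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: {6}, 6: {5}}
components = connected_components(adj)          # two components, sizes 4 and 2
diameter, avg_len = diameter_and_average(adj)   # diameter 3, average 11/7
```

A small average path length relative to the number of nodes is what the small-world phenomenon refers to.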

Page 10: PhD Research Proficiency Exam

Quantitative Measures

Clustering:
- to what extent do links tend to cluster “locally”?
- what is the balance between local and long-distance connections?
- what roles do the two types of links play?

Degree distribution:
- what is the typical degree in the network?
- what is the overall distribution?
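
As a rough illustration of these two measures, the sketch below computes the local clustering coefficient (the fraction of a node's neighbor pairs that are themselves linked) and the empirical degree distribution. The toy graph and function names are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbors that are linked to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def degree_distribution(adj):
    """Map each degree to the fraction of nodes having that degree."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
clustering = average_clustering(adj)    # (1 + 1 + 1/3 + 0) / 4 = 7/12
degrees = degree_distribution(adj)      # {1: 0.25, 2: 0.5, 3: 0.25}
```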

Page 11: PhD Research Proficiency Exam

Outline

Social Network Introduction

Networks in Biological Systems

Problem Specification

Mining on Social Networks
- Link Mining
- Multi-Relational Mining

Page 12: PhD Research Proficiency Exam

Bio-Map

[Figure: bio-map with three layers (GENOME, PROTEOME, METABOLISM); labels: protein-gene interactions, protein-protein interactions, bio-chemical reactions, citrate cycle.]

Page 13: PhD Research Proficiency Exam

Protein-Protein Interaction Network

[Figure: protein-protein interactions (PROTEOME).]

Page 14: PhD Research Proficiency Exam

Protein-Protein Interaction Network

Nodes: proteins

Links: multi-relational
- physical interactions (binding)
- complex membership
- pathway

P. Uetz, et al. Nature 403, 623-7 (2000).

Page 15: PhD Research Proficiency Exam

Outline

Social Network Introduction

Networks in Biological Systems

Mining on Social Networks
- Link Mining
- Multi-Relational Mining

Problem Specification

Proposed Approach

Page 16: PhD Research Proficiency Exam

Link Mining

Traditional machine learning and data mining approaches assume that the data is flat (independent instances in a single table)

Typical real data set
- Instances in the data set form linked networks

Link Mining
- A newly emerging research area at the intersection of social network and link analysis, hypertext and web mining, graph mining, relational learning, and inductive logic programming

Page 17: PhD Research Proficiency Exam

Link Mining Tasks

Object-related tasks
- Link-based object ranking
- Link-based object classification
- Object clustering (group detection)
- Object identification (entity resolution)

Link-related tasks
- Link prediction

Graph-related tasks
- Subgraph discovery
- Graph classification

Page 18: PhD Research Proficiency Exam

Multi-relational Link Mining

Traditional link mining assumes there is only one kind of relation in the network: links are flat

There exist multiple, heterogeneous social networks, each representing a particular kind of relationship
- Multi-relational & heterogeneous

Page 19: PhD Research Proficiency Exam

Multi-relational Network

Multi-relational & heterogeneous network
- Multiple object and link types

Example networks
- Medical network: patients, doctors, diseases, contacts, treatments
- Bibliographic network: years, publications, authors, venues
- Epidemic transmission network (involves temporal data; multi-relational: airborne, patients’ contacts)
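
One possible in-memory representation of such a network, given only as a sketch: nodes carry an object type and edges are stored per relation, so each relation can be pulled out as its own single-relation network. The class name, method names, and the tiny medical example are all hypothetical.

```python
from collections import defaultdict


class MultiRelationalNetwork:
    """Typed nodes plus one undirected edge set per relation (link type)."""

    def __init__(self):
        self.node_type = {}            # node -> object type, e.g. "patient"
        self.edges = defaultdict(set)  # relation name -> set of frozenset({u, v})

    def add_node(self, node, node_type):
        self.node_type[node] = node_type

    def add_edge(self, relation, u, v):
        if u == v:
            raise ValueError("self-loops are not supported in this sketch")
        if u not in self.node_type or v not in self.node_type:
            raise KeyError("add both endpoints with add_node first")
        self.edges[relation].add(frozenset((u, v)))

    def relation_network(self, relation):
        """Adjacency dict of the single-relation network for `relation`."""
        adj = defaultdict(set)
        for edge in self.edges.get(relation, ()):
            u, v = edge
            adj[u].add(v)
            adj[v].add(u)
        return dict(adj)


# Hypothetical medical network with three link types.
net = MultiRelationalNetwork()
for name, kind in [("p1", "patient"), ("p2", "patient"),
                   ("d1", "doctor"), ("flu", "disease")]:
    net.add_node(name, kind)
net.add_edge("contact", "p1", "p2")
net.add_edge("treats", "d1", "p1")
net.add_edge("treats", "d1", "p2")
net.add_edge("diagnosed_with", "p1", "flu")

contact_net = net.relation_network("contact")  # {"p1": {"p2"}, "p2": {"p1"}}
```

Keeping one edge set per relation makes it cheap to compare relations with each other, which is the kind of question the problem specification raises.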

Page 20: PhD Research Proficiency Exam

Outline

Social Network Introduction

Networks in Biological Systems

Mining on Social Networks
- Link Mining
- Multi-Relational Mining

Problem Specification

Proposed Approach

Page 21: PhD Research Proficiency Exam

Problem Specification

Phenomenon: heterogeneity and multiple relationships exist in many real networks

Rationale: they might be useful for link mining

Problem
- Can we utilize multiple relationships to help link analysis?
- How to extract relations as a relation network (RN)?
- How to identify relationships among relation networks (correlation, independence, etc.)?
- Is the RN time-evolving? Which relation plays an important role?
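
To make the question of relationships among relation networks concrete, here is one simple stand-in measure: the Jaccard overlap between the edge sets of two relations. The slides do not prescribe this measure (the proposed approach relies on random walk with restart), so treat it as an assumed baseline; the relation names and edges are invented.

```python
from itertools import combinations


def edge_set(pairs):
    """Undirected edge set built from an iterable of (u, v) pairs."""
    return {frozenset(p) for p in pairs}


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relation_similarity(relations):
    """Jaccard overlap for every pair of relations, keyed by sorted name pair."""
    sets = {name: edge_set(pairs) for name, pairs in relations.items()}
    return {(r, s): jaccard(sets[r], sets[s])
            for r, s in combinations(sorted(sets), 2)}


# Hypothetical relations over the same four people.
relations = {
    "coauthor_2004": [("ann", "bob"), ("bob", "cy")],
    "coauthor_2005": [("bob", "ann"), ("cy", "dee")],
    "same_lab": [("ann", "bob"), ("bob", "cy"), ("cy", "dee")],
}
similarity = relation_similarity(relations)
# similarity[("coauthor_2004", "coauthor_2005")] is 1/3; both other pairs are 2/3
```

A high overlap suggests two relations are correlated; an overlap near zero is compatible with independence but does not establish it.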

Page 22: PhD Research Proficiency Exam

Problem Example 1

Application Domain: Epidemic Disease

Pre-condition 1: multiple relations are given: patients’ contact networks along a timeline

Pre-condition 2: sequential relationship among relations

Pre-condition 3: another medium of disease transmission

Problem: can we predict whether a given person will be infected by mining these multi-relational networks?
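
As a deliberately simple baseline for this example (not the proposed method), the sketch below follows timestamped contacts in time order and records the earliest moment each person could have been exposed. It assumes infection can only pass from someone exposed strictly before the contact, and it yields an upper bound on who can be infected, not a probability. All names and data are hypothetical.

```python
def earliest_exposure(contacts, seeds):
    """Earliest possible exposure time per person.

    contacts: iterable of (time, person_a, person_b) tuples.
    seeds: dict mapping initially infected people to their infection time.
    """
    exposure = dict(seeds)
    for t, a, b in sorted(contacts):
        for src, dst in ((a, b), (b, a)):
            src_exposed_before = src in exposure and exposure[src] < t
            if src_exposed_before and exposure.get(dst, float("inf")) > t:
                exposure[dst] = t
    return exposure


# Hypothetical contact timeline; alice is infected at time 0.
contacts = [
    (3, "carol", "dave"),
    (1, "alice", "bob"),
    (2, "bob", "carol"),
    (1, "dave", "erin"),  # happens before dave could have been exposed
]
exposure = earliest_exposure(contacts, {"alice": 0})
# exposure == {"alice": 0, "bob": 1, "carol": 2, "dave": 3}; erin is never exposed
```

The erin case shows why the sequential relationship among relations matters: the same set of contacts in a different order would expose a different set of people.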

Page 23: PhD Research Proficiency Exam

Problem Example 2

Application Domain: bibliographic network

Pre-condition 1: multiple relations are given: the co-author networks of a conference over several years

Problem 1: what is the relationship among these relation networks?

Problem 2: how can we use that relationship to answer a user’s query?

Deng Cai, Zheng Shao, Xiaofei He, Xifeng Yan, and Jiawei Han. Mining Hidden Community in Heterogeneous Social Networks. March, Report No. UIUCDCS-R-2005-2538, UILU-ENG-2005-1731.

Page 24: PhD Research Proficiency Exam

Problem Example 3

Application Domain: bibliographic network

Pre-condition 1: multiple relations are given: the co-author networks of a conference over several years

Pre-condition 2: topics of publications

Problem: can we predict whether two researchers will become co-authors in the future, based on these two types of networks?
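
A minimal sketch of how two types of networks could be combined for this prediction, assuming a simple blend of neighborhood overlap in the co-author network and overlap of publication topics. The scoring rule, the `weight` parameter, and the data are illustrative assumptions, not the method proposed in the talk.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coauthor_scores(coauthors, topics, weight=0.5):
    """Score every not-yet-linked pair by structural plus topical similarity."""
    scores = {}
    for u, v in combinations(sorted(coauthors), 2):
        if v in coauthors[u]:
            continue  # already co-authors, nothing to predict
        structural = jaccard(coauthors[u], coauthors[v])
        topical = jaccard(topics.get(u, set()), topics.get(v, set()))
        scores[(u, v)] = weight * structural + (1 - weight) * topical
    return scores


# Hypothetical co-author network and publication topics.
coauthors = {
    "ann": {"bob", "cy"},
    "bob": {"ann", "cy"},
    "cy": {"ann", "bob", "dee"},
    "dee": {"cy"},
}
topics = {
    "ann": {"graphs", "mining"},
    "bob": {"graphs"},
    "cy": {"mining"},
    "dee": {"graphs", "mining"},
}
scores = coauthor_scores(coauthors, topics)
best_pair = max(scores, key=scores.get)  # ("ann", "dee"), score 0.75
```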

Page 25: PhD Research Proficiency Exam

Outline

Social Network Introduction

Networks in Biological Systems

Mining on Social Networks
- Link Mining
- Multi-Relational Mining

Problem Specification

Proposed Approach

Page 26: PhD Research Proficiency Exam

Proposed Approach

Random walk with restart

[Figure: a 12-node example graph; the query node is Node 4, and each node is colored and labeled with its relevance score.]

Relevance scores with respect to Node 4:

Node       Score
Node 1     0.13
Node 2     0.10
Node 3     0.13
Node 4     0.22
Node 5     0.13
Node 6     0.05
Node 7     0.05
Node 8     0.08
Node 9     0.04
Node 10    0.03
Node 11    0.04
Node 12    0.02

More red, more relevant

Nearby nodes, higher scores
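
A minimal sketch of random walk with restart by power iteration, assuming an undirected graph in which every node has at least one neighbor. At each step the walker moves to a uniformly chosen neighbor with probability `1 - restart` and jumps back to the query node with probability `restart`. The edges of the 12-node example are not recoverable from the slide, so the sketch uses a hypothetical 6-node ring; the function name and parameter defaults are also assumptions.

```python
def random_walk_with_restart(adj, query, restart=0.15, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `query`."""
    nodes = list(adj)
    scores = {n: 0.0 for n in nodes}
    scores[query] = 1.0  # the walker starts at the query node
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            # Spread the non-restarting mass of u evenly over its neighbors.
            share = (1 - restart) * scores[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[query] += restart  # restarting mass returns to the query node
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Hypothetical ring of six nodes, 0-1-2-3-4-5-0, queried at node 0.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
scores = random_walk_with_restart(ring, query=0)
ranking = sorted(scores, key=scores.get, reverse=True)  # node 0 first, node 3 last
```

On this ring the scores fall off with distance from the query node, matching "nearby nodes, higher scores". On graphs with uneven degrees a high-degree node can outrank a closer low-degree one, so the slogan is a tendency rather than a guarantee.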

Page 27: PhD Research Proficiency Exam

Proposed Approach

Basic idea
- RWR serves as a measure of proximity between two nodes in a network
- Model the relationships among multiple relations using RWR

Purpose
- Facilitate mining more interesting patterns
- Increase prediction accuracy
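
The slides do not state a formula for RWR, so the following is the commonly used formulation with assumed notation: $\tilde{W}$ is the column-normalized adjacency matrix, $e_i$ is the indicator vector of the query node $i$, and $c$ is the restart probability (some papers use $c$ for the continuation probability instead). The score vector $r_i$ is the fixed point of the walk:

```latex
r_i = (1 - c)\,\tilde{W}\, r_i + c\, e_i
\quad\Longrightarrow\quad
r_i = c\,\bigl(I - (1 - c)\,\tilde{W}\bigr)^{-1} e_i
```

Entry $j$ of $r_i$ is then read as the proximity of node $j$ to node $i$. The inverse exists for any $0 < c \le 1$ because the columns of $\tilde{W}$ sum to at most one.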

Page 28: PhD Research Proficiency Exam

Measure Relationship

Q: What is the most related conference to ICDM?

A: RWR!

[Figure: ICDM linked to related conferences (KDD, SDM, ECML, PKDD, PAKDD, CIKM, DMKD, SIGMOD, ICML, ICDE), with edge scores ranging from 0.004 to 0.011.]

Neighborhood Formulation [Sun ICDM2005]

Page 29: PhD Research Proficiency Exam

Multi-Relational Model

[Figure: per-conference author networks (ICDM, KDD, PKDD, ICML) feeding a relation network that links the conferences ICDM, KDD, SDM, ECML, PKDD, PAKDD, CIKM, DMKD, SIGMOD, ICML, and ICDE, with edge scores ranging from 0.004 to 0.011.]

Page 30: PhD Research Proficiency Exam

Other Applications

Content-based Image Retrieval [He]

Personalized PageRank [Jeh], [Widom], [Haveliwala]

Anomaly Detection (for node; link) [Sun]

Link Prediction [Getoor], [Jensen]

Semi-supervised Learning [Zhu], [Zhou]…

Page 31: PhD Research Proficiency Exam

Summary

Social Network Analysis

Link mining

Problem: multi-relational

Proposed approach

Page 32: PhD Research Proficiency Exam

Thank you