Page 1: Informetric methods seminar

Informetric methods seminar

Tutorial 2: Using Matlab for network construction, ranking, clustering, topic modeling, and path finding

Erjia Yan

Page 2: Informetric methods seminar

Contents

Network construction
Ranking
Clustering
Topic modeling
Path finding

Page 4: Informetric methods seminar

From data to networks

Bibliographical data

Page 5: Informetric methods seminar

Web of Science format

The paper-to-paper citation network is the base

Web of Science cited references format: First Author, Year of Publication, Abbreviated Journal Name, Volume Number, Beginning Page Number

Example: AANESTAD M, 2011, J STRATEGIC INF SYST, V20, P161

All fields can be found in the “full record + cited references” download option

Some of the newer records may also have a DOI. For a better match, remove the DOI from the cited references
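
The DOI removal can be sketched in a few lines. This is only an illustration in Python, not part of the tutorial's Matlab workflow. It assumes the DOI appears as its own comma-separated field starting with "DOI"; the function name `strip_doi` and the DOI value in the example are placeholders.

```python
def strip_doi(cited_ref):
    """Drop any 'DOI ...' field from a comma-separated cited reference."""
    fields = [field.strip() for field in cited_ref.split(",")]
    kept = [
        field for field in fields
        if field.upper() != "DOI" and not field.upper().startswith("DOI ")
    ]
    return ", ".join(kept)


# The DOI below is a made-up placeholder, not a real identifier.
example = "AANESTAD M, 2011, J STRATEGIC INF SYST, V20, P161, DOI 10.0000/placeholder"
print(strip_doi(example))
```

References that carry no DOI pass through unchanged, so the function can be applied to every cited reference before matching.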

Page 6: Informetric methods seminar

Citation matching

For citing papers, extract these fields and format them into the Web of Science cited reference format.

Now the citing papers and the cited references have the same format

Using these two fields, construct an internal citation network that contains only those cited references that are themselves citing papers in the data set
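
A minimal Python sketch of the matching step, under stated assumptions: the five fields have already been extracted for each citing paper, and the first author may be written with a comma ("Aanestad, M"), which is dropped to match the cited reference style. The function names and the second paper in the example are hypothetical.

```python
def to_cited_ref_format(first_author, year, journal_abbrev, volume, begin_page):
    """Format a citing paper like a Web of Science cited reference."""
    author = first_author.replace(",", "").upper()
    return f"{author}, {year}, {journal_abbrev.upper()}, V{volume}, P{begin_page}"


def internal_citation_pairs(cited_refs_by_paper):
    """Keep only the (citing, cited) pairs whose cited paper is in the data set."""
    in_data_set = set(cited_refs_by_paper)
    return [
        (citing, cited)
        for citing, refs in cited_refs_by_paper.items()
        for cited in refs
        if cited in in_data_set
    ]


paper_a = to_cited_ref_format("Aanestad, M", 2011, "J Strategic Inf Syst", 20, 161)
paper_b = to_cited_ref_format("Doe, J", 2012, "J Example", 1, 1)  # made-up paper
data = {
    paper_a: [],
    paper_b: [paper_a, "SMITH A, 1999, J OUTSIDE, V3, P7"],  # second ref is outside the set
}
print(internal_citation_pairs(data))
```

Exact string equality is the whole matching rule here, which is why the citing papers and the cited references need to be in the same format first.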

Page 7: Informetric methods seminar

Procedures

If you can write an app for this, it would be great! Otherwise, you can follow these instructions.

Convert
CP1 CR1; CR2; CR3
into
CP1 CR1
CP1 CR2
CP1 CR3

Use Access to construct the network:
Have a table for citing papers
Import the converted citation pairs into Access
Use a query to extract those pairs whose papers are in the table
Now you have the node info and link info
Import both into Matlab
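
The conversion from one line per citing paper to one line per citation pair can be sketched as follows. This Python snippet assumes the citing paper and its reference list are separated by a tab and the references by semicolons; the function name is hypothetical.

```python
def explode_citation_line(line):
    """Turn 'CP1<TAB>CR1; CR2; CR3' into [(CP1, CR1), (CP1, CR2), (CP1, CR3)]."""
    citing, _, refs = line.partition("\t")
    return [
        (citing.strip(), ref.strip())
        for ref in refs.split(";")
        if ref.strip()  # skip empty entries, e.g. after a trailing semicolon
    ]


for citing, cited in explode_citation_line("CP1\tCR1; CR2; CR3"):
    print(citing, cited)
```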

Page 8: Informetric methods seminar

Adjacency matrices

Now we have paper-to-paper citation networks, but in order to construct, for instance, author-to-author citation or author co-citation networks, we need to use adjacency matrices.

Rows are papers and columns are authors: a cell value of 1 at (i,j) indicates that paper i is written by author j

Page 9: Informetric methods seminar

Procedures

Convert
ID1 AU1; AU2; AU3
into
ID1 AU1
ID1 AU2
ID1 AU3

Add
ID1 ID1
ID2 ID2
… …
IDn IDn
to the beginning of the file

Use Txt2Pajek on the linkage file
Import the edge section of the .net file to Matlab
Select M(1:n,n+1:m), where m is the column size. The selection is our author-paper adjacency matrix
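
The same selection can be sketched without Txt2Pajek. The Python snippet below assumes that the point of listing the n paper IDs first is that papers receive node numbers 1..n and authors n+1..m, so the paper-by-author block sits in the upper right of the full matrix. The function name is hypothetical, and the Matlab selection M(1:n,n+1:m) corresponds to rows 0..n-1 and columns n..m-1 in Python's 0-based indexing.

```python
def author_paper_block(pairs):
    """Build the full node matrix from (paper, author) pairs and cut out the block."""
    papers, authors = [], []
    for paper, author in pairs:
        if paper not in papers:
            papers.append(paper)
        if author not in authors:
            authors.append(author)
    n = len(papers)
    m = n + len(authors)
    paper_idx = {p: i for i, p in enumerate(papers)}       # nodes 0..n-1
    author_idx = {a: n + j for j, a in enumerate(authors)}  # nodes n..m-1
    full = [[0] * m for _ in range(m)]
    for paper, author in pairs:
        full[paper_idx[paper]][author_idx[author]] = 1
    block = [row[n:m] for row in full[:n]]  # Matlab: M(1:n, n+1:m)
    return papers, authors, block


papers, authors, block = author_paper_block(
    [("ID1", "AU1"), ("ID1", "AU2"), ("ID2", "AU2")]
)
print(papers, authors, block)
```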

Page 10: Informetric methods seminar

Citation and coauthorship
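
One common way to derive both networks is by matrix products. The sketch below is plain Python and assumes A is the paper-by-author matrix described earlier and C is the paper-to-paper citation matrix with C[i][j] = 1 when paper i cites paper j. Coauthorship is then AᵀA and author-to-author citation is AᵀCA. The toy matrices are made up for illustration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Toy data: paper 0 by author 0, paper 1 by authors 0 and 1, paper 2 by author 1.
A = [[1, 0],
     [1, 1],
     [0, 1]]
# Paper 1 cites paper 0, and paper 2 cites paper 1.
C = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]

coauthorship = matmul(transpose(A), A)                 # author-by-author
author_citation = matmul(matmul(transpose(A), C), A)   # citing author by cited author
print(coauthorship)
print(author_citation)
```

The diagonal of the coauthorship matrix holds each author's paper count, and the diagonal of the author citation matrix holds self-citations, so both are usually zeroed or ignored before analysis.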

Page 11: Informetric methods seminar

Cocitation and biblio. coupling
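
Both measures can be counted directly from reference lists. In this Python sketch (function names and toy data are hypothetical), two references are cocited once for every paper that cites both, and two papers are bibliographically coupled by the number of references they share. With the citation matrix C, the same counts appear in CᵀC and CCᵀ respectively.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(refs_by_paper):
    """Count, for each pair of references, how many papers cite both."""
    counts = Counter()
    for refs in refs_by_paper.values():
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


def coupling_counts(refs_by_paper):
    """Count, for each pair of papers, how many references they share."""
    counts = Counter()
    for p, q in combinations(sorted(refs_by_paper), 2):
        shared = len(set(refs_by_paper[p]) & set(refs_by_paper[q]))
        if shared:
            counts[(p, q)] = shared
    return counts


refs = {"P1": ["R1", "R2"], "P2": ["R1", "R2", "R3"], "P3": ["R3"]}
print(cocitation_counts(refs))
print(coupling_counts(refs))
```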

Page 12: Informetric methods seminar

Co-word

Page 13: Informetric methods seminar

Contents

Network construction
Ranking
Clustering
Topic modeling
Path finding

Page 14: Informetric methods seminar

PageRank

By David Gleich of Purdue University
http://www.mathworks.com/matlabcentral/fileexchange/11613-pagerank

pagerank(M,options)
options.c: the teleportation coefficient [double | {0.85}]
options.v: the personalization vector [vector | {uniform: 1/n}]
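
To show what the two options control, here is a minimal power-iteration sketch in plain Python. It is not Gleich's implementation. The arguments `c` and `v` mirror options.c and options.v, while `tol`, `max_iter`, and the handling of nodes without outgoing links (their score is redistributed according to `v`) are assumptions of this sketch.

```python
def pagerank(adj, c=0.85, v=None, tol=1e-10, max_iter=1000):
    """adj[i][j] is the weight of the link from node i to node j."""
    n = len(adj)
    if v is None:
        v = [1.0 / n] * n  # uniform personalization vector
    out_weight = [sum(row) for row in adj]
    x = list(v)
    for _ in range(max_iter):
        # Score held by nodes with no outgoing links is spread according to v.
        dangling = sum(x[i] for i in range(n) if out_weight[i] == 0)
        new = [(1 - c) * v[j] + c * dangling * v[j] for j in range(n)]
        for i in range(n):
            if out_weight[i] == 0:
                continue
            share = c * x[i] / out_weight[i]
            for j in range(n):
                if adj[i][j]:
                    new[j] += share * adj[i][j]
        change = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if change < tol:
            break
    return x


# Toy citation network: papers 1 and 2 both cite paper 0.
scores = pagerank([[0, 0, 0], [1, 0, 0], [1, 0, 0]])
print(scores)
```

The scores sum to 1, and a non-uniform `v` biases the ranking toward the nodes it favors.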

Page 15: Informetric methods seminar

Contents

Network construction
Ranking
Clustering
Topic modeling
Path finding

Page 17: Informetric methods seminar

Modularity-based clustering

By MIT Strategic Engineering
http://strategic.mit.edu/downloads.php?page=matlab_networks

[modules,module_hist,Q] = newmangirvan(adj,k)
[groups_hist,Q] = newman_comm_fast(adj)
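
Both functions return a modularity score Q. As a reading aid, the Python sketch below computes Newman's modularity for a given partition of an undirected network. It does not perform the clustering itself and is not the MIT code; the function name and the toy network are made up.

```python
def modularity(adj, groups):
    """Q = (1/2m) * sum over same-group pairs of (A_ij - k_i * k_j / 2m).

    adj is a symmetric adjacency matrix; groups[i] is the group label of node i.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # every undirected edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if groups[i] == groups[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(modularity(adj, [0, 0, 0, 1, 1, 1]))  # one group per triangle
print(modularity(adj, [0, 0, 0, 0, 0, 0]))  # everything in one group
```

Splitting the network along the bridge gives a clearly positive Q, while putting all nodes in one group gives Q = 0, which is the behavior a modularity-based method exploits when searching for a partition.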

Page 18: Informetric methods seminar

VOSviewer clustering

By Nees van Eck and Ludo Waltman of Leiden University
http://www.vosviewer.com/relatedsoftware/

A variant of the modularity-based clustering technique
[X, cluster_size, V] = VOS_clustering(A, P)

Page 19: Informetric methods seminar

Contents

Network construction
Ranking
Clustering
Topic modeling
Path finding

Page 20: Informetric methods seminar

Matlab Topic Modeling Toolbox

By Mark Steyvers of University of California Irvine
http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm

Input: a bag-of-words representation containing the number of times each word occurs in a document.
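
A bag-of-words representation can be sketched as (document, word, count) triples. The Python snippet below only illustrates the idea: the exact input layout the toolbox expects is not described on this slide, and the whitespace tokenization, lowercasing, and function name are assumptions.

```python
from collections import Counter


def bag_of_words(docs):
    """Return the vocabulary and (doc index, word index, count) triples."""
    vocab = sorted({word for text in docs for word in text.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    triples = []
    for d, text in enumerate(docs):
        counts = Counter(text.lower().split())
        for word in sorted(counts, key=index.get):
            triples.append((d, index[word], counts[word]))
    return vocab, triples


vocab, triples = bag_of_words(["citation network citation", "topic network"])
print(vocab)
print(triples)
```

Word order inside a document is discarded; only the counts are kept, which is all a bag-of-words input carries.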

Page 21: Informetric methods seminar

Contents

Network construction
Ranking
Clustering
Topic modeling
Path finding

Page 22: Informetric methods seminar

Bioinformatics toolbox

http://www.mathworks.com/help/bioinfo/ref/graphshortestpath.html

[dist, path, pred] = graphshortestpath(G,S,T) finds the shortest path from S to T in graph G

[dist] = graphallshortestpaths(G) finds all shortest paths in graph G; dist is a matrix of shortest-path distances between each pair of nodes
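
For readers without the toolbox, the three outputs can be illustrated with a small Dijkstra sketch in plain Python. This is not the toolbox code. It assumes non-negative edge weights and a graph given as a dictionary of neighbor dictionaries; the function name and the toy graph are made up.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (dist, path, pred) for the shortest path from source to target.

    graph[u][v] is the non-negative weight of the edge from u to v.
    pred covers only the nodes reached before the search stopped.
    """
    dist = {source: 0}
    pred = {source: None}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return float("inf"), [], pred  # target cannot be reached
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = pred[node]
    path.reverse()
    return dist[target], path, pred


graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2}, "C": {}}
d, path, pred = shortest_path(graph, "A", "C")
print(d, path)
```

Calling the function for every ordered pair of nodes fills a distance matrix of the kind graphallshortestpaths returns, although a dedicated all-pairs algorithm is faster on large networks.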