
Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

Jooyeon Kim, KAIST, [email protected]

Dongwoo Kim, Australian National University, [email protected]

Alice Oh, KAIST, [email protected]

Abstract

Much of scientific progress stems from previously published findings, but searching through the vast sea of scientific publications is difficult. We often rely on metrics of scholarly authority to find the prominent authors, but these authority indices do not differentiate authority based on research topics. We present Latent Topical-Authority Indexing (LTAI) for jointly modeling the topics, citations, and topical authority in a corpus of academic papers. Compared to previous models, LTAI differs in two main aspects. First, it explicitly models the generative process of the citations, rather than treating the citations as given. Second, it models each author's influence on citations of a paper based on the topics of the cited papers, as well as the citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS, and Citeseer. We compare the performance of LTAI against various baselines, from latent Dirichlet allocation to more advanced models including the author-link topic model and the dynamic author-citation topic model. The results show that LTAI achieves improved accuracy over other similar models when predicting words, citations, and authors of publications.

1 Introduction

With a corpus of scientific literature, we can observe the complex and intricate process of scientific progress. We can learn the major topics in journal articles and conference proceedings, follow authors who are prolific and influential, and find papers that are highly cited.

[Figure 1: Overview of Latent Topical-Authority Indexing (LTAI). Based on content, citation, and authorship information (top: a citation network), the LTAI discovers the topical authority of authors (bottom: inferred authority); an author's authority increases when a paper with certain topics gets cited. In the example, R. Sutton's inferred authority over the topics Reinforcement Learning / Image Processing / Statistical Learning / Information Retrieval is 12.86 / 0.37 / 4.83 / 0.04, while M. Jordan's is 2.76 / 0.14 / 10.29 / 0.10. The topical authority examples are results of the LTAI on the CORA dataset with 100 topics.]

The huge number of publications and authors, however, makes it practically impossible to attain any deep or detailed understanding beyond the very broad trends. For example, if we want to identify authors who are particularly influential in a specific research field, it is difficult to do so without the aid of automatic analysis.

Online publication archives, such as Google Scholar, provide near real-time metrics of scholarly impact, such as the h-index (Hirsch, 2005), the journal impact factor (Garfield, 2006), and citation count.


Transactions of the Association for Computational Linguistics, vol. 5, pp. 191–204, 2017. Action Editor: Noah Smith. Submission batch: 11/2016; Revision batch: 2/2017; Published 7/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.


Those indices, however, are still at a coarse level of granularity. For example, both Michael Jordan and Richard Sutton are researchers with very high citation counts and h-indices, but they are authoritative in different topics: Jordan in the more general machine learning topic of statistical learning, and Sutton in the topic of reinforcement learning. It would be much more helpful to know that via topical authority scores, as shown in Figure 1.

Fortunately, various academic publication archives contain the full contents, references, and meta-data including titles, venues, and authors. With such data, we can build and fit a model to partition researchers' scholarly domains into topics at a much finer grain and discover their academic authority within each topic. To do that, we propose a model named Latent Topical-Authority Indexing (LTAI), based on LDA, to jointly model the topics, authors' topical authority, and citations among the publications.

We illustrate the modeling power of the LTAI with four corpora encompassing a diverse set of academic fields: CORA, Arxiv Physics, PNAS, and Citeseer. To show the improvements over other related models, we carry out prediction tasks on words, citations, and authorship using the LTAI and compare the results with those of latent Dirichlet allocation (Blei et al., 2003), the relational topic model (Chang and Blei, 2010), the author-link topic model, and the dynamic author-cite topic model (Kataria et al., 2011), as well as a simple topical h-index baseline. The results show that the LTAI outperforms these other models for all prediction tasks.

The rest of this paper is organized as follows. In Section 2, we describe related work, including the models that are most similar to the LTAI, and describe how the LTAI fits in and contributes to the field. In Section 3, we describe the LTAI model in detail and present the generative process. In Section 4, we explain the algorithm for approximate inference, and in Section 5, we present a faster algorithm for scalability. In Section 6, we describe the experimental setup, and in Section 7, we present the results to show that the LTAI performs better than other related models for word, citation, and authorship prediction.

2 Related Work

In this section, we review related papers, first in the field of NLP- and ML-based analysis of scientific corpora, then approaches based on Bayesian topic models for academic corpora, and lastly joint models of topics, authors, and citations. In analyzing scientific corpora, previous research includes classifying scientific publications (Caragea et al., 2015), recommending yet unlinked citations (Huang et al., 2015; Neiswanger et al., 2014; Wang et al., 2015; Jiang, 2015), summarizing and extracting key phrases (Cohan and Goharian, 2015; Caragea et al., 2014), triggering a better model fit (He et al., 2015), incorporating authorship information to increase content and link predictability (Sim et al., 2015), estimating a paper's potential influence on the academic community (Dong et al., 2015), and finding and classifying different functionalities of citation practices (Moravcsik and Murugesan, 1975; Teufel et al., 2006; Valenzuela et al., 2015).

Several variants of topic modeling consider the relationship between topics and citations in academic corpora. Topic models that use text and citation networks are divided into two types: (a) models that generate text given citation networks (Dietz et al., 2007; Foulds and Smyth, 2013) and (b) models that generate citation networks given text (Nallapati et al., 2008; Liu et al., 2009; Chang and Blei, 2010). While our model falls into the latter category, we also take into account the influence of the authors on the citation structure.

Most closely related to the LTAI are the citation author topic model (Tu et al., 2010), the author-link topic model, and the dynamic author-cite topic model (Kataria et al., 2011). Similar to the LTAI, they are designed to capture the influence of the authors. However, these models infer authority by referencing only the citing papers' text, whereas our notion of authority is based on predictive modeling that compares both the citing and the cited papers. Furthermore, the LTAI defines a generative model of citations and publications by introducing a latent authority index, whereas the previous models assume the citation structure is given. The LTAI thus explicitly gives a topical authority index, which directly answers the question of which author increases the probability of a paper being cited.


3 Latent Topical-Authority Indexing

The LTAI models the complex relationships among the topics of publications, the topical authority of the authors, and the citations among these publications. The generative process of the LTAI can be divided into two parts: content generation and citation network generation. We make several assumptions in the LTAI to model the citation structures of academic corpora. First, we assume a citation is more likely to occur between two papers that are similar in their topic proportions. Second, we assume that an author differs in their authority (i.e., potential to induce citation) for each topic, and that an author's topical authority positively correlates with the probability of citations among publications. Also, in the LTAI, when there are multiple authors of a single cited publication, their contributions to forming citations with respect to different citing papers vary according to their topical authority. Lastly, we assign different concentration parameters to pairs of papers with and without citations. In this paper, we use positive and negative links to denote pairs of papers with and without citations, respectively.

Figure 2 illustrates the graphical model of the LTAI. We summarize the generative process of the LTAI as follows, where the variables of the model are explained in the remainder of this section:

1. For each topic $k$, draw topic $\beta_k \sim \text{Dir}(\alpha_\beta)$.

2. For each document $i$:
   (a) Draw topic proportion $\theta_i \sim \text{Dir}(\alpha_\theta)$.
   (b) For each word $w_{in}$:
       i. Draw topic assignment $z_{in} \mid \theta_i \sim \text{Mult}(\theta_i)$.
       ii. Draw word $w_{in} \mid z_{in}, \beta_{1:K} \sim \text{Mult}(\beta_{z_{in}})$.

3. For each author $a$ and topic $k$:
   (a) Draw authority index $\eta_{ak} \sim \mathcal{N}(0, \alpha_\eta^{-1})$.

4. For each document pair from $i$ to $j$:
   (a) Draw influence proportion parameter $\pi_{i\leftarrow j} \sim \text{Dir}(\pi_i)$.
   (b) Draw author $a_i \mid \pi_{i\leftarrow j} \sim \text{Mult}(\pi_{i\leftarrow j})$.
   (c) Draw link $x_{i\leftarrow j} \sim \mathcal{N}\big(\sum_k \eta_{a_i k}\,\bar{z}_{ik}\,\bar{z}_{jk},\; c_{i\leftarrow j}^{-1}\big)$.
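To make the generative process concrete, here is a minimal NumPy simulation of steps 1-4. The corpus sizes, hyperparameter values, and the single precision value are toy placeholders of our own, not settings from the paper, and the real-valued link score is drawn with one fixed precision for simplicity (the model itself switches between c+ and c- depending on whether a link is observed).

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, V, A = 5, 20, 50, 8          # topics, documents, vocabulary size, authors (toy sizes)
alpha_beta, alpha_theta, alpha_eta = 0.1, 1.0, 1.0
c = 100.0  # single toy precision; the model uses c+ or c- depending on the observed link

beta = rng.dirichlet(alpha_beta * np.ones(V), size=K)    # step 1: per-topic word distributions
theta = rng.dirichlet(alpha_theta * np.ones(K), size=D)  # step 2(a): topic proportions
eta = rng.normal(0.0, alpha_eta ** -0.5, size=(A, K))    # step 3: topical authority indices

zbar = np.zeros((D, K))
authors = [rng.choice(A, size=2, replace=False) for _ in range(D)]  # toy author sets
for i in range(D):
    z = rng.choice(K, size=30, p=theta[i])               # step 2(b)i: topic assignments
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # step 2(b)ii: words
    zbar[i] = np.bincount(z, minlength=K) / len(z)       # empirical topic proportions

x = np.zeros((D, D))
for i in range(D):
    for j in range(D):
        if i == j:
            continue
        pi = rng.dirichlet(np.ones(len(authors[i])))       # step 4(a): influence proportions
        a = authors[i][rng.choice(len(authors[i]), p=pi)]  # step 4(b): responsible author
        r = np.sum(eta[a] * zbar[i] * zbar[j])             # mean: sum_k eta_ak zbar_ik zbar_jk
        x[i, j] = rng.normal(r, c ** -0.5)                 # step 4(c): real-valued link score
```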

[Figure 2: Graphical representation of the LTAI. The LTAI jointly models the content-related variables θ, z, w, and β, and the author- and citation-related variables η and π.]

3.1 Content Generation

To model the content of publications, we follow the standard document generative process of latent Dirichlet allocation (LDA) (Blei et al., 2003). We inherit the notation for variables from LDA: θ is the per-document topic distribution, β is the per-topic word distribution, z is the topic for each word in a document, where w is the corresponding word, and αθ and αβ are the Dirichlet parameters of θ and β.

3.2 Citation Generation

Let $x_{i\leftarrow j}$ be a binary-valued variable which indicates that publication $j$ cites publication $i$. We formulate a continuous variable $r_{i\leftarrow j}$, a linear combination of the authority variable and the topic proportion variable, to approximate $x_{i\leftarrow j}$ by minimizing the sum of squared errors between the two variables. There is a body of research on using continuous user- and item-related variables to approximate binary variables in the field of recommender systems (Rennie and Srebro, 2005; Koren et al., 2009).

Approximating binary variables using a linear combination of continuous variables can be probabilistically generalized (Salakhutdinov and Mnih, 2007). Using probabilistic matrix factorization, we approximate the probability mass function $p(x_{i\leftarrow j})$ using the probability density function $\mathcal{N}(x_{i\leftarrow j} \mid r_{i\leftarrow j}, c_{i\leftarrow j}^{-1})$, where the precision parameter $c_{i\leftarrow j}$ can be set differently for each pair of papers, as will be discussed below.


Content Similarity Between Publications: In the LTAI, we model relationships between a random pair of documents $i$ and $j$. The probability of publication $j$ citing publication $i$ is proportional to the similarity of the topic proportions of the two publications, i.e., $r_{i\leftarrow j}$ positively correlates with $\sum_k \theta_{ik}\theta_{jk}$. Following the relational topic model's approach (Chang and Blei, 2010), we use $\bar z_i = \frac{1}{N_i}\sum_n z_{i,n} \approx \theta_i$ instead of the topic proportion parameter $\theta_i$.

Topical Authority of Cited Paper: We introduce a $K$-dimensional vector $\eta_a$ representing the topical authority index of author $a$. $\eta_{ak}$ is a real number drawn from a zero-mean normal distribution with variance $\alpha_\eta^{-1}$. Given the authority indices $\eta_{a_i}$ for author $a$ of cited publication $i$, the probability of a citation is further modeled as $r_{i\leftarrow j} = \sum_k \eta_{a_i k}\,\bar z_{ik}\,\bar z_{jk}$, where the authority indices can promote or demote the probability of a citation.

Different Degrees of Contribution among Multiple Authors: Academic publications are often written by more than one author. Thus, we need to distinguish the influence of each author on a citation between two publications. Let $A_i$ be the set of authors of publication $i$. To measure the influence proportion of author $a \in A_i$ on the citation from $j$ to $i$, we introduce an additional parameter $\pi_{i\leftarrow j}$, a one-hot vector drawn from a Dirichlet distribution with $|A_i|$-dimensional parameter $\pi_i$. $\pi_{i\leftarrow j\,a} \in \{0, 1\}$ is the element of $\pi_{i\leftarrow j}$ that measures the influence of author $a$ on the citation from $j$ to $i$, and the elements sum to one ($\sum_{a \in A_i} \pi_{i\leftarrow j\,a} = 1$) over all authors of publication $i$. We approximate the probability of the citation $x_{i\leftarrow j}$ from publication $j$ to publication $i$ by $p(x_{i\leftarrow j} \mid z, \pi_{i\leftarrow j}, a_{i\leftarrow j}, \eta_a) \approx \sum_{a \in A_i} \pi_{i\leftarrow j\,a}\, \mathcal{N}(x_{i\leftarrow j} \mid \sum_k \eta_{ak}\bar z_{ik}\bar z_{jk},\, c_{i\leftarrow j}^{-1})$, which is a mixture of normal distributions with precision parameter $c_{i\leftarrow j}$. Therefore, if the topic distributions of papers $i$ and $j$ are similar and the η values of the cited paper's authors are high, the citation formation probability increases. On the other hand, a dissimilar or topically irrelevant pair of papers with less authoritative authors on the cited paper will be assigned a low probability of citation formation.

Different Treatment between Positive and Negative Links: Citation is a binary problem where $x_{i\leftarrow j}$ is either one or zero. When $x_{i\leftarrow j}$ is zero, this can be interpreted in two ways: 1) the authors of citing publication $j$ are unaware of publication $i$, or 2) publication $j$ is not relevant to publication $i$. Identifying which case is true is impossible unless we are the authors of the publication. Therefore the model embraces this uncertainty in the absence of a link between publications. We control the ambiguity with the precision parameter $c_{i\leftarrow j}$ of the Gaussian distribution as follows:

$$c_{i\leftarrow j} = \begin{cases} c^{+} & \text{if } x_{i\leftarrow j} = 1 \\ c^{-} & \text{if } x_{i\leftarrow j} = 0 \end{cases} \qquad (1)$$

where $c^{+} > c^{-}$ to ensure that we have more confidence in the observed citations. This is an implicit feedback approach that permits using the negative examples ($x_{i\leftarrow j} = 0$) of sparse observations by mitigating their importance (Hu et al., 2008; Wang and Blei, 2011; Purushotham et al., 2012). Setting different values of the precision parameter $c_{i\leftarrow j}$ according to $x_{i\leftarrow j}$ induces cyclic dependencies between the two variables. Due to this cycle, the model is no longer a Bayesian network, or a directed acyclic graph. However, we note that this setting does lead to better experimental results, and we show the pragmatic benefit of the setting in the Evaluation section.
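As a concrete illustration, the sketch below scores a single document pair under the mixture of Gaussians above, switching the precision according to Equation 1. The function and its default c+/c- values are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np
from scipy.stats import norm

def citation_loglik(x_ij, zbar_i, zbar_j, eta_authors, pi_ij, c_pos=100.0, c_neg=1.0):
    """Mixture-of-Gaussians log likelihood of a (possibly absent) citation j -> i.

    eta_authors: (|A_i|, K) authority rows for the cited paper's authors.
    pi_ij:       (|A_i|,)  influence proportions over those authors.
    """
    c = c_pos if x_ij == 1 else c_neg              # Equation 1: confidence in the link
    means = eta_authors @ (zbar_i * zbar_j)        # sum_k eta_ak zbar_ik zbar_jk, per author
    mix = np.sum(pi_ij * norm.pdf(x_ij, loc=means, scale=c ** -0.5))
    return np.log(mix + 1e-300)                    # guard against log(0)
```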

3.3 Joint Modeling of the LTAI

In the LTAI, the topics and the link structures are simultaneously learned, and thus the content-related variables and the citation-related variables mutually reshape one another during posterior inference. On the other hand, if content and citation data were modeled separately, the topics would not reflect any information about the document citation structures. Thus, in the LTAI, documents with shared links are more likely to have similar topic distributions, which leads to a better model fit. We develop and explain this joint inference in Section 4. In Section 7, we illustrate the differences in the word-level predictive power of the LTAI and LDA.

4 Posterior Inference

We develop a hybrid inference algorithm in which the posteriors of the content-related parameters θ, z, and β are approximated by variational inference, and the author-related parameters π and η are approximated by an expectation-maximization (EM) algorithm. In Algorithm 1, we summarize the full inference procedure of the LTAI.


4.1 Content Parameters: Variational Update

Since computing the posterior distribution of the LTAI is intractable, we use variational inference to optimize variational parameters, each of which corresponds to an original content-related variable. Following the standard mean-field variational approach, we define fully factorized variational distributions over the topic-related latent variables, $q(\theta, \beta, z) = \prod_i q(\theta_i \mid \gamma_i) \prod_{n=1}^{N_i} q(z_{in} \mid \phi_{in}) \prod_k q(\beta_k \mid \lambda_k)$, where for each factorized variational distribution we place the same family of distributions as the original distribution. Using the variational distributions, we bound the log-likelihood of the model as follows:

$$\mathcal{L}[q] = \mathbb{E}_q\Big[\sum_k \log p(\beta_k \mid \alpha_\beta) + \sum_i \log p(\theta_i \mid \alpha_\theta) + \sum_i \sum_{n=1}^{N_i} \big(\log p(z_{in} \mid \theta_i) + \log p(w_{in} \mid \beta_{z_{in}})\big) + \sum_{i,j} \log p(x_{i\leftarrow j} \mid z_i, z_j, \pi_i)\Big] - H[q] \qquad (2)$$

where $H[q]$ is the negative entropy of $q$.

Taking the derivatives of this lower bound with respect to each variational parameter, we can obtain the coordinate ascent updates. The updates for the variational Dirichlet parameters $\gamma_i$ and $\lambda_k$ are the same as the standard variational updates for LDA (Blei et al., 2003). The update for the variational multinomial $\phi_{in}$ is:

$$\phi_{ink} \propto \exp\Big\{\sum_j \frac{\partial\,\mathbb{E}_q[\log p(x_{i\leftarrow j} \mid \bar z_i, \bar z_j, \pi_i, \eta)]}{\partial \phi_{ink}} + \sum_j \frac{\partial\,\mathbb{E}_q[\log p(x_{j\leftarrow i} \mid \bar z_j, \bar z_i, \pi_j, \eta)]}{\partial \phi_{ink}} + \mathbb{E}_q[\log \theta_{ik}] + \mathbb{E}_q[\log \beta_{k w_{in}}]\Big\} \qquad (3)$$

where the gradients of the expected log probabilities of both the incoming links $x_{i\leftarrow j}$ and the outgoing links $x_{j\leftarrow i}$ contribute to the variational parameter. The first expectation can be rewritten as

$$\mathbb{E}_q[\log p(x_{i\leftarrow j} \mid \bar z_i, \bar z_j, \pi_i, \eta)] = \mathbb{E}_q\Big[\log \sum_{a \in A_i} p(a_{i\leftarrow j} = a \mid \pi_i)\, p(x_{i\leftarrow j} \mid \bar z_i, \bar z_j, \eta_a)\Big] \geq \sum_{a \in A_i} p(a_{i\leftarrow j} = a \mid \pi_i)\, \mathbb{E}_q[\log p(x_{i\leftarrow j} \mid \bar z_i, \bar z_j, \eta_a)] \qquad (4)$$

Algorithm 1: Posterior inference algorithm for the LTAI

Initialize γ, λ, π, and η randomly
Set a learning-rate parameter ρ_t that satisfies the Robbins-Monro condition
Set subsample sizes S_V, S_E, S_S, and S_A
repeat
    // Variational update: local publication parameters
    S_S ← S_S randomly sampled publications
    for each i in S_S do
        for n = 1 to N_i do
            S_←, S_→ ← sets of S_V random samples
            Update φ_ink using Equations 4, 5, and 9
        end for
        γ_i ← α_θ + Σ_{n=1}^{N_i} φ_in
    end for
    // EM update: local author parameters
    S_A ← S_A randomly sampled authors
    for each a in S_A do
        S_E ← S_E random publication pairs
        Update η_a using Equations 7 and 10
        for i in D_a and j = 1 to D do
            π_{i←j,a} ∝ π_{ia} N(x_{i←j} | z̄_i^⊤ diag(η_a) z̄_j, c_{i←j}^{-1})
        end for
    end for
    // Stochastic variational update: global parameters
    for k = 1 to K do
        λ̂_k ← α_β + (D / S_S) Σ_{d=1}^{S_S} Σ_{n=1}^{N_d} φ_{dn}^k w_{dn}
    end for
    λ^(t) ← (1 − ρ_t) λ^(t−1) + ρ_t λ̂
until convergence criteria are satisfied

where $A_i$ is the set of authors of $i$. We take the lower bound of the expectation using Jensen's inequality. The last term is approximated by a first-order Taylor expansion, $\mathbb{E}_q[\log p(x_{i\leftarrow j} \mid \bar z_i, \bar z_j, \eta_a)] \approx \log \mathcal{N}(x_{i\leftarrow j} \mid \bar\phi_i^\top \mathrm{diag}(\eta_a)\, \bar\phi_j,\, c_{i\leftarrow j}^{-1})$. Finally, the approximated gradient of $\phi_{ink}$ with respect to the incoming directions to document $i$ is

$$\sum_j \frac{\partial\,\mathbb{E}_q[\log p(x_{i\leftarrow j} \mid \bar z_i, \bar z_j, \pi_i, \eta)]}{\partial \phi_{ink}} \approx \sum_j \frac{\bar\phi_{jk}\, c_{i\leftarrow j}}{N_i} \sum_{a \in A_i} \eta_{ak}\,\big(x_{i\leftarrow j} - \bar\phi_i^\top \mathrm{diag}(\eta_a)\, \bar\phi_j\big)\, p(a \mid \pi_i) \qquad (5)$$

where $\mathrm{diag}$ is a diagonalization operator and $\bar\phi_i = \sum_{n=1}^{N_i} \phi_{in} / N_i$. We can compute the gradient with respect to the outgoing directions in the same way.


4.2 Author Parameters: EM Step

We use the EM algorithm to update the author-related parameters π and η based on the lower bound computed by variational inference. In the E step, we compute the probability of each author's contribution to the link between documents $i$ and $j$:

$$\pi_{i\leftarrow j\, a} = \frac{\pi_{ia}\, \mathcal{N}(x_{i\leftarrow j} \mid \bar z_i^\top \mathrm{diag}(\eta_a)\, \bar z_j,\; c_{i\leftarrow j}^{-1})}{\sum_{a' \in A_i} \pi_{ia'}\, \mathcal{N}(x_{i\leftarrow j} \mid \bar z_i^\top \mathrm{diag}(\eta_{a'})\, \bar z_j,\; c_{i\leftarrow j}^{-1})} \qquad (6)$$

In the M step, we optimize the authority parameter η for each author. Given the other estimated parameters, taking the gradient of $\mathcal{L}$ with respect to $\eta_a$ and setting it to zero leads to the following update equation:

$$\eta_a = (\Psi_a^\top C_a \Psi_a + \alpha_\eta I)^{-1}\, \Psi_a^\top C_a X_a \qquad (7)$$

Let $D_a$ be the set of documents written by author $a$ and $D_a(i)$ be the $i$th document written by $a$. Then $\Psi_a$ is a vertical stack of $|D_a|$ matrices $\Psi_{D_a(i)}$, whose $j$th row is $\bar\phi_{D_a(i)} \circ \bar\phi_j$, the Hadamard product between $\bar\phi_{D_a(i)}$ and $\bar\phi_j$. Similarly, $C_a$ is a vertical stack of $|D_a|$ matrices $C_{D_a(i)}$ whose $j$th diagonal element is $c_{D_a(i)\leftarrow j}$, and $X_a$ is a vertical stack of $|D_a|$ vectors $X_{D_a(i)}$ whose $j$th element is $\pi_{D_a(i)\leftarrow j\, a} \times x_{D_a(i)\leftarrow j}$. Finally, we update $\pi_{D_a(i)\, a} = \sum_j \pi_{D_a(i)\leftarrow j\, a} / D$.
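A minimal NumPy sketch of these two updates, assuming $\Psi_a$, the diagonal of $C_a$, and $X_a$ have already been stacked as described above; the function and variable names are our own.

```python
import numpy as np

def e_step(pi_i, zbar_i, zbar_j, eta, x_ij, c_ij):
    """Equation 6: responsibility of each author of i for the link j -> i.

    pi_i: (|A_i|,) prior influence proportions; eta: (|A_i|, K) authority rows.
    """
    means = eta @ (zbar_i * zbar_j)                   # per-author predicted link means
    lik = np.exp(-0.5 * c_ij * (x_ij - means) ** 2)   # unnormalized Gaussian densities
    w = pi_i * lik
    return w / w.sum()

def m_step(Psi_a, C_a_diag, X_a, alpha_eta):
    """Equation 7: ridge-regression-style update of author a's authority vector."""
    K = Psi_a.shape[1]
    A = Psi_a.T @ (C_a_diag[:, None] * Psi_a) + alpha_eta * np.eye(K)
    b = Psi_a.T @ (C_a_diag * X_a)
    return np.linalg.solve(A, b)
```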

5 Faster Inference Using Stochastic Optimization

To model topical authority, the LTAI considers the linkage information. If two papers are linked by citation, the topical authority of the cited paper's authors will increase, while the negative links buffer the potential noise of irrelevant topics. This algorithmic design of the LTAI results in high model complexity. To remedy this issue, we adopt the noisy gradient method from the stochastic approximation algorithm (Robbins and Monro, 1951) to subsample negative links for updating the per-document topic variational parameter φ and the authority parameter η. Prior work using subsampled negative links to reduce computational complexity is introduced in (Raftery et al., 2012). We also elucidate how stochastic variational inference (Hoffman et al., 2013) is applied in our model to update the global per-topic-word variational parameter λ.

5.1 Updating φ and η

Updating $\bar\phi_i$ for document $i$ in the variational update requires iterating over every other document and computing the gradient of the link probability. This leads to a time complexity of $O(DK)$ for every $\bar\phi_i$.

To apply the noisy gradient method, we divide the gradient of the expected log probability of the links into two parts:

$$\sum_j \frac{\partial\,\mathbb{E}_q[\log p(x_{i\leftarrow j} \mid \bar z_i, \bar z_j, \pi_i)]}{\partial \phi_{ink}} = \sum_{j:\, x_{i\leftarrow j}=1} \frac{\partial\,\mathbb{E}_q[\log p(x_{i\leftarrow j})]}{\partial \phi_{ink}} + \sum_{j:\, x_{i\leftarrow j}=0} \frac{\partial\,\mathbb{E}_q[\log p(x_{i\leftarrow j})]}{\partial \phi_{ink}} \qquad (8)$$

where the first and second terms of the right-hand side are the gradient sums over positive links ($x_{i\leftarrow j} = 1$) and negative links ($x_{i\leftarrow j} = 0$), respectively. Compared to the positive links, the number of negative links is on the order of the total number of documents, and thus computing the second term is computationally inefficient. However, in our model, we reduce the importance of the negative links by assigning them a larger variance $c_{i\leftarrow j}^{-1}$ than the positive links, and the empirical mean of $\bar\phi_j$ over negative links follows the Dirichlet expectation due to the large number of negative links. Therefore, we approximate the expectation of the gradient for the negative links using the noisy gradient as follows:

$$\sum_{j:\, x_{i\leftarrow j}=0} \frac{\partial\,\mathbb{E}_q[\log p(x_{i\leftarrow j})]}{\partial \phi_{ink}} \approx \frac{D_i^-}{S_V} \sum_{s \in \mathcal{S}_V} \frac{\partial\,\mathbb{E}_q[\log p(x_{i\leftarrow s})]}{\partial \phi_{ink}} \qquad (9)$$

where $D_i^-$ is the number of negative links (i.e., $x_{i\leftarrow j} = 0$) of document $i$, and $S_V$ is the size of the subsample $\mathcal{S}_V$ for the variational update. We randomly sample $S_V$ documents, compute the gradients on the sampled documents, and then scale the average gradient up to the number of negative links $D_i^-$. This noisy gradient method reduces the update time complexity from $O(DK)$ to $O(S_V K)$.
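The sketch below illustrates the subsampling in Equation 9: the exact sum over all negative links of document i is replaced by a scaled average over S_V sampled documents. Here grad_neg_link is a hypothetical stand-in for the per-pair gradient term of Equation 5.

```python
import numpy as np

def noisy_negative_gradient(i, positive_js, D, S_V, grad_neg_link, rng):
    """Approximate the gradient sum over all negative links of document i.

    grad_neg_link(i, j) -> gradient contribution of the (absent) link j -> i.
    Assumes S_V <= number of negative-link candidates.
    """
    candidates = np.setdiff1d(np.arange(D), np.append(positive_js, i))
    sample = rng.choice(candidates, size=S_V, replace=False)
    D_neg = len(candidates)                       # number of negative links of i
    # Scale the subsample average back up to the full negative-link count.
    return (D_neg / S_V) * sum(grad_neg_link(i, j) for j in sample)
```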

Now, we discuss how to approximate an author's topical authority based on Equation 7. When $K \ll D \times D_a$, the computational bottleneck is $\Psi_a^\top C_a \Psi_a$, which has time complexity $O(D D_a K^2)$. To alleviate this complexity, we once again approximate the large number of negative links using a smaller number of subsamples. Specifically, while keeping the positive-link rows $\Psi_{a+}$ intact, we approximate the negative-link rows in $\Psi_a$ using a smaller matrix $\Psi_{a-}$.


[Figure 3: Training time of the LTAI on the CORA dataset with stochastic and batch variational inference. With stochastic variational inference, the per-word predictive log likelihood converges faster than with batch variational inference.]

Dataset         # Tokens   # Documents   # Authors   Avg C/D   Avg C/A
CORA            17,059     13,147        12,111      3.46      12.17
Arxiv-Physics   49,807     27,770        10,950      12.70     67.93
PNAS            39,664     31,054        9,862       1.57      13.18
Citeseer        21,223     4,255         6,384       1.24      4.38

Table 1: Datasets. From left to right, the columns show the number of word tokens, number of documents, number of authors, average citations per document (Avg C/D), and average citations per author (Avg C/A).

$\Psi_{a-}$ has $S_E$ rows, where $S_E$ is the size of the subsample for the EM step. Using this approximation, we can represent $\Psi_a^\top C_a \Psi_a$ as

$$\Psi_a^\top C_a \Psi_a = c^{+}\, \Psi_{a+}^\top \Psi_{a+} + c^{-} \frac{D_a^-}{S_E}\, \Psi_{a-}^\top \Psi_{a-} \qquad (10)$$

with a time complexity of $O(S_E K^2)$, where $D_a^-$ is the number of rows with negative links in $\Psi_a$. Although we do not present a rigorous analysis of the performance of our model as a function of the subsample size, we confirm that a negative-link subsample size greater than 100 does not degrade the model performance in any of our experiments.

5.2 Updating λ

In traditional coordinate-ascent-based variational inference, the global variational parameter λ is updated infrequently because all the other local parameters φ need to be updated beforehand. This problem is more noticeable in the LTAI, since updating φ using Equation 3 is slower than updating φ in vanilla LDA; moreover, the per-author topical authority variable η is another local variable that the algorithm needs to update beforehand. However, using stochastic variational inference, the global parameters are updated after only a small portion of the local parameters are updated (Hoffman et al., 2013). Applying stochastic variational inference to the LTAI is straightforward: we calculate the intermediate topic-word variational parameter $\hat\lambda_k = \alpha_\beta + \frac{D}{S_S} \sum_{d=1}^{S_S} \sum_{n=1}^{N_d} \phi_{dn}^k w_{dn}$ from the noisy estimate of the natural gradient with respect to the subsampled local parameters, where $N_d$ is the number of words in document $d$ and $S_S$ is the subsample size for the minibatch stochastic variational inference. The final global parameter for the $t$th iteration, $\lambda^{(t)}$, is updated as $(1 - \rho_t)\lambda^{(t-1)} + \rho_t \hat\lambda$, where $\rho_t$ is the learning rate. Posterior inference is guaranteed to converge to a local optimum when the learning rate satisfies the conditions $\sum_{t=1}^{\infty} \rho_t = \infty$ and $\sum_{t=1}^{\infty} \rho_t^2 < \infty$ (Robbins and Monro, 1951). In Figure 3, we confirm that stochastic variational inference is applicable to the LTAI and reduces the training time compared to the batch counterpart, while maintaining similar performance.

6 Experimental Settings

In this section, we introduce the four academic corpora used to fit the LTAI, describe the comparison models, and provide information about the evaluation metric and the parameter settings for the LTAI.¹

6.1 Datasets

We experiment with four academic corpora: CORA (McCallum et al., 2000), Arxiv-Physics (Gehrke et al., 2003), the Proceedings of the National Academy of Sciences (PNAS), and Citeseer (Lu and Getoor, 2003). The CORA, Arxiv-Physics, and PNAS datasets contain abstracts only, and the locations of the citations within each paper are not preserved, whereas the Citeseer dataset contains the citation locations. For CORA, Arxiv-Physics, and PNAS, we lemmatize words, remove stop words, and discard words that occur fewer than four times in the corpus. Table 1 describes the datasets in detail. Note that we obtain citation data from the entire document, not only from the abstract. Also, we consider within-corpus citations only, which leads to fewer than 13 average citations per document for all corpora.

¹ Code and datasets are available at http://uilab.kaist.ac.kr/research/TACL2017/


[Figure 4: Word-level prediction results on (a) CORA, (b) Arxiv-Physics, (c) PNAS, and (d) Citeseer: per-word log predictive probability over iterations for the LTAI, LTAI-10%, LTAI-20%, LTAI-30%, and LDA. As shown in the graphs, our model performs better than LDA.]

6.2 Comparison Models

We compare the predictive performance of the LTAI with five other models. Different comparison models have different degrees of expressive power, and each model conducts a certain type of prediction task; while RTM, ALTM, and DACTM predict citation structures, the topical h-index predicts authorship information. The baseline topic models are implemented based on the inference methods suggested in the corresponding papers; LDA, RTM, and the LTAI variants use variational inference, while ALTM and DACTM use collapsed Gibbs sampling. Finally, all implementation conditions, such as the choice of programming language and modules, except for the parts that convey each model's unique assumptions, are identically set; thus, the performance differences between models are due to their model assumptions and different degrees of data usage, rather than implementation technicalities.

Latent Dirichlet Allocation: LDA (Blei et al., 2003) discovers topics and represents each publication by a mixture of the topics. Compared to the other models, LDA uses only the content information.

LTAI-n%: In LTAI-n%, we remove n% of the actual citations and replace them with arbitrarily selected false connections. Note that the link structures are displaced rather than removed. If the citation links were simply removed, the LTAI and LTAI-n% could not be fairly compared, as the density of the citation structures would be affected and each model would need different concentration values. The performance difference between the LTAI and this variant indicates that, under identical conditions, using the correct linkage information is indeed beneficial for prediction.

LTAI-C: In LTAI-C, the precision parameter $c_{i\leftarrow j}$ has a constant value, rather than being assigned different values according to $x_{i\leftarrow j}$ as discussed in Section 3.

LTAI-SEP: LTAI-SEP has an identical structure to the LTAI, but the topic and authority variables are learned separately. Once the topic variables are learned using vanilla LDA, the authority and citation variables are then inferred consecutively. Thus, the performance edge of the LTAI over LTAI-SEP highlights the necessity of the LTAI's joint modeling, in which the topic- and authority-related variables reshape one another in an iterative fashion.

Relational Topic Model: RTM (Chang and Blei, 2010) jointly models content and citations, and thus the topic proportions of a pair of publications become similar if the pair is connected by a citation. Compared to the LTAI, the author information is not considered, the link structure does not have directionality, and the model does not consider negative links.

Author-Link Topic Model: ALTM (Kataria et al., 2011) is a variation of the author topic model (ATM) (Rosen-Zvi et al., 2004) that models both the topical interests and the influence of authors in scientific corpora. The model uses the content information of citing papers and the names of the cited authors as word tokens. ALTM outputs a per-topic author distribution that functions as an author influence index.

Dynamic Author-Citation Topic Model: DACTM (Kataria et al., 2011) is an extension of ALTM that requires publication corpora which preserve sentence structures. To model author influence, DACTM selectively uses words that are close to the point where the citation appears. Of our corpora, only the Citeseer dataset preserves the sentence structure.


[Figure 5: Citation prediction results on (a) CORA, (b) Arxiv-Physics, (c) PNAS, and (d) Citeseer: MRR over 100 to 400 topics for the LTAI, LTAI-C, LTAI-SEP, RTM, ALTM, and DACTM. The task is to find out which paper is originally linked to a cited paper; we measure Mean Reciprocal Rank (MRR) to evaluate model performance. For all cases, the LTAI performs better than the other methods.]


Topical h-index: To compute the topical h-index, we separate the papers into several clusters using LDA and calculate the h-index within each cluster. The topical h-index is used for author prediction in the same manner as our model, except that the topic proportions are replaced with LDA's results and η is replaced with the topical h-index values.

6.3 Evaluation Metric and Parameter Settings

We use MRR (Voorhees, 1999) to measure the predictive performance of the LTAI and the comparison models. MRR is a widely used metric for evaluating link prediction tasks (Balog and de Rijke, 2007; Diehl et al., 2007; Radlinski et al., 2008; Huang et al., 2015). When the models output the correct answers as ranks, MRR is the inverse of the harmonic mean of those ranks.
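For reference, MRR can be computed directly from the 1-indexed rank of each correct answer:

```python
def mean_reciprocal_rank(ranks):
    """MRR: the mean of reciprocal ranks (equivalently, the inverse of the
    harmonic mean of the ranks). Each rank is the 1-indexed position of the
    correct answer in a model's ranked output."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# e.g., correct answers ranked 1st, 3rd, and 10th:
# mean_reciprocal_rank([1, 3, 10]) == (1 + 1/3 + 1/10) / 3
```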

We report the parameter values used for the evaluations. For all datasets, we set c− to 1. To predict citations, we set c+ to 10,000, 100, 1,000, and 10, and to predict authorship, we set c+ to 1,000, 1,000, 10,000, and 1,000 for the CORA, Arxiv-Physics, PNAS, and Citeseer datasets, respectively. These values are obtained through exhaustive parameter analysis. We set αθ to 1 and αβ to 0.1. We fix the subsample sizes to 500.² For fair comparison, all the parameters that the LTAI and the baseline models share are set to the same values, and the other parameters that belong uniquely to the baseline models are exhaustively tuned as done for the LTAI. Finally, we note that all parameters are tuned using the training set, and the test set is used only for testing.

7 Evaluation

We conduct the evaluation of the LTAI with three quantitative tasks, along with one qualitative analysis. In the first task, we check whether using the citation and authorship information in the LTAI helps increase word-level predictive performance. In the second and third tasks, we measure the ability of the LTAI to predict missing publication-publication links and author-publication links. In these two tasks, we compare the predictive power of the LTAI with the other comparison models, using MRR as the evaluation metric. Finally, we examine famous researchers' topical authority scores generated by the LTAI and investigate how these scores capture notable academic characteristics of the researchers.

7.1 Word-level Prediction

In the LTAI, citation and authorship information affect the per-document topic proportions, as can be confirmed in Equation 3. This joint modeling of content and linkage structure, compared to vanilla LDA which uses content data only, yields better performance in predicting missing words in documents. In this task, we use log predictive probability, a metric widely used in other research for measuring model fitness (Teh et al., 2006; Asuncion et al., 2009; Hoffman et al., 2013).

² Although we do not present a thorough sensitivity analysis in this paper, we confirm that the performance of our model was robust against adjusting the parameters within a factor of 2.


[Figure 6: Author prediction results on (a) CORA, (b) Arxiv-Physics, (c) PNAS, and (d) Citeseer: MRR over 100 to 400 topics for the LTAI, LTAI-C, LTAI-SEP, and the topical h-index. The task is to find out who the author of a cited paper is, given all the citing papers. For all cases, the LTAI performs better than the other methods.]

For each corpus, we separate one third of the documents as the test set. For all documents in each test set, we use half of the words for training the per-document topic proportion θ and predict the probability of word occurrence for the remaining half. Specifically, the predictive probability for a word $w_{new}$ in the test set, given the observed words $w_{obs}$ and the training documents $D_{train}$, is computed as $p(w_{new} \mid D_{train}, w_{obs}) = \sum_{k=1}^{K} \mathbb{E}_q[\theta_k]\, \mathbb{E}_q[\beta_{k, w_{new}}]$.
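A sketch of this held-out scoring rule, assuming the variational expectations $\mathbb{E}_q[\theta]$ and $\mathbb{E}_q[\beta]$ are obtained by normalizing the fitted γ and λ:

```python
import numpy as np

def log_predictive(gamma_d, lam, heldout_words):
    """Per-word log predictive probability for one test document.

    gamma_d:       (K,)   variational Dirichlet parameter of the test document.
    lam:           (K, V) variational parameter of the topic-word distributions.
    heldout_words: (n,)   ids of the held-out words.
    """
    E_theta = gamma_d / gamma_d.sum()                 # E_q[theta_k]
    E_beta = lam / lam.sum(axis=1, keepdims=True)     # E_q[beta_{k,w}]
    probs = E_theta @ E_beta[:, heldout_words]        # sum_k E[theta_k] E[beta_{k,w_new}]
    return np.log(probs).mean()
```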

Figure 4 illustrates the per-word log predictive probability on each corpus. We confirm that with the LTAI, the log predictive probability converges to a higher value than with LDA. Also, when we corrupt the link structure from 10% to 30%, the predictive performance of the LTAI gradually decreases. Thus, the LTAI's superior predictive performance is attributable to its use of correct citations rather than to algorithmic bias.

7.2 Citation Prediction

We evaluate the models' ability to predict which publication originally cites a given publication. Specifically, we randomly remove one citation from each of the documents in the test set. To predict the citation link between publications, we first compute the probability that publication $j$ cites $i$ from $p(x_{i\leftarrow j} \mid z, A_i, \pi_i) \propto \sum_{a \in A_i} \pi_{i\leftarrow j\, a}\, \mathcal{N}(x_{i\leftarrow j} \mid \bar z_i^\top \mathrm{diag}(\eta_a)\, \bar z_j,\; c_{+}^{-1})$. Given the topic proportions of the cited publication $\theta_i$ and the topical authorities of its authors $\eta_a$, we compute which publication is most likely to cite the publication. Based on our model assumptions in Subsection 3.2, using topical authority increases the performance of predicting the linkage structure.
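A sketch of this ranking computation with NumPy; the helper name and the fixed c+ value are our own illustrative assumptions.

```python
import numpy as np

def rank_citing_candidates(i, zbar, eta, authors_i, pi_i, c_pos=100.0):
    """Rank all documents j by the probability that j cites i.

    zbar:      (D, K) empirical topic proportions.
    eta:       (A, K) topical authority vectors.
    authors_i: ids of the authors of cited document i; pi_i: their influence proportions.
    """
    means = eta[authors_i] @ (zbar[i][:, None] * zbar.T)    # (|A_i|, D) per-author means
    dens = np.exp(-0.5 * c_pos * (1.0 - means) ** 2)        # N(x=1 | mean, 1/c+) up to const
    scores = pi_i @ dens                                    # mixture over the authors of i
    scores[i] = -np.inf                                     # a paper does not cite itself
    return np.argsort(-scores)                              # best candidate first
```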

In Figure 5, the LTAI yields better citation prediction performance than the other models for all datasets and numbers of topics. Since the LTAI incorporates topical authority for predicting citations, it performs better than RTM, which does not discover topical authority. We can attribute the better performance of the LTAI compared to ALTM and DACTM to the LTAI's model assumptions explained in Section 3. We note that DACTM requires additional information, such as citation location and sentence structure, and is thus applicable only to limited kinds of datasets.

7.3 Author Prediction

For author prediction, we randomly remove one of the authors from each document in the test set while preserving the citation structures. Similar to citation prediction, we predict which author is most likely to have written the cited publication based on the topic proportions of the cited publication $i$ and the set of citing publications $J$. We approximate the probability of researcher $a$ being an author of publication $i$ by $p(a \mid z, \eta_a, x_{i\leftarrow j}) \propto \prod_{j \in J} \mathcal{N}(x_{i\leftarrow j} \mid \bar z_i^\top \mathrm{diag}(\eta_a)\, \bar z_j,\; c_{+}^{-1})$. Because the mixture proportion of an unknown author, $\pi_{i\leftarrow j\, a}$, cannot be obtained during posterior inference, we assume the cited publication is written by a single author to approximate the probability. For author prediction, we choose the author that maximizes the above probability. In Figure 6, the LTAI outperforms the comparison models in most of the settings.
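Analogously, a sketch of scoring author candidates under the single-author assumption above, working in the log domain for numerical stability; names and the c+ value are illustrative.

```python
import numpy as np

def rank_author_candidates(i, citing_js, zbar, eta, c_pos=100.0):
    """Rank all authors a by prod_{j in J} N(x=1 | zbar_i^T diag(eta_a) zbar_j, 1/c+).

    citing_js: ids of the citing documents J; zbar: (D, K); eta: (A, K).
    """
    means = eta @ (zbar[i][:, None] * zbar[citing_js].T)   # (A, |J|) predicted link means
    loglik = (-0.5 * c_pos * (1.0 - means) ** 2).sum(axis=1)
    return np.argsort(-loglik)                             # most probable author first
```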


Author        h-index   # cite   # paper   Representative Topic                                    T Authority
D Padua       12        291      21        parallel, efficient, computation, runtime               10.36
V Lesser      11        303      48        interaction, intelligent, multiagent, autonomous        11.92
M Lam         11        440      20        memory, processor, cache, synchronization               12.74
M Bellare     11        280      43        scheme, security, signature, attack                     13.21
L Peterson    10        297      24        operating, mechanism, interface, thread                 9.28
D Ferrari     10        377      18        traffic, delay, bandwidth, allocation                   14.16
O Goldreich   9         229      49        proof, known, extended, notion                          12.57
M Jordan      9         263      27        approximation, intelligence, artificial, correlation    10.15
D Culler      9         565      30        operating, mechanism, interface, thread                 12.37
A Pentland    8         207      39        image, motion, visual, estimate                         10.82

Table 2: Authors with the highest h-index scores and their statistics from the CORA dataset. We show each author with their h-index, number of citations (# cite), number of papers (# paper), representative topic, and topical authority (T Authority) for the corresponding topic. While these authors all have high h-indices, with many papers written and many citations earned, the topics in which they exert authority vary.

7.4 Qualitative Analysis

To stress our model’s additional characteristics thatare not observed in the quantitative analysis, we lookat the assigned topical authority indices as well asother statistics of some researchers in the dataset. Inthe analyses, we set the number of topics to 100, anduse CORA dataset for demonstration.

We first demonstrate famous authors' authoritative topics that can be unveiled using our model. In Table 2, we list the top 10 authors with the highest h-indices, along with their numbers of citations, numbers of papers, and their representative topics. An author's representative topic is the topic with their highest authority score. In the table, we observe that all authors with top h-indices have written at least 18 papers and earned at least 207 citations, which are the top 0.8% and 0.2% values, respectively. However, their authoritative topics retrieved by the LTAI do not overlap for any of the authors. This illustrates that each of the top authors in the table exerts authority on a different academic topic that can be captured by the LTAI, even though the authors all have top h-index scores and similar topic-irrelevant statistics.

We now highlight attributes of the topical authority index that differ from other topic-irrelevant statistics. In Tables 3 to 5, we show four example topics extracted by our model and list notable authors within each topic, with their topical authority indices, h-indices, numbers of citations, and numbers of papers. In the tables, we first find that all four authors with the highest topical authority values, Monica Lam, Alex Pentland, Michael Jordan, and Mihir Bellare, are also listed in the topic-irrelevant authority ranking in Table 2. From this, we confirm that the authority score of the LTAI has a certain degree of correlation with other statistics, while it splits the authors by their authoritative topics.

At the same time, the topical authority score correlates less with topic-irrelevant statistics than those statistics correlate with one another. In Table 5, Oded Goldreich has a lower topical authority score for the computer security topic while having higher topic-irrelevant scores than the four researchers above, because his main research field is the theory of computation and randomness. We can also spot authors who exert high authority in multiple academic fields, such as Tomaso Poggio in Tables 3 and 4. Similarly, when comparing Federico Girosi and Tomaso Poggio in Table 4, the two researchers have similar authority indices for this topic, while Tomaso Poggio has higher values for the three topic-irrelevant indices. This is a reasonable outcome when we investigate the two researchers' publication histories. Federico Girosi has a relatively focused academic interest, with his publication history skewed towards machine-learning-related subjects, while Tomaso Poggio has broader topical interests that include computer vision and statistical learning, while also co-authoring most of the papers that Federico Girosi wrote. Thus, Federico Girosi has a similar authority index for this topic but lower authority indices for other topics than Tomaso Poggio.


Topic: image, motion, visual, estimate, robust, shape, scene, geometric

Rank   Author       Topical Authority   h-index   # cite   # paper
1      A Pentland   10.82               8         207      39
2      J Fessler    9.09                6         92       26
3      T Poggio     8.22                6         178      27
4      S Sclaroff   7.61                3         69       11
5      K Toyama     6.65                4         41       10

Table 3: Authors who have a high authority score in a computer vision topic.

Topic: approximation, intelligence, artificial, correlation, support, recognition, model, representation

Rank   Author      Topical Authority   h-index   # cite   # paper
1      M Jordan    10.15               9         263      27
2      M Warmuth   9.57                8         160      17
13     T Poggio    3.48                6         178      27
17     F Girosi    3.22                3         101      9
34     M Jones     2.06                7         151      20

Table 4: Authors who have a high authority score in an artificial intelligence topic.


Also, our model is able to capture topic-specific authoritative researchers who have relatively low topic-irrelevant scores. For example, researchers such as Stan Sclaroff and Kentaro Toyama are among the top 5 authoritative researchers in a computer vision topic according to the LTAI, but it would be difficult to single out these researchers from many other authoritative authors using the topic-irrelevant scores.

Finally, the LTAI detects researchers' topical authority that is peripheral but not negligible. Mark Jones in Table 4, who has a high h-index, a large number of citations, and many papers, is a researcher whose academic interest lies in programming language design and application. However, while the main topics of most of his papers concern programming languages, he often uses inference techniques and algorithms from machine learning in his papers. Our model captures this tendency and assigns him a positive authority score for machine learning.

Topic: scheme, security, signature, attack, threshold, authentication, cryptographic, encryption

Rank   Author        Topical Authority   h-index   # cite   # paper
1      M Bellare     13.21               11        280      43
2      P Rogaway     11.98               7         117      13
3      H Krawczyk    7.29                6         75       15
4      R Canetti     7.13                4         40       10
9      O Goldreich   3.70                9         229      49

Table 5: Authors who have a high authority score in a computer security topic.

8 Conclusion and Discussion

We proposed Latent Topical-Authority Indexing (LTAI) to model the topical authority of academic researchers. Based on the hypothesis that authors play an important role in citations, we specifically focus on their authority and develop a Bayesian model to capture it. Along with the model assumptions that are necessary for extracting convincing and interpretable topical authority values for authors, we proposed speed-up methods based on stochastic optimization.

While there is prior research in topic modeling that provides topic-specific indices when modeling the link structure, these do not extend to individual indices, and most previous citation-based indices are defined for each individual but without considering topics. Our model combines the merits of both topic-specific and individual-specific indices to provide topical authority information for academic researchers.

With four academic datasets, we demonstrated that the joint modeling of publication- and author-related variables improves topic quality compared to vanilla LDA. We quantitatively showed that including the authority variables increases predictive performance in citation and author prediction. Finally, we qualitatively demonstrated the interpretability of the LTAI's topical-authority output on the CORA corpus.

Finally, there are issues that can be addressed in future work. We do not consider temporal information about when papers are published and when pairs of papers are linked; we could use datasets that incorporate timestamps to enhance the model's ability to predict future citations and authorships.


Acknowledgments

We thank Jae Won Kim for collecting and refining the dataset and for contributing to the early version of the manuscript. We also thank Action Editor Noah Smith and the anonymous reviewers for their detailed and thoughtful comments, and we thank Joon Hee Kim and the other UILab members for providing helpful insights into the research as well. Finally, we thank Editorial Assistant Cindy Robinson for carefully proofreading the manuscript. This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. B0101-15-0307, Basic Software Research in Human-level Lifelong Machine Learning (Machine Learning Center)).

References

Arthur Asuncion, Max Welling, Padhraic Smyth, and Yee Whye Teh. 2009. On smoothing and inference for topic models. In UAI.

Krisztian Balog and Maarten de Rijke. 2007. Determining expert profiles (with an application to expert finding). In IJCAI.

David Blei, Andrew Ng, and Michael Jordan. 2003. Latent Dirichlet allocation. JMLR, 3:993–1022.

Cornelia Caragea, Florin Adrian Bulgarov, Andreea Godea, and Sujatha Das Gollapalli. 2014. Citation-enhanced keyphrase extraction from research papers: A supervised approach. In EMNLP.

Cornelia Caragea, Florin Bulgarov, and Rada Mihalcea. 2015. Co-training for topic classification of scholarly data. In EMNLP.

Jonathan Chang and David Blei. 2010. Hierarchical relational models for document networks. The Annals of Applied Statistics, 4(1):124–150.

Arman Cohan and Nazli Goharian. 2015. Scientific article summarization using citation-context and article's discourse structure. In EMNLP.

Christopher Diehl, Galileo Namata, and Lise Getoor. 2007. Relationship identification for social network discovery. In AAAI.

Laura Dietz, Steffen Bickel, and Tobias Scheffer. 2007. Unsupervised prediction of citation influences. In ICML.

Yuxiao Dong, Reid Johnson, and Nitesh Chawla. 2015. Will this paper increase your h-index?: Scientific impact prediction. In WSDM.

James Foulds and Padhraic Smyth. 2013. Modeling scientific impact with topical influence regression. In EMNLP.

Eugene Garfield. 2006. The history and meaning of the journal impact factor. JAMA, 295(1):90–93.

Johannes Gehrke, Paul Ginsparg, and Jon Kleinberg. 2003. Overview of the 2003 KDD Cup. ACM SIGKDD Explorations Newsletter, 5(2):149–151.

Yuan He, Cheng Wang, and Changjun Jiang. 2015. Discovering canonical correlations between topical and topological information in document networks. In CIKM.

Jorge Hirsch. 2005. An index to quantify an individual's scientific research output. PNAS, 102(46):16569–16572.

Matthew Hoffman, David Blei, Chong Wang, and John Paisley. 2013. Stochastic variational inference. JMLR, 14:1303–1347.

Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In ICDM.

Wenyi Huang, Zhaohui Wu, Chen Liang, Prasenjit Mitra, and Giles Lee. 2015. A neural probabilistic model for context based citation recommendation. In AAAI.

Zhuoren Jiang. 2015. Chronological scientific information recommendation via supervised dynamic topic modeling. In WSDM.

Saurabh Kataria, Prasenjit Mitra, Cornelia Caragea, and Giles Lee. 2011. Context sensitive topic models for author influence in document networks. In IJCAI.

Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37.

Yan Liu, Alexandru Niculescu-Mizil, and Wojciech Gryc. 2009. Topic-link LDA: joint models of topic and author community. In ICML.

Qing Lu and Lise Getoor. 2003. Link-based classification. In ICML.

Andrew Kachites McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore. 2000. Automating the construction of internet portals with machine learning. Information Retrieval, 3(2):127–163.

Michael Moravcsik and Poovanalingam Murugesan. 1975. Some results on the function and quality of citations. Social Studies of Science, 5(1):86–92.

Ramesh Nallapati, Amr Ahmed, Eric Xing, and William Cohen. 2008. Joint latent topic models for text and citations. In SIGKDD.

Willie Neiswanger, Chong Wang, Qirong Ho, and Eric P. Xing. 2014. Modeling citation networks using latent random offsets. In UAI.

Sanjay Purushotham, Yan Liu, and C.-C. Jay Kuo. 2012. Collaborative topic regression with social matrix factorization for recommendation systems. In ICML.


Filip Radlinski, Madhu Kurup, and Thorsten Joachims. 2008. How does clickthrough data reflect retrieval quality? In CIKM.

Adrian Raftery, Xiaoyue Niu, Peter Hoff, and Ka Yee Yeung. 2012. Fast inference for the latent space network model using a case-control approximate likelihood. Journal of Computational and Graphical Statistics, 21(4):901–919.

Jasson Rennie and Nathan Srebro. 2005. Fast maximum margin matrix factorization for collaborative prediction. In ICML.

Herbert Robbins and Sutton Monro. 1951. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407.

Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth. 2004. The author-topic model for authors and documents. In UAI.

Ruslan Salakhutdinov and Andriy Mnih. 2007. Probabilistic matrix factorization. In NIPS.

Yanchuan Sim, Bryan Routledge, and Noah Smith. 2015. A utility model of authors in the scientific community. In EMNLP.

Yee Whye Teh, Michael Jordan, Matthew Beal, and David Blei. 2006. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581.

Simone Teufel, Advaith Siddharthan, and Dan Tidhar. 2006. Automatic classification of citation function. In EMNLP.

Yuancheng Tu, Nikhil Johri, Dan Roth, and Julia Hockenmaier. 2010. Citation author topic model in expert search. In COLING.

Marco Valenzuela, Vu Ha, and Oren Etzioni. 2015. Identifying meaningful citations. In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence.

Ellen Voorhees. 1999. The TREC-8 question answering track report. In TREC.

Chong Wang and David Blei. 2011. Collaborative topic modeling for recommending scientific articles. In SIGKDD.

Jingang Wang, Dandan Song, Zhiwei Zhang, Lejian Liao, Luo Si, and Chin-Yew Lin. 2015. LDTM: A latent document type model for cumulative citation recommendation. In EMNLP.
