Reinforcement Learning based Meta-path Discovery in Large-scale Heterogeneous Information Networks

Guojia Wan,1,2 Bo Du,1,2∗ Shirui Pan,3 Gholamreza Haffari3

1 National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, P. R. China.
2 School of Computer Science, Wuhan University, Wuhan 430072, P. R. China.

3 Faculty of Information Technology, Monash University, Melbourne, Australia.
[email protected], [email protected], [email protected], [email protected]

Abstract

Meta-paths are important tools for a wide variety of data mining and network analysis tasks in Heterogeneous Information Networks (HINs), thanks to their flexibility and interpretability in capturing the complex semantic relations among objects. To date, most HIN analysis still relies on hand-crafted meta-paths, which require rich domain knowledge that is extremely difficult to obtain in complex, large-scale, and schema-rich HINs. In this work, we present a novel framework, Meta-path Discovery with Reinforcement Learning (MPDRL), to identify informative meta-paths from complex and large-scale HINs. To capture different semantic information between objects, we propose a novel multi-hop reasoning strategy in a reinforcement learning framework that aims to infer the next promising relation linking a source entity to a target entity. Moreover, to improve efficiency, we develop an embedded type context representation approach that scales the RL framework to million-scale HINs. As multi-hop reasoning generates rich meta-paths of various lengths, we further perform a meta-path induction step to summarize the important meta-paths using the Lowest Common Ancestor principle. Experimental results on two large-scale HINs, Yago and NELL, validate our approach and demonstrate that our algorithm not only achieves superior performance in the link prediction task, but also identifies useful meta-paths that would have been ignored by human experts.

Introduction

The complex interactions in real-world structured data, such as social networks, biological networks, and knowledge graphs, can be modeled as Heterogeneous Information Networks (HINs) (Sun and Han 2013), where objects and edges are annotated with multiple types.

* Corresponding author. This work was supported in part by the National Natural Science Foundation of China under Grant 61822113, the National Key R&D Program of China under Grant 2018YFA060550, the Natural Science Foundation of Hubei Province under Grant 2018CFA050, and the Science and Technology Major Project of Hubei Province (Next-Generation AI Technologies) under Grant 2019AEA170. G. H. is partly supported by a Future Fellowship from the Australian Research Council (FT190100039). Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: A HIN with multiple types and multiple relations.

Due to their capability to retain the rich and complex inter-dependencies between objects, HINs have recently attracted increasing research attention. However, the heterogeneity and complexity of a HIN also impose significant challenges for relation analysis among objects, particularly in large-scale networks. To cope with these restrictions, the concept of meta-paths was proposed to capture the semantic relations between objects (Sun et al. 2009; 2011).

Figure 1 illustrates example meta-paths between two objects, Barack Obama and the USA. To understand the relation between a person and a country, we can exploit the following meta-paths: a) Person --isPoliticianOf--> Country, b) Person --BornIn--> District --isLocatedIn--> Country, c) Person --GraduatedFrom--> University --isLocatedIn--> Country. Although they differ in length and involve different intermediate objects, these meta-paths all help infer the semantic relation isCitizenOf(Person, Country). Thus, it is easy to deduce from Figure 1 that Barack Obama is a citizen of the USA. As meta-paths carry rich semantic information, they have been widely used in many data mining and network analysis tasks (Shi et al. 2016).

Most existing meta-path studies and meta-path guided research stipulate predefined, enumerable sets of meta-paths, which largely depends on domain experts and is labour-intensive, as finding interesting meta-paths in HINs is very challenging. In general, this can be considered a search problem (Lao and Cohen 2010). More specifically, given a type set T, a relation set R, and a fixed length l, the possible meta-paths lie in a search space of size $|T| \times (|T| \times |R|)^l$. Such a huge space results in combinatorial explosion as |T|, |R|, and l increase. Most existing approaches based on human-defined meta-paths are feasible only on schema-simple HINs, e.g. DBLP (Ley 2002). Once the schema of a HIN is large and complex, it is infeasible to predefine sufficient meta-paths, particularly long ones, degrading performance in many HIN analysis tasks. It is therefore imperative to develop an appropriate strategy for discovering meaningful meta-paths, which remains a challenge in this area.
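As a back-of-the-envelope illustration (not from the paper), the following snippet evaluates this search-space bound for Yago-scale numbers taken from the Datasets section; even short lengths are already far beyond exhaustive enumeration:

```python
# Search-space bound |T| * (|T| * |R|)**l for Yago-scale numbers
# (|T| = 0.8 million types, |R| = 38 relations; see the Datasets section).
T, R = 800_000, 38
for l in range(1, 4):
    print(f"l = {l}: {T * (T * R) ** l:.3e} candidate meta-paths")
```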

Some previous research has aimed to automatically discover meta-paths. Graph traversal methods, such as breadth-first search (BFS) (Kong et al. 2012) and the A* algorithm (Zhu et al. 2018), have been used to find shortest paths in schema-simple HINs, but they struggle to handle complex and large-scale HINs. Meng et al. (2015) proposed a greedy algorithm named FSPG to discover the most relevant meta-paths iteratively. However, FSPG operates in a fully discrete space, which makes it difficult to evaluate and compare similar objects and relations in a HIN.

Recently, multi-hop reasoning (Shen et al. 2018; Xiong, Hoang, and Wang 2017) has emerged as a promising approach for inferring paths linking two objects in a knowledge graph, which can itself be considered a semantically rich HIN. This approach involves sampling the most promising relation (edge) to extend a path from a source node to a target node. However, it has the following limitations: 1) it is not an end-to-end reasoning approach and relies heavily on precomputed entity embeddings, typically learned by a translation-based embedding method (e.g., TransE (Bordes et al. 2013)), so that the reasoning state can be represented in a continuous space and used by a reinforcement learning (RL) agent; running an embedding learning step before reasoning is not only time-consuming but also memory-intensive, hindering the ability to handle million-scale HINs. 2) These algorithms generate only bare paths linking two objects, without meta-path induction (summarization) for downstream tasks such as link prediction. 3) These approaches ignore the type information of an entity, which is very important and can provide rich interpretation of the relationships among objects in HINs.

Based on the above observations, we propose a novel reinforcement learning framework, named Meta-path Discovery with Reinforcement Learning (MPDRL), to automatically mine interesting meta-paths from large-scale HINs. Our goal is to employ a reinforcement learning agent to infer the most meaningful paths from a source object to a target object, and then to perform a meta-path induction step that summarizes meta-paths from the large number of generated paths. Our approach has three desirable properties. First, our reasoning framework requires no pre-training, supervision, or fine-tuning over prior knowledge. Second, the proposed approach has built-in flexibility to consider interpretable meta-paths of varying lengths, which is important for inferring long-range meta-paths. Third, the agent can identify diverse objects by their type context, allowing the system to run successfully on million-scale HINs. We apply our approach to two HINs with complex schemas, Yago and NELL, where the agent's multi-hop reasoning yields rich meta-paths. Moreover, the experimental results on link prediction demonstrate that our approach outperforms the compared approaches.

Our contributions are three-fold:

• We present an RL-based framework, MPDRL (code available at https://github.com/mxz12119/MPDRL), to mine meta-paths from large-scale, complex Heterogeneous Information Networks without human effort.

• We propose an embedded type context representation in our RL approach and design a policy network that memorizes or forgets historical states, allowing our algorithm to easily handle million-scale HINs.

• We conduct link prediction based on the extracted meta-paths on two large-scale and complex HINs, Yago and NELL. Experimental results demonstrate that our algorithm not only reveals synonymous meta-paths, but also outperforms approaches employing meta-paths designed by human experts.

Related work

Meta-path guided approaches. To analyze and perform data mining tasks in HINs, Sun et al. (2009; 2011) proposed the concept of meta-paths to capture semantic information and express the complex relevance of two objects. Subsequently, a number of papers have applied meta-paths to data mining tasks in HINs, such as similarity measurement (Sun et al. 2011; Wang et al. 2016), link prediction (Shi et al. 2014; Cao, Kong, and Philip 2014), and representation learning (Dong, Chawla, and Swami 2017; Cao, Kong, and Philip 2014).

Discovering meta-paths in HINs. Many meta-path guided approaches suffer from a major drawback: they require domain experts to manually predefine a series of meta-paths. Lao and Cohen (2010) proposed a random-walk-based method to discover and leverage meta-paths of labeled relational networks within a fixed length l. However, l is hard to set, as it varies among datasets. A recent study by Meng et al. (2015) developed a greedy algorithm named FSPG to discover the most relevant meta-paths, further developing a GreedyTree data structure to find meta-paths iteratively. Yang et al. (2018) pointed out that the path-finding process is a combinatorial problem, and proposed a similarity measurement model that predefines meta-paths by reinforcement learning; however, this approach is only effective on schema-simple HINs. Shi and Weninger (2014) discussed meta-path discovery at various levels of type granularity in a large complex HIN and proposed a general framework to mine meta-paths from complex HINs using adaptations of classical knowledge discovery techniques.

Multi-hop reasoning in graphs. Many multi-hop reasoning approaches based on random walks (Lao, Mitchell, and Cohen 2011) have been proposed to capture more complex reasoning patterns in a knowledge base. However, the reasoning paths gathered by performing random walks are independent of the types of objects. Recently, deep reinforcement learning has achieved great success in many artificial intelligence problems (Mnih et al. 2015), and it allows a policy function for multi-hop reasoning to be learned from graph-based data. Xiong et al. (2017) investigated multi-hop reasoning with RL on knowledge bases, but also ignored the types of objects. Das et al. (2018) and Shen et al. (2018) further studied reinforcement learning for knowledge base completion.

Definitions and annotations

Definition 1 (Heterogeneous Information Network). A HIN is an information network with multiple types of nodes and edges, defined as a graph G = (V, E). V denotes an object set with a type mapping function φ : V → T, where T denotes a type set. E denotes an edge set with a relation mapping function ϕ : E → R, where R is a relation set. A node denotes an object v ∈ V; an edge describes the relation r ∈ R between two objects.

Example: Figure 1 illustrates a HIN with T = {Person, President, Writer, Activist, University, District, Country} and R = {isPoliticianOf, BornIn, isLocatedIn, GraduatedFrom}.

Definition 2 (Meta-path). Given a HIN G, a meta-path Ω is defined as a sequence of the form ω_1 --r_1--> ω_2 --r_2--> ··· --r_n--> ω_{n+1}, where ω_i ∈ T denotes the type of an object and r_i ∈ R denotes a relation.

Example: From the path instance Barack Obama --isPoliticianOf--> the USA, we can derive the meta-path Person --isPoliticianOf--> Country. A meta-path can measure the closeness between objects and guide similarity computation.

Methodology

Overview of MPDRL. A schematic overview of our proposed approach is presented in Figure 2. MPDRL aims to discover meta-paths from a HIN via multi-hop reasoning between objects; this process consists of two steps.

• Multi-hop reasoning with RL for path instance generation: The RL agent carries out multi-hop reasoning and generates various path instances. The agent starts at the source objects, Obama and Trump. It then observes the current state and decides to move to the next object that has the highest probability of reaching the target object via the policy network. The agent alternates between such observation and movement until it reaches the target object or the maximum length, thus generating a trajectory. The trajectory of an episode is a path instance for the topic isCitizenOf. The reasoning process can be formalized as a Markov Decision Process (MDP), and the agent can be trained by reinforcement learning algorithms.

• Meta-path induction from path instances: We further refine and summarize these path instances by searching for the Lowest Common Ancestor (LCA) in the type directed acyclic graph (DAG), so as to generate various meta-paths.

Table 1: Annotation table

  V    Object set          v    Object
  R    Relation set        r    Relation
  T    Type set            ω    Type
  v_0  Source object       v_d  Target object
  G    HIN                 E    Edge set
  S    State set           s    State
  A    Action set          a    Action
  R    Reward              γ    Reward factor
  π    Policy function     θ    Parameters
  τ    Agent trajectory    h_t  History vector
  z    Update gate         q    Reset gate
  ⊙    Hadamard product    f    Fully-connected layer
  σ    Sigmoid function    G_T  Type DAG
  l    Maximum length      Ω    Meta-path

We denote a vector using a bold letter, e.g. the vector v corresponds to the object v.

Multi-hop Reasoning with RL for Path Instance Generation

Below, we describe our multi-hop reasoning approach with reinforcement learning in detail.

Reinforcement learning architecture in HINs. Reinforcement learning follows a Markov Decision Process (MDP) formulation, in which an agent learns from interactions with an environment, here derived from a HIN, through sequential exploration and exploitation. In a HIN, we formalize RL with the quadruple (S, A, P, R), whose elements are elaborated below.

States. The state s_i at step i is defined as a tuple (v_{i-1}, r_i, v_i, v_d), where v_i ∈ V is the current object, v_{i-1} is the previous object, r_i denotes the relation between v_{i-1} and v_i, and v_d is the target object. s_i ∈ S, where the state space S consists of all valid combinations in V × R × V × V. Given a path-finding pair (v_0, v_d), the starting state is represented as ('ST', 'ST', v_0, v_d), where the indicator 'ST' marks the initial state of the agent. The terminal state is (v_{t-1}, r_t, v_d, v_d). Each state captures the agent's position in the HIN; after taking an action, the agent moves to the next state.

Actions. The action space A_{s_i} for a state s_i is the set of outgoing edges of the current object v_i in the HIN: A_{s_i} = {(r, v) | (v_i, r, v) ∈ G, v ∉ {v_0, v_1, ..., v_d}}. Beginning at the source object v_0, the agent uses the policy network to predict the most promising edge and extends its path at each step until it reaches the target object v_d.

Transition. The transition P is the state transition probability, defined as a mapping P : S × A → S that identifies the probability distribution of the next state. The policy network encodes the current state to output a probability distribution P(s_{i+1} | s_i, a_i), where a_i ∈ A_{s_i}. In our RL framework, the transition strategy selects the action with maximum probability in A_{s_i}.


Figure 2: Overview of MPDRL. MPDRL consists of two steps: 1) multi-hop reasoning with RL for path instance generation in the HIN; 2) meta-path induction from path instances. The left gray box shows the architecture of our policy network.

Rewards. Given a pair (v_0, v_d), if the agent reaches the target object, i.e. v_i = v_d, the agent's trajectory is labeled as a successful finding. The reward for each hop is defined as follows:

$$R(\tau_i) = \begin{cases} 1 \cdot \gamma^{i}, & v_i = v_d \\ 0, & \text{otherwise,} \end{cases} \qquad (1)$$

where γ > 0 is the reward factor and τ_i is the i-th step of the trajectory. The reward factor flexibly controls the trade-off between long-term and short-term reward feedback: if γ < 1, the agent is likely to pick a short path; if γ > 1, the agent prefers a longer path.

It should be noted that positive rewards usually suffer from a sparsity problem, being received only upon reaching the correct target object. To tackle this reward sparsity, we augment the action space with another option, marked 'OP', i.e. A_{s_i} ← A_{s_i} ∪ {'OP'}. 'OP' means the agent gives up on reaching the correct object; it stops and receives a negative reward. This is especially helpful for preventing the agent from getting stuck in intermediate states, thereby accelerating the convergence of training.
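To make the MDP concrete, here is a minimal Python sketch of the state tuple, action space, and per-hop reward described above. The adjacency-dict graph representation is a simplifying assumption, and the zero reward on failure stands in for the negative 'OP' reward:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """s_i = (v_{i-1}, r_i, v_i, v_d); the start state uses the 'ST' marker."""
    prev_obj: str
    rel: str
    cur_obj: str
    target: str

def action_space(graph, state, visited):
    """Outgoing edges of the current object, excluding visited objects,
    plus the 'OP' stop action added to combat reward sparsity.
    `graph` maps object -> [(relation, neighbor)] (an assumed layout)."""
    edges = [(r, v) for r, v in graph.get(state.cur_obj, []) if v not in visited]
    return edges + [("OP", None)]

def hop_reward(i, state, gamma):
    """Eq. (1): gamma**i when the target is reached at step i, else 0
    (the paper additionally penalizes 'OP' with a negative reward)."""
    return gamma ** i if state.cur_obj == state.target else 0.0
```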

Policy network. Due to the large search space in a complex HIN, we design a model-free policy network π(s, A) = P(a|s; θ) based on deep learning to model the RL agent in a continuous space, where θ is the neural network parameter. Considering that the agent needs to make sequential decisions, we introduce a history vector h_t to keep historical information and better guide the agent. Given a trajectory τ at step t, the history vector is determined by the last history h_{t-1} and the last state s_{t-1}, where s_{t-1} = [v_{t-1}; r_{t-1}; v_d] and v, r ∈ R^d:

$$\mathbf{h}_t = H(\mathbf{h}_{t-1}, \mathbf{s}_{t-1}). \qquad (2)$$

Note that Eq. (2) is a recursive formula. To encode the history vector, we introduce a gating mechanism similar to GRU (Cho et al. 2014), shown in Figure 2, to control the memorization or forgetting of historical information. H is defined as follows:

$$\begin{aligned}
\mathbf{z}_t &= f_z([\mathbf{s}_{t-1}; \mathbf{h}_{t-1}]) \\
\mathbf{q}_t &= f_q([\mathbf{s}_{t-1}; \mathbf{h}_{t-1}]) \\
\tilde{\mathbf{h}}_t &= f_h([\mathbf{s}_{t-1}; \mathbf{h}_{t-1} \odot \mathbf{z}_t]) \\
\mathbf{h}_t &= \mathbf{q}_t \odot \tilde{\mathbf{h}}_t + (1 - \mathbf{q}_t) \odot \mathbf{h}_{t-1},
\end{aligned} \qquad (3)$$

where z_t ∈ R^d is the update gate, q_t ∈ R^d is the reset gate, ⊙ is the Hadamard product, h̃_t denotes the hidden layer, [;] denotes the concatenation operation, and f is a fully-connected layer with an activation function. Based on this GRU-like recurrent cell, the history vector is updated according to the agent's movement dynamics. Moreover, unlike the classic GRU cell, which uses h_t to predict y, we find that h̃_t works better in the HIN environment. Therefore, the distribution over y, i.e. all possible actions, is defined as:

$$\mathbf{a} = \mathrm{softmax}(\mathrm{ReLU}(W_u \tilde{\mathbf{h}}_t + W_v \mathbf{s}_{t-1} + \mathbf{b})), \qquad (4)$$

where a ∈ R^{|A|} denotes the probability distribution over all actions. The agent picks the action with the maximum probability and moves to the next state.
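For concreteness, the following is a minimal PyTorch sketch of the gated recurrent policy cell in Eqs. (3)-(4). The fixed action count and exact layer shapes are simplifying assumptions, since in our framework the action set varies per state:

```python
import torch
import torch.nn as nn

class HistoryPolicyCell(nn.Module):
    """GRU-like policy cell: gates the history vector (Eq. 3) and scores
    actions from the candidate history h_tilde and the last state (Eq. 4)."""
    def __init__(self, state_dim, hidden_dim, num_actions):
        super().__init__()
        self.f_z = nn.Linear(state_dim + hidden_dim, hidden_dim)  # update gate
        self.f_q = nn.Linear(state_dim + hidden_dim, hidden_dim)  # reset gate
        self.f_h = nn.Linear(state_dim + hidden_dim, hidden_dim)  # candidate history
        self.W_u = nn.Linear(hidden_dim, num_actions, bias=False)
        self.W_v = nn.Linear(state_dim, num_actions)              # carries the bias b

    def forward(self, s_prev, h_prev):
        x = torch.cat([s_prev, h_prev], dim=-1)
        z = torch.sigmoid(self.f_z(x))
        q = torch.sigmoid(self.f_q(x))
        h_tilde = torch.tanh(self.f_h(torch.cat([s_prev, h_prev * z], dim=-1)))
        h = q * h_tilde + (1 - q) * h_prev                        # Eq. (3)
        logits = torch.relu(self.W_u(h_tilde) + self.W_v(s_prev))  # Eq. (4)
        return torch.softmax(logits, dim=-1), h
```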

Type context learning for object representation. Due to the large scale and complicated semantic context of objects in HINs, it is challenging to model each object in the states. One solution is to use pre-trained embeddings or content information to represent objects and relations; however, most embedding learning approaches in HINs themselves require well-defined meta-paths (Fu, Lee, and Lei 2017). An alternative is to initialize each object and relation with a finite-dimensional vector, but this vastly increases the required memory when the number of objects is million-scale.

Generally, an object is associated with a type set, e.g. Obama: {Writer, President, Activist, Person} and Trump: {President, Businessman, Person}; Obama is a type-sibling of Trump because they share the type President. Type information not only explicitly expresses the context of an object in a HIN, but also reveals the inner relevance among objects. Therefore, learning type context allows the agent to efficiently identify the context of an object in a HIN. Some recent works have incorporated type information into knowledge graphs or information networks and thereby achieved improved experimental results (Xie, Liu, and Sun 2016). Based on this observation, we propose a simple but effective approach to modeling objects in states, which enables our RL approach to deal with large-scale HINs. For an object v with a type set T_v ⊂ T, the type context representation of v is defined as:

$$\mathbf{v} = \frac{1}{|T_v|} \sum_{\omega_i \in T_v} \boldsymbol{\omega}_i, \qquad (5)$$

where ω_i ∈ R^d is the i-th type vector in T_v. To summarize, the position of an object is determined by its type context.
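A minimal PyTorch sketch of Eq. (5): each object is represented as the mean of its type embeddings, so only one vector per type, rather than per object, needs to be stored:

```python
import torch
import torch.nn as nn

class TypeContextEncoder(nn.Module):
    """Type context representation (Eq. 5): an object's vector is the
    average of the embeddings of its types."""
    def __init__(self, num_types, dim):
        super().__init__()
        self.type_emb = nn.Embedding(num_types, dim)

    def forward(self, type_ids):
        # type_ids: 1-D LongTensor of the object's type indices (T_v)
        return self.type_emb(type_ids).mean(dim=0)
```

With 0.8 million types in Yago versus 4 million entities, sharing type vectors in this way substantially shrinks the embedding table relative to per-object embeddings.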

Meta-path Induction from Path Instances

In this section, we discuss how to generate meta-paths from path instances. The proposed RL approach trains an agent capable of multi-hop reasoning to automatically pick path instances between object pairs (v_0, v_d). A path instance, i.e. a trajectory, has the form v_1 --r_1--> v_2 --r_2--> ··· --r_n--> v_{n+1}. Previous works assume that the objects in a HIN have only one type (Shi and Weninger 2014), so meta-paths are generated via simple replacement. In a large-scale HIN, however, an object normally has multiple types, and simple replacement produces a number of low-relevance meta-paths.

Consequently, reducing the type set is necessary for assigning a type to an object. Generally, the type structure in a HIN is organized as a directed acyclic graph (DAG) G_T = (T, E), where T is the set of all types and E denotes the directed links between types. The edges of the DAG point from parents to children, e.g. the type President is a subordinate type of Person. Step 2 in Figure 2 shows a toy example of a type DAG. To assign a type to an object associated with a type set T_v, we choose the Lowest Common Ancestors (LCA) of G_{T_v}. Specifically, we employ the naive LCA algorithm of Bender et al. (2005) to obtain the key types.

The naive LCA algorithm first traverses the DAG in a breadth-first manner and assigns a depth to every node; it then simply walks up the DAG to find the ancestors of the queried nodes, among which it chooses the nodes of greatest depth. The input of the naive LCA is the DAG G_{T_v}, and the output is a set containing several key types. Finally, we identify meta-paths by enumerating all valid combinations of the output types and relations.
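The following Python sketch illustrates this naive LCA step on a type DAG stored as a child-to-parents map (an assumed representation): it assigns depths by BFS from the roots, intersects the queried types' ancestor sets, and keeps the common ancestors of greatest depth:

```python
from collections import deque

def naive_lca(parents, roots, query):
    """Naive LCA on a type DAG (sketch). `parents` maps each type to its
    parent types; `roots` are the top-level types; `query` is the object's
    (non-empty) type set T_v. Returns the deepest common ancestors."""
    # Build a children map and assign BFS depths from the roots.
    children = {}
    for t, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(t)
    depth = {r: 0 for r in roots}
    dq = deque(roots)
    while dq:
        u = dq.popleft()
        for c in children.get(u, []):
            if c not in depth:
                depth[c] = depth[u] + 1
                dq.append(c)

    def ancestors(t):
        # A type together with all of its ancestors in the DAG.
        seen, stack = {t}, [t]
        while stack:
            for p in parents.get(stack.pop(), []):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    common = set.intersection(*(ancestors(t) for t in query))
    if not common:
        return set()
    d = max(depth.get(t, 0) for t in common)
    return {t for t in common if depth.get(t, 0) == d}
```

For a toy DAG in the spirit of Figure 2 (President under Politician under Person; Writer and Activist under Person), querying {President, Writer, Activist} returns {Person}.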

Optimization and training

The objective of the policy network is to maximize the expectation of the long-term accumulated reward:

$$J(\theta) = \mathbb{E}_{\tau \sim p_\theta(\tau)}[R(\tau)], \qquad (6)$$

where τ denotes an N-length trajectory generated from the underlying distribution p_θ(τ), and R(τ) is the reward function for τ. To optimize this objective, we use the policy gradient to maximize J(θ). The gradient of J(θ) is:

$$\frac{\partial J(\theta)}{\partial \theta} = \mathbb{E}_{\tau \sim p_\theta(\tau)}\left[\left(\frac{\partial}{\partial \theta}\log p_\theta(\tau)\right)R(\tau)\right] = \mathbb{E}_{\tau \sim p_\theta(\tau)}\left[\left(\sum_{i=1}^{N}\frac{\partial}{\partial \theta}\log \pi_\theta(a_i \mid s_{i-1}, a_{i-1})\right)R(\tau)\right], \qquad (7)$$

where p_θ(τ) decomposes into the product of π_θ(a_i | s_{i-1}, a_{i-1}). To estimate π_θ, we approximate the gradient by averaging the accumulated rewards over a series of K trajectories generated from the interaction between the agent and the environment, following REINFORCE (Williams 1992):

$$\frac{\partial J(\theta)}{\partial \theta} \approx \frac{1}{K}\sum_{j=1}^{K}\left[\sum_{i=1}^{N_j}\frac{\partial}{\partial \theta}\log \pi_\theta(a_i^j \mid s_{i-1}^j, a_{i-1}^j)\,\gamma^i\right]. \qquad (8)$$

To improve training efficiency, we limit the maximum length of a trajectory to l. When the agent reaches l, the finding process stops and a negative reward is returned. We employ the Adam optimizer (Kingma and Ba 2014) to optimize the policy network. The parameter θ is updated every k episodes.
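A minimal PyTorch sketch of the REINFORCE update in Eq. (8); the trajectory representation, a list of per-step log-probability tensors plus a success flag, is an assumption for illustration:

```python
import torch

def reinforce_update(optimizer, trajectories, gamma):
    """One REINFORCE step (Eq. 8). Each trajectory is a pair
    (log_probs, success): log_probs are the policy's log pi(a_i | ...) tensors
    along the path; success marks whether the target object was reached.
    Step i is weighted by gamma**i, matching the reward in Eq. (1)."""
    terms = []
    for log_probs, success in trajectories:
        if not success:
            continue  # R(tau) = 0: no contribution to the gradient
        for i, lp in enumerate(log_probs, start=1):
            terms.append(-lp * (gamma ** i))  # negated: we maximize J(theta)
    if not terms:
        return  # no successful trajectories in this batch
    loss = torch.stack(terms).sum() / len(trajectories)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```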

Experiments

Experimental settings

To validate the efficiency and effectiveness of our approach MPDRL, we perform link prediction based on the generated meta-paths. Link prediction provides a measurable and objective way of evaluating the discovered meta-paths (Meng et al. 2015).

Datasets. We perform experiments on two online HINs that also serve as knowledge bases, Yago (Suchanek, Kasneci, and Weikum 2007) and NELL (Mitchell et al. 2018), which contain far more complex type and relation mappings than schema-simple HINs.

• Yago is a large-scale knowledge base derived from Wikipedia, WordNet, and GeoNames (Suchanek, Kasneci, and Weikum 2007). We use the 'CORE facts' portion of Yago (https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/), which consists of 12 million facts, 4 million entities, 0.8 million types, and 38 relations.

• NELL is a knowledge base extracted from the Web text of more than one billion documents. We use the 1,115-th dump of NELL (http://rtw.ml.cmu.edu/rtw/), which consists of 2.7 million facts, 2 million entities, 758 types, and 833 relations. We remove triples with the relation generalizations, as this relation describes redundant object type information that is already included.

Baselines: HeteSim (Shi et al. 2014; 2012), FSPG (Meng et al. 2015), PCRW (Lao and Cohen 2010), AutoPath (Yang et al. 2018), DeepWalk (Perozzi, Al-Rfou, and Skiena 2014), Metapath2vec (Dong, Chawla, and Swami 2017), and DeepPath (Xiong, Hoang, and Wang 2017).


Figure 3: ROC curves and AUC for link prediction on six topics: a) isCitizenOf (Yago), b) DiedIn (Yago), c) GraduatedFrom (Yago), d) WorksFor (NELL), e) CompetesWith (NELL), f) PlaysAgainst (NELL).

Note that the embedding-based approaches, i.e. DeepWalk and Metapath2vec, output object vectors. To perform link prediction on a pair, we use the Hadamard product of the two object embeddings as the input to an SVM classifier.

In the RL agent training stage, the key hyper-parameter settings are as follows: the maximum length l is fixed to 5, the learning rate α is 0.005, the reward factor γ is 1.5, the update frequency k is 50, and the vector dimension d is 100.

Link prediction results

Three relations per dataset were evaluated for the link prediction task: isCitizenOf, DiedIn, and GraduatedFrom in Yago, and WorksFor, CompetesWith, and PlaysAgainst in NELL. For a given relation task, e.g. isCitizenOf, we expect the RL agent to produce various meta-paths. The facts with that relation were removed from the HIN, and a sample set of positive and negative pairs was then constructed from the removed facts. Positive pairs were derived directly from the removed facts; each negative pair was generated by replacing the true target object v_d with a fake one v'_d in a pair (v_0, v_d), where v'_d has the same types as v_d. Finally, we adopt a linear regression model (Lao and Cohen 2010) with L1 regularization over binary meta-path features to predict whether the relation, e.g. isCitizenOf(v_α, v_β), exists for a test pair (v_α, v_β). The binary meta-path feature for a meta-path Ω is 1 if Ω connects the pair (v_α, v_β), and 0 otherwise.
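A small sketch of this feature construction and the L1-regularized linear classifier, using scikit-learn's L1-penalized logistic regression as a stand-in for the Lasso-style model; `connects(pair, metapath)` is a hypothetical helper that checks whether at least one path instance between the pair follows the meta-path:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def metapath_features(pairs, metapaths, connects):
    """Binary features: X[i, j] = 1 iff meta-path j connects pair i."""
    X = np.zeros((len(pairs), len(metapaths)))
    for i, pair in enumerate(pairs):
        for j, mp in enumerate(metapaths):
            X[i, j] = 1.0 if connects(pair, mp) else 0.0
    return X

# L1 regularization performs subset selection; the nonzero coefficients
# are the meta-path weights analysed in the next subsection.
clf = LogisticRegression(penalty="l1", solver="liblinear")
# clf.fit(metapath_features(train_pairs, metapaths, connects), labels)
```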

The Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) are presented in Figure 3. As the plots show, the classifiers trained on meta-paths generated by our approach exhibit superior performance on all six relations. Although existing embedding-based methods such as DeepWalk and Metapath2vec have made significant progress in representation learning on HINs, we found that once the HIN becomes complex, heterogeneity causes a performance drop, as evidenced by DeepWalk. The performance of Metapath2vec is likewise hindered by its limited hand-crafted meta-paths. Additionally, when two objects are connected by a longer path, as in the case of GraduatedFrom, such models fare worse. FSPG and AutoPath consider various meta-paths generated by themselves and thus perform better than PCRW, indicating that more effective meta-paths yield better performance. Furthermore, the superior performance achieved by our RL agent demonstrates its ability to generate many effective meta-paths.

Meta-paths Analysis

For each relation, the trained linear regression with L1 regularization, also known as Lasso regression, performs subset selection and returns coefficients for the meta-paths; these coefficients indicate the weight of each meta-path. Accordingly, we present the most relevant meta-paths generated by our RL agent in Tables 2 and 3.


Table 2: Example meta-paths found by our RL model from four topics in Yago and NELL, with their Lasso regression weights.

isCitizenOf (Yago):
  Person --BornIn--> Country                                      0.561
  Person --BornIn--> District --LocatedIn--> Country              0.232
  Person --DiedIn--> Country                                      0.101
  Person --PlaysFor--> Club --LocatedIn--> Country                0.060

GraduatedFrom (Yago):
  Person --WorksAt--> University                                  0.734
  Person --WorksAt--> Institution                                 0.112
  Person --Influences--> Person --WorksAt--> University           0.069
  Person --hasAcademicAdvisor--> Person --WorksAt--> University   0.045
  Person --MarriedTo--> Person --WorksAt--> University            0.035

CompetesWith (NELL):
  Company --HasOffice--> City --HasCompany--> Company             0.632
  Company --HeadQuarteredIn--> City --HasCompany--> Company       0.180
  Company --HasOffice--> City --LocatedIn--> City --HasCompany--> Company   0.041

PlaysAgainst (NELL):
  SportsTeam --CompetesWith--> SportsTeam                         0.604
  SportsTeam --KnownAs--> SportsTeam                              0.118
  SportsTeam --PlaysIn--> SportsLeague --SubpartOf--> SportsTeam  0.080
  SportsTeam --CompetesWith--> SportsTeam --KnownAs--> SportsTeam 0.013

As the results show, our RL agent is able to find various meta-paths in a HIN. Taking the isCitizenOf relation as an example, the meta-paths all clearly indicate a strong relevance between 'Person' objects and 'Country' objects, showing that our approach can automatically find various meta-paths without any hand-crafting.

Interesting meta-paths discovered by our method. Existing hand-crafting methods may fail to induce meta-paths like Person --Influences--> Person --WorksAt--> University, which is a strong indicator that a person graduated from a university. Furthermore, our RL agent also finds long meta-paths, such as the longest CompetesWith meta-path in NELL. Interestingly, we also found that our approach is capable of finding hyponym relations: taking CompetesWith as an example, the relation HasOffice appearing in the meta-path Company --HasOffice--> City is a subordinate relation of HeadQuarteredIn.

Effectiveness

As discussed above, the large search space is the major challenge for reasoning in a complex HIN, and we address it with our reinforcement learning framework. To demonstrate the efficiency of our approach, we report the agent's average reasoning success ratio and average reward within ten episodes after different numbers of training episodes. As Figure 4a shows, with γ = 1.5 the average reasoning success ratio of WorksFor tends to saturate after 300 episodes, and that of isCitizenOf after 500 episodes. These results indicate that our RL agent can learn a reasoning path for a given pair (v_0, v_d); even for objects it has never seen before, it can still find a promising path.

Figure 4: RL model analysis: a) episodes versus success ratio. Raw data are marked by light dots; the curves are obtained by applying an average filter to the corresponding raw data, and dashed lines denote models without the type context learning (TCL) module. b) Average path length versus reward factor.

Type context learning: To understand how the type context regulates our framework, we remove the type context representation module from our framework; the object embeddings are then initialized by a trainable embedding layer. As the green and yellow curves show, the reasoning success rate increases slowly without type context learning, indicating poor convergence during training. This performance gap reveals that type context learning is a key element of the superiority of our approach: by sharing type representations, the object representation can be learned rapidly and effectively.

Reward factor γ: To analyze how the reward factor γ influences our RL agent, we record the average reasoning path length versus the reward factor during inference. As Figure 4b shows, the average reasoning path length rises as the reward factor increases, indicating that γ flexibly controls the agent's exploration preference. While a larger reward factor enables the agent to find longer paths, we also found that training becomes very slow when the reward factor is below 0.8, owing to gradient vanishing caused by the low reward feedback.

Conclusions

In this paper, we propose an RL framework that automatically mines interesting meta-paths from large-scale Heterogeneous Information Networks without any human supervision. Specifically, we exploit type context representation learning to scale reinforcement learning up to million-scale HINs. Unlike previous path-finding models that operate in a discrete space, our approach allows the agent to perform multi-hop reasoning in a continuous space and thereby control the distribution of the found meta-paths, significantly reducing the large search space. The resulting meta-paths can also be used in downstream HIN analysis tasks. We mined meta-paths on two HINs, Yago and NELL, yielding reasonable meta-paths on six topics; these meta-paths were then used in a link prediction task to evaluate our model. The experimental results show that classifiers trained on these meta-paths generally outperform the other baselines.


References

Bender, M. A.; Farach-Colton, M.; Pemmasani, G.; Skiena, S.; and Sumazin, P. 2005. Lowest common ancestors in trees and directed acyclic graphs. Journal of Algorithms 57(2):75-94.

Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In NeurIPS, 2787-2795.

Cao, B.; Kong, X.; and Philip, S. Y. 2014. Collective prediction of multiple types of links in heterogeneous information networks. In ICDM, 50-59. IEEE.

Cho, K.; Van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

Das, R.; Dhuliawala, S.; Zaheer, M.; Vilnis, L.; Durugkar, I.; Krishnamurthy, A.; Smola, A.; and McCallum, A. 2018. Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning. In ICLR.

Dong, Y.; Chawla, N. V.; and Swami, A. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In SIGKDD, 135-144. ACM.

Fu, T.-y.; Lee, W.-C.; and Lei, Z. 2017. HIN2Vec: Explore meta-paths in heterogeneous information networks for representation learning. In CIKM, 1797-1806. ACM.

Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. ICLR.

Kong, X.; Yu, P. S.; Ding, Y.; and Wild, D. J. 2012. Meta path-based collective classification in heterogeneous information networks. In CIKM, 1567-1571. ACM.

Lao, N., and Cohen, W. W. 2010. Relational retrieval using a combination of path-constrained random walks. Machine Learning 81(1):53-67.

Lao, N.; Mitchell, T.; and Cohen, W. W. 2011. Random walk inference and learning in a large scale knowledge base. In EMNLP, 529-539.

Ley, M. 2002. The DBLP computer science bibliography: Evolution, research issues, perspectives. In International Symposium on String Processing and Information Retrieval, 1-10. Springer.

Meng, C.; Cheng, R.; Maniu, S.; Senellart, P.; and Zhang, W. 2015. Discovering meta-paths in large heterogeneous information networks. In WWW, 754-764.

Mitchell, T.; Cohen, W.; Hruschka, E.; Talukdar, P.; Yang, B.; Betteridge, J.; Carlson, A.; Dalvi, B.; Gardner, M.; Kisiel, B.; et al. 2018. Never-ending learning. Communications of the ACM 61(5):103-115.

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529.

Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of social representations. In SIGKDD, 701-710. ACM.

Shen, Y.; Chen, J.; Huang, P.-S.; Guo, Y.; and Gao, J. 2018. M-Walk: Learning to walk over graphs using Monte Carlo tree search. In NeurIPS, 6786-6797.

Shi, B., and Weninger, T. 2014. Mining interesting meta-paths from complex heterogeneous information networks. In ICDM Workshop, 488-495. IEEE.

Shi, C.; Kong, X.; Yu, P. S.; Xie, S.; and Wu, B. 2012. Relevance search in heterogeneous networks. In EDBT, 180-191. ACM.

Shi, C.; Kong, X.; Huang, Y.; Philip, S. Y.; and Wu, B. 2014. HeteSim: A general framework for relevance measure in heterogeneous networks. IEEE Transactions on Knowledge and Data Engineering 26(10):2479-2492.

Shi, C.; Li, Y.; Zhang, J.; Sun, Y.; and Philip, S. Y. 2016. A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering 29(1):17-37.

Suchanek, F. M.; Kasneci, G.; and Weikum, G. 2007. Yago: a core of semantic knowledge. In WWW, 697-706.

Sun, Y., and Han, J. 2013. Mining heterogeneous information networks: a structural analysis approach. ACM SIGKDD Explorations Newsletter 14(2):20-28.

Sun, Y.; Han, J.; Zhao, P.; Yin, Z.; Cheng, H.; and Wu, T. 2009. RankClus: integrating clustering with ranking for heterogeneous information network analysis. In EDBT, 565-576. ACM.

Sun, Y.; Han, J.; Yan, X.; Yu, P. S.; and Wu, T. 2011. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment 4(11):992-1003.

Wang, C.; Sun, Y.; Song, Y.; Han, J.; Song, Y.; Wang, L.; and Zhang, M. 2016. RelSim: relation similarity search in schema-rich heterogeneous information networks. In ICDM, 621-629. SIAM.

Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229-256.

Xie, R.; Liu, Z.; and Sun, M. 2016. Representation learning of knowledge graphs with hierarchical types. In IJCAI, 2965-2971.

Xiong, W.; Hoang, T.; and Wang, W. Y. 2017. DeepPath: A reinforcement learning method for knowledge graph reasoning. In EMNLP, 564-573.

Yang, C.; Liu, M.; He, F.; Zhang, X.; Peng, J.; and Han, J. 2018. Similarity modeling on heterogeneous networks via automatic path discovery. In ECML-PKDD, 37-54. Springer.

Zhu, Z.; Cheng, R.; Do, L.; Huang, Z.; and Zhang, H. 2018. Evaluating top-k meta path queries on large heterogeneous information networks. In ICDM, 1470-1475. IEEE.