Small- and large-scale network structure of live fish movements in Scotland


Preventive Veterinary Medicine 91 (2009) 261–269


Darren Michael Green a,*, Alison Gregory b, Lorna Ann Munro b

a Institute of Aquaculture, University of Stirling, Bridge of Allan, Stirling, Stirlingshire FK9 4LA, UK
b Marine Laboratory, PO Box 101, 375 Victoria Road, Aberdeen AB11 9DB, UK

ARTICLE INFO

Article history:

Received 11 February 2009

Received in revised form 28 May 2009

Accepted 29 May 2009

Keywords:

Aquaculture

Network

Graph

Transmission

ABSTRACT

Networks are increasingly being used as an epidemiological tool for studying the potential for disease transmission through animal movements in farming industries. We analysed the network of live fish movements for commercial salmonids in Scotland in 2003. This network was found to have a mixture of features both aiding and hindering disease transmission, hindered by being fragmented, with comparatively low mean number of connections (2.83), and low correlation between inward and outward connections (0.12), with moderate variance in these numbers (coefficients of dispersion of 0.99 and 3.12 for in and out, respectively); but aided by low levels of clustering (0.060) and some non-random mixing (coefficient of assortativity of 0.16). Estimated inter-site basic reproduction number R0 did not exceed 2.4 at high transmission rate. The network was strongly organised into communities, resulting in a high modularity index (0.82). Arc (directed connection) removal indicated that effective surveillance of a small number of connections may facilitate a large reduction in the potential for disease spread within the industry. Useful criteria for identification of these important arcs included degree- and betweenness-based measures that could in future prove useful for prioritising surveillance.

© 2009 Elsevier B.V. All rights reserved.

Contents lists available at ScienceDirect

Preventive Veterinary Medicine

journal homepage: www.elsevier.com/locate/prevetmed

1. Introduction

Scotland is the third largest producer of Atlantic salmon Salmo salar, with an ex-farm value of £400 million per annum (http://www.scottishsalmon.co.uk/economics/economics.asp) and 143 000 tonnes produced in 2007 (http://www.marlab.ac.uk/FRS.Web/Uploads/Documents/Scottish%20Fish%20Farm%20Production%20Survey%202006.pdf). This dominates the other commercially produced species, which include rainbow trout Oncorhynchus mykiss and brown trout Salmo trutta, with some 7000 tonnes of rainbow trout produced in 2006 (ibid.). As with other farming industries, long-distance spread of disease via movements of live or dead fish is of concern. Well boats have been implicated in facilitating the spread of infectious salmon anaemia (ISA) in Scotland (Murray et al., 2002), and road haulage of live rainbow trout for the spread of bacterial kidney disease (BKD) UK wide in 2005 (http://www.marlab.ac.uk/FRS.Web/Uploads/Documents/1407.pdf). Furthermore, there is ongoing concern over the potential for introduction of Gyrodactylus salaris into countries which are currently free from it. Over a variety of industries, similar epidemic problems in the UK and elsewhere have led to the recording of live animal movements for a number of species including large mammals and fish, allowing their study (Thrush and Peeler, 2006; Kao et al., 2007).

* Corresponding author. Tel.: +44 1786 46 7872; fax: +44 (0) 1786 472133.

E-mail address: [email protected] (D.M. Green).

0167-5877/$ – see front matter © 2009 Elsevier B.V. All rights reserved.

doi:10.1016/j.prevetmed.2009.05.031

The epidemiological risk posed by movements between sites lends itself well to a network representation, as has been used for a number of species including pigs (Bigras-Poulin et al., 2007; Ribbens et al., 2008), sheep (Webb, 2005; Kiss et al., 2006), chickens (Truscott et al., 2007; Dent et al., 2008) and cattle (Christley et al., 2005a; Green et al., 2008). Networks are a powerful epidemiological tool allowing one to investigate potential for disease spread, the structure of the industry and its implications for biosecurity, both in terms of the risk of a site being infected, and the risk posed by a site, should it become infected. A network consists of nodes – here, the epidemiological unit, or site – connected by bidirectional edges or unidirectional arcs – here representing potentially infectious contact between sites due to movement. Potentially infectious contact does not imply infection, but it is nevertheless a prerequisite for it. Sites may be sources of infection (posing a risk of onward spread), sinks for infection (at risk from disease spread), or both. Links between sites differ in their contribution to potential epidemics, in a manner that is not easy to predict by examining the behaviour of individual nodes, but only by examining the network as a whole.

Fish farms in Scotland are required by legislation (The Registration of Fish Farming and Shellfish Farming Businesses Order 1985) to be registered, keep records of all live fish movements, and submit these records to the Fish Health Inspectorate. Furthermore, under the new EU directive 2006/88/EC (implemented from August 2008), all EU member states must implement risk-based surveillance for diseases of aquatic animals. Live fish movements are an important focus for such surveillance. Here, we analyse earlier data with a view to informing surveillance strategy applicable to data collected in future. Focused selection of surveillance targets will improve efficiency in terms of cost-benefit ratio, important where resources are limited (Stark et al., 2006).

Previous authors have applied different criteria to contact networks in an attempt to identify high-risk edges or nodes, whose removal reduces the potential for the network to support an epidemic (Albert et al., 2000; Kiss et al., 2006). In terms of a network representation, node or edge removal is equivalent to removal of the risk of potentially infectious contact either from a node (node removal) or between two nodes (edge removal), and does not in any way imply that a node or edge itself has ceased to exist. For example, effective immunisation of a farm could be interpreted as node removal, whereas pre-movement testing for bovine tuberculosis as has been implemented for cattle in GB could be considered edge removal. Identifying high-risk nodes has frequently been considered as a problem of finding network nodes with high centrality, for which various metrics exist (Bell et al., 1999; Bonacich and Lloyd, 2001; Christley et al., 2005b; Zemljic and Hlebec, 2005). Broadly, centrality measures can be fitted into three categories: those dependent on the properties of individual nodes; those dependent upon paths to, from, or through a particular edge or node; and those based upon eigen decomposition of a network adjacency matrix, explained further below.

In terms of the aquaculture industry, edge (specifically directed ‘arc’) removal rather than node removal can be considered more appropriate: concentrated surveillance, such that particular links between sites are without epidemiological risk, is equivalent to removing that edge from the network. Node removal is equivalent to removing a whole site from the epidemiological network, and is both less achievable and less appropriate. This paper investigates methodologies for identifying both high-risk sites, and high-risk interactions between sites – here, movements – for commercial salmonids. A previous study has described the internal movement structure of one of the major salmon companies in addition to the combined trout (rainbow and brown) movement record for 2003 (Munro and Gregory, 2009). However, our study here presents the first mathematical study of the whole of the Scottish salmonid industry using the records of live fish movements.

2. Methods

2.1. Data

The data source was the official fish movement records for Scotland 2003, held by the Fish Health Inspectorate at Marine Scotland, Aberdeen, which is the agency of the Scottish Government responsible for regulating and enforcing legislation concerning the aquaculture industry in Scotland. More recent data are not yet available. These data comprise validated movements of live fish, from egg to adult, between registered fish farm sites within Scotland, considered validated where the paper records of the off and on movement were legible and could be matched. Movements to fisheries (predominantly freshwater) are not included as they are not registered under current legislation. Neither are imports or exports outwith Scotland (including to or from England and Wales) included in the data. The paper records were transferred to an electronic database recording source and destination sites for each of 3696 movements of Atlantic salmon, rainbow trout and brown trout amongst n = 422 sites.

2.2. Network properties

All network and arc removal algorithms were programmed using C++. The network is represented by the adjacency matrix $A$ where $A_{ij} = 1$ indicates at least one (potentially several) directed movement from node (site) $i$ to node $j$ ($1 \le i,j \le n$) occurred, and zero, no connection. Rare self-loops (movements from site $i$ back to site $i$) were removed. Each node is described by its in and out degrees $k_i^{\mathrm{out}} = \sum_j A_{ij}$ and $k_i^{\mathrm{in}} = \sum_j A_{ji}$, and the undirected degree $k_i^{\mathrm{undir}} = k_i^{\mathrm{out}} + k_i^{\mathrm{in}} - \sum_j A_{ij}A_{ji}$, giving a total of $M = \sum_{ij} A_{ij}$ arcs (directed connections) in the network. The network can also be characterised by the matrix of minimum path lengths $L_{ij}$, the minimum number of steps along arcs required to move from node $i$ to node $j$ (infinite if no possible path exists; $L_{ii} = 0$), and the arc ‘betweenness’ $B_{ij}$, defined as the number of shortest paths amongst all pairs of nodes that pass through arc $i \to j$ (or zero where no arc exists). A network with low mean shortest path length, as is found with ‘small world’-type networks (Watts and Strogatz, 1998), will be subject to rapid epidemic spread compared to networks with longer mean shortest path length, holding all other network properties constant. The distribution of path lengths can be usefully compared with that of ‘rewired’ networks where higher-scale structure is removed. Rerouting of pairs of arcs of the form $A \to B$ and $C \to D$ to give arcs $A \to D$ and $B \to C$ was performed for 10 replicate networks, following the procedure of Kiss et al. (2006).
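As an illustration of the definitions above, the following Python sketch computes the in, out and undirected degrees, the arc count M, and the minimum path lengths by breadth-first search. It is not the authors' C++ implementation; the toy adjacency matrix and the function names `degrees` and `path_lengths` are assumptions made for this example.

```python
from collections import deque

def degrees(A):
    """Return (k_out, k_in, k_undir, M) for a 0/1 adjacency matrix A."""
    n = len(A)
    k_out = [sum(A[i][j] for j in range(n)) for i in range(n)]
    k_in = [sum(A[j][i] for j in range(n)) for i in range(n)]
    # A bidirectional pair i<->j counts as one undirected neighbour, not two.
    k_undir = [k_out[i] + k_in[i] - sum(A[i][j] * A[j][i] for j in range(n))
               for i in range(n)]
    return k_out, k_in, k_undir, sum(k_out)

def path_lengths(A):
    """L[i][j] = minimum number of arcs from i to j (inf if no path; L[i][i] = 0)."""
    n = len(A)
    inf = float("inf")
    L = [[inf] * n for _ in range(n)]
    for s in range(n):
        L[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if A[u][v] and L[s][v] == inf:
                    L[s][v] = L[s][u] + 1
                    queue.append(v)
    return L

# Hypothetical toy network with arcs 0->1, 1->0, 1->2, 2->3
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
k_out, k_in, k_undir, M = degrees(A)
L = path_lengths(A)
n = len(A)
finite = [L[i][j] for i in range(n) for j in range(n)
          if i != j and L[i][j] != float("inf")]
mean_path = sum(finite) / len(finite)    # mean over existing paths only
frac_paths = len(finite) / (n * (n - 1))  # share of the n(n - 1) potential paths
```

The last two quantities correspond to the mean shortest path length "where not zero or infinite" and the proportion of potential paths that exist.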

The clustering coefficient $C$ defines the degree to which ‘any friend of yours is a friend of mine’. For a directed network, a simple and epidemiologically appropriate definition is that of the proportion of ‘triples’ – three distinct, ordered nodes $U$, $V$, and $W$ with directed arcs $U \to V$ and $U \to W$ – which are also ‘triangles’, with an additional connection $V \to W$: i.e. $C = \sum_{uvw} A_{uv}A_{uw}A_{vw} / \sum_{uvw} A_{uv}A_{uw}$. Networks with higher clustering, all other things being equal, are more resilient to epidemic spread (Keeling, 1999), and this definition of $C$ is compatible with various contact models of disease transmission.
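The triangle-to-triple ratio just defined can be sketched in a few lines of Python. This is an illustrative reading of the formula, assuming the sums run over three distinct nodes; the function name and the toy network are hypothetical.

```python
def directed_clustering(A):
    """Proportion of ordered triples (u->v, u->w) that also have the arc v->w."""
    n = len(A)
    triples = 0
    triangles = 0
    for u in range(n):
        for v in range(n):
            if v == u or not A[u][v]:
                continue
            for w in range(n):
                if w == u or w == v or not A[u][w]:
                    continue
                triples += 1          # u->v and u->w both exist
                triangles += A[v][w]  # closed by v->w
    return triangles / triples if triples else 0.0

# Hypothetical toy network: node 0 sends to 1, 2 and 3, and 1 sends to 2.
A = [[0, 1, 1, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
C = directed_clustering(A)
```

Here node 0 generates six ordered triples, of which only (0, 1, 2) is closed, so C is one sixth.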

The coefficient of assortativity of a network measures the extent to which edges join nodes of similar degree (Newman, 2003). For directed networks, a set of potential correlation measures exist of the form:

$$
r(k_i^X, k_j^Y \mid i \to j) = \frac{M \sum_{i \to j} k_i^X k_j^Y - \left(\sum_{i \to j} k_i^X\right)\left(\sum_{i \to j} k_j^Y\right)}{\sqrt{\left[M \sum_{i \to j} (k_i^X)^2 - \left(\sum_{i \to j} k_i^X\right)^2\right]\left[M \sum_{i \to j} (k_j^Y)^2 - \left(\sum_{i \to j} k_j^Y\right)^2\right]}}
$$

where $X$ and $Y$ indicate either in degree or out degree and $\sum_{i \to j} x$ stands for $\sum_{ij} A_{ij} x$, iterating over all edges. Potentially of most epidemiological interest is that where $X$ is in degree and $Y$ is out degree. In this case, assortativity means a large number of arcs come from nodes with high in degree, and go to nodes with high out degree, potentially aiding the spread of disease. Assortativity implies a higher between-node basic reproduction number $R_0$ (the number of secondary cases produced by a typical primary case in a fully susceptible network); a negative value, disassortativity, is likewise associated with lower $R_0$.
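The correlation above is a Pearson correlation taken over arcs, between a degree of the source node and a degree of the destination node. A minimal Python sketch follows, assuming the second factor of the denominator uses the destination-node degree; the function name, its `X`/`Y` string arguments and the toy chain network are hypothetical.

```python
from math import sqrt

def arc_degree_correlation(A, X="in", Y="out"):
    """Correlation over arcs i->j between degree X of i and degree Y of j."""
    n = len(A)
    k = {"out": [sum(A[i]) for i in range(n)],
         "in": [sum(A[j][i] for j in range(n)) for i in range(n)]}
    arcs = [(i, j) for i in range(n) for j in range(n) if A[i][j]]
    M = len(arcs)
    sx = sum(k[X][i] for i, j in arcs)
    sy = sum(k[Y][j] for i, j in arcs)
    sxy = sum(k[X][i] * k[Y][j] for i, j in arcs)
    sxx = sum(k[X][i] ** 2 for i, j in arcs)
    syy = sum(k[Y][j] ** 2 for i, j in arcs)
    denom = sqrt((M * sxx - sx ** 2) * (M * syy - sy ** 2))
    # The correlation is undefined when either degree is constant over arcs.
    return (M * sxy - sx * sy) / denom if denom else float("nan")

# Hypothetical toy network: the directed chain 0->1->2->3
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
r = arc_degree_correlation(A, X="in", Y="out")
```

For this chain the source in degrees over the three arcs are 0, 1, 1 and the destination out degrees are 1, 1, 0, so the sketch returns a negative (disassortative) value.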

The above metrics all describe the network at the level of the individual node or interactions between small groups of nodes. Of equal interest is the broad-scale structure of the network at the highest level. Such structure can be expressed in terms of network communities, which can be defined as groups of nodes such that there are more interactions within groups than can be expected by chance, and therefore less interaction between groups. Similar to long path lengths, organisation into communities tends to slow the progress of epidemic spread.

Where each network node belongs to one community and there is no restriction on how many nodes may belong to each community, the number of possible arrangements of $n$ nodes within communities is given by the Bell number $B_n$ (Bell, 1934), which rises faster than exponentially with increasing $n$, combinations outnumbering the atoms in the universe for quite modest $n$. An exhaustive search is therefore not possible. The community structure algorithm we used was based on measures of ‘modularity’, using the ‘greedy’ algorithm introduced by Newman (2004), amended to account for the strongly directed nature of the network as discussed by Kao et al. (2006) and Leicht and Newman (2008). A measure of community fit is given by

$$
Q = \frac{1}{M} \sum_{ij} \left( A_{ij} - \gamma \frac{k_i^{\mathrm{out}} k_j^{\mathrm{in}}}{M} \right) [c_i = c_j], \qquad 0 \le Q \le 1
$$

where $c_i$ is the community label for node $i$, $[x = y]$ returns unity where $x = y$ and zero otherwise, and $\gamma$ is a constant, implicitly equal to unity in Leicht and Newman (2008) (see Kumpula et al., 2008). Higher $Q$ indicates a larger fraction of arcs within communities. The algorithm proceeded by first assigning each node a unique community label $1, \ldots, n$. Then, each possible merger of two communities was considered, with the merger that resulted in the greatest increase in $Q$ (or smallest decrease) accepted and nodes of both communities assigned the same unique label. This process was repeated until only a single community remained, with $Q$ reaching a maximum at some intermediate point.
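The directed modularity and the merge procedure described above can be sketched as follows. This is a deliberately naive Python illustration that recomputes Q from scratch for every candidate merger, so it is only practical for small networks; it is not the authors' C++ code, and the function names, the `gamma` default of 1 and the toy network are assumptions.

```python
def modularity(A, labels, gamma=1.0):
    """Directed modularity Q for a community label per node."""
    n = len(A)
    k_out = [sum(A[i]) for i in range(n)]
    k_in = [sum(A[j][i] for j in range(n)) for i in range(n)]
    M = sum(k_out)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - gamma * k_out[i] * k_in[j] / M
    return q / M

def greedy_communities(A, gamma=1.0):
    """Merge communities pairwise, always taking the merger with the highest Q,
    and return the best (Q, labels) seen before a single community remains."""
    labels = list(range(len(A)))          # every node starts in its own community
    best_q, best_labels = modularity(A, labels, gamma), labels[:]
    while len(set(labels)) > 1:
        comms = sorted(set(labels))
        candidates = []
        for idx, a in enumerate(comms):
            for b in comms[idx + 1:]:
                trial = [a if c == b else c for c in labels]
                candidates.append((modularity(A, trial, gamma), trial))
        q, labels = max(candidates, key=lambda t: t[0])
        if q > best_q:
            best_q, best_labels = q, labels[:]
    return best_q, best_labels

# Hypothetical toy network: two disconnected directed pairs, 0->1 and 2->3
A = [[0, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
best_q, best_labels = greedy_communities(A)
```

On this toy network the best partition groups each connected pair into its own community, and merging everything into one community brings Q back to zero.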

2.3. Arc removal

To determine the susceptibility of a network to an (unknown) epidemic, the size of the giant strongly connected component (GSCC), or simulation modelling is frequently used (e.g. Kiss et al., 2006; Thrush and Peeler, 2006). The GSCC represents the largest set of nodes such that any two nodes can be connected by directed paths (a strong component, SC). With the highly directed structure of the fish network, strong components become small and less useful. Instead, here, the epidemiological risk posed by a node is defined in terms of its ‘reach’, i.e. the number of other nodes that can be arrived at from node i by following directed paths, in terms of the maximum reach for any node, and the mean reach across all nodes. This is related to the chains of infection discussed by Dube et al. (2008).
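To make the distinction between reach and strong components concrete, here is a small Python sketch that computes each node's reach by breadth-first search, along with the size of the largest strong component obtained from mutual reachability. The function names and the toy network are hypothetical, and the quadratic strong-component check is chosen for clarity rather than speed.

```python
from collections import deque

def reachable_set(A, start):
    """Set of other nodes that can be arrived at from start along directed arcs."""
    n = len(A)
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if A[u][v] and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen - {start}

def reach_statistics(A):
    """Return (maximum reach, mean reach, size of the largest strong component)."""
    n = len(A)
    sets = [reachable_set(A, i) for i in range(n)]
    reaches = [len(s) for s in sets]
    # Node j shares a strong component with i when each can reach the other.
    largest_sc = max(1 + sum(1 for j in sets[i] if i in sets[j])
                     for i in range(n))
    return max(reaches), sum(reaches) / n, largest_sc

# Hypothetical toy network: 0->1, 1->2, 2->1, and an isolated node 3
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
max_reach, mean_reach, largest_sc = reach_statistics(A)
```

Node 0 has the largest reach here although it belongs to no strong component larger than itself, which mirrors the point made above about strongly directed networks.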

Of interest is the resilience of the network to the removal of small numbers of arcs, corresponding to the concentration of surveillance effort onto particular movements, resulting in the potentially infectious contact associated with these movements becoming negligible. The extent to which networks are thus disrupted depends greatly on the choice of arc or edge to be removed (Kiss et al., 2006), implying that different criteria for targeting surveillance will vary considerably in their efficiencies. The effect of choosing different surveillance strategies was explored by sequentially removing one arc at a time according to one of a set of criteria and reevaluating the properties of the pruned network, in particular, reachability. Arc removal proceeded until no arcs remained. These criteria ranged from simple and easily implemented to more complex methods requiring computer time to solve, with reevaluation necessary after each arc removal. These are described below in brief. Where the criterion statistics were tied between arcs, the removed arc was chosen at random. Due to such stochastic components to the algorithms, each algorithm was repeated 80 times and the means of network properties are reported below. The different strategies are not intended to represent different on-site procedures, but different methods of selecting which sites or movements to concentrate these procedures upon.

Arbitrary: Arcs were chosen at random – the control.

Inter-community: Arcs connecting nodes in different communities were weighted higher than those within communities, but within these two sets, chosen arbitrarily.

Degree: A simple degree-based measure was tested, denoted for an arc $i \to j$ by $d_{ij} = k_i^{\mathrm{in}} k_j^{\mathrm{out}}$. Arcs with high $d$ produce a two-node ‘unit’ whose combined contribution to potential epidemic spread is large. Networks with more arcs of higher $d$ than expected by chance are assortative according to the measure of assortativity introduced above.

Greedy: ‘Greedy’ algorithms are those which always make the locally optimal choice, which in some cases is sufficient to find the globally optimal solution (Cormen et al., 2001). For either maximum reach (greedy max) or mean reach (greedy mean), at each step, the arc is chosen that would cause the greatest reduction in the selected measure, once removed. All arcs are thus examined at each step.

Betweenness: Edge betweenness $B_{ij}$ was calculated for each edge and the arc with the highest value selected. This is closely related to node betweenness, a frequently used measure of centrality.

Eigenvector: Two measures of eigenvector-type centrality were implemented (eigen spread and eigen walk) as described in the following section, with out-arcs chosen at random from the node with the highest such centrality.

The greedy, betweenness and eigen methods all require recomputation of the relevant statistics after each step.
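As an illustration of the sequential arc removal experiment, the Python sketch below applies the degree criterion and records mean reach after each removal. It assumes the degree scores are computed once on the intact network, which is one reading of the statement that only the greedy, betweenness and eigen methods need recomputation. The function names, the seeded random generator used for tie-breaking, and the toy arc list are hypothetical.

```python
import random
from collections import deque

def mean_reach(arcs, n):
    """Mean number of other nodes reachable along directed paths."""
    out = [[] for _ in range(n)]
    for i, j in arcs:
        out[i].append(j)
    total = 0
    for s in range(n):
        seen, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in out[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        total += len(seen) - 1
    return total / n

def remove_by_degree(arcs, n, rng):
    """Remove arcs in decreasing order of d_ij = k_in[i] * k_out[j]."""
    k_in, k_out = [0] * n, [0] * n
    for i, j in arcs:
        k_out[i] += 1
        k_in[j] += 1
    # A random second sort key breaks ties between equal scores at random.
    keyed = [(-k_in[i] * k_out[j], rng.random(), (i, j)) for i, j in arcs]
    order = [arc for _, _, arc in sorted(keyed)]
    remaining = set(arcs)
    history = [mean_reach(remaining, n)]   # value before any removal
    for arc in order:
        remaining.remove(arc)
        history.append(mean_reach(remaining, n))
    return order, history

# Hypothetical toy network: two sources feed node 2, which feeds a distributor 3.
arcs = [(0, 2), (1, 2), (2, 3), (3, 4), (3, 5)]
order, history = remove_by_degree(arcs, 6, random.Random(1))
```

In this toy case the arc 2->3 is the only one with a non-zero score, so it is removed first and mean reach drops from 13/6 to 4/6 in a single step. Averaging `history` over many seeds would correspond to the repeated runs described above.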

2.4. Network eigen analysis

Given an adjacency matrix $A$, eigen decomposition can provide us with a dominant eigenvalue $\lambda$ and corresponding eigenvector $V$. The eigenvector has one entry per node and gives an estimate of the centrality of each node – a measure of the relative risk of incidence across nodes for a disease that is relatively rare. The dominant eigenvalue is related to $R_0$ given a per-link transmissibility $\tau$, with $R_0 \le \tau\lambda$ (Diekmann and Heesterbeek, 2000; Kiss et al., 2008). Inter-site $R_0$ is therefore limited by $\lambda$, subject to disease-specific parameters encapsulated by $\tau$ which we do not attempt to parameterise here. Because the analysis does not consider the infection state of an infected node’s neighbours, this estimate does not account for network clustering and is an overestimate where many bidirectional edges are present.

Bonacich and Lloyd (2001) note features of network construction – frequently encountered in the live fish movement network – where standard eigenvector centrality approaches fail. We follow a similar approach to what they recommend by applying eigen analysis to two modified adjacency matrices $A$, using a simple power iteration method. This modification assumes that a small amount of additional contact $\beta = 1/2$ occurs outside of the documented connections.

In the first case (eigen spread; $A^{\mathrm{spread}}$) all arcs were weighted equally. Alternatively, we consider a simple random walk (eigen walk) by assuming that outward contact is weighted such that the total outward contact from any node is constant ($A^{\mathrm{walk}}$):

$$
A_{ij}^{\mathrm{spread}} = \frac{\beta}{n} + A_{ij}, \qquad A_{ij}^{\mathrm{walk}} = \frac{\beta/n + A_{ij}}{\beta + k_i^{\mathrm{out}}}.
$$

The eigenvector $V$ is then obtained simply by iterating the expression $V^{s+1} = A V^{s}$ until convergence, normalising after each step, starting with $V_i^0 = 1/n$. The dominant eigenvalue $\lambda$ is then obtained by solving the equation $\lambda V = A V$. For $A^{\mathrm{spread}}$, this eigenvalue provides an upper estimate for $R_0$.
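The two modified matrices and the power iteration can be sketched as follows. This is a plain-Python illustration rather than the authors' implementation. It assumes the vector is normalised to sum to one after each step, in which case the normalising factor itself converges to the dominant eigenvalue. The function names, tolerance and toy networks are hypothetical.

```python
def spread_matrix(A, beta=0.5):
    """A_spread: every entry receives a small extra contact beta / n."""
    n = len(A)
    return [[beta / n + A[i][j] for j in range(n)] for i in range(n)]

def walk_matrix(A, beta=0.5):
    """A_walk: rows rescaled so that total outward contact is one for each node."""
    n = len(A)
    k_out = [sum(row) for row in A]
    return [[(beta / n + A[i][j]) / (beta + k_out[i]) for j in range(n)]
            for i in range(n)]

def power_iteration(B, tol=1e-12, max_iter=100000):
    """Iterate V <- B V from V_i = 1/n; return (dominant eigenvalue, eigenvector)."""
    n = len(B)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)                  # v sums to one, so sum(B v) estimates lambda
        w = [x / lam for x in w]      # normalise after each step
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    return lam, v

# Hypothetical toy networks
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]      # 0->1->2
complete = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # all arcs present
lam_chain, v_chain = power_iteration(spread_matrix(chain))
lam_complete, v_complete = power_iteration(spread_matrix(complete))
walk_row_sums = [sum(row) for row in walk_matrix(chain)]
```

Because every entry of the spread matrix is positive, the iteration has a unique positive limit even for fragmented or acyclic networks such as the chain, which is the practical purpose of adding the extra contact.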

3. Results

3.1. Small- and large-scale network structure

The network of n = 422 nodes is shown in Fig. 1, indicating site type and the direction of movements. Betweenness is indicated by line width, demonstrating site-to-site links that are important connections between parts of the network. Visible are a large number of disconnected pairs and small groups of nodes, and more tightly connected clusters of nodes.

Examination of the degree distribution confirms the mostly directed nature of the network: mean node degrees (and coefficient of dispersion, i.e. the variance to mean ratio) were $\langle k^{\mathrm{in}} \rangle = 1.48$ (0.99), $\langle k^{\mathrm{out}} \rangle = 1.48$ (3.12), and $\langle k^{\mathrm{undir}} \rangle = 2.83$ (2.16) for an undirected network (Fig. 2). Only 8.7% of site–site connections were bidirectional. Correlation between the in and out degree of nodes was weakly positive: $r(k_i^{\mathrm{in}}, k_i^{\mathrm{out}}) = 0.12$. Mean shortest path length (where not zero or infinite) was 4.4 (Fig. 2), with 2.6% of potential paths existing. This compares with the rewired networks, with a mean shortest path length of 7.1 and 3.2% of potential paths existing. This reflects the low mean degree and the fragmented nature of the network seen in Fig. 1. There was relatively little difference in degree amongst sites moving different species, but a tendency for lower out degree for site types further down the production chain, as might be expected (Table A1 in electronic supplementary material), with notable net flow from freshwater to seawater sites (Figure A1 in electronic supplementary material).

Fig. 1. The 2003 Scottish live fish movement network, with sites coded according to species moved between sites: salmon (S); rainbow trout (R); S + R; brown trout (T); T + R; T + S; T + S + R; self-loops only. The direction of arcs is indicated by arrows and shading (darker half-arc for source), and their relative betweenness (log_e scale) is indicated by line width.

For the movement network, $r(k_i^{\mathrm{in}}, k_j^{\mathrm{out}} \mid i \to j) = 0.16$, indicating a small degree of assortative mixing. The clustering coefficient was also low, with a ratio of triangles to triples of $C = 0.060$. The network therefore has a mixture of properties that both encourage epidemic spread (assortativity, overdispersion in degree) and discourage spread (clustering at the node and community level, low in–out degree correlation).

Fig. 2. Descriptive statistics for the movement network. (a) Histograms of in and out degree distribution (left axis) and of path length distribution (right axis). Number of paths of length $l$ is expressed as a proportion of the potential total $n(n - 1)$. (b) Histogram of node reach.

Fig. 3. Community assignment for the live fish movement network for Scotland in 2003. Community membership is indicated by different symbols.

Eigen analysis provided an estimate for $R_0$ of $2.4\tau$ for $\beta = 1/2$. This is sensitive to the value of $\beta$ chosen, reaching asymptotes of $R_0 \approx \beta\tau$ for large $\beta$ and $R_0 = 2.2\tau$ as $\beta$ approaches zero. This is higher than the traditional estimate of $R_0$ taken from degree statistics, of $(\langle k^{\mathrm{in}} k^{\mathrm{out}} \rangle / \langle k^{\mathrm{in}} \rangle)\tau = 1.7\tau$. Relatively little difference was found between site types in the associated eigenvectors, unless very small values for $\beta$ were used (Table A1 in electronic supplementary material).
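The degree-based estimate quoted above depends only on the in and out degree of each node, and can be sketched in Python as follows. The function name, the default transmissibility of 1 and the toy networks are hypothetical; the movement data themselves are not reproduced here.

```python
def r0_from_degrees(A, tau=1.0):
    """Degree-based estimate (<k_in * k_out> / <k_in>) * tau."""
    n = len(A)
    k_out = [sum(A[i]) for i in range(n)]
    k_in = [sum(A[j][i] for j in range(n)) for i in range(n)]
    mean_in = sum(k_in) / n
    mean_product = sum(a * b for a, b in zip(k_in, k_out)) / n
    return tau * mean_product / mean_in

# Hypothetical toy networks
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]            # 0->1->2->0
hub = [[0, 1, 1, 0], [0, 0, 0, 0],
       [0, 0, 0, 0], [1, 0, 0, 0]]                   # 3->0, 0->1, 0->2
r0_cycle = r0_from_degrees(cycle)
r0_hub = r0_from_degrees(hub)
```

The estimate rewards nodes that both receive and send movements: in the hub example only node 0 has non-zero in and out degree, so it alone contributes to the numerator.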

The best-fit community assignment is shown in Fig. 3, corresponding to a high modularity index of Q = 0.82. The dendrogram showing community structure above and below this level of joining is shown in Fig. 4. Interpretation of the dendrogram requires care: two structural features indicate lack of identifiable sub-structure in the data (identified by symbols in Fig. 4). First, the algorithm joins the isolated pairs of nodes into a binary tree visible on the dendrogram in an arbitrary way. Second, many communities are constructed by sequentially adding on single nodes forming a ‘plume’-like pattern also seen by Newman (2004). However, elsewhere in the tree, communities close together can be seen that are also noticeably close together in the network picture (Fig. 3).

3.2. Reducing network reach

The mean number of nodes reachable by following directed paths from another node was 12 (2.8% of the network); however, node reach was highly variable, with many nodes being sinks only, and had a maximum of 130 nodes (31%) (Fig. 2b). This contrasts with the smaller GSCC of 17 (and next-largest SCC of 9), again reflecting the strongly directed nature of the network. The eight algorithms presented above vary in their efficiency in reducing network reach (Fig. 5). For maximum reach, the corresponding greedy method is initially the most effective, but later performs less well compared with the betweenness method. The two eigenvalue-based methods perform almost as poorly as arbitrary arc removal, and less well than the simple degree-based or inter-community approaches. In reducing mean reach, ordering of effectiveness of the eight methods is somewhat different: the corresponding greedy algorithm is consistently the most effective, but equally effective is betweenness. The R0-like eigenvalue measure and the simple degree-based measure are moderately effective.

4. Discussion

The data presented above include only movement records between registered sites confirmed by validation of the paper record for both sender and receiver. In addition, circa 500 unconfirmed records were excluded where paper records did not match, as well as a further 4100 or so from or to unregistered sites (Munro and Gregory, 2009). The majority of these latter records are to unregistered fisheries, mostly untraceable, which are sinks for movements only. Epidemiologically, these can be considered ‘dead ends’, at least via movements of live fish, though potentially not through their impact on the aquatic environment. Their exclusion from the dataset will not therefore affect the ‘core’ of the network. Furthermore, though these movements are large in number, they are of smaller numbers of fish; they are also predominantly of trout (both species). Under EU directive 2006/88/EC, fisheries are required from August 2008 both to be registered, and to record live fish movements, making these data available in future.

Fig. 4. Dendrogram for the community algorithm. Each branch represents a group of nodes that are merged by the algorithm into the same community before they are merged into another such group, reading top-down. Best-fit communities are as shown in Fig. 3. Symbols indicate ‘plume’ (€) and ‘binary branching’ (|) structures described in the text.

A large number of sites are linked to only one other site. These sites likely do interact more within the network, but do not do so within the single year (2003) of data examined in this study. In particular, the natural time-frame for salmon production is the two-year cycle from egg to smolt to adult. Future extension to further years of data will show a more tightly connected network, at least for salmon. In datasets of longer timespan, modelling not only direction of movements, but also their relative timings will become increasingly important. For example, if movements exist from A to B and B to C, then C is only at risk of infection from A should the movement A to B occur earlier. This added complexity is difficult to incorporate into a network approach, but manageable through simulation models, as have been developed for sheep and cattle (Green et al., 2006, 2008). Considering networks as static or undirected can lead to overestimation of the numbers of nodes that are at risk of infection, given a disease incursion, as examined for cattle movement networks by Dube et al. (2008) and Vernon and Keeling (2009).
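The A, B, C example above can be made concrete with a short Python sketch of time-respecting reach. It is purely illustrative and is not the simulation model cited: movements are hypothetical `(day, source, destination)` tuples, the seed site is assumed infectious from the start, every movement from an infected site is assumed to transmit, and same-day movements are processed in sorted order.

```python
def temporal_reach(movements, seed):
    """Sites that can be infected from seed when movement order is respected."""
    infected = {seed}
    for day, src, dst in sorted(movements):
        if src in infected:        # only movements leaving an infected site matter
            infected.add(dst)
    return infected - {seed}

# Hypothetical movement records
a_to_b_first = [(1, "A", "B"), (2, "B", "C")]
b_to_c_first = [(1, "B", "C"), (2, "A", "B")]
reach_in_order = temporal_reach(a_to_b_first, "A")
reach_out_of_order = temporal_reach(b_to_c_first, "A")
```

A static network built from either list contains the path A to B to C, but only the first ordering places C at risk, which is the overestimation described above.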

The most salient feature of the network is its strong community structure. This partly reflects low interaction between different species, but also within the salmon industry there are loosely connected portions of the network and disconnected fragments. If this pattern persists over multiple years, it provides benefits to surveillance and disease control, limiting the extent of potential epidemic spread through movements. Future work will investigate the structure of other routes for disease transmission that are not present in these data, such as shared use of well boats or presence in the same river catchment.
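As an illustration of how strength of community structure can be quantified, the sketch below computes a modularity index for a given partition of a directed network. It assumes the directed modularity of Leicht and Newman (2008), Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j); whether this matches the exact index reported for the fish network is an assumption. The function name and the toy network are hypothetical.

```python
from collections import defaultdict


def directed_modularity(arcs, community):
    """Directed modularity Q of a partition (assumed Leicht-Newman form).

    arcs: iterable of unique (sender, receiver) pairs.
    community: dict mapping every node to a community label.
    """
    arcs = list(arcs)
    m = len(arcs)
    out_total = defaultdict(int)  # summed out-degree per community
    in_total = defaultdict(int)   # summed in-degree per community
    within = 0                    # arcs with both ends in one community
    for sender, receiver in arcs:
        out_total[community[sender]] += 1
        in_total[community[receiver]] += 1
        if community[sender] == community[receiver]:
            within += 1
    # Expected number of within-community arcs under random rewiring
    expected = sum(out_total[c] * in_total[c] for c in out_total) / m
    return (within - expected) / m


# Toy network: two fully connected groups of three sites joined by one arc.
left, right = ["A", "B", "C"], ["D", "E", "F"]
toy_arcs = [(u, v) for group in (left, right)
            for u in group for v in group if u != v]
toy_arcs.append(("C", "D"))

two_groups = {n: "left" for n in left}
two_groups.update({n: "right" for n in right})
one_group = {n: "all" for n in left + right}

q_two = directed_modularity(toy_arcs, two_groups)
q_one = directed_modularity(toy_arcs, one_group)
```

Placing all sites in one community gives Q = 0, while the two-group partition gives a clearly positive value, reflecting that almost all arcs fall within groups.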

Arc removal demonstrates that the removal of a small number of arcs can produce a disproportionately large reduction in the size of any potential epidemic, if the correct arcs are chosen. This was also demonstrated for the GB network of sheep movements by Kiss et al. (2006). For our network, the betweenness measure was found to work well as a strategy for selecting arcs, whereas eigenvalue measures performed relatively poorly. This is in contrast to the results of Bell et al. (1999), who found betweenness measures to perform worse than eigenvector centrality, albeit for identifying vulnerable nodes, rather than edge removal.
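The effect of removing a single well-chosen arc can be sketched on a toy network. The example assumes that arc betweenness counts shortest directed paths through each arc (computed here with a Brandes-style accumulation) and that reach is the number of other nodes reachable from a node. These definitions, the function names, and the network are assumptions for illustration and may differ from those used in the analysis above.

```python
from collections import defaultdict, deque


def arc_betweenness(arcs):
    """Shortest-path betweenness of each arc in an unweighted directed graph."""
    out = defaultdict(list)
    nodes = set()
    for u, v in arcs:
        out[u].append(v)
        nodes.update((u, v))
    score = {arc: 0.0 for arc in arcs}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma)
        dist, sigma = {s: 0}, defaultdict(float)
        sigma[s] = 1.0
        preds, order, queue = defaultdict(list), [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in out[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate path dependencies from the farthest nodes back to s
        delta = defaultdict(float)
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[(u, v)] += share
                delta[u] += share
    return score


def mean_reach(arcs, nodes):
    """Mean number of other nodes reachable from each node."""
    out = defaultdict(list)
    for u, v in arcs:
        out[u].append(v)
    total = 0
    for s in nodes:
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in out[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        total += len(seen) - 1
    return total / len(nodes)


# Toy network: two fully connected groups joined by the single arc C -> D.
left, right = ["A", "B", "C"], ["D", "E", "F"]
nodes = left + right
arcs = {(u, v) for group in (left, right)
        for u in group for v in group if u != v}
arcs.add(("C", "D"))

scores = arc_betweenness(arcs)
top_arc = max(scores, key=scores.get)
reach_before = mean_reach(arcs, nodes)
reach_after = mean_reach(arcs - {top_arc}, nodes)
```

Here the bridging arc has the highest betweenness, and removing that one arc out of thirteen lowers mean reach from 3.5 to 2.0.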

Both Bell et al. (1999) and Christley et al. (2005b) found degree-based measures of vulnerability to perform well, and our degree-based measure for arc removal also


Fig. 5. Mean and maximum reach from all nodes, versus proportion of network arcs removed (plotted on a square-root scale), for eight different algorithms for determining precedence of arc removal. Arrows indicate x-axis values corresponding to the removal of 1, 2, 3, and 4 arcs.


performed acceptably. Zemljic and Hlebec (2005) note that centrality measures differ in their robustness, and are more reliable for dense networks. That the greedy algorithms are not optimal is unsurprising since such approaches often do not obtain a global optimum (see Figure A2 in electronic supplementary material).

The two estimates for R0 above differ, with the eigenvalue-based estimate higher than that based on node degree. Though the low correlation between in and out degree is accounted for by both estimates, the models differ in which other network features they capture. First, clustering decreases R0 (Keeling, 1999) and is accounted for by neither estimate; but levels of clustering here are small. Second, node degree-based measures do not take into account assortativity which increases R0 (Newman, 2003; Kiss et al., 2008), and the fish network is slightly assortative. Third, the eigenvalue-based measure is less appropriate where the network is deeply cleft into distinct components. In extremis, where a network is completely divided into unlinked subnetworks, eigenvalue R0 would represent only that subnetwork with the highest intrinsic R0, whereas the degree-based measure would average across the entire network, giving a lower value as is seen here. Given the regional and sector differences in different farming industries, one must therefore be careful when estimating R0, and avoid applying single-figure values as descriptors of processes which are too complex to allow them.
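The in extremis case can be sketched numerically. The example assumes a degree-based estimate of the form T⟨k_in k_out⟩/⟨k⟩ and an eigenvalue-based estimate of T times the dominant eigenvalue of the adjacency matrix, with T a per-arc transmission probability. These forms are common for directed networks but are assumptions here and may differ in detail from the estimators used above. The function names and toy network are hypothetical.

```python
def degree_r0(arcs, nodes, transmission=1.0):
    """Degree-based estimate: T * <k_in * k_out> / <k> (assumed form)."""
    k_in = {n: 0 for n in nodes}
    k_out = {n: 0 for n in nodes}
    for u, v in arcs:
        k_out[u] += 1
        k_in[v] += 1
    mean_product = sum(k_in[n] * k_out[n] for n in nodes) / len(nodes)
    mean_degree = len(arcs) / len(nodes)
    return transmission * mean_product / mean_degree


def eigen_r0(arcs, nodes, transmission=1.0, iterations=200):
    """Eigenvalue-based estimate: T * dominant eigenvalue of the adjacency matrix.

    Power iteration is run on A + I (the shift avoids oscillation on
    cyclic structures), and 1 is subtracted from the result.
    """
    x = {n: 1.0 for n in nodes}
    lam = 1.0
    for _ in range(iterations):
        y = dict(x)                # identity term
        for u, v in arcs:
            y[u] += x[v]           # adjacency term, A[u][v] = 1
        lam = max(y.values())      # at least 1, so division is safe
        x = {n: val / lam for n, val in y.items()}
    return transmission * (lam - 1.0)


# Toy network of two unlinked subnetworks: a fully connected group of four
# sites and a one-way ring of six sites.
dense = ["a", "b", "c", "d"]
ring = ["p", "q", "r", "s", "t", "u"]
dense_arcs = [(i, j) for i in dense for j in dense if i != j]
ring_arcs = [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]

r0_eigen = eigen_r0(dense_arcs + ring_arcs, dense + ring)
r0_degree = degree_r0(dense_arcs + ring_arcs, dense + ring)
r0_dense_only = degree_r0(dense_arcs, dense)
```

With T = 1 the eigenvalue-based value is 3, equal to that of the dense subnetwork alone, while the degree-based value averaged over both subnetworks is 7/3, illustrating why the former exceeds the latter in a fragmented network.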

In summary, though the live fish movement network is small compared with those of other industries, it is nevertheless demonstrable that application of network-based statistical methods is more informative than simply examining the behaviour of nodes as individuals. Inter-site links identified as important through the arc removal procedures above might be considered as a particular focus for targeted surveillance, giving a more efficient use of limited surveillance resources. For this, however, up-to-date, accurate data must be available, and specialised software is required. Additionally, one must consider disease-specific factors such as the timescale of disease, the relative size of different nodes in terms of the number of animals stocked, and other transmission routes: water-borne, fomite, or airborne transmission will require the inclusion of other inter-site contact structures, crossing the boundaries between network modelling and metapopulation-based approaches.

Acknowledgements

We thank the Fish Health Inspectorate for providing access to the movement records. DMG’s contribution to this work was funded through Marine Scotland. We thank Istvan Kiss for helpful comment on the manuscript.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.prevetmed.2009.05.031.

References

Albert, R., Jeong, H., Barabasi, A.-L., 2000. Error and attack tolerance of complex networks. Nature 406, 378–382.

Bell, C.B., Atkinson, J.S., Carlson, J.W., 1999. Centrality measures for disease transmission networks. Soc. Netw. 21, 1–21.

Bell, E.T., 1934. Exponential numbers. Am. Math. Monthly 41, 411–419.

Bigras-Poulin, M., Barfod, K., Mortensen, S., Greiner, M., 2007. Relationship of trade patterns of the Danish swine industry animal movements network to potential disease spread. Prev. Vet. Med. 80, 143–165.

Bonacich, P., Lloyd, P., 2001. Eigenvector-like measures of centrality for asymmetric relations. Soc. Netw. 23, 191–201.

Christley, R.M., Robinson, S.E., Lysons, R., French, N.P., 2005a. Network analysis of cattle movement in Great Britain. In: Mellor, D.J., Russell, A.M., Wood, J.L.N. (Eds.), Proc. Soc. for Veterinary Epidemiology and Preventive Medicine, Nairn, Scotland, March 30–April 1, pp. 234–244.

Christley, R.M., Pinchbeck, G.L., Bowers, R.G., Clancy, D., French, N.P., Bennett, R., Turner, J., 2005b. Infection in social networks: using network analysis to identify high-risk individuals. Am. J. Epidemiol. 162, 1024–1031.

Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., 2001. Introduction to Algorithms, 2nd edition. MIT Press (Chapter 16).

Dent, J.E., Kao, R.R., Kiss, I.Z., Hyder, K., Arnold, M., 2008. Contact structures in the poultry industry in Great Britain: exploring transmission routes for a potential avian influenza virus epidemic. BMC Vet. Res. 4, 27.

Diekmann, O., Heesterbeek, J.A.P., 2000. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation. Wiley, Chichester, UK.


Dube, C., Ribble, C., Kelton, D., McNab, B., 2008. Comparing network analysis measures to determine potential epidemic size of highly contagious exotic diseases in fragmented monthly networks of dairy cattle movements in Ontario, Canada. Transbound. Emerg. Dis. 55, 382–392.

Green, D.M., Kiss, I.Z., Kao, R.R., 2006. Modelling the initial spread of foot and mouth disease through animal movements. Proc. R. Soc. B 273, 2729–2735.

Green, D.M., Kiss, I.Z., Mitchell, A.P., Kao, R.R., 2008. Estimates for local and movement-based transmission of bovine tuberculosis in British cattle. Proc. R. Soc. B 275, 1001–1005.

Kao, R.R., Danon, L., Green, D.M., Kiss, I.Z., 2006. Demographic structure and pathogen dynamics on the network of livestock movements in Great Britain. Proc. R. Soc. B 273, 1999–2007.

Kao, R.R., Green, D.M., Johnson, J., Kiss, I.Z., 2007. Disease dynamics over very different time-scales: foot-and-mouth disease and scrapie on the network of livestock movements in the UK. J. R. Soc. Interface 4, 907–916.

Kiss, I.Z., Green, D.M., Kao, R.R., 2006. The network of sheep movements within Great Britain: network properties and their implications for infectious disease spread. J. R. Soc. Interface 3, 669–677.

Kiss, I.Z., Green, D.M., Kao, R.R., 2008. The effect of network mixing patterns on epidemic dynamics and the efficacy of disease contact tracing. J. R. Soc. Interface 5, 791–799.

Keeling, M.J., 1999. The effects of local spatial structure on epidemiological invasions. Proc. R. Soc. B 266, 859–867.

Kumpula, J.M., Saramaki, J., Kaski, K., Kertesz, J., 2008. Limited resolution and multiresolution methods in complex network community detection. Fluct. Noise Lett. 7, 209–214.

Leicht, E.A., Newman, M.E.J., 2008. Community structure in directed networks. Phys. Rev. Lett. 100, 118703.

Munro, L.A., Gregory, A., 2009. Application of network analysis to fish movement data. J. Fish Dis. 32, 641–644.

Murray, A.G., Smith, R.J., Stagg, R.R., 2002. Shipping and the spread of infectious salmon anemia in Scottish aquaculture. Emerg. Infect. Dis. 8, 1–5.

Newman, M.E.J., 2003. Mixing patterns in networks. Phys. Rev. E 67, 026126.

Newman, M.E.J., 2004. Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133.

Ribbens, S., Dewulf, J., Koenen, F., Mintiens, K., de Kruif, A., Maes, D., 2008. Type and frequency of contacts between Belgian pig herds. In: Peeler, E.J., Alban, L., Russell, A. (Eds.), Proc. Soc. for Veterinary Epidemiology and Preventive Medicine, Liverpool, UK, March 26–28, pp. 155–171.

Stark, K.D.C., Regula, G., Hernandez, J., Knopf, L., Fuchs, K., Morris, R.S., Davies, P., 2006. Concepts for risk-based surveillance in the field of veterinary medicine and veterinary public health: review of current approaches. BMC Health Serv. Res. 6, 1–8.

Thrush, M., Peeler, E., 2006. Stochastic simulation of live salmonid movement in England and Wales to predict potential spread of exotic pathogens. Dis. Aquat. Org. 72, 115–123.

Truscott, J., Garske, T., Chis-Ster, I., Guitian, J., Pfeiffer, D., Snow, L., Wilesmith, J., Ferguson, N.M., Ghani, A.C., 2007. Control of a highly pathogenic H5N1 avian influenza outbreak in the GB poultry flock. Proc. R. Soc. B 274, 2287–2295.

Vernon, M.C., Keeling, M.J., 2009. Representing the UK’s cattle herd as static and dynamic networks. Proc. R. Soc. B 276, 469–476.

Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442.

Webb, C.R., 2005. Farm animal networks: unraveling the contact structure of the British sheep population. Prev. Vet. Med. 68, 3–17.

Zemljic, B., Hlebec, V., 2005. Reliability of measures of centrality and prominence. Soc. Netw. 27, 73–88.