A spatio-temporal Bayesian network classifier for understanding visual field deterioration


Artificial Intelligence in Medicine (2005) 34, 163—177

http://www.intl.elsevierhealth.com/journals/aiim

Allan Tucker a,*,1, Veronica Vinciotti a,1, Xiaohui Liu a, David Garway-Heath b

a Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex UB8 3PH, UK
b Glaucoma Unit, Moorfields Eye Hospital, London, UK

Received 4 March 2004; received in revised form 2 July 2004; accepted 9 July 2004

KEYWORDS: Classification; Multivariate time series; Bayesian networks; Visual field; Glaucoma

Summary

Objective: Progressive loss of the field of vision is characteristic of a number of eye diseases such as glaucoma, which is a leading cause of irreversible blindness in the world. Recently, there has been an explosion in the amount of data being stored on patients who suffer from visual deterioration, including field test data, retinal image data and patient demographic data. However, there has been relatively little work in modelling the spatial and temporal relationships common to such data. In this paper we introduce a novel method for classifying visual field (VF) data that explicitly models these spatial and temporal relationships.

Methodology: We carry out an analysis of our proposed spatio-temporal Bayesian classifier and compare it to a number of classifiers from the machine learning and statistical communities. These are all tested on two datasets of VF and clinical data. We investigate the receiver operating characteristic curves and the resulting network structures, and also make use of existing anatomical knowledge of the eye in order to validate the discovered models.

Results: Results are very encouraging, showing that our classifiers are comparable to existing statistical models whilst also facilitating the understanding of underlying spatial and temporal relationships within VF data. The results reveal the potential of using such models for knowledge discovery within ophthalmic databases, such as networks reflecting the ‘nasal step’, an early indicator of the onset of glaucoma.

Conclusion: The results outlined in this paper pave the way for a substantial program of study involving many other spatial and temporal datasets, including retinal image and clinical data.

© 2004 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +44 1895 816 253; fax: +44 1895 251 686.
E-mail address: [email protected] (A. Tucker).
1 These authors contributed equally to this work.

0933-3657/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.artmed.2004.07.004

1. Introduction

Progressive loss of the field of vision is characteristic of a number of eye diseases such as glaucoma, a leading cause of irreversible blindness in the world. Recently, there has been an explosion in the amount of data being stored on patients who suffer from visual deterioration, including visual field (VF) test, retinal image and patient demographic data. The aim now is to extract as much information as possible from these data in order to address fundamental questions still open within the glaucoma research community. For example, the diagnosis of glaucoma made by clinicians would be greatly improved by the identification of the various causes of VF loss as well as the detection of patterns of VF loss that match glaucomatous patterns. It would also be very beneficial to be able to integrate different types of clinical data (such as intraocular pressure and VF data) for both the diagnosis and the detection of disease progression. Furthermore, since VF loss is characterised by a slow progression, early detection of glaucoma can be invaluable, as early intervention can slow VF deterioration. Statistical and classification models that address all of these issues would therefore be extremely helpful to glaucoma specialists. Fig. 1 shows an example of a VF for both a healthy eye and a glaucomatous one. In this paper, we look at classification models to predict whether a certain patient has glaucoma or not, given measurements of sensitivity at different points of their VF.

The progression of the VF over time, when available, can be included in the model. In addition, clinical and demographic information can be incorporated as it may affect the spatial deterioration of the VF. There has been very little modelling of the spatial and temporal relationships which are characteristic of VF data. Swift and Liu [1] have looked into learning statistical time series models of VF data. Other approaches that explore the temporal aspect of VF data include trend and event analysis [2,3], and state space models [4]. Ibanez and Simo [5] have investigated spatio-temporal statistical models with the aim of forecasting VF deterioration, but to date have only looked at VFs of normal eyes.

Various models have been developed for classifying VF data. Hilton et al. [6] use logistic regression models for the classification of glaucoma by VF measurements. Hothorn and Lausen [7] make use of the retinal image data to classify glaucoma with tree classifiers. Chan et al. [8] and Goldbaum et al. [9] document a comprehensive comparison of statistical and machine learning classifier systems for the classification of glaucomatous VFs. Many of these classifiers are ‘black box’ in nature and therefore do not give much insight into the behaviour of the VF. Much research in glaucoma has involved exploring the distribution of point-by-point light sensitivity, at a single point in time, in normal [10,11] and glaucomatous populations [12]. However, much remains unknown about the behaviour of the VF test, such as the light sensitivity relationship between adjacent and distant VF points, the relationship between light sensitivity and other ocular parameters (such as optic nerve appearance and intraocular pressure level) and how stable and deteriorating VFs behave over time.

Figure 1 A typical VF test from a healthy eye and a glaucomatous eye. Darker regions represent VF loss.

It is our intention to make use of the large amount of data available in order to build models for classification that fully exploit the spatio-temporal nature of these data whilst avoiding the inherent problem of black box paradigms. This has led us to investigate the use of Bayesian networks, which are transparent in the way that they model data. To our knowledge, Bayesian network classifiers have not been extended to classify multivariate time series data using the inclusion of temporal links. Furthermore, the spatial nature of data such as VF datasets has not been exploited in the learning of such classifiers. In this paper we extend existing Bayesian network classifiers to explicitly handle the spatial and temporal relationships of VF data.

The paper is broken down into the following sections. In Section 2, we describe some relevant background in classification methods. In Section 3, we describe existing Bayesian network classifiers as well as the extension of Bayesian networks to incorporate temporal links. In Section 4, we describe our spatio-temporal Bayesian network classifier. The model is a combination of the existing Bayesian classifiers, temporal Bayesian networks and spatial learning operators, in order to model and classify data with spatial and temporal relationships. This section includes an outline of the architecture of the model, the learning algorithm and the spatial operators used. Section 5 includes a description of the parameters and the datasets used in the experiments, one dataset being non-temporal and the other being a longitudinal time series. In Section 6, the results of the experiments are documented, firstly just applying the non-temporal classifiers on the non-temporal data, and secondly applying our spatio-temporal classifier to the temporal dataset. Furthermore, we compare our method with two standard statistical classifiers, linear regression and k-nearest neighbour, and illustrate the benefits that spatio-temporal Bayesian networks offer over these models. Results include receiver operating characteristic analysis, network structure analysis and inference analysis. Section 7 discusses the clinical implications of our results, whilst in Section 8 conclusions are made and future work is outlined.

2. Overview of classification problems

Denote with $Y = (X, C)$ the vector of variables for the problem of classifying a VF as glaucomatous given a set of attributes $X$. The attributes include the VF point values, and possibly other variables such as age, gender and intraocular pressure, and $C \in \{0, 1\}$ is the class variable (with 0 = normal, 1 = glaucoma). Statistical classifiers provide an estimate of $p(c \mid x)$, the probability that a patient with observed measurement vector $x$ belongs to class $c$. Linear and logistic regression, linear and quadratic discriminant analysis, k-nearest neighbours and graphical models are amongst the most popular statistical classifiers.

The estimated $p(c \mid x)$ provided by one of these models, either directly or indirectly via Bayes' theorem, is compared to a threshold $t$ to predict the class of $x$. This threshold is usually chosen to minimise the expected misclassification loss [13], resulting in the classification rule: classify $x$ to class 1 if

$$p(1 \mid x) > t = \frac{k_0}{k_0 + k_1}, \qquad (1)$$

and otherwise to class 0, where $k_i$ denotes the cost of misclassifying an object from class $i$.
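The rule in Eq. (1) can be illustrated with a minimal Python sketch. The function names `cost_threshold` and `classify` are hypothetical and chosen only for this illustration; the costs in the usage lines are made-up values.

```python
def cost_threshold(k0: float, k1: float) -> float:
    """Threshold t = k0 / (k0 + k1) from Eq. (1).

    k0 is the cost of misclassifying a class 0 (normal) object,
    k1 the cost of misclassifying a class 1 (glaucoma) object.
    """
    return k0 / (k0 + k1)


def classify(p1: float, k0: float, k1: float) -> int:
    """Return 1 if the estimated p(1|x) exceeds the threshold, else 0."""
    return 1 if p1 > cost_threshold(k0, k1) else 0


# Equal costs give t = 0.5; making false positives three times as
# costly raises the threshold to 0.75 (illustrative costs only).
label_equal = classify(0.6, k0=1.0, k1=1.0)
label_fp_costly = classify(0.6, k0=3.0, k1=1.0)
```

With equal costs the estimate 0.6 is classified as class 1, whereas with the higher false-positive cost the same estimate falls below the threshold and is classified as class 0.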

When the misclassification costs are equal ($k_0 = k_1$), the resulting rule will classify an object to the class with the highest predicted probability. Although this is the default option in many implementations, in real applications the misclassification costs are seldom the same. In credit scoring applications, for example, it is more costly for the bank to classify a bad customer as a good one than vice versa [14]; in glaucoma applications, like the one that we are considering in this paper, the cost of a false positive is often considered by glaucoma experts to be greater than that of a false negative [9]. Hand and Vinciotti [15] discuss the importance of taking the relative misclassification costs into consideration when building and assessing the model. After all, we want the classifier to perform well for the particular choice of costs that we make.

In order to assess the performance of a classifier and to compare different classifiers, it is common practice to use receiver operating characteristic (ROC) curves [13]. An ROC curve allows one to view graphically the performance of a classifier by plotting the sensitivity, which in our case is the proportion of glaucomatous eyes correctly classified as glaucomatous, versus (1 - specificity), the proportion of normal eyes misclassified as glaucomatous, as the threshold $t$ in Eq. (1) assumes increasing values between 0 and 1. Different points in the curve will correspond to different values of the threshold, i.e. different values of the misclassification costs. The perfect classifier would have an ROC curve that follows the top-left corner of the unit square, whereas the worst situation would be a classifier whose curve follows the diagonal. Real applications will usually show curves between these two extremes.
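As an illustration of how the points of an ROC curve arise from sweeping the threshold, the following Python sketch computes (1 - specificity, sensitivity) pairs. The function name `roc_points` and the scores and labels are hypothetical; the sketch assumes both classes are present in the data.

```python
def roc_points(scores, labels, thresholds):
    """Return one (1 - specificity, sensitivity) pair per threshold.

    scores: estimated p(1|x) for each case
    labels: true classes (1 = glaucoma, 0 = normal)
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        # A case is predicted glaucomatous when its score exceeds t.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > t)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up scores for five eyes, three of them glaucomatous.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels, thresholds=[0.0, 0.35, 0.5, 1.0])
```

At threshold 0 every eye is flagged, giving the top-right corner of the unit square, and at threshold 1 none is flagged, giving the origin; intermediate thresholds trace the curve between them.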

A global measure of the classifier performance, often used in classification problems, is the area under the ROC curve (AUC). This will be some value between 0.5, associated with the diagonal of the square, and 1, corresponding to the curve that follows the top-left corner. Such a global measure of performance is not very useful when the curves of different classifiers cross at multiple points, which is actually very common in real-life applications. If curves cross at various points, then one classifier is better than the others on a certain range of values for the threshold, but worse on other values. In situations like this, it is more appropriate to compare the classifiers for the values of the threshold that one is interested in. A common measure of classifier performance relative to a certain threshold is given by the misclassification cost

$$C = \frac{k_0 \cdot FP + k_1 \cdot FN}{n}, \qquad (2)$$

where $FP$ and $FN$ denote the number of misclassified points from class 0 (false positives) and class 1 (false negatives), respectively, and $n$ is the total number of points.
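Both measures can be sketched in a few lines of Python. The AUC is computed here through the standard rank interpretation (the proportion of glaucoma/normal pairs in which the glaucomatous case receives the higher score, ties counting one half), which is one common way to obtain it and not necessarily the procedure used in this paper. The function names and the data are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def misclassification_cost(scores, labels, k0, k1):
    """Eq. (2), using the cost-based threshold of Eq. (1)."""
    t = k0 / (k0 + k1)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > t)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= t)
    return (k0 * fp + k1 * fn) / len(labels)


# Made-up scores for five eyes, three of them glaucomatous.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
area = auc(scores, labels)
cost_equal = misclassification_cost(scores, labels, k0=1.0, k1=1.0)
```

Evaluating `misclassification_cost` for the specific pair of costs of interest is what allows two classifiers with crossing ROC curves to be compared at the operating point that matters.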

In addition to validation using ROC and misclassification cost, we will make use of expert knowledge of the VF in order to assess the quality of the resulting Bayesian network classifiers. Fig. 2 shows the VF of the right eye with the angle of the nerve fibre bundle entry to the optic-nerve-head corresponding to each VF point [16]. The optic nerve is where information is carried from the retina to the visual cortex. Also included are the anatomical names of the different sections of the VF.

Figure 2 Expert knowledge: the angle between each VF point and the optic-nerve-head, including anatomical labelling.

It is expected that visual function at VF points with similar angles is more closely related, and so we introduce a metric that scores the quality of links in a directed network (such as a Bayesian network) by calculating the mean optic-nerve-head angle difference between the parent and the child of each link in the network. This results in the score

$$\sum_{i=1}^{m} \frac{1}{m} \left| a(\mathrm{par}(i)) - a(\mathrm{chd}(i)) \right|, \qquad (3)$$

where $m$ is the number of links in the network, $a(i)$ is the optic-nerve-head angle of VF point $i$, and $\mathrm{par}(i)$ and $\mathrm{chd}(i)$ are the VF points of the parent and child of link $i$, respectively.
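A minimal Python sketch of the score in Eq. (3) follows. It takes the absolute difference exactly as written in the equation, with no wrap-around at 360 degrees. The function name, the point identifiers and the angles are hypothetical values for illustration only.

```python
def mean_angle_difference(links, angle):
    """Eq. (3): mean absolute optic-nerve-head angle difference.

    links: list of (parent, child) VF point identifiers
    angle: dict mapping a VF point identifier to its angle in degrees
    """
    if not links:
        raise ValueError("the network has no links to score")
    total = sum(abs(angle[parent] - angle[child]) for parent, child in links)
    return total / len(links)


# Three made-up VF points and a two-link network.
angle = {"p1": 83.0, "p2": 90.0, "p3": 278.0}
links = [("p1", "p2"), ("p2", "p3")]
score = mean_angle_difference(links, angle)  # (7 + 188) / 2
```

A lower score indicates that the learnt links tend to join points whose nerve fibre bundles enter the optic-nerve-head at similar angles.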

In order to test the stability of the Bayesian network classifiers generated by repeated experiments, we consider a measure of structural difference between networks. Let $L$ be the set of directed links associated with a network. For two sets of directed links, $L_1$ and $L_2$, the structural difference between the two networks can be measured by

$$|L_1 \setminus L_2| + |L_2 \setminus L_1|, \qquad (4)$$

that is, the number of links in $L_1$ but not in $L_2$ plus the number of links in $L_2$ but not in $L_1$.
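Eq. (4) is the size of the symmetric difference of the two link sets, which the following Python sketch computes directly. The function name and the example networks are hypothetical.

```python
def structural_difference(links1, links2):
    """Eq. (4): |L1 \\ L2| + |L2 \\ L1| for two sets of directed links."""
    l1, l2 = set(links1), set(links2)
    return len(l1 - l2) + len(l2 - l1)


# Two made-up networks that differ only in the direction of one link.
net_a = {("A", "B"), ("B", "C")}
net_b = {("A", "B"), ("C", "B")}
diff = structural_difference(net_a, net_b)
```

Because links are directed, a reversed link counts twice: once as missing from the first network and once as missing from the second.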

3. Bayesian networks

Bayesian networks (BNs) are probabilistic models that can be used to combine expert knowledge and data. They facilitate the discovery of complex relationships in large datasets and enable non-statisticians to query resultant models. For this reason they are particularly useful in the analysis of VF data when trying to understand underlying relationships between VF points and other clinical variables.

3.1. The basics

A BN consists of a directed acyclic graph, made up of links between nodes that represent variables in the domain. The links are directed from a parent node to a child node and with each node there is an associated set of conditional probability distributions. A BN thus consists of the following: a set of $N$ nodes, $\{Y_1, \ldots, Y_N\}$, representing the $N$ variables in the domain, and directed links between the nodes. Associated with each node $Y_i$ with parents $\pi_i$, there is a probability table, $\theta_i = p(Y_i \mid \pi_i)$. The set of these probabilities for all nodes $Y_i$ and configurations of the parents $\pi_i$ provides an efficient factorization of the joint probability $p(Y)$ in terms of dependencies between variables [17].

The process of learning a BN from data is made up of two distinct phases [18]. First of all, a network has to be selected amongst the space of all possible models. Then, the vector of probabilities $\theta = (\theta_i)_i$ has to be estimated. Given a known network structure $bn$, the probabilities $\theta_i$ are estimated in a Bayesian framework by the mean of the posterior distribution $p(\theta \mid bn, D)$, where $D$ is the database of observed values. The posterior is updated from a prior probability $p(\theta \mid bn)$, which is usually chosen to be Dirichlet. The special case of prior ignorance corresponds to a uniform distribution on the parameters $\theta$. Learning the structure $bn$ from data is a non-trivial problem due to the large number of candidate networks. In the medical community, this structure is often manually set up based on domain knowledge. However, this process can be difficult and lengthy, so an automated system where structure and parameters are both estimated is particularly desirable. As a result there has been substantial research in developing efficient optimisation algorithms. Most methods involve scoring candidate network structures and one of the most common metrics is the marginal log-likelihood. For a candidate BN structure $bn$ and under the assumption of a uniform prior on the parameters $\theta$ this is given by

$$\log p(D \mid bn) = \log \int p(D \mid \theta)\, p(\theta \mid bn)\, d\theta = \log \prod_{i=1}^{N} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(F_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} F_{ijk}!, \qquad (5)$$

where $p(D \mid \theta)$ is the likelihood function, $N$ is the number of variables in the domain, $r_i$ is the number of states that a node $Y_i$ can take, $q_i$ is the number of unique instantiations of the parents of node $Y_i$, $F_{ijk}$ is the number of cases in the database $D$ where $Y_i$ takes on its $k$th unique instantiation and the parent set of $i$ takes on its $j$th unique instantiation, and $F_{ij} = \sum_{k=1}^{r_i} F_{ijk}$. Assuming a uniform prior on a set of candidate networks $bn$, maximising the posterior probabilities $\log p(bn \mid D)$ over the models is equivalent to maximising the marginal log-likelihood $\log p(D \mid bn)$.

Figure 3 A typical TBN with $N$ variables and 2 time slices, $t$ and $t - 1$. Note the links within one time slice and those spanning from one to the next.
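The score in Eq. (5) can be evaluated from simple counts. The Python sketch below works in log space using `math.lgamma`, relying on the identity $\log n! = \mathrm{lgamma}(n + 1)$. The data layout (rows of integer states, a dict from each node to its parents) and the function names are assumptions made for this illustration. Parent configurations that never occur in the data contribute a factor of one and are therefore skipped.

```python
from collections import Counter
from math import lgamma


def node_score(data, child, parents, n_states):
    """Log of the Eq. (5) factor for one node.

    data: list of tuples of integer states, one tuple per case
    child: column index of the node Y_i
    parents: tuple of column indices of the parents of Y_i
    n_states: r_i, the number of states of Y_i
    """
    f_ijk = Counter()
    for row in data:
        j = tuple(row[p] for p in parents)
        f_ijk[(j, row[child])] += 1
    f_ij = Counter()
    for (j, _k), count in f_ijk.items():
        f_ij[j] += count
    score = 0.0
    for count in f_ij.values():
        # log (r_i - 1)! - log (F_ij + r_i - 1)!
        score += lgamma(n_states) - lgamma(count + n_states)
    for count in f_ijk.values():
        score += lgamma(count + 1)  # log F_ijk!
    return score


def log_marginal_likelihood(data, structure, n_states):
    """Eq. (5) for a whole structure: dict node -> tuple of parents."""
    return sum(node_score(data, i, pa, n_states[i])
               for i, pa in structure.items())


# Made-up data: two perfectly correlated binary variables.
data = [(0, 0)] * 10 + [(1, 1)] * 10
with_link = node_score(data, child=1, parents=(0,), n_states=2)
without_link = node_score(data, child=1, parents=(), n_states=2)
```

On this toy data the structure with the link scores higher than the one without it, which is the behaviour a search over structures relies on.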

One can apply inference to a BN given some evidence about a subset of variables. Inference involves calculating the posterior distribution over the variables by propagating this evidence throughout the network. Exact inference in multiply connected networks has been shown to be NP-hard [19]. Approximate algorithms can be used, such as stochastic simulation, which runs repeated simulations in order to generate multiple samples over the nodes. As the number of simulations increases, the frequencies of the states of each node occurring in the samples will approximate the exact posterior distribution. Logic sampling, as suggested in [20], handles the propagation of evidence by discarding all simulations where the observed nodes have been instantiated to states different to those observed. This can result in many simulations being discarded, especially as the number of observations increases. Pearl [21] suggests a method to overcome this problem by clamping the observed nodes to their respective states and applying a two-step algorithm.
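The discard-based scheme described for logic sampling can be sketched as follows. This is a simplified illustration on a hypothetical two-node network with made-up probabilities, not the inference code used in the experiments; the function names are likewise hypothetical.

```python
import random


def logic_sampling(sample_fn, evidence, query, n_sims, rng):
    """Estimate p(query | evidence) by discarding simulations that
    contradict the evidence and counting states in the remainder."""
    counts = {}
    kept = 0
    for _ in range(n_sims):
        sample = sample_fn(rng)
        if any(sample[var] != state for var, state in evidence.items()):
            continue  # simulation disagrees with the observations
        kept += 1
        counts[sample[query]] = counts.get(sample[query], 0) + 1
    if kept == 0:
        raise ValueError("all simulations were discarded")
    return {state: n / kept for state, n in counts.items()}


def sample_toy_network(rng):
    """Forward-sample a hypothetical network C -> X."""
    c = 1 if rng.random() < 0.3 else 0   # p(C = 1) = 0.3 (made up)
    p_x1 = 0.8 if c == 1 else 0.1        # p(X = 1 | C) (made up)
    x = 1 if rng.random() < p_x1 else 0
    return {"C": c, "X": x}


rng = random.Random(0)
estimate = logic_sampling(sample_toy_network, {"X": 1}, "C", 20000, rng)
```

For this toy network the exact posterior is $p(C = 1 \mid X = 1) = 0.24 / 0.31$, and the estimate approaches it as the number of retained simulations grows. Roughly two thirds of the simulations are discarded here, which illustrates the inefficiency noted above.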

3.2. Bayesian networks for classification

BN classifiers have recently shown excellent properties [22]. Because the discovered relationships in the classifiers are explicit, the models can be analysed to understand how classification decisions are made. This is an extremely useful property when trying to understand classification decisions in medical data such as glaucomatous VFs.

There are a number of BN classifiers, the simplest being the naïve Bayes classifier. This architecture assumes that every feature in the classifier is independent given the class, i.e. $p(x \mid c) = \prod_{i=1}^{N-1} p(x_i \mid c)$. Despite the fact that the independence assumption in the naïve Bayes classifier is almost always incorrect in real applications, many studies have shown a very good performance of the model, even in comparison with much more sophisticated classifiers (for example [22-26]). These results are of particular interest especially considering the many advantages of the naïve Bayes classifier in practical applications: it is very simple, very efficient and very easy to interpret and implement. Hand and Yu [24] give some mathematical justifications of why such a simple and unrealistic model might perform so well on future observations: simple models, like this, have a lower variance than more complex models. Hence, despite having a larger bias, they might perform better on observations outside the training data.
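A minimal Python sketch of a naïve Bayes classifier for discrete features is given below. The add-`alpha` smoothing of the conditional tables is an assumption of this sketch (with `alpha = 1` it matches the posterior mean under a uniform prior); the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import exp, log


def train_naive_bayes(rows, labels, n_states, alpha=1.0):
    """Collect the counts needed for p(c) and p(x_i | c)."""
    class_counts = Counter(labels)
    feature_counts = Counter()
    for row, c in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, c)] += 1
    return class_counts, feature_counts, n_states, alpha


def posterior(model, row):
    """Return p(c | x) for every class via Bayes' theorem."""
    class_counts, feature_counts, n_states, alpha = model
    n = sum(class_counts.values())
    log_joint = {}
    for c, n_c in class_counts.items():
        lp = log(n_c / n)
        for i, value in enumerate(row):
            # p(x_i | c), smoothed so unseen states keep non-zero mass
            lp += log((feature_counts[(i, value, c)] + alpha)
                      / (n_c + alpha * n_states))
        log_joint[c] = lp
    top = max(log_joint.values())
    norm = sum(exp(lp - top) for lp in log_joint.values())
    return {c: exp(lp - top) / norm for c, lp in log_joint.items()}


# Made-up data: two features with four states each, two cases per class.
rows = [(0, 0), (0, 1), (3, 3), (3, 2)]
labels = [0, 0, 1, 1]
model = train_naive_bayes(rows, labels, n_states=4)
post = posterior(model, (3, 3))
```

The product over features is accumulated in log space to avoid numerical underflow when the number of features is large, as it would be with one feature per VF point.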

Other authors have suggested extensions to this classifier that relax the strong assumption of independence. The tree augmented network (TAN) is one of these [22]: in addition to the naïve Bayes structure, this model learns a tree structure amongst the features. Another possible extension to the naïve model is to learn a BN where the class variable is simply included as one of the nodes.

3.3. Temporal Bayesian networks

A typical feature of VF data is their temporal aspect: patients attend the clinic regularly to take VF tests and at each visit a classification is made to decide whether an eye has developed glaucoma or, if glaucoma is already present, whether it has worsened (progressed). For this reason, we have also looked at classifiers that take the time aspect into consideration. In particular, we have looked at temporal Bayesian networks (TBNs), also known as dynamic BNs.

A TBN is a BN where the $N$ nodes represent variables at differing time slices. Therefore links occur between nodes over time and within the same time slice. Fig. 3 shows an example of a TBN where each node represents a variable at a certain time slice and each link represents a conditional dependency between two nodes.

Given some evidence about a set of variables at time $t$, one can infer the most probable explanations for the current observations. Inference in TBNs is very similar to standard inference in static BNs [27]. In this paper, we use a form of stochastic simulation [21] as described in Section 3.1 because of its speed and its intuitive appeal. Previously, we have developed methods for learning different specialist TBN structures, such as networks with large time lags [28].

4. The spatio-temporal Bayesian network classifier

A spatio-temporal Bayesian network is a special TBN that assumes dependencies between variables based on some spatial neighbourhood, such as first order Cartesian coordinates. A set of operators is required for learning network structures that exploit the spatial nature of the dataset. In [29] we developed such operators. In this section, we describe an extension of these models that is designed to classify data. We call this model the spatio-temporal Bayesian network classifier (STC).

Fig. 4 illustrates the main features of an STC: $t$ represents the time slice and $C$ represents the class node, in this case the presence of glaucoma at time $t$. For the scope of this paper, we exclude the class node at time $t - 1$ as this would trivialise the problem of classification for glaucoma (once the VF has been classified as glaucomatous, it will remain glaucomatous according to the clinicians).

The model contains relationships between variables that are both temporal and non-temporal. When learning these relationships it is assumed that there is a spatial relationship between nodes in the network based upon the Cartesian coordinate system. Therefore, nodes that are spatially close to other nodes are deemed more likely to be dependent upon one another. Note that the spatial relationships can include more than first order Cartesian neighbours. The same applies for temporal order, although in this paper we only look at first order temporal relationships. Note also that the direction of links between the class node and the feature nodes may vary. However, this only applies to nodes within the same time slice as the class, because all links between nodes in different time slices must be directed according to the flow of time.

Figure 4 The spatio-temporal Bayesian network classifier for time slices $t$ and $t - 1$. The class node is denoted by $C$ and the spatial nodes are depicted within the larger circle.

4.1. The algorithm

In our learning algorithm, candidate structures of a network, $bn$, given a dataset, $D$, are scored using the metric in Eq. (5). In order to increase the efficiency of our algorithms, we have developed an evolutionary approach without the necessity of storing a population of candidate solutions. Rather, we consider each point in the spatial dataset to be an individual within the population of points. Therefore, the population itself is the candidate solution. We have looked at a similar method before for grouping algorithms [30]. The algorithm also makes use of a simulated annealing type of selection criterion [31], where good operations are always carried forward, but sometimes less good ones are also accepted dependent upon a temperature parameter. A form of elitism [32] is employed to ensure that the final structure is the best discovered. This is to prevent the simulated annealing process from moving away from a better solution when the temperature is still high. We formally define the algorithm below, where maxfc is the maximum number of calls to the scoring function, $c$ is the ‘cooling parameter’, $t_0$ is the initial temperature, $b$ is the branching factor of a network, and $R(0, 1)$ is a uniform random number generator with limits 0 and 1.
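The following Python sketch is a generic simulated annealing loop with elitism, written only to illustrate the ingredients named in this section. It is not the authors' listing: the acceptance rule $\exp(\Delta / \text{temperature})$, the geometric cooling applied once per scoring call, and the `propose` callback (which would apply one of the operators of Section 4.2 and respect the branching factor $b$) are all assumptions of this sketch.

```python
import math
import random


def anneal(initial, score, propose, maxfc, t0, c, rng):
    """Generic simulated annealing with elitism (illustrative sketch).

    score(s) is maximised; propose(s, rng) returns a modified copy of s.
    """
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    temp = t0
    for _ in range(maxfc - 1):  # one call was spent on the initial state
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Improvements are always kept; worse moves are kept with a
        # probability that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
        # Elitism: remember the best state seen so far.
        if current_score > best_score:
            best, best_score = current, current_score
        temp *= c
    return best, best_score


# Toy stand-ins for a structure score and an operator (hypothetical).
def toy_score(x):
    return -(x - 3) ** 2


def toy_propose(x, rng):
    return x + rng.choice((-1, 1))


best, best_score = anneal(0, toy_score, toy_propose,
                          maxfc=2000, t0=5.0, c=0.99,
                          rng=random.Random(1))
```

In the STC setting, `score` would be the metric of Eq. (5) and the state would be a set of directed links; the toy integer problem is used here only so that the loop can be run on its own.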


4.2. Operators

We now introduce the spatial and non-spatial operators used to learn an STC. The operators involve manipulating links within the network. Note that a random link can be either temporal or non-temporal.

4.2.1. Non-spatial operators

We have chosen three non-spatial operators commonly used in optimisation techniques, such as hill climbing and simulated annealing.

• Add: A link with random parent and child is added to the network.
• Take: Randomly remove a single existing link.
• Mutate: Randomly change the parent of an existing link.

4.2.2. Spatial operators

For the scope of this paper, we assume that the points in a spatial dataset are located according to Cartesian coordinates. Therefore, each point in a dataset with coordinates $(x, y)$ has a first order neighbourhood which includes all nodes with coordinates $(i, j)$ for $i = x \pm 1$ and $j = y \pm 1$. The spatial operators that we have developed exploit the Cartesian spatial nature of a dataset.

Fig. 5 shows examples of parent coordinates, relative to the child node, when applying the operators. Unfilled circles represent child nodes, filled circles represent parents of the child. In Fig. 5a and b shaded squares represent potential parents, in Fig. 5c shaded squares with dots represent the parents of node x.

• Spatial add (Fig. 5a): Add a link with random child and a parent that is one of the child’s first order neighbours.
• Spatial mutate (Fig. 5b): Randomly change the parent of an existing link by setting it to the first order neighbour of its previous position.
• Spatial crossover (Fig. 5c): Randomly swap the relative positions of two parent nodes.

Figure 5 Examples of the spatial operators. Unfilled circles represent child nodes, filled circles represent parents of the child: (a) add a link in the 1st order neighbourhood (b) mutate a parent to within its 1st order neighbourhood (c) spatial crossover.
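The first two spatial operators can be sketched in Python as below. Several details are assumptions of this sketch and are not taken from the text: the first order neighbourhood is read as the eight surrounding grid positions, links are stored as (parent, child) pairs of coordinates, and no check for cycles or for the maximum number of parents is made. All function names are hypothetical, and spatial crossover is not sketched.

```python
import random


def first_order_neighbours(point, grid):
    """Grid points within one step of `point` in x and y, excluding itself."""
    x, y = point
    return [(i, j)
            for i in (x - 1, x, x + 1)
            for j in (y - 1, y, y + 1)
            if (i, j) != (x, y) and (i, j) in grid]


def spatial_add(links, grid, rng):
    """Add a link from a first order neighbour to a random child."""
    child = rng.choice(sorted(grid))
    options = [p for p in first_order_neighbours(child, grid)
               if (p, child) not in links]
    if not options:
        return set(links)
    return set(links) | {(rng.choice(options), child)}


def spatial_mutate(links, grid, rng):
    """Move the parent of a random link to a neighbour of its old position."""
    if not links:
        return set(links)
    parent, child = rng.choice(sorted(links))
    options = [p for p in first_order_neighbours(parent, grid)
               if p != child and (p, child) not in links]
    if not options:
        return set(links)
    new_links = set(links) - {(parent, child)}
    new_links.add((rng.choice(options), child))
    return new_links


# A made-up 3 x 3 grid of points.
grid = {(x, y) for x in range(3) for y in range(3)}
rng = random.Random(0)
links = spatial_add(set(), grid, rng)
mutated = spatial_mutate(links, grid, rng)
```

Restricting new parents to a neighbourhood shrinks the set of candidate links from all pairs of points to a handful per child, which is what makes the search favour spatially local dependencies.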
5. Experimental set-up

Our experiments compare the proposed STC with different classifiers from the statistical and machine learning communities, including the existing family of BN classifiers, linear regression and the k-nearest neighbour method, in the context of glaucoma detection. We apply these methods to two VF datasets, which we describe in this section, followed by a description of how the data were preprocessed and how the methods were parameterised.

5.1. The datasets

The VF test assesses the sensitivity of the retina to light. It is typically measured by automated perimetry, a technique in which the subject views a dim background as brighter spots of light are shone onto the background at various locations in a regular grid pattern. The brightness at which the subject sees the spots of light is related to the retinal sensitivity. See Fig. 1 for an example of two VF tests, one from a healthy eye and one from a patient suffering from glaucoma.

All VF testing was performed with the Humphrey field analyzer and the 24-2 full threshold program. We investigate two separate datasets:

(1) Subjects included 78 patients with established early glaucomatous VF loss and 102 normal volunteers known not to be sufferers. One VF per subject was used for analysis. Early glaucomatous VF was defined as an AGIS score between 1 and 5, on three consecutive reproducible and reliable Humphrey 24-2 strategy VFs with at least one location consistently below the threshold for normality [33]. Normal subjects had VF tests scoring ‘0’ in the AGIS classification. Also included in these data were two clinical variables: age and gender.

(2) The VFs of 24 subjects attending the Ocular Hypertension Clinic at Moorfields Eye Hospital were examined at 4-monthly intervals [34]. All subjects suffered ocular hypertension (above 21 mmHg) and initially had normal VFs but developed reproducible glaucomatous VF damage in a reliable VF in their right eye during the course of follow-up (conversion). 229 of the VF test results were classed as converted, where conversion was defined as the development of an AGIS score greater than or equal to 1 from an initial score of 0, on three consecutive reproducible and reliable Humphrey 24-2 strategy VFs with at least one location consistently below the threshold for normality. If a patient developed a VF defect, then the test was repeated within one month, and if the same defect was then reproduced on a reliable second field, then a third test was performed 3-4 months later. Conversion was confirmed if the field defect was present on the three consecutive reliable tests. The dataset in this paper only considers this third test as converted. VFs were not re-classified following conversion. For this dataset, the average number of field tests in each patient’s series is 24.5, the maximum is 45, and the minimum is 1. The minimum is one and not three because during data extraction for one particular case the first two tests were not available despite the patient having undergone three tests. We can still use this patient’s data to extract useful non-temporal information. Also included in the dataset are three clinical variables: age, gender and intraocular pressure (IOP). The latter is known to be a risk factor for developing glaucoma.

Both datasets are slightly imbalanced (the number of glaucomatous VFs is not equal to the number of non-glaucomatous VFs). Table 1 provides a summary of each dataset.

Table 1 Breakdown of the datasets

| Dataset | VFstatic | VFtemporal |
|---|---|---|
| # VF variables | 54 | 54 |
| # Clinical variables | 2 | 3 |
| # VF tests | 180 | 588 |
| # Patients | 180 | 24 |
| Pos: Neg class ratio | 78:102 | 229:359 |
| IOP healthy: mean (sd) | NA | 25.05 (4.31) |
| IOP glaucoma | NA | 18.60 (3.12) |
| Age healthy | 58 (13) | 61 (13) |
| Age glaucoma | 68 (9) | 66 (11) |

5.2. Data preprocessing and parameterisation

For the BN models, all continuous data are discretised into four states using a frequency-based method, that is, bin sizes are determined such that there is an equal number of observations in each bin. This method is useful for learning BNs as it ensures that there are enough instances of each state in order to perform a reliable inference. Other methods such as range-based discretisation can lead to states that only occur a small number of times within a database. Discretisation was performed on a point-wise basis.
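Frequency-based discretisation into four states can be sketched as follows. This rank-based version is one simple way to obtain equally populated bins and is not necessarily the procedure used in the experiments; in particular, tied values may be split across adjacent bins. The function name and the sensitivity values are hypothetical.

```python
from collections import Counter


def equal_frequency_bins(values, n_bins=4):
    """Map each value to a state 0..n_bins-1 with (near) equal bin counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    states = [0] * len(values)
    for rank, i in enumerate(order):
        # The lowest 1/n_bins of the ranks go to state 0, and so on.
        states[i] = rank * n_bins // len(values)
    return states


# Made-up sensitivities at a single VF point across eight tests.
sensitivities = [30, 12, 25, 0, 18, 27, 5, 22]
states = equal_frequency_bins(sensitivities)
occupancy = Counter(states)
```

Applying the function separately to the values at each VF point corresponds to the point-wise discretisation described above, so the bin boundaries may differ from point to point.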

For our experiments, the following parameters were used. For the BN learning, we set maxfc = 50,000, $t_0 = 5$ and $c = 0.9999$, with these parameters defined as in Section 4.1. These were chosen as they were found to be the most efficient based upon previous empirical studies. We have found that the individual improvements in score during the early iterations of our algorithm are a good indication for the value of $t_0$. The maximum number of parents, $b$, was varied between 1 and 3 depending upon the experiments, and $t_0$ and $c$ were set so as to finish on a suitably ‘cold’ temperature to ensure a stable solution. Inference involved setting the number of stochastic simulations to 10,000 so as to balance the time taken to perform inference against the accuracy of the resulting posterior distribution.
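As a rough check of what a ‘cold’ finishing temperature means for these values, assume a geometric schedule in which the temperature is multiplied by $c$ once per call to the scoring function. The text does not state the schedule, so this is an assumption of the calculation. The temperature after maxfc calls would then be

$$t_{\mathrm{final}} = t_0 \, c^{\,\mathrm{maxfc}} = 5 \times 0.9999^{50000} \approx 5\, e^{-5.0003} \approx 0.034 .$$

Under that assumption the temperature falls by a factor of roughly 150 over the run, so that by the end only moves that reduce the score by a small fraction of a unit retain an appreciable chance of acceptance.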

6. Results

The following results are split into two main sections, one for each dataset. Firstly, the non-temporal VF data are analysed with respect to classification and structural analysis. Secondly, a comparison is carried out of the classifiers when applied to the multivariate time series VF data. As well as learning static models from the first dataset and temporal models from the second, we will also investigate how the non-temporal classifiers, trained on the non-temporal data, perform on unseen time series data and compare their performance with the classifications made by clinicians. This will give us a more accurate idea of how the classifiers perform when tested on a completely independent dataset.

6.1. Non-temporal classification and analysis

6.1.1. Comparison of classifiers and network analysis

Fig. 6 shows the ROC curves of the BN classifiers when 10-fold cross-validation is applied to each classifier on the non-temporal VF data. Also included is the AUC for each method. It can be seen that, of the BN classifiers, the best curve is generated by BN1, that is, the BN with only one parent allowed (AUC of 0.94). The next best is the naïve Bayes classifier, with an AUC of 0.9, with TAN and the BN with two parents performing worst. This could be due to overfitting, as these models have a higher number of parameters.
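For readers unfamiliar with how an ROC curve and its AUC are obtained from classifier output, the following is a minimal sketch. It assumes each test case has a predicted probability of glaucoma and a binary label; the function names are illustrative and this is not the evaluation code used for Fig. 6.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A case is called glaucomatous when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

In a cross-validation setting, the scores for all held-out folds would be pooled (or the per-fold AUCs averaged) before reporting a single figure; which of the two was done here is not stated in this section.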


Figure 6 ROC curves of BN classifiers on static VF data including AUC scores.

Fig. 7 shows a comparison of the best BN classifier found against linear regression and k-nearest neighbour (k-nn), where k was chosen using 10-fold cross-validation. The BN with one parent generally performs better than k-nn (which has an AUC of 0.91) and is comparable to linear regression (both have an AUC of 0.94).

Figure 7 ROC curves of best BN classifier and standard statistical classifiers on static VF data including AUC scores.

We now make use of expert knowledge concerning the anatomy of the eye in order to assess the quality of the BN structures learnt. This involves scoring the mean difference of the optic-nerve-head angle using Eq. (3) over all networks discovered using cross-validation. Note that a low angle difference between two VF points does not necessarily imply spatial proximity of the two points on the retina. For example, Fig. 2 shows that the two extreme points in the nasal area have an angle of 83 and 278 degrees despite being neighbours. For BN1 the mean angle difference was found to be 16.87 degrees, as compared to 19.73 and 28.47 degrees for BN2 and TAN, respectively. These results reflect the AUC scores, matching the good performance of BN1 over the other classifiers. The high optic-nerve-head measure for TAN could be a result of constraining all nodes to be linked to the class node, thus reducing the probability of discovering interesting spatial relationships between the VF points.
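The angle-based structure score can be sketched as follows. Eq. (3) is not reproduced in this section, so the code assumes it is the mean absolute difference in optic-nerve-head angle across the links between VF points, with no wrap-around at 360 degrees; the angles and link list in the usage note are made up for illustration.

```python
def mean_angle_difference(links, onh_angle):
    """Mean absolute optic-nerve-head angle difference (degrees) over VF-to-VF links.

    links: iterable of (parent, child) VF point identifiers.
    onh_angle: mapping from VF point identifier to its angle at the optic nerve head.
    """
    differences = [abs(onh_angle[a] - onh_angle[b]) for a, b in links]
    return sum(differences) / len(differences)
```

For example, with hypothetical angles `{0: 80.0, 1: 90.0, 2: 120.0}` and links `[(0, 1), (1, 2)]`, the score is 20 degrees. A lower score means the learnt links tend to join points served by neighbouring nerve fibre bundles.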

6.1.2. Comparison of Bayesian inference with clinicians’ decisions

We now investigate how the non-temporal BN classifiers, trained on the non-temporal data, perform on unseen time series data, and compare their output with the classifications made by clinicians (according to the AGIS score and conversion criteria as described in Section 5.1). Fig. 8 shows typical results: it plots the probability of a VF being glaucomatous according to the BN classifier with a maximum of one parent (BN1), along with the point of conversion according to clinicians (dotted line). Note that clinicians did not reclassify the field after conversion had been determined.

Figure 8 The probability of glaucoma conversion, p(1|x), generated from a BN1 trained on the static data and tested on the temporal data (dotted lines represent the clinician’s decision). The figures illustrate the four main characteristics observed in the BN1 classifier: (a) gradual increase in probability prior to conversion; (b) fluctuation prior to conversion; (c) classed as converted throughout; (d) noisy classification.

Figure 9 ROC curves of BN classifiers on temporal VF data including AUC scores.

In approximately a third of cases within the unseen data, the probability of glaucoma according to the BN classifier rapidly increases from almost 0 to 1 several time points before the clinician decides that the VF has converted to glaucomatous (for example, see Fig. 8a). This may be due to the methods employed by clinicians, whereby a VF is not considered to be glaucomatous until three VF tests in a row have reached the threshold for classification of glaucoma. Another common feature (in about 38% of cases) was where the probability remained low for a while and then began to fluctuate from one time point to the next before settling on a high probability of glaucoma, shortly before the clinician classifies the VF as converted (Fig. 8b). This may be useful in that the fluctuation may be an early indicator that the VF is about to convert, as it tends to begin some time before the clinician makes the decision. Another interesting result, observed in about 17% of cases, was where BN1 classified the VF as glaucomatous from the outset (Fig. 8c). This could be due to the fact that the classifier has discovered a feature not used by the clinicians and has in fact correctly identified the early onset of glaucoma. Examples were also noticed during the experiments where the classifications are less clear cut. For example, Fig. 8d implies that the BN classifier has failed to classify the VF successfully before or after conversion, though it must be stressed that such cases were uncommon (less than 13% of cases).
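The lag between the classifier and the clinician can be made concrete with a small sketch of the three-in-a-row rule mentioned above. The per-test abnormality flags, the 0.5 probability cut-off and the function names are assumptions for illustration; the actual AGIS-based criteria are those of Section 5.1.

```python
def conversion_index(abnormal_flags, run_length=3):
    """Index of the test that completes the first run of `run_length` abnormal tests, or None."""
    run = 0
    for i, abnormal in enumerate(abnormal_flags):
        run = run + 1 if abnormal else 0
        if run == run_length:
            return i
    return None


def first_crossing(probabilities, threshold=0.5):
    """Index of the first test whose predicted probability of glaucoma exceeds the threshold."""
    for i, p in enumerate(probabilities):
        if p > threshold:
            return i
    return None


flags = [False, False, True, False, True, True, True, True]
probs = [0.05, 0.10, 0.20, 0.90, 0.95, 0.97, 0.99, 0.99]
lead = conversion_index(flags) - first_crossing(probs)  # visits by which the classifier leads
```

In this made-up series the classifier crosses the cut-off at the fourth visit while the rule only confirms conversion at the seventh, which is the kind of pattern shown in Fig. 8a.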

6.2. Analysis of STC for temporal classification

We now explore how the different classifiers perform on temporal VF data. This includes the STC, which makes use of our spatio-temporal learning algorithms described in Section 4. Fig. 9 shows the ROC curves for the different BN classifiers, including the STC with a maximum of two and three parents (a maximum of four parents would result in more parameters than data observations). It can be seen that the worst performers on this dataset are naïve Bayes and TAN (with respective AUCs of 0.77 and 0.79). We also carried out the experiment with BN1 and found that the AUC was only 0.76 (we have therefore omitted it from the figure for clarity). This could be due to there being more data than in the static dataset, so that the more complex models are less prone to over-parameterisation. The BN classifier with two parents (BN2) scores an AUC of 0.81. However, the STCs with two and three parents do better, with AUCs of 0.84 and 0.85, respectively.
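The remark about four parents can be checked with a quick count. The sketch below assumes that every parent is a four-state discretised variable and counts the free parameters of a single node's conditional probability table; how the authors counted parameters is not stated, so this is only one plausible reading.

```python
def free_parameters(n_states, parent_states):
    """Free parameters in one node's conditional probability table.

    Each joint configuration of the parents needs a distribution over the
    node's states, which has n_states - 1 free values.
    """
    configurations = 1
    for states in parent_states:
        configurations *= states
    return configurations * (n_states - 1)


counts = {k: free_parameters(4, [4] * k) for k in range(1, 5)}
```

Under this reading a node with three parents has 192 free parameters, whereas a node with four parents has 768, which already exceeds the 588 VF tests in the temporal dataset.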

Figure 10 Misclassification cost (Eq. (2)) vs. k0 for BN classifiers on temporal VF data.

Figure 11 ROC curves of best STC, temporal linear regression and temporal k-nn on temporal VF data including AUC scores.

Fig. 9 shows that there is a degree of overlap between the ROC curves, particularly at high and low specificities. The misclassification versus cost chart in Fig. 10 shows how the classifiers perform when we assign different costs to the misclassifications. This corresponds to considering different values for the threshold t in Eq. (1). After scaling the costs k0 and k1 such that k0 + k1 = 1, it follows that t = k0, the cost of misclassifying a normal eye as glaucomatous. This is reported on the horizontal axis. As such, higher values will tend to correspond to higher specificity. On the vertical axis, the misclassification cost of Eq. (2) is plotted. The plot shows that naïve Bayes and TAN perform worse than the other classifiers for all values of the threshold. This was already quite clear from the ROC curves, and it implies that these models are not recommended for any value of the classification threshold. The BN classifier with two parents performs worse than the temporal models for an intermediate range of the cost, but is competitive with STC2 and STC3 for low and high values of specificity. The latter was not that evident from the ROC curves.
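A cost curve of this kind can be sketched as below. Eqs. (1) and (2) are not reproduced in this section, so the code assumes that Eq. (1) labels a VF as glaucomatous when p(1|x) exceeds t, and that Eq. (2) is the average of k0 per false positive and k1 per false negative. The relation t = k0 then follows from comparing expected costs: predicting glaucoma costs k0(1 - p) and predicting normal costs k1 p, so glaucoma is the cheaper call when p > k0 / (k0 + k1) = k0.

```python
def misclassification_cost(probabilities, labels, k0):
    """Average cost at threshold t = k0, with k1 = 1 - k0 (assumed form of Eq. (2)).

    k0: cost of calling a normal eye glaucomatous (false positive).
    k1: cost of calling a glaucomatous eye normal (false negative).
    """
    k1 = 1.0 - k0
    threshold = k0
    false_pos = sum(1 for p, y in zip(probabilities, labels) if p > threshold and y == 0)
    false_neg = sum(1 for p, y in zip(probabilities, labels) if p <= threshold and y == 1)
    return (k0 * false_pos + k1 * false_neg) / len(labels)


def cost_curve(probabilities, labels, steps=19):
    """Cost evaluated on an evenly spaced grid of k0 values strictly between 0 and 1."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return [(k0, misclassification_cost(probabilities, labels, k0)) for k0 in grid]
```

Plotting the output of `cost_curve` for each classifier on common axes gives a chart of the same kind as Fig. 10, with high-specificity operating points on the right.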

To have a fairer comparison with the STC3 model, we decided to include time-delayed variables for the linear regression and k-nn models. The columns of the database are now the VF variables and IOP at times t − 1 and t, as well as age and gender. The value of k was chosen using 10-fold cross-validation. Fig. 11 shows the ROC curve of the STC with a maximum of three parents (STC3) against linear regression and k-nn, both using the time-shifted variables. It seems that linear regression does somewhat better (AUC = 0.93) than STC and k-nn, both of which have similar ROC curves, with AUCs of 0.89 and 0.88, respectively. When we look at the misclassification versus cost in Fig. 12, we can see that, whilst linear regression does considerably better when the cost is not extreme (between 0.4 and 0.6), the performances of the three methods are relatively similar beyond these costs, at either extreme.
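The time-shifted database can be built per patient as in the following sketch. It assumes each visit is a list of time-varying measurements (VF points followed by IOP) and that age and gender are appended once per row; the function name and data layout are hypothetical.

```python
def add_lagged_columns(visits, static):
    """Build rows of [values at t-1, values at t, static variables] for one patient.

    visits: chronologically ordered list of per-visit measurement lists.
    static: variables that do not change between visits (e.g. age, gender).
    The first visit has no predecessor, so it only appears as a lagged block.
    """
    rows = []
    for previous, current in zip(visits, visits[1:]):
        rows.append(list(previous) + list(current) + list(static))
    return rows
```

For a patient with n visits this yields n − 1 rows, so a small part of the data is lost to lagging; rows from different patients would then be stacked, taking care never to pair visits across patients.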

Generally, because glaucoma prevalence is low and the condition is usually only slowly progressive, a high specificity is often required. This tends to correspond to false positive diagnoses being regarded as more costly. Cost includes harm to the patient through a false diagnosis as well as the actual cost of treatment. This means that the right-hand side of the plots in Figs. 10 and 12 is the most interesting. However, in reality, costs also depend on the individual patient. For instance, in a young patient with advanced disease, the cost of failing to identify progression is greater than in an elderly patient with early disease.

Figure 12 Misclassification cost (Eq. (2)) vs. k0 for STC3, temporal linear regression and temporal k-nn on temporal VF data.

The mean optic-nerve-head angle metric of Eq. (3) over all the networks discovered using cross-validation was equal to 23.73 degrees for both BN2 and STC3, as compared to 32.82 degrees for STC2 and 35.08 degrees for TAN. This roughly reflects the results of the AUC scores (Fig. 9). Note that the mean angle difference metric is in general higher on this dataset than on the static one. This could be due to the fact that the normal and glaucomatous classes are less separated in the temporal dataset, since all patients are on the verge of developing glaucoma. This makes the classification problem harder and might be reflected in the angle difference metric as well as in the AUC scores.

Figure 13 Structure discovered by BN1 when applied to static data, showing the spatial layout of links on the VF. Also included with asterisks are the direct descendants of the class node.

In order to test the stability of the classifiers generated from repeated experiments, we have looked at the variation in structures across the cross-validation runs of STC3. We calculated the mean structural difference of Eq. (4) for all pairwise combinations of networks from the cross-validation experiments of STC3. We then compared this to the distribution of structural differences of 100 randomly generated networks (which had the same restrictions as the STC3, e.g. a maximum of three parents). The mean and standard deviation of the structural differences for the STC3 classifiers were 142.44 and 8.28, respectively. This was significantly smaller than for the random networks, whose structural differences had a mean of 176.88 and a standard deviation of 6.37. We found that a high proportion of links were discovered in only 10% of the cross-validation experiments (that is, in only one of the networks), and this contributed to a higher structural difference. For this reason, in the next section, we investigate network features that are consistently found across cross-validation experiments in order to increase confidence when validating using clinical knowledge.
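The stability analysis can be sketched as follows. Eq. (4) is not reproduced in this section, so the structural difference is assumed here to be the number of directed links present in one network but not the other; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def structural_difference(edges_a, edges_b):
    """Number of directed links that appear in exactly one of the two networks (assumed Eq. (4))."""
    return len(set(edges_a) ^ set(edges_b))


def mean_pairwise_difference(networks):
    """Mean structural difference over all pairs of networks."""
    differences = [structural_difference(a, b) for a, b in combinations(networks, 2)]
    return sum(differences) / len(differences)


def link_frequency(networks):
    """Fraction of networks in which each directed link occurs."""
    counts = Counter(edge for network in networks for edge in set(network))
    return {edge: n / len(networks) for edge, n in counts.items()}
```

With ten cross-validation networks, a link with frequency 0.1 appears in a single fold only; links of that kind inflate the pairwise difference, which is why the discussion concentrates on links with high frequency.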

7. Discussion

One of the advantages of a BN classifier over other classifiers, such as k-nn and linear regression, is the transparency of the model. That is, all of the relationships are made explicit in the network structure. Fig. 13 shows one such structure with respect to the spatial arrangement of the VF. This has been learnt from the non-temporal VF dataset using a BN with a maximum of one parent.

Marked with an asterisk on this figure are the VF points that are discovered with direct links from the class node. These are the most predictive points in classifying the VF as glaucomatous or not. Learning these links can be thought of as a form of feature selection, similar for example to a selective naïve Bayes classifier [35]. Interestingly, some of the discovered VF points reflect a common feature of the early stages of glaucoma called the ‘nasal step’, where a reduced sensitivity of particular VF points around the nasal and superior peripheral areas indicates the early onset of glaucoma (see Fig. 2 for anatomical labels). The frequency with which these links occur in the networks learnt during cross-validation varies from 40% to 80%, with an average of 60%. This indicates a fairly high confidence in the feature. Also notice how some of the points marked with an asterisk appear correlated to others, indicating that some interaction between these points could assist classification, whilst others appear to assist classification independently of their relationship to other points, particularly in the superior temporal area of the VF (temporal in the anatomical sense). VF points in this area were associated with the class node independently of other points in an average of 80% of the networks during cross-validation, increasing the confidence in the discovered feature. The low correlation between these points is supported by clinical evidence, as the points in this area have a larger difference between one another in optic-nerve-head angle than in other areas (Fig. 2). Some other discovered features were unexpected, such as VF points in the arcuate paracentral VF, which are not conventionally thought to be important points for classification. These features were found in up to 80% of the networks. This will be followed up in further research and may show the potential of BN models in learning links that can inform clinicians of interesting patterns in clinical data.

Figure 14 Network structures on temporal VF data using BN2 (top) and STC3 (bottom). The grey lines represent first order temporal links, the unfilled circles denote autoregressive links and the asterisks represent direct descendants of the class node.

Similarly, Fig. 14 shows the network structures learnt from the temporal dataset using a BN with a maximum of two parents and the STC with a maximum of three parents. It is evident, first of all, that both networks have a distinct spatial nature, as would be expected. The asterisks on the plot mark the direct links from the class node. Notice that these are far fewer than those discovered from the non-temporal data (Fig. 13). This is likely to be due to the IOP variable, which was available in the temporal data and has been found to be a strong risk factor for glaucoma. In fact, the mean number of direct descendants of the class node was found to increase from 4.9 to 8.3 when the experiment was repeated with the IOP variable removed. In addition, Fig. 14 shows the autoregressive links (those links from one variable to itself at subsequent time points) marked with unfilled circles and temporal links in grey. Amongst these links, it is interesting to note the temporal links pointing to an asterisk. These correspond to the situation where a temporal link was found between a VF variable at time t − 1 and at time t, and the VF variable at time t was also the child of a non-temporal link from the class node. Therefore, the probability distribution of the class node is governed by the ‘explaining away effect’ [36], where the observation of one parent of a node (here the VF point at time t − 1) will affect the posterior distribution of another parent of that node (here the class node). This feature was discovered in 70% of the networks. Obviously, this type of interaction cannot be modelled by non-temporal classifiers, and the ability of the STC model to capture it might account for the improvement in classification performance when temporal links are included (Fig. 9).

Whilst providing clear benefits to the model, the inclusion of temporal links can increase the risk of spurious correlation between variables. For example, it was noted that in some of the STC networks, like the one in Fig. 14, the blind spot, which should be independent of other VF variables, was linked to other variables. However, links associated with the blind spot were found consistently in less than 20% of the networks, with the exception of immediate neighbours of the blind spot, where the frequency was more than 50%. This phenomenon is supported by clinical evidence that an enlargement of the blind spot (peripapillary atrophy) is a feature of glaucoma.
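The ‘explaining away’ effect [36] referred to above can be illustrated with a three-node toy network in which the class node and a VF point at time t − 1 are both parents of the same VF point at time t. All variables are binary and every probability below is invented for illustration; none is taken from the learnt models.

```python
from itertools import product

P_CLASS = {0: 0.6, 1: 0.4}   # hypothetical prior on the class node (1 = glaucoma)
P_PREV = {0: 0.7, 1: 0.3}    # hypothetical prior on the VF point at t-1 (1 = abnormal)
# Hypothetical P(VF point at t is abnormal | class, VF point at t-1).
P_CHILD = {(0, 0): 0.05, (1, 0): 0.70, (0, 1): 0.80, (1, 1): 0.90}


def joint(c, prev, cur):
    """Joint probability of one full assignment under the toy network."""
    p_abnormal = P_CHILD[(c, prev)]
    return P_CLASS[c] * P_PREV[prev] * (p_abnormal if cur == 1 else 1.0 - p_abnormal)


def posterior_class(evidence):
    """P(class = 1 | evidence), where evidence maps 'prev' and/or 'cur' to observed values."""
    totals = {0: 0.0, 1: 0.0}
    for c, prev, cur in product((0, 1), repeat=3):
        assignment = {"prev": prev, "cur": cur}
        if all(assignment[name] == value for name, value in evidence.items()):
            totals[c] += joint(c, prev, cur)
    return totals[1] / (totals[0] + totals[1])


p_without = posterior_class({"cur": 1})
p_with = posterior_class({"cur": 1, "prev": 1})
```

With these made-up numbers, an abnormal point at time t raises the probability of glaucoma from the prior of 0.4 to about 0.65, but once the same point is known to have been abnormal at t − 1 the probability drops back to about 0.43: the earlier defect accounts for the current one. A classifier without temporal links cannot make this adjustment.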

The STC model brings many benefits to the glaucoma community. First of all, it allows one to easily combine different types of data, for example VF, IOP and optic-nerve-head structures. These measurements are rarely useful when taken in isolation, so a model that combines them into a common framework whilst making explicit the relationships between them is highly useful to clinicians. Secondly, the STC allows one to incorporate the temporal aspect of the data in the model. This will help researchers model the disease process and learn about its pathogenesis, which could result in more accurate and precise estimates of the rate of progression and in the identification of risk factors for progression (indeed, the STCs will identify some of these themselves) and responses to therapeutic intervention. BN models are similar in nature to the way in which clinicians work: various examinations are made and the result of each is added to ‘the clinical picture’ to either increase or decrease the probability of disease being present or of disease progressing. As such, the network provides the clinician with probabilities for abnormality. These are clinically more meaningful than discrete thresholds on scores, which are currently used to determine glaucoma conversion.

8. Conclusion and future work

In this paper, we have investigated a number of different classifiers for identifying deterioration in the VF associated with glaucoma. We have focussed on BN classifiers due to their ability to explicitly model the relationships between variables. Within the family of BN classifiers, we have introduced a spatio-temporal classifier (STC). We have shown how this classifier leads to an improvement over existing BN classifiers, whilst being comparable to other statistical classifiers. We have tested the classifiers on two VF datasets, one non-temporal and one temporal, and have shown how the resulting BN classifiers can be used to help understand the nature of VF deterioration in the form of network structure analysis and inference. We have identified within the structures various characteristics including the ‘nasal step’, whereby certain areas of the VF indicate the onset of glaucoma. Inference has shown that there is potential to understand how clinicians come to their decisions and possibly to use this information to improve upon the current classification algorithms for VFs. We have begun experiments on some incoming data, which include more clinical variables such as IOP and medication. We have found interesting results whereby the glaucoma converter class node is regulated by IOP, which is in turn regulated by whether a patient receives medication. The medication node itself is regulated by the glaucoma node, and so a cycle exists over time (see Fig. 15).

Figure 15 A temporal cycle whereby IOP regulates glaucoma which in turn regulates medication.

If someone exhibits high IOP, it is likely to be related to the risk of converting to glaucomatous VF loss. If someone suffers from glaucoma, they are likely to be given medication (e.g. to lower IOP), resulting in their IOP dropping by their next visit. This means that in many cases a low IOP is observed despite the onset of glaucoma. The temporal nature of this can easily be modelled by temporal BNs. However, such temporal cycles cannot be modelled with static models. The inclusion of spatial and temporal relationships within the STC has been shown to improve the classification of VF data. The classifier is trained on the decision made by clinicians. This requires a repeat of three abnormal fields before glaucoma is diagnosed. Note that the first two abnormal fields are currently classified as healthy and are not re-classified after conversion. This might explain the fluctuations in the probability of glaucoma found in some of our experiments (Fig. 8). We intend to investigate this in a future study by applying the AGIS score to VFs after conversion. The performance of the classifier could be improved by modifying the criterion for conversion to take account of this. Alternatively, one could submit only normal patient data and allow our models to identify VFs that are significantly different from normal.
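The reason a static BN cannot hold the cycle between IOP, glaucoma and medication is that a BN must be a directed acyclic graph. The sketch below checks this for the three-node cycle and for a version unrolled over visits. Which links carry a time lag is an assumption: here only the link from medication to IOP is taken to act at the next visit, following the description of IOP dropping by the next visit.

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {node: 0 for node in nodes}
    for _, child in edges:
        indegree[child] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return removed == len(nodes)


NAMES = ("IOP", "Glaucoma", "Medication")
STATIC_EDGES = [("IOP", "Glaucoma"), ("Glaucoma", "Medication"), ("Medication", "IOP")]


def unroll(n_slices):
    """Copy the nodes once per visit; the medication-to-IOP link points to the next visit."""
    nodes = [(name, t) for t in range(n_slices) for name in NAMES]
    edges = []
    for t in range(n_slices):
        edges.append((("IOP", t), ("Glaucoma", t)))
        edges.append((("Glaucoma", t), ("Medication", t)))
        if t + 1 < n_slices:
            edges.append((("Medication", t), ("IOP", t + 1)))
    return nodes, edges


static_ok = is_acyclic(NAMES, STATIC_EDGES)
temporal_ok = is_acyclic(*unroll(3))
```

The static graph fails the check while the unrolled one passes, which is why the feedback loop in Fig. 15 needs a temporal model.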

Since we aim to learn temporal models from VF data, a natural extension of the work in this paper would be to see how STC models can be used to forecast future states of the VF given previous observations. It would also be extremely valuable if we could use these models to predict future class states given the previous states of a VF. Furthermore, we intend to integrate prior clinical knowledge into the model. For example, we can integrate expert knowledge regarding the structure of nodes as well as the angle from each VF point to the optic nerve head. BNs are designed to facilitate this type of integration.

Acknowledgments

The work of Allan Tucker on this project was funded by the EPSRC, GR/R35018/01. The work of Veronica Vinciotti on this project was funded by the BBSRC, grant 100/EGM17735. We would like to thank Nick Strouthidis for his help with collating the visual field dataset. We would also like to thank the three anonymous referees for their constructive comments.

References

[1]. Swift S, Liu X. Predicting glaucomatous visual field deterioration through short multivariate time series modelling. Artif Intell Med 2002;24(1):5–24.

[2]. Fitzke FW, Hitchings RA, Poinoosawmy D, McNaught AI, Crabb DP. Analysis of visual field progression in glaucoma. Br J Ophthalmol 1996;80(1):40–8.

[3]. Heijl A, Lindgren G, Lindgren A. Extended empirical statistical package for evaluation of single and multiple fields in glaucoma: Statpac 2. In: Mills R, Heijl A, editors. Perimetry update 1990/1. Kugler Publications; 1990. p. 303–15.

[4]. Anderssen KE, Jeppesen V. M.Sc. thesis. Classifying visual field data, Technical report. Aalborg University, Denmark; 1998.

[5]. Ibanez MV, Simo A. Spatio-temporal modelling of perimetric test data, Technical report 55. Department of Mathematics, University Jaume I. Castellon, Spain; 2003.

[6]. Hilton S, Katz J, Zeger S. Classifying visual field data. Stat Med 1996;15(1):1349–64.

[7]. Hothorn T, Lausen B. Bagging tree classifiers for laser scanning images: a data and simulation-based strategy. Artif Intell Med 2003;27(1):65–79.

[8]. Chan K, Lee W, Sample PA, Goldbaum MH, Weinreb RN, Sejnowski TJ. Comparison of machine learning and traditional classifiers in glaucoma diagnosis. IEEE Trans Biomed Eng 2002;49(9):963–73.

[9]. Goldbaum MH, Sample PA, Chan K, Williams J, Lee T, Blumenthal E, Girkin CA, Zangwill LM, Bowd C, Sejnowski T, Weinreb R. Comparing machine learning classifiers for diagnosing glaucoma from standard automated perimetry. Invest Ophthalmol Vis Sci 2002;43(1):162–9.

[10]. Heijl A, Lindgren G, Olsson J. Normal variability of static perimetric threshold values across the central visual field. Arch Ophthalmol 1987;105(1):1544–9.

[11]. Katz L, Sommer A. Asymmetry and variation in the normal hill of vision. Arch Ophthalmol 1986;104(1):65–8.

[12]. Weber J, Rau S. The properties of perimetric thresholds in normal and glaucomatous eyes. Ger J Ophthalmol 1992;1(1):79–85.

[13]. Hand DJ. Construction and assessment of classification rules. Wiley; 1997.

[14]. Vinciotti V, Hand DJ. Scorecard construction with unbalanced class sizes. J Iran Stat Soc 2003;2(2):189–205.

[15]. Hand DJ, Vinciotti V. Local versus global models for classification problems: fitting models where it matters. Am Stat 2003;124(2):131.

[16]. Garway-Heath DF, Fitzke F, Hitchings RA. Mapping the visual field to the optic disc. Ophthalmology 2000;107(2):1809–15.

[17]. Ramoni M, Sebastiani P. Bayesian methods. In: Berthold M, Hand D, editors. Intelligent data analysis, an introduction. Springer; 1999.

[18]. Cooper GF, Herskovitz E. A Bayesian method for the induction of probabilistic networks from data. Machine Learn 1992;9(2):309–47.

[19]. Cooper GF. The computational complexity of probabilistic inference using Bayesian belief networks. Artif Intell 1990;42(2):393–405.

[20]. Henrion M. Propagating uncertainty in Bayesian networks by probabilistic logic sampling. In: Lemmer J, Kanal L, editors. Proceedings of the 2nd Annual Conference on Uncertainty in AI. Elsevier Science; 1986. p. 149–63.

[21]. Pearl J. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann; 1988.

[22]. Friedman N, Geiger D, Goldszmidt M. Bayesian network classifiers. Machine Learn 1997;29(2):131–63.

[23]. Domingos P, Pazzani M. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learn 1997;9(2):309–47.

[24]. Hand DJ, Yu K. Idiot’s Bayes – not so stupid after all? Int Stat Rev 2001;69(2):385–98.

[25]. Kohavi R. Scaling up the accuracy of naïve-Bayes classifier: a decision-tree hybrid. In: Simoudis E, Han J, Fayyad U, editors. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. AAAI Press; 1996. p. 202–7.

[26]. Langley P, Iba W, Thompson K. An analysis of Bayesian classifiers. In: Proceedings of the 10th National Conference on Artificial Intelligence. AAAI Press; 1992. p. 223–8.

[27]. Dagum P, Galper A, Horvitz E. Dynamic network models for forecasting. In: Dubois D, Wellman M, D’Ambrosio B, Smets P, editors. Proceedings of the 8th Annual Conference on Uncertainty in AI. Morgan Kaufmann; 1992. p. 41–8.

[28]. Tucker A, Liu X, Ogden-Swift A. Evolutionary learning of dynamic probabilistic models with large time lags. Int J Intell Syst 2001;16(5):621–46.

[29]. Tucker A, Garway-Heath D, Liu X. Spatial operators for evolving dynamic probabilistic networks from spatio-temporal data. In: Cantu-Paz E, et al., editors. Proceedings of the Genetic and Evolutionary Computation Conference, Springer-Verlag, Chicago, USA; 2003. p. 12–7.

[30]. Liu X, Swift S, Tucker A. Using evolutionary algorithms to tackle large scale grouping problems. In: Spector L, et al., editors. Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann, San Francisco, USA; 2001. p. 7–11.

[31]. Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Science 1983;220(4598):671–80.

[32]. Grefenstette JJ. Optimization of control parameters for genetic algorithms. IEEE Trans Syst Man Cybernet 1986;16(1):122–8.

[33]. Advanced Glaucoma Intervention Study 2. Visual field test scoring and reliability. Ophthalmology 1994;101(1):1445–55.

[34]. Kamal D, Garway-Heath D, Ruben S, O’Sullivan F, Bunce C, Viswanathan A, Franks W, Hitchings R. Results of the betaxolol versus placebo treatment trial in ocular hypertension. Graefes Arch Clin Exp Ophthalmol 2003;241(1):196–203.

[35]. Langley P, Sage S. Induction of selective Bayesian classifiers. In: Poole D, Lopez de Mantaras R, editors. Proceedings of the 10th Conference on Uncertainty in AI. Morgan Kaufmann; 1994. p. 399–406.

[36]. Wellman MP, Henrion M. Explaining explaining away. IEEE Trans Pattern Anal Machine Intell 1993;15(3):287–307.
