Animal Behaviour 81 (2011) 941–948

Swarm intelligence in humans: diversity can trump ability

Stefan Krause a,*, Richard James b,1, Jolyon J. Faria c,2, Graeme D. Ruxton d,3, Jens Krause e,4

a Department of Electrical Engineering and Computer Science, Lübeck University of Applied Sciences
b Department of Physics, University of Bath
c Institute of Integrative and Comparative Biology, University of Leeds
d Division of Ecology and Evolutionary Biology, University of Glasgow
e Leibniz Institute of Freshwater Ecology & Inland Fisheries, Department of Biology and Ecology of Fishes, Berlin

* Correspondence: S. Krause, Department of Electrical Engineering and Computer Science, Lübeck University of Applied Sciences, Mönkhofer Weg 239, 23562 Lübeck, Germany. E-mail address: [email protected] (S. Krause).
1 R. James is at the Department of Physics, University of Bath, Bath BA2 7AY, U.K.
2 J. J. Faria is at the Institute of Integrative and Comparative Biology, University of Leeds, Leeds LS2 9JT, U.K.
3 G. D. Ruxton is at the Division of Ecology and Evolutionary Biology, University of Glasgow, Glasgow G12 8QQ, U.K.
4 J. Krause is at the Leibniz Institute of Freshwater Ecology and Inland Fisheries, Department of Biology and Ecology of Fishes, Müggelseedamm 310, 12587 Berlin, Germany.

Article history: Received 22 June 2010; Initial acceptance 3 August 2010; Final acceptance 22 December 2010; Available online 3 March 2011; MS. number: 10-00476R

Keywords: decision making; group size; human behaviour; swarm intelligence

We identify some of the possibilities and limitations of human swarm intelligence (SI) using the response of the public to two types of cognitive problems. Furthermore, we propose a simple measure for the quantification of collective information that could form the basis for SI in study populations for specific tasks. Our three main results are (1) that the potential benefits of SI depend on the type of problem, (2) that individual performance and collective performance can be uncorrelated and that a group of individually high performers can be outcompeted by a same-size group of individually low performers, and (3) that adding diversity to a group can be more beneficial than adding expertise. Our results question the emphasis that societies and organizations can put on individual performance to the detriment of diversity as far as teams are concerned. Nevertheless, it is important to point out that while diversity is a necessary condition for effective SI, diversity alone is clearly not sufficient. Finally, we discuss the potential implications of our findings for the evolution of group composition and the maintenance of personality diversity in animals.

0003-3472/$38.00 © 2011 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.anbehav.2010.12.018

In addition to the familiar benefits of living in groups (such as attacking larger prey more successfully; Krause & Ruxton 2002), it has been recognized that grouping can also help with the solution of cognitive problems, a phenomenon known as swarm intelligence (SI; Krause et al. 2010). SI is well documented in social insects where the contrast between the abilities of single individuals and the colony is particularly striking (Seeley 1995; Bonabeau et al. 1999; Camazine et al. 2001; Couzin 2009), but there are also case studies from other taxonomic groups including fish (Sumpter et al. 2008; Ward et al. 2008) and humans (reviewed in Krause et al. 2010). In speaking of swarm intelligence we follow the definition by Krause et al. (2010, page 29): ‘Two or more individuals independently, or at least partially independently, acquire information and these different packages of information are combined and processed through social interaction, which provides a solution to a cognitive problem in a way that cannot be implemented by isolated individuals’. However, in this study the main focus is on the potential value of SI and we allow direct social interactions to be substituted through calculations. After all, once the mechanism for information processing is known, it becomes largely immaterial whether it takes place through real interactions or simulated ones. Here, we assess the utility of swarm intelligence by assuming that individuals would be able to average the information produced by the group: a simple processing rule of individual inputs which has been suggested previously (Simons 2004; Couzin 2009; Faria et al. 2009).

Interest in SI has not been limited to biology but also includes disciplines such as psychology, sociology and economics (Krause et al. 2010). In economics, recent emphasis has been placed on studying the performance of so-called prediction markets, which had considerable success in providing better predictions of economic developments and the outcome of elections than some more conventional methods (Arrow et al. 2008). Prediction markets are tools that are used to predict future events, for example the outcomes of elections, by trading virtual shares (Wolfers & Zitzewitz 2004). In psychology, SI is usually described as group decision making and the performance of small groups is compared to that of high-ability single people. Many studies have demonstrated the superior performance of groups over singletons in a wide range of situations (Kerr & Tindale 2004). The success of SI in prediction markets has been reported to depend on factors such as (1) motivation of participants to provide information, (2) truthful opinions, (3) minimization of partisanship and (4) a suitable mechanism for processing the opinions (Wolfers & Zitzewitz 2004; Arrow et al. 2008). Little attention, however, has been given to the limitations of SI and the consequences of violations of the above factors.

Much of the recent attention regarding SI has been focused on the comparison of group performance and that of high-performing individuals, asking whether the wisdom of the crowd can beat an expert’s opinion (Surowiecki 2004). However, if we keep group size constant we are faced with a more interesting problem. Can groups of lay people beat the experts even when group size is controlled for? And if so, how could this work? The relationship between individual performance and collective performance is a problem of great interest in many areas of life. In team sports it is often, but by no means always, the team with the greatest star power that wins. Outside the sport and entertainment sector this problem becomes an even more important issue when applied to the solution of cognitive problems. An assumption underlying the hiring strategy of many organizations is that a group composed of the strongest individual performers will automatically also make the strongest team. However, this assumption has been called into question by a theoretical study (Hong & Page 2004) which highlighted the issue that individual performance levels are just one factor and that independence and diversity of opinion are other important criteria that need to be taken into account when predicting the problem-solving potential of groups. In fact, Hong & Page’s (2004) model predicted that groups of diverse problem solvers can (under specific conditions) outperform groups of high-ability performers. If this were more generally the case then it could have important implications for the evolution of personality diversity in animals as well, because the group composition-dependent contribution of individuals might provide an explanation of how personality diversity in populations could be maintained.

In this study, we tested Hong & Page’s prediction by presenting visitors to a public exhibition and a university open day with questions that tested their cognitive abilities. Using these data, we investigated the relationship between individual performance and collective performance, which makes it possible to evaluate the relative importance of opinion diversity and individual ability when it comes to the solution of some cognitive problems. We also assessed more generally some of the possibilities and limitations of SI.

EXPERIMENT 1: POSSIBILITIES AND LIMITATIONS OF SI

Methods

Experiment 1 was carried out at the Max Liebermann Haus of the Stiftung Brandenburger Tor (in Berlin, Germany) in connection with a biomimetics exhibition called ‘Prototypen’, which was open to the general public from 24 May to 24 August 2008. Members of the public were invited to answer two questions on a computer screen. The first question (1a) was designed to demonstrate the power of SI whereas the second question (1b) was aimed at identifying some of its limitations. In 1a, visitors needed to estimate how many marbles were in a large glass jar (next to the computer) and the correct answer here was 562. In 1b, they were asked to estimate how many times a coin needs to be tossed for the probability that the coin will show heads each time to be roughly as small as that of winning the German lotto (the correct answer is 24). This second question was designed to show that for certain types of problems SI would not be beneficial because here expert knowledge is likely to be superior to the common sense of even a large number of people.
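The stated answer of 24 for question 1b can be checked with a short calculation. The sketch below assumes that 'winning the German lotto' means matching 6 numbers out of 49; the text does not say which odds were used, so this is an assumption made only for illustration.

```python
from math import comb, log2

# Assumption (not stated in the text): a lotto win means matching 6 of 49 numbers.
lotto_outcomes = comb(49, 6)  # number of equally likely draws

# n tosses all show heads with probability 2**-n.
# Solving 2**-n = 1 / lotto_outcomes gives n = log2(lotto_outcomes).
exact_tosses = log2(lotto_outcomes)
tosses = round(exact_tosses)

print(lotto_outcomes, tosses)
```

Under that assumption the exact solution lies slightly below 24, which is consistent with the wording 'roughly as small'.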

For each question, an introduction screen explained the purpose of the experiment to the participants. All data were collected anonymously. Individual identities were never known because the participants entered their guesses directly into a computer without being observed (therefore no information on age or gender is available).

In some studies, people have been offered a reward for good answers (i.e. guesses that were close to the correct answer) to provide an incentive for truthful reporting (Kerr & Tindale 2004). However, in this scenario, with thousands of people visiting the exhibition, this was not possible. We made the assumption that people who visit an exhibition on biomimetics and pay an entry fee of €4 will be inclined to take the exhibits seriously. To prevent individuals from entering unrealistic guesses we did, however, apply a numerical filter which only allowed values in the range of 40–1500 marbles and 2–9999 coin flips to be entered. People who entered values outside these ranges were asked by the program to enter new values. The data distribution shows that this is not an unreasonable filter to use given that only very few guesses were equal to these limits (4.2% for the first question and 6.3% for the second question).

After entering their guesses, people were provided with feedback on how much their own guess deviated from the real value and how the mean of the guesses of visitors from earlier that day compared to the real value, to demonstrate to visitors the power of collective guesses. Visitors, however, were never shown the real numbers but only bars whose length indicated whether the collective guess had come closer to the real value than their own guess. Nevertheless, this made it possible for visitors to enter a second informed guess (after a time barrier of 30 s) that would have been likely to get closer to the real value. To control for this, we filtered our data by using only those data points that were entered at least 120 s after the previous ones. Of course, this does not guarantee independence of the filtered guesses, but it seems unlikely that many visitors waited 120 s or more before they entered a second guess. The frequency distribution of the original unfiltered data for the first question (N = 6568) showed an unlikely high peak at the correct value. This frequency peak probably came about because some visitors repeatedly entered guesses to improve their performance. However, no effect of this was found after applying a time distance filter of 120 s (N = 2057). The same holds for the coin-flipping problem where the unfiltered data contained N = 6266 guesses and the filtered data N = 1953 guesses. The shapes of frequency distributions for time intervals ≥ 120 s were almost identical. Mean, standard deviation, skewness and kurtosis of the sets of data points for these time intervals are very similar (time filter of 120 s: N = 2057, mean = 553.6, SD = 342.0, skewness = 0.655, kurtosis = 0.100; time filter of 240 s: N = 1319, mean = 554.4, SD = 346.7, skewness = 0.642, kurtosis = 0.087). Since the sizes of the data sets decrease with increasing time intervals, we used the data resulting from a filter of 120 s in our analysis.
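The time filter described above can be sketched in a few lines. The data layout (a time-sorted list of `(timestamp, guess)` pairs) and the function name `time_filter` are hypothetical, and keeping the very first entry is an assumption; the text only states that a guess is kept if it was entered at least 120 s after the previous one.

```python
def time_filter(entries, min_gap=120.0):
    """Keep guesses entered at least `min_gap` seconds after the previous entry.

    `entries` is a list of (timestamp_in_seconds, guess) pairs sorted by time.
    """
    kept = []
    previous_time = None
    for timestamp, guess in entries:
        # Assumption: the first entry has no predecessor and is kept.
        if previous_time is None or timestamp - previous_time >= min_gap:
            kept.append(guess)
        # The gap is always measured to the previous entry, kept or not.
        previous_time = timestamp
    return kept


# Made-up example data, not taken from the study.
entries = [(0, 500), (30, 560), (200, 610), (250, 562), (400, 300)]
filtered = time_filter(entries)
print(filtered)
```

In this made-up example the guesses entered 30 s and 50 s after their predecessors are dropped, which is the situation the filter is meant to catch (a visitor re-entering a guess after seeing the feedback bars).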

Figure 1. Frequency distributions for visitor guesses of questions (a) 1a (marbles) and (b) 1b (coin-flipping problem). The interval size is 50, and the interval limits are chosen such that the correct values (562 for 1a and 24 for 1b) are the midpoints of their intervals. For both distributions a time filter of 120 s was applied. For (b), all guesses above 1500 were replaced by the value 1500. [Axes: guess (x), relative frequency (%) (y).]

For question 1b, visitors were allowed to enter a much larger range of values than for 1a. The restrictions for 1a might favour reasonable results. Therefore, we introduced an analogous restriction for 1b by replacing all guesses above 1500 by the value 1500. This restriction had a major effect on the mean of the guesses, which was 1196.3 for the unrestricted data, and 498.3 for the restricted data. However, it did not have any influence on the tendencies shown in Figs 2 and 3 in the Results. In our analysis we used the restricted data for question 1b. As for question 1a, mean, standard deviation, skewness and kurtosis for time intervals ≥ 120 s were very similar (time filter of 120 s: N = 1953, mean = 498.3, SD = 581.4, skewness = 0.718, kurtosis = −1.128; time filter of 240 s: N = 1253, mean = 499.5, SD = 573.4, skewness = 0.713, kurtosis = −1.099).
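The restriction applied to question 1b (replacing every guess above 1500 by 1500) is simple to express, and a toy example shows why it can move the mean so strongly. The function name `cap_guesses` and the sample values are made up for illustration.

```python
from statistics import mean


def cap_guesses(guesses, upper=1500):
    """Replace every guess above `upper` by `upper`; other guesses are unchanged."""
    return [min(g, upper) for g in guesses]


# Made-up guesses, not taken from the study.
raw = [6, 24, 100, 1000, 9999]
capped = cap_guesses(raw)

print(mean(raw), mean(capped))
```

A single very large guess dominates the uncapped mean, while the capped mean is far less sensitive to it.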

Given that we had a large number of guesses (about 2000) for each question, we regarded our data sets as populations and investigated the influence of group size on the quality of collective guesses by drawing a number of groups of certain sizes (between 2 and 80). The sample sizes were smaller than 5% of the population size. Statistical textbooks (for example, Dorofeev & Grant 2006) suggest that in this case we may safely ignore the fact that our population is finite. We drew sample groups without replacement and for each group determined the collective error by computing the unsigned deviation of the group mean from the correct value. Also, to compare the groups’ achievements to that of single individuals from each sample group we picked individual guesses in the following way. First, we sorted the guesses in a group according to their individual error, that is, their unsigned deviation from the correct value. For a guess g and the correct value c we computed this deviation in the following way: d = |g − c|. Then we randomly picked one individual guess from the top half, and one from the top quarter. This simulates a situation where some information is available about the experience or expertise of individuals. It is an interesting question whether, for a certain problem, we should rely on a group’s collective guess, that is, on SI, or rather on expert knowledge. By comparing the performance of groups to that of single ‘experts’ in these groups, we can draw conclusions about the potential of SI in our sets of data points for each of the two questions.
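The sampling procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `group_errors` is hypothetical, and how the top half and top quarter are formed when the group size is not divisible by 2 or 4 is an assumption (integer division, with at least one member).

```python
import random


def group_errors(population, group_size, correct, rng):
    """Draw one group and return (collective, top-half, top-quarter) errors."""
    group = rng.sample(population, group_size)  # sampling without replacement
    collective_error = abs(sum(group) / group_size - correct)

    # Sort guesses by individual error d = |g - c|, best guess first.
    ranked = sorted(group, key=lambda g: abs(g - correct))
    # Assumption: sizes are rounded down, but never below one member.
    top_half = ranked[: max(1, group_size // 2)]
    top_quarter = ranked[: max(1, group_size // 4)]

    half_error = abs(rng.choice(top_half) - correct)
    quarter_error = abs(rng.choice(top_quarter) - correct)
    return collective_error, half_error, quarter_error


# Made-up population of four guesses; drawing all four makes the result easy to check.
rng = random.Random(1)
collective, half, quarter = group_errors([500, 600, 300, 1000], 4, 562, rng)
print(collective, half, quarter)
```

Repeating this draw many times per group size and counting how often each error falls below a fixed threshold gives the kind of relative frequencies reported in the Results.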

We can go one step further by investigating the relationship between individual performance and collective performance, testing Hong & Page’s (2004) prediction that opinion diversity can potentially outweigh individual ability when it comes to the solution of some cognitive problems. If group performance does not simply depend on the individual performance but also on the diversity of guesses, then it should not be possible to improve (significantly) a group’s performance by removing the poor guesses and just keeping the good ones. To find out whether this is the case for our experiment, we sorted all guesses according to their individual errors. Then, for each possible error threshold t we split the whole population into two sets, one that contained guesses with an error less than or equal to t, and one that contained the remaining guesses (with an error greater than t).
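The threshold split can be illustrated with a small sketch. The function name `split_errors` and the sample guesses are made up; the function returns absolute rather than relative errors, which is a simplification chosen here for readability.

```python
def split_errors(guesses, correct, threshold):
    """Split guesses at an individual-error threshold and return the
    unsigned error of the mean of the 'good' set and of the 'poor' set."""
    good = [g for g in guesses if abs(g - correct) <= threshold]
    poor = [g for g in guesses if abs(g - correct) > threshold]

    def mean_error(values):
        # An empty set has no mean; report None in that case.
        return abs(sum(values) / len(values) - correct) if values else None

    return mean_error(good), mean_error(poor)


# Made-up guesses for a jar with 562 marbles.
good_error, poor_error = split_errors([500, 500, 540, 300, 900], 562, 62)
print(good_error, poor_error)
```

In this toy example the individually poor guesses (300 and 900) lie on opposite sides of the correct value and partly cancel, so their mean is closer to 562 than the mean of the individually good guesses, which all underestimate.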

Results

The mean and median for guesses for the marbles (experiment 1a) were 553.6 and 516, respectively, and came within 1.5% and 8.2% of the correct value of 562. For the coin-flipping problem (1b), however, where the correct answer was 24, values of 498 and 100 (for mean and median) indicate that the collective guesses did not get anywhere near the real value. The frequency distributions are shown in Fig. 1.

It might have been expected that the guesses in experiment 1a would be normally distributed. However, this is not the case even if we smooth the guesses that are close to the boundaries of the allowed input interval (40–1500). To test this we replaced all guesses that have a distance greater than 500 from the mean, that is, guesses outside the range 553 ± 500, with random values from a normal distribution with the same mean and the same variance as the set of all guesses. Both the original and the smoothed set of guesses are not normally distributed (Shapiro–Wilk test: original set: W = 0.9533, P < 0.0001; smoothed set: W = 0.9883, P < 0.0001). One reason for this may be the decimal number system. Many people were attracted by round numbers. About 40% of the guesses were multiples of 100, and about 69% were multiples of 10. The three most frequent values were 500 (7.5% of the guesses), 1000 (7.0%) and 300 (3.9%). As can be seen in Fig. 1a, the distribution is slightly asymmetric (skewness = 0.655).

The frequency distribution for experiment 1b is clearly non-normal (Shapiro–Wilk test: W = 0.5531, P < 0.0001). Here, fewer values than in experiment 1a were multiples of 100 (29%) and of 10 (42%). The three most frequent values were 1000 (10% of the guesses), 24 (6%) and 6 (4.5%).

Figure 2. Relative frequencies of values with errors below a fixed threshold as a function of group size in (a) experiment 1a (marbles) and (b) experiment 1b (coin-flipping problem). The errors are mean unsigned deviations of sample means (black diamonds), and of randomly picked individuals from the top halves (grey triangles) and of the top quarters (grey circles) of the samples, from the real value of 562 marbles in (a) and of 24 coin flips in (b). The error thresholds were 10% for (a) and 50% for (b). For each data point 1 million replicates were carried out. [Axes: group size (x), relative frequency (%) (y).]

For a number of different group sizes (between 2 and 80) we drew 1 million sample groups without replacement from the set of all guesses for each question. For each drawn group we computed the errors of the collective guess, an individual from the top half and an individual from the top quarter of this group as described above. Here, ‘error’ means the unsigned deviation from the correct value. Figure 2a shows the relative frequency of guesses for question 1a with an error less than 10% as a function of group size. For example, groups with 20 members have a 50% chance of providing a result that has an error of no greater than 10% and it takes 50 people to reach a 75% chance. For very small group sizes (below 10) the relative frequencies of good guesses grow faster for randomly selected individuals from the top half or the top quarter than for collective guesses. However, the relative frequencies for such individuals are limited by (multiples of) the percentage of high-performing individuals in the population. As a consequence, for group sizes ≥ 40 even individuals from the top quarter were outperformed by the collective guesses. The shape of the curve for the collective guesses in Fig. 2a is not surprising. Each group mean can be regarded as an estimation of the population mean (i.e. the mean of all guesses). It is a well-known statistical fact that the accuracy of such an estimation increases with the square root of the sample size. Since the mean of all guesses is close to the correct value, the percentage of collective guesses below the error threshold also roughly increases with the square root of the group size.
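The role of the square root of the group size can be made concrete with a rough back-of-the-envelope check. The sketch below is not the authors' resampling procedure: it assumes that group means are approximately normally distributed around the population mean with standard error SD/√n (a central-limit approximation that ignores the non-normal shape of the guesses and the finite population). The population mean (553.6) and SD (342.0) are the values reported for question 1a; the function name `hit_probability` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def hit_probability(group_size, pop_mean=553.6, pop_sd=342.0,
                    correct=562, tolerance=0.10):
    """Approximate chance that a group mean lies within `tolerance` of `correct`.

    Assumption: group means are normal with standard error pop_sd / sqrt(n).
    """
    standard_error = pop_sd / sqrt(group_size)
    sampling_dist = NormalDist(pop_mean, standard_error)
    margin = tolerance * correct
    return sampling_dist.cdf(correct + margin) - sampling_dist.cdf(correct - margin)


for n in (20, 50):
    print(n, round(hit_probability(n), 2))
```

Under these assumptions the approximation gives roughly 0.53 for groups of 20 and 0.75 for groups of 50, in line with the approximate 50% and 75% chances quoted above.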

For the coin-flipping problem (1b), individuals from the top half and top quarter were limited in the same way as described above (Fig. 2b). Nevertheless, they were consistently more accurate than the collective guesses. If we are interested in getting a guess for the second problem with an error of less than 50% from a group of any size, we need to select a guess from the top quarter rather than relying on the collective mean.

For each question in experiment 1 we split the whole population of guesses into two sets: one that contained guesses with an error (i.e. unsigned deviation from the correct value) less than or equal to a certain error threshold t, and one that contained the remaining guesses (with an unsigned deviation greater than t). Figure 3 shows the unsigned deviations of the means of both sets as a function of the error threshold. For the marble problem (1a) no clear relationship between the quality of individual guesses and the quality of collective guesses is observed (Fig. 3a). In particular, with an error threshold that splits the population in half (indicated by the 50% line in Fig. 3a), the half with the good individual guesses yields a surprisingly bad collective result compared to the half with poor individual guesses. The curves in Fig. 3a show a number of jumps, which are caused by the fact that people were attracted by round numbers. The value 500 had a relative frequency of 7.5% while only 0.2% of the guesses were in the range 491–499 and 0.4% in the range 501–509. As a consequence, a small shift of the error threshold, for example from 62 to 63, could have a major effect on the relative error of the two sets. With a threshold of 62 the guesses equal to 500 belong to the set of good guesses, and with a threshold of 63 they belong to the poor guesses.

For the coin-flipping problem (1b), however, we see that expert knowledge was consistently superior to collective guesses (Fig. 3b). Here, the curve also contains jumps. However, because of the large scale of the Y axis and the monotonic increase of the curve, the jumps in Fig. 3b are hardly visible.

For the marble problem it is not clear whether most of the good guesses really are from ‘experts’. It might be the case that ‘nonexperts’ were attracted by round numbers such as 500 or 600. For the coin-flipping problem, which obviously requires expert knowledge, it should be reasonable to regard good guesses as guesses from experts (people with previous experience of probability calculations). Most visitors answered both questions. Therefore, we can also analyse the relationship between the individual ability regarding question 1b and the quality of collective guesses for question 1a. Figure 3c shows the performance of groups for question 1a where the error threshold t used to split the population was computed in terms of the performance for question 1b. Again, as in Fig. 3a, no clear relationship between the individual abilities and the quality of collective guesses can be concluded. Also, when the population was split in half, the nonexperts performed better than the experts. However, it can be seen that small groups of people that performed poorly on question 1b did not produce reliable results. This is because of the large variance of their guesses. Small groups of experts seem to produce more stable results, although they underestimate the correct value.

The numerical filter applied to the guesses for question 1a might have had some influence on the results. To check the robustness of our results we repeated the analysis with a data set where all values equal to 40 (the lower limit) were replaced by uniformly distributed random numbers in the range 1–40, and where all values equal to 1500 (the upper limit) were replaced by uniformly distributed random numbers in the range 1500–2000. The results showed the same tendency as the results for the original data set.
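The robustness check described above can be sketched as follows. The function name `spread_boundary_values` is hypothetical, and drawing integer-valued replacements is an assumption; the text only specifies uniformly distributed random numbers in the stated ranges.

```python
import random


def spread_boundary_values(guesses, rng, lower=40, upper=1500,
                           new_low=1, new_high=2000):
    """Replace guesses sitting exactly on a filter limit by random values.

    Guesses equal to `lower` become uniform draws from new_low..lower and
    guesses equal to `upper` become uniform draws from upper..new_high.
    Assumption: replacements are integers.
    """
    result = []
    for g in guesses:
        if g == lower:
            result.append(rng.randint(new_low, lower))
        elif g == upper:
            result.append(rng.randint(upper, new_high))
        else:
            result.append(g)  # guesses inside the limits stay unchanged
    return result


# Made-up guesses; a seeded generator keeps the example reproducible.
rng = random.Random(0)
spread = spread_boundary_values([40, 562, 1500, 300], rng)
print(spread)
```

Rerunning the group-size and threshold analyses on the modified data set then shows whether the conclusions depend on the pile-up of guesses at the filter limits.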

EXPERIMENT 2: RELEVANCE OF INTERINDIVIDUAL DIFFERENCES

Methods

Experiment 2 was carried out at the University of Leeds on 11 October 2008 during an Open Day. Visitors (N = 50), mostly school pupils aged 16–18 and their parents, were asked to estimate the number of beans in 10 jars. Each jar contained a randomly generated number of beans between 80 and 1000 (309, 865, 422, 784, 234, 379, 116, 544, 214 and 612). The jars had different shapes (e.g. a round-bottomed flask, a cylinder and a cuboid) to reduce dependencies between the guesses that might result from the estimation of relative differences between the numbers of beans in jars of the same shape. Everyone estimated the 10 jars in the same order. Unlike in experiment 1, the participants did not get feedback, and they were not allowed to enter multiple guesses. Therefore, no filtering of the guesses was carried out. All data were collected anonymously and with the permission of the participants.

In this second experiment, we further explored the relationship between individual performance and collective performance by developing performance profiles of individuals over multiple guesses of different quantities. This experiment allowed a more detailed analysis of the relevance of individual ability for SI than experiment 1 because performance profiles allow us to sort our study population reliably into good and poor performers. Thereby we can specifically test the contribution that individuals make to the collective performance relative to their own performance level.

By analysing individual guesses for multiple jars, we were able to measure the individual performance and the potential individual contribution to collective results in the context of a group. We used this information in an attempt to improve collective guesses by assigning weights to individuals. Our aim was to find out, given a problem where SI is beneficial, whether the weights should be assigned according to the individual’s previous performance or whether the weights should rather be assigned according to the individual’s potential contribution to collective results. In other words, to increase collective performance, is it better to rely on individual performance levels or rather on diversity? In the following paragraphs, we explain this issue in more detail.

S. Krause et al. / Animal Behaviour 81 (2011) 941–948

[Figure 3, panels (a)–(c). Panels (a) and (b): relative error (%) against error threshold for good guesses. Panel (c): group mean against error threshold for group of good performers on question 1b; correct value = 562.]

Figure 3. Quality of the group mean as a function of the quality of individual guesses in a group for (a) the marble problem and (b) the coin-flipping problem. For each error threshold t the individual guesses were split into the ‘good’ ones (absolute error ≤ t) and the ‘poor’ ones (absolute error > t). The black curve shows the unsigned relative error of the mean of the good guesses and the grey curve shows the unsigned relative error of the mean of the poor guesses. Dashed vertical lines indicate population percentages belonging to data points of the black curve. (c) The group mean for the marble problem as a function of individual performance on the coin-flipping problem (without the filter of 1500). The black curve shows the mean of guesses that came from individuals who provided ‘good’ guesses for the coin-flipping problem and the grey curve shows the mean of the remaining guesses.
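The threshold split described in the caption of Figure 3 can be sketched as below. The function names and the guesses are hypothetical; only the correct value of 562 is taken from the label in panel (c). The example is deliberately constructed so that the ‘good’ guesses all err in the same direction.

```python
def split_by_threshold(guesses, correct, t):
    """Split guesses into 'good' (absolute error <= t) and 'poor' (absolute error > t)."""
    good = [g for g in guesses if abs(g - correct) <= t]
    poor = [g for g in guesses if abs(g - correct) > t]
    return good, poor


def relative_error_of_mean(guesses, correct):
    """Unsigned relative error (in %) of the mean guess; None for an empty list."""
    if not guesses:
        return None
    mean = sum(guesses) / len(guesses)
    return 100.0 * abs(mean - correct) / correct


correct = 562                                  # correct value shown in Figure 3c
guesses = [500, 300, 520, 850, 540, 700, 400]  # hypothetical guesses
good, poor = split_by_threshold(guesses, correct, t=100)

error_good = relative_error_of_mean(good, correct)
error_poor = relative_error_of_mean(poor, correct)
```

In this constructed case the good guesses average 520 and the poor guesses average 562.5, so the poor group has the smaller collective error even though each of its members is individually further from the correct value.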

Let us assume that we have the guesses of N individuals for nine of the 10 jars, each with a known number of beans. Then we can compute optimal weights for each individual such that the overall collective error on the nine jars is as small as possible. These weights represent the relevance of the contribution of each individual in the context of a group consisting of the N people regardless of the individual performance. In a second step we can check whether these weights are useful for future collective guesses by applying them to the guesses of the N individuals for a 10th jar that has not been taken into account when computing the weights.

[Figure 4, panels (a) and (b): cumulative frequency against sum of relative errors on all 10 jars. Curves in (a): no weights; contribution-oriented weights 1, 2; performance-oriented weights 1, 2. Curves in (b): contribution-oriented and performance-oriented, each for weights 1–2, 1–3 and 1–4.]

Figure 4. Cumulative frequencies of overall collective errors for three scenarios in experiment 2. Weighted collective guesses were computed for each jar Jk, where the weights were determined by optimizing the weighted collective guesses for all other nine jars, excluding jar Jk. (a) Overall error of weighted collective guesses of 10 000 randomly selected groups of size 10 for the bean jars from experiment 2. The weights 1 and 2 were assigned following a contribution-oriented weighting scheme (grey curve), and following a performance-oriented weighting scheme (black dashed curve). The black curve shows the overall error of unweighted collective guesses. (b) Like (a) but with additional curves for weight ranges 1–3 and 1–4.

With the above-described weighting scheme it can happen that individually low performers have higher weights than individually high performers. It is an interesting question whether the collective performance increases if we restrict the weighting scheme such that the weight assigned to an individual I1 that performs better than an individual I2 must be greater than or equal to the weight assigned to I2.

To find out whether or not the performance-oriented weighting scheme outperforms the purely contribution-oriented one, it is not sufficient to look at the performance of one group on one focal jar. Therefore, we randomly selected 10 000 groups of size 10 from our data. For each group we computed weighted collective guesses for each single jar (focal jar) by applying the weighting schemes to the remaining nine jars (all jars but the focal one).

More formally, to measure the group performance, for a group G and a jar J we computed the relative error $\mathrm{re}(G, J) = |\mathrm{mean}_{G,J} - c_J| / c_J$. Here, $\mathrm{mean}_{G,J}$ denotes the (possibly weighted) mean of group G for jar J and $c_J$ denotes the correct number of beans in jar J. We used relative errors to make the errors for different jars comparable. We defined the group’s overall error for a sequence of jars to be the sum of the relative errors for all jars in this sequence. In the same way, we defined the overall individual performance for a sequence of jars to be the sum of the individual relative errors for all jars in this sequence.
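These error measures can be written down directly. The sketch below is illustrative only: the function names are hypothetical, the two correct values are the first two jar counts listed in the Methods, and the guesses are invented.

```python
def relative_error(value, correct):
    """Unsigned relative error |value - correct| / correct."""
    return abs(value - correct) / correct


def overall_group_error(guesses_by_jar, correct_by_jar, weights=None):
    """Sum over jars of the relative error of the (possibly weighted) group mean.

    guesses_by_jar[j][i] is the guess of group member i for jar j.
    """
    total = 0.0
    for guesses, correct in zip(guesses_by_jar, correct_by_jar):
        w = weights if weights is not None else [1] * len(guesses)
        mean = sum(wi * gi for wi, gi in zip(w, guesses)) / sum(w)
        total += relative_error(mean, correct)
    return total


def overall_individual_error(member_guesses, correct_by_jar):
    """Sum over jars of one individual's relative errors."""
    return sum(relative_error(g, c) for g, c in zip(member_guesses, correct_by_jar))


correct_by_jar = [309, 865]   # first two jar counts from the Methods
guesses_by_jar = [
    [300, 330, 279],          # hypothetical guesses of three people for jar 1
    [800, 900, 1000],         # hypothetical guesses of the same people for jar 2
]

group_error = overall_group_error(guesses_by_jar, correct_by_jar)
first_member_error = overall_individual_error([300, 800], correct_by_jar)
```

Dividing by the correct value is what makes a miss of 30 beans on a jar of 309 count for more than the same miss on a jar of 865.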

For the investigation of the purely contribution-oriented weighting scheme we proceeded as follows. Given a group G and a jar $J_k$ we assigned weights to the group members such that the sum of all $\mathrm{re}(G, J_i)$, $1 \le i \le 10$, $i \ne k$, reached a minimum value. For the performance-oriented weighting scheme we additionally took into account the constraint that an individual with greater overall error (i.e. lower performance) must not be assigned a weight greater than that of an individual with smaller overall error (i.e. higher performance).
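One way to realize both weighting schemes is an exhaustive search over all weight assignments, sketched below. This is an assumption-laden illustration rather than the procedure actually used: the text does not state the optimization method, all function names are hypothetical, the individual errors used in the constraint are assumed to be computed on the training jars only, and the data are a small invented group (four people, three jars) so that the search stays tiny.

```python
from itertools import product


def weighted_mean(guesses, weights):
    return sum(w * g for w, g in zip(weights, guesses)) / sum(weights)


def overall_error(guesses_by_jar, correct_by_jar, weights):
    """Sum of relative errors of the weighted group mean over the given jars."""
    return sum(abs(weighted_mean(g, weights) - c) / c
               for g, c in zip(guesses_by_jar, correct_by_jar))


def individual_errors(guesses_by_jar, correct_by_jar):
    """Overall individual error (sum of relative errors) for each group member."""
    n = len(guesses_by_jar[0])
    return [sum(abs(g[i] - c) / c for g, c in zip(guesses_by_jar, correct_by_jar))
            for i in range(n)]


def respects_performance(weights, errors):
    """True if no member with a greater overall error has a greater weight."""
    n = len(weights)
    return all(weights[i] <= weights[j]
               for i in range(n) for j in range(n) if errors[i] > errors[j])


def best_weights(guesses_by_jar, correct_by_jar, allowed=(1, 2), constrained=False):
    """Exhaustive search for the weights minimizing the overall error."""
    n = len(guesses_by_jar[0])
    errors = individual_errors(guesses_by_jar, correct_by_jar)
    best = None
    for weights in product(allowed, repeat=n):
        if constrained and not respects_performance(weights, errors):
            continue  # performance-oriented scheme: skip order-violating weights
        err = overall_error(guesses_by_jar, correct_by_jar, weights)
        if best is None or err < best[0]:
            best = (err, weights)
    return best[1]


def leave_one_out_error(guesses_by_jar, correct_by_jar, allowed=(1, 2), constrained=False):
    """Fit weights on all jars but the focal one, then score the focal jar."""
    total = 0.0
    for k in range(len(correct_by_jar)):
        train_g = guesses_by_jar[:k] + guesses_by_jar[k + 1:]
        train_c = correct_by_jar[:k] + correct_by_jar[k + 1:]
        w = best_weights(train_g, train_c, allowed, constrained)
        focal = weighted_mean(guesses_by_jar[k], w)
        total += abs(focal - correct_by_jar[k]) / correct_by_jar[k]
    return total


correct = [309, 865, 422]                     # three of the jar counts from the Methods
guesses = [[250, 400, 310, 200],              # hypothetical guesses, four people per jar
           [700, 1000, 900, 600],
           [380, 500, 410, 350]]
w_free = best_weights(guesses, correct)                          # contribution-oriented
w_constrained = best_weights(guesses, correct, constrained=True)  # performance-oriented
```

Because the constrained search considers a subset of the assignments open to the unconstrained one, it can never do better on the training jars; the question examined in the Results is whether it does better on the held-out focal jar. For groups of 10 and weights 1 or 2 the search covers 2^10 = 1024 assignments per focal jar, which remains cheap.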

Results

Figure 4a shows cumulative frequency distributions of overall errors for all 10 jars for three scenarios based on 10 000 randomly selected groups of size 10. In the first scenario, no weights were assigned. In the second scenario, the weights 1 or 2 were assigned to individuals in a group in an unconstrained way and regardless of the individual performance (following purely the contribution-oriented weighting scheme as described in the Methods) for each of the 10 jars as a focal jar. In the third scenario, again the weights 1 or 2 were assigned, but additionally the individual performance was taken into account (following the performance-oriented weighting scheme as described in the Methods).

Both types of weight assignment generally improved collective performance (Fig. 4a). Contribution-oriented weight assignments on average consistently outperformed the ones that were constrained by considerations of individual performance (Fig. 4a). When larger weight ranges were used (1–3 or 1–4), the differences between the two approaches (i.e. constrained and unconstrained) became even bigger (Fig. 4b).

DISCUSSION

Our results show that SI (or at least the SI potential we identified) performed well for the marble problem (experiment 1a) but not for the coin-flipping problem (1b). Plotting performance as a function of group size can provide a good measure of whether a given problem is amenable to the SI approach in the first place (Fig. 2). Regarding the marble problem, averaging of opinions can result in a good approximation of the correct value because the guesses are imprecise but not fundamentally biased as they are in the coin-flipping problem. Our results show that the groups would only outperform experts for group sizes above approximately 40 individuals. Studies on humans (Wolfers & Zitzewitz 2004: prediction markets) and nonhuman animals (Camazine et al. 2001; Krause & Ruxton 2002) show that groups of such sizes that make collective decisions are common. Our results suggest that the benefits of SI increase with increasing group size. However, there can be costs of SI that also increase, for example, the time needed to make decisions (Krause et al. 2010) because the individual packages of information need to be combined and processed through social interaction. Averaging has been suggested as a simple processing rule of individual inputs (Simons 2004; Couzin 2009) and there is some evidence from navigation behaviour in human groups (N = 10 individuals) that individual interactions result in some form of averaging process (Faria et al. 2009). However, despite some progress in this area (Camazine et al. 2001), the contribution that individuals make to a collective decision-making process remains an interesting challenge for future studies.


Quantity guesses by humans often show a logarithmic distribution (Longo & Lourenco 2007). Therefore the question might arise whether, for both experiments 1 and 2, one should take the logarithm of all number guesses performed by visitors instead of the untransformed data we used above. We preferred to use the untransformed data for two reasons. First, we found no clear evidence for a logarithmic distribution of guesses in our data. Second, and more importantly, we are concerned with how individual guesses perform relative to collective guesses in this study. The latter is based on the assumption that the group members find some way to process the individual guesses (see Krause et al.’s 2010 definition of SI) to arrive at a group response. While simple averaging has been proposed as a likely processing rule in this context, it is not clear how a complex transformation function could have evolved in human information processing based on social interaction.

One important question is how to evaluate the performance of SI and how to evaluate the potential for SI in study populations. We followed previous studies (Kerr & Tindale 2004) and evaluated group performance compared to high-performing singletons. Our approach, however, means that the experimenter can identify the group size that achieves an acceptable performance level for a collective guess by selecting the probability with which a collective answer should fall within a certain permissible error range. For instance, if we think that it is satisfactory to have a 50% chance of getting within 10% of the correct value then we know that we would need to get the opinions of about 20 people for this kind of problem (Fig. 2a).
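The group-size criterion described here can be approximated by resampling from a pool of individual guesses. The sketch below is an illustration under stated assumptions: groups are drawn without replacement, the function names and trial count are hypothetical, and the pool of guesses is synthetic rather than the study data.

```python
import random


def hit_probability(guesses, correct, group_size, tolerance=0.10, trials=2000, rng=None):
    """Estimate the chance that a random group's mean lies within tolerance of correct."""
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(trials):
        group = rng.sample(guesses, group_size)  # random group, no repeated members
        mean = sum(group) / group_size
        if abs(mean - correct) <= tolerance * correct:
            hits += 1
    return hits / trials


def smallest_sufficient_group(guesses, correct, target=0.5, tolerance=0.10, max_size=None):
    """Smallest group size whose estimated hit probability reaches the target."""
    max_size = max_size or len(guesses)
    for n in range(1, max_size + 1):
        if hit_probability(guesses, correct, n, tolerance) >= target:
            return n
    return None  # target not reached for any group size up to max_size


demo_rng = random.Random(42)
population = [demo_rng.uniform(100, 1000) for _ in range(500)]  # synthetic guesses
needed = smallest_sufficient_group(population, correct=562, max_size=40)
```

Changing `target` and `tolerance` expresses a different acceptable performance level, for example a 90% chance of landing within 5% of the correct value.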

Since Galton (1907) first published his work on SI there have been many empirical studies that show that groups can beat experts (Kerr & Tindale 2004). However, work by Clearwater et al. (1991) showed that agent diversity could speed up problem solving and Hong & Page’s (2004) model predicted that even for same-size groups, high-ability performers could potentially be outcompeted by diverse problem solvers, which means that diversity can trump ability. In our data from experiment 1a, no clear relationship between individual performance (deviation of guess from correct value) and collective performance (deviation of mean guess from correct value) was found for the marble question. In fact, the group made up of the individually high-performing top half of our exhibition visitors performed collectively considerably worse than the group of the individually low-performing half. The latter comparison has to be approached with considerable caution of course. The data from experiment 1 were filtered (to deal with the possibility of repeated guesses and extreme values) and we conducted this experiment only once and therefore do not have replication for our collective performance of the upper and lower halves. On the other hand, with a sample size of about 2000 individuals, our result is unlikely to be influenced by stochastic effects in a major way and we are approaching the magnitude of sample size that is used for surveys to forecast local election results in some countries.

A close inspection of Figs 1a, 3a and c makes clear that the poor collective performance in the marble problem of the better half is explained by the fact that many individuals among the high performers slightly underestimated the correct value (see brace in Fig. 1a), whereas the low performers have an error distribution whose mean more closely approximates the correct value. This phenomenon raises a number of important questions. Are we dealing with an unusual situation in our data set, or should we generally expect high performance (in humans and nonhuman animals) in cognitive problems to be associated with directional error bias? Hong & Page (2004) showed in their model that diversity has to be low among the highest-ability problem solvers. And a lack of diversity in turn could easily lead to systematic bias. For instance, if people employ a similar method for solving a problem this could bias the probability that they over- or underestimate. This effect may be meaningless at the individual level because here the use of a method is only meant to improve individual accuracy. But these individual errors cannot cancel each other out, which collectively could come at a cost of relatively poor group performance. In our data set, those individuals who were likely to have used a sophisticated method in making their guesses instead of pure guessing (which we can infer from their ability to give a reasonably good answer to the coin-flipping question) are precisely the ones who created the underestimation regarding the marble question.
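The argument that shared errors cannot cancel can be made concrete with a simple additive error model. The model is an illustrative assumption and not part of the analysis reported here: suppose each of $n$ guesses is $g_i = c + b + \varepsilon_i$, where $c$ is the correct value, $b$ is a bias shared by everyone who uses the same method, and the $\varepsilon_i$ are independent noise terms with mean zero and variance $\sigma^2$. Then the group mean and its expected squared error are:

```latex
\bar{g} = \frac{1}{n}\sum_{i=1}^{n} g_i = c + b + \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i,
\qquad
\mathbb{E}\!\left[(\bar{g} - c)^2\right] = b^2 + \frac{\sigma^2}{n}.
```

Under these assumptions, enlarging the group shrinks only the noise term $\sigma^2/n$. The shared bias term $b^2$ stays the same however many individuals are averaged, so a group with small individual noise but a common bias can end up with a larger collective error than a noisier group whose errors are centred on the correct value.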

Our results from experiment 2 regarding the optimization of group performance demonstrate that all individuals can potentially make a contribution regardless of ability. This result is in contrast to some models of prediction markets in which a performance profile is built up over time to give higher weighting to high performers than to poor ones (Chen et al. 2003). In our study (experiment 2), weighting schemes that make the best use of any individual regardless of his or her individual performance are more successful. This result is a powerful argument in favour of diversity and suggests we should question the strong belief (generally held in many Western societies) in individual performance regardless of the context. However, it is important to keep in mind that while diversity seems to be a necessary condition for swarm intelligence, it is clearly not a sufficient one. Our results (Fig. 4) indicate that it is a particular combination of opinions that produces the desired effect of low collective error and not just any type of diversity of opinion. As Hong & Page (2004) pointed out in their theoretical study, the contribution that an individual can make is largely contextual.

Our results potentially have interesting consequences for group composition and group choice in animals. If the contribution that an individual can make (to a group) is often contextual, then groups with an optimal composition of individual types (or personalities) should thrive (in contrast to other groups). Optimal group size models do not take into account the different contributions that different individual types might be able to make to groups (Krause & Ruxton 2002), which could introduce intriguing new complexities. For example, whether a newcomer wants to join a group and remain in it (or whether it is tolerated) might be partly dependent on the compatibility of its behavioural type with that of the other group members. Therefore, our finding that diversity can trump ability might also have important implications for the evolution of personality diversity in nonhuman animals because it potentially identifies a mechanism through which personality diversity in populations could be maintained. On the mechanistic side, relatively little work has been done so far on the contribution of individual diversity in the context of collective decisions in animal groups, which should be a promising area for future research.

The fact that organizations in most societies primarily use individual ability for hiring people and building teams and not their suitability for collective decision making has at least partly to do with the problem that swarm intelligence is less well understood (than individual performance indicators) and therefore more difficult to assess, predict and harness. Another reason could be that the performance difference between diverse and high-ability individuals in our experiments concerns the average performance. If we were primarily concerned about minimizing the impact of the worst-case scenario, then groups of high-ability performers would be preferable because their performance has the smaller standard deviation. However, large standard deviations in group responses are mainly a problem for small groups (as in experiment 2) and not so much for large ones (as in experiment 1).


Given the nature of our data collection, we do not know the origin of our performance diversity. It is possible that it is correlated with identity diversity (differences in race, gender, ethnic background) and functional diversity (differences in the approach used for problem solving; Polzer et al. 2002; Jackson et al. 2003; Mathieu et al. 2008). Studies that have investigated the effect of identity diversity on performance (in terms of financial gain) in teams or organizations have obtained mixed results with no clear indication that diversity is always beneficial (Jackson et al. 2003). This is at least partly because differences in perspective can create communication barriers and because identity diversity can result in a lack of trust and mutual respect (Jackson et al. 2003). One could further imagine that diversity in the motivation for carrying out a task well could be another hurdle in this context. Most sociological studies correlate differences in group diversity with a performance output such as financial gain or self-reported team efficiency (Jackson et al. 2003). This approach, however, does not allow for quantitative predictions of what diversity could potentially contribute. The results by Hong & Page (2004), which champion functional diversity, and our results, which advocate diversity in ability and error direction for problem solving, point to a performance potential (obtained through increased diversity) which can be realized if no confounding factors are in operation. In conclusion, careful modelling in combination with experimental work is required to further our understanding of the fundamental problem of how diversity influences group performance.

Acknowledgments

We thank Markus Peter, Hans Timmermann, Christos Ioannou, Heather Lyon, Rebecca Henderso, Tommaso Pizzari and two anonymous referees for their constructive comments. Financial support was provided by the Natural Environment Research Council to J.K. and G.D.R. J.K. also acknowledges funding from the IGB.

References

Arrow, K. J., Forsythe, R., Gorham, M., Hahn, R., Hanson, R., Ledyard, J. O., Levmore, S., Litan, R., Milgrom, P. & Nelson, F. D., et al. 2008. The promise of prediction markets. Science, 320, 877–878.

Bonabeau, E., Dorigo, M. & Theraulaz, G. 1999. Swarm Intelligence: From Natural to Artificial Systems. Oxford: Oxford University Press.

Camazine, S., Deneubourg, J.-L., Franks, N. R., Sneyd, J., Theraulaz, G. & Bonabeau, E. 2001. Self-organization in Biological Systems. Princeton, New Jersey: Princeton University Press.

Clearwater, S. H., Huberman, B. A. & Hogg, T. 1991. Cooperative solution of constraint satisfaction problems. Science, 254, 1181–1183.

Chen, K. Y., Fine, L. R. & Bernardo, A. H. 2003. Predicting the future. Information Systems Frontiers, 5, 47–61.

Couzin, I. D. 2009. Collective cognition in animal groups. Trends in Cognitive Sciences, 13, 36–43.

Dorofeev, S. & Grant, P. 2006. Statistics for Real-life Sample Surveys. Cambridge: Cambridge University Press.

Faria, J. J., Codling, E. A., Dyer, J. R. G., Trillmich, F. & Krause, J. 2009. Navigation in human crowds; testing the many wrongs principle. Animal Behaviour, 78, 587–591.

Galton, F. 1907. Vox populi. Nature, 75, 450–451.

Hong, L. & Page, S. E. 2004. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences, U.S.A., 101, 16385–16389.

Jackson, S. E., Joshi, A. & Erhardt, N. L. 2003. Recent research on team and organizational diversity: SWOT analysis and implications. Journal of Management, 29, 801–830.

Kerr, N. L. & Tindale, R. S. 2004. Group performance and decision making. Annual Review of Psychology, 55, 623–655.

Krause, J. & Ruxton, G. D. 2002. Living in Groups. Oxford: Oxford University Press.

Krause, J., Ruxton, G. D. & Krause, S. 2010. Swarm intelligence in animals and humans. Trends in Ecology & Evolution, 25, 28–34.

Longo, M. R. & Lourenco, S. F. 2007. Spatial attention and the mental number line: evidence for characteristic biases and compression. Neuropsychologia, 45, 1400–1406.

Mathieu, J., Maynard, M. T., Rapp, T. & Gilson, L. 2008. Team effectiveness 1997–2007: a review of recent advancements and a glimpse into the future. Journal of Management, 34, 410–476.

Polzer, J. T., Milton, L. P. & Swann, W. B. Jr. 2002. Capitalizing on diversity: interpersonal congruence in small work groups. Administrative Science Quarterly, 47, 296–324.

Seeley, T. H. 1995. The Wisdom of the Hive. Cambridge, Massachusetts: Harvard University Press.

Simons, A. M. 2004. Many wrongs: the advantage of group navigation. Trends in Ecology & Evolution, 19, 453–455.

Sumpter, D. J. T., Krause, J., James, R., Couzin, I. D. & Ward, A. J. W. 2008. Consensus decision-making by fish. Current Biology, 18, 1773–1777.

Surowiecki, J. 2004. The Wisdom of Crowds: Why the Many are Smarter than the Few. New York: Little Brown.

Ward, A. J. W., Sumpter, D. J. T., Couzin, I. D., Hart, P. J. B. & Krause, J. 2008. Quorum decision-making facilitates information transfer in fish shoals. Proceedings of the National Academy of Sciences, U.S.A., 105, 6948–6953.

Wolfers, J. & Zitzewitz, E. 2004. Prediction markets. Journal of Economic Perspectives, 18, 107–126.