Paper to be presented at the DRUID Academy Conference 2018 at University of Southern Denmark, Odense, Denmark
January 17-19, 2018
The Effect of Peer Pressure on Performance in Crowdsourcing Contests
Jonas Heite
Max Planck Institute for Innovation and Competition
Innovation and Entrepreneurship
[email protected]

Karin Hoisl
University of Mannheim
Chair of Organization and Innovation
[email protected]
Abstract

In the present study, we investigate whether and why performance differences exist between contestants with the same abilities but who compete against more skilled or less skilled contestants. Performance in contests is a function of the ability and the effort of the contestants. Whereas the ability of the individual contestant is exogenous, the effort can be influenced by the design of a contest. The design configurations that have been studied so far include the structure and level of prizes (Ehrenberg and Bognanno 1990, Orszag 1994), the number of contestants (Boudreau et al. 2016, Garcia and Tor 2009), and the composition of the group of contestants (Brown 2011, Konrad 2009, Tanaka and Ishino 2012).

Recently, ability-configurations of groups of contestants have attracted notable attention from economists and management scholars. Competing against contestants with the same ability provides incentives to maximize effort in order to win a prize. Competing against contestants with a higher ability, on the contrary, was shown to decrease the performance of the lower-ability contestants. The negative relationship is even more pronounced in tournaments that contain “star performers”, since contestants reduce their effort because they assume that it will not suffice to overcome the ability gap between them and the other contestants (Brown 2011). Competing against contestants with a lower ability should also lead to a reduction in effort caused by a feeling of superiority to the other contestants (Tanaka and Ishino 2012).

To shed light on the mechanisms causing performance differentials, we use data on crowdsourcing contests hosted on the topcoder platform. The data allow us to implement a Regression Discontinuity Design analysis. We compare two groups of contestants characterized by the same abilities. One group competes against contestants who are equally or more skilled; the other group competes against contestants who are equally or less skilled. Based on the literature above, and in case the two groups act rationally, we expect similar performance for both groups. Interestingly, our results show that bottom-performers of a high-ability group are characterized by a performance that is 36% lower than that of contestants who have the same skill level but compete as top-performers of a low-ability group.

To investigate the mechanisms causing these results, we investigate the behavior of the contestants. First evidence shows that the bottom-performers of a high-ability group put more effort into their task by trying to solve more difficult problems than individuals who compete against equally or less skilled contestants. Since rational behavior cannot explain this overinvestment in effort of the former group, we test the explanatory power of behavioral factors like the willingness to take risks (Konrad and Lommerud 1993), or mistakes (Camerer et al. 2011, Sheremeta 2014).
References

Boudreau, K.J., Lakhani, K.R., Menietti, M. (2016). Performance responses to competition across skill levels in rank-order tournaments: field evidence and implications for tournament design. The RAND Journal of Economics, 47(1), 140-165.

Brown, J. (2011). Quitters Never Win: The (Adverse) Incentive Effect of Competing with Superstars. Journal of Political Economy, 119, 982-1013.

Camerer, C.F., Loewenstein, G., Rabin, M. (2011). Advances in Behavioral Economics, Princeton University Press.

Ehrenberg, R.G., Bognanno, M.L. (1990). Do Tournaments Have Incentive Effects? Journal of Political Economy, 98, 1307-1324.

Garcia, S.M., Tor, A. (2009). The N-Effect: More Competitors, Less Competition. Psychological Science, 20, 871-877.

Konrad, K.A. (2009). Strategy and Dynamics in Contests. Oxford, UK: Oxford University Press.

Konrad, K.A., Lommerud, K.E. (1993). Relative standing comparisons, risk taking, and safety regulations. Journal of Public Economics, 51(3), 345-358.

Sheremeta, R.M. (2014). Behavior in Contests. MPRA Paper No. 57451, July 21, 2014, http://mpra.ub.uni-muenchen.de/57451.

Tanaka, R., Ishino, K. (2012). Testing the Incentive Effects in Tournaments with a Superstar. Journal of the Japanese and International Economies, 26, 393-404.
Peer Pressure in Crowdsourcing Contests
Jonas Heite Max Planck Institute for Innovation and Competition, Munich, DE, Marstallplatz 1, 80539 Munich, Germany,
Karin Hoisl University of Mannheim, Mannheim, DE / Copenhagen Business School, Copenhagen, DK / Max Planck Institute for
Innovation and Competition, Munich, DE, L5, 4, 68161 Mannheim, Germany, [email protected]
Abstract
We investigate whether and why performance differences exist between contestants with the same abilities
but who compete against more skilled or less skilled contestants. We analyze 1,677 unique coders
competing in 38 software algorithm competitions with random assignment. Some of these coders compete
amongst the top-performers of a low-ability group; the others compete amongst the bottom-performers of
a high-ability group. We compare the performance of the coders competing in the two groups using a
Regression Discontinuity Design (RDD) and investigate to what extent the effort exerted by the coders
can explain performance differentials. We find that bottom-performers of a high-ability group are
characterized by a performance that is 17% lower than that of coders who have the same ability-level but
compete as top-performers of a low-ability group. However, a decrease in effort cannot explain the
performance differentials we observe. Instead, we find that psychological factors like choking under
pressure and a rational decision to take higher risks hamper the problem-solving behavior of the
contestants under pressure. Our paper contributes to the literature on performance in contests by providing
new and causal evidence of the mechanisms causing performance differentials.
Keywords:
Contests; tournaments; pressure; performance differentials; mechanisms; risk-taking; choking under
pressure; rationality; behavioral factors
1 Introduction
In the present study, we investigate whether and why performance differences exist between contestants
with the same abilities but who compete against more skilled or less skilled contestants1. Performance in
contests is a function of the ability and the effort of the contestants. Whereas the ability of the individual
contestant is exogenous, the effort can be influenced by the design of a contest. Design configurations that
have been studied so far include the structure and level of prizes (Ehrenberg and Bognanno 1990, Orszag
1994), the number of contestants (Boudreau et al. 2016, Garcia and Tor 2009), or the composition of the
group of contestants (Brown 2011, Konrad 2009, Tanaka and Ishino 2012).
Recently, ability-configurations of groups of contestants have attracted notable attention of economists
and management scholars. Competing against contestants with a similar ability provides incentives to
maximize effort in order to win a prize. Competing against contestants with a higher ability, on the
contrary, was shown to decrease the performance of the lower-ability contestant (Casas-Arce and
Martinez-Jerez 2009). The negative relationship is even more pronounced in tournaments that contain
“star performers”, since contestants reduce their effort, because they assume that it will not suffice to
overcome the ability gap between them and the other contestants (Lallemand et al. 2008, Brown 2011).
Competing against contestants with a lower ability should also lead to a reduction in effort caused by a
feeling of superiority to the other contestants (Brown 2011, Tanaka and Ishino 2012).
We compare contestants characterized by the same abilities. Some of them compete against contestants
who are more skilled, which represents our treatment of higher competitive pressure; the rest compete
against contestants who are less skilled. If all contestants acted rationally, we would expect all of them
to reduce the effort they put into the task, resulting in similar performance across both groups. Our
results, however, reveal that individuals who compete against less skilled contestants outperform
individuals who compete against more skilled contestants. One possible explanation could be that the
latter contestants reduced their effort even more than the contestants in the other group. A closer look at
the problem-solving behavior of the two groups of contestants, however, shows that a decrease in effort
cannot explain the performance differentials we observe.
Since standard economic theory cannot explain our results, we investigate two alternative explanations.
First, these results may be explained by various psychological factors. Competing against contestants with
a higher ability may, for instance, lead to intimidation or choking under pressure (Baumeister 1984,
Baumeister and Showers 1986, DeCaro et al. 2011). Competing against contestants with a lower ability
may result in a reduction in carefulness (Barrick and Mount 1991, Hurtz and Donovan 2000) or the fear
of making a fool of oneself (Baumeister 1984, Schlenker 1980, Zajonc 1965). Additionally, efficacy
expectations may play a role (Bandura 1977). Second, our findings may be the outcome of rational
1 Within this paper, we understand contests as rank-order tournaments, where the ordinal rank of output
determines a contestant's compensation (Lazear and Rosen 1981).
behavior – even though not related to effort. In particular, contestants may be willing to take higher risks
in case they think that the risk can considerably increase their probability of winning (Konrad and
Lommerud 1993, Buser 2016). Interviews with nine experienced contestants, who were asked about their
perception of competition as well as their problem-solving behavior and strategies when competing in
crowdsourcing contests, confirmed that psychological as well as rational factors affect their performance.
We base our predictions about whether psychological or rational factors can explain our findings on prior
literature from contest theory, behavioral theory, psychology, and the literature on innovation and
crowdsourcing contests. We use data on crowdsourcing contests hosted on the topcoder platform, which
was created in 2001. Today, it has more than 1 million expert members who compete in design,
development and data science challenges. We analyze 38 software algorithm competitions, so-called
Single Round Matches (SRMs), organized between August 2001 and February 2002. SRMs are timed
contests where contestants compete online. In our sample, all contestants solve the same set of three
problems with increasing complexity (problem 1 = simple, problem 2 = medium difficult, problem 3 =
difficult) under the same time constraints. These topcoder contests allow observing contestants’ ability
and performance based on objective measures.
As mentioned above, we study software coders with equal abilities. Some of these coders compete amongst
the top-performers of a low-ability group (control group); the others compete amongst the bottom-
performers of a high-ability group (treatment group). Thus, competitive pressure is higher for our
treatment group. In a first step, we compare the performance of the coders competing in the two groups
using a Regression Discontinuity Design (RDD) and investigate to what extent the effort exerted by the
coders can explain performance differentials. In a second step, we further increase competitive pressure
and investigate the effect of a larger number of competitors, additional to our baseline treatment of
competing in a high-ability group. Our results show that bottom-performers of a high-ability group are
characterized by a performance that is 61 points lower than that of coders who have the same ability-level
but compete as top-performers of a low-ability group. The performance differential between the two
groups equals 17%. Hence, the effect is not only statistically but also economically significant. Once
pressure is further increased by increasing the number of contestants, the difference between the treatment
and control group increases to 93 points, which equals a performance differential of 25%.
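The RDD comparison described above can be sketched as a local linear regression with a jump at the rating cutoff. The sketch below uses simulated data: the 1,489-point cutoff and the roughly 61-point jump are taken from the paper, while the bandwidth, sample size, functional form, and noise level are illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
cutoff = 1489.0  # division threshold reported in the paper

# Simulated coders: performance depends smoothly on the skill rating,
# with a -61-point jump for coders just above the cutoff (treated:
# competing as bottom-performers of the high-ability division D1).
rating = rng.uniform(cutoff - 200, cutoff + 200, 2000)
treated = (rating >= cutoff).astype(float)
performance = (0.3 * (rating - cutoff) + 360 - 61 * treated
               + rng.normal(0, 20, 2000))

# Local linear RDD: within a bandwidth around the cutoff, regress
# performance on the centered rating, the treatment dummy, and their
# interaction; the dummy's coefficient estimates the jump.
bw = 100.0
m = np.abs(rating - cutoff) <= bw
x = rating[m] - cutoff
X = np.column_stack([np.ones(m.sum()), x, treated[m], x * treated[m]])
beta, *_ = np.linalg.lstsq(X, performance[m], rcond=None)
tau = beta[2]  # estimated performance jump at the cutoff
```

Allowing the slope to differ on each side of the cutoff (the interaction term) is what makes the treatment dummy capture only the discontinuity itself rather than any smooth trend in ability.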
To investigate psychological factors like intimidation or carefulness and rational factors like the
willingness to take risks, we analyze the choice of tasks (tasks vary in the level of difficulty), the speed of
problem solving, the problem-solving experience, and the mistakes the contestants make. We find that the
bottom-performers of a high-ability group tend to try to solve more difficult problems than the top-
performers of a low-ability group, either because they are aware of the fact that they can only win in case
they take more risks or to signal their abilities. Moreover, these contestants also make more mistakes on
the easy and medium problems, which indicates lower carefulness or choking under pressure. Robustness
checks show that “rating diving”, i.e., deliberately dropping back into the low-ability group to increase the
chance of winning a subsequent contest, is not a relevant concern. In sum, we find first evidence that
both psychological and rational factors explain performance differentials of equally skilled contestants
who compete in groups characterized by different ability-configurations.
Our paper contributes to the literature on performance in contests by providing new and causal evidence
of the mechanisms causing performance differentials. We add to the literature on behavioral factors to
explain performance differentials in contests. The project also contributes to the literature on the optimal
design of crowdsourcing contests, since our study uncovers factors that potentially lead to failures of
contest-based knowledge sourcing in the innovation process.
2 Theoretical Framework
The focus of contest theory is relative performance evaluation. Individuals are rewarded based on their
performance relative to other contestants (Knoeber and Tsoulouhas 2013). The theory was originally
developed by Lazear and Rosen (1981) to design optimal labor contracts based on differences in
individual productivity. Since the early 1980s, contest theory has been applied to various fields such as
sports (Bothner et al. 2007), law (Anabtawi 2005), and research and development (Dechenaux et al. 2015).
The interest derives from the fact that contests incentivize individuals to exert higher effort than in non-
competitive environments.
Contest models assume that individuals exert efforts while competing for a prize (Boudreau et al. 2016).
For equally skilled individuals, the probability of winning the prize depends on the efforts of all
contestants. In particular, it equals the ratio of the effort of an individual contestant to the sum of all
contestants’ efforts. Consequently, it increases with the individual’s own effort and decreases with the
effort of other contestants (Tullock 1980). The expected payoff of an individual equals the expected
benefit, i.e. the probability of winning the prize, times the prize value minus the costs of effort. Whereas
the payoff (and consequently the effort) increases with the value of the prize, it decreases with the number
of contestants (Skaperdas 1996, Sheremeta 2014, Boudreau et al. 2016). If we assume contestants
with equal abilities and keep the number of contestants and the prize level constant, rational individuals
should exert the same amount of effort, perform similarly, and win with the same
probability.
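The contest success function and expected payoff described above can be sketched as follows. The 300 USD prize mirrors the D1 winner's prize from the paper; the linear effort cost and the symmetric-equilibrium formula e* = V(n-1)/n² are the standard Tullock-model assumptions, not results from the paper.

```python
def win_probability(own_effort, all_efforts):
    """Tullock contest success function: p_i = e_i / sum_j e_j."""
    total = sum(all_efforts)
    return own_effort / total if total > 0 else 1.0 / len(all_efforts)

def expected_payoff(own_effort, all_efforts, prize=300.0, unit_cost=1.0):
    """Expected payoff of a risk-neutral contestant: p_i * V - c * e_i."""
    return (win_probability(own_effort, all_efforts) * prize
            - unit_cost * own_effort)

# Symmetric equilibrium effort for n equally skilled contestants with
# linear costs: e* = V * (n - 1) / n**2, so equilibrium effort falls
# as the number of contestants n grows.
n, V = 4, 300.0
e_star = V * (n - 1) / n ** 2
```

In the symmetric equilibrium every contestant wins with probability 1/n, which is exactly the benchmark against which the paper's performance differentials are surprising.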
Whereas contests are most effective in case all contestants have similar abilities, they are less effective if
contestants have heterogeneous abilities. Competing against contestants with a higher ability decreases the
effort of the lower-ability contestants, since the latter assume that they will not overcome the ability gap
(Lallemand et al. 2009, Brown 2011). Competing against contestants with a lower ability should also lead
to a reduction in effort caused by a feeling of superiority to the other contestants (Brown 2011, Tanaka
and Ishino 2012). An extreme situation is competing against star performers, i.e. individuals who
consistently show a superior performance relative to other contestants (Rosen 1981). Economic theory
suggests that star performers adversely affect the incentives to exert effort in contests. Lallemand et al.
(2009) and Brown (2011) show that lower-ranked professional tennis and golf players underperform in
matches characterized by heterogeneity of the contestants’ abilities. The effect is even larger in case a star
performer is among the contestants.
However, what if performance differentials of heterogeneous groups cannot be explained by differentials
in the effort put into the task? The literature provides two possible alternative explanations: psychological
factors and a rational decision to take higher risks. Psychological factors can manifest in various forms.
First, intimidation or choking under pressure might explain lower performance (Riley 2012). Situational
pressure may cause individuals to perform below their abilities despite incentives to put effort into a task
(Baumeister 1984, Baumeister and Showers 1986, DeCaro et al. 2011). Distraction theories provide an
explanation for this phenomenon. According to these theories, choking occurs because of information
overload or because individuals focus on task-irrelevant cues like worry. Whereas the former leads to a
failure to adequately concentrate on the task, the latter results in neglecting critical characteristics of the
task (Morris and Liebert 1969, Kahneman 1973, Baumeister and Showers 1986). Both types of
distractions can result in errors. This is confirmed by Boudreau et al. (2012) who find that errors in logic
are the negative response to an increase in competition in contests. A possible reaction of contestants,
once under pressure, may be to increase carefulness (conscientiousness) to avoid mistakes. Whereas
carefulness, in general, positively affects performance (Barrick and Mount 1991, Hurtz and Donovan
2000), it might turn negative when tasks have to be performed under time constraints. The latter is
typically the case in contests.
A contestant’s performance may also be influenced by her expectancy of success or failure based on
earlier experience. Contestants who believe that they can win are more likely to win than contestants who
are in doubt about their own abilities. Bandura (1977) refers to these beliefs as “efficacy expectations”.
A possible explanation of the higher likelihood of winning is that positive expectancies balance the
negative effects of pressure (Carver et al. 1979).
Finally, the fear of making a fool of oneself may also affect the performance of contestants. It has been
shown that an audience causes performers to be concerned (Baumeister 1984, Schlenker 1980). Whereas
an audience can have a positive effect on performance in case individuals perform a well-known task, the
effect is negative in case of a poorly known task (Zajonc 1965). In a contest where each contestant is
aware of her competitors, their ability, and their live-performance during the contest, competitors can be
considered the audience. The expected effect of audience is negative, since, in contests, tasks are poorly
known and have to be performed under time constraints. The negative effect of audience should be even
more pronounced if the same group of contestants repeatedly competes against each other. In particular, it
can be assumed that the performance of known contestants attracts more attention than the performance of
unknown contestants.
A rational explanation for performance differentials of contestants with the same ability competing against
better or worse contestants may be their risk-taking propensity. Individuals in contests may be willing to
take higher risks in case they think that the risk can considerably increase their probability of winning the
prize or in case they feel that they have nothing to lose (Konrad and Lommerud 1993, Buser 2016,
Mueller-Langer and Andreoli-Versbach 2017). Genakos and Pagliero (2012) study weightlifting
competitions and show that contestants who are ranked behind the leader take higher risks. Chevalier and
Ellison (1997) find that mutual funds adapt the riskiness of their portfolio depending on the mid-year
performance, increasing the fund volatility in case the mid-year performance is lagging behind. Literature
from psychology shows that individuals in a negative affective state, for instance, caused by the fact that
contestants expect to lose, tend to seek higher risks than those in a positive affective state (Isen and Geva
1987, Mittal and Ross 1998). Furthermore, individuals who are at risk of failing tend to try everything to
avoid failure – taking a higher risk in a contest may be one way of trying to avoid the shame experienced
by failing (Elliot and Thrash 2004, Elliot and Church 1997).
3 Empirical Context – Topcoder and Algorithm Contests
To answer our research question, we use crowdsourcing data. topcoder is an algorithm, development and
design platform, specialized in online programming contests. The platform was founded in 2001 by Jack
Hughes and Mike Lydon2. Today, topcoder has more than 1 million expert members who compete in
challenges. In total, topcoder hosts more than 7,000 contests every year. It is thus the “world's largest
community of competitive designers, developers, and data scientists”3. topcoder hosts different types of
competitions, amongst them algorithm contests, software design contests, and graphic design contests.
We focus on weekly so-called algorithm contests, i.e. timed contests where all contestants compete online
and are given a set of three problems to solve under time constraints. Individual contests are called “Single
Round Matches” (SRMs).4 Coders must register for SRMs and are assigned randomly to so-called virtual
rooms, i.e. groups of up to 8 coders, who compete against each other. SRMs are split into two divisions.
Division 1 (D1) contains coders with medium to high skill levels; Division 2 (D2) comprises coders with
low to medium skill levels. To assign the coders to D1 and D2, the individual skill ratings are used.
topcoder has developed its own rating system based on an Elo rating, which compares the predicted future
rank based on past performance with the performance of all contestants. The cutoff value, i.e. the
2 See https://www.crunchbase.com/organization/topcoder, accessed on November 26, 2017.
3 See https://herox.com/topcoder, accessed on November 26, 2017.
4 In the following we refer to the setting and information environment of our specific sample in 2001 and 2002.
topcoder has changed the rules of SRMs several times over the last years.
threshold that determines whether a coder competes in D1 or D2 is unknown until shortly before the start
of the contest but is around 1,489 in our setting. Figure 1 summarizes the composition of SRMs.5
[Insert Figure 1 about here]
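The division assignment and the Elo-style comparison can be illustrated with the sketch below. It uses the classic logistic Elo expectation; topcoder's published rating algorithm differs in its details (it also tracks volatility), so this is a simplified stand-in, and only the 1,489 cutoff comes from the paper.

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Classic (logistic) Elo expectation that coder A outperforms
    coder B. Illustrative only: topcoder's actual algorithm differs."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))

def assign_division(skill_rating, cutoff=1489):
    """Division assignment at the cutoff reported for this sample:
    D1 (medium to high skill) at or above the cutoff, else D2."""
    return "D1" if skill_rating >= cutoff else "D2"
```

Under this expectation, a coder at the D1 mean rating (1,835) would be predicted to outperform one at the D2 mean (866) almost surely, which conveys how sharply the two divisions differ in ability.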
Each SRM consists of three phases, the coding phase, the challenge phase, and the system testing phase.
During the coding phase, the contestants must solve three problems characterized by an increasing level of
difficulty. Problem complexities are represented by the maximum reachable points, which are 250 points
for the easy problem (problem 1), 500 points for the medium difficult problem (problem 2) and 1000
points for the difficult problem (problem 3). The algorithmic problems require logical and structural
thinking in order to convert a certain task into a working computer solution. As soon as a contestant
selects and opens a problem, the achievable score (submission points) for that problem begins to decrease.
Thus, the number of points depends on the time elapsed since opening the problem statement and
submitting a solution. For each solution that successfully compiles (= source code that can be transferred
into an executable program), the contestants get the submission points for that problem. If the solution
does not pass the challenge or system testing phase, the contestant loses all submission points for that
specific problem. Hence, the attained score per problem is a function of the correctness and speed of the
respective solution. In total, the coding phase, i.e. the time to solve all three problems, lasts between 60
and 90 minutes (75 minutes in most of the cases).
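The decay of submission points over time can be illustrated with the commonly cited topcoder scoring formula, maxPoints × (0.3 + 0.7·T²/(10·t² + T²)), where t is the time since opening the problem and T is the length of the coding phase. Whether the 2001-02 SRMs used exactly this formula is an assumption; the text only states that points decrease with elapsed time.

```python
def submission_points(max_points, minutes_open, coding_phase_minutes=75):
    """Submission points for an SRM problem: full points for an instant
    submission, decaying toward a 0.3 * max_points floor the longer the
    problem stays open. Assumed (commonly cited) topcoder formula; the
    exact 2001-02 variant may differ."""
    t, tt = minutes_open, coding_phase_minutes
    return max_points * (0.3 + 0.7 * tt ** 2 / (10 * t ** 2 + tt ** 2))
```

For example, a 250-point problem is worth the full 250 points when submitted immediately and loses value monotonically as the minutes since opening accumulate, which is why the attained score rewards speed as well as correctness.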
The challenge phase takes 15 minutes. During this time, the contestants have the possibility to challenge
the functionality of the solutions of their competitors in the same room. In case any of the coders finds an
error in a competitor's code, the contestant who submitted the erroneous code loses all submission
points earned for that specific problem at the end of the coding phase. The successful challenger gets a 50-
point reward. In contrast, if a challenge is unsuccessful, the challenging contestant loses
50 points from her score and the score of the challenged solution remains unchanged.
Submitted code, not yet successfully challenged, must pass through the system testing phase. In case the
topcoder system test finds an error in the code, the respective coder will, again, lose all her points for that
problem originally earned at the end of the coding phase. Successful challenges from the challenge phase
of both divisions are added to the system test so that all contestants across divisions are treated equally at
the end of the system testing phase.6
Generally, we distinguish between three possible outcomes: (1) opened problem (the contestant opened
the problem description), (2) submitted problem (the contestant submitted a solution for the problem), (3)
passed challenge and system test phase (the submitted solution passed both the challenge and the system
test phase). Only if the solution passes this third step does the contestant receive points and have the
chance to win a prize. The final points awarded to a contestant equal the sum of the points received for successful
5 See https://apps.topcoder.com/forums/%3bjsessionid=69D6B19B12B97C75CB96F247E3679D15?module=Thread&threadID=685368&start=0&mc=5#1306267 and https://www.topcoder.com/community/how-it-works/, accessed on November 26, 2017.
6 See https://help.topcoder.com/hc/en-us/articles/115006162527-SRM-Overview, accessed on November 26, 2017.
submissions (outcome 3) and the additional points earned for successful challenges of solutions of other
coders or points lost for unsuccessful challenges.
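The scoring rules above (all submission points lost on a failed challenge or system test, ±50 points per challenge made) can be combined into a final-score sketch; the function and variable names are ours, not topcoder's.

```python
def final_points(submissions, challenges_made):
    """Final SRM score under the rules described above.

    submissions: list of (submission_points, passed) pairs; `passed`
        means the solution survived both the challenge phase and the
        system-testing phase, otherwise all its points are lost.
    challenges_made: one boolean per challenge this coder made
        (True = successful: +50 points, False = unsuccessful: -50).
    """
    score = sum(pts for pts, passed in submissions if passed)
    score += sum(50 if ok else -50 for ok in challenges_made)
    return score
```

Note that the paper's performance measure later excludes the challenge-point component, so the cross-division comparison rests on the first sum alone.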
Coders are motivated to take part in an SRM by extrinsic rewards (prize money) and intrinsic rewards
(e.g., reputation). Generally, the top three performers in a room earn money. In D1, the winner is awarded
300 USD, the second place 150 USD, the third place 75 USD. In D2, the winner gets 150 USD and the
second and third place get 75 USD and 25 USD7, respectively. All other coders do not get any prize
money. The intrinsic motives of the coders are in line with motives of open source programmers that have
repeatedly been discussed in the literature (e.g., Lakhani and Wolf 2005) and were confirmed by our
interviewees: signaling in the job market, community recognition, improvement of programming skills,
and fun.
After assignment to the different rooms, the contestants have access to information about their competitors.
They can view the profiles of the other coders in the room, containing their programming skills, the time
since joining topcoder, and their skill rating. A coder's name on topcoder is color-coded depending on
her skill rating. The top performers are colored in red (skill rating: 2200+), coders with a skill rating
between 1500 and 2199 are colored in yellow, coders with a skill rating between 1200 and 1499 are
colored in blue, and bottom performers (skill rating between 900 and 1199 or below 900) are colored in
green and gray. The color coding allows the coders to rapidly detect the composition of the skill
distribution of contestants and thus, their specific competitive environment.8 Additional live information
about the current state of a contest can be obtained from the “Leader Board”. Coders can click on the
"Leader Board" button on their screen, which then shows the current leader in the specific room.
Additionally, the window contains information about the actual points of the leader. Our interviews
indicate that coders check the group composition of the room that they are randomly assigned to before
the start of the contest.
4 Data and Descriptive Statistics
We focus on SRMs 26-66, organized between August 2001 and February 2002. We restricted the sample
to these SRMs, since, in these contests, coders in D1 and D2 solved the same problems and faced the same
time constraints. The latter ensures comparability of the scores achieved in D1 and D2. We dropped three
SRMs because of a lack of challenging points (SRMs 28 and 33) and total time not specified (SRM 35).
Our final sample consists of 38 SRMs and 1,677 unique coders. Some of the coders repeatedly competed
in SRMs, hence, the regression analyses are based on 10,038 observations, i.e. the achievements of coders
in different SRMs.
7 Since the effort is a function of the expected payoffs, it can be assumed that contestants in D1 put more effort into solving the tasks. This may result in a higher performance of D1 coders than D2 coders. Hence, our assumption that both parties exert equal efforts is a conservative assumption.
8 See https://apps.topcoder.com/wiki/display/tc/Algorithm+Competition+Rating+System, accessed on November 26, 2017.
Table 1 describes the problem solving behavior of the contestants. Whereas 98% of the contestants opened
problem 1, only 95% of the contestants opened problem 2, and 77% opened problem 3. Once opened, a
possible solution of problem 1 was submitted with a probability of 88%, of problem 2 with a probability
of 59% (if opened), and of problem 3 only with a probability of 25%. In case the contestants submitted a
solution of problem 1, 65% of these solutions were correct. Problem 2 solutions were correct with a
probability of 50% and problem 3 solutions only with a probability of 34%. Contestants who submitted a
solution for problem 1 got on average 129.8 points, contestants who solved and submitted a solution for
problem 2 got on average 151.6 points and, for a solution of problem 3, the contestants got, on average
202.2 points. Again conditional on a submission, the contestants dedicated more time to problem 2 (mean
= 33.4 minutes) and problem 3 (mean = 33.5 minutes) than to problem 1 (mean = 19.0 minutes).
Table 2 contains descriptive statistics of the variables used for our RDD analysis. Column 1 reports
descriptive statistics of the full sample. Columns 2 and 3 report descriptive statistics by division.
Test and ranking variables
The total number of contestants by SRM varies between 117 and 576 and amounts to 160 on average. The
average number of contestants competing in D1 is significantly smaller than the average number of
contestants competing in D2 (meanD1=31; meanD2=131). As mentioned earlier, in D1, the winner is
awarded 300 USD, the second place 150 USD, the third place 75 USD. In D2, the winner gets 150 USD
and the second and third place get 75 USD and 25 USD. In D1, the contestants got an average amount of
70 USD and D2 contestants an average amount of 32 USD. The maximum skill rating of the contestants in
D1 amounted to 3,111 (mean=1,835). The maximum skill rating in D2 amounted to 1,522 (mean=866).
Dependent variables
On average, the contestants opened 2.7 problems (2.9 in D1, 2.6 in D2). Contestants in D1 on average
submitted more problems than contestants in D2 (2.3 vs. 1.4) and received more than twice as many
submission points (769.0 vs. 339.9). However, they also lost more points after other contestants challenged
their solutions or the solutions did not pass the system tests.9 D1 contestants lost on average 267.5 points,
whereas D2 contestants lost on average 173.6 points. The overall performance of the contestants, defined as
the final points excluding all additional points earned during the challenge phase, amounts to 501 points on
average in D1 and to 166 points on average in D2.10
As expected, all differences turned out to be highly significant.
9 Lost points are defined as the difference between final points (performance of contestant) and submission points
and result from erroneous submissions detected in the challenge or system test phase. A high number of lost points
is associated with fast but risky submissions.
10 In our analysis, we exclude all additional points earned during the challenge phase in order to compare the
performance of contestants in D1 and D2. It is obvious (and can be shown) that bottom performers of the
high-ability group earn fewer additional challenge points than top performers of the low-ability group. This
difference is excluded in our RDD analysis to compare the quality of the solutions between treatment and
control group.
Control variables
To avoid biased results, we control for the experience of the contestants, i.e., the number of earlier SRMs
in which they took part.11 Whereas D1 contestants on average took part in 22 earlier contests, D2 contestants
on average took part in only 9. D1 contestants had switched on average 2.7 times between D1 and D2
(contestants who earn no points in a contest in D1 may see their skill rating drop, moving them back to D2,
from where they again have to work their way up to D1).12 D2 contestants are characterized by 0.5 switches
on average between D1 and D2. We further control for the number of contestants per room. Rooms in D1
contain 7.5 contestants on average, rooms in D2 7.8 contestants on average. This variable is also used in the
later regression analysis to test whether additional competitive pressure increases the performance
differential between D1 and D2 contestants. Following Boudreau et al. (2016), we use the number of
contestants per room as a proxy for the intensity of competitive pressure. Finally, we account for the number
of contests of each contestant in D1 and D2 during which the contestant did not submit any solutions to any
of the problems.13 This behavior is known as "rating diving": deliberately performing worse in D1 in order
to drop back to D2 and have a higher likelihood of winning in the next SRM. Again, all differences between
D1 and D2 contestants are highly significant.
[Insert Tables 1 and 2 about here]
5 Methodological Approach
For SRMs, contestants are assigned to two groups according to their skill rating, which is a function of the
individual's performance in previous topcoder contests. As mentioned earlier, contestants with a score above
a certain threshold determined by topcoder compete in D1 and those below the threshold compete in D2.
Individuals whose score varies around the threshold (medium-skilled contestants) repeatedly move between
D1 and D2. It is reasonable to assume that those who just barely pass the threshold to D1 are comparable to
those who just miss being assigned to this group, i.e., who compete in D2. Consequently, for individuals
whose ability score is at the threshold, the assignment to the two groups can be assumed to be as good as
random.
As indicated above, individuals at the threshold but assigned to D2 are the top performers of the low-ability
group, and contestants just assigned to the high-ability group (D1) are the bottom performers of the
high-ability group. Hence, peer pressure should be higher for individuals at the threshold who compete in
D1. This higher pressure is defined as the treatment in our setting.
11 The variable "number contests participated" is a running variable that includes all SRMs in which the contestant
had participated prior to the focal contest.
12 The variable "number switches between Divisions" is a running variable that counts all switches between D1 and
D2 prior to the focal contest. A switch counts in both directions (from D1 to D2 and from D2 to D1).
13 The variable "No submission Div1" ("No submission Div2") is a running variable that counts all SRMs in which
the contestant did not submit any solution to the three problems in D1 (D2). For D1, this is referred to as "rating
diving". For D2, this behavior has no benefit and might reflect inexperience or curiosity of beginners.
The organization of the contests, i.e., the assignment of the contestants to the two divisions according to a
clear cutoff of the skill rating, allows us to implement a Regression Discontinuity Design (RDD) analysis
to assess the causal effect of competitive pressure on the performance of contestants (Thistlethwaite and
Campbell 1960, Lee 2008). In our case, the RDD is sharp, since assignment to the two groups is a
deterministic function of the contestants' skill rating (Jacob et al. 2012). In other words, all contestants
above the skill threshold set by topcoder are assigned to the treatment group (D1) and all contestants with
a skill rating below the threshold are assigned to the control group (D2) (Imbens and Lemieux 2008).14
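The sharp assignment described here can be sketched as a deterministic rule. The cutoff value below, and the tie-breaking at exactly the threshold, are illustrative assumptions rather than topcoder's actual parameters:

```python
# Minimal sketch of sharp RDD assignment: treatment status is a deterministic
# function of the skill rating. CUTOFF is hypothetical, not topcoder's value.
CUTOFF = 1200

def division(skill_rating: int) -> str:
    """Contestants at or above the cutoff compete in D1 (treatment group),
    all others in D2 (control group). Tie-breaking at the cutoff is assumed."""
    return "D1" if skill_rating >= CUTOFF else "D2"

# Contestants just above and just below the cutoff are near-identical in
# ability, yet deterministically assigned to different divisions:
print(division(1201), division(1199))
```

This determinism is what makes the design sharp: unlike a fuzzy RDD, treatment probability jumps from 0 to 1 at the cutoff.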
Figure 2 provides visual evidence of the RDD, comparing the performance of the contestants according to
their skill rating and assigned division, with the rating centered at the threshold. The functional form of
the RDD is based on a polynomial fit of order 3 (cubic model) and represents the relationship between the
rating variable and the outcome. We further include an interaction between the rating variable and the
treatment, which accounts for the fact that the treatment affects not only the intercept but also the slope of
the regression line (Jacob et al. 2012). Restricting the analysis to contestants around the threshold, who are
all characterized by the same medium ability level, shows that contestants in D1 (bottom performers of the
high-ability group) indeed perform worse than those in D2 (top performers of the low-ability group).
Whereas Figure 2 shows performance differentials without considering challenge points, Figure 3 takes the
additional challenge points into account. As expected, the latter increases the observable performance
differential.
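A binned-scatter figure of this kind can be rebuilt from the raw data: sample averages within rating bins plus a polynomial fit of order 3. The sketch below uses simulated data and a single global fit, whereas the figures in the paper allow the fit to differ on either side of the cutoff; the bin width and variable names are assumptions:

```python
# Illustrative sketch: bin averages of the outcome by centered rating (the
# dots in a figure like Figure 2) plus a global cubic fit (the smooth line).
# Data are simulated; bin width of 100 rating points is an assumption.
import numpy as np

rng = np.random.default_rng(42)
rating = rng.uniform(-1500, 1500, 5000)            # centered skill rating
perf = 300 + 0.2 * rating + rng.normal(0, 50, 5000)

# Sample averages within equal-width bins
edges = np.arange(-1500, 1501, 100)                # 30 bins of width 100
idx = np.digitize(rating, edges) - 1
bin_means = [perf[idx == b].mean() for b in range(len(edges) - 1)]

# Polynomial fit of order 3 over the full support
coeffs = np.polyfit(rating, perf, deg=3)
fitted = np.polyval(coeffs, rating)

print(len(bin_means), coeffs.shape)
```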
[Insert Figures 2 and 3 about here]
In a second step, we estimate the following equation using an RDD analysis:
$$
Y_i = \alpha + \beta_0 T_i + \beta_1 r_i + \beta_2 r_i^2 + \beta_3 r_i^3 + \beta_4 r_i T_i + \beta_5 r_i^2 T_i + \beta_6 r_i^3 T_i + \beta_7 (X_i - \bar{X}) + \beta_8 (X_i - \bar{X}) T_i + V + \epsilon_i
$$
where i refers to the contestant per contest (SRM), 𝛼 is the average value of the outcome for those in the
control group after controlling for the rating variable, 𝑌𝑖 refers to the outcome measure, i.e., the
performance of the contestant for observation i, 𝑇𝑖 is the treatment dummy (i.e., being in D1 with higher
peer pressure equals 1), 𝑟𝑖 is the rating variable for observation i, centered at the cutoff value, 𝑋𝑖 represents
the covariate to test heterogeneous treatment effects (centered at the mean to simplify interpretation), V is
a vector including round fixed effects (SRM) and, in some specifications, control variables, and 𝜖𝑖 is the
error term for observation i.
14 Of all 10,038 observations, 10 observations (from 8 different SRMs) should have been assigned to D2, because
they had a skill rating that was lower than the specific threshold, but were assigned to D1. We assume that this
was done to fill up competition rooms in D1. As a robustness check, we excluded these observations, with no
effect on the results.
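The estimating equation above can be sketched as a plain OLS fit of the cubic specification. The code below is an illustrative sketch on simulated, noise-free data; it omits the covariates, round fixed effects, and contestant-level clustered standard errors used in the paper:

```python
# Illustrative sketch (not the authors' code): sharp RDD with a cubic
# polynomial in the centered rating, fully interacted with the treatment
# dummy, fitted by OLS on simulated data.
import numpy as np

r = np.linspace(-1.0, 1.0, 201)          # rating centered at the cutoff
T = (r >= 0).astype(float)               # sharp assignment: D1 iff r >= 0
y = 10.0 + 2.0 * r - 5.0 * T             # true discontinuity of -5 at the cutoff

# Design matrix: intercept, treatment, r, r^2, r^3 and their T-interactions
X = np.column_stack([np.ones_like(r), T, r, r**2, r**3,
                     r * T, r**2 * T, r**3 * T])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

treatment_effect = beta[1]               # estimated jump in y at r = 0
print(round(treatment_effect, 3))        # -5.0 by construction
```

Because both slopes and curvature are interacted with the treatment, the coefficient on T is the discontinuity at the cutoff itself, which is the causal quantity of interest.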
6 Multivariate Analysis
Tables 3 and 4 show the results of the RDD analysis. In each model, we cluster the standard errors at the
contestant level and control for contest (round) fixed effects. Whereas Table 3 shows the results of RDD
models in which the skill rating enters as a cubic function with different slopes above and below the
threshold, Table 4 (as a robustness check) shows the results based on a quartic function. Since the results
of both specifications are very similar in terms of coefficient size, sign, and level of significance, in the
following we focus on the outcomes displayed in Table 3. Models 1 and 3 contain the treatment without
controls; Models 2 and 4 add the control variables. Models 3 and 4 additionally contain an interaction
between the treatment and the number of contestants competing in a room (i.e., the intensity of peer
pressure).
The results show that competing against contestants with higher ability (i.e., competing in D1) decreases
the performance of contestants with a skill level around the threshold by 61 points (Model 1). Once we add
the controls, the decrease in performance still amounts to 57 points (Model 2). Increasing competitive
pressure (i.e., an increase in the number of contestants in a room) further widens the performance
differential by 24 points (Model 4).
With respect to the control variables, we find that experience, measured by the number of earlier contests
participated in, has a non-linear (inverted U-shaped) effect on performance. Each additional contest in which
a contestant did not submit any solutions at all decreases performance by 68 points (no submissions in D1)
or 20 points (no submissions in D2).
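As an illustrative check of this non-linearity, the Model 2 point estimates (12.42 on the linear term, -0.11 on the squared term) imply a performance-experience profile that peaks where the marginal effect is zero:

```python
# Worked check (illustrative only): with coefficients b1*x + b2*x^2, the
# marginal effect b1 + 2*b2*x is zero at x = -b1 / (2*b2).
b1, b2 = 12.42, -0.11        # Model 2 point estimates, Table 3
turning_point = -b1 / (2 * b2)
print(round(turning_point, 1))   # about 56.5 earlier contests
```

With a sample maximum of 61 earlier contests (Table 2), the implied peak lies near the upper end of the observed experience range, so performance is increasing in experience for almost all contestants.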
[Insert Tables 3 and 4 about here]
Since we are interested in the mechanisms leading to performance differentials between individuals who are
characterized by the same ability level but compete in different competitive environments, we take a closer
look at the problem-solving behavior of the contestants, which is displayed in Figures 4 to 8. It has to be
noted that the following results refer to contestants around the cutoff value, i.e., at the threshold of being
assigned to D1 or D2. Figure 4 shows RDD estimates of the problem-solving behavior for problem 1.
Figures 5 and 6 show RDD analyses describing the problem-solving behavior of the contestants for
problems 2 and 3. Figures 7 and 8 display RDD analyses of the overall problem-solving behavior (all three
problems combined) of the contestants.
Figure 4 shows that contestants facing higher peer pressure (having a skill rating around the cutoff but
competing in D1) are characterized by a smaller likelihood of submitting the easy problem (problem 1) and
by lower submission points than contestants facing lower peer pressure (having a skill rating around the
cutoff but competing in D2). They do, however, spend more time on the easy problem, solve it correctly
less often, and earn fewer final points than contestants facing lower peer pressure. In other words, they make
more mistakes than D2 contestants around the cutoff. These results may provide a first indication of choking
under pressure. A higher willingness to take risks might also explain what we observe in the data: more
mistakes on the easy problem may occur because contestants place less emphasis on it and more emphasis
on the medium or difficult problem.
Figure 5 focuses on the medium difficult problem (problem 2). The results indicate that contestants facing
higher peer pressure (D1) are characterized by a lower likelihood of opening and submitting the medium
difficult problem than contestants competing in D2 and earn lower submission points for it. D1 contestants
are further characterized by a lower likelihood of solving the medium difficult problem correctly, i.e., they
earn fewer final points. Conditional on submitting, however, they spend less time on their solutions and
therefore earn higher submission points per submission (submission points decrease with the time needed
to submit). At the same time, they submit fewer correct solutions than D2 coders and, as a result, lose more
points during the challenge or system test phase. In sum, the problem-solving behavior of contestants under
pressure indicates that they place less emphasis on the medium problem (lower likelihood of opening
problem 2) and solve it less carefully, submitting fast but erroneous solutions. Our results may also point
to a higher risk-taking propensity of D1 contestants compared to D2 contestants. In particular, D1 coders
might submit solutions before they are sure that their solution is correct in order to save time, which, in
turn, leads to more submission points but fewer final points. This indicates high-risk submissions.
The problem-solving behavior for problem 3 (Figure 6) reveals that contestants who face higher peer
pressure (competing in D1) are characterized by a higher likelihood of submitting the difficult problem
(conditional on opening it). The difference diminishes once competitive pressure increases even further,
i.e., when considering the heterogeneous treatment effects of an increased number of competitors in a
specific contest room. D1 contestants also spend more time on the most difficult problem (conditional on
opening it). Figure 6 further indicates that D1 contestants are more likely to submit erroneous solutions.
Hence, even though D1 contestants dedicate more time to the difficult problem (once opened) than D2
contestants, they are more likely to make mistakes. A possible interpretation of this finding is that D1
contestants take a higher risk. In other words, they try to solve problem 3 even though they might not have
the skills to succeed. This may well be their only option to win a prize, i.e., to beat their better peers.
Figure 7 summarizes the overall problem-solving behavior of the contestants. It indicates that contestants
facing high competition (competing in D1) open fewer problems, submit fewer problems, spend more time
on solving problems, submit fewer correct solutions (i.e., make more mistakes), and, consequently, receive
fewer final points than D2 contestants. These results may point to choking under pressure. However, again,
a higher willingness to take risks might also be an explanation.
Finally, Figure 8 shows that contestants facing high competition (D1) have a higher ratio of time elapsed
to problems opened than D2 contestants. This could mean that they take more time to solve problems;
however, it may also be that they open more than one problem at the same time. They also have a higher
ratio of submission points to problems submitted than D2 contestants, which indicates a higher performance
per submission. However, they have a lower ratio of problems correct to problems submitted. Hence, they
make more mistakes, either because of choking under pressure or because of taking higher risks.
[Insert Figures 4 to 8 about here]
7 Implications, Conclusions and Limitations
The aim of our analysis was to investigate whether and why performance differences exist between
contestants with the same abilities who compete against more skilled or less skilled contestants. We observe
lower performance of D1 contestants on all problems, which can be explained by more mistakes, even on
the easy and medium problems. This, in turn, might indicate choking under pressure, because contestants
around the cutoff should be able to solve these problems correctly but make more mistakes. We also find
evidence of a higher risk-taking propensity among contestants facing higher competitive pressure. This
behavior might be rational as an attempt to overcome the ability gap to higher-skilled contestants.
In sum, the detailed analysis of the problem-solving behavior of the contestants sheds some light on what
drives the lower performance of D1 contestants relative to equally skilled D2 contestants around the cutoff.
However, these results do not yet suffice to disentangle the different possible explanations, i.e., rational
(e.g., higher risk taking) versus behavioral (e.g., choking under pressure) mechanisms.
Our study contributes to the literature on performance in contests by providing new, causal evidence on
performance differentials that cannot be explained by differences in ability or effort. The results also add
to the literature on the design of crowdsourcing contests by relating different levels of pressure to the risk-
taking and problem-solving behavior of the contestants. Furthermore, our results help platforms to
optimally design their (crowdsourcing) contests. To disentangle the rational and behavioral mechanisms
that drive the performance differential, a lab or field experiment might be useful.
References
Anabtawi, I. (2005). Explaining pay without performance: The tournament alternative. Emory Law
Journal, 54, 1557-1602.
Bandura, A. (1977). Self-efficacy: toward a unifying theory of behavioral change. Psychological review,
84(2), 191.
Baumeister, R. F. (1984). Choking under pressure: self-consciousness and paradoxical effects of
incentives on skillful performance. Journal of Personality and Social Psychology, 46(3), 610.
Baumeister, R.F., Showers, C. J. (1986). A review of paradoxical performance effects: Choking under
pressure in sports and mental tests. European Journal of Social Psychology, 16(4), 361-383.
Barrick, M. R., Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta
analysis. Personnel Psychology, 44(1), 1-26.
Bothner, M.S., Kang, J., Stuart, T.E. (2007). Competitive crowding and risk taking in a tournament:
Evidence from NASCAR racing. Administrative Science Quarterly, 52, 208-247.
Boudreau, K., Helfat, C.E., Lakhani, K.R., Menietti, M.E. (2012). Field evidence on individual behavior
& performance in rank-order tournaments. Harvard Business School Working Paper # 13-016, August
9, 2012; https://dash.harvard.edu/bitstream/handle/1/9502862/13-016.pdf?sequence=1.
Boudreau, K.J., Lakhani, K.R., Menietti, M. (2016). Performance responses to competition across skill
levels in rank‐order tournaments: field evidence and implications for tournament design. The RAND
Journal of Economics, 47(1), 140-165.
Brown, J. (2011). Quitters Never Win: The (Adverse) Incentive Effect of Competing with Superstars.
Journal of Political Economy, 119, 982-1013.
Buser, T. (2016). The Impact of Losing in a Competition on the Willingness to Seek Further Challenges.
Management Science 62(12), 3439-3449.
Carver, C. S., Blaney, P. H., Scheier, M. F. (1979). Reassertion and giving up: The interactive role of self-
directed attention and outcome expectancy. Journal of Personality and Social Psychology. 37(10),
1859.
Casas-Arce, P., Martínez-Jerez, F. A. (2009). Relative performance compensation, contests, and dynamic
incentives. Management Science, 55(8), 1306-1320.
Chevalier, J., G. Ellison (1997). Risk Taking by Mutual Funds as a Response to Incentives. Journal of
Political Economy 105(6), 1167–1200.
DeCaro, M. S., Thomas, R. D., Albert, N. B., Beilock, S. L. (2011). Choking under pressure: multiple
routes to skill failure. Journal of Experimental Psychology: General, 140(3), 390.
Dechenaux, E., Kovenock, D., Sheremeta, R. M. (2015). A survey of experimental research on contests,
all-pay auctions and tournaments. Experimental Economics, 18(4), 609-669.
Ehrenberg, R.G., Bognanno, M.L. (1990). Do Tournaments Have Incentive Effects? Journal of Political
Economy, 98, 1307-1324.
Elliot, A.J., Church, M.A. (1997). A hierarchical model of approach and avoidance achievement
  motivation. Journal of Personality and Social Psychology, 72(1), 218.
Elliot, A.J, Thrash, T.M. (2004). The intergenerational transmission of fear of failure. Personality and
Social Psychology Bulletin, 30(8), 957–971.
Garcia, S.M., Tor, A. (2009). The N-Effect: More Competitors, Less Competition. Psychological Science,
20, 871-877.
Genakos, C., M. Pagliero (2012), Interim Rank, Risk Taking and Performance in Dynamic Tournaments.
Journal of Political Economy 120(4), 782–813.
Hurtz, G. M., Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of
Applied Psychology, 85(6), 869-879.
Imbens, G. W., Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of
econometrics, 142(2), 615-635.
Isen, A. M., Geva, N. (1987). The influence of positive affect on acceptable level of risk: The person with
a large canoe has a large worry. Organizational Behavior and Human Decision Processes, 39(2), 145-
154.
Jacob, R., Zhu, P., Somers, M. A., Bloom, H. (2012). A Practical Guide to Regression Discontinuity.
MDRC.
Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice Hall.
Knoeber, C.R., Tsoulouhas, T. (2013). Introduction to the Special Issue on Tournaments and Contests.
International Journal of Industrial Organization, 31(3), 195-197.
Konrad, K.A. (2009). Strategy and Dynamics in Contests. Oxford, UK: Oxford University Press.
Konrad, K.A., Lommerud, K.E. (1993). Relative standing comparisons, risk taking, and safety regulations.
Journal of Public Economics, 51(3), 345-358.
Lallemand, T., Plasman, R., Rycx, F. (2008). Women and competition in elimination tournaments:
evidence from professional tennis data. Journal of Sports Economics, 9(1), 3-19.
Lazear, E. P., Rosen, S. (1981). Rank-Order Tournaments as Optimum Labor Contracts. The Journal of
Political Economy, 89(5), 841-864
Lee, D.S. (2008). Randomized Experiments from Non-random Selection in U.S. House Elections. Journal
  of Econometrics, 142(2), 675-697.
Mittal, V., Ross, W. T. (1998). The impact of positive and negative affect and issue framing on issue
interpretation and risk taking. Organizational Behavior and Human Decision Processes, 76(3), 298-
324.
Morris, L.W., Liebert, R.M. (1969). Effects of anxiety on timed and untimed intelligence tests. Journal of
Consulting and Clinical Psychology, 33:240-244.
Mueller-Langer, F., Andreoli-Versbach, P. (2017). Leading-effect, risk-taking and sabotage in two-stage
  tournaments: Evidence from a natural experiment. Journal of Economics and Statistics, 237(1), 1-28.
Orszag, J.M. (1994). A New Look at Incentive Effects and Golf Tournaments. Economics Letters, 46, 77-
88.
Riley, D. (2012). New tiger, old stripes. Gentlemen's Quarterly.
Rosen, S. (1981). The Economics of Superstars. American Economic Review, 71(5): 845-858.
Schlenker, B. R. (1980). Impression management: The self-concept, social identity, and interpersonal
relations. Monterey: Brooks/Cole.
Sheremeta, R.M. (2014). Behavior in Contests. MPRA Paper No. 57451, July 21, 2014,
http://mpra.ub.uni-muenchen.de/57451.
Skaperdas, S. (1996). Contest success functions. Economic Theory, 7 (2), 283-290.
Tanaka, R., Ishino, K. (2012). Testing the Incentive Effects in Tournaments with a Superstar. Journal of
the Japanese and International Economies, 26, 393-404.
Thistlethwaite, D., Campbell, D.T. (1960). Regression-Discontinuity Analysis: An alternative to the ex
post facto experiment. Journal of Educational Psychology, 51(6): 309-317.
Tullock, G. (1980). Efficient rent seeking, in J.M. Buchanan, R.D. Tollison, G. Tullock (Eds): Towards a
Theory of the Rent-Seeking Society, College Station, TX: Texas A&M University Press: 97-112.
Zajonc, R. B. (1965). Social facilitation. Science, 149(3681), 269-274.
TABLES AND FIGURES
Variable Obs. Mean St.D. Min. Max. (St.D. not reported for binary indicators)
Problem level one (easy problem)
P1 opened 10038 0.98 0 1
P1 submitted 10038 0.87 0 1
P1 correct 10038 0.57 0 1
P1 submission points 10038 163.56 75.92 0 250
P1 final points 10038 112.90 103.00 0 250
P1 lost points 10038 -50.66 81.21 -250 0
P1 time elapsed 10038 25.93 26.38 0 135
P1 submitted if opened 9864 0.88 0 1
P1 correct if submitted 8729 0.65 0 1
P1 submission points if opened 9864 166.45 73.38 0 250
P1 final points if submitted 8729 129.83 100.01 0 250
P1 lost points if submitted 8729 -58.26 84.51 -250 0
P1 time elapsed if submitted 8729 19.03 15.27 0 85
Problem level two (medium problem)
P2 opened 10038 0.95 0 1
P2 submitted 10038 0.56 0 1
P2 correct 10038 0.28 0 1
P2 submission points 10038 158.61 151.72 0 499
P2 final points 10038 84.95 142.37 0 490
P2 lost points 10038 -73.65 123.54 -499 0
P2 time elapsed 10038 44.56 25.72 0 135
P2 submitted if opened 9531 0.59 0 1
P2 correct if submitted 5622 0.50 0 1
P2 submission points if opened 9531 167.04 151.10 0 499
P2 final points if submitted 5622 151.68 161.47 0 490
P2 lost points if submitted 5622 -131.51 140.16 -499 0
P2 time elapsed if submitted 5622 33.43 15.97 1 86
Problem level three (difficult problem)
P3 opened 10038 0.77 0 1
P3 submitted 10038 0.19 0 1
P3 correct 10038 0.07 0 1
P3 submission points 10038 108.17 231.56 0 999
P3 final points 10038 39.06 152.85 0 972
P3 lost points 10038 -69.11 188.83 -999 0
P3 time elapsed 10038 34.87 27.73 0 135
P3 submitted if opened 7771 0.25 0 1
P3 correct if submitted 1939 0.34 0 1
P3 submission points if opened 7771 139.72 254.67 0 999
P3 final points if submitted 1939 202.22 296.61 0 972
P3 lost points if submitted 1939 -357.75 285.24 -999 0
P3 time elapsed if submitted 1939 33.53 14.94 1 85
Table 1: Descriptive Statistics (part 1)
*** p<0.01, ** p<0.05, * p<0.1
                                      Overall sample        | Division 1           | Division 2           |
                                      (Ntotal = 10,038)     | (NDiv1 = 2,116)      | (NDiv2 = 7,922)      |
                                      Mean Median Min Max   | Mean Median Min Max  | Mean Median Min Max  | t-Test
Contest variables
Number contestants in contest         159.7 133 117 576     | 30.7 28 31 99        | 130.8 108 81 477     | ***
Paid prize                            40.2 0 0 300          | 70.3 0 0 300         | 32.2 0 0 150         | ***
Ranking variables
Skill rating                          1,070.3 1,097 0 3,111 | 1,835.3 1,733 1,463 3,111 | 865.9 972 0 1,522 | ***
Skill rating (standardized)           -417.7 -391 -1,522 1,621 | 347.9 245 -23 1,621 | -622.2 -515 -1,522 0 | ***
Dependent variables
Problems opened                       2.7 3 1 3             | 2.9 3 1 3            | 2.6 3 1 3            | ***
Problems submitted                    1.6 2 0 3             | 2.3 2 0 3            | 1.4 1 0 3            | ***
Time elapsed                          105.4 95 0 400        | 82.4 79 0 322        | 111.5 95 0 400       | ***
Submission points                     430.3 391 0 1,636     | 769.0 635 0 1,636    | 339.9 239 0 1,625    | ***
Problems correct                      0.9 1 0 3             | 1.6 2 0 3            | 0.7 1 0 3            | ***
Performance of contestant             236.9 185 0 1,635     | 501.5 465 0 1,635    | 166.3 141 0 1,508    | ***
Lost points                           -193.4 -124 -1,625 0  | -267.5 -206 -1,552 0 | -173.6 -113 -1,625 0 | ***
Control variables
Number contests participated          11.6 7 1 61           | 22.3 21 2 61         | 8.8 5 1 57           | ***
Number switches between Divisions     0.9 0 0 22            | 2.7 1 0 21           | 0.5 0 0 22           | ***
Number competitors in room            7.7 8 4 8             | 7.5 8 6 8            | 7.8 8 4 8            | ***
No submission Div1                    0.1 0 0 8             | 0.1 0 0 8            | 0.0 0 0 8            | ***
No submission Div2                    0.5 0 0 15            | 0.1 0 0 2            | 0.6 0 0 15           | ***
Note: The t-Test column reports the test on differences in means between Division 1 and Division 2.
Table 2: Descriptive Statistics (part 2)
Performance of contestants (in points)
                (Model 1)   (Model 2)             (Model 3)                (Model 4)
VARIABLES       RDD         RDD with controls     RDD with interaction     RDD with interaction and controls
Treatment -60.98*** -57.01*** -66.89*** -62.06***
(18.20) (17.59) (18.23) (17.68)
Treatment X number of competitors -26.41** -23.79*
in contest room (centered) (12.30) (12.20)
Number of competitors -7.65* 0.05 -1.07
in contest room (centered) (4.43) (4.40) (4.38)
Number of contests participated 12.42*** 12.33***
(1.66) (1.66)
Sq(number of contests participated) -0.11*** -0.11***
(0.03) (0.03)
ln(number of switches between Divisions) -0.49 -0.51
(12.87) (12.85)
Number of contests with no submissions -68.44*** -68.41***
in Division 1 (high-ability group) (22.24) (22.18)
Number of contests with no submissions -19.83*** -19.84***
in Division 2 (low-ability group) (4.42) (4.39)
Constant 289.87*** 275.60*** 287.51*** 276.99***
(39.34) (38.68) (39.47) (38.55)
Round FE Yes Yes Yes Yes
Observations 10,038 10,038 10,038 10,038
R-squared 0.23 0.24 0.23 0.24
Number of contests 38 38 38 38
Note: The table reports sharp RD estimates using a cubic functional form with interaction effects to account for
different slopes around the cutoff; Model 3 and Model 4 further include heterogeneous treatment effects of the
number of competitors in a contest room, centered around the mean; all models include round fixed effects.
Treatment refers to the main variable of interest, a dummy variable equal to 1 if the skill rating is above the cutoff,
thus indicating whether a contestant is competing in the high-ability group with intense competitive pressure.
Robust standard errors are clustered at the contestant level; *** p<0.01, ** p<0.05, * p<0.1.
Table 3: Regression Discontinuity Design (cubic function)
Performance of contestants (in points)
                (Model 1)   (Model 2)             (Model 3)                (Model 4)
VARIABLES       RDD         RDD with controls     RDD with interaction     RDD with interaction and controls
Treatment -74.01*** -69.31*** -77.97*** -73.22***
(20.90) (20.09) (20.93) (20.11)
Treatment X number of competitors -25.46** -23.03*
in contest room (centered) (12.42) (12.31)
Number of competitors -7.35* 0.08 -1.01
in contest room (centered) (4.46) (4.39) (4.37)
Number of contests participated 12.47*** 12.39***
(1.67) (1.67)
Sq(number of contests participated) -0.11*** -0.11***
(0.03) (0.03)
ln(number of switches between Divisions) -0.97 -1.00
(13.00) (12.97)
Number of contests with no submissions -67.74*** -67.79***
in Division 1 (high-ability group) (22.30) (22.23)
Number of contests with no submissions -19.80*** -19.81***
in Division 2 (low-ability group) (4.40) (4.37)
Constant 291.77*** 279.10*** 289.40*** 280.54***
(40.28) (39.68) (40.41) (39.55)
Round FE Yes Yes Yes Yes
Observations 10,038 10,038 10,038 10,038
R-squared 0.23 0.24 0.23 0.24
Number of contests 38 38 38 38
Note: The table reports sharp RD estimates using a quartic functional form with interaction effects to account for
different slopes around the cutoff; Model 3 and Model 4 further include heterogeneous treatment effects of the
number of competitors in a contest room, centered around the mean; all models include round fixed effects.
Treatment refers to the main variable of interest, a dummy variable equal to 1 if the skill rating is above the cutoff,
thus indicating whether a contestant is competing in the high-ability group with intense competitive pressure.
Robust standard errors are clustered at the contestant level; *** p<0.01, ** p<0.05, * p<0.1.
Table 4: Regression Discontinuity Design (quartic function)
Figure 1: Composition of SRMs
[Diagram: contestants are assigned to the two groups (Divisions) based on their skill rating relative to the
cutoff value, with Division 1 (high-ability group) above and Division 2 (low-ability group) below; within
each division, contestants are then assigned at random to rooms (Room 1 to Room 4).]
Figure 2: Performance of contestants (RDD descriptive results)
[Plot: performance of contestants (in points, 0 to 1,500) against the standardized skill rating (-1,500 to
1,500); sample averages within bins and a polynomial fit of order 3.]
Figure 3: Performance of contestants with additional challenge points (RDD descriptive results)
[Plot: performance of contestants including challenge points (in points, 0 to 1,500) against the standardized
skill rating (-1,500 to 1,500); sample averages within bins and a polynomial fit of order 3.]
Note: This figure reports the coefficients of a sharp RDD using a cubic model with interactions and heterogeneous
treatment effects of the number of competitors in a specific contest room. The upper part of the figure, up to the
dashed line, represents the RDD analysis of the overall population of 10,038 observations. Below the dashed line,
the RDD estimates are conditional on the previous stage of the problem (1st stage: problem opened, 2nd stage:
solution for problem submitted, 3rd stage: submitted solution is correct).
Standard errors (in parentheses) are clustered at the contestant level and all models include covariates and round
fixed effects; *** p<0.01, ** p<0.05, * p<0.1.
Figure 4: RDD Analysis of problem solving behavior for problem 1
Note: This figure reports the coefficients of a sharp RDD using a cubic model with interactions and heterogeneous treatment effects with the number of competitors in a specific contest room. The upper part of the figure up to the dashed line represents the RDD analysis of the overall population of 10,038. Below the dashed line, the RDD estimates are conditional on the previous stage of the problem (1st stage: problem opened, 2nd stage: solution for problem submitted, 3rd stage: submitted solution is correct).
Standard errors (in parentheses) are clustered at the contestant level and all models include covariates and round fixed effects; *** p<0.01, ** p<0.05, * p<0.1.
Figure 5: RDD Analysis of problem-solving behavior for problem 2
Note: This figure reports the coefficients of a sharp RDD using a cubic model with interactions and heterogeneous treatment effects with the number of competitors in a specific contest room. The upper part of the figure up to the dashed line represents the RDD analysis of the overall population of 10,038. Below the dashed line, the RDD estimates are conditional on the previous stage of the problem (1st stage: problem opened, 2nd stage: solution for problem submitted, 3rd stage: submitted solution is correct).
Standard errors (in parentheses) are clustered at the contestant level and all models include covariates and round fixed effects; *** p<0.01, ** p<0.05, * p<0.1.
Figure 6: RDD Analysis of problem-solving behavior for problem 3
Note: This figure reports the coefficients of a sharp RDD using a cubic model with interactions and heterogeneous treatment effects with the number of competitors in a specific contest room. The upper part of the figure up to the dashed line represents the RDD analysis of the overall population of 10,038. Below the dashed line, the RDD estimates are conditional on the previous stage of the problem (1st stage: problem opened, 2nd stage: solution for problem submitted, 3rd stage: submitted solution is correct).
Standard errors (in parentheses) are clustered at the contestant level and all models include covariates and round fixed effects; *** p<0.01, ** p<0.05, * p<0.1.
Figure 7: RDD Analysis of overall problem-solving behavior (part 1)
Note: This figure reports the coefficients of a sharp RDD using a cubic model with interactions and heterogeneous treatment effects with the number of competitors in a specific contest room. The upper part of the figure up to the dashed line represents the RDD analysis of the overall population of 10,038. Below the dashed line, the RDD estimates are conditional on the previous stage of the problem (1st stage: problem opened, 2nd stage: solution for problem submitted, 3rd stage: submitted solution is correct).
Standard errors (in parentheses) are clustered at the contestant level and all models include covariates and round fixed effects; *** p<0.01, ** p<0.05, * p<0.1.
Figure 8: RDD Analysis of overall problem-solving behavior (part 2)