Team Formation Dynamics and Preferences in Online Courses

Team Formation Dynamics: A Study Using Online Learning Data

Milad Eftekhar
Department of Computer Science
University of Toronto
Toronto, ON, Canada

[email protected]

Farnaz Ronaghi∗

Department of Management Science and Engineering
Stanford University
Palo Alto, CA, USA

[email protected]

Amin Saberi
Department of Management Science and Engineering
Stanford University
Palo Alto, CA, USA

[email protected]

ABSTRACT

Using data from online courses, we study the dynamics of team formation in online environments. In particular, we observe that the teams formed by online students for completing course projects are homogeneous in terms of age, location and education level but diverse in terms of primary skill. Motivated by the data, we propose a coalitional game that captures the teaming preferences of individuals and show that the core of the resulting game is always non-empty. Even though our proof is constructive, it does not always yield a polynomial-time algorithm. We show that it is NP-hard to find a solution in the core in the general case and propose polynomial-time algorithms for natural special cases motivated by observations of online course data.

Categories and Subject Descriptors

H.2.8 [Database Management]: Database Applications – Data Mining; K.3.1 [Computers and Education]: Computer Uses in Education – Collaborative Learning; I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence – Multiagent Systems; J.4 [Social and Behavioral Sciences]: Economics

General Terms

Algorithms, Measurement, Theory

Keywords

Team Formation; Online Education; Social Learning; Core of Collaborative Games

1. INTRODUCTION

In addition to increasing access to education at global scale, the emergence of Massive Open Online Courses (MOOCs) has given us a new view into how people learn, interact, and collaborate. The availability of the MOOC data across research institutions, their scale and granularity, as well as their reach across countries and cultures make them an invaluable source for research.

∗ This work was done while authors were at NovoEd Inc.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
’15, November 2–3, 2015, Palo Alto, California, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3951-3/15/11 ...$15.00.
DOI: http://dx.doi.org/10.1145/2817946.2817967.

This paper focuses on the dynamics of team formation in an online environment. We studied team formation in 11 online courses with about 50 thousand participants from over 150 different countries. The courses were offered on NovoEd, where teamwork and collaboration are an integral part of the coursework. Students were asked to form teams to work on a business project. The teams were organic; participants could search and browse each other’s profiles and join each other’s teams.

We studied the teams formed by students and reviewed the joint distribution of characteristics, e.g., age, location, gender, and education level. We observed a high degree of homogeneity across age, location, and education level but, surprisingly, not across gender. This was consistent across the courses as well as across different segments of the population. The tendency of individuals to associate with similar others, dubbed homophily, has been documented in a vast array of studies about social networks.

The more interesting observation here is that among successful organic teams (teams with a high fraction of members who acquired a statement of accomplishment at the end of the course), we also observed a high degree of diversity (heterophily) in terms of participants’ primary skill sets. In other words, multidisciplinary teams with more diverse skill sets were more successful than the rest: their members sought teammates with complementary skills. Combining homophily (across age, location, and education level) with heterophily (across skill set) proved to be an effective strategy, leading to more successful teams. Section 3 provides a detailed description of our observations.

Based on the observations, we proposed a simple game-theoretic model to capture the preferences of individuals and their dynamics. In our model, each individual (or “agent”) was endowed with a vector of characteristics, like age, location, or skill set. We partitioned agents into teams of a certain size and assumed that team success depended on the characteristics of the individuals present in that team. Obviously, every agent would seek a team with the highest quality but would have to be accepted into the team, as well. To capture this, we used the notion of the core of a coalitional game [10]. We say that a partition of agents into teams is in the core if no subset of agents could improve its score function by deviating from the proposed partition and forming its own team.

A few aspects of this model merit comment. First, the notion of core is a natural generalization of the pairwise stability used in the widely studied stable marriage problem [20]. Instead of pairwise stability, we focus on the stability of groups of size larger than two. Second, as in the stable marriage problem, we do not allow coalition members to compensate each other with side payments. More precisely, we use the notion of core in a game with non-transferable payoff.

The most important distinction of the games studied here is that all team members benefit equally from the union. In other words, the utility of an agent in the team equals the value or quality score of that team. This is a natural assumption in our setting since all students in a team receive the same grade for a joint project. But our model is applicable elsewhere, too. For example, members of a group benefit in a similar way when they form teams to undertake a project that supports a community (like building a park, library or school) or when there are social norms against differentiation (like specifying authors in a theory paper).

We prove that under general assumptions, and as long as members of a team equally benefit from its value, the core of the game is non-empty. In other words, it is always possible to partition agents into sets such that no subset has an incentive to secede. Our proof (presented in Section 5) is constructive, but it does not always lead to a polynomial-time algorithm. In fact, we prove that it is NP-hard to find a stable partition in most cases. Section 6 offers polynomial-time algorithms for natural special cases motivated by observations in Section 3.

2. RELATED WORK

The impact of homogeneity or diversity on team effectiveness has been studied extensively. In homogeneous teams, members share similar characteristics, which results in easier collaboration, positive reactions, and extensive engagement [17, 23]. In diverse teams, the existence of varied abilities and ideas sparks creativity and helps achieve a higher final performance [18]. Another detailed study of team composition (including personalities, skills, team size, roles, etc.) and its impact on team performance is found in Senior and Swailes [21].

A specific line of related research focuses on the impact of a founding team’s composition on the performance of firms and ventures [5, 7–9].

The team formation problem has been studied extensively for scenarios that aim to form a single team for each existing task. Having a set of users and one or multiple tasks, the goal is to find a team that contains all required skills to complete the task. The problem has been studied in operations research [25] and revisited in computer science by adding compatibility constraints (in the form of communication overhead over a social network of users) [15], cost constraints (the cost associated with adding each user to a team) [3, 19], and time constraints (task work permitted only during free time) [11]. Another extension of the problem considers a time-series of arriving tasks whereby users are assigned to each task with a fair task allocation [4, 16].

The concept of teams and forums performing collaborative learning in MOOCs has been studied more recently [13, 24]. In one work, users are assumed to have a vector of characteristics, and the goal is to group users into teams with an upper-bound size such that team members have identical values on all characteristics [6].

Another interesting line of work [1, 2] considers the variance in individual ability. Assuming the ability of a group of users is the average of the ability levels of its members, the goal is to partition users into groups such that the number of users with an ability level less than that of their team is maximized. In our context, users have complete control over whom they choose to accept into their teams; therefore, we chose game-theoretic modeling instead of optimization. Combining the approaches of the two papers poses an interesting open problem.

The structure of teams as coalitions has been studied as a network formation game in [22] and [12], where the strength of a coalition is defined as the strength of the social network structure creating it in a hedonic coalition formation game. In many network coalition games, the network is used to determine payoffs, or even network creation is considered as a strategic process [14]. Studying the team formation problem as a network formation game, one where network structure affects the utility functions of agents, offers an interesting future direction for our work. This paper defines the utility function of a team based on the characteristics of its members.

3. DATA OBSERVATIONS

This section describes the data, our methodology for measuring homogeneity and diversity in teams, and our analysis.

3.1 Teams

NovoEd is a social learning platform used to offer experiential and collaborative online classes of varying sizes. In many courses offered on NovoEd, teamwork and collaboration form an integral part of coursework.

The learning platform supports two types of team formation processes: algorithmic and organic. Algorithmic teams are formed automatically based on criteria set by the instructor (e.g., size of the team and geographical location of the members), and are often transient, being formed around a particular assignment and dissolved afterwards. Organic teams are formed by learners themselves and may persist throughout the course, although students can leave one team and join another at any time. Students find each other through searching submitted assignments, answers to course profile questions, locations and keywords in the profile. They may be invited to join a team by the team leader or individually ask to join a team.

This paper focuses on organic teams formed by the students, analyzes student preferences when selecting a team, and identifies success factors in these teams.

3.2 Dataset

Our dataset includes 11 courses offered in the summer of 2014. Enrollments ranged from 200 students to more than 25,000 students per course. These courses were four to eight weeks long and covered various business topics. Three of the analyzed courses had tens of active students (students who had done some course activities, e.g., submitting an assignment), five had hundreds of them, and three had thousands.

Each student had a profile that included age, education, gender, and location. Instructors could choose to introduce extra profile questions such as those regarding skills. Profile question responses were the main mechanism for searching out team members.

3.3 Measuring Homogeneity and Diversity in a Team

We propose a framework to determine whether students prefer to work with others similar to them (i.e., homogeneity is preferred) or with a diverse group (i.e., diversity is preferred). We describe this analytical framework by studying age diversity in organic teams.

Students are required to select their age range in their profile. The provided options are: 18–20, 21–25, 26–30, 31–35, 36–40, 41–45, 46–50, and over 50. The age range selected is not shown on their profiles and is not searchable by other students, but it could be inferred from student profile pictures and biographies.

We calculate the number of pairs of students in one team that belong to age groups $X$ and $Y$. To remove size bias, we normalize this number by the total number of possible pairs that could have been created from members of age group $X$ with members of age group $Y$ (students are not uniformly distributed in age groups). Let $n(X)$ represent the number of students in age group $X$ and $\mathrm{pair}(X,Y)$ represent the number of pairs with one student from age group $X$ and one student from age group $Y$. Let

\[ \mathrm{temp}(X,Y) = \frac{\mathrm{pair}(X,Y)}{n(X) \times n(Y)}. \]

For the special case of $X = Y$, the number of possible pairs is $\binom{n(X)}{2}$. Hence, we define

\[ \mathrm{temp}(X,X) = \frac{\mathrm{pair}(X,X)}{\binom{n(X)}{2}}. \]

Consequently, we define a metric to measure the preference of students from age groups $X$ and $Y$ to be in one team as:

\[ \mathrm{pref}(X,Y) = \frac{\mathrm{temp}(X,Y)}{\mathrm{average}\left(\sum_{Z} \mathrm{temp}(X,Z),\ \sum_{Z} \mathrm{temp}(Y,Z)\right)} \]
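The metric above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names (`temp_scores`, `pref_scores`) and the input layout (a list of teams plus a dictionary from student to group label) are assumptions made for the example, and pairs with no possible partners are given a score of 0 by convention.

```python
from collections import Counter
from itertools import combinations
from math import comb


def temp_scores(teams, group_of):
    """Return (groups, temp) where temp[(X, Y)] is the normalized pair count.

    teams: list of lists of student ids (hypothetical input layout).
    group_of: dict mapping every student id to a group label such as "18-20".
    """
    n = Counter(group_of.values())  # n(X): number of students per group
    pair = Counter()                # pair(X, Y): same-team pairs, unordered
    for team in teams:
        for a, b in combinations(team, 2):
            pair[tuple(sorted((group_of[a], group_of[b])))] += 1
    groups = sorted(n)
    temp = {}
    for x in groups:
        for y in groups:
            # C(n(X), 2) possible pairs inside one group, n(X) * n(Y) across two.
            possible = comb(n[x], 2) if x == y else n[x] * n[y]
            count = pair[tuple(sorted((x, y)))]
            temp[(x, y)] = count / possible if possible else 0.0
    return groups, temp


def pref_scores(teams, group_of):
    """pref(X, Y) = temp(X, Y) / average(sum_Z temp(X, Z), sum_Z temp(Y, Z))."""
    groups, temp = temp_scores(teams, group_of)
    row_sum = {x: sum(temp[(x, z)] for z in groups) for x in groups}
    pref = {}
    for x in groups:
        for y in groups:
            denom = (row_sum[x] + row_sum[y]) / 2
            pref[(x, y)] = temp[(x, y)] / denom if denom else 0.0
    return pref


# Toy data: two teams, each drawn from a single age group.
teams = [["a", "b"], ["c", "d"]]
group_of = {"a": "18-20", "b": "18-20", "c": "over 50", "d": "over 50"}
pref = pref_scores(teams, group_of)
```

In this toy input every team is age-homogeneous, so the diagonal entries of `pref` are large and the off-diagonal entries are zero, which is the pattern the heatmap in Figure 1 is meant to reveal.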

Figure 1 shows age preferences in forming teams for a typical course.¹ Darker colors represent larger values, showing age groups that prefer to be in the same team. We observe that cells on the diagonal (or close to it) have larger values. This suggests a preference for age-homogeneous team creation. Students under 20 or students over 50 show a strong preference to join same-age teams (viz., dark colors on the bottom left and top right cells), while students between 26 and 40 show more flexibility in age groups when forming teams.

Student profiles in our dataset also include education level (i.e., Some high school, Graduated high school or equivalent, Some college or university, Graduated with an associate degree, Graduated with a bachelor’s degree, Graduated with a master’s degree, Graduated with a doctorate degree); gender (male, female); location (latitude-longitude pair); and skill set. Skill set, a course-specific question, may have different values in different courses. For example, skill set options defined by the instructor in a business class included: Aerospace, Finance, Machinery, Architecture, Chemicals - Materials, Consumer products, Other manufacturing, Telecommunications, Publishing - Schools, Service primary secondary, Energy - Electric utilities, Software - Internet - Mobile, Drugs - Biotech - Medical devices, Management and finance consulting, Electronics - Computers, Law - Accounting - other business services, Other services.

¹ We observed consistent behavior across all courses.

[Figure 1 image: heatmap titled “Homogeneity in teams with regard to age”; both axes list the age groups 18-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, and over 50.]

Figure 1: Students prefer to form teams with those of the same age group.

Figure 2: Preference for age-homogeneous team membership

We look deeper into team formation and analyze homogeneity preferences across different profile questions utilizing the following analytical framework. For each profile question, we plot the cumulative distribution of distances between responses given by team members in organic teams and compare it to the same distribution in teams that could be built completely at random. We consider all students who responded to each of the profile questions in this analysis. Profile questions can be represented as numerical (location [latitude, longitude] and gender [0,1]), ordinal (age and education), and categorical (skill set). We define pairwise distances between two data points for each profile question as follows:

Figure 3: Preference for longitude-homogeneous team membership

• Numerical profile questions: Location is defined by two numeric values: latitude and longitude. Gender is defined as an integer by assigning binary values to male and female. The pairwise distance between two data points for gender and location is defined as the Euclidean distance between them.

• Ordinal profile questions: Ordinal features, such as age range and education level, are represented with consecutive categories. We define the pairwise distance between two data points as the distance between the categories to which they belong. For example, for education, the distance between “Some high school” and “Graduated high school or equivalent” is 1.

• Categorical profile questions: We utilize entropy to define the distance between categorical values. Entropy in a team is defined as $-\sum_{i=1}^{m} p_i \log(p_i)$, where $p_i$ is the fraction of students who have selected $i$ as their response, and $m$ is the number of categories.
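The three distance notions can be illustrated with a short Python sketch. The helper names are hypothetical, the natural logarithm is an arbitrary choice of base, and the entropy uses the usual Shannon convention (a leading minus sign, with categories nobody selected contributing zero), so that larger values mean more diversity.

```python
from collections import Counter
from math import dist, log


def numerical_distance(p, q):
    """Euclidean distance between two numeric responses given as tuples."""
    return dist(p, q)


def ordinal_distance(a, b, categories):
    """Number of steps between two categories in an ordered list."""
    return abs(categories.index(a) - categories.index(b))


def team_entropy(responses):
    """Shannon entropy of the categorical responses within one team."""
    total = len(responses)
    counts = Counter(responses)
    return -sum((c / total) * log(c / total) for c in counts.values())


education_levels = [
    "Some high school",
    "Graduated high school or equivalent",
    "Some college or university",
]
step = ordinal_distance(
    "Some high school", "Graduated high school or equivalent", education_levels
)
same_skill = team_entropy(["Finance", "Finance"])
mixed_skill = team_entropy(["Finance", "Aerospace"])
```

Here `step` reproduces the education example from the text, and a team whose members share one skill has zero entropy while a team split evenly between two skills has entropy log 2.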

3.4 Homogeneous and Diverse Teaming Preferences

We calculate cumulative distribution functions for pairwise distances of student profile data in teams formed organically and compare them to random teams. Organic teams are homogeneous with respect to a profile question if the CDF of pairwise distances of team members for that profile question appears higher than the same CDF for random teams.
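A minimal sketch of this comparison follows. It assumes a particular random baseline that the text does not spell out: members are shuffled across teams while the original team sizes are kept. All function names are hypothetical.

```python
import random
from itertools import combinations


def pairwise_distances(teams, value_of, distance):
    """Distances between every pair of teammates, pooled over all teams."""
    return [
        distance(value_of[a], value_of[b])
        for team in teams
        for a, b in combinations(team, 2)
    ]


def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def random_teams(teams, rng):
    """Assumed baseline: shuffle all members, then refill teams of the same sizes."""
    members = [s for team in teams for s in team]
    rng.shuffle(members)
    shuffled, start = [], 0
    for team in teams:
        shuffled.append(members[start:start + len(team)])
        start += len(team)
    return shuffled


# Toy data: two organic teams whose members share an ordinal age category.
organic = [[1, 2], [3, 4]]
age_category = {1: 0, 2: 0, 3: 5, 4: 5}
step_distance = lambda a, b: abs(a - b)

organic_d = pairwise_distances(organic, age_category, step_distance)
baseline = random_teams(organic, random.Random(0))
baseline_d = pairwise_distances(baseline, age_category, step_distance)
```

Evaluating `empirical_cdf(organic_d, x)` and `empirical_cdf(baseline_d, x)` over a grid of thresholds `x` gives the two curves to compare; the organic curve lying above the baseline indicates homogeneity for that profile question.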

We also study the impact of homogeneity on the success of students in organic teams by comparing the pairwise distance CDFs for all organic teams with the CDFs for the organic teams that contain at least 50% (“successful”), 75% (“very successful”), and 100% (“extremely successful”) successful team members. A successful student is one who has completed the course and received a statement of accomplishment. Criteria for receiving a statement of accomplishment are highly dependent on the course instructor.

Figure 4: Preference for latitude-homogeneous team membership

As explained in the following subsections, we observe that students prefer homogeneity with regard to location, distance, age range and education level. Interestingly, however, students do not have any preference concerning the gender of their team members. We also observe that students tend to prefer heterogeneous skill sets in successful organic teams.

3.4.1 Homogeneous Teaming Preferences

Figure 2 shows the CDF of pairwise distances for age range in a course. The CDF curve for the organic teams is always above the CDF for the random case, attesting to the preference for age-homogeneity in team formation. This result was reproduced consistently across all eleven courses in our dataset.

Figures 3-6 show CDF curves for longitude (time zone), latitude, geographical distance, and education level. The CDF for organic teams appears above the CDF for random teams for all these features consistently across all courses, even for courses with supportive teams (i.e., students who discuss assignments in teams but write answers individually). Teams show greater homogeneity with regard to longitude than latitude: longitude represents time zone, and our results show that students organically form teams with those in similar time zones.

We also reviewed invitations extended by team leaders when they were growing their teams and observed similar homogeneous preferences for team invitations. This provides us with a second source of evidence that students prefer to work with those with characteristics similar to their own.

We analyzed team formation preferences for successful organic teams with regard to age, education, and location and reproduced the same homogeneous preferences for these teams compared to the random case. However, it is important to note that increasing homogeneity in a team does not necessarily translate into making individual students more successful in that team. In fact, in some courses we observed that teams with more successful students exhibit a more diverse distribution compared to all other organic teams. This suggests that teams with many successful students are more homogeneous than random teams but not necessarily more homogeneous than other organic teams.

Figure 5: Preference for distance-homogeneous team membership

Figure 6: Preference for education-level-homogeneous team membership

We also analyzed teaming preferences with regard to gender. We did not find any statistically significant result that is consistent across all courses.

3.4.2 Diverse Teaming Preferences

Finally, we analyzed skill diversity CDF curves for organic teams and for successful organic teams. Figure 7 shows that the CDF curves for the organic teams with many successful members are lower than the CDF curves for all organic teams. Higher skill set entropy in a team is equivalent to higher skill set diversity among that team’s members. Teams with more successful members have diverse skill sets. This result is consistent across all courses. Therefore, students exhibit diverse teaming preferences with regard to skill set and other similar profile questions (e.g., background, career sector, and professional interests).

Figure 7: Skill diversity in high-profile organic teams versus all organic teams

4. MODELING

Our observations in Section 3 suggest two main criteria to consider when forming teams. Individuals prefer to form teams with those who are in the same time zone and have a similar location, age range and education level. We also observed that members of teams with more diverse skill sets achieve better learning outcomes in online courses.

To better understand the dynamics of team formation, we employ tools and solution concepts from cooperative game theory. This is appropriate because in our setting individuals have agency over team formation by choosing their own teammates. Moreover, they benefit directly from the success of their team and strive to form teams that have the highest probability of success.

Let $A$ be a set of $n$ agents (e.g., students). Let $c : A \to \mathbb{Z}^{\ell}$ map each agent to a vector of $\ell$ characteristics with discrete values like age, education level, latitude, and longitude.² For each $a \in A$ and $1 \le j \le \ell$, $c_j(a)$ denotes the $j$-th characteristic of $a$.

In addition, we assume that each agent has a primary skill $s : A \to S$, where $S = \{s_1, \ldots, s_m\}$.

Team and Team Allocation. Given a number $k$ and a set of predetermined thresholds $\{\delta_j\}_{1 \le j \le \ell} \ge 0$, a set $T \subseteq A$ is a team if $1 \le |T| \le k$, and for each $a, b \in T$ and $1 \le j \le \ell$,

\[ |c_j(a) - c_j(b)| \le \delta_j. \]

We refer to the above constraints as the homogeneity constraints, and we call $\{\delta_j\}$ the homogeneity thresholds. Here, $k$ introduces a size constraint to avoid making very large teams, and homogeneity constraints are introduced in accordance with the observations in Section 3.4. We use $\mathcal{T} \subseteq 2^A$ to denote the set of all possible teams of agents. A team allocation $\mathcal{P}$ is a partition of $A$ into teams.

² Without loss of generality, we assume the difference between two consecutive values is 1 and the values are in $\mathbb{Z}$.

Value Function of a Team. The value function, $v : 2^A \to \mathbb{R}_+$, assigns a non-negative value to each set $B \subseteq A$. One can think of $v(T)$ as the quality score or the likelihood of success of a team $T$. We assume that $v(B)$ is a monotone increasing function of the skill set present in $B$. In other words, we assume there exists a function $g : 2^S \to \mathbb{R}_+$ such that for any $B \subseteq A$,

\[ v(B) = g\left(\bigcup_{a \in B} s(a)\right). \]

These assumptions simplify the problem significantly. In a sense, we reduce each person to a vector of characteristics and each team to a skill set. Obviously, the reality is far more complex; many other factors affect the decision of individuals to form a team and whether the resulting team is successful. On the other hand, in many situations, including team formation in an online course, individuals must form teams with limited information about each other. Our Section 3 observations indicate that the diversity of skill set and homogeneity of basic characteristics do play a strong role in both the formation and the success of teams, and hence support the validity of our assumptions.

Utility function. The utility of every agent is equal to the value function or quality score of the team to which he or she belongs. More formally, for any agent $a \in T$ of a partition $\mathcal{P}$,

\[ u_{\mathcal{P}}(a) = v(T). \]

In our model, the utility function assumes that all members benefit equally from the value created by the team. In other words, team members cannot distribute the value freely or compensate each other with side payments. This model applies to a variety of settings. The most immediate example is in the context of online courses, where all team members receive the same grade for finishing the project. Other scenarios include those where a large group benefits almost equally from some projects, such as building a museum, park, or school. Our model also applies to situations where it is against social norms to differentiate explicitly between contributors (e.g., when writing a paper for a theory conference).

4.1 Stable Solutions and Core of a Cooperative Game

The core of a game is a simple and intuitive notion defined in cooperative game theory to study the stability of groups or coalitions. The core consists of all configurations of agents and payoff allocations that cannot be improved upon by a coalition of agents. This means that once an agreement in the core has been reached, no coalition has an incentive to secede.

In our context, a team allocation $\mathcal{P}$ is stable ($\mathcal{P} \in \mathrm{core}$) if and only if for all $T \in \mathcal{T}$, there exists an agent $a \in T$ such that³

\[ v(T) \le u_{\mathcal{P}}(a). \]

This means that for all possible teams $T$, the utility of at least one agent (we call it $a$) under the current team allocation $\mathcal{P}$ is not less than its utility when it secedes from $\mathcal{P}$ and forms team $T$. Hence, $a$ is not willing to secede, and $\mathcal{P}$ is stable.

³ Please recall that $\mathcal{T}$ denotes the set of all possible teams.
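For small instances, the stability condition can be checked directly by enumerating every possible team. The sketch below is only an illustration under stated assumptions: the function names and data layout are hypothetical, and the enumeration grows exponentially with the size limit, so it is not a practical algorithm.

```python
from itertools import combinations


def is_team(T, c, delta, k):
    """Size limit plus the homogeneity constraint on every pair in T."""
    if not 1 <= len(T) <= k:
        return False
    return all(
        abs(c[a][j] - c[b][j]) <= delta[j]
        for a, b in combinations(T, 2)
        for j in range(len(delta))
    )


def value(T, skill, g):
    """v(T) = g(set of primary skills present in T)."""
    return g(frozenset(skill[a] for a in T))


def in_core(partition, agents, c, delta, k, skill, g):
    """True if no possible team makes all of its members strictly better off."""
    utility = {a: value(T, skill, g) for T in partition for a in T}
    for size in range(1, k + 1):
        for T in combinations(agents, size):
            if is_team(T, c, delta, k) and all(
                utility[a] < value(T, skill, g) for a in T
            ):
                return False  # T blocks the allocation
    return True


# Toy instance: two agents with complementary skills and equal characteristics.
agents = ["a", "b"]
c = {"a": (0,), "b": (0,)}
skill = {"a": "x", "b": "y"}
apart = in_core([("a",), ("b",)], agents, c, (0,), 2, skill, len)
together = in_core([("a", "b")], agents, c, (0,), 2, skill, len)
```

With $g(V) = |V|$, keeping the two agents apart is unstable because the pair would earn value 2 instead of 1 each, whereas the allocation that teams them up is in the core.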

Our goal is to identify a team allocation in which: (1) the size of each team is at most $k$, (2) homogeneity constraints are respected, and (3) the solution is in the core. In particular, after forming the teams, there is no group of agents that could improve its utility by leaving its team and forming a new team.

Problem 1. Given a set of agents, $A$, a value function $v : 2^A \to \mathbb{R}_+$, a set of homogeneity thresholds $\{\delta_j\}_{1 \le j \le \ell} \ge 0$, and a size limit $k$, find a team allocation $\mathcal{P}$ that is in the core.

5. THEORETICAL ANALYSIS OF THE TEAM ALLOCATION PROBLEM

This section partially answers Problem 1.⁴ We start by showing that there is always a team allocation $\mathcal{P}$ in the core.

Consider the following optimization problem.

Problem 2. Given a set of agents, $A$, a value function $v : 2^A \to \mathbb{R}_+$ and a set of homogeneity thresholds $\{\delta_j\}_{1 \le j \le \ell} \ge 0$, find a team $T \in \mathcal{T}$ such that

\[ v(T) \ge v(T') \]

for all $T' \in \mathcal{T}$.

Our first theorem shows that, given access to an oracle that solves Problem 2, there is an algorithm that can find a team allocation in the core in polynomial time. Then, we show that Problem 1 is equivalent to Problem 2 (up to a linear loss in $|A|$). We continue by discussing the fact that Problem 2 is NP-hard for an adversarially chosen value function. This suggests that Problem 1 is also NP-hard. We conclude by constructing a team allocation for a special family of value functions.

Theorem 1. For any set of agents $A$, any value function $v : 2^A \to \mathbb{R}_+$, and thresholds $\{\delta_j\}_{1 \le j \le \ell} \ge 0$, $\mathrm{core} \ne \emptyset$. Furthermore, given access to an oracle of Problem 2, there is a polynomial time algorithm that finds $\mathcal{P} \in \mathrm{core}$.

Proof. We propose a simple algorithm. First, we use an oracle of Problem 2 to find a team $T$ with maximum value. We add $T$ to $\mathcal{P}$, remove its members from $A$, and repeat the procedure on the remaining set. The algorithm terminates when $A$ is empty. See Algorithm 1 for details.

Algorithm 1 Team Allocation Algorithm

while $|A| > 0$ do
    Use an oracle of Problem 2 to find $T \in \mathcal{T}$ (where $T \subseteq A$) of maximum value.
    Add $T$ to $\mathcal{P}$ and remove $T$ from $A$: $A = A \setminus T$.
end while
Return $\mathcal{P}$.
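Algorithm 1 can be sketched in Python as follows. The oracle here is a brute-force stand-in introduced purely for illustration (it enumerates all candidate teams and is exponential in the size limit); the paper assumes an abstract oracle for Problem 2. The names `brute_force_oracle`, `team_allocation`, and `is_feasible` are hypothetical, and the sketch relies on every singleton being a feasible team, which holds under the team definition.

```python
from itertools import combinations


def brute_force_oracle(agents, k, is_feasible, v):
    """Illustrative stand-in for the Problem 2 oracle: best feasible team."""
    best = None
    for size in range(1, k + 1):
        for T in combinations(agents, size):
            if is_feasible(T) and (best is None or v(T) > v(best)):
                best = T
    return best


def team_allocation(agents, k, is_feasible, v):
    """Repeatedly carve out a maximum-value team from the remaining agents."""
    remaining = list(agents)
    partition = []
    while remaining:
        T = brute_force_oracle(remaining, k, is_feasible, v)
        partition.append(T)
        remaining = [a for a in remaining if a not in T]
    return partition


# Toy instance: value is the number of distinct skills, no homogeneity constraint.
skill = {"a": "x", "b": "x", "c": "y"}
v = lambda T: len({skill[a] for a in T})
partition = team_allocation(["a", "b", "c"], 2, lambda T: True, v)
```

In the toy run, the first team pairs one agent with skill `x` and the agent with skill `y` (value 2), and the leftover agent forms a singleton team.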

Let $\mathcal{P}$ be the output. For the sake of contradiction, assume there is a team $T \in \mathcal{T}$ such that for all $a \in T$,

\[ u_{\mathcal{P}}(a) < v(T). \]

⁴ A complete answer would be provided in the next sections.

Let $a$ be the first agent in $T$ that is assigned to another team by the algorithm, and let $T'$ represent that team. Note that

\[ v(T') = u_{\mathcal{P}}(a) < v(T). \]

By the definition of $a$, when $T'$ is constructed, all members of $T$ were available, so $T$ was also a feasible team at that time. Since the oracle returned $T'$, we must have

\[ v(T') \ge v(T), \]

which is a contradiction. Hence, $\mathcal{P} \in \mathrm{core}$.

Observe that Algorithm 1 runs in polynomial time if the oracle of Problem 2 runs in polynomial time. Unfortunately, even when there is no homogeneity constraint and $v(\cdot)$ is a submodular function, Problem 2 is NP-hard. In the following theorem, we show that this is not a limitation of Algorithm 1 because Problem 1 is equivalent to Problem 2.

Theorem 2. Problems 1 and 2 are reducible to each other in polynomial time.

Proof. Theorem 1 shows one direction: if we can solve Problem 2 in polynomial time, Algorithm 1 solves Problem 1.

Conversely, given a polynomial time algorithm for Problem 1, let $\mathcal{P}$ be the output of this algorithm. Let $T \in \mathcal{P}$ be the team with maximum value,

\[ v(T) \ge v(T'), \]

for all $T' \in \mathcal{P}$. We return $T$ as the solution of Problem 2. If $T$ is not an optimal solution, there is a team $T'$ such that $v(T') > v(T)$. But then $\mathcal{P}$ is not in the core because if all agents of $T'$ withdraw from their allocated teams in $\mathcal{P}$ and form $T'$, they receive a higher utility.

Even without the homogeneity constraints, finding a set of cardinality $k$ that maximizes a monotone function is NP-hard, even under a submodularity or supermodularity assumption. Therefore, Problem 1 is NP-hard.

Corollary 1. It is NP-hard to find a team allocation in the core.

With a more intricate construction employing homogeneity constraints, we can also show that the maximization problem is NP-hard even when $v$ is linear in the number of skills present in the set. However, we leave that proof for a more extensive version of this article.

6. TEAM ALLOCATION UNDER WEAKLY SEPARABLE VALUE FUNCTIONS

Given Theorem 1, we focus on a specific class of value functions. The following is a natural special case that admits polynomial time algorithms.

Definition 1 (Weakly Separable Functions). A function $g : 2^S \to \mathbb{R}_+$ is weakly separable if for any $B \subseteq S$ and $a, b \in S$ such that

\[ g(B \cup \{a\}) \ge g(B \cup \{b\}), \]

the following holds: For all $B' \supset B$ such that $a, b \notin B'$,

\[ g(B' \cup \{a\}) \ge g(B' \cup \{b\}). \]

We say that a value function $v : 2^A \to \mathbb{R}_+$ is weakly separable if there is a weakly separable function $g : 2^S \to \mathbb{R}_+$ such that for any $B \subseteq A$,

\[ v(B) = g\left(\bigcup_{a \in B} s(a)\right). \]

Example 1. The following functions are weakly separable:

• $g(V) = |V|^2$

• $g(V) = \sqrt{|V|}$

• For any $w : S \to \mathbb{R}_+$, $g(V) = \prod_{s \in V} w(s)$
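On a small skill set, weak separability can be verified by brute force. The checker below is a sketch under two assumptions: it reads Definition 1 with $B$, $a$, and $b$ ranging over the skill set $S$ (the domain of $g$) and with $a, b \notin B$, and it uses a hypothetical function name. It enumerates all subsets, so it is only usable for a handful of skills.

```python
from itertools import combinations
from math import prod, sqrt


def is_weakly_separable(g, S):
    """Exhaustively test the weak separability condition over subsets of S."""
    S = frozenset(S)
    subsets = [
        frozenset(combo)
        for r in range(len(S) + 1)
        for combo in combinations(sorted(S), r)
    ]
    for B in subsets:
        outside = S - B
        for a in outside:
            for b in outside:
                if g(B | {a}) < g(B | {b}):
                    continue  # premise of the definition does not hold
                for B2 in subsets:
                    if B < B2 and a not in B2 and b not in B2:
                        if g(B2 | {a}) < g(B2 | {b}):
                            return False
    return True


S = {"x", "y", "z"}
w = {"x": 1.0, "y": 2.0, "z": 3.0}  # hypothetical positive skill weights

squared_ok = is_weakly_separable(lambda V: len(V) ** 2, S)
sqrt_ok = is_weakly_separable(lambda V: sqrt(len(V)), S)
product_ok = is_weakly_separable(lambda V: prod(w[s] for s in V), S)

# A hand-made function whose ranking of x versus y flips once z is present.
table = {
    frozenset({"x"}): 2,
    frozenset({"y"}): 1,
    frozenset({"x", "z"}): 1,
    frozenset({"y", "z"}): 2,
}
flipped_ok = is_weakly_separable(lambda V: table.get(frozenset(V), len(V)), S)
```

The three functions from Example 1 pass the check on this small instance, while the hand-made function fails because adding `z` reverses the preference between `x` and `y`.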

The rest of this section assumes that the value function is monotone increasing and weakly separable. For each skill $s$, we define the weight of $s$, $w(s)$, to be the value of the singleton containing $s$,

\[ w(s) = g(\{s\}). \]

First, we give a simple polynomial time algorithm for Problem 1 when there is no homogeneity constraint. In Subsections 6.1 and 6.2, we show that Problem 1 is polynomial time solvable in the number of agents and the homogeneity thresholds.

Theorem 3. If there is no homogeneity constraint and the value function is weakly separable, then Problem 1 is polynomial time solvable.

Proof. By Algorithm 1, we need only solve Problem 2in polynomial time. Without loss of generality, assume thatthere is at most one agent in A having each skill in S (if morethan one, keep one and remove the rest) and that |A| ≥ k(if |A| < k, return A as the solution). Let A = {a1, . . . , an}such that

w(s(a1)) ≥ w(s(a2)) ≥ · · · ≥ w(s(an)).

We show {a1, . . . , ak} is the optimum of Problem 2.First, note that by monotonicity, the optimum set has size

k. Fix a set {b1, . . . , bk} such that

w(s(b1)) ≥ · · · ≥ w(s(bk)).

By definition of {a1, . . . , ak}, for all 1 ≤ i ≤ k,

w(s(ai)) ≥ w(s(bi)).

Therefore,

v({b1, . . . , bk}) ≤ v({b1, . . . , bk−1, ak})

since w(s(ak)) ≥ w(s(bk)) and v(.) is weakly separable. Sim-ilarly,

v({b1, . . . , bk−1, ak}) ≤ v({b1, . . . , bk−2, ak−1, ak}).

Repeating this k times, we conclude that

v({b1, . . . , bk}) ≤ v({a1, . . . , ak}),

as desired.
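The exchange argument above can be sanity-checked numerically. The following is a minimal sketch, reading Problem 2 as the task of finding a maximum-value team of at most k agents (which is how the proof uses it). The function names, the toy agents, and the weights are hypothetical; the value function is the product form from Example 1 with weights of at least 1, so that it is monotone.

```python
from itertools import combinations
from math import prod


def best_team(agents, skill_of, weight, k):
    """Keep one agent per skill, then take the k heaviest skills (Theorem 3)."""
    representative = {}
    for a in agents:
        representative.setdefault(skill_of[a], a)  # first agent seen per skill
    ranked = sorted(representative, key=lambda s: weight[s], reverse=True)
    return [representative[s] for s in ranked[:k]]


def brute_force_value(agents, value, k):
    """Maximum of `value` over all non-empty teams of at most k agents."""
    return max(
        value(team)
        for r in range(1, k + 1)
        for team in combinations(agents, r)
    )


# Hypothetical toy instance: six agents, four skills.
skill_of = {"a1": "s1", "a2": "s1", "a3": "s2",
            "a4": "s3", "a5": "s3", "a6": "s4"}
weight = {"s1": 2, "s2": 5, "s3": 3, "s4": 4}


def value(team):
    # v(B) = g(union of skills), with g the product of skill weights.
    return prod(weight[s] for s in {skill_of[a] for a in team})


team = best_team(list(skill_of), skill_of, weight, k=3)
print(team, value(team))  # ['a3', 'a6', 'a4'] 60
print(brute_force_value(list(skill_of), value, 3))  # 60
```

The sorted selection matches the exhaustive search on this instance, as Theorem 3 predicts; the brute-force routine is exponential and is only meant as a check on small inputs.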

Algorithm 2 outputs a team allocation in the core when the value function is weakly separable and no homogeneity constraint exists. It is easy to see that the algorithm runs in time O((|S| + |A|) log(|S| + |A|)).

Building on the above analysis, Theorem 4 characterizes all team allocations in the core of Problem 1 when no homogeneity constraint exists and the value function is weakly separable.

Algorithm 2 Team Allocation Algorithm for weakly separable value functions and no homogeneity constraint

1: Sort the skills in decreasing order of weight, i.e., w(s1) ≥ w(s2) ≥ · · · ≥ w(sm). Assume there is at least one agent with each skill.
2: while |A| > 0 do
3:   Construct a new team T of size k by selecting one agent from each of the k highest-weight skills (if fewer than k skills remain in the sequence, choose one agent for each of them).
4:   Add T to P and remove its agents from A. If there are no more agents with skill si, remove si from the sequence.
5: end while
6: Return P.
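Algorithm 2 translates almost line by line into code. The sketch below is an illustration in Python (the paper's own implementation is in R and is not reproduced here); the function name `allocate_teams` and the data layout are assumptions. It rebuilds the skill sequence in every round, so it is simpler but slower than the stated O((|S| + |A|) log(|S| + |A|)) bound. It is run on the class of 15 students used in Example 2, where g(V) = |V| and hence every skill has weight 1.

```python
from collections import defaultdict, deque


def allocate_teams(skill_of, weight, k):
    """Algorithm 2 sketch: repeatedly team up one agent from each of the
    k heaviest skills that still have unassigned agents."""
    pool = defaultdict(deque)
    for agent, skill in skill_of.items():
        pool[skill].append(agent)
    # Skills in decreasing order of weight (ties keep insertion order).
    order = sorted(pool, key=lambda s: weight[s], reverse=True)
    teams = []
    while order:
        chosen = order[:k]
        teams.append([pool[s].popleft() for s in chosen])
        # Remove skills that have no agents left.
        order = [s for s in order if pool[s]]
    return teams


# a1-a3 have skill s1, a4-a8 have s2, a9-a12 have s3, a13-a15 have s4.
sizes = {"s1": 3, "s2": 5, "s3": 4, "s4": 3}
skill_of = {}
for skill, count in sizes.items():
    for _ in range(count):
        skill_of[f"a{len(skill_of) + 1}"] = skill
weight = {s: 1 for s in sizes}  # w(s) = g({s}) = 1 when g(V) = |V|

teams = allocate_teams(skill_of, weight, k=3)
for team in teams:
    print(team)
```

On this input the sketch produces six teams, starting with ['a1', 'a4', 'a9'] and ending with the singleton ['a15']; no team contains two agents with the same skill.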

Theorem 4.⁵ Consider a team allocation solution P. Let u_i be the minimum utility an agent with skill s_i receives in P. Rename the u values to u′ such that u′_1 ≥ ... ≥ u′_m. Team allocation P is in the core if and only if there exists a permutation of skills s_1, ..., s_m such that for all 1 ≤ j ≤ m:

$$u'_j \ge g(A_j \setminus B_j),$$

where $A_j = \bigcup_{i=1}^{\min(k+j-1,\,m)} \{s_i\}$, $B_j = \bigcup_{i=1}^{j-1} \{s_i\}$, and $\setminus$ is the set difference operator.⁶

Example 2. Consider a class of 15 students A = {a1, · · · , a15} and 4 skills {s1, s2, s3, s4}. Students a1 − a3 have the primary skill s1, a4 − a8 have skill s2, a9 − a12 have skill s3, and a13 − a15 have skill s4. Students are allowed to form teams of size at most k = 3. Moreover, g(V) = |V|.

Consider the following team allocation P:
T1 = {a1, a4, a9}, T2 = {a2, a5, a10}, T3 = {a3, a11, a13},
T4 = {a6, a12, a14}, T5 = {a7, a15}, T6 = {a8}.

The minimum utility of any student with skill s1 is u1 = 3. Similarly, u2 = 1, u3 = 3, and u4 = 2. By sorting and renaming the ui values we get: u′1 = u1 = 3 ≥ u′2 = u3 = 3 ≥ u′3 = u4 = 2 ≥ u′4 = u2 = 1.

Consider the permutation that relabels the skills so that the new s1, s2, s3, s4 are the original s1, s3, s4, s2, respectively. The team allocation P is in the core because
u′1 = 3 ≥ g({s1, s2, s3} \ {}) = 3,
u′2 = 3 ≥ g({s1, s2, s3, s4} \ {s1}) = 3,
u′3 = 2 ≥ g({s1, s2, s3, s4} \ {s1, s2}) = 2, and
u′4 = 1 ≥ g({s1, s2, s3, s4} \ {s1, s2, s3}) = 1.
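The condition of Theorem 4 can be tested mechanically on small instances by trying every ordering of the skills. The sketch below is an illustration under that brute-force assumption; `satisfies_theorem4` is a hypothetical helper, and the search over permutations is exponential in the number of skills. Note that A_j \ B_j is simply the block of skills at positions j through min(k + j − 1, m).

```python
from itertools import permutations


def satisfies_theorem4(min_utility, g, k):
    """True if some ordering of the skills meets u'_j >= g(A_j minus B_j)."""
    u_sorted = sorted(min_utility.values(), reverse=True)
    m = len(u_sorted)
    for order in permutations(min_utility):
        # With 0-based j, the block A_j minus B_j is order[j : j + k];
        # slicing caps the block at the last skill automatically.
        if all(u_sorted[j] >= g(set(order[j:j + k])) for j in range(m)):
            return True
    return False


# Minimum utilities per skill from Example 2, with g(V) = |V| and k = 3.
example2 = {"s1": 3, "s2": 1, "s3": 3, "s4": 2}
print(satisfies_theorem4(example2, len, k=3))  # True

# A hypothetical allocation in which no skill is guaranteed more than 2.
weaker = {"s1": 2, "s2": 2, "s3": 1, "s4": 1}
print(satisfies_theorem4(weaker, len, k=3))  # False
```

For g(V) = |V| the ordering of the skills does not matter, since the blocks have sizes 3, 3, 2, 1 under every permutation; the permutation only plays a role when skills carry different weights.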

The following theorem gives a second approach to characterizing team allocations in the core.

Theorem 5. Let u′_1, ..., u′_m be the sorted minimum utilities for the different skills in a team allocation P, and let s′_1, ..., s′_m be the corresponding skills. Solution P is in the core if and only if for all 1 ≤ i ≤ m,

$$g(\{s'_i, s'_{i+1}, \dots, s'_{\min(i+k-1,\,m)}\}) \le u'_i.$$

⁵ We discuss more complicated proofs in the extended version of this article.
⁶ Recall that, without loss of generality, we assume that w(s1) ≥ · · · ≥ w(sm).

Example 3. Consider the setting in Example 2: 15 students, 4 skills, k = 3, and g(V) = |V|.

Let team allocation P be: T1 = {a1, a2, a4}, T2 = {a3, a5, a6}, T3 = {a7, a8, a12}, T4 = {a9, a10, a11}, and T5 = {a13, a14, a15}. We want to examine whether P is in the core.

The minimum utilities are u1 = 2, u2 = 2, u3 = 1, u4 = 1. The ui values are already sorted; hence u′1 = u1, u′2 = u2, u′3 = u3, u′4 = u4 and s′1 = s1, s′2 = s2, s′3 = s3, s′4 = s4.

P is not in the core because u′1 = 2 < g({s′1, s′2, s′3}) = 3.
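The test in Theorem 5 is straightforward to automate. The sketch below is a minimal illustration that assumes each agent's utility equals the value of its team (as in Examples 2 and 3) and breaks ties among equal minimum utilities arbitrarily; the names `in_core` and `named` are hypothetical. It reproduces the verdicts of both examples.

```python
def in_core(teams, skill_of, g, k):
    """Check the Theorem 5 condition for a team allocation."""
    min_utility = {}
    for team in teams:
        value = g({skill_of[a] for a in team})  # utility of every member
        for a in team:
            s = skill_of[a]
            min_utility[s] = min(min_utility.get(s, value), value)
    # Skills sorted by decreasing minimum utility: s'_1, ..., s'_m.
    ranked = sorted(min_utility, key=min_utility.get, reverse=True)
    return all(
        g(set(ranked[i:i + k])) <= min_utility[ranked[i]]
        for i in range(len(ranked))
    )


# The class of Examples 2 and 3: a1-a3 -> s1, a4-a8 -> s2, and so on.
blocks = {"s1": range(1, 4), "s2": range(4, 9),
          "s3": range(9, 13), "s4": range(13, 16)}
skill_of = {f"a{i}": s for s, ids in blocks.items() for i in ids}


def named(groups):
    """Turn lists of student indices into lists of student names."""
    return [[f"a{i}" for i in group] for group in groups]


example2 = named([[1, 4, 9], [2, 5, 10], [3, 11, 13],
                  [6, 12, 14], [7, 15], [8]])
example3 = named([[1, 2, 4], [3, 5, 6], [7, 8, 12],
                  [9, 10, 11], [13, 14, 15]])

print(in_core(example2, skill_of, len, k=3))  # True
print(in_core(example3, skill_of, len, k=3))  # False
```

Unlike the permutation-based condition of Theorem 4, this check needs only one sort of the skills, so it runs in polynomial time given an efficiently computable g.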

6.1 Team Allocation Problem under One Homogeneity Constraint

This subsection extends Theorem 3 to the setting where exactly one homogeneity constraint exists, i.e., we are looking for a team allocation P such that for any team T,

$$\max_{a,b \in T} |c(a) - c(b)| \le \delta.$$

Note that we drop the indices because there is just one homogeneity constraint and, as usual, δ is the homogeneity threshold.

We prove the following theorem.

Theorem 6. If v is a weakly separable value function and there is exactly one homogeneity constraint, then Algorithm 3 finds a team allocation in the core in time polynomial in |A| and |c_max − c_min|, where c_min = min_a c(a) and c_max = max_a c(a).

Proof. First, observe that an agent a can join agents b ∈ A only where c(b) ∈ [c(a) − δ, c(a) + δ]. So, for any team T, there exists an integer i ∈ [c_min, c_max] such that for all a ∈ T,

c(a) ∈ [i, i + δ].

Without loss of generality, assume c_max − c_min > δ. For any i ∈ [c_min, c_max − δ], let

A_i := {a : i ≤ c(a) ≤ i + δ}.

As before, we need only solve Problem 2. For each i, we find a team T_i ⊆ A_i of maximum value using Algorithm 2. Then, among all T_i, we simply return the one with the maximum value, T_{i*}, where

$$i^* = \arg\max_i v(T_i).$$

Such a set satisfies the homogeneity constraint and obviously has the maximum weight among all teams.

Algorithm 3 finds a team allocation in the core when there is only one homogeneity constraint and the value function is weakly separable.

Algorithm 3 Team Allocation Algorithm for weakly separable value functions under one homogeneity constraint

1: For each i ∈ [c_min, c_max − δ], let A_i = {a : i ≤ c(a) ≤ i + δ}.
2: while |A| > 0 do
3:   for i = c_min → c_max − δ do
4:     Let T_i be the output of Algorithm 2 on A_i.
5:   end for
6:   Let T* be the T_i with maximum weight.
7:   Add T* to P and remove it from A and all A_i.
8: end while
9: Return P.
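A compact sketch of Algorithm 3 follows. It reads "the output of Algorithm 2 on A_i" as the first, highest-value team that Algorithm 2 would build on A_i, which is an interpretation; the function names, the integer-valued characteristic `c`, and the toy data are assumptions made for illustration. When all agents already fit in a single window (c_max − c_min ≤ δ), the sketch falls back to one window starting at c_min.

```python
def best_team(agents, skill_of, weight, k):
    """One agent per skill, keeping the k heaviest skills (Theorem 3)."""
    representative = {}
    for a in agents:
        representative.setdefault(skill_of[a], a)
    ranked = sorted(representative, key=lambda s: weight[s], reverse=True)
    return [representative[s] for s in ranked[:k]]


def allocate_with_window(skill_of, c, weight, value, k, delta):
    """Algorithm 3 sketch: repeatedly extract the best homogeneous team."""
    remaining = set(skill_of)
    lo, hi = min(c.values()), max(c.values())
    starts = range(lo, max(lo, hi - delta) + 1)  # window start points i
    teams = []
    while remaining:
        candidates = []
        for i in starts:
            window = sorted(a for a in remaining if i <= c[a] <= i + delta)
            if window:
                candidates.append(best_team(window, skill_of, weight, k))
        best = max(candidates, key=value)  # first maximum wins ties
        teams.append(best)
        remaining -= set(best)
    return teams


# Hypothetical toy class: skill and age of six students.
skill_of = {"a1": "s1", "a2": "s2", "a3": "s3",
            "a4": "s1", "a5": "s2", "a6": "s3"}
age = {"a1": 20, "a2": 21, "a3": 22, "a4": 30, "a5": 31, "a6": 40}
weight = {"s1": 1, "s2": 1, "s3": 1}


def value(team):
    return len({skill_of[a] for a in team})  # g(V) = |V|


teams = allocate_with_window(skill_of, age, weight, value, k=3, delta=3)
print(teams)  # [['a1', 'a2', 'a3'], ['a4', 'a5'], ['a6']]
```

Every team respects the age threshold of 3, and the student aged 40 ends up alone because no other student falls within that student's window.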

The following theorem is immediate.

Theorem 7. A team allocation P is in the core if and only if the condition of Theorem 5 holds for all sets A_i = {a : i ≤ c(a) ≤ i + δ}, where c_min ≤ i ≤ c_max − δ.

6.2 Team Allocation under Multiple Homogeneity Constraints

Finally, we proceed to the case where ℓ homogeneity constraints are present. For 1 ≤ j ≤ ℓ,

$$\forall \text{ team } T, \quad \max_{a,b \in T} |c_j(a) - c_j(b)| \le \delta_j.$$

To identify a team allocation in the core, it is again sufficient to solve Problem 2 for this generalized case. We generalize the approach described in the previous subsection. In particular, we create an ℓ-dimensional cube based on the c_j values. The cube contains e_1 × · · · × e_ℓ cells, where e_j is the number of possible integer values for characteristic c_j between c^j_min and c^j_max − δ_j:

$$e_j = (c^j_{\max} - \delta_j) - c^j_{\min} + 1,$$

where c^j_min = min_a c_j(a) and c^j_max = max_a c_j(a). For any i_j (1 ≤ j ≤ ℓ) with c^j_min ≤ i_j ≤ c^j_max − δ_j, we construct a subset of A, denoted by A_{i_1,...,i_ℓ} (which corresponds to a cell in the cube), as follows:

$$A_{i_1,\dots,i_\ell} = \{a \in A : \forall\, 1 \le j \le \ell,\ i_j \le c_j(a) \le i_j + \delta_j\}.$$

We use Algorithm 2 to find the team with maximum weight in each set A_{i_1,...,i_ℓ} and return the team with the maximum weight among all as the solution of Problem 2. The following theorem follows.

Theorem 8. If v is a weakly separable value function, then the above algorithm finds a team allocation in the core in time

$$O\Big(|A| \cdot |S| \cdot \prod_{1 \le j \le \ell} (c^j_{\max} - c^j_{\min} - \delta_j)\Big).$$
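The cube construction can be sketched with `itertools.product`, which enumerates the cells (i_1, ..., i_ℓ). This is an illustration under assumptions, not the paper's code: characteristics are stored as integer tuples in a hypothetical `chars` mapping, the per-cell team is chosen as in Theorem 3, and the cube is rebuilt from the remaining agents in every round.

```python
from itertools import product


def best_team(agents, skill_of, weight, k):
    """One agent per skill, keeping the k heaviest skills (Theorem 3)."""
    representative = {}
    for a in agents:
        representative.setdefault(skill_of[a], a)
    ranked = sorted(representative, key=lambda s: weight[s], reverse=True)
    return [representative[s] for s in ranked[:k]]


def best_homogeneous_team(agents, skill_of, chars, deltas, weight, value, k):
    """Scan every cell of the cube and keep the best team found."""
    axes = []
    for j, delta in enumerate(deltas):
        lo = min(chars[a][j] for a in agents)
        hi = max(chars[a][j] for a in agents)
        axes.append(range(lo, max(lo, hi - delta) + 1))  # e_j start points
    best = []
    for cell in product(*axes):
        members = [
            a for a in agents
            if all(i <= chars[a][j] <= i + d
                   for j, (i, d) in enumerate(zip(cell, deltas)))
        ]
        team = best_team(members, skill_of, weight, k)
        if team and (not best or value(team) > value(best)):
            best = team
    return best


def allocate(skill_of, chars, deltas, weight, value, k):
    """Repeatedly remove the best feasible team until nobody is left."""
    remaining = sorted(skill_of)
    teams = []
    while remaining:
        team = best_homogeneous_team(
            remaining, skill_of, chars, deltas, weight, value, k)
        teams.append(team)
        remaining = [a for a in remaining if a not in team]
    return teams


# Hypothetical students with (age, education level) characteristics.
skill_of = {"a1": "s1", "a2": "s2", "a3": "s3", "a4": "s1"}
chars = {"a1": (20, 3), "a2": (21, 3), "a3": (22, 8), "a4": (23, 4)}
weight = {"s1": 1, "s2": 1, "s3": 1}


def value(team):
    return len({skill_of[a] for a in team})  # g(V) = |V|


teams = allocate(skill_of, chars, (3, 4), weight, value, k=3)
print(teams)  # [['a1', 'a2'], ['a3', 'a4']]
```

Student a3 cannot join a1 and a2 because the education gap of 5 exceeds the threshold of 4, so the best first team has only two distinct skills.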

7. IMPLEMENTATION

Our algorithm builds teams based on teaming preferences and δ parameters observed in organic teams. Algorithmic teams can be used by instructors for brainstorming exercises, case study discussions, and negotiation simulations. Students are placed into teams to accomplish a collective goal in a short period of time. The quality of suggested teams has an immediate impact on learning outcomes in these activities.

Organic teams are the main mechanism for creating an intimate network of peers and learners in online courses. Organic team formation requires students to search for teams they can join or to find new members to join their current team. Both activities are time-consuming. Our algorithm can suggest new team members to small teams and to students without teams. This algorithm can be used to implement a recommendation engine that reduces the barriers to creating organic teams and increases the probability of success of the resulting teams.

Figure 8: Fraction of students in stable teams formed by Random, Exhaustive-Greedy, and our algorithms. [Bar chart; x-axis: Random, Greedy, and Our Alg for each of course 1 through course 5; y-axis: fraction of students in stable teams, from 0 to 1.]

This section describes how we implemented our proposed algorithm and compares its stability and performance to two baseline algorithms, Random and Exhaustive-Greedy. The Random algorithm groups users randomly, subject to the cardinality and homogeneity constraints: a student is randomly selected and added to a team if the homogeneity and size constraints are satisfied. The Exhaustive-Greedy algorithm creates teams as follows. It starts with an empty set. In each step, it adds the student that maximizes the set's value to this set, as long as the size and homogeneity constraints are satisfied. A new empty set is created when the current set has k members or when no other student can be added to it without violating the homogeneity constraints.
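The Exhaustive-Greedy baseline described in this section can be sketched as follows. The code is an illustration in Python rather than the R implementation used in the experiments; the `feasible` predicate (standing in for the homogeneity constraints), the tie-breaking by input order, and the toy data are assumptions.

```python
def exhaustive_greedy(agents, value, feasible, k):
    """Baseline sketch: grow one team at a time, always adding the
    remaining student that maximizes the current team's value."""
    remaining = list(agents)
    teams = []
    while remaining:
        team = []
        while len(team) < k:
            options = [a for a in remaining if feasible(team + [a])]
            if not options:
                break  # nobody else fits: close this team
            pick = max(options, key=lambda a: value(team + [a]))
            team.append(pick)
            remaining.remove(pick)
        if not team:
            raise ValueError("feasible() must accept every single student")
        teams.append(team)
    return teams


# Hypothetical toy class: skill and age of six students.
skill_of = {"a1": "s1", "a2": "s2", "a3": "s3",
            "a4": "s1", "a5": "s2", "a6": "s3"}
age = {"a1": 20, "a2": 21, "a3": 22, "a4": 30, "a5": 31, "a6": 40}


def value(team):
    return len({skill_of[a] for a in team})  # g(V) = |V|


def feasible(team):
    ages = [age[a] for a in team]
    return max(ages) - min(ages) <= 3  # one homogeneity constraint


teams = exhaustive_greedy(list(skill_of), value, feasible, k=3)
print(teams)  # [['a1', 'a2', 'a3'], ['a4', 'a5'], ['a6']]
```

Each step scans all remaining students, which is consistent with the baseline being the slowest of the three methods in the reported running times.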

The run time of each algorithm is measured on a MacBook Air with 8 GB of RAM and a 1.7 GHz Intel Core i7 CPU running OS X version 10.9.5. The algorithms are implemented in R and are single-threaded. The stability of the teams formed by each algorithm is measured by computing the fraction of students in stable teams, as defined in Section 4.

We compared the performance of the Random and Exhaustive-Greedy algorithms with our algorithm for all courses, with different values for the key parameters. We obtained consistent results across courses and parameters. We report stability and performance comparisons for creating teams with a maximum size constraint of k = 4 and two homogeneity constraints, on age and education level, with parameters δ_age = 3 and δ_education = 4. The selected δ values for age and education are the maximum differences between team members' ages and education levels, respectively, observed in 80% of all organic teams across the eleven courses present in our database.
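One way to read this selection rule is as a coverage threshold: δ is the smallest value such that at least 80% of the organic teams have a within-team spread no larger than δ. The sketch below implements that reading; the function name and the ages are made up for illustration and are not data from the paper.

```python
def threshold_covering(team_values, percent=80):
    """Smallest delta such that at least `percent` percent of the teams
    have a within-team spread (max minus min) of at most delta."""
    spreads = sorted(max(vals) - min(vals) for vals in team_values)
    needed = (percent * len(spreads) + 99) // 100  # integer ceiling
    return spreads[needed - 1]


# Hypothetical ages of the members of five organic teams.
team_ages = [[20, 22], [30, 31, 33], [25, 25], [40, 41], [19, 35]]
delta_age = threshold_covering(team_ages)
print(delta_age)  # 3
```

Here the spreads are 0, 1, 2, 3, and 16, so four of the five teams (80%) have a spread of at most 3, and the outlier team does not inflate the threshold.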

Figure 8 shows the stability of the three algorithms for five different courses. We observe that the quality of the teams formed by the Random algorithm is very low, with a stability fraction of about 0.1. The Exhaustive-Greedy algorithm provides teams of much higher quality than the Random algorithm, with an average stability fraction of about 0.9. Our algorithm forms the highest-quality teams, with a stability fraction of 1. Figure 9 plots the running time in milliseconds on a logarithmic scale, demonstrating the efficiency of our algorithm.

Figure 9: Run time of Random, Exhaustive-Greedy, and our algorithms for different courses. [Bar chart; x-axis: Random, Our alg, and Greedy for each of course 1 through course 5; y-axis: time in ms on a logarithmic scale, from 1 to 10,000,000.]

8. CONCLUSION AND FUTURE WORK

Teams are increasingly indispensable across organizations. Whether in a large company or startup, a foundation, or a research organization, interdisciplinary teams are essential for solving problems or performing new and sophisticated tasks. Despite our increasing dependency on high-functioning teams, our understanding of the dynamics of team formation, people's biases in choosing teammates, and the composition of successful teams remains quite limited. This paper takes a small step towards using existing data to promote a better understanding of team formation dynamics and success.

By studying 11 MOOCs and analyzing students' preferences, we observed that students prefer to form teams that are homogeneous across age, education level, distance, and time zone, but diverse with regard to skill set. The same results hold for high-performance teams (teams with many successful members). Our observations suggest a strong positive correlation between skill diversity and team success.

We proposed a game-theoretic model for the team formation problem based on these observations. In particular, given a set of students with multiple characteristics and skills, we defined the automatic team formation problem as the partitioning of students into stable teams that fulfill predetermined homogeneity and cardinality constraints. The problem was modeled as a collaborative game with non-transferable utility, and the concept of the core was used to model stability. We showed that the core of the defined coalitional game is always non-empty and proposed a polynomial-time algorithm to form stable teams for a natural category of value functions. A set of experiments was conducted to show the performance and quality of our proposed algorithm compared to random and greedy baseline algorithms.

An interesting direction of future work would be to study the team formation problem when student history information is present. Specifically, we are interested in considering information on how students formed teams and how well they performed in previous assignments and courses. This data could help us define a proficiency score for students across different skills and topics. This is a generalization of [2], where students may have different proficiency levels on different subjects, considered together with profile question constraints (such as age, education, and location). It would be interesting to discover whether teams were homogeneous or heterogeneous across proficiency levels and whether teams with more homogeneous proficiency levels were more successful.

We are also interested in considering social connections between users and whether these connections affect students' preferences to form teams, generalizing our model based on these observations.

Another interesting direction would be to generalize the problem to a dynamic setting, where the goal is to make recommendations to users to join teams or invite others and to update the recommendations if users decline.

Finally, we would like to observe how teams evolve over time, how users behave in teams, when users leave their team or get asked to leave, and how new members are selected.

9. REFERENCES

[1] R. Agrawal, B. Golshan, and E. Terzi. Forming beneficial teams of students in massive online classes. In Proceedings of the 1st ACM Conference on Learning @ Scale, pages 155–156. ACM, 2014.

[2] R. Agrawal, B. Golshan, and E. Terzi. Grouping students in educational settings. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1017–1026. ACM, 2014.

[3] A. An, M. Kargar, and M. ZiHayat. Finding affordable and collaborative teams from a network of experts. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 587–595, 2013.

[4] A. Anagnostopoulos, L. Becchetti, C. Castillo, A. Gionis, and S. Leonardi. Online team formation in social networks. In Proceedings of the 21st International World Wide Web Conference, pages 839–848. ACM, 2012.

[5] C. M. Beckman. The influence of founding team company affiliations on firm behavior. Academy of Management Journal, 49(4):741–758, 2006.

[6] R. Bredereck, T. Kohler, A. Nichterlein, R. Niedermeier, and G. Philip. Using patterns to form homogeneous teams. Algorithmica, pages 1–22, 2013.

[7] C. E. Eesley, D. H. Hsu, and E. B. Roberts. The contingent effects of top management teams on venture performance: Aligning founding team composition with innovation strategy and commercialization environment. Strategic Management Journal, 35(12):1798–1817, 2014.

[8] K. M. Eisenhardt and C. B. Schoonhoven. Organizational growth: Linking founding team, strategy, environment, and growth among U.S. semiconductor ventures, 1978–1988. Administrative Science Quarterly, pages 504–529, 1990.

[9] D. P. Forbes, P. S. Borchert, M. E. Zellmer-Bruhn, and H. J. Sapienza. Entrepreneurial team formation: An exploration of new member addition. Entrepreneurship Theory and Practice, 30(2):225–248, 2006.

[10] D. B. Gillies. Solutions to general non-zero-sum games. Contributions to the Theory of Games, 4(40):47–85, 1959.

[11] X. Han, Y. Liu, X. Guo, X. Wu, and X. Song. Time constraint-based team formation in social networks. In Proceedings of the 2013 International Conference on Mechatronic Sciences, Electric Engineering and Computer (MEC), pages 1600–1604. IEEE, 2013.

[12] M. Hoefer, D. Vaz, and L. Wagner. Hedonic coalition formation in networks. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2014.

[13] J. Huang, A. Dasgupta, A. Ghosh, J. Manning, and M. Sanders. Superposter behavior in MOOC forums. In Proceedings of the 1st ACM Conference on Learning @ Scale, pages 117–126. ACM, 2014.

[14] M. O. Jackson et al. Social and economic networks, volume 3. Princeton University Press, 2008.

[15] T. Lappas, K. Liu, and E. Terzi. Finding a team of experts in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 467–476. ACM, 2009.

[16] A. Majumder, S. Datta, and K. Naidu. Capacitated team formation problem on social networks. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1005–1013. ACM, 2012.

[17] L. L. Martins, L. L. Gilson, and M. T. Maynard. Virtual teams: What do we know and where do we go from here? Journal of Management, 30(6):805–835, 2004.

[18] A. S. Mello and M. E. Ruckes. Team composition. The Journal of Business, 79(3):1019–1039, 2006.

[19] S. S. Rangapuram, T. Buhler, and M. Hein. Towards realistic team formation in social networks based on densest subgraphs. In Proceedings of the 22nd International World Wide Web Conference, pages 1077–1088. International World Wide Web Conferences Steering Committee, 2013.

[20] A. E. Roth and M. A. O. Sotomayor. Two-sided matching: A study in game-theoretic modeling and analysis. Number 18. Cambridge University Press, 1992.

[21] B. Senior and S. Swailes. The dimensions of management team performance: a repertory grid study. International Journal of Productivity and Performance Management, 53(4):317–333, 2004.

[22] L. Sless, N. Hazon, S. Kraus, and M. Wooldridge. Forming coalitions and facilitating relationships for completing tasks in social networks. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, pages 261–268. International Foundation for Autonomous Agents and Multi-agent Systems, 2014.

[23] W. E. Watson, K. Kumar, and L. K. Michaelsen. Cultural diversity's impact on interaction process and performance: Comparing homogeneous and diverse task groups. Academy of Management Journal, 36(3):590–602, 1993.

[24] B. A. Williams. Peers in MOOCs: Lessons based on the education production function, collective action, and an experiment. In Proceedings of the 2nd ACM Conference on Learning @ Scale, pages 287–292. ACM, 2015.

[25] A. Zakarian and A. Kusiak. Forming teams: an analytical approach. IIE Transactions, 31(1):85–97, 1999.