arXiv:1705.03405v3 [physics.soc-ph] 21 Jul 2017
A complex network analysis of ethnic conflicts and human rights violations
Kiran Sharma,1, ∗ Gunjan Sehgal,2, † Bindu Gupta,2, ‡ Geetika Sharma,2, §
Arnab Chatterjee,2, ¶ Anirban Chakraborti,1, ∗∗ and Gautam Shroff2, ††
1School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
2TCS Research, New Delhi, India
News reports in media contain records of a wide range of socio-economic and political events in time. Using a publicly available, large digital database of news records, and aggregating them over time, we study the network of ethnic conflicts and human rights violations. Complex network analyses of the events and the involved actors provide important insights on the engaging actors, groups, establishments and sometimes nations, pointing at their long range effect over space and time. We find power law decays in the distributions of actor mentions, co-actor mentions and degrees, and a dominance of influential actors and groups. Most influential actors or groups form a giant connected component which grows in time, and is expected to encompass all actors globally in the long run. We demonstrate how targeted removal of actors may help stop the spread of unruly events. We study the cause-effect relation between types of events, and our quantitative analysis confirms that ethnic conflicts lead to human rights violations, while it does not support the converse.
Keywords: complex network analysis; ethnic conflict; human rights violation; media reports
I. INTRODUCTION
The quantitative analysis of the vast amount of data associated with human social conditions has gained much momentum in recent years, especially through multidisciplinary tools and approaches. The traditional theories of social science, together with complex network analysis [1–3] and new tools and paradigms from several disciplines such as theoretical physics, applied mathematics, computer science and psychology, have developed into what is presently known as computational social science [4] and have helped uncover new patterns of social behavior, including social dynamics [5, 6]. Digitally available data have drawn the attention of researchers across disciplines, who have collaborated and contributed in their own ways to scientifically analyze and understand complex social phenomena at a scale of detail not previously accessible.

Social behavior, conditions, events and organizations, which have given shape to the history of human civilization, have often been positive or constructive (building societies, shaping cultures), and at other times negative or destructive (conflicts, wars and battles followed by reorganization of countries or nations). Human civilization has evolved in a complicated manner and created a variety in ethnic characters and cultures, in languages, beliefs and customs. It is widely understood [7] that historically, certain groups give priority to the linguistic, cultural, racial and religious ties of individuals within the groups, which are then passed down through generations. Ethnic violence arises when groups attempt to maintain their boundaries against pressure from historical enemies. In another possible scenario, when the authority of a multi-ethnic state declines, the central regime ceases to protect the interests of ethnic groups, creating a void in which ethnic groups start competing to establish and control a new regime that will protect their interests. Socio-political tension in such a situation is likely to incite violence.

Ethnic heterogeneity is a characteristic feature observed in most countries and regions worldwide. With time, these variations often reduce locally, where socio-economic and political forces are less dominant, in favor of ethnic mixing. However, the course of political history along with that of religion and culture often encounters conflicts at various scales. Ethnic conflicts are comparatively frequent in certain regions and rare elsewhere. Ethnic heterogeneity does not necessarily breed war and its absence does not ensure peace. Even today, ethnic wars continue to be globally the most common form of armed conflict, but the mechanisms that lead a society down the path of ethnic conflict are yet to be fully understood. What is intriguing is the connection between democratization and the occurrence of ethnic conflict. While stable democracies are unlikely to wage war with other democracies, a country that is socio-politically unstable may very well find conflicts between groups with opposing interests [8]. Human rights are internationally agreed values, standards or rules regulating the conduct of states towards their own citizens and towards non-citizens. Interestingly, the violation of human rights appears to be more associated with ethnic conflicts than abuses of economic and social rights [9]. Other political, economic and social preconditions may also influence the causes of ethnic conflicts, and the conscious promotion by political actors of any polarizing dimension based on these factors is sufficient to lead to conflict. It is often seen that changes in political regime can lead to conflicts, often ethnic, in various parts of the world, sparking human rights violations. Therefore, spatio-temporal analyses of regional conflict formations and political dynamics, and statistical studies of the different variables, are important.

Temporal data of events such as human-to-human communications and physical contacts/proximity have been studied in recent years (see e.g., Holme and Saramaki [10]) in connection with the study of epidemics and contagion, and the spreading of information, using mobile phone communication data [11] as well as data from proximity sensors carried by human entities [12].

Here we provide a quantitative analysis of the scale and topology of ethnic tensions, related conflicts and violations of human rights, using reports in digital media. We look at data from a publicly accessible database, which keeps account of events from news available in media. We particularly focus on ethnic conflicts (EC) and human rights violations (HR) by suitably filtering keywords present in the digital text transcript of the news. With the availability of high precision data containing precise spatio-temporal information, one can look for correlations between events, the involved actors (individuals, groups, organizations or states), the geographical pattern of spreading of conflicts, etc. Although there have been studies on conflicts, wars and terror attacks [13–16], and speculation on ethnic conflicts using census data for segregated population [17], a comprehensive study of the actors involved in ethnic conflicts and human rights violations has been lacking. In this work, for the first time, we provide a quantitative and qualitative understanding of the relative activity of the actors, frequently engaging actor pairs and the network of actors, and give insights into the static and dynamical aspects of the actor network in a scenario involving ethnic conflicts and human rights violations.

∗ Email: kiran34_sit@jnu.ac.in
† Email: sehgal.gunjan@tcs.com
‡ Email: bindu.gupta2@tcs.com
§ Email: geetika.s@tcs.com
¶ Email: arnab.chatterjee4@tcs.com
∗∗ Email: anirban@jnu.ac.in
†† Email: gautam.shroff@tcs.com
II. RESULTS
The GDELT Event Database [18] contains news articles from all over the world in several languages. Using a query, it is possible to extract data for events about ethnic conflicts (EC) and human rights violations (HR) happening around the world, spanning a long period of time. Each event record contains information such as the pair of actors involved, a unique time stamp, names of individuals, organizations or groups, the location of the event, as well as latitude and longitude data of the actors and the event. The news event, and subsequently the data, serves as a proxy for the actual event and its intensity (in terms of the number of reports). Although we do not analyze the actual news reports, we consider a report on EC to be an actual instance of ethnic conflict or its possibility, and similarly for HR. We analyzed 28,055 events of EC and 36,470 of HR for a 15 year period (2001-2015).
[Figure 1 plots: (a) EC and (b) HR, number of events n versus time (year); (c) cumulative probability Q(n) versus news reports on a single day, n, on logarithmic axes, for EC and HR.]
FIG. 1: The time sequence of the number of events n reported daily for (a) EC, (b) HR during 2001-2015. (c) The cumulative probability (CCDF) Q(n) that n or more events are reported on a particular day. The data for EC seem to fit well to a power law for the largest values, with decay exponent around 2.54 ± 0.03; HR fits well to a stretched exponential ($\exp[-a n^{b}]$ with a = 0.44 ± 0.12 and b = 0.62 ± 0.01).
The number of news entries n per day is a stochastic variable, and bursts of activity are often noted (Fig. 1a,b). The amount of data cataloged is also observed to have increased since the late 2000s. The number of news entries/reports on a single day, n, has a broad distribution due to the large inter-day fluctuations in the number of reports and the bursty nature of the data. Fig. 1c shows the complementary cumulative distribution function (CCDF) Q(n), the probability that a day has n or more news reports. For EC, the distribution has a short lognormal body with a rather prominent power law tail (decay exponent close to 2.54 ± 0.03); for HR, the bulk of the distribution fits to a stretched exponential ($\exp[-a n^{b}]$ with a = 0.44 ± 0.12 and b = 0.62 ± 0.01). The details of the database, data extraction and different attributes are described in Supplementary Information. In our study, we focus on a relatively recent time span (2001-2015) and extract the data for actor pairs and geographical location of events from the database.
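As an illustration of how an empirical CCDF such as Q(n) can be obtained from daily counts, a minimal sketch follows. It assumes the "n or more" convention of the Fig. 1 caption; the function name `ccdf` and the sample counts are hypothetical and are not taken from the GDELT data.

```python
from collections import Counter


def ccdf(values):
    """Return sorted (n, Q(n)) pairs, where Q(n) is the fraction of
    observations greater than or equal to n."""
    total = len(values)
    counts = Counter(values)
    pairs = []
    remaining = total  # number of observations >= the current n
    for n in sorted(counts):
        pairs.append((n, remaining / total))
        remaining -= counts[n]
    return pairs


# Hypothetical daily report counts (illustrative only, not GDELT data).
daily_reports = [1, 1, 2, 1, 3, 7, 1, 2, 1, 12]
for n, q in ccdf(daily_reports):
    print(n, q)
```

Plotting the resulting pairs on logarithmic axes gives a curve of the kind described for Fig. 1c.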
A. Static properties of the network
We construct the aggregate network over a given period of time, for the entire 15 year period (2001-2015) as well as for each of the 15 individual years, for the two datasets. Each event is visualized as a link between the actors involved, which are the nodes, thus creating a network of actors connected by events. The details of the construction are given in the Methods section. A schematic representation of the network construction is given in Fig. 2a and a typical network is shown in Fig. 2b for EC data aggregated over the year 2001.

[Figure 2 graphics: (a) schematic of events on a time-line; (b) network of actors, with nodes labeled by actor codes such as ALB_AL and MKD_AL, panel title "EC−2001".]
FIG. 2: (a) The schematic visualization of events on a time-line and the subsequent evolution of the aggregated network of actors: Events appear sequentially. Event E1 involving actors A1, A2 is followed by event E2 involving A2, A3 and so on. The links depict the actors involved together in an event. New events (actors in yellow and links in red) add to actor mention (size of node) and actor co-mention (link weight; thickness of links). (b) The network of actors for EC for 2001. Each actor is a node and any two actors ever involved in an event are connected by a link. The sizes of the nodes are proportional to the total number of mentions in the period, and the relative thicknesses of the links are proportional to the total number of co-mentions of the actors (link weights). The largest connected component (giant component) is depicted in red.
In our study, the temporal granularity of the data is one day. We extracted (i) the number of mentions m of each individual actor and (ii) the number of co-mentions w of a unique pair of actors (actor pair mentions), as well as the number of unique actors k that one actor is involved with. In terms of the network, m measures the node strength, w the link weight and k the degree of a node. While m and k measure the importance, activity or visibility of a single actor, w measures the involvement of an actor pair in inciting an event. These quantities are measured for each year of the 15 year period (2001-2015) as well as for the whole period (aggregated over 15 years).

It is often perceived that certain actors engage in reported events more frequently than others, and the same is true for pairs of actors engaging in ethnic conflicts or human rights violations. We computed the complementary cumulative probability distribution (CCDF) for different quantities for the aggregated network in 2001-2015 (refer to Fig. 3a,b,c). For the actor mentions, the probability that an actor is mentioned m times or more fits to a power law for the largest values, $Q_a(m) \sim m^{-\nu_1}$, with decay exponents 1.23 ± 0.01 for EC and 1.28 ± 0.01 for HR. The exponents are computed, along with the values of the standard errors (uncertainties) and significances, using Maximum Likelihood Estimates [19]. The probability that an actor pair is mentioned w times or more fits well to power laws $Q_{ab}(w) \sim w^{-\nu_2}$ with exponents $\nu_2$ of 1.58 ± 0.01 for EC and 1.74 ± 0.03 for HR data. The actor mentions and actor pair mentions are respectively the node weights and edge weights in network terminology. The probability that an actor was involved with k or more actors during the time span (degree of the actor) fits well to power laws $Q(k) \sim k^{-\nu_3}$ with exponents $\nu_3$ of 1.52 ± 0.01 for EC and 1.48 ± 0.01 for HR data. For the individual years, due to less aggregation, the counts are smaller and the data are naturally noisier. However, the data still seem to exhibit the power law tail in the probability distributions (see Supplementary Fig. S7). The fitting exponents for the individual years are given in Supplementary Table S1. The correlation between actor degree and mentions, as well as the relationship between the power law exponents $\nu_1$ and $\nu_3$, are shown in Supplementary Fig. S8.

[Figure 3 plots: (a) $Q_a(m)$ versus m, (b) $Q_{ab}(w)$ versus w, (c) Q(k) versus k, each on logarithmic axes for EC and HR, aggregated over 2001-2015.]
FIG. 3: Structural properties of the aggregated network in the period 2001-2015: (a) Plot of the cumulative probability (CCDF) $Q_a(m)$ that an actor is mentioned at least m times. Except for an exponential decay at the very end of the tail, most of the distribution has a power law decay $Q_a(m) \sim m^{-\nu_1}$ with exponents $\nu_1$ of 1.23 ± 0.01 for EC and 1.28 ± 0.01 for HR. (b) Plot of the cumulative probability $Q_{ab}(w)$ that an actor pair is mentioned at least w times. The tails of the distributions fit well to power laws $Q_{ab}(w) \sim w^{-\nu_2}$ with exponents of 1.58 ± 0.01 for EC and 1.74 ± 0.03 for HR. (c) Plot of the cumulative probability Q(k) that an actor is connected to k others or more. The tails of the distributions fit well to power laws $Q(k) \sim k^{-\nu_3}$ with exponents $\nu_3$ of 1.52 ± 0.01 for EC and 1.48 ± 0.01 for HR.

The above results quantitatively characterize the heterogeneity in the activity of actors. While most actors are relatively inactive, the power law distributions for actor mentions $Q_a$ indicate that there are a significant few who constantly engage in issues of ethnic conflicts and human rights violations. The power law distributions in actor pair mentions $Q_{ab}$ indicate a similar characteristic for pairs of actors. The broad degree distributions Q(k) indicate that the number of actors engaging with a very large number of other actors is also significant.
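To make the exponent estimation concrete, the sketch below applies the standard continuous maximum likelihood (Hill-type) estimator for the exponent of a power law CCDF, $Q(x) \sim x^{-\nu}$ for $x \ge x_{\min}$. This is an assumption on our part about a reasonable estimator; the exact procedure of [19], including the choice of $x_{\min}$ and any treatment of discrete counts, may differ. The function name `ccdf_exponent_mle` and the synthetic sample are hypothetical.

```python
import math
import random


def ccdf_exponent_mle(values, x_min):
    """Estimate nu in Q(x) ~ x^(-nu) for x >= x_min, treating the data
    as continuous. Returns (nu, standard error)."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need tail values strictly above x_min")
    nu = len(tail) / log_sum
    std_err = nu / math.sqrt(len(tail))
    return nu, std_err


# Synthetic Pareto sample with CCDF exponent 1.5 (illustrative only).
rng = random.Random(0)
sample = [rng.paretovariate(1.5) for _ in range(5000)]
nu_hat, err = ccdf_exponent_mle(sample, x_min=1.0)
print(f"nu = {nu_hat:.2f} +/- {err:.2f}")
```

For integer-valued quantities such as m, w and k the continuous approximation is only a rough guide, and a discrete estimator may be preferable for small $x_{\min}$.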
B. Clusters
The aggregated network is found to be composed of several disconnected components or ‘clusters’. Physically this means that the actors constituting one cluster have never been involved with any actor from a different cluster. For both sets, the largest connected component is $10^2$ to $10^3$ times the size of the smaller clusters, which are large in number (see Supplementary Table S2). In fact, the largest cluster $s_1$ grows superlinearly with the size of the network N: from the data in Supplementary Table S2 we found $s_1 \sim N^{\delta}$, with δ = 1.17 ± 0.01 for EC and 1.19 ± 0.02 for HR (Supplementary Fig. S9). This tells us that the fraction of nodes in the largest cluster grows quite fast with the total size of the network, so that eventually the fraction of nodes outside the largest cluster will be negligible. We computed the CCDF Q(s) of cluster size s and find that it roughly has a power law decay for components other than the largest one, with a decay exponent close to 3 (see Fig. 4). The CCDFs for individual years are shown in Supplementary Fig. S10. The largest clusters in both sets have a clustering coefficient $\langle C(k) \rangle$ that decays slowly with degree k (see Supplementary Fig. S11).
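The cluster sizes discussed above can be obtained from a list of actor pairs with a union-find structure. The following is a minimal sketch under that assumption; the function name `cluster_sizes` and the toy edge list are hypothetical, and the method actually used for the paper is not specified in the text.

```python
from collections import defaultdict


def cluster_sizes(edges):
    """Return the sizes of connected components, largest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    sizes = defaultdict(int)
    for node in list(parent):
        sizes[find(node)] += 1
    return sorted(sizes.values(), reverse=True)


# Toy actor pairs (illustrative only): one cluster of 3 and one of 2.
edges = [("A1", "A2"), ("A2", "A3"), ("B1", "B2")]
sizes = cluster_sizes(edges)
print("largest cluster s1 =", sizes[0], "of N =", sum(sizes))
```

The first entry of the returned list plays the role of $s_1$, and the remaining entries are the sizes entering Q(s).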
C. Network growth properties
In order to explore the dynamics of the process that leads to the broad distribution of the mentions and degrees, we investigate the dynamics of growth for these quantities over the span of 15 years (2001-2015). For good statistics, we ranked the actors according to the number of mentions and degrees, and analyzed the data for the top 10 actors.

We first measured the growth of the degrees and mentions for the top 10 actors of the EC and HR data, with respect to the time $t^*$ when they were first mentioned. The long time (asymptotic) behavior indicates a growth law of $(t - t^*)^{\beta}$ with $\beta \simeq 3$.

[Figure 4 plot: Q(s) versus cluster size s on logarithmic axes, for EC and HR, 2001-2015.]
FIG. 4: Plot of the cumulative probability (CCDF) Q(s) that there is a cluster of size larger than s, for data aggregated over the period 2001-2015. The size of the largest cluster is very large compared to the rest.

[Figure 5 plots: π(m) versus mentions m on logarithmic axes for the top-ranked actors, (a) EC and (b) HR.]
FIG. 5: Cumulative growth rates π(m) for mentions m for (a) EC and (b) HR datasets. The curves asymptotically fit to $\pi(m) \sim m^{\alpha}$ with α > 1. The precise fitting exponents are given in Supplementary Table S4.
To extract the rate of growth for a quantity x, we computed $\Pi(x) = \Delta x / \Delta t$. In real data, $\Pi(x)$ turns out to be very noisy, so we computed the cumulative integral $\pi(x) = \int_0^x \Pi(x')\,dx'$, which is less noisy. We observe that the data for degree, π(k), has a shorter span and is noisier than that for mentions, π(m). The plots suggest that both π(m) and π(k) are superlinear functions of their respective arguments, i.e., $\pi(x) \sim x^{\alpha}$ with α ∼ 1.2-1.4 (see Fig. 5 for mentions and Supplementary Fig. S12 for degree). Thus the asymptotic growth rate $\Pi(x) \sim x^{\alpha-1}$ is still weakly dependent on the respective arguments. We thus identify that the growth rate of the system (network) is not independent of the size of the node (degree or mentions). The growth exponents are computed using Maximum Likelihood Estimates [19] and tabulated in Supplementary Table S4. A superlinear growth rate implies that the network will, in the long run, consist of a very large ‘condensate’ with a small fraction of nodes in isolated smaller clusters, which is consistent with our analysis of the cluster size distribution.
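One possible discretization of the two quantities defined above is sketched here for a single actor whose cumulative size x (mentions or degree) is sampled at successive times. The left Riemann sum used for the integral, the function name `cumulative_growth` and the sample series are our assumptions; the text does not state how the integral was evaluated numerically.

```python
def cumulative_growth(times, sizes):
    """Given sampling times and the cumulative size x at each time, return
    (xs, pis): the sizes reached and the running integral of
    Pi(x) = dx/dt with respect to x."""
    xs, pis = [], []
    integral = 0.0
    for i in range(len(sizes) - 1):
        dt = times[i + 1] - times[i]
        dx = sizes[i + 1] - sizes[i]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        rate = dx / dt          # Pi(x) on this interval
        integral += rate * dx   # contribution to pi(x)
        xs.append(sizes[i + 1])
        pis.append(integral)
    return xs, pis


# Hypothetical yearly cumulative mentions of one actor (illustrative only).
xs, pis = cumulative_growth([0, 1, 2, 3], [0, 1, 3, 6])
print(list(zip(xs, pis)))
```

Fitting the slope of log π against log x over the asymptotic range would then give an estimate of α.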
D. Tolerance to attack and failure
We also study how the network breaks down under attack, in order to investigate the possibility of preventing unruly events from spreading [20]. The largest connected component of the network is subjected to targeted attack by removal of the most connected nodes. We start by removing the node with the highest degree, followed by the next highest, and so on. This results in rapid fragmentation or destruction of the network. We compute the fraction of nodes G present in the largest cluster, which is observed to decrease very quickly (Fig. 6a,b). In fact, the network of actors can be destroyed by targeted attack just by removing much less than 10% of the nodes, in contrast to when randomly selected nodes are removed one after another (random failure; Fig. 6c,d). This exercise indicates that targeted intervention may help stop the spreading of ethnic conflicts and human rights violations.
[Figure 6 plots: G and 〈c〉 versus fraction of nodes removed; Attack panels (a) EC and (b) HR, Failure panels (c) EC and (d) HR.]
FIG. 6: The structure of the network under attack: Nodes are removed in the sequence of their degrees starting from the highest degree. The plot shows the behavior of the giant component G (fraction of nodes in the largest connected component) and the average number of nodes in the clusters other than the giant component 〈c〉, with increasing fraction of removed nodes, for (a) EC and (b) HR. The structure of the network under random failure: Nodes are removed randomly. Results are shown for (c) EC and (d) HR networks. The networks are destroyed very quickly by targeted node removal (attack), compared to random node removal (failure). The results are for networks aggregated over 2001-2015.
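The attack and failure experiments can be sketched as follows. We assume here that nodes are ranked by their degree in the original network (no recomputation after each removal) and that G is normalized by the original number of nodes; both are assumptions, as are the function names and the toy network.

```python
import random


def giant_fraction(adj, removed):
    """Fraction of the original nodes in the largest remaining cluster."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search over surviving nodes
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(adj)


def removal_curve(adj, order):
    """G after removing 0, 1, 2, ... nodes in the given order."""
    removed = set()
    curve = [giant_fraction(adj, removed)]
    for node in order:
        removed.add(node)
        curve.append(giant_fraction(adj, removed))
    return curve


def targeted_order(adj):
    """Nodes sorted by initial degree, highest first (attack)."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)


# Toy hub-and-spoke network (illustrative only).
adj = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
random_order = list(adj)
random.Random(0).shuffle(random_order)  # random failure
print("attack :", removal_curve(adj, targeted_order(adj)[:2]))
print("failure:", removal_curve(adj, random_order[:2]))
```

In a hub-dominated network, removing the hub first collapses G immediately, which is the qualitative behavior reported for the attack panels.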
E. Measuring causality
Considering that we have time series data for the number of mentions of ethnic conflicts and human rights violations from news articles, it would be interesting to study if there is a clear cause and effect relation between these events with time. To determine the causal effect purely from observations of past data, we apply Granger causality [21], which estimates the causal relationship by observing the changes in the distribution of the variables over time.

Let us consider two random variables depicting counts of EC and HR. To say that EC causes HR, Granger causality computes a regression of the variable HR on the past values of itself and the past values of EC, and then tests the significance of the coefficient estimates associated with EC. We consider a bivariate linear autoregressive model on EC and HR, and assume the L.H.S. to be dependent on the history of EC and HR,

$$HR_t = a_0 + a_1 EC_{t-1} + \ldots + a_h EC_{t-h} + b_1 HR_{t-1} + \ldots + b_h HR_{t-h} + E_t$$

where h is the maximum number of lagged observations (for both EC and HR). The coefficients $a_i$ and $b_i$ are the contributions of the lagged observations $EC_{t-i}$ and $HR_{t-i}$, respectively, to the predicted value of $HR_t$, and $E_t$ is the prediction error.

We set up a null hypothesis (that the coefficients $a_1, \ldots, a_h$ are all zero, i.e., EC does not Granger-cause HR) and, to test the significance of the coefficients, compute the p-value. If the p-value is less than 0.05, one can reject the null hypothesis. We tested the counts for year-wise as well as month-wise mentions. Comparing the p-values for both cases, we found that: (a) one cannot conclude that HR causes EC, but (b) EC causes HR. Hence, we can confirm that ethnic conflicts cause human rights violations, while human rights violations are not responsible for ethnic conflicts. The details of the analysis are provided in the Supplementary Information.
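The test described above can be illustrated with a small self-contained sketch that fits the restricted model (lags of HR only) and the full model (lags of HR and EC) by ordinary least squares and forms the usual F statistic. The helper names `ols_rss` and `granger_f`, the normal-equation solver and the synthetic series are all our assumptions; the p-value would follow from the F distribution with (h, dof) degrees of freedom, which is omitted here to avoid external dependencies.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit, solved through the
    normal equations with Gauss-Jordan elimination."""
    k = len(rows[0])
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("singular design matrix")
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * p for x, p in zip(aug[row], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((y - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))


def granger_f(cause, effect, h):
    """F statistic for the null 'cause does not Granger-cause effect'."""
    n = len(effect)
    targets = effect[h:]
    restricted = [[1.0] + [effect[t - i] for i in range(1, h + 1)]
                  for t in range(h, n)]
    full = [row + [cause[t - i] for i in range(1, h + 1)]
            for row, t in zip(restricted, range(h, n))]
    rss_r = ols_rss(restricted, targets)
    rss_f = ols_rss(full, targets)
    dof = len(targets) - (2 * h + 1)
    return ((rss_r - rss_f) / h) / (rss_f / dof)


# Synthetic series in which EC drives HR with a one-step lag (illustrative).
rng = random.Random(1)
ec = [rng.gauss(0, 1) for _ in range(200)]
hr = [0.0] + [0.8 * ec[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
print("EC -> HR:", granger_f(ec, hr, h=1))
print("HR -> EC:", granger_f(hr, ec, h=1))
```

A large F statistic in one direction and a small one in the other corresponds to the asymmetric outcome reported for the EC and HR counts.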
III. DISCUSSIONS
News reports serve as a reasonable proxy for the importance and intensity of events. The intensity is reflected by the number of reports and the span of time over which the reports linger. Our study focuses on events that pointed at ethnic conflicts and human rights violations. The GDELT data is unique in the sense that it records events and the actors involved in them. The U.S. government recently funded a large-scale project, the Integrated Conflict Early Warning System (ICEWS) [22], which makes use of quantitative data and statistical methods in order to forecast events of political instability, which include international and domestic crises, ethnic and religious violence, rebellion and insurgency. We have chosen the GDELT data instead of the ICEWS data to get a more global perspective [23, 24]. The frequencies of mentions of actors and their co-mentions with others can be treated as proxies for their importance and influence, as well as involvement with others. The aggregate data enables us to construct a network of actors, and even to find disconnected groups. In fact, our study reveals that most events are disconnected in very small clusters while very large clusters of frequently engaging actors exist. One can study the geographical localization of events and clusters to procure detailed information regarding the context, intensity and growth pattern of events [25]. Identifying important groups of actors, in terms of the intensities of their activities, is important for possible intervention that may prevent the spread of such events. The probability distributions of actor mentions, co-actor mentions and the degree of an actor have power law tails for the largest values, indicating a strong self-organizing principle behind the events. The growth properties of individual actor nodes indicate that in the long run, a very small fraction of disconnected clusters are left, while most of the actors belong to a giant connected component. The data on ethnic conflicts and human rights violations are found to be strikingly similar in terms of static and dynamic properties of the network, and even in terms of network stability against failure as well as targeted attack. This may point at a very high degree of correlation between events. In fact, using a causality analysis, we could quantitatively conclude that ethnic conflicts lead to human rights violations, while the reverse may not be true. These networks of ethnic conflicts and human rights violations reflect the negative aspects of human behavior and cooperation, and are certainly different from other social networks (friendship, collaboration, etc.) in the sense that there are not many triangles or closed communities, as observed in the distribution of the average clustering coefficient with degree of the actors. This detailed scientific study of the network structure, dynamics, function and resilience may help policy makers, specifically in cases where there is a need for preventing the spread of ethnic tensions and conflicts. There are possibilities of similar analyses using data from online social media, e.g., Twitter, etc.

We have deliberately avoided mentioning the real names and other details because the main aim of the paper was to study the network properties and statistical regularities. Since we have just considered the reports as a proxy for the events and not studied the reports in detail to extract further information, it would not be proper to draw any conclusions about the actors, the network relations or the detailed explanations/causes behind the conflicts or the violations. Certainly, it would be interesting to address such sociological explanations, implications and policies in the future. Specific studies on the geographical implications of actor networks, and on the geographical and socio-cultural influence of their robustness under possible intervention, will be important issues to study.
Methods
Data acquisition and filtering
The GDELT Event Database [18] contains news articles from around the world in several languages, hosted through Google Cloud. Using Google BigQuery [26], it is possible to extract data for each event, having a unique time stamp, about ethnic conflicts (EC) and human rights violations (HR) happening around the world, spanning a long period of time. The data contain information about the pair of actors involved, the location of the event, as well as latitude and longitude data of the actors and the event. We procured 45,942 events for EC and 48,295 for HR for a 15 year period, 2001-2015. We retained only those records for which both actors were mentioned, along with their respective location information. The GDELT data has a huge fraction of missing entries; we excluded those data rows that have at least one entry missing among the attributes we were interested in. Hence, after cleaning, we analyzed 28,055 events of EC and 36,470 of HR. Details of the CAMEO codes used for filtering the data (along with examples from the CAMEO codebook [27]), data attributes and cleaning are provided in Supplementary Information.
Network construction
Given a period of time T, we construct the network of ‘connected’ actors in the following way: any two actors A1 and A2 mentioned together in an event E1 reported at time t ∈ [t0 : t0 + T] are ‘connected’ by a link of unit weight, where t0 is the beginning of an interval of time span T. If another event E2 within the same time window involves actors A2 and A3, then A3 is connected to A2 with a link of unit weight. Thus A1 and A3 are both connected to A2 (see Fig. 2a). Aggregating all such events over the time window T, connected components emerge, with link weights increasing if the same pair of actors appears in multiple events (actor pair mentions). These connected components form a complex network of nodes (actors) and links (actor pair mentions). An actor may be co-mentioned with several other actors and thus have a larger ‘degree’, measured by the number of distinct co-actors it has. In principle, the network aggregated over a time period can have several disconnected components or ‘clusters’. Fig. 2b shows the aggregated network for one year for actors mentioned in the EC dataset.
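The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the paper: the function name `build_actor_network`, the event tuple layout `(time, actor_a, actor_b)` and the choice to skip events in which both actors coincide are assumptions made for the example.

```python
from collections import defaultdict

def build_actor_network(events, t0, T):
    """Aggregate events with t0 <= time <= t0 + T into a weighted, undirected network.

    `events` is an iterable of (time, actor_a, actor_b) tuples (hypothetical layout).
    Returns (weights, degree): link weights count actor pair mentions, and the
    degree of an actor is its number of distinct co-actors.
    """
    weights = defaultdict(int)
    neighbours = defaultdict(set)
    for t, a, b in events:
        if not (t0 <= t <= t0 + T):
            continue          # event lies outside the time window
        if a == b:
            continue          # assumption: ignore events pairing an actor with itself
        pair = tuple(sorted((a, b)))   # undirected link
        weights[pair] += 1             # each co-mention adds unit weight
        neighbours[a].add(b)
        neighbours[b].add(a)
    degree = {actor: len(nbrs) for actor, nbrs in neighbours.items()}
    return dict(weights), degree

# Toy events: A1-A2 is mentioned twice, A2-A3 once, and one event falls outside the window.
events = [
    (2001, "A1", "A2"),
    (2001, "A2", "A3"),
    (2001, "A1", "A2"),
    (2003, "A3", "A4"),
]
weights, degree = build_actor_network(events, t0=2001, T=1)
```

In this toy example A2 has degree 2 while the link A1-A2 carries weight 2, which illustrates the distinction between degree (distinct co-actors) and actor pair mentions (link weight).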
Acknowledgments
The authors thank L. Dey for useful comments.
Supplementary Information
Data Description
The data source for this quantitative analysis on the global structure of ethnic violence is GDELT, the Global Database of Events, Language, and Tone [18]. It is described [24] as “an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what’s happening around the world, what its context is and who’s involved, and how the world is feeling about it, every single day”.
GDELT contains data since 1979 [28]. However, the data from the year 2000 onwards are more comprehensive, reflecting the increase in the number of news media and the frequency of event recording. The entire GDELT dataset is available as a public dataset in Google BigQuery [26]. GDELT event records are stored in an expanded version of the dyadic CAMEO format, capturing two actors and the action performed by Actor1 upon Actor2. A wide array of variables break out the raw CAMEO actor codes into their respective fields to make it easier to interact with the data, the Action codes are broken down into a hierarchical structure, a score describing the intensity of conflict or cooperation is provided, an average tone score is provided for all coverage of the event, several indicators of “importance” based on media attention are provided, and a unique array of geo-referencing fields offer estimated landmark-centroid-level geographic positioning of both actors and the location of the action.
However, for the purpose of this quantitative analysis we have extracted the data for the CAMEO codes 203 (engage in ethnic cleansing) for Ethnic Conflicts (EC), and 092 (investigate human rights abuses) and 1122 (accuse of human rights abuses) for Human Rights Violations (HR), for the years 2001 to 2015.
As mentioned in the CAMEO codebook [27], for the corresponding entries, we reproduce below the cited examples:
• CAMEO code 203 for EC: Serb forces were engaged in ethnic cleansing in Kosovo against the majority Albanian population of the province, according to the US government.
• CAMEO code 092 for HR: Israel’s high court opened a landmark hearing Wednesday into the legality of secret interrogation techniques used against Palestinian detainees.
• CAMEO code 1122 for HR: Human rights watchdog Amnesty International accused the United States of violating human rights, ignoring international law and sending a “permissive signal to abusive governments”.
Each data entry with a GlobalEventID had the following attributes [27]:
• Actor1Code, Actor2Code: The complete raw CAMEO code for Actor1 and Actor2 (includes geographic, class, ethnic, religious, and type classes). It may be blank if the system was unable to identify any actor.
• Actor1Name, Actor2Name: The actual name of Actor1 and Actor2. In the case of a political leader or organization, this will be the leader’s formal name (e.g., GEORGE W BUSH, UNITED NATIONS), for a geographic match it will be either the country or capital/major city name (e.g., UNITED STATES / PARIS), and for ethnic, religious, and type matches it will reflect the root match class (e.g., KURD, CATHOLIC, POLICE OFFICER, etc.). It may be blank if the system was unable to identify an actor.
• Actor1Geo_ADM1Code, Actor2Geo_ADM2Code: This is the 2-character FIPS10-4 country code followed by the 2-character FIPS10-4 administrative division 1 (ADM1) code for the administrative division housing the landmark.
• ActionGeo_FullName: Location of Event. This is the full human-readable name of the matched location.
• ActionGeo_CountryCode: Location of Event. This is the 2-character FIPS10-4 country code for the location.
• ActionGeo_ADM1Code: Location of Event. This is the 2-character FIPS10-4 country code followed by the 2-character FIPS10-4 administrative division 1 (ADM1) code for the administrative division housing the landmark.
• ActionGeo_Lat, ActionGeo_Long: This is the centroid latitude and longitude of the landmark for mapping.
• SQLDATE: Date the event took place in YYYYMMDD format.
There may be actors with similar ActorCode at different locations. So, for uniquely identifying each actor we have concatenated Actor1Code with Actor1Geo_ADM1Code and Actor2Code with Actor2Geo_ADM2Code. The data acquired from queries had to be further cleaned for missing entries. Each event entry mentions a pair of unique actors, but we also found rare instances (< 0.5%) where only one actor has been identified. Any row of data with missing actor names, actor codes or location data was removed to create the working data set. The number of events was 45,942 for EC and 48,295 for HR, and this was reduced to 28,055 for EC and 36,470 for HR after filtering. On actual inspection of the data, we found that a very small fraction of entries do not actually report an actual case of ethnic conflict or human rights violations, yet contain the relevant keywords and discuss the issues in a positive tone (e.g. absence of ethnic violence or human rights violations).
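The cleaning and actor identification steps described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the pipeline used for the paper: the function `clean_events`, the subset of required fields, the underscore used to join the code and the location, and the placeholder codes in the toy rows (AAA, XX01, etc.) are all hypothetical.

```python
# Fields that must be non-empty for a row to be kept (an illustrative subset;
# the paper also requires actor names and event location data).
REQUIRED = ("SQLDATE", "Actor1Code", "Actor1Geo_ADM1Code",
            "Actor2Code", "Actor2Geo_ADM2Code")

def clean_events(rows, required=REQUIRED):
    """Drop rows with any missing required field and build composite actor IDs."""
    events = []
    for row in rows:
        # Treat absent keys, None and blank strings as missing entries.
        if any(not (row.get(field) or "").strip() for field in required):
            continue
        # Composite ID = actor code joined with its location code (joiner is an assumption).
        actor1 = row["Actor1Code"] + "_" + row["Actor1Geo_ADM1Code"]
        actor2 = row["Actor2Code"] + "_" + row["Actor2Geo_ADM2Code"]
        year = int(row["SQLDATE"][:4])   # SQLDATE is in YYYYMMDD format
        events.append((year, actor1, actor2))
    return events

# Toy rows with placeholder codes; the second and third rows have a missing entry.
rows = [
    {"SQLDATE": "20010115", "Actor1Code": "AAA", "Actor1Geo_ADM1Code": "XX01",
     "Actor2Code": "BBB", "Actor2Geo_ADM2Code": "XX02"},
    {"SQLDATE": "20010116", "Actor1Code": "AAA", "Actor1Geo_ADM1Code": "XX01",
     "Actor2Code": "", "Actor2Geo_ADM2Code": "XX02"},
    {"SQLDATE": "20020301", "Actor1Code": "AAA", "Actor1Geo_ADM1Code": "YY",
     "Actor2Code": "BBB", "Actor2Geo_ADM2Code": None},
    {"SQLDATE": "20020302", "Actor1Code": "AAA", "Actor1Geo_ADM1Code": "YY",
     "Actor2Code": "CCC", "Actor2Geo_ADM2Code": "YY03"},
]
events = clean_events(rows)
```

Note that the same actor code at two different locations (AAA at XX01 and AAA at YY) yields two distinct actors, which is the purpose of the concatenation.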
The structure of the network
FIG. S7: Plots of the cumulative probability (CCDF) Qa(m) that an actor is mentioned m times or more, for (a) EC and (b) HR; Qab(w) that an actor pair is mentioned w times or more, for (c) EC and (d) HR; and Q(k) that an actor is co-mentioned with k actors or more, for (e) EC and (f) HR, for each year in the period 2001-2015. All panels use logarithmic axes. The actual fits are given in Table S1.
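The cumulative probabilities plotted in Fig. S7 can be computed from a list of counts as in the following sketch. The helper name `ccdf` and the toy list of mention counts are assumptions introduced for illustration; the same function applies to actor pair mentions w and degrees k.

```python
from collections import Counter

def ccdf(values):
    """Return sorted (x, Q(x)) pairs, where Q(x) is the fraction of values >= x."""
    n = len(values)
    counts = Counter(values)
    remaining = n                 # number of values >= current x
    result = []
    for x in sorted(counts):
        result.append((x, remaining / n))
        remaining -= counts[x]    # values equal to x are excluded for the next x
    return result

# Toy mention counts for six actors (not data from the paper).
mentions = [1, 1, 1, 2, 2, 5]
q = ccdf(mentions)
```

Here every actor is mentioned at least once, half of the actors are mentioned at least twice, and one actor in six is mentioned five times or more.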
FIG. S8: Scatter plots showing the degree (k) versus mention (m) (red) for each actor, in the networks for (a) EC and (b) HR, on logarithmic axes. The raw data (red) are then log-binned and plotted as black filled circles. The dashed lines are the best fits obtained using ordinary least squares for the log-binned data (black filled circles): $m \propto k^{\gamma}$ with γ measured to be 1.22 ± 0.01 for EC and 1.16 ± 0.02 for HR. The slope of each best fit line implies strong correlation between the degree (k) and mention (m) for actors, as also reflected by the ratios of the power-law exponents of the probability distributions of the respective variables in Fig. 3: ν3/ν1 ≈ 1.23 for EC and ≈ 1.16 for HR, which compare well to the corresponding values of γ.
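The log-binning followed by an ordinary least squares fit in log-log space, as used for Fig. S8, can be sketched as follows. The binning scheme is not specified in the text, so the doubling bins [2^j, 2^(j+1)) and the use of bin averages are assumptions, as are the function names `log_bin` and `loglog_ols` and the synthetic test data.

```python
import math
from collections import defaultdict

def log_bin(points):
    """Average (k, m) points inside doubling bins [2^j, 2^(j+1)) of the integer degree k."""
    bins = defaultdict(list)
    for k, m in points:
        bins[k.bit_length() - 1].append((k, m))   # exact bin index for integer k >= 1
    binned = []
    for j in sorted(bins):
        ks, ms = zip(*bins[j])
        binned.append((sum(ks) / len(ks), sum(ms) / len(ms)))
    return binned

def loglog_ols(points):
    """Ordinary least squares of log(m) on log(k); returns (slope, intercept)."""
    xs = [math.log(k) for k, _ in points]
    ys = [math.log(m) for _, m in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic check: points lying exactly on m = k^1.2 should give a slope close to 1.2.
toy = [(k, k ** 1.2) for k in range(1, 129)]
gamma, _ = loglog_ols(log_bin(toy))
```

The fitted slope plays the role of γ in $m \propto k^{\gamma}$; binning before fitting prevents the many low-degree actors from dominating the regression.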
TABLE S1: Computed exponents of the power-law distribution fits for (i) actor mentions, ν1, (ii) actor pair mentions, ν2, and (iii) degree, ν3, from the data for Ethnic Conflicts (EC) and Human Rights Violations (HR) for different years (2001-2015) as well as for the 15-year aggregate.
Year        Actor pair mention (ν2)       Actor mention (ν1)            Actor degree (ν3)
            EC            HR              EC            HR              EC            HR
2001        2.61 ± 0.07   3.47 ± 0.35     1.56 ± 0.05   1.79 ± 0.04     2.13 ± 0.05   2.01 ± 0.06
2002        3.47 ± 0.23   3.55 ± 0.22     1.90 ± 0.06   2.06 ± 0.07     2.44 ± 0.07   2.16 ± 0.09
2003        3.04 ± 0.37   3.32 ± 0.35     1.71 ± 0.03   2.03 ± 0.07     2.01 ± 0.05   2.15 ± 0.04
2004        2.53 ± 0.11   3.81 ± 0.39     1.42 ± 0.02   1.90 ± 0.04     1.66 ± 0.04   2.06 ± 0.03
2005        3.35 ± 0.24   3.48 ± 0.18     2.15 ± 0.11   2.08 ± 0.05     2.29 ± 0.15   2.16 ± 0.10
2006        3.25 ± 0.20   3.14 ± 0.17     2.15 ± 0.07   1.99 ± 0.08     2.24 ± 0.09   2.30 ± 0.07
2007        3.04 ± 0.18   3.04 ± 0.14     1.64 ± 0.04   1.86 ± 0.07     1.85 ± 0.05   2.25 ± 0.05
2008        2.27 ± 0.07   2.28 ± 0.13     1.43 ± 0.03   1.74 ± 0.04     1.69 ± 0.03   2.11 ± 0.05
2009        2.07 ± 0.17   2.97 ± 0.09     1.50 ± 0.02   1.54 ± 0.03     1.93 ± 0.04   1.69 ± 0.05
2010        1.47 ± 0.10   2.61 ± 0.15     1.30 ± 0.03   1.49 ± 0.02     1.71 ± 0.03   1.62 ± 0.04
2011        1.67 ± 0.07   2.81 ± 0.07     1.32 ± 0.01   1.38 ± 0.03     1.62 ± 0.04   1.56 ± 0.03
2012        2.86 ± 0.13   2.60 ± 0.13     1.54 ± 0.02   1.52 ± 0.02     1.80 ± 0.03   1.72 ± 0.04
2013        1.92 ± 0.07   1.41 ± 0.05     1.40 ± 0.02   1.21 ± 0.02     1.86 ± 0.02   1.57 ± 0.02
2014        1.83 ± 0.04   2.48 ± 0.05     1.29 ± 0.01   1.37 ± 0.01     1.69 ± 0.02   1.60 ± 0.03
2015        1.82 ± 0.06   2.06 ± 0.07     1.31 ± 0.01   1.45 ± 0.01     1.68 ± 0.02   1.64 ± 0.01
2001-2015   1.58 ± 0.01   1.74 ± 0.03     1.23 ± 0.01   1.28 ± 0.01     1.52 ± 0.01   1.48 ± 0.01
Clusters
TABLE S2: The number of actors (total nodes) and the number of actors in the largest connected cluster for the different data sets: Ethnic Conflicts (EC) and Human Rights Violations (HR) for different years (2001-2015) as well as for the 15-year aggregate.
            EC                                   HR
Year        Total nodes   Largest cluster size   Total nodes   Largest cluster size
2001        540           193                    917           333
2002        396           104                    636           72
2003        355           117                    788           184
2004        633           329                    823           285
2005        263           48                     714           110
2006        451           56                     907           183
2007        721           288                    1,102         309
2008        1,279         696                    1,375         451
2009        1,204         544                    2,267         1,011
2010        951           474                    1,887         954
2011        1,412         746                    2,719         1,591
2012        1,644         899                    2,423         1,142
2013        1,728         892                    2,707         1,576
2014        2,560         1,620                  3,447         2,152
2015        3,432         2,229                  3,816         2,322
2001-2015   10,394        7,875                  15,899        12,106
The size of the largest cluster is computed as the maximum number of nodes s1 in the giant component or largest subgraph and is shown in Supplementary Table S2. The variation of s1 with the size N of the entire network is shown in Fig. S9, and the asymptotic fit is found to be $s_1 \sim N^{\delta}$. The degree distribution of the largest cluster (giant component) has a power-law tail. The power-law exponents for the asymptotic fits are given in Supplementary Table S3.
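The cluster sizes, and in particular the size s1 of the largest cluster, can be obtained with a breadth-first search over the links of the aggregated network. The sketch below is an illustration only; the function name `cluster_sizes` and the toy link list are assumptions.

```python
from collections import defaultdict, deque

def cluster_sizes(links):
    """Return the sizes of all connected components, largest first."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every actor reachable from `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# Toy network with two clusters: {A1, A2, A3} and {B1, B2}.
links = [("A1", "A2"), ("A2", "A3"), ("B1", "B2")]
sizes = cluster_sizes(links)
s1 = sizes[0]     # size of the largest cluster
N = sum(sizes)    # total number of actors in the network
```

Repeating this for each yearly network gives the pairs (N, s1) listed in Table S2, and the full list of sizes gives the cluster size distribution Q(s).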
FIG. S9: Variation of the size of the largest cluster s1 with network size N (total actors), on logarithmic axes, for different years (2001-2015) as well as the 15-year aggregate, for EC and HR. The power-law fits to $s_1 \sim N^{\delta}$ give δ = 1.17 ± 0.01 for EC and δ = 1.19 ± 0.02 for HR.
FIG. S10: Plots of the cumulative probability (CCDF) Q(s) that there is a cluster of size larger than s, on logarithmic axes, for different years (2001-2015) for (a) EC and (b) HR.
TABLE S3: Computed exponents ν3 of the asymptotic power-law fits to the degree distribution $Q(k) \sim k^{-\nu_3}$ for the largest connected cluster (giant component) for the two categories of data, Ethnic Conflicts (EC) and Human Rights Violations (HR), for individual years and the 15-year aggregate (2001-2015).
Year        EC            HR
2001        1.99 ± 0.06   1.71 ± 0.07
2002        1.84 ± 0.08   1.65 ± 0.05
2003        1.84 ± 0.08   1.56 ± 0.05
2004        1.58 ± 0.05   1.83 ± 0.06
2005        1.75 ± 0.29   1.48 ± 0.09
2006        1.50 ± 0.05   1.97 ± 0.08
2007        1.71 ± 0.06   1.84 ± 0.04
2008        1.64 ± 0.04   1.81 ± 0.08
2009        1.88 ± 0.05   1.56 ± 0.04
2010        1.62 ± 0.04   1.49 ± 0.03
2011        1.50 ± 0.03   1.41 ± 0.04
2012        1.75 ± 0.03   1.60 ± 0.04
2013        1.83 ± 0.02   1.53 ± 0.02
2014        1.67 ± 0.03   1.57 ± 0.04
2015        1.69 ± 0.02   1.60 ± 0.01
2001-2015   1.51 ± 0.01   1.48 ± 0.01
Biggest cluster: clustering coefficient
To quantify the cohesion of the ethnic conflict and human rights violation networks, we computed the clustering coefficient of both (15-year aggregated) networks, which shows the extent to which the nodes of a network are closely connected with one another. The clustering coefficient for a node i in an undirected graph is computed as $C_i = \frac{2 N_i}{k_i (k_i - 1)}$, where $k_i$ is the degree of the node i, and $N_i$ is the number of links between the neighbors of i. We plot the average clustering coefficient of a node with degree k in Fig. S11, and find that higher degree nodes are less clustered compared to low degree nodes.
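The definition of the clustering coefficient and its average over nodes of equal degree can be made concrete with a short sketch. The function name `clustering_by_degree`, the adjacency dictionary layout and the decision to skip nodes of degree below 2 (for which the formula is undefined) are assumptions of this example.

```python
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(adjacency):
    """Average clustering coefficient <C(k)> per degree k.

    `adjacency` maps each node to the set of its neighbours (undirected graph).
    """
    per_degree = defaultdict(list)
    for node, neighbours in adjacency.items():
        k = len(neighbours)
        if k < 2:
            continue   # assumption: C_i is undefined for k < 2, so such nodes are skipped
        # N_i: number of links between pairs of neighbours of this node.
        n_links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
        per_degree[k].append(2.0 * n_links / (k * (k - 1)))
    return {k: sum(values) / len(values) for k, values in per_degree.items()}

# Toy graph: a triangle A-B-C with an extra node D attached to A.
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
avg_c = clustering_by_degree(adjacency)
```

In the toy graph the degree-3 node A has only one link among its three neighbours, so its coefficient is 1/3, while the degree-2 nodes B and C are fully clustered.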
FIG. S11: Plot of the average clustering coefficient 〈C(k)〉 with degree k of the biggest cluster, on logarithmic axes, for the two categories of data, EC and HR, for the 15-year aggregate (2001-2015). The data for all sets exhibit a slow decay with degree.
Dynamics of network growth
FIG. S12: Plots showing the cumulative growth rates π(k) for degree k, on logarithmic axes, for (a) EC and (b) HR datasets, for the ten actors listed in Table S4. The curves fit to $\pi(k) \sim k^{\alpha}$ with α > 1. The precise fitting exponents are given in Table S4.
TABLE S4: Computed exponents of the asymptotic power-law fits for the cumulative growth rates $\pi(x) \sim x^{\alpha}$ for the actors with the ten highest counts, for (i) actor mentions and (ii) degree, for the two categories of data, Ethnic Conflicts (EC) and Human Rights Violations (HR), for the 15-year aggregate.
EC                                        HR
Actor      Degree        Mention          Actor      Degree        Mention
ALB_AL     0.86 ± 0.01   1.04 ± 0.01      CHN_CH22   0.71 ± 0.02   1.11 ± 0.01
ARM_AM     1.55 ± 0.02   1.51 ± 0.01      GBR_UK     1.11 ± 0.02   1.33 ± 0.01
AZE_AJ     0.50 ± 0.02   0.96 ± 0.01      GBR_UKH9   1.68 ± 0.02   1.79 ± 0.01
AZE_AJ09   0.96 ± 0.04   1.28 ± 0.01      IRN_IR     0.57 ± 0.01   1.10 ± 0.01
ISR_IS     0.89 ± 0.02   1.23 ± 0.01      IRN_IR26   1.58 ± 0.03   1.47 ± 0.01
ISR_IS00   1.26 ± 0.01   1.38 ± 0.01      ISR_IS00   1.86 ± 0.02   1.71 ± 0.01
PSE_IS     1.05 ± 0.02   1.31 ± 0.01      RUS_RS     1.27 ± 0.02   1.41 ± 0.01
PSE_IS00   1.82 ± 0.02   1.40 ± 0.01      RUS_RS48   1.04 ± 0.02   1.22 ± 0.01
RUS_RS     0.47 ± 0.01   0.92 ± 0.01      USA_US     1.33 ± 0.02   1.40 ± 0.01
RUS_RS48   0.53 ± 0.02   0.79 ± 0.01      USA_USDC   1.29 ± 0.01   1.35 ± 0.01
Measuring causality
Let us consider the two random variables depicting counts of EC (ethnic conflict mentions) and HR (human rights violation mentions). To test whether EC causes HR, Granger causality [21] computes a regression of the variable HR on the past values of itself and the past values of EC, and then tests the significance of the coefficient estimates associated with EC.
We consider a bivariate linear autoregressive model on EC and HR, and assume the L.H.S. to be dependent on the history of EC and HR,
$$\mathrm{HR}_t = a_0 + a_1 \mathrm{EC}_{t-1} + \ldots + a_h \mathrm{EC}_{t-h} + b_1 \mathrm{HR}_{t-1} + \ldots + b_h \mathrm{HR}_{t-h} + E_t \qquad (1)$$
where h is the maximum number of lagged observations (for both EC and HR). The coefficients $a_i$ and $b_i$ are the contributions of the lagged observations $\mathrm{EC}_{t-i}$ and $\mathrm{HR}_{t-i}$, respectively, to the predicted value of $\mathrm{HR}_t$, while $E_t$ is the prediction error.
The null hypothesis $H_0$ is that all coefficients of the lagged EC terms vanish, $a_1 = a_2 = \ldots = a_h = 0$, which implies that EC does not cause HR. In other words, the coefficients of EC are not significant enough to cause HR. But if the null hypothesis is rejected, we say that the coefficients of EC are significant enough to cause HR.
For testing the significance of the coefficients, we compute the p-value. If the p-value is less than 0.05, one can reject the null hypothesis, and hence conclude that EC causes HR (denoted HR ∼ EC).
Applying the above process on the EC and HR year-wise mentions, we found that:
• On testing EC ∼ HR for h = 4, we find p ≃ 0.721, and the null hypothesis cannot be rejected. Hence, one cannot conclude that HR causes EC.
• On testing HR ∼ EC for h = 4, we find p ≃ 0.029, and the null hypothesis can be rejected. Hence, we can say that EC causes HR.
Applying the above process on the EC and HR month-wise mentions, we found that:
• On testing EC ∼ HR for h = 5, we find p ≃ 0.193, and thus the null hypothesis cannot be rejected. Hence, we cannot say that HR causes EC.
• On testing HR ∼ EC for h = 5, we find p ≃ 0.036, and thus the null hypothesis can be rejected. Hence, we can say that EC causes HR.
Hence, the above quantitative analysis indicates that ethnic conflicts cause human rights violations (in the Granger sense), while it does not support the converse.
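The test described in this section compares a restricted regression (lags of the effect only) with an unrestricted one (lags of both series). The sketch below computes the corresponding F statistic in plain Python; it is a minimal illustration under stated assumptions, not the procedure used for the reported p-values. The helper names `solve`, `rss` and `granger_f` are hypothetical, the two series are synthetic (not GDELT counts), and converting F into a p-value requires the F distribution with (h, n − 2h − 1) degrees of freedom, which is left to a statistics library.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))

def granger_f(cause, effect, h):
    """F statistic for the null hypothesis that h lags of `cause` do not help predict `effect`."""
    restricted, unrestricted, target = [], [], []
    for t in range(h, len(effect)):
        own = [effect[t - i] for i in range(1, h + 1)]     # lagged effect terms (b_i)
        other = [cause[t - i] for i in range(1, h + 1)]    # lagged cause terms (a_i)
        restricted.append([1.0] + own)
        unrestricted.append([1.0] + other + own)
        target.append(effect[t])
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - (2 * h + 1)
    return ((rss_r - rss_u) / h) / (rss_u / dof)

# Synthetic series: hr follows ec with a lag of one step, while ec is pure noise.
random.seed(0)
ec = [random.gauss(0, 1) for _ in range(200)]
hr = [0.0] + [0.8 * ec[t - 1] + random.gauss(0, 0.1) for t in range(1, 200)]
f_ec_to_hr = granger_f(ec, hr, h=2)   # large: lagged ec predicts hr
f_hr_to_ec = granger_f(hr, ec, h=2)   # small: lagged hr does not predict ec
```

A large F statistic in one direction combined with a small one in the other mirrors the asymmetry reported above, where HR ∼ EC is significant and EC ∼ HR is not.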
[1] M. E. J. Newman, A.-L. Barabasi, and D. J. Watts. The structure and dynamics of networks. Princeton Univ. Press, Princeton, 2006.
[2] R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Rev. Mod. Phys., 74(1):47–97, 2002.
[3] A.-L. Barabasi. Network Science. Cambridge Univ. Press, Cambridge, 2016.
[4] D. Lazer, A. Pentland, L. Adamic, S. Aral, A.-L. Barabasi, D. Brewer, N. Christakis, N. Contractor, J. Fowler, M. Gutmann, T. Jebara, G. King, M. Macy, D. Roy, and M. Van Alstyne. Computational social science. Science, 323(5915):721–723, 2009.
[5] C. Castellano, S. Fortunato, and V. Loreto. Statistical physics of social dynamics. Rev. Mod. Phys., 81:591–646, May 2009.
[6] P. Sen and B. K. Chakrabarti. Sociophysics: An Introduction. Oxford University Press, Oxford, 2013.
[7] M. D. Toft. The geography of ethnic violence: Identity, interests, and the indivisibility of territory. Princeton Univ. Press, Princeton, 2005.
[8] J. Vorrath, L. F. Krebs, and D. Senn. Linking ethnic conflict & democratization: An assessment of four troubled regions. National Center of Competence in Research, Challenges to Democracy in the 21st Century, Working Paper, (6), 2007.
[9] O. N. T. Thoms and J. Ron. Do human rights violations cause internal conflict? Human Rights Quarterly, pages 674–705, 2007.
[10] P. Holme and J. Saramaki, editors. Temporal networks. Springer, 2013.
[11] J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, and A.-L. Barabasi. Structure and tie strengths in mobile communication networks. Proc. Nat. Acad. Sci., 104(18):7332–7336, 2007.
[12] C. Cattuto, W. Van den Broeck, A. Barrat, V. Colizza, J.-F. Pinton, and A. Vespignani. Dynamics of person-to-person interactions from distributed RFID sensor networks. PloS ONE, 5(7):e11596, 2010.
[13] L. F. Richardson. Statistics of deadly quarrels. Boxwood Press, Pittsburgh, 1960.
[14] A. Clauset, M. Young, and K. S. Gleditsch. On the frequency of severe terrorist events. J. Conflict Resol., 51(1):58–87, 2007.
[15] O. Becerra, N. Johnson, P. Meier, J. Restrepo, and M. Spagat. Natural disasters, casualties and power laws: A comparative analysis with armed conflict. In Proc. Ann. Meet. Am. Political Sc. Assoc. Citeseer, 2012.
[16] A. Chatterjee and B. K. Chakrabarti. Fat tailed distributions for deaths in conflicts and disasters. Rep. Adv. Phys. Sc., 1:1740007, 2017.
[17] M. Lim, R. Metzler, and Y. Bar-Yam. Global pattern formation and ethnic/cultural violence. Science, 317(5844):1540–1544, 2007.
[18] The GDELT Project, retrieved October, 2016. www.gdeltproject.org/.
[19] A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM review, 51(4):661–703, 2009.
[20] R. Albert, H. Jeong, and A.-L. Barabasi. Error and attack tolerance of complex networks. Nature, 406(6794):378–382, 2000.
[21] C. W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37:424–438, 1969.
[22] M. D. Ward, A. Beger, J. Cutler, M. Dickenson, C. Dorff, and B. Radford. Comparing GDELT and ICEWS Event Data. Analysis, 21:267–297, 2013.
[23] B. Arva, J. Beieler, B. Fisher, G. Lara, P. A. Schrodt, W. Song, M. Sowell, and S. Stehle. Improving Forecasts of International Events of Interest. Working paper, 2013.
[24] K. Leetaru and P. A. Schrodt. GDELT: Global data on events, location, and tone, 1979–2012. In ISA Ann. Convention, volume 2. Citeseer, 2013.
[25] B. Gupta, G. Sehgal, K. Sharma, G. Sharma, A. Chatterjee, A. Chakraborti, and G. Shroff. Spatial analysis of networks of ethnic conflicts and humans rights violations. In preparation.
[26] World’s largest event dataset now publicly available in BigQuery. https://cloudplatform.googleblog.com/2014/05/worlds-largest-event-dataset-now-publicly-available-in-google-bigquery.html.
[27] GDELT - DATA FORMAT CODEBOOK V 1.03, as on 8/25/2013. http://data.gdeltproject.org/documentation/GDELT-Data_Format_Codebook.pdf.
[28] J. Dana Stuster. Mapped: Every protest on the planet since 1979. ForeignPolicy.com, 2013.