Upload
hiroko-onari
View
7.517
Download
1
Tags:
Embed Size (px)
DESCRIPTION
suicide ideation of individuals in online social networks
Citation preview
Suicide ideation of individuals in online social networks
N. Masuda, I. Kurahashi and H. Onari, arXiv:1207.2548, 2012
Hiroko Onari#TokyoWebmining
26th of August, 2012
What is a social network?
• graph that represents relationships (ties, links) between independent users (nodes)
Directed networkswhere ties have directione.g.) online directed networks
- Google+
- YouTube
- Flickr
Undirected networkswhere ties have no directione.g.) online undirected networks
- mixi
- skype
What is suicide?- association with social isolation -
• Suicide is defined as all cases of death resulting directly or indirectly from a positive (e.g., shooting oneself) or negative (e.g., refusing to eat) act of the victim himself, which he knows will produce this result. [Durkheim, 1951]
• Suicide is not an individual act nor a personal action. The force, which determines the suicide, is not psychological but social. Suicide is the result of social disorganization or lack of social integration or social solidarity. [Durkheim, 1951] => Social Isolation
Social network analysis on suicide & social isolation
• A small number of friends and a small fraction of triangles to which an individual belongs significantly contribute to suicide ideation of social isolation. (by study the relationship between suicidal behavior and egoentric social networks among adolescents)[Bearman & Moody, 2004] [Cui et al., 2010]
• The paucity of triangles, or intransitivity also characterizes social isolation. [Wasserman & Faust, 1994] [Bearman & Moody, 2004]
• Individuals without triangles are considered to lack membership to social group even if they have many friends. [Krackhardt, 1999]
Social Statistics by OECD
Suicide ratesper 100,000 persons per year
Suicide ratesper 100,000 persons, 1960 - 2006
0
5
10
15
20
25
30
35
40
45
1960
19
62 19
64 19
66 19
68 19
70 19
72 19
74 19
76 19
78 19
80 19
82 19
84 19
86 19
88 19
90 19
92 19
94 19
96 19
98 20
00 20
02 20
04 20
06
Denmark Greece Hungary Ireland Japan Switzerland OECD average
Japan
Japanʼ’s suicide rate per 100,000 persons is higher than any other OECD country.
Research Questions
• From the perspective of network science, can we observe indications for reducing suicide by the quantitative analysis in online social network?
• Can we say that online social network reflect real personal relationship?
Data
• Social Network of registered users from mixi as of March 2012.
• More than user-defined communities on various topics in mixi as of April 2012. A community is a group of users that have a common interest such as hobby. The user-defined community is distinctive feature which other major SNSs like Facebook do not have.
• mixi is a major SNS in Japan, and it launched in 2004.
4.5× 106
2.7× 107
Analysis EnvironmentTokyo Cabinet Analysis Computer
edge listID, friendʼ’s ID1, 21, 32, 12, 42, 5
clustering coefficientID, Clustering Coefficient1, 0.42, 0.2
Perlpersonal info.community info.
result report
We calculated clustering coefficient in Tokyo Cabinet which is a library of routines for a managing database and is contained Key-Value store. As API for Tokyo Cabinet, Perl language was used. Data analysis was implemented in R.Irrelevant private information was deleted, and relevant information was encrypted. We conducted all analysis in Tokyo HQ office of mixi using a computer that is not connected to Internet.
Tokyo CabinetID: friendʼ’s IDs1: 2, 32: 1, 4, 5
Sampling Procedure (1/2)• Suicide seed sample (9990 users): Selected 4 communities which are related with suicide as the following criteria;
(1) the name of user-defined communities includes the word “suicide” (“jisatsu” in Japanese)(2) at least 1000 members(3) at least 100 comments posted for each topic(4) at least 3 independent topics on which comments were made on October, 2011(5) the admission to join community is open to public
* excluded communities which concentrated on the method of committing suicide and encouraged members to live with hopes * discarded users with 0 or 1 friend on mixi
Then, sampled 9990 active users that existed as of January 23, 2012 and logged on to mixi in more than 20 days per month on average from August through December 2011 from the suicidal communities
Sampling Procedure (2/2)• Depression seed sample (24410 users): Selected 7 communities which are related with depression by the way of the similar criteria with suicide seed sampling. The difference is the name of user-defined communities includes the word “depression” (“utsu” in Japanese) Sampled 24410 active users from the depression communities
• Control seed sample (228949 users): Random sample of active users that who had at least 2 friends, and did not join the suicidal communities and the depression-related communities
Measurements of a social network
• The following network indications were adopted.- degree- degree distribution- clustering coefficient- homophily
Degree
• Degree is the number of neighbors (i.e., Friends), and denoted by for user . A small degree is an indicator of social isolation.
ki
i
Degree distribution
• It is known that the degree distribution of human relationship are long tailed. Most people have a relatively small degree, but a few people have very large degree, being connected to many other people.
Clustering coefficient• Clustering coefficient is a measure of the number of triangles in a network. In social networks, clustering coefficient is large, the user is considered to be embedded in close-knit social groups (Wasserman & Faust, 1994; Watts & Strogattz,1998; Newman, 2010). A small value is an indicator of social isolation.
• Clustering coefficient can be measured in two ways: “global clustering coefficient” (often called “transitivity”) and “local clustering coefficient”. The global measure gives an overall indication of the clustering in the network, whereas the local measure gives an indication of the embeddedness of single nodes.
• In our research, we use “local clustering coefficient”.* Local clustering coefficient is often used in network science (complex network), and the global value is often used in sociology.
Local clustering coefficientfor undirected networks
• The local clustering coefficient for each vertex(user) is defined by
* By definition, . * is degree of the user . * The user who have 0 or 1 friend ( ) should/can be removed.
• The average of local clustering coefficient is defined by
* By definition, . * is the total number of users in the network except .
Ci ≡number of triangles connected to vi
ki(ki − 1)/2.
N
vi
ki
ki = 0 or 1
Ci
vi
ki = 0 or 1
C ≡ 1N
N�
i=1
Ci.
0 ≤ C ≤ 1
0 ≤ Ci ≤ 1
Degree and clustering coefficient
• The influence of and need to be distinguished carefully. Here is an example. There are two people with 5 friends, but the different number of links.
ki Ci
ki = 5Ci =
05(5− 1)/2
= 0 Ci =3
5(5− 1)/2=
310
ki = 5
Degree and clustering coefficient• Each data point for degree is obtained by averaging over the users in a group with degree . Large fluctuations of at large values are caused by the paucity of users having large . decreases with in many networks (Newman, 2010).
CiC(k)
k
k
C(k)
Cik
Homophily
• Similar individuals are more likely to become friends. It is called “homophily”. In this study, we adopt the fraction of neighbor with suicide ideation.
• It should be noted that, if a user has relatively many friends with suicide ideation, it does not necessarily imply that suicide is contagious. Homophily may be a cause of such assortativity.
• FYI: There is some research to differentiate the effect of influence and homophily (Aral et al., 2009; Shalizi & Thomas, 2011)*My presentation on the effect of influence and homphily based on Aralʼ’s paper in slideshare. http://www.slideshare.net/hirokoonari/ss-13221508
Homophily• In this study, users in suicide group has more comparatively similar friends than users in control group. The same tendency can be said for users in depression-related group.
Independent variablesPersonal variablesPersonal variables
Age
Gender
Local network variablesLocal network variables
degree
local clustering coefficient
Homophily
number of neighbors (friends)
undirected clustering coefficient
number of neighbors who join the suicide / depression community
Behavioral variables in mixiBehavioral variables in mixi
Community number
Registration period
number of communities which a user join
number of days between the registration date and Jan. 23, 2012
Statistical models• Univariate and multivariate logistic regressions: estimating the likelihood of belonging to a suicidal or a depressive community
• VIF (variance inflation factor): checking the multicollinearity between independent variables to justify the use of the multivariate logistic regression. The recommended VIF value is smaller than 10 (preferably smaller than 5).
• Pearson, Spearman, and Kendall correlation coefficients: measuring correlation between the independent variables
• AUC (area under the receiver operating characteristic curve): quantifying the explanatory power of the logistic model. The AUC value falls between 0.5 and 1. A large AUC value indicates that logistic regression fits well.
Univariate statistics of independent variables for the suicide and control groups
Variable
Suicide group
(N = 9, 990)
Control group
(N = 228, 949)
Mean±SDRange
(min,max)Mean±SD
Range
(min,max)
p-value
Age 27.4±10.3 (17, 97) 27.7±9.2 (14, 96) 0.000652
Community number 283.7±284.3 (1, 1000) 46.3±79.4 (1, 1000) < 0.0001
ki 82.9±98.7 (2, 1000) 65.8±67.6 (2, 1000) < 0.0001
Ci 0.087±0.097 (0, 1) 0.150±0.138 (0, 1) < 0.0001
Homophily (suicide) 0.0110±0.0329 (0, 1.000) 0.0012±0.0080 (0, 0.667) < 0.0001
Registration period 1235.7±638.9 (122, 2878) 1333.5±670.5 (102, 2891) < 0.0001
Gender (female) 5,786 (57.9%) 126,941 (55.4%) < 0.0001
No. suicidal communities 1.20±0.51 (1, 4) N/A N/A N/A
No. login days 28.9±4.4 (1, 31) 26.9±6.3 (1, 31) < 0.0001
Multivariate logistic regression of suicide ideation on individual and network variables
Variable OR CI p-value VIFAge 1.00463 (1.00211, 1.00716) 0.000313 1.091
Gender (female = 1) 0.821 (0.783, 0.861) < 0.0001 1.028Community number 1.00733 (1.00720, 1.00747) < 0.0001 1.197
ki 0.99790 (0.99758, 0.99821) < 0.0001 1.156Ci 0.0093 (0.0069, 0.0126) < 0.0001 1.081
Homophily (suicide) 2.22× 1012 (0.57× 1012, 8.65× 1012) < 0.0001 1.016Registration period 0.999383 (0.999346, 0.999420) < 0.0001 1.135
* OR: odds ratio; CI: 95% confidence interval; VIF: variance inflation factorAUC 0.873
More likely to belong to the suicide group than control group on average; - A one-year older user is 1.00463 times - Being female is 1.00463 times - Membership to one community is 1.00733 times - Having one friend is 0.99790 times - An increase in Ci by 0.01 is 0.0093^0.01 = 0.95 times - An increase in the fraction of friends in the suicide group by 0.01 is (2.22 × 10^12)^0.01 = 1.33 times - One day of the registration period is 0.999383 timesAUC is large, so this logistic regression fits well.
Correlation coefficients between pairs of independent variables
Variable 1 Variable 2Control Suicide Depression
P S K P S K P S K
Age Gender −.053 −.026 −.022 −.094 −.137 −.116 −.166 −.174 −.145
Age Community number −.032 .023 .015 −.045 −.105 −.073 −.089 −.131 −.091
Age ki −.279 −.385 −.271 −.103 −.224 −.157 −.168 −.268 −.187
Age Ci .041 −.152 −.111 −.048 −.220 −.154 −.092 −.273 −.192
Age Homophily (suicide) −.011 −.090 .074 .031 −.037 −.029 N/A N/A N/A
Age Homophily (depression) −.007 −.083 −.066 N/A N/A N/A .166 .121 −.089
Age Registration period .278 .460 .337 .159 .356 .259 .203 .364 .266
Gender Community number .110 .116 .095 .205 .204 .166 .086 .083 .068
Gender ki .015 .014 .011 .048 .046 .038 .048 .046 .038
Gender Ci −.084 −.085 −.069 −.109 −.097 −.080 −.061 −.030 −.024
Gender Homophily (suicide) −.012 −.017 −.017 −.007 .031 .028 N/A N/A N/A
Gender Homophily (depression) .000 .009 .008 N/A N/A N/A −.053 −.021 −.018
Gender Registration period .025 .025 .020 −.064 −.061 −.050 −.078 −.079 −.065
Community number ki .375 .372 .258 .348 .338 .231 .375 .360 .248
Community number Ci −.376 −.399 −.277 −.231 −.200 −.136 −.201 −.171 −.116
Community number Homophily (suicide) .027 .113 .091 −.034 .140 .105 N/A N/A N/A
Community number Homophily (depression) .038 .166 .132 N/A N/A N/A −.150 .034 .025
Community number Registration period .339 .338 .230 .166 .152 .102 .187 .172 .115
ki Ci −.363 −.248 −.175 −.251 −.116 −.085 −.240 −.105 −.074
ki Homophily (suicide) −.013 .191 .150 −.175 .174 .107 N/A N/A N/A
ki Homophily (depression) −.027 .254 .188 N/A N/A N/A −.210 .076 .029
ki Registration period .102 .081 .055 .170 .154 .103 .172 .152 .101
Ci Homophily (suicide) −.026 −.100 −.080 −.047 −.213 −.162 N/A N/A N/A
Ci Homophily (depression) −.031 −.145 −.114 N/A N/A N/A −.055 −.243 −.182
Ci Registration period −.221 −.249 −.168 −.143 −.112 −.162 −.133 −.099 −.068
Homophily (suicide) Registration period −.039 −.031 −.025 −.104 −.059 −.044 N/A N/A N/A
Homophily (depression) Registration period −.024 .011 .009 N/A N/A N/A −.120 −.049 −.036
* P: Pearson; S: Spearman, K: Kendall correlation coefficients* > 0.2
These correlation coefficients are sufficiently small.
Univariate logistic regression of suicide ideation on individual and network variables
* OR: odds ratio; CI: 95% confidence interval; AUC: area under the curve
Variable OR CI p-value AUCAge 0.99604 (0.99377, 0.99832) 0.000651 0.515
Gender (female = 1) 1.106 (1.062, 1.152) < 0.0001 0.512Community number 1.00728 (1.00716, 1.00741) < 0.0001 0.867
ki 1.00259 (1.00237, 1.00280) < 0.0001 0.549Ci 0.000581 (0.000428, 0.000789) < 0.0001 0.690
Homophily (suicide) 1.57× 1016 (0.41× 1016, 6.08× 1016) < 0.0001 0.643Registration period 0.999783 (0.999753, 0.999813) < 0.0001 0.545
- The community number makes by far the largest contribution among the seven independent variables.- The second largest explanatory power is the AUC 0.690 of clustering coefficient. This result is consistent with the previous one (Bearman & Moody, 2004). - The third largest explanatory power is the AUC 0.643 of homophily.
Conclusions• Online social behavior of users rather than demographic properties. The below factors contribute to suicide ideation by the largest amounts- increase in the community number- decrease in the local clustering coefficient- increase in the homophily variable
• The age and gender little influence suicide ideation is inconsistent with previous findings (Wray et al., 2011).
• The degree little explains suicide ideation is inconsistent with previous studies (Bearman & Moody, 2004; Cui et al., 2010).
• User-defined communities of mix cover virtually all major topics. As a future study, applying the present methods can be profitable.
Appendix- Analysis of depressive symptoms -
Univariate statistics of independent variables for the depression and control groups
Variable
Depression group
(N = 24, 410)
Control group
(N = 228, 949)
Mean±SDRange
(min,max)Mean±SD
Range
(min,max)
p-value
Age 28.8±9.4 (16, 97) 27.7±9.2 (14, 96) < 0.0001
Community number 249.6±263.1 (1, 1000) 46.3±79.4 (1, 1000) < 0.0001
ki 81.9±88.1 (2, 1000) 65.8±67.6 (2, 1000) < 0.0001
Ci 0.085±0.089 (0, 1) 0.150±0.138 (0, 1) < 0.0001
Homophily (depression) 0.0196±0.0501 (0, 1.000) 0.0031±0.0131 (0, 0.667) < 0.0001
Registration period 1389.4±659.2 (122, 2885) 1333.5±670.5 (102, 2891) < 0.0001
Gender (female) 16,872 (69.1%) 126,941 (55.4%) < 0.0001
No. suicidal communities 1.16±0.47 (1, 6) N/A N/A N/A
No. login days 28.8±4.4 (1, 31) 26.9±6.3 (1, 31) < 0.0001
Multivariate logistic regression of depressive symptoms on individual and network variables
* OR: odds ratio; CI: 95% confidence interval; VIF: variance inflation factor
Variable OR CI p-value VIFAge 1.0141 (1.0124, 1.0158) < 0.0001 1.104
Gender (female = 1) 1.532 (1.481, 1.585) < 0.0001 1.019Community number 1.00790 (1.00778, 1.00803) < 0.0001 1.155
ki 0.99833 (0.99810, 0.99856) < 0.0001 1.154Ci 0.0145 (0.0118, 0.0178) < 0.0001 1.079
Homophily (depression) 1.98× 1010 (0.99× 1010, 4.02× 1010) < 0.0001 1.022Registration period 0.999744 (0.999720, 0.999769) < 0.0001 1.117
AUC 0.866
Univariate logistic regression of depressive symptoms on individual and network variables
* OR: odds ratio; CI: 95% confidence interval; AUC: area under the curve
Variable OR CI p-value AUCAge 1.0110 (1.0097, 1.0123) < 0.0001 0.551
Gender (female = 1) 1.799 (1.748, 1.850) < 0.0001 0.568Community number 1.00826 (1.00814, 1.00837) < 0.0001 0.860
ki 1.00258 (1.00243, 1.00274) < 0.0001 0.566Ci 0.000415 (0.000338, 0.000509) < 0.0001 0.692
Homophily (depression) 2.12× 1012 (1.05× 1012, 4.28× 1012) < 0.0001 0.658Registration period 1.000126 (1.000106, 1.000145) < 0.0001 0.522