31
Suicide ideation of individuals in online social networks N. Masuda, I. Kurahashi and H. Onari, arXiv:1207.2548 , 2012 Hiroko Onari #TokyoWebmining 26th of August, 2012

Suicide ideation of individuals in online social networks tokyo webmining

Embed Size (px)

DESCRIPTION

suicide ideation of individuals in online social networks

Citation preview

Page 1: Suicide ideation of individuals in online social networks tokyo webmining

Suicide  ideation  of  individuals  in  online  social  networks

N.  Masuda,  I.  Kurahashi  and  H.  Onari,  arXiv:1207.2548,  2012

Hiroko  Onari#TokyoWebmining

26th  of  August,  2012

Page 2: Suicide ideation of individuals in online social networks tokyo webmining

What  is  a  social  network?

• graph  that  represents  relationships  (ties,  links)  between  independent  users  (nodes)

Page 3: Suicide ideation of individuals in online social networks tokyo webmining

Directed  networkswhere  ties  have  directione.g.)  online  directed  networks

-  twitter

-  Google+

-  YouTube

-  Flickr

Undirected  networkswhere  ties  have  no  directione.g.)  online  undirected  networks

-  mixi

-  Facebook

-  skype

-  LinkedIn

Page 4: Suicide ideation of individuals in online social networks tokyo webmining

What  is  suicide?-  association  with  social  isolation  -

• Suicide  is  defined  as  all  cases  of  death  resulting  directly  or  indirectly  from  a  positive  (e.g.,  shooting  oneself)  or  negative  (e.g.,  refusing  to  eat)  act  of  the  victim  himself,  which  he  knows  will  produce  this  result.  [Durkheim,  1951]

• Suicide  is  not  an  individual  act  nor  a  personal  action.  The  force,  which  determines  the  suicide,  is  not  psychological  but  social.  Suicide  is  the  result  of  social  disorganization  or  lack  of  social  integration  or  social  solidarity.  [Durkheim,  1951]  =>  Social  Isolation

Page 5: Suicide ideation of individuals in online social networks tokyo webmining

Social  network  analysis  on  suicide  &  social  isolation

• A  small  number  of  friends  and  a  small  fraction  of  triangles  to  which  an  individual  belongs  significantly  contribute  to  suicide  ideation  of  social  isolation.  (by  study  the  relationship  between  suicidal  behavior  and  egoentric  social  networks  among  adolescents)[Bearman  &  Moody,  2004]  [Cui  et  al.,  2010]

• The  paucity  of  triangles,  or  intransitivity  also  characterizes  social  isolation.  [Wasserman  &  Faust,  1994]  [Bearman  &  Moody,  2004]

• Individuals  without  triangles  are  considered  to  lack  membership  to  social  group  even  if  they  have  many  friends.  [Krackhardt,  1999]

Page 6: Suicide ideation of individuals in online social networks tokyo webmining

Social  Statistics  by  OECD

Suicide  ratesper  100,000  persons  per  year

Suicide  ratesper  100,000  persons,  1960  -  2006

0

5

10

15

20

25

30

35

40

45

1960

19

62 19

64 19

66 19

68 19

70 19

72 19

74 19

76 19

78 19

80 19

82 19

84 19

86 19

88 19

90 19

92 19

94 19

96 19

98 20

00 20

02 20

04 20

06

Denmark Greece Hungary Ireland Japan Switzerland OECD average

Japan

Japanʼ’s  suicide  rate  per  100,000  persons  is  higher  than  any  other  OECD  country.

Page 7: Suicide ideation of individuals in online social networks tokyo webmining

Research  Questions

• From  the  perspective  of  network  science,  can  we  observe  indications  for  reducing  suicide  by  the  quantitative  analysis  in  online  social  network?

• Can  we  say  that  online  social  network  reflect  real  personal  relationship?

Page 8: Suicide ideation of individuals in online social networks tokyo webmining

Data

• Social  Network  of                            registered  users  from  mixi  as  of  March  2012.  

• More  than                              user-defined  communities  on  various  topics  in  mixi  as  of  April  2012.  A  community  is  a  group  of  users  that  have  a  common  interest  such  as  hobby.  The  user-defined  community  is  distinctive  feature  which  other  major  SNSs  like  Facebook  do  not  have.

• mixi  is  a  major  SNS  in  Japan,  and  it  launched  in  2004.

4.5× 106

2.7× 107

Page 9: Suicide ideation of individuals in online social networks tokyo webmining

Analysis  EnvironmentTokyo  Cabinet Analysis  Computer

edge  listID,  friendʼ’s  ID1,  21,  32,  12,  42,  5

clustering  coefficientID,  Clustering  Coefficient1,  0.42,  0.2

Perlpersonal  info.community  info.

result  report

We  calculated  clustering  coefficient  in  Tokyo  Cabinet  which  is  a  library  of  routines  for  a  managing  database  and  is  contained  Key-Value  store.  As  API  for  Tokyo  Cabinet,  Perl  language  was  used.  Data  analysis  was  implemented  in  R.Irrelevant  private  information  was  deleted,  and  relevant  information  was  encrypted.  We  conducted  all  analysis  in  Tokyo  HQ  office  of  mixi  using  a  computer  that  is  not  connected  to  Internet.

Tokyo  CabinetID:  friendʼ’s  IDs1:  2,  32:  1,  4,  5

Page 10: Suicide ideation of individuals in online social networks tokyo webmining

Sampling  Procedure  (1/2)• Suicide  seed  sample  (9990  users):  Selected  4  communities  which  are  related  with  suicide  as  the  following  criteria;  

(1)  the  name  of  user-defined  communities  includes  the  word  “suicide”  (“jisatsu”  in  Japanese)(2)  at  least  1000  members(3)  at  least  100  comments  posted  for  each  topic(4)  at  least  3  independent  topics  on  which  comments  were  made  on  October,  2011(5)  the  admission  to  join  community  is  open  to  public

     *  excluded  communities  which  concentrated  on  the  method  of  committing  suicide  and                encouraged  members  to  live  with  hopes        *  discarded  users  with  0  or  1  friend  on  mixi

Then,  sampled  9990  active  users  that  existed  as  of  January  23,  2012  and  logged  on  to  mixi  in  more  than  20  days  per  month  on  average  from  August  through  December  2011  from  the  suicidal  communities

Page 11: Suicide ideation of individuals in online social networks tokyo webmining

Sampling  Procedure  (2/2)• Depression  seed  sample  (24410  users):  Selected  7  communities  which  are  related  with  depression  by  the  way  of  the  similar  criteria  with  suicide  seed  sampling.  The  difference  is  the  name  of  user-defined  communities  includes  the  word  “depression”  (“utsu”  in  Japanese)  Sampled  24410  active  users  from  the  depression  communities

• Control  seed  sample  (228949  users):  Random  sample  of  active  users  that  who  had  at  least  2  friends,  and  did  not  join  the  suicidal  communities  and  the  depression-related  communities

Page 12: Suicide ideation of individuals in online social networks tokyo webmining

Measurements  of  a  social  network

• The  following  network  indications  were  adopted.-  degree-  degree  distribution-  clustering  coefficient-  homophily

Page 13: Suicide ideation of individuals in online social networks tokyo webmining

Degree

• Degree  is  the  number  of  neighbors  (i.e.,  Friends),  and  denoted  by          for  user      .  A  small  degree  is  an  indicator  of  social  isolation.

ki

i

Page 14: Suicide ideation of individuals in online social networks tokyo webmining

Degree  distribution

• It  is  known  that  the  degree  distribution  of  human  relationship  are  long  tailed.    Most  people  have  a  relatively  small  degree,  but  a  few  people  have  very  large  degree,  being  connected  to  many  other  people.

Page 15: Suicide ideation of individuals in online social networks tokyo webmining

Clustering  coefficient• Clustering  coefficient  is  a  measure  of  the  number  of  triangles  in  a  network.  In  social  networks,  clustering  coefficient  is  large,  the  user  is  considered  to  be  embedded  in  close-knit  social  groups  (Wasserman  &  Faust,  1994;  Watts  &  Strogattz,1998;  Newman,  2010).  A  small  value  is  an  indicator  of  social  isolation.

• Clustering  coefficient  can  be  measured  in  two  ways:  “global  clustering  coefficient”  (often  called  “transitivity”)  and  “local  clustering  coefficient”.  The  global  measure  gives  an  overall  indication  of  the  clustering  in  the  network,  whereas  the  local  measure  gives  an  indication  of  the  embeddedness  of  single  nodes.

• In  our  research,  we  use  “local  clustering  coefficient”.*  Local  clustering  coefficient  is  often  used  in  network  science  (complex  network),  and  the  global  value  is  often  used  in  sociology.  

Page 16: Suicide ideation of individuals in online social networks tokyo webmining

Local  clustering  coefficientfor  undirected  networks

• The  local  clustering  coefficient            for  each  vertex(user)          is  defined  by

     *  By  definition,                                  .      *          is  degree  of  the  user          .        *  The  user  who  have  0  or  1  friend  (                                  )  should/can  be  removed.  

• The  average  of  local  clustering  coefficient  is  defined  by  

       *  By  definition,                              .        *          is  the  total  number  of  users  in  the  network  except                                    .

Ci ≡number of triangles connected to vi

ki(ki − 1)/2.

N

vi

ki

ki = 0 or 1

Ci

vi

ki = 0 or 1

C ≡ 1N

N�

i=1

Ci.

0 ≤ C ≤ 1

0 ≤ Ci ≤ 1

Page 17: Suicide ideation of individuals in online social networks tokyo webmining

Degree  and  clustering  coefficient

• The  influence  of          and          need  to  be  distinguished  carefully.  Here  is  an  example.  There  are  two  people  with  5  friends,  but  the  different  number  of  links.  

ki Ci

ki = 5Ci =

05(5− 1)/2

= 0 Ci =3

5(5− 1)/2=

310

ki = 5

Page 18: Suicide ideation of individuals in online social networks tokyo webmining

Degree  and  clustering  coefficient• Each  data  point              for  degree  is  obtained  by  averaging          over  the  users  in  a  group  with  degree      .  Large  fluctuations  of              at  large        values  are  caused  by  the  paucity  of  users  having  large      .          decreases  with          in  many  networks  (Newman,  2010).

CiC(k)

k

k

C(k)

Cik

Page 19: Suicide ideation of individuals in online social networks tokyo webmining

Homophily

• Similar  individuals  are  more  likely  to  become  friends.  It  is  called  “homophily”.  In  this  study,  we  adopt  the  fraction  of  neighbor  with  suicide  ideation.  

• It  should  be  noted  that,  if  a  user  has  relatively  many  friends  with  suicide  ideation,  it  does  not  necessarily  imply  that  suicide  is  contagious.  Homophily  may  be  a  cause  of  such  assortativity.

• FYI:  There  is  some  research  to  differentiate  the  effect  of  influence  and  homophily  (Aral  et  al.,  2009;  Shalizi  &  Thomas,  2011)*My  presentation  on  the  effect  of  influence  and  homphily  based  on  Aralʼ’s  paper  in  slideshare.    http://www.slideshare.net/hirokoonari/ss-13221508

Page 20: Suicide ideation of individuals in online social networks tokyo webmining

Homophily• In  this  study,  users  in  suicide  group  has  more  comparatively  similar  friends  than  users  in  control  group.  The  same  tendency  can  be  said  for  users  in  depression-related  group.

Page 21: Suicide ideation of individuals in online social networks tokyo webmining

Independent  variablesPersonal  variablesPersonal  variables

Age

Gender

Local  network  variablesLocal  network  variables

degree

local  clustering  coefficient

Homophily

number  of  neighbors  (friends)

undirected  clustering  coefficient

number  of  neighbors  who  join  the  suicide  /  depression  community

Behavioral  variables  in  mixiBehavioral  variables  in  mixi

Community  number

Registration  period

number  of  communities  which  a  user  join

number  of  days  between  the  registration  date  and  Jan.  23,  2012

Page 22: Suicide ideation of individuals in online social networks tokyo webmining

Statistical  models• Univariate  and  multivariate  logistic  regressions:  estimating  the  likelihood  of  belonging  to  a  suicidal  or  a  depressive  community

• VIF  (variance  inflation  factor):  checking  the  multicollinearity  between  independent  variables  to  justify  the  use  of  the  multivariate  logistic  regression.  The  recommended  VIF  value  is  smaller  than  10  (preferably  smaller  than  5).

• Pearson,  Spearman,  and  Kendall  correlation  coefficients:  measuring  correlation  between  the  independent  variables

• AUC  (area  under  the  receiver  operating  characteristic  curve):  quantifying  the  explanatory  power  of  the  logistic  model.  The  AUC  value  falls  between  0.5  and  1.  A  large  AUC  value  indicates  that  logistic  regression  fits  well.

Page 23: Suicide ideation of individuals in online social networks tokyo webmining

Univariate  statistics  of  independent  variables  for  the  suicide  and  control  groups

Variable

Suicide group

(N = 9, 990)

Control group

(N = 228, 949)

Mean±SDRange

(min,max)Mean±SD

Range

(min,max)

p-value

Age 27.4±10.3 (17, 97) 27.7±9.2 (14, 96) 0.000652

Community number 283.7±284.3 (1, 1000) 46.3±79.4 (1, 1000) < 0.0001

ki 82.9±98.7 (2, 1000) 65.8±67.6 (2, 1000) < 0.0001

Ci 0.087±0.097 (0, 1) 0.150±0.138 (0, 1) < 0.0001

Homophily (suicide) 0.0110±0.0329 (0, 1.000) 0.0012±0.0080 (0, 0.667) < 0.0001

Registration period 1235.7±638.9 (122, 2878) 1333.5±670.5 (102, 2891) < 0.0001

Gender (female) 5,786 (57.9%) 126,941 (55.4%) < 0.0001

No. suicidal communities 1.20±0.51 (1, 4) N/A N/A N/A

No. login days 28.9±4.4 (1, 31) 26.9±6.3 (1, 31) < 0.0001

Page 24: Suicide ideation of individuals in online social networks tokyo webmining

Multivariate  logistic  regression  of  suicide  ideation  on  individual  and  network  variables

Variable OR CI p-value VIFAge 1.00463 (1.00211, 1.00716) 0.000313 1.091

Gender (female = 1) 0.821 (0.783, 0.861) < 0.0001 1.028Community number 1.00733 (1.00720, 1.00747) < 0.0001 1.197

ki 0.99790 (0.99758, 0.99821) < 0.0001 1.156Ci 0.0093 (0.0069, 0.0126) < 0.0001 1.081

Homophily (suicide) 2.22× 1012 (0.57× 1012, 8.65× 1012) < 0.0001 1.016Registration period 0.999383 (0.999346, 0.999420) < 0.0001 1.135

*  OR:  odds  ratio;  CI:  95%  confidence  interval;  VIF:  variance  inflation  factorAUC 0.873

More  likely  to  belong  to  the  suicide  group  than  control  group  on  average;  -  A  one-year  older  user  is  1.00463  times    -  Being  female  is  1.00463  times  -  Membership  to  one  community  is  1.00733  times  -  Having  one  friend  is  0.99790  times  -  An  increase  in  Ci  by  0.01  is  0.0093^0.01  =  0.95  times  -  An  increase  in  the  fraction  of  friends  in  the  suicide  group  by  0.01  is  (2.22  ×  10^12)^0.01  =  1.33  times  -  One  day  of  the  registration  period  is  0.999383  timesAUC  is  large,  so  this  logistic  regression  fits  well.

Page 25: Suicide ideation of individuals in online social networks tokyo webmining

Correlation  coefficients  between  pairs  of  independent  variables

Variable 1 Variable 2Control Suicide Depression

P S K P S K P S K

Age Gender −.053 −.026 −.022 −.094 −.137 −.116 −.166 −.174 −.145

Age Community number −.032 .023 .015 −.045 −.105 −.073 −.089 −.131 −.091

Age ki −.279 −.385 −.271 −.103 −.224 −.157 −.168 −.268 −.187

Age Ci .041 −.152 −.111 −.048 −.220 −.154 −.092 −.273 −.192

Age Homophily (suicide) −.011 −.090 .074 .031 −.037 −.029 N/A N/A N/A

Age Homophily (depression) −.007 −.083 −.066 N/A N/A N/A .166 .121 −.089

Age Registration period .278 .460 .337 .159 .356 .259 .203 .364 .266

Gender Community number .110 .116 .095 .205 .204 .166 .086 .083 .068

Gender ki .015 .014 .011 .048 .046 .038 .048 .046 .038

Gender Ci −.084 −.085 −.069 −.109 −.097 −.080 −.061 −.030 −.024

Gender Homophily (suicide) −.012 −.017 −.017 −.007 .031 .028 N/A N/A N/A

Gender Homophily (depression) .000 .009 .008 N/A N/A N/A −.053 −.021 −.018

Gender Registration period .025 .025 .020 −.064 −.061 −.050 −.078 −.079 −.065

Community number ki .375 .372 .258 .348 .338 .231 .375 .360 .248

Community number Ci −.376 −.399 −.277 −.231 −.200 −.136 −.201 −.171 −.116

Community number Homophily (suicide) .027 .113 .091 −.034 .140 .105 N/A N/A N/A

Community number Homophily (depression) .038 .166 .132 N/A N/A N/A −.150 .034 .025

Community number Registration period .339 .338 .230 .166 .152 .102 .187 .172 .115

ki Ci −.363 −.248 −.175 −.251 −.116 −.085 −.240 −.105 −.074

ki Homophily (suicide) −.013 .191 .150 −.175 .174 .107 N/A N/A N/A

ki Homophily (depression) −.027 .254 .188 N/A N/A N/A −.210 .076 .029

ki Registration period .102 .081 .055 .170 .154 .103 .172 .152 .101

Ci Homophily (suicide) −.026 −.100 −.080 −.047 −.213 −.162 N/A N/A N/A

Ci Homophily (depression) −.031 −.145 −.114 N/A N/A N/A −.055 −.243 −.182

Ci Registration period −.221 −.249 −.168 −.143 −.112 −.162 −.133 −.099 −.068

Homophily (suicide) Registration period −.039 −.031 −.025 −.104 −.059 −.044 N/A N/A N/A

Homophily (depression) Registration period −.024 .011 .009 N/A N/A N/A −.120 −.049 −.036

*  P:  Pearson;  S:  Spearman,  K:  Kendall  correlation  coefficients*                          >  0.2

These  correlation  coefficients  are  sufficiently  small.

Page 26: Suicide ideation of individuals in online social networks tokyo webmining

Univariate  logistic  regression  of  suicide  ideation  on  individual  and  network  variables

*  OR:  odds  ratio;  CI:  95%  confidence  interval;  AUC:  area  under  the  curve

Variable OR CI p-value AUCAge 0.99604 (0.99377, 0.99832) 0.000651 0.515

Gender (female = 1) 1.106 (1.062, 1.152) < 0.0001 0.512Community number 1.00728 (1.00716, 1.00741) < 0.0001 0.867

ki 1.00259 (1.00237, 1.00280) < 0.0001 0.549Ci 0.000581 (0.000428, 0.000789) < 0.0001 0.690

Homophily (suicide) 1.57× 1016 (0.41× 1016, 6.08× 1016) < 0.0001 0.643Registration period 0.999783 (0.999753, 0.999813) < 0.0001 0.545

-  The  community  number  makes  by  far  the  largest  contribution  among  the  seven  independent  variables.-  The  second  largest  explanatory  power  is  the  AUC  0.690  of  clustering  coefficient.      This  result  is  consistent  with  the  previous  one  (Bearman  &  Moody,  2004).  -  The  third  largest  explanatory  power  is  the  AUC  0.643  of  homophily.

Page 27: Suicide ideation of individuals in online social networks tokyo webmining

Conclusions• Online  social  behavior  of  users  rather  than  demographic  properties.  The  below  factors  contribute  to  suicide  ideation  by  the  largest  amounts-  increase  in  the  community  number-  decrease  in  the  local  clustering  coefficient-  increase  in  the  homophily  variable

• The  age  and  gender  little  influence  suicide  ideation  is  inconsistent  with  previous  findings  (Wray  et  al.,  2011).

• The  degree  little  explains  suicide  ideation  is  inconsistent  with  previous  studies  (Bearman  &  Moody,  2004;  Cui  et  al.,  2010).

• User-defined  communities  of  mix  cover  virtually  all  major  topics.  As  a  future  study,  applying  the  present  methods  can  be  profitable.

Page 28: Suicide ideation of individuals in online social networks tokyo webmining

Appendix-  Analysis  of  depressive  symptoms  -

Page 29: Suicide ideation of individuals in online social networks tokyo webmining

Univariate  statistics  of  independent  variables  for  the  depression  and  control  groups

Variable

Depression group

(N = 24, 410)

Control group

(N = 228, 949)

Mean±SDRange

(min,max)Mean±SD

Range

(min,max)

p-value

Age 28.8±9.4 (16, 97) 27.7±9.2 (14, 96) < 0.0001

Community number 249.6±263.1 (1, 1000) 46.3±79.4 (1, 1000) < 0.0001

ki 81.9±88.1 (2, 1000) 65.8±67.6 (2, 1000) < 0.0001

Ci 0.085±0.089 (0, 1) 0.150±0.138 (0, 1) < 0.0001

Homophily (depression) 0.0196±0.0501 (0, 1.000) 0.0031±0.0131 (0, 0.667) < 0.0001

Registration period 1389.4±659.2 (122, 2885) 1333.5±670.5 (102, 2891) < 0.0001

Gender (female) 16,872 (69.1%) 126,941 (55.4%) < 0.0001

No. suicidal communities 1.16±0.47 (1, 6) N/A N/A N/A

No. login days 28.8±4.4 (1, 31) 26.9±6.3 (1, 31) < 0.0001

Page 30: Suicide ideation of individuals in online social networks tokyo webmining

Multivariate  logistic  regression  of  depressive  symptoms  on  individual  and  network  variables

*  OR:  odds  ratio;  CI:  95%  confidence  interval;  VIF:  variance  inflation  factor

Variable OR CI p-value VIFAge 1.0141 (1.0124, 1.0158) < 0.0001 1.104

Gender (female = 1) 1.532 (1.481, 1.585) < 0.0001 1.019Community number 1.00790 (1.00778, 1.00803) < 0.0001 1.155

ki 0.99833 (0.99810, 0.99856) < 0.0001 1.154Ci 0.0145 (0.0118, 0.0178) < 0.0001 1.079

Homophily (depression) 1.98× 1010 (0.99× 1010, 4.02× 1010) < 0.0001 1.022Registration period 0.999744 (0.999720, 0.999769) < 0.0001 1.117

AUC 0.866

Page 31: Suicide ideation of individuals in online social networks tokyo webmining

Univariate  logistic  regression  of  depressive  symptoms  on  individual  and  network  variables

*  OR:  odds  ratio;  CI:  95%  confidence  interval;  AUC:  area  under  the  curve

Variable OR CI p-value AUCAge 1.0110 (1.0097, 1.0123) < 0.0001 0.551

Gender (female = 1) 1.799 (1.748, 1.850) < 0.0001 0.568Community number 1.00826 (1.00814, 1.00837) < 0.0001 0.860

ki 1.00258 (1.00243, 1.00274) < 0.0001 0.566Ci 0.000415 (0.000338, 0.000509) < 0.0001 0.692

Homophily (depression) 2.12× 1012 (1.05× 1012, 4.28× 1012) < 0.0001 0.658Registration period 1.000126 (1.000106, 1.000145) < 0.0001 0.522