kevin-bonham
GENDER DISPARITY IN COMPUTATIONAL BIOLOGY
19 JULY, 2016
2016-07-19
OUTLINE
▸ Gender differences in Publication (background)
▸ Computationally inferring gender
▸ Gender in Biology, Computational Biology and Computer Science
MEN PUBLISH MORE PAPERS THAN WOMEN
West JD, Jacquet J, King MM, Correll SJ, Bergstrom CT (2013) The Role of Gender in Scholarly Authorship. PLoS ONE 8(7): e66212. doi: 10.1371/journal.pone.0066212
http://fivethirtyeight.com/features/in-science-it-matters-that-women-come-last/
GENDER DISPARITY VARIES BY FIELD
West JD, Jacquet J, King MM, Correll SJ, Bergstrom CT (2013) The Role of Gender in Scholarly Authorship. PLoS ONE 8(7): e66212. doi: 10.1371/journal.pone.0066212
http://www.eigenfactor.org/gender/#
MEN GET MORE CREDIT FOR DISCOVERY THAN WOMEN
https://commons.wikimedia.org/wiki/File:Matilda_Effect.png
MEN ARE MORE LIKELY TO BE CITED THAN WOMEN
http://www.nature.com/news/bibliometrics-global-gender-disparities-in-science-1.14321
MEN CITE THEMSELVES MORE OFTEN THAN WOMEN
http://arxiv.org/abs/1607.00376
INFERRING GENDER FROM FIRST NAMES
West JD, Jacquet J, King MM, Correll SJ, Bergstrom CT (2013) The Role of Gender in Scholarly Authorship. PLoS ONE 8(7): e66212. doi: 10.1371/journal.pone.0066212
We use US Social Security Administration records to determine gender from first names. The US Social Security Administration website (http://www.ssa.gov/oact/babynames/) makes available the top 1000 names annually for each of the 153 million boys and 143 million girls born from 1880–2010. (These data acknowledge only two genders.) We assume we can identify an author's gender if the author's first name is associated with a single gender in social security records at least 95% of the time, as with ‘Mary’, or ‘John’. Otherwise, as with ‘Leslie’ or ‘Sidney’, we are unable to identify the gender and do not include that author in our analysis.
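The 95% rule quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by West et al.: the function names, the `(name, sex, count)` record shape, and the counts in `toy_records` are all assumptions made up for the example.

```python
from collections import defaultdict


def build_name_table(records):
    """Sum birth counts per name and sex from (name, sex, count) records."""
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for name, sex, count in records:
        counts[name.lower()][sex] += count
    return counts


def infer_gender(name, counts, threshold=0.95):
    """Return 'F' or 'M' if one gender holds >= threshold of the records, else None."""
    entry = counts.get(name.lower())
    if entry is None:
        return None  # name never appears in the records
    total = entry["F"] + entry["M"]
    if total == 0:
        return None
    share_f = entry["F"] / total
    if share_f >= threshold:
        return "F"
    if 1 - share_f >= threshold:
        return "M"
    return None  # ambiguous name: excluded from the analysis


# Hypothetical counts for illustration only (not real SSA figures).
toy_records = [
    ("Mary", "F", 9950), ("Mary", "M", 50),
    ("John", "M", 9900), ("John", "F", 100),
    ("Leslie", "F", 6000), ("Leslie", "M", 4000),
]
table = build_name_table(toy_records)
for name in ("Mary", "John", "Leslie", "Ituro"):
    print(name, infer_gender(name, table))
```

With these toy counts, 'Mary' and 'John' pass the threshold, while 'Leslie' (ambiguous) and a name missing from the records both come back as `None` and would be dropped.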
INFERRING GENDER FROM FIRST NAMES
“To provide the highest possible accuracy, we combine the data of multiple data sources. We use data from publicly available governmental sources and combine them with data we crawl from social networks, which provides you the best possible matches. Each name has to be verified by different sources to be added to our list.”
BENEFITS:
▸ More names available than from census records alone
▸ Androgynous names have probabilities associated
MAJOR ISSUES:
▸ ~50% of names have no gender information
▸ Non-western names are less likely to have gender information
GUESSES MATCH VERIFIED GENDERS
BMJ 2016; 352 doi: http://dx.doi.org/10.1136/bmj.i847 (Published 02 March 2016)
[Figure: Medical Journal First Authors. y-axis: P(female), 0.0 to 0.5; bars: Known, Known w/o NA, Guessed.]
MOTIVATION
West JD, Jacquet J, King MM, Correll SJ, Bergstrom CT (2013) The Role of Gender in Scholarly Authorship. PLoS ONE 8(7): e66212. doi: 10.1371/journal.pone.0066212
COULD BIOLOGY BE A PATH TO COMPUTATIONAL/QUANTITATIVE SKILLS?
METHODOLOGY
▸ Download article info from PubMed
▸ MeSH Terms: “Biology” and “Computational Biology”
▸ 1997-2014
▸ Parse XML documents for author names, dates, etc.
▸ http://nbviewer.jupyter.org/github/kescobo/gender-comp-bio/blob/name_stats/src/xml_parsing.ipynb
▸ Use Gender API names to guess genders
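The XML-parsing step might look roughly like the sketch below. It is not the code from the linked notebook. The element names follow the standard PubMed article XML, but `SAMPLE_XML` is a made-up record, `parse_authors` and `position_label` are hypothetical helpers, and the rule for labelling positions on short author lists is an assumption.

```python
import xml.etree.ElementTree as ET

# Made-up record in the shape of a PubMed XML export (illustration only).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <Journal><ISOAbbreviation>Example J.</ISOAbbreviation></Journal>
        <AuthorList>
          <Author><LastName>Alpha</LastName><ForeName>Ann B</ForeName></Author>
          <Author><LastName>Beta</LastName><ForeName>Bob</ForeName></Author>
          <Author><LastName>Gamma</LastName><ForeName>Cara</ForeName></Author>
          <Author><LastName>Delta</LastName><ForeName>Dev</ForeName></Author>
          <Author><LastName>Epsilon</LastName><ForeName>Eve</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def position_label(index, n_authors):
    """Label an author slot; first/last win over second/penultimate (assumed rule)."""
    if index == 0:
        return "first"
    if index == n_authors - 1:
        return "last"
    if index == 1:
        return "second"
    if index == n_authors - 2:
        return "penultimate"
    return "other"


def parse_authors(xml_text):
    """Return one row per author: PMID, journal, first name, position."""
    rows = []
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        journal = article.findtext("MedlineCitation/Article/Journal/ISOAbbreviation")
        authors = article.findall("MedlineCitation/Article/AuthorList/Author")
        for i, author in enumerate(authors):
            fore = author.findtext("ForeName")
            # Keep only the first token, so "Ann B" becomes "Ann".
            first_name = fore.split()[0] if fore and fore.strip() else None
            rows.append({
                "pmid": pmid,
                "journal": journal,
                "name": first_name,
                "position": position_label(i, len(authors)),
            })
    return rows


rows = parse_authors(SAMPLE_XML)
for row in rows:
    print(row)
```

Each resulting row carries the fields needed for the later gender lookup: a first name to query and the author's position on the paper.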
METHODOLOGY
PMID      Date        Journal                Name            Position     Dataset  P(Female)  Count
26251854  2015/08/06  JEmpirResHumResEthics  Masaru          last         bio      0.01       345
26251854  2015/08/06  JEmpirResHumResEthics  Ituro           second       bio      NA         0
26251854  2015/08/06  JEmpirResHumResEthics  Naoaki          penultimate  bio      0          37
26251854  2015/08/06  JEmpirResHumResEthics  Mayumi          other        bio      0.97       654
26152079  2015/07/08  ArchIntHistSci(Paris)  Vallori         first        bio      NA         0
26152076  2015/07/08  ArchIntHistSci(Paris)  Pierre-Olivier  first        bio      NA         0
26152076  2015/07/08  ArchIntHistSci(Paris)  Bernardino      last         bio      0.02       627
26152075  2015/07/08  ArchIntHistSci(Paris)  Dolores         first        bio      0.98       4438
26152074  2015/07/08  ArchIntHistSci(Paris)  Simone          first        bio      0.33       54976
26031011  2015/06/02  Pak.J.Biol.Sci.        Jamuna          first        bio      0.89       174
26031011  2015/06/02  Pak.J.Biol.Sci.        Johanna         last         bio      0.98       15733
DATA
▸ Biology (1997-2014)
  ▸ Publications: 202,818
  ▸ Authors: 1,111,776
▸ Computational Biology (1997-2014)
  ▸ Publications: 42,882
  ▸ Authors: 244,141
▸ Unique Names: 74,760
▸ % Names with unknown gender: 43.0%
▸ % Authors with unknown gender: 26.6%
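The two "unknown" percentages use different denominators: one counts unique names, the other counts author records. A plausible reading of the gap between them is that names without gender information tend to be rarer. The sketch below shows the two calculations on a toy list. The function name, the author list, and the P(female) values are assumptions for illustration, not the talk's data.

```python
def unknown_rates(author_names, p_female_by_name):
    """Return (% of unique names unknown, % of author records unknown)."""
    unique_names = set(author_names)
    unknown = {n for n in unique_names if p_female_by_name.get(n) is None}
    pct_names = 100 * len(unknown) / len(unique_names)
    pct_authors = 100 * sum(n in unknown for n in author_names) / len(author_names)
    return pct_names, pct_authors


# Toy data: two common names with a known P(female), two rare names without one.
toy_authors = ["Mary"] * 4 + ["John"] * 4 + ["Ituro", "Vallori"]
toy_lookup = {"Mary": 0.99, "John": 0.01}  # hypothetical probabilities

pct_names, pct_authors = unknown_rates(toy_authors, toy_lookup)
print(f"names unknown: {pct_names:.1f}%  authors unknown: {pct_authors:.1f}%")
```

In this toy case half of the unique names are unknown, but they cover only a fifth of the author records.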
[Figure: Primary Articles 1997-2014. x-axis: Author Position (first, second, other, penultimate, last); y-axis: P(female), 0.0 to 0.5; series: Bio, Comp. Error bars: 95% confidence interval based on 1000-sample bootstrap.]
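The error bars are described as 95% confidence intervals from a 1000-sample bootstrap. One way to compute such an interval for the mean P(female) of an author position is the percentile bootstrap sketched below. The choice of the percentile method, the function name, and the fixed seed are assumptions. The input values are the known P(Female) entries from the sample table shown earlier in the talk, used only as example data.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=1000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return statistics.fmean(values), means[lo_idx], means[hi_idx]


# Known P(Female) values from the sample table (NA rows left out).
p_female = [0.01, 0.0, 0.97, 0.02, 0.98, 0.33, 0.89, 0.98]

mean, lo, hi = bootstrap_ci(p_female)
print(f"mean P(female) = {mean:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

With only eight values the interval is very wide. On the real data each author position has many thousands of entries, so the bars would be far tighter.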
[Figure: Nature, Science, Cell 1997-2014. x-axis: Author Position (first, second, other, penultimate, last); y-axis: P(female), 0.0 to 0.5; series: Bio, Comp.]
[Figure: PLoS Journals. x-axis: Author Position (first, second, other, penultimate, last); y-axis: P(female), 0.0 to 0.5; series: PLoS Biol., PLoS Comput. Biol.]
[Figure: P(female) by Year. x-axis: Year (ticks 2005, 2010, 2015); y-axis: P(female), 0.0 to 0.5; series: Bio, Comp.]
[Figure: By Journal. x-axis: Journal; y-axis: P(female), 0.0 to 0.8. Journal tick labels run from J. Comput. Biol., IEEE/ACM Trans Comput Biol Bioinform, and Bioinformatics at one end to Genet. Med. and J Genet Couns at the other.]
ARXIV DATA
[Figure: Arxiv. x-axis: Year (ticks 2006 to 2014); y-axis: P(female), 0.0 to 0.5; series: Quant Bio, CS.]
[Figure: Arxiv Articles 2007-2014. x-axis: Author Position (first, second, other, penultimate, last); y-axis: P(female), 0.0 to 0.5; series: Quant. Bio, CS.]
CONCLUSIONS
▸ There are large gender disparities in publishing
▸ Computational Biology has larger disparities than Biology as a whole
▸ But maybe… better than computer science?