

Proceedings of Machine Learning Research 81:1–14, 2018 Conference on Fairness, Accountability, and Transparency

Analyze, Detect and Remove Gender Stereotyping from Bollywood Movies

Nishtha Madaan [email protected]
Sameep Mehta [email protected]
IBM Research-INDIA

Taneea S Agrawaal [email protected]
Vrinda Malhotra [email protected]
Aditi Aggarwal [email protected]
IIIT-Delhi

Yatin Gupta [email protected]
MSIT Delhi

Mayank Saxena [email protected]

DTU-Delhi

Abstract

The presence of gender stereotypes in many aspects of society is a well-known phenomenon. In this paper, we focus on studying such stereotypes and bias in the Hindi movie industry (Bollywood) and propose an algorithm to remove these stereotypes from text. We analyze movie plots and posters for all movies released since 1970. The gender bias is detected by semantic modeling of plots at the sentence and intra-sentence level. Different features like occupation, introductions, associated actions and descriptions are captured to show the pervasiveness of gender bias and stereotype in movies. Using the derived semantic graph, we compute the centrality of each character and observe similar bias there. We also show that such bias is not present in movie posters, where females get equal importance even though their characters may have little or no impact on the movie plot. The silver lining is that our system was able to identify 30 movies over the last 3 years where such stereotypes were broken. The next step is to generate debiased stories. The proposed debiasing algorithm extracts gender-biased graphs from unstructured text in movie stories and de-biases these graphs to generate plausible unbiased stories.

1. Introduction

Movies are a reflection of society. They mirror (with creative liberties) the problems, issues, thinking & perception of the contemporary society. Therefore, we believe movies can act as a proxy for understanding how prevalent gender bias and stereotypes are in any society. In this paper, we leverage NLP and image understanding techniques to quantitatively study this bias. To further motivate the problem, we pick a small section from the plot of a blockbuster movie.

"Rohit is an aspiring singer who works as a salesman in a car showroom, run by Malik (Dalip Tahil). One day he meets Sonia Saxena (Ameesha Patel), daughter of Mr. Saxena (Anupam Kher), when he goes to deliver a car to her home as her birthday present."

This piece of text is taken from the plot of the Bollywood movie Kaho Na Pyaar Hai. This simple two-line plot showcases the issue in the following fashion:

1. Male (Rohit) is portrayed with a profession & an aspiration

2. Male (Malik) is a business owner

In contrast, the female role is introduced with no profession or aspiration. The introduction itself is dependent upon another male character: "daughter of"!

One goal of our work is to analyze and quantify gender-based stereotypes by studying the demarcation of roles designated to males and females. We measure this by performing an intra-sentence and inter-sentence level analysis of movie plots combined with cast information. Capturing information from sentences helps us perform a holistic study of the corpus and capture the characteristics exhibited by the male and female classes. We have extracted the Wikipedia pages of all Hindi movies released from 1970 to the present. We also employ deep image analytics to capture such bias in movie posters and previews.

© 2018 N. Madaan, S. Mehta, T.S. Agrawaal, V. Malhotra, A. Aggarwal, Y. Gupta & M. Saxena.

1.1. Analysis Tasks

We focus on the following tasks to study gender bias in Bollywood.

I) Occupations and Gender Stereotypes - How are males portrayed in their jobs vs females? How are these levels different? How does it correlate to gender bias and stereotype?

II) Appearance and Description - How are males and females described on the basis of their appearance? How do the descriptions differ between the two? How does that indicate gender stereotyping?

III) Centrality of Male and Female Characters - What is the role of males and females in movie plots? How often is a male versus a female character central to the plot? How does this reflect a male or female bias?

IV) Mentions (Image vs Plot) - How many males and females are the faces of the promotional posters? How does this correlate with their being mentioned in the plot? What results does the combined analysis convey?

V) Dialogues - How does the number of dialogues differ between the male cast and the female cast in official movie scripts?

VI) Singers - Does the same bias occur in movie songs? How does the gender distribution of singers vary over time across movies?

VII) Female-centric Movies - Are movie stories and the portrayal of females evolving? Have we seen female-centric movies in the recent past?

VIII) Screen Time - Which gender, if any, has greater screen time in movie trailers?

IX) Emotions of Males and Females - Which emotions are most commonly displayed by males and females in a movie trailer? Does this correspond with the gender stereotypes which exist in society?

2. Related Work

While there are recent works in which gender bias has been studied in different walks of life (Soklaridis et al., 2017; MacNell et al., 2015; Carnes et al., 2015; Terrell et al., 2017; Saji, 2016), the analysis largely involves information retrieval tasks building on a wide variety of prior work in this area. (Fast et al., 2016) have worked on gender stereotypes in English fiction, particularly the Online Fiction Writing Community. That work deals primarily with analyzing how males and females behave and are described in this online fiction; it also shows that males are over-represented and finds that traditional gender stereotypes are common throughout every genre in the online fiction data used for analysis.

Apart from this, there are various works in which Hollywood movies have been analyzed for such gender bias (Anderson and Daniels, 2017). Similar analyses of children's books (Gooden and Gooden, 2001) and music lyrics (Millar, 2008) found that men are portrayed as strong and violent while women are associated with home and are considered gentle and less active compared to men. These studies have been very useful in uncovering the trend, but the analyses were derived from very small data sets. In some works, gender drives the decision for being hired in corporate organizations (Dobbin and Jung, 2012). Beyond hiring, it has been shown that human resource professionals' decisions on whether an employee should get a raise are also driven by gender stereotypes, with female raise requests being put down. When it comes to consideration of opinion, the views of females are weighted less than those of men (Otterbacher, 2015). On social media and dating sites, women are judged by their appearance while men are judged mostly by how they behave (Rose et al., 2012; Otterbacher, 2015; Fiore et al., 2008).
When considering occupation, females are often designated lower-level roles compared to their male counterparts in image search results for occupations (Kay et al., 2015). In our work, we extend these analyses to Bollywood movies.

The motivation for considering Bollywood movies is three-fold:

a) The data is very diverse in nature, which makes studying how gender stereotypes exist in it an interesting problem.

b) The data set is large. We analyze 4000 movies covering everything released since 1970, so it is a good first step for developing computational tools to analyze the existence of stereotypes over a period of time.

c) These movies are a reflection of society. Looking for such gender bias in this data is a good first step toward taking the necessary steps to remove these biases.

3. Data and Experimental Study

3.1. Data Selection

We deal with three different types of data for Bollywood movies to perform the analysis tasks:

3.1.1. Movies Data

Our data set consists of all Hindi movie pages from Wikipedia and contains 4000 movies for the 1970-2017 time period. We extract the movie title, cast information, plot, soundtrack information and associated images for each movie. For each listed cast member, we traverse their wiki pages to extract gender information. The cast data covers 5058 female and 9380 male cast members. Since we did not have access to many official scripts, we use the Wikipedia plot as a proxy. We strongly believe that the Wikipedia plot represents the correct story line: if an actor had an important role in the movie, it is highly unlikely that the wiki plot will miss the actor altogether.

3.1.2. Movies Scripts Data

We obtained PDF scripts of 13 Bollywood movies which are available online. The PDF scripts are converted into structured HTML using (Machines, 2017). We use these HTML files for our analysis tasks.

3.1.3. Movie Preview Data

Our data set consists of 880 official movie trailers of movies released between 2008 and 2017. These trailers were obtained from YouTube. The mean and standard deviation of the duration of all videos are 146 and 35 seconds respectively. The videos have a frame rate of 25 FPS and a resolution of 480p. Every 25th frame of each video is extracted and analyzed using face classification for gender and emotion detection (Octavio Arriaga, 2017).

3.2. Task and Approach

In this section, we discuss the tasks we perform on the movie data extracted from Wikipedia and the scripts. Further, we define the approach we adopt to perform the individual tasks and then study the inferences. At a broad level, we divide our analysis into four groups, categorized as follows:

a) At the intra-sentence level - We perform this analysis at the sentence level, where each sentence is analyzed independently, without considering context.

b) At the inter-sentence level - We perform this analysis at the multi-sentence level, carrying context from one sentence to the next and then analyzing the complete information.

c) Image and Plot Mentions - We perform this analysis by correlating the presence of genders on movie posters with their mentions in the plot.

d) At the Video level - We perform this analysis by doing gender and emotion detection on the frames of each video (Octavio Arriaga, 2017).

We define different tasks corresponding to each level of analysis.

3.2.1. Tasks at Intra-Sentence level

To make the plots analysis-ready, we used OpenIE (Fader et al., 2011) for performing co-reference resolution on the movie plot text. The co-referenced plot is used for all analyses.

The following intra-sentence analyses are performed:

1) Cast Mentions in Movie Plot - We extract mentions of the male and female cast in the co-referred plot. The motivation for counting mentions is to compare how many times males are referred to in the plot versus females; this helps us identify whether the actress has an important role in the movie. In Figure 2 it is observed that a male is mentioned around 30 times in a plot while a female is mentioned only around 15 times. Moreover, this ratio is consistent from 1970 to 2017 (almost 50 years)!

[Figure panels: (a) Adjectives used with males; (b) Adjectives used with females; (c) Verbs used with males (e.g. kills, shoots, beats, saves, rescues); (d) Verbs used with females (e.g. marries, loves, agrees, reveals); (e) Occupations used with males; (f) Occupations used with females]

Figure 1: Gender-wise Occupations in Bollywood movies

2) Cast Appearance in Movie Plot - We analyze how the male and female cast are addressed. This essentially involves extracting the verbs and adjectives associated with the male and female cast. To extract the verbs and adjectives linked to a particular cast member, we use the Stanford Dependency Parser (De Marneffe et al., 2006). In Fig. ?? and ?? we present the adjectives and verbs associated with males and females. We observe that verbs like kills and shoots occur with males while verbs like marries and loves are associated with females. Also, looking at adjectives, males are often represented as rich and wealthy while females are represented as beautiful and attractive in movie plots.
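To illustrate this extraction step, the sketch below aggregates verbs and adjectives by gender from dependency triples. It assumes the parse is already available as (governor, relation, dependent) triples, as a dependency parser such as Stanford's would produce; the cast names and words are hypothetical examples, not the paper's data.

```python
from collections import Counter

def gendered_terms(dependencies, cast_gender):
    """Aggregate verbs and adjectives attached to cast members by gender.

    `dependencies` is a list of (governor, relation, dependent) triples;
    `cast_gender` maps a cast name to 'M' or 'F'.  Returns per-gender
    Counters of associated verbs and adjectives.
    """
    verbs = {"M": Counter(), "F": Counter()}
    adjs = {"M": Counter(), "F": Counter()}
    for gov, rel, dep in dependencies:
        # "Rohit shoots" -> (shoots, nsubj, Rohit): the verb governs the name
        if rel == "nsubj" and dep in cast_gender:
            verbs[cast_gender[dep]][gov] += 1
        # "beautiful Sonia" -> (Sonia, amod, beautiful): adjective modifier
        elif rel == "amod" and gov in cast_gender:
            adjs[cast_gender[gov]][dep] += 1
    return verbs, adjs

deps = [("shoots", "nsubj", "Rohit"), ("Sonia", "amod", "beautiful")]
verbs, adjs = gendered_terms(deps, {"Rohit": "M", "Sonia": "F"})
# verbs["M"]["shoots"] == 1 and adjs["F"]["beautiful"] == 1
```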

3) Cast Introductions in Movie Plot - We analyze how the male and female cast are introduced in the plot. We use OpenIE (Fader et al., 2011) to capture such introductions by extracting the relations corresponding to each cast member. On aggregating the relations by gender, we find that males are generally introduced with a profession, such as a famous singer, an honest police officer or a successful scientist, while females are either introduced using physical appearance (beautiful, simple looking) or in relation to another (male) character (daughter of, sister of). The results show that females are always associated with a successful male and are not portrayed as independent, while males are portrayed as successful.

Figure 2: Total Cast Mentions showing mentions of male and female cast. Female mentions are presented in pink and male mentions in blue.

4) Occupation as a stereotype - We perform a study on how the occupations of males and females are represented. To perform this analysis, we collated an occupation list from multiple sources on the web comprising 350 occupations. We then extracted the associated "noun" tag attached to each cast member using the Stanford Dependency Parser (De Marneffe et al., 2006), which is later matched against the available occupation list. In this way, we extract occupations for each cast member and group them by gender across all the collated movies. Figure ?? shows the occupation distribution of males and females. From the figure it is clearly evident that males are given higher-level occupations than females. Figure 1 presents a combined plot of the percentages of males and females having the same occupation. This plot shows that for occupations like "teacher" or "student", females are high in number, but for "lawyer" and "doctor" the story is totally opposite.
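A minimal sketch of the occupation-matching step, assuming the per-cast "noun" tags have already been extracted by the dependency parser; the occupation list and cast names here are illustrative, not the paper's actual 350-entry list.

```python
def occupations_by_gender(cast_nouns, cast_gender, occupation_list):
    """Match 'noun' tags attached to cast members against a curated
    occupation list, grouping counts by gender."""
    occupation_set = {o.lower() for o in occupation_list}
    counts = {"M": {}, "F": {}}
    for cast, nouns in cast_nouns.items():
        g = cast_gender[cast]
        for noun in nouns:
            key = noun.lower()
            if key in occupation_set:
                counts[g][key] = counts[g].get(key, 0) + 1
    return counts

counts = occupations_by_gender(
    {"Rohit": ["salesman", "singer"], "Sonia": ["daughter"]},
    {"Rohit": "M", "Sonia": "F"},
    ["singer", "salesman", "doctor", "teacher"],
)
# "daughter" is not an occupation, so it is dropped for Sonia.
```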

5) Singers and Gender Distribution in Soundtracks - We analyze how the gender-wise distribution of singers has varied over the years. To accomplish this, we use the soundtrack data present for each movie, which contains information about songs and their corresponding singers. We extracted the gender of each listed singer using their Wikipedia page and then aggregated the numbers of songs sung by males and females over the years. In Figure 4, we report this distribution for the recent years 2010-2017. We observe that the gender gap is almost consistent over all these years.

Please note that currently this analysis only takes into account the presence or absence of a female singer in a song. If one takes into account the actual part of the song sung, this trend will be even more dismal. In fact, in a recent interview 1 this particular hypothesis is even supported by some of the top female singers in Bollywood. In future we plan to use audio-based gender detection to further quantify this.

Figure 3: Total Cast dialogues showing the ratio of male and female dialogues. Female dialogues are presented on the X-axis and male dialogues on the Y-axis. [Movies plotted: Aligarh (2016), Haider (2014), Kaminey (2009), Kapoor and Sons (2016), Maqbool (2003), Masaan (2015), Pink (2016), Raman Raghav (2016), Udta Punjab (2016)]

Figure 4: Gender-wise Distribution of singers in Soundtracks [male singers per year 2010-2017: 387, 304, 360, 392, 404, 321, 359, 169; female singers: 226, 158, 196, 202, 216, 134, 158, 93]

6) Cast Dialogues and Gender Gap in Movie Scripts - We perform a sentence-level analysis on the 13 movie scripts available online. We have worked with PDF scripts and extracted structured pieces of information using the (Machines, 2017) pipeline in the form of structured HTML. We further extract the dialogues of each corresponding cast member and later group this information to derive our analysis.

1. goo.gl/BZWjWG

Figure 5: Representing variation of Accuracy with training data [Bias - Accuracy with K-Nearest Neighbors for K = 1, 5, 10, 50; training data 10%, 20%, 25%, 30%, 50%]

We first study the ratio of male to female dialogues. In Figure 3, we present the distribution of dialogues between males and females across different movies. The X-axis represents the number of female dialogues and the Y-axis the number of male dialogues; the dotted straight line shows y = x. The farther a movie is from this line, the more biased the movie is. In Figure 3, Raman Raghav exhibits the least bias, as the distribution of male and female dialogues is not skewed. As opposed to this, Kaminey shows a lot of bias, with minimal or no female dialogues.
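The distance-from-y = x notion above can be made concrete as a simple bias score; this is an illustrative sketch, and the counts used are hypothetical, not taken from the actual scripts.

```python
import math

def dialogue_bias(male_lines, female_lines):
    """Perpendicular distance of the point (female, male) from the line
    y = x; a larger distance means a more skewed dialogue split."""
    return abs(male_lines - female_lines) / math.sqrt(2)

# Illustrative counts: a near-balanced script scores lower than a skewed one.
assert dialogue_bias(100, 95) < dialogue_bias(100, 10)
```

A normalized variant (dividing by the total number of dialogues) would make the score comparable across scripts of different lengths.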

3.2.2. Tasks at Inter-Sentence level

Figure 6: Lead Cast dialogues of males and females from different movie scripts

We analyze the Wikipedia movie data by exploiting the plot information. This information is collated at the inter-sentence level to generate a context flow using a word-graph technique. We construct a word graph for each sentence by treating each word in the sentence as a node, then drawing the grammatical dependencies extracted using the Stanford Dependency Parser (De Marneffe et al., 2006) and connecting the nodes in the word graph. Using the word graphs of the sentences, we derive a knowledge graph for each cast member. The root node of the knowledge graph is [CastGender, CastName] and the relations represent the dependencies extracted using the dependency parser across all sentences in the movie plot. This derivation is done by performing a merging step in which we merge all the existing dependencies of the cast node across the word graphs of the individual sentences. Figure 7 represents a sample knowledge graph constructed using individual dependencies.
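The merging step above can be sketched as follows; this is a simplified illustration that represents per-sentence word graphs as edge lists, with hypothetical cast names and dependencies.

```python
def build_knowledge_graph(word_graphs, cast_name, cast_gender):
    """Merge a cast node's dependency edges from per-sentence word graphs
    into one knowledge graph rooted at [CastGender, CastName].

    Each word graph is a list of (governor, relation, dependent) edges.
    Returns the merged edge set, with the cast name replaced by the root."""
    root = (cast_gender, cast_name)
    edges = set()
    for graph in word_graphs:
        for gov, rel, dep in graph:
            if gov == cast_name:
                edges.add((root, rel, dep))
            elif dep == cast_name:
                edges.add((gov, rel, root))
    return edges

# Two toy sentences mentioning the same cast member:
sent1 = [("shoots", "nsubj", "Rohit")]
sent2 = [("Rohit", "amod", "violent")]
kg = build_knowledge_graph([sent1, sent2], "Rohit", "M")
# Both edges now hang off the single root node ("M", "Rohit").
```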

After obtaining the knowledge graph, we perform the following analysis tasks on the data:

1. Centrality of each cast node - Centrality for a cast member is a measure of how much the cast member is focused on in the plot. For this task, we calculate the betweenness centrality of each cast node: the number of shortest paths that pass through the node. We find the betweenness centrality for male and female cast nodes and analyze the results. In Figure 8, we show the male and female centrality trend across movies over the years. We observe a huge gap between the centrality of the male and female cast.
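Betweenness centrality can be computed with Brandes' algorithm; the self-contained sketch below works on a plain adjacency-list graph (a library routine such as networkx's `betweenness_centrality` would serve equally well). The toy graph is illustrative.

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for an undirected, unweighted graph
    given as {node: [neighbors]}. Returns {node: centrality}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        pred = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        q = deque([s])
        while q:                      # BFS from s
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # back-propagate dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # each undirected pair is counted from both endpoints
    return {v: c / 2 for v, c in bc.items()}

# In a path graph a-b-c, only b lies on a shortest path between the others:
# betweenness({"a": ["b"], "b": ["a", "c"], "c": ["b"]})["b"] == 1.0
```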

Figure 7: Knowledge graph for Male and Female Cast

2. Study of bias using word embeddings - So far, we have looked at verbs, adjectives and relations separately. In this analysis, we perform joint modeling of the aforementioned features. We generated word vectors of length 200 using Google word2vec (Mikolov et al., 2013) trained on Bollywood movie data scraped from Wikipedia; the CBOW model is used for training word2vec. The knowledge graph constructed for the male and female cast of each movie contains a set of nodes connected to them, extracted using the dependency parser. We assign a context vector to each cast-member node, consisting of the average of the word vectors of its connected nodes. As an instance, if we consider Figure 7, the context vector for [M, CastName] would be the average of the word vectors of (shoots, violent, scientist, beats). In this fashion we assign a context vector to each cast node.
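The context-vector construction is a plain average of neighbor embeddings; a minimal sketch, with toy 2-dimensional vectors standing in for the 200-dimensional word2vec vectors.

```python
def context_vector(word_vectors, connected_words):
    """Average the word vectors of the nodes connected to a cast node,
    yielding that cast member's context vector (vectors as plain lists)."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    n = 0
    for w in connected_words:
        if w in word_vectors:        # skip out-of-vocabulary neighbors
            for i, x in enumerate(word_vectors[w]):
                total[i] += x
            n += 1
    return [x / n for x in total] if n else total

# Toy 2-d vectors (the paper's are 200-d word2vec embeddings):
vecs = {"shoots": [1.0, 0.0], "violent": [0.0, 1.0]}
# context_vector(vecs, ["shoots", "violent"]) == [0.5, 0.5]
```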

The main idea behind assigning a context vector is to analyze the differences between the contexts of males and females.

Figure 8: Centrality for Male and Female Cast

We randomly divide our data into training and testing data. We fit the training data using K-Nearest Neighbors with varying K and study the accuracy results for varying samples of train and test data. In Figure 5, we show the accuracy values for varying values of K. While studying bias using the context vectors, the key point is that with only 10% training data we already get 65%-70% accuracy (Figure 5). This pattern shows very high bias in our data: there is a distinct demarcation in the verbs, adjectives and relations associated with males and females. As we increase the training data, the accuracy shoots up further. Although we did an individual analysis for each of the aforementioned intra-sentence level tasks, the combined inter-sentence level analysis makes the argument for the existence of bias stronger. Note that the key point is not that the accuracy goes up as the training data is increased; the key point is that, since the gender bias is high, even a small amount of training data carries enough information to classify 60-70% of the cases correctly.
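A small K-Nearest Neighbors sketch of the classification step above; the paper presumably uses a standard library implementation, and the toy context vectors below are hypothetical stand-ins for the real ones.

```python
from collections import Counter

def knn_predict(train, query, k=1):
    """Classify `query` by majority label among its k nearest training
    points (squared Euclidean distance). `train` is a list of
    (vector, label) pairs, e.g. (context vector, cast gender)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(v, query)), label)
        for v, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy context vectors: male contexts cluster near (1, 0), female near (0, 1).
train = [([1.0, 0.1], "M"), ([0.9, 0.0], "M"),
         ([0.1, 1.0], "F"), ([0.0, 0.9], "F")]
# knn_predict(train, [0.95, 0.05], k=3) == "M"
```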

3.2.3. Movie Poster and Plot Mentions

We analyze the images on Wikipedia movie pages for the presence of males and females on the publicity posters of each movie. We use DenseCap (Johnson et al., 2016) to extract male and female occurrences by checking the top 5 responses having a positive confidence score.

After extracting males and females from the posters, we analyze the male and female mentions in the movie plot and correlate the two. The intent of this analysis is to learn whether movie publicity is biased toward featuring females on advertising material like posters even when they have a small or inconsequential role in the movie.

[Chart data: percentage of female-centric movies by period from 1970-75 to 2015-17: 7.1, 7.2, 8.4, 7.7, 7.0, 6.9, 10.6, 10.2, 11.7, 11.9]

Figure 9: Percentage of female-centric movies over the years

Figure 10: Percentage of female-centric movies over the years

While 80% of the movie plots have more male mentions than female mentions, surprisingly more than 50% of movie posters feature actresses. Movies like GangaaJal 2, Platform 3 and Raees 4 have 100+ male mentions in the plot but 0 female mentions, whereas in all 3 cases females are shown very prominently on the posters. Also, when we look at image and plot mentions together, we observe that in 56% of the movies female plot mentions are less than half the male plot mentions, while for posters this number is around 30%. Our system detected 289 female-centric movies where this stereotype is being broken. To study this further, we plotted the centrality of females and their mentions in plots over the years for these 289 movies. Figure 10 shows that both plot mentions and female centrality in the plot exhibit an increasing trend, which essentially means that there has been a considerable increase in female roles over the years. We also study the ratio of female-centric movies to the total movies over the years. Figure 9 shows the percentage chart and the trend for the percentage of female-centric movies; it is enlightening to see that the percentage shows a rising trend. Our system discovered at least 30 movies in the last three years where females play a central role in the plot as well as on the posters. We also note that over time such biases are decreasing - still far from neutral, but the trend is encouraging. Figure 9 shows the percentage of movies in each decade where women play a more central role than men.

2. https://en.wikipedia.org/wiki/Gangaajal
3. https://en.wikipedia.org/wiki/Platform (1993 film)
4. https://en.wikipedia.org/wiki/Raees (film)

3.3. Movie Preview Analysis

We analyze all the frames extracted from the movie preview dataset and obtain information regarding the presence or absence of a male or female in each frame. If a person is present in the frame, we then find the emotion displayed by the person, one of: angry, disgust, fear, happy, neutral, sad, surprise. Note that more than one person can be detected in a single frame, in which case the emotion of each person is detected. We then aggregate the results to analyze the following tasks on the data:
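The aggregation over per-frame detections can be sketched as below; this is an illustrative reduction assuming each sampled frame yields a list of (gender, emotion) detections from the face classifier, and the frames shown are made up.

```python
def screen_time_share(frame_detections):
    """Given per-frame lists of detected (gender, emotion) pairs, compute
    the percentage of person-appearances that are male vs. female."""
    counts = {"M": 0, "F": 0}
    for faces in frame_detections:
        for gender, _emotion in faces:
            counts[gender] += 1
    total = counts["M"] + counts["F"]
    return {g: 100.0 * c / total for g, c in counts.items()} if total else counts

# Hypothetical detections across three sampled frames:
frames = [[("M", "angry")],
          [("M", "neutral"), ("F", "happy")],
          [("F", "happy")]]
# screen_time_share(frames) == {"M": 50.0, "F": 50.0}
```

The same per-frame lists can be reduced over the emotion field instead to obtain the gender-wise emotion distributions discussed next.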

1. Screen-On Time - Figure 11 shows the percentage distribution of screen-on time for male and female characters in movie trailers. We see a consistent trend across the 10 years: the mean screen-on time for females is only a meagre 31.5%, compared to 68.5% for male characters.

2. Portrayal through Emotions - In this task we analyze the emotions most commonly exhibited by male and female characters in movie trailers. The most substantial difference is seen with respect to the "Anger" emotion: over the 10 years, anger constitutes 26.3% of the emotions displayed by male characters, compared to 14.5% for female characters. Another observed trend is that female characters have always been shown as happier than male characters every year. These results correspond to the gender stereotypes which exist in our society. We have not shown plots for the other emotions because we could not see any clear trend exhibited by them.

Figure 11: Percentage of screen-on time for males and females over the years

Figure 12: Year-wise distribution of emotions displayed by males and females.

4. Algorithm for Bias Removal System - DeCogTeller

For this task, we take a news-article data set and train word embeddings using Google word2vec (Mikolov et al., 2013). This data acts as fact data, which is later used to check the gender specificity of a particular action as per the facts. Apart from interchanging actions, we have developed a specialized module to handle occupations, since gender bias very often shows up in the assigned occupation: {(Male, Doctor), (Female, Nurse)} or {(Male, Boss), (Female, Assistant)}.

In Figure 13 we give a holistic view of our system DeCogTeller, which is described in detail as follows.


Figure 13: DeCogTeller - Bias Removal System

I) Data Pre-processing - We first perform data pre-processing of the words in the fact data with the following operations -

(a) We used WordNet (Miller, 1995) to look up whether each word in the fact data is present in WordNet or not. If a word was not present in WordNet, it was simply removed.

(b) We used the Stanford stemmer to stem the words so that words like modern, modernized etc. don't form different vectors.

II) Generating word vectors - Once we have the pre-processed list of words from the fact data, we train Google word2vec and generate word embeddings from this data. We do a similar operation on the biased data, which in our case is the Bollywood movies data.

III) Extraction of analogical pairs - The next task is to find analogical pairs in the fact data which are analogous to the (man, woman) pair. For an analogical word pair (x, y), we associate a vector P(x, y) with the pair:

P(x, y) = (vec[man] − vec[woman]) − (vec[x] − vec[y])

In this equation we replace the man and woman vectors by he and she, respectively, giving:

P(x, y) = (vec[he] − vec[she]) − (vec[x] − vec[y])

The main intent of this operation is to capture word pairs such as (doctor, nurse), where in most of the data doctor is close to he and nurse is closer to she. Therefore, for (x, y) = (doctor, nurse), P(doctor, nurse) is given by (vec[he] − vec[she]) − (vec[doctor] − vec[nurse]). Another example of (x, y) found in our data is (king, queen). We generate all such (x, y) pairs and store them in our knowledge base. To obtain refined pairs, we use a scoring mechanism to filter important ones: if

‖P(x, y)‖ ≤ τ

where τ is a threshold parameter, we add the word pair to the knowledge base; otherwise we ignore it. Equivalently, after normalizing (vec[he] − vec[she]) and (vec[x] − vec[y]), we calculated the cosine distance cosine(vec[he] − vec[she], vec[x] − vec[y]), which is algebraically equivalent to the above inequality.
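The scoring and thresholding step can be sketched as follows. The toy embedding vectors and the value of τ in the usage note are illustrative assumptions; a real run would use the trained word2vec vectors.

```python
import numpy as np

def pair_score(emb, x, y):
    """||(vec[he] - vec[she]) - (vec[x] - vec[y])|| for a candidate pair."""
    return float(np.linalg.norm((emb["he"] - emb["she"]) - (emb[x] - emb[y])))

def analogical_pairs(emb, candidates, tau):
    """Keep candidate pairs whose score is within the threshold tau."""
    return [(x, y) for x, y in candidates if pair_score(emb, x, y) <= tau]
```

For example, with toy 2-d vectors where `king` lies along `he` and `queen` along `she`, `(king, queen)` scores 0 and survives a threshold of τ = 1.0, while an unrelated pair does not.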

IV) Classifying word pairs - After we identify analogical pairs, the degree of bias in each pair is still not known. So we need to classify word pairs as specific to a gender or neutral to gender. For example, consider the word pair (doctor, nurse): anyone, male or female, can be a doctor or a nurse, hence we call such a pair gender neutral. On the contrary, if we consider the word pair (king, queen), king is associated with a male while queen is associated with a female; we call such word pairs gender specific. Now, the task is to find out which pairs extracted in the above step are gender neutral and which are gender specific. To do this, we first take the words from the knowledge base extracted from the biased data and find how close they are to the different genders. For a word w, we calculate the cosine score of w with he as cos(w, he). If w is very close to he, then it is specific to a man. Similarly, for a word w′ we do the same operation with she, and if w′ is very close to she, then it is specific to a woman. If a word w′′ is almost equidistant from he and she, then it is labelled as gender neutral.
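This classification can be sketched as follows. The `margin` parameter quantifying "almost equidistant" is a hypothetical choice, as are the toy vectors in the test; the source does not specify either.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify(emb, w, margin=0.05):
    """Label a word as gender specific or gender neutral by comparing its
    cosine similarity to `he` and `she`.  `margin` is an assumed tolerance
    for "almost equidistant"."""
    s_he, s_she = cosine(emb[w], emb["he"]), cosine(emb[w], emb["she"])
    if abs(s_he - s_she) <= margin:
        return "neutral"
    return "male-specific" if s_he > s_she else "female-specific"
```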

V) Action Extraction from Biased Movie Data - After we have gender specific and gender neutral words from the fact data, we work on the biased data to extract actions associated with the movie cast. We extract the gender of each cast member by crawling the corresponding Wikipedia pages of the actors and actresses. Once we have the gender for each cast member, we perform co-referencing on the movie plot using Stanford OpenIE (Fader et al., 2011). Next, we collate actions corresponding to each cast member using the IBM NLU API (Machines, 2017) and the Semantic Role Labeler by UIUC (Punyakanok et al., 2008).


VI) Bias detection using Actions - At this point we have the actions extracted from the biased data corresponding to each gender. We can now use this data against the fact data to check for bias. In the following system walk-through section we describe how we use it on-the-fly to check for bias.

VII) Bias Removal - We construct a knowledge graph for each cast member using relations from the Stanford dependency parser. We use this graph to calculate the betweenness centrality of each cast member and store these centrality scores in a knowledge base. We use the betweenness centrality scores to interchange genders after we detect bias.
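The centrality computation can be sketched with a self-contained implementation of Brandes' algorithm over an unweighted character graph; in practice a graph library would provide this directly, and building the graph from dependency-parser relations is not shown here.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality for an unweighted graph given as an adjacency
    dict {node: [neighbours]} (Brandes' algorithm).  For an undirected
    graph each unordered pair is counted twice; halve the scores if the
    standard definition is needed."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # single-source BFS with shortest-path counting
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # back-propagate dependencies in order of non-increasing distance
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On a three-character chain a-b-c, only the middle character b lies on shortest paths between others, so it alone gets a non-zero score.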

5. Walk-through using an example

The system DeCogTeller takes text input from the user. The user starts entering a biased movie plot text for a movie, say, "Kaho na Pyar Hai" in Figure 14. This natural language text is submitted to the system, where the text is first co-referenced using OpenIE. Then, using the IBM NLU API and the UIUC Semantic Role Labeller, actions pertaining to each cast member are extracted, and these are checked against the gender specific and gender neutral lists. If, for a (cast gender, action) pair, the corresponding vector is located in the gender specific list, then it cannot be termed a biased action. But if a (cast gender, action) pair occurring in the plot is not found in the gender specific list and the action is found in the gender neutral list, then we tag the statement as biased.
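A simplified, set-based reading of this check can be sketched as follows; the real system consults embedding-derived lists rather than literal sets, so the lookups here are illustrative assumptions.

```python
def is_biased(gender, action, gender_specific, gender_neutral):
    """Flag a (gender, action) pair as biased: an action that is specific
    to the character's gender is not bias, while a gender-neutral action
    attributed to one gender in the plot is tagged as bias.  (Simplified
    reading of the rule; the lists here are plain sets.)"""
    if (gender, action) in gender_specific:
        return False
    return action in gender_neutral
```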

As an example, suppose the user enters - "Rohit is an aspiring singer who works as a salesman in a car showroom, run by Malik (Dalip Tahil). One day he meets Sonia Saxena (Ameesha Patel), daughter of Mr. Saxena (Anupam Kher), when he goes to deliver a car to her home as her birthday present." At the very first step, co-referencing is done, which converts the above text to - "Rohit is an aspiring singer who works as a salesman in a car showroom, run by Malik (Dalip Tahil). One day Rohit meets Sonia Saxena (Ameesha Patel), daughter of Mr. Saxena (Anupam Kher), when Rohit goes to deliver a car to her home as her birthday present." After this step, we extract actions corresponding to each cast member and then check for bias. Here, corresponding to the cast member Rohit, we

have the following actions - {singer, salesman, meets, deliver}. The gender for Rohit is detected using the wiki page of Hritik Roshan and is labelled as "male". We find actions corresponding to the cast member Sonia and obtain the following actions - {daughter-of}. Then we run our gender specific and gender neutral checks and find that the actions are gender neutral; hence a bias exists. We do the same for the other cast members. Then, in the background, we extract the highest-centrality male and the highest-centrality female and switch their genders to generate the de-biased plot. Figure 15 shows the de-biased plot. The user is also given an option to view the knowledge graphs for the biased and unbiased text, to see how the nodes in the knowledge graph change.
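The final swap of the two highest-centrality characters can be sketched as a simultaneous name replacement; a complete system would also need to fix pronouns and other gendered references, which this toy sketch leaves untouched.

```python
def swap_characters(plot, male_name, female_name):
    """Simultaneously exchange two character names in the plot text."""
    sentinel = "\x00"  # temporary marker so the two replacements don't collide
    return (plot.replace(male_name, sentinel)
                .replace(female_name, male_name)
                .replace(sentinel, female_name))
```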

6. Discussion and Ongoing Work

While our analysis points towards the presence of gender bias in Hindi movies, it is gratifying to see that the same analysis was able to discover the slow but steady change in gender stereotypes.

We would also like to point out that the goal of this study is not to criticize one particular domain. Gender bias is pervasive in all walks of life, including but not limited to the Entertainment Industry, Technology Companies, Manufacturing Factories & Academia. In many cases, the bias is so deep-rooted that it has become the norm. We truly believe that the majority of people displaying gender bias do it unconsciously. We hope that ours and more such studies will help people realize when such biases start to influence everyday activities, communications & writings in an unconscious manner, and take corrective actions to rectify the same. Towards that goal, we are building a system which can re-write stories in a gender neutral fashion. To start with, we are focusing on two tasks:

a) Removing Occupation Hierarchy: It is common in movies, novels & pictorial depictions to show men as boss, doctor, pilot and women as secretary, nurse and stewardess. In this work, we presented occupation detection. We are extending this to understand hierarchy and then evaluate whether changing genders makes sense or not. For example, while interchanging ({male, doctor}, {female, nurse}) to ({male, nurse}, {female, doctor}) makes sense, interchanging {male,


Figure 14: The screen where a user can enter the text


Figure 15: The screen where text is debiased and the knowledge graph can be visualized

gangster} to {female, gangster} may be a bit unrealistic.

b) Removing Gender Bias from plots: The question we are trying to answer is "If one interchanges all males and females, is the plot/story still possible or plausible?". For example, consider a line in a plot, "She gave birth to twins"; of course, changing this from she to he

leads to impossibility. Similarly, there could be scenarios that are possible but implausible, like the gangster example in the previous paragraph.

Solving these problems would require the development of novel text algorithms, ontology construction, fact (possibility) checkers and implausibility checkers. We believe this presents a challenging


research agenda while drawing attention to an important societal problem.

7. Conclusion

This paper presents an analysis study which aims to extract existing gender stereotypes and biases from Wikipedia Bollywood movie data containing 4000 movies. The analysis is performed at sentence and multi-sentence level and uses word embeddings, adding a context vector and studying the bias in the data. We observed that, when analyzing occupations for males and females, higher-level roles are assigned to males while lower-level roles are assigned to females. A similar trend is exhibited for centrality, where females are less central in the plot than their male counterparts. Also, when predicting gender using context word vectors, very high accuracy is observed on test data even with very small training data, reflecting the substantial amount of bias present in the data. We use this rich information extracted from Wikipedia movies to study the dynamics of the data and to further define new ways of removing such biases present in the data.

Furthermore, we present an algorithm to remove such bias present in text. We show that interchanging the gender of a high-centrality male character with that of a high-centrality female character in the plot text leaves the story unchanged but de-biases it completely.

As a part of future work, we aim to extract summaries from this data which are bias-free. In this way, the next generations would stop inheriting bias from previous generations. While the existence of gender bias and stereotypes is experienced by viewers of Hindi movies, to the best of our knowledge this is the first study to use computational tools to quantify and trend such biases.

References

Hanah Anderson and Matt Daniels. https://pudding.cool/2017/03/film-dialogue/. 2017.

Molly Carnes, Patricia G Devine, Linda Baier Manwell, Angela Byars-Winston, Eve Fine, Cecilia E Ford, Patrick Forscher, Carol Isaac, Anna Kaatz, Wairimu Magua, et al. Effect of an intervention to break the gender bias habit for faculty at one institution: a cluster randomized, controlled trial. Academic Medicine: Journal of the Association of American Medical Colleges, 90(2):221, 2015.

Marie-Catherine De Marneffe, Bill MacCartney, Christopher D Manning, et al. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC, volume 6, pages 449–454. Genoa, Italy, 2006.

Frank Dobbin and Jiwook Jung. Corporate board gender diversity and stock performance: The competence gap or institutional investor bias? 2012.

Anthony Fader, Stephen Soderland, and Oren Etzioni. Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1535–1545. Association for Computational Linguistics, 2011.

Ethan Fast, Tina Vachovsky, and Michael S Bernstein. Shirtless and dangerous: Quantifying linguistic signals of gender bias in an online fiction writing community. In ICWSM, pages 112–120, 2016.

Andrew T Fiore, Lindsay Shaw Taylor, Gerald A Mendelsohn, and Marti Hearst. Assessing attractiveness in online dating profiles. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 797–806. ACM, 2008.

Angela M Gooden and Mark A Gooden. Gender representation in notable children's picture books: 1995–1999. Sex Roles, 45(1-2):89–101, 2001.

Justin Johnson, Andrej Karpathy, and Li Fei-Fei. Densecap: Fully convolutional localization networks for dense captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4565–4574, 2016.

Matthew Kay, Cynthia Matuszek, and Sean A Munson. Unequal representation and gender stereotypes in image search results for occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 3819–3828. ACM, 2015.


International Business Machines. https://www.ibm.com/watson/developercloud/developer-tools.html. 2017.

Lillian MacNell, Adam Driscoll, and Andrea N Hunt. What's in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40(4):291, 2015.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

Brett Millar. Selective hearing: gender bias in the music preferences of young adults. Psychology of Music, 36(4):429–445, 2008.

George A Miller. Wordnet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.

Octavio Arriaga and Paul G. Ploger. Real-time convolutional neural networks for emotion and gender classification. 2017.

Jahna Otterbacher. Linguistic bias in collaboratively produced biographies: crowdsourcing social stereotypes? In ICWSM, pages 298–307, 2015.

Vasin Punyakanok, Dan Roth, and Wen-tau Yih. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2):257–287, 2008.

Jessica Rose, Susan Mackey-Kallis, Len Shyles, Kelly Barry, Danielle Biagini, Colleen Hart, and Lauren Jack. Face it: The impact of gender on social media images. Communication Quarterly, 60(5):588–607, 2012.

TG Saji. Gender bias in corporate leadership: A comparison between Indian and global firms. Effective Executive, 19(4):27, 2016.

Sophie Soklaridis, Ayelet Kuper, Cynthia Whitehead, Genevieve Ferguson, Valerie Taylor, and Catherine Zahn. Gender bias in hospital leadership: a qualitative study on the experiences of women CEOs. Journal of Health Organization and Management, 31(2), 2017.

Josh Terrell, Andrew Kofink, Justin Middleton, Clarissa Rainear, Emerson Murphy-Hill, Chris Parnin, and Jon Stallings. Gender differences and bias in open source: Pull request acceptance of women versus men. PeerJ Computer Science, 3:e111, 2017.
