


Masking and BERT-based Models for Stereotype Identification

Modelos Basados en Enmascaramiento y en BERT para la Identificación de Estereotipos

Javier Sánchez-Junquera1, Paolo Rosso1, Manuel Montes-y-Gómez2, Berta Chulvi1

1PRHLT Research Center, Universitat Politècnica de València, València, Spain; 2Laboratorio de Tecnologías del Lenguaje, Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico

{juasanj3@doctor.; prosso@dsic.; berta.chulvi@}upv.es; [email protected]

Abstract: Stereotypes about immigrants are a type of social bias increasingly present in human interaction in social networks and political speeches. This challenging task is being studied by computational linguistics because of the rise of hate messages, offensive language, and discrimination that many people receive. In this work, we propose to identify stereotypes about immigrants using two different explainable approaches: a deep learning model based on Transformers, and a text masking technique that has been recognized for its capability to deliver good and human-understandable results. Finally, we show the suitability of the two models for the task and offer some examples of their advantages in terms of explainability.
Keywords: social bias, immigrant stereotypes, BETO, masking technique.

Resumen: Los estereotipos sobre inmigrantes son un tipo de sesgo social cada vez más presente en la interacción humana en redes sociales y en los discursos políticos. Esta desafiante tarea está siendo estudiada por la lingüística computacional debido al aumento de los mensajes de odio, el lenguaje ofensivo, y la discriminación que reciben muchas personas. En este trabajo, nos proponemos identificar estereotipos sobre inmigrantes utilizando dos enfoques diametralmente opuestos prestando atención a la explicabilidad de los mismos: un modelo de aprendizaje profundo basado en Transformers; y una técnica de enmascaramiento de texto que ha sido reconocida por su capacidad para ofrecer buenos resultados a la vez que comprensibles para los humanos. Finalmente, mostramos la idoneidad de los dos modelos para la tarea, y ofrecemos algunos ejemplos de sus ventajas en términos de explicabilidad.
Palabras clave: sesgo social, estereotipos hacia inmigrantes, BETO, técnica de enmascaramiento.

1 Introduction

Nowadays, social media, political speeches, and newspapers, among others, have a strong impact on how people perceive reality. Very often, information consumers are not aware of how biased the content they are exposed to is. To mitigate this situation, many efforts in computational linguistics have been made to detect social biases such as gender and racial bias (Bolukbasi et al., 2016; Garg et al., 2018; Liang et al., 2020; Dev et al., 2020). The immigrant stereotype is another type of social bias, present when a message about immigrants disregards the great diversity of this group of people and highlights a small set of their characteristics. This process of homogenization of a whole group of people is at the very heart of the stereotype concept (Tajfel, Sheikh, and Gardner, 1964). As (Lipmann, 1922) said in his seminal work

Procesamiento del Lenguaje Natural, Revista nº 67, septiembre de 2021, pp. 83-94 recibido 06-05-2021 revisado 08-06-2021 aceptado 10-06-2021

ISSN 1135-5948. DOI 10.26342/2021-67-7 © 2021 Sociedad Española para el Procesamiento del Lenguaje Natural

about stereotypes, stereotyping, as a cognitive process, occurs because “we do not first see and then define, we define first and then see”. In short, we can say that a stereotype is being used in language when a whole group of people, itself very diverse, is represented by appealing to a few characteristics.

Unfortunately, the use of stereotypes promotes undesirable behaviors among people from different nationalities; an example is the violence against Asian Americans that has taken place recently (Tessler, Choi, and Kao, 2020). Moreover, political analysts have associated the success of anti-immigration parties with even more negative attitudes toward the immigration phenomenon (Dennison and Geddes, 2019). The automatic identification of these stereotypes has received little attention, despite the harmful consequences that prejudices and attitudes, in many cases negative, may have.

There have been some works related to the problem of immigrant stereotype identification (Sanguinetti et al., 2018), but they are mainly focused on expressions of hate speech, or on social bias in general, which involves racism (Nadeem, Bethke, and Reddy, 2020; Fokkens et al., 2018). However, it is necessary to have a whole view of immigrant stereotypes, taking into account both positive and negative beliefs, and also the variability with which stereotypes are reflected in texts. This more refined analysis of stereotypes would make it possible to detect them not only in clearly dogmatic or violent messages, but also in other more formal and subtle texts such as news, institutional statements, or political representatives' speeches in parliamentary debates.

Similar to applications in healthcare, security, and social analysis, in this task it is not enough to achieve high results; it is also mandatory that the results can be understood or interpreted by human experts in the domain of study (e.g., social psychologists). Taking into account these two aspects, performance and explainability, the objective of this work is to compare two approaches diametrically opposite to each other in the text classification state of the art. On the one hand, a transformer-based model, which has shown outstanding performance, but high complexity and poor explainability; and, on the other hand, a masking-based model, which requires fewer computational resources and has shown good performance in related tasks like author profiling.

We aim to find explainable predictions of immigrant stereotypes with BETO by using its attention mechanism. In this way, we derive the explanation by investigating the importance scores of the different features used to output the final prediction. With the other approach that we use, the masking technique (Stamatatos, 2017; Sánchez-Junquera et al., 2020), it is possible to know which are the most important words that the model preferred to highlight. We compare these approaches using a dataset of texts in Spanish, which contains annotated fragments of political speeches from the Spanish Congress of Deputies.

The research questions we aim to answer in this work are:

RQ1: Is the transformer more effective than the masking technique at identifying stereotypes about immigrants?

RQ2: Is it possible to obtain local explanations of the models' predictions, allowing human interpretability of the immigrant stereotypes?

The rest of the paper is organized as follows: Section 2 presents related work concerning immigration stereotype detection and the models that we propose. Section 3 describes the two models, and Section 4 the dataset used in the experiments. Sections 5 and 6 contain the experimental settings and the discussion of the results. Finally, we conclude the work in Section 7, where we also mention future directions.

2 Related Work

2.1 Immigrant Stereotype Detection

There have been attempts to study stereotypes from a computational point of view, such as gender, racial, religion, and ethnic bias detection (Garg et al., 2018; Bolukbasi et al., 2016; Liang et al., 2020). Those works predefine two opposite categories (e.g., men vs. women) and use word embeddings to detect the words that tend to be more associated with one of the categories than with the other. In (Nadeem, Bethke, and Reddy, 2020), the authors propose tests at two different levels for measuring bias. First,


the intra-sentence test, with a sentence describing the target group and a set of three attributes that correspond to a stereotype, an anti-stereotype, and a neutral option. Second, the inter-sentence test, with a sentence containing the target group; a sentence containing a stereotypical attribute of the target group; another sentence with an anti-stereotypical attribute; and lastly, a neutral sentence. These tests are similar to the idea of (Dev et al., 2020), which consists in using natural language inference to measure entailment, contradiction, or neutral inferences in order to quantify the bias. To evaluate their proposal, the authors of (Nadeem, Bethke, and Reddy, 2020) collected a dataset (StereoSet) for measuring bias in the gender, profession, race, and religion domains.

On the other hand, stereotypes are not always the (explicit) association of words (seen as attributes or characteristics) with two opposite social groups, like women vs. men in the context of gender bias. Such is the case of immigrant stereotypes, in which sentences like ¿Por qué ha muerto una persona joven? (Why did a young person die?) do not contain an attribute of the immigrant group, although from its context1 it is possible to conclude that immigrants are here placed as victims of suffering. Also, the word representing the social group is not clear, since persona joven (young person) is neutral between immigrants and non-immigrants.

Other works have built annotated data to foster the development of supervised approaches. In (Sanguinetti et al., 2018), an Italian corpus focused on hate speech against immigrants was presented, which includes annotations about whether a tweet contains a stereotype or not. This corpus was used in the HaSpeeDe shared task at EVALITA 2020 (Sanguinetti et al., 2020). Most participant teams only adapted their hate speech models to the stereotype identification task, thus representing (and reducing) stereotypes to characteristics of hate speech. One of the conclusions was that the immigration stereotype appeared as a more subtle phenomenon, which also needs to be approached in non-hurtful text. Additionally, in (Nadeem, Bethke, and Reddy, 2020), a dataset

1Fragment of a political speech by a Popular Parliamentary Group politician in 2006. The speaker is mentioning some of the conditions of immigrants in Spain in that period.

was proposed that includes the domain of racism (in addition to gender, religion, and profession). Although this dataset does not focus on the study of stereotypes about immigrants, its authors reported the word “immigrate” as one of the most relevant keywords characterizing the racism domain.

2.2 On the explainability of AI models

Since eXplainable Artificial Intelligence (XAI) systems have become an integral part of many real-world applications, there is an increasing number of XAI approaches (Islam et al., 2021), including white and black boxes. The first group, which includes decision trees, hidden Markov models, logistic regression, and other machine learning algorithms, is inherently explainable; whereas the second group, which includes deep learning models, is less explainable (Danilevsky et al., 2020). XAI has been characterized according to different aspects, for example: (i) by the level of the explainability, for each single prediction (local explanation) or for the model's prediction process as a whole (global explanation); and (ii) by whether the explanation requires post-processing (post-hoc) or not (self-explaining).

XAI has also been characterized according to the source of the explanations, for example: (i) surrogate models, in which the model predictions are explained by learning a second model as a proxy, as is the case of LIME (Ribeiro, Singh, and Guestrin, 2016); (ii) example-driven approaches, in which the prediction of an input instance is explained by identifying other (labeled) instances that are semantically similar (Croce, Rossini, and Basili, 2019); (iii) attention layers, which appeal to human intuition and help to indicate where the neural network model is “focusing”; and (iv) feature importance, in which the relevance scores of different features are used to output the final prediction (Danilevsky et al., 2020).

Taking into account this characterization, we frame our approach in the self-explaining scope, and consider two different models to obtain local explanations of the predicted texts. In this sense, we use the attention layers, which have been commonly applied by local self-explaining models (Mullenbach et al., 2018; Bodria et al., 2020). For example, in


(Mathew et al., 2020) the attention weights were used to compare the posts' segments on which the labeling decision was based, highlighting the tokens that the models found the most relevant. Similarly, in (Clark et al., 2019) the authors used datasets for tasks like dependency parsing to evaluate the attention heads of BERT, and found relevant linguistic knowledge in the hidden states and attention maps, such as direct objects of verbs, determiners of nouns, and objects of prepositions. Finally, in (Jarquín-Vásquez, Montes-y-Gómez, and Villaseñor-Pineda, 2020) attention was used to prove that some swear words are inherently offensive, whilst others are not, since their interpretation depends on their context.

The other self-explaining model that we use to obtain local explanations is a masking technique, which can be described as a white box. In this case, the explainable strategy is based on the feature importance idea, by measuring and observing the relevant words used in its masking process. The masking technique used in this work incorporates an additional way to explain decisions (Stamatatos, 2017; Granados et al., 2011; Sánchez-Junquera et al., 2020). It allows highlighting content and style information in texts by masking a predefined, task-oriented set of irrelevant words.

3 Models

In this section, we briefly describe the two models that we use in our experiments.

BETO: it is based on BERT, but was pre-trained exclusively on a big Spanish dataset (Cañete et al., 2020). The framework of BETO consists of two steps, pre-training and fine-tuning, similar to BERT (Devlin et al., 2018). For the pre-training, the collected data included Wikipedia and other Spanish sources such as United Nations and Government journals, TED Talks, subtitles, and news stories, among others. The model has 12 self-attention layers with 16 attention heads each, and uses 1024 as hidden size, with a total of 110M parameters. The vocabulary contains 32K tokens.

For fine-tuning, the model is first initialized with the pre-trained parameters, and all of the parameters are fine-tuned using labeled data from the downstream task, which in our case is a stereotype-annotated dataset (see Section 4). The first token of every sequence is always a special classification token ([CLS]), which is used as the aggregate sequence representation for classification tasks. In our work, we add on top of the [CLS] representation two dense layers and a Softmax function to obtain the binary classification.
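The classification head just described can be sketched independently of the transformer itself. The snippet below is a minimal illustration, not the authors' implementation: only the 1024 hidden size, the two dense layers, and the Softmax come from the text; the intermediate size (256), the ReLU activation, and the random weights are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable Softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Two dense layers on top of the [CLS] vector: 1024 -> 256 -> 2.
# The intermediate size (256) and the ReLU are illustrative assumptions.
W1, b1 = rng.normal(scale=0.02, size=(1024, 256)), np.zeros(256)
W2, b2 = rng.normal(scale=0.02, size=(256, 2)), np.zeros(2)

def classify_cls(cls_vector):
    """Map a [CLS] representation to binary class probabilities."""
    h = np.maximum(cls_vector @ W1 + b1, 0.0)  # first dense layer + ReLU
    return softmax(h @ W2 + b2)                # second dense layer + Softmax

# Dummy [CLS] vector standing in for BETO's output.
probs = classify_cls(rng.normal(size=(1, 1024)))
```

In the fine-tuned model, `cls_vector` would be the last hidden state of the [CLS] token, and the weights would be learned jointly with the transformer.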

The masking technique: it consists of transforming the original texts into a distorted form where the textual structure is maintained while irrelevant words are masked, i.e., replaced by a neutral symbol. The irrelevant terms are task-dependent and have to be defined in advance, following some frequency criteria or the expert's intuition.

The masking technique replaces each term t of the original text by a sequence of “*”. The length of the sequence is determined by the number of characters that t contains. An example of this, considering the Spanish stopwords, is shown in Figure 1. In Section 5.1, we explain which words we considered best to mask.
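This replacement step can be sketched in a few lines (the function name and the `\w+` tokenization are our own choices; the paper does not prescribe an implementation):

```python
import re

def mask_text(text, relevant_words):
    """Replace every word not in `relevant_words` by a run of '*' of the
    same length, keeping punctuation and textual structure intact."""
    def mask(match):
        word = match.group(0)
        return word if word.lower() in relevant_words else "*" * len(word)
    return re.sub(r"\w+", mask, text)

masked = mask_text(
    "la inmigracion sigue siendo hoy el principal problema",
    {"inmigracion", "problema"},
)
# -> "** inmigracion ***** ****** *** ** ********* problema"
```

Because each mask preserves the word's length, the distorted text keeps the rhythm and structure of the original, which is what lets a downstream classifier still exploit stylistic cues.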

After all texts are distorted by the masking technique, we use a traditional classifier to be compared with BETO. In our experiments, we use the Logistic Regression (LR) classifier, which has been compared with BERT before (Alaparthi and Mishra, 2021).

4 Dataset

We use the StereoImmigrants dataset2 for identifying stereotypes about immigrants (Sánchez-Junquera et al., 2021). In this previous work we collected texts on immigrant stereotypes from the political speeches of the ParlSpeech V2 dataset (Rauh and Schwalbach, 2020), from which we also extracted the negative examples (labeled as Non-stereotype). These texts are extracted from the speeches of the Spanish Congress of Deputies (Congreso de los Diputados), and are written in Spanish.

In the construction of StereoImmigrants (Sánchez-Junquera et al., 2021), we proposed a new approach to the study of immigrant stereotyping, elaborating a taxonomy for annotating the corpus that covers the whole spectrum of beliefs that make up the immigrant stereotype. The novelty of this taxonomy and this annotation process is that the work has not focused on the characteristics attributed to the group but on the narrative contexts in

2https://github.com/jjsjunquera/StereoImmigrants.


Original text:
la inmigración sigue siendo hoy - lo confirman los últimos sondeos del CIS - el principal problema que preocupa a los ciudadanos del estado
(Immigration is still today - confirmed by the latest CIS polls - the main problem that worries the citizens of the state)

Masking stopwords:
** inmigración sigue siendo hoy ** confirman *** últimos sondeos *** cis ** principal problema *** preocupa * *** ciudadanos ** *** ******

Masking the non-stopwords:
la *********** ***** ****** *** lo ********* los ******* ******* del *** el ********* ******** que ******** los ********** del estado

Figure 1: Example of masking the stopwords, or keeping only the stopwords unmasked.

which the immigrant group is repeatedly situated in the public discourse of politicians. To do this, the authors applied frame theory (a social psychology theory) to the study of stereotypes. Frame theory allows us to show that politicians in their speeches create and recreate different frames (Scheufele, 2006), i.e., different scenarios, where they place the group. The result of this rhetorical framing activity ends with the creation of a stereotype: a diverse group is seen only with the characteristics of the main actor in a particular scenario.

In (Sánchez-Junquera et al., 2021), we identified different frames used to speak about immigrants that could be classified into one of the following categories: (i) presenting the immigrants as equals to the majority but the target of xenophobia (i.e., they must have the same rights and same duties but are discriminated against); (ii) as victims (e.g., they are people suffering from poverty or labor exploitation); (iii) as an economic resource (i.e., they are workers who contribute to economic development); (iv) as a threat to the group (i.e., they are the cause of disorder because they are illegal, too many, and introduce imbalances in societies); or (v) as a threat to the individual (i.e., they are competitors for limited resources or a danger to personal welfare and safety). In the construction of the StereoImmigrants dataset, an expert in prejudice from the social psychology area manually annotated the sentences at the finest granularity of the taxonomy and also selected negative examples where politicians speak about immigration but do not refer, explicitly or implicitly, to the people that make up the group “immigrants”. After this expert annotation, five non-expert annotators read the label assigned by the expert to each sentence and decided whether they agreed with it or considered that another label from the

Label            Length           Texts
Stereotype       45.62 ± 24.69    1673   (3635 in total)
Non-stereotype   36.00 ± 21.17    1962
Victims          48.93 ± 27.5      743   (1479 in total)
Threat           45.84 ± 24.42     736

Table 1: Distribution of texts per label and the average length (with standard deviation) of their instances. The texts labeled as Victims or Threat are a subset of the texts labeled as Stereotype.

taxonomy was better suited for this sentence. The dataset only contains sentences where at least three annotators agreed on the same category.

In (Sánchez-Junquera et al., 2021), attending to a second annotation of the attitudes that each sentence expresses, we proposed two supra-categories of the stereotypes annotated as Victims or Threat, where categories (i) and (ii) belong to the Victims supra-category, and (iv) and (v) belong to the Threat supra-category. Table 1 shows the distribution per label of the dataset.

Table 2 shows examples of the Non-stereotype and Stereotype labels. The Stereotype examples specify whether they were labeled as Victims or Threat. From these examples, it is possible to see that the dataset contains stereotypes that are not merely the association of attributes or characteristics to the group, but texts that reflect biased representations of the group (i.e., how the immigrants are indirectly perceived or associated with specific situations and social issues).

5 Experimental Settings

We applied a 10-fold cross-validation procedure and report our results in terms of F-measure.
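This evaluation protocol can be reproduced with scikit-learn's `cross_val_score` (the features below are synthetic stand-ins, not the StereoImmigrants data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the vectorized texts and their binary labels.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# 10-fold cross-validation, scored with the F-measure.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=10, scoring="f1")
mean_f1 = scores.mean()
```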

For BETO, we searched the following


Non-stereotype:
No nos vale que se contabilice todo lo que se dedica a inmigración porque no estamos hablando de lo mismo.
(It is no use to us that everything dedicated to immigration is counted, because we are not talking about the same thing.)
El Gobierno está desbordado por la inmigración, por su política improvisada, irresponsable, descoordinada y unilateral.
(The Government is overwhelmed by immigration, by its improvised, irresponsible, uncoordinated and unilateral policy.)

Stereotype: Victims:
¿Por qué ha muerto una persona joven?
(Why did a young person die?)
Hay una situación de desamparo en muchas personas a la que necesitamos dar una solución.
(There is a situation of helplessness in many people to which we need to provide a solution.)

Stereotype: Threat:
España hoy está desbordada con la inmigración ilegal.
(Spain today is overwhelmed with illegal immigration.)
Esta alarmante situación, agravada por la incapacidad del Gobierno socialista, ha producido el colapso, el desbordamiento de los servicios humanitarios, judiciales y policiales, que han generado una gran alarma social.
(This alarming situation, aggravated by the incapacity of the socialist Government, has produced the collapse, the overflow of humanitarian, judicial and police services, which have generated great social alarm.)

Table 2: Examples from each label of the dataset.

hyperparameter grid to obtain the results of BETO: learning rate ∈ {0.01, 3e-5}; the batch size ∈ {16, 34}; and the optimizer ∈ {adam, rmsprop} (in bold we highlighted the optimal hyperparameter values). Moreover, we applied a dropout value of 0.3 to the last dense layer. We selected a value of 180 for the max length hyperparameter, according to the maximum length of the texts in the dataset. The model was fine-tuned for 10 epochs on the training data for each task.

For the masking-based approach, we used the sklearn implementation of the LR classifier. All the parameters were taken by default, except for the optimization method: we selected newton-cg. The model used the bag-of-words representation with tf-idf term weighting. We tested word unigrams, bigrams, and trigrams, and character n-grams (n ∈ {3, 4, 5, 6}), obtaining the best results with character 4-grams. When we used LR with the original texts, word unigrams achieved better results than character n-grams.
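This configuration can be sketched as a scikit-learn pipeline (the two toy masked texts are invented for illustration; only the character 4-gram tf-idf representation and the newton-cg solver follow the text):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy masked texts; the real inputs are the distorted dataset sentences.
texts = ["** inmigracion ***** problema", "** gobierno ***** consejo"] * 10
labels = [1, 0] * 10

pipe = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(4, 4)),  # character 4-grams
    LogisticRegression(solver="newton-cg"),                # defaults otherwise
)
pipe.fit(texts, labels)
preds = pipe.predict(texts)
```

Note that the character n-grams here span the masking symbols as well, so the classifier can exploit both the unmasked words and the shape of the masked regions.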

5.1 Unmasking Stereotypes

Related works on social bias detection have found lists of words that tend to be associated with one of two opposite social groups (e.g., female vs. male, Asian vs. Hispanic people) (Bolukbasi et al., 2016; Garg et al., 2018). In the case of immigrant stereotypes, it is particularly difficult to define two opposite groups and, consequently, to find such biased words. In this paper, we use the dataset described in Section 4 to find the most relevant terms to be used in the masking process.

Intuitively, in the context of immigrant stereotypes, the relevant words could be content-related, although style-related terms like function words could also play an interesting role. After preliminary experiments, we found higher results by masking the words outside the following lists: (i) the words with the highest relative frequency (RelFreq), i.e., the k words with a frequency in one class remarkably higher than their frequency in the opposite class; and (ii) the k words with the highest absolute frequency (AbsFreq) in the whole collection, excluding stopwords (i.e., stopwords were masked). In our experiments we achieved the best results with k = 1000.

Each list was computed using the corresponding set of texts depending on the classification task: Stereotype vs. Non-stereotype, or Victims vs. Threat. The information that is kept unmasked corresponds to the content-related words.
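A possible implementation of the RelFreq selection is sketched below (the ratio threshold of 2.0 is an assumption; the paper only requires the frequency in one class to be remarkably higher than in the other):

```python
from collections import Counter

def relfreq_words(class_a_tokens, class_b_tokens, k=1000, ratio=2.0):
    """Return up to k words whose relative frequency in one class is at
    least `ratio` times its relative frequency in the other class."""
    fa, fb = Counter(class_a_tokens), Counter(class_b_tokens)
    na, nb = sum(fa.values()), sum(fb.values())
    scored = []
    for w in set(fa) | set(fb):
        pa, pb = fa[w] / na, fb[w] / nb
        # Smoothed ratio in either direction, so both classes contribute.
        score = max((pa + 1e-9) / (pb + 1e-9), (pb + 1e-9) / (pa + 1e-9))
        if score >= ratio:
            scored.append((score, w))
    return [w for _, w in sorted(scored, reverse=True)[:k]]

# Tiny example with class-biased vocabularies.
words = relfreq_words(
    "derechos pobreza personas personas".split(),
    "ilegal problema ilegal personas".split(),
    k=5,
)
```

Words outside the returned list would then be masked before training the LR classifier.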

6 Results and Discussion

We report the results of the models in Table 3. It is possible to see high results of LR with the original texts. However, we observe that masking the terms outside the RelFreq list is slightly better than using the original text. These results suggest that the masking technique improves the quality of the detection of stereotypes and their dimensions.

In comparison with AbsFreq, maintaining


                                   S/N     V/T
Original text                      0.82    0.79
Masking Technique with AbsFreq     0.79    0.75
Masking Technique with RelFreq     0.84    0.81
BETO                               0.86    0.83

Table 3: F-measure in both classification tasks: Stereotype vs. Non-stereotype (S/N), and Victims vs. Threat (V/T).

the RelFreq words unmasked helps to ignore more words that are less discriminative for the classification tasks. This could be explained because AbsFreq includes words similarly frequent in both classes, which may not help at predicting immigrant stereotypes: países (countries), gobierno (government), señor (mister), partido (party); or at identifying the immigrant-stereotype dimension: fronteras (borders), política (politics), seguridad (security), grupo (group). Table 4 shows examples of words included in RelFreq that are indeed reflecting some bias according to the category. For example, it is not surprising to find words like derechos (rights), humanos (human3 or humans), pobreza (poverty), muerto (dead), and hambre (hunger) more associated with immigrants seen as victims; and words like irregular, ilegal (illegal), regularización (regularization), masiva (massive), and problema (problem) more used in speeches where immigrants are seen as a collective or personal threat.

BETO achieves the highest results in both classification tasks (RQ1). This is not surprising, because transformer-based models are known for their ability to capture semantic and syntactic information, and richer patterns in which the context of the words is taken into account. However, we do not observe a significant difference between the results of such a resource-hungry model and the combination of the masking technique with the traditional LR classifier. Considering the computational capabilities that BETO demands, and the lower complexity of the masking technique, the latter shows a better trade-off between effectiveness and efficiency than the former.

6.1 Discriminating Words

Motivated by the similar results of BETO and the masking technique, we wanted to

3It refers to the adjective: human rights.

Stereotype      Non-stereotype   Victims        Threat
personas        política         derechos       inmigración
canarias        europea          personas       canarias
derechos        unión            humanos        gobierno
problema        materia          derecho        irregular
país            grupo            mujeres        ilegales
irregular       políticas        países         irregulares
situación       consejo          pobreza        regularización
regularización  cooperación      integración    ilegal
ilegales        gobierno         mundo          españa
humanos         moción           vida           problema
ciudadanos      europeo          solidaridad    proceso
irregulares     ley              asilo          masiva
efecto          parlamentario    condiciones    llegado
origen          comisión         millones       aeropuertos
centros         cámara           muerto         ministro
ilegal          desarrollo       refugiados     control
drama           consenso         social         efecto
acogida         subcomisión      miseria        pateras
llamada         socialista       internacional  llamada
menores         común            ciudadanos     medidas
vida            temas            xenofobia      llegada
mafias          tema             hambre         inmigrantes
llegada         asuntos          viven          marruecos
masiva          grupos           emigrantes     cayucos
extranjeros     emigrantes       muerte         presión

Table 4: Examples of the relevant words that were not masked, considering the list RelFreq in each classification task.

observe and compare which portions of the texts they could be focusing on. For this purpose, we looked at the last layer of BETO and computed the average of the attention heads. Therefore, for each text, we had an attention matrix from which we could compute the attention that the transformer gave to each word in that text. Figure 2 shows examples of texts where the two models agreed on the right label.
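The head-averaging step can be illustrated with a stand-in attention tensor (in practice it would come from BETO's last layer, e.g., via `output_attentions=True` in the Transformers library; here the weights are random but row-normalized like real attention):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the last layer's attention: 16 heads over a 10-token text.
n_heads, seq_len = 16, 10
raw = rng.random((n_heads, seq_len, seq_len))
attn = raw / raw.sum(axis=-1, keepdims=True)   # each head's rows sum to 1

head_avg = attn.mean(axis=0)                   # average over the heads
word_scores = head_avg.mean(axis=0)            # mean attention each token receives
```

`word_scores` gives one importance value per token, which is what the color intensities in Figure 2 visualize.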

From the figure, it is possible to see which words were relevant for both approaches. Although some of the relevant words are function words (e.g., para, muy) and, at first glance, are not very informative for human interpretation, we can observe that some content-related words can be helpful for an expert's analysis. For instance, the text labeled as Stereotype has as relevant words fenómeno (phenomenon), inmigración (immigration), problema (problem), terrorismo (terrorism), and paro (unemployment), among others. The text labeled as Victims contains desamparo (abandonment), personas (people), and necesitamos dar una solución (we need to give a solution), reflecting how immigrants were seen as people rather than through their illegal status (e.g., see Table 4), and as the target of problems that need solutions. Moreover, in the example of Threat, some of the words and phrases receiving more importance (such as problema muy serio, problema muy importante) reflect how immigrants were seen as a problem to



Figure 2: Examples of attention visualization and masking transformation over the same texts. These examples were correctly classified by both models. The more intense the color, the greater the weight of attention given by the model.

the continent and the country, but not to the country where the immigrants come from.

Table 5 presents some of the words with the highest attention scores considering only the true positive predictions of each class. Therefore, these words could be among the most discriminative for stereotype identification.

We contrasted the list of words with the most attention in the BETO true positive predictions with the RelFreq words used by the masking technique as the most discriminative for each class. Table 6 shows the percentage of RelFreq words (which were not masked) that were present at the top of BETO's ranking of most discriminative words. In the top 30 of the ranking, we found the vast majority of the unmasked words. This suggests that the two approaches attend to similar cues.
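The overlap reported in Table 6 can be computed with a simple set intersection. The word lists below are toy examples, not the actual vocabularies from the experiments:

```python
# Unmasked (RelFreq) words kept by the masking technique (toy example).
unmasked_words = {"drama", "llegada", "ilegales", "pateras", "efecto"}

# Words ranked by BETO attention, most attended first (toy example).
attention_ranking = ["drama", "consejo", "llegada", "pateras", "asuntos",
                     "ilegales", "temas", "efecto"]

def overlap_at_k(ranking, reference, k):
    """Percentage of `reference` words found in the top-k of `ranking`."""
    top_k = set(ranking[:k])
    return 100.0 * len(top_k & reference) / len(reference)

print(overlap_at_k(attention_ranking, unmasked_words, 4))  # 60.0
```

Applied to the real RelFreq lists and attention rankings, this is the computation behind the percentages in Table 6.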

For now, we have seen that BETO and the masking technique achieved similar results and share an intersection of the discriminative words they focused on in the texts (which in fact answers RQ2), even though one of them is a resource-hungry model and the other requires far fewer computational resources. We do not think that BETO should be discarded because of its complexity: one difference we should highlight is that for the masking technique the list of words must be pre-defined, with the limitations and algorithmic bias that this could imply. BETO, in contrast, learns by itself to score the words gradually, instead of assigning a binary score as in the masking technique (mask or keep unmasked). Therefore, with the transformer we can compare the importance of the different words (as visualized in Figure 2), which was automatically learned



Stereotype     Non-stereotype   Victims       Threat
drama          consejo          derecho       masiva
llegada        asuntos          esclavitud    pateras
ilegales       temas            mujeres       llamada
efecto         comparecen       asilo         avalancha
irregulares    producen         refugiados    aeropuertos
llamada        recibir          pobreza       trasladar
costas         acuerdos         xenofobia     llegada
expulsiones    esfuerzo         muerto        zapatero
trabajadores   diálogo          sistema       caldera
pateras        pacto            devoluciones  alarma
xenofobia      congreso         miseria       ayudas
mafia          necesidad        desgracia     ilegalmente
condiciones    cumbre           difícil       afrontar
legalidad      proyecto         grupos        judiciales
dinero         zapatero         racismo       capacidad
avalancha      enmiendas        hambre        archipiélago
vienen         miembros         refugio       delincuencia
península      colabora         persona       grave
muertes        conferencia      situaciones   congreso
miles          gobiernos        explotación   tropicales
humanitaria    exterior         denuncia      fallecido
coladero       importantes      muerte        coladero
preocupa       acción           democracia    oleada

Table 5: Words whose attention scores are the highest only on the true positive predictions of BETO in each class.

Attention   % from the total of unmasked words
ranking         S/N         V/T
top 10        48.96%      35.89%
top 20        78.92%      65.11%
top 30        95.84%      82.98%

Table 6: The percentage of RelFreq words that are at the top of the words with the highest attention.

from the context of the words in the texts. In the next sections, we confirm the advantages of both models by analyzing the results of an ideal ensemble and other uses of the attention mechanisms.

6.2 An Ideal Ensemble

We have seen the results achieved by the proposed models and the intersection of the sets of words they focus on in the texts. Therefore, one could think that these models are correctly classifying the same texts. In this section, we show that, in general, the models misclassify different instances.

Table 7 shows the misclassified instances of LR with the masked texts and of BETO in each classification task. The models have good performances, so it is reasonable to consider an ideal ensemble that could wisely combine their predictions. The resulting ensemble would miss only the texts where both models are wrong: 272 texts when distinguishing Stereotype vs. Non-stereotype, which means that

                           S/N      V/T
Total of instances         3630     1477
Misclassified by LR         624      263
Misclassified by BETO       518      259
Misclassified by both       272      115
Well predicted by an
ideal ensemble            92.5%    92.2%

Table 7: Misclassified instances and the performance of an ideal ensemble for the Stereotype vs. Non-stereotype and Victims vs. Threat tasks.

92.5% of the 3630 texts would be correctly classified. A similar analysis can be done for the Victims vs. Threat classification task, which results in 92.2% of the 1477 texts being potentially correctly classified.
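The oracle-ensemble upper bound follows directly from Table 7: a text is missed only when both models are wrong, so the best achievable accuracy is one minus the fraction of shared errors.

```python
def ideal_ensemble_accuracy(total, missed_by_both):
    """Upper-bound accuracy (%) of an oracle ensemble of two classifiers:
    only the instances misclassified by BOTH models are lost."""
    return 100.0 * (total - missed_by_both) / total

# Values taken from Table 7.
print(round(ideal_ensemble_accuracy(3630, 272), 1))  # 92.5 (Stereotype vs. Non-stereotype)
print(round(ideal_ensemble_accuracy(1477, 115), 1))  # 92.2 (Victims vs. Threat)
```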

6.3 Relations with the Highest Attention Scores

Another advantage of the attention mechanism is that it captures the relations between non-discriminative words and other words in each class. We could find noisy features similarly present in the opposite classes. One of the words with the highest attention scores in our dataset is inmigración (immigration); since its scores are high in the two opposite classes, we did not count it as discriminative for BETO. However, since the heads contain the attention that each word gives to the others in the texts, we can observe how such "noisy" words are used in the opposite classes by looking at the relations with their context. We hypothesize that the immigration-related words are used in different contexts in the opposite classes.
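Reading these relations off an attention matrix amounts to inspecting the row of the target token: its entries give the weight of the relation with every other token. The toy matrix below stands in for BETO's real output, and the sentence and values are illustrative:

```python
import numpy as np

# Toy tokens and attention matrix (rows: attention each token gives out).
tokens = ["la", "inmigracion", "es", "un", "problema"]
attention = np.array([
    [0.4, 0.2, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.2, 0.5],   # row of "inmigracion"
    [0.2, 0.3, 0.2, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.3, 0.1],
    [0.1, 0.6, 0.1, 0.1, 0.1],
])

target = tokens.index("inmigracion")
# Words most related to the target, ranked by the attention it assigns them.
related = sorted(zip(tokens, attention[target]), key=lambda p: -p[1])
print([w for w, _ in related if w != "inmigracion"][:2])  # ['problema', 'un']
```

Ranking the real contexts of inmigración this way, per class, is what produces the word lists in Table 8.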

Table 8 shows an example of the words whose relation with inmigración is the most highly scored in each class. We omitted the ones in the Non-stereotype class because they are not informative. Interestingly, the words associated with inmigración also describe the Stereotype, Victims, and Threat classes differently. For example, with this strategy we observe that words like criminal and enfermedades (diseases) are now at the top of the discriminating words of the Threat category (in contrast to Table 5). We conclude that the attention mechanism should be exploited in this sense in the future. The attention scores could probably be a source of interesting cues, not only in terms of biased words from the RelFreq list or the ones shown in Table 5,



Stereotype      Victim           Threat
muertes         discriminación   llega
saturado        colectivos       nuevo
miseria         mujeres          delincuencia
pobres          consenso         aeropuertos
policiales      dentro           procedente
internamiento   refugiados       zapatero
dramáticos      familias         saturado
descontrol      educativo        policiales
empresarios     planteamos       retención
humanitario     miseria          madrid
costas          refugiados       enfermedades
avalancha       pobreza          congreso
garantías       reto             evolución
delincuencia    mujeres          entran
tráfico         voto             francia
devueltos       voluntad         aeropuertos
explotación     específicas      tropicales
llamada         pobreza          criminal
alarman         iniciativa       aeropuerto
ilegales        enmienda         intentos
pateras         podían           llegaron
expulsión       saben            coladero

Table 8: Words with the highest attention scores in relation to inmigración (immigration).

but also concerning the forms in which neutral terms are contextualized.

7 Conclusion and Future Work

This work is a contribution to the immigrant stereotype identification problem. The particularities of the immigration phenomenon make this bias detection task differ from other kinds of bias detection that have received much more attention (e.g., gender bias). We addressed two classification tasks, Stereotype vs. Non-stereotype detection and Victims vs. Threat dimension identification, using an annotated dataset in Spanish. We proposed two different models: BETO, a resource-hungry model that demands strong computational capabilities; and a masking technique, a less complex approach that transforms the texts to be used by a traditional classifier. We demonstrated that both approaches are suitable for immigrant stereotype identification; interestingly, the masking technique achieves almost the same results as BETO, despite its simplicity (RQ1).

We developed a comparison between the attention mechanism of BETO and the list of relevant terms that the masking technique uses. These two different approaches focused on similar portions of the texts. Specifically, the majority of the relevant words kept unmasked are among the words to which BETO gave the highest attention. Furthermore, with these models it is possible to highlight some stereotype cues that could be considered as local explanations for further studies about immigrant stereotypes (RQ2).

On the basis of the reported results, we conclude that both models are effective at identifying immigrant stereotypes, and could be combined to build an ideal ensemble that outperforms each of them. We also point out that BETO, through its attention mechanisms, can help to investigate in more detail the bias towards immigrants. For these reasons, we think we cannot rule out the use of either model.

To our knowledge, this is the first work on immigrant stereotype identification that compares deep learning with traditional machine learning approaches while paying special attention to the explainability of the models for this task. However, more work is necessary to explore the advantages of the attention mechanisms in this sense more deeply. In future work, we plan to combine the two approaches to increase the performance, and to use discriminative words to find debiasing strategies to mitigate immigrant stereotypes in social media and political speeches.

Acknowledgments

The work of the authors from the Universitat Politècnica de València was funded by the Spanish Ministry of Science and Innovation under the research project MISMIS-FAKEnHATE on MISinformation and MIScommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31). Experiments were carried out on the GPU cluster at PRHLT thanks to the PROMETEO/2019/121 (DeepPattern) research project funded by the Generalitat Valenciana.

References

Alaparthi, S. and M. Mishra. 2021. BERT: a sentiment analysis odyssey. Journal of Marketing Analytics, pages 1–9.

Bodria, F., A. Panisson, A. Perotti, and S. Piaggesi. 2020. Explainability methods for natural language processing: Applications to sentiment analysis. In SEBD.

Bolukbasi, T., K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29:4349–4357.

Cañete, J., G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, and J. Pérez. 2020. Spanish pre-trained BERT model and evaluation data. In PML4DC at ICLR 2020.

Clark, K., U. Khandelwal, O. Levy, and C. D. Manning. 2019. What does BERT look at? An analysis of BERT's attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 276–286, Florence, Italy, August. Association for Computational Linguistics.

Croce, D., D. Rossini, and R. Basili. 2019. Auditing deep learning processes through kernel-based explanatory models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4037–4046, Hong Kong, China, November. Association for Computational Linguistics.

Danilevsky, M., K. Qian, R. Aharonov, Y. Katsis, B. Kawas, and P. Sen. 2020. A survey of the state of explainable AI for natural language processing. arXiv preprint arXiv:2010.00711.

Dennison, J. and A. Geddes. 2019. A rising tide? The salience of immigration and the rise of anti-immigration political parties in Western Europe. The Political Quarterly, 90(1):107–116.

Dev, S., T. Li, J. M. Phillips, and V. Srikumar. 2020. On measuring and mitigating biased inferences of word embeddings. In AAAI, pages 7659–7666.

Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Fokkens, A., N. Ruigrok, C. Beukeboom, G. Sarah, and W. Van Atteveldt. 2018. Studying Muslim stereotyping through microportrait extraction. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).

Garg, N., L. Schiebinger, D. Jurafsky, and J. Zou. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16):E3635–E3644.

Granados, A., M. Cebrián, D. Camacho, and F. de Borja Rodríguez. 2011. Reducing the loss of information through annealing text distortion. IEEE Transactions on Knowledge and Data Engineering, 23(7):1090–1102.

Islam, S. R., W. Eberle, S. K. Ghafoor, and M. Ahmed. 2021. Explainable artificial intelligence approaches: A survey. arXiv preprint arXiv:2101.09429.

Jarquín-Vásquez, H. J., M. Montes-y-Gómez, and L. Villaseñor-Pineda. 2020. Not all swear words are used equal: Attention over word n-grams for abusive language identification. In K. M. Figueroa Mora, J. Anzurez Marín, J. Cerda, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, and J. A. Olvera-López, editors, Pattern Recognition, pages 282–292, Cham. Springer International Publishing.

Liang, P. P., I. M. Li, E. Zheng, Y. C. Lim, R. Salakhutdinov, and L.-P. Morency. 2020. Towards debiasing sentence representations. arXiv preprint arXiv:2007.08100.

Lipmann, W. 1922. Public Opinion. New York: Harcourt Brace.

Mathew, B., P. Saha, S. Muhie Yimam, C. Biemann, P. Goyal, and A. Mukherjee. 2020. HateXplain: A benchmark dataset for explainable hate speech detection. arXiv e-prints, page arXiv:2012.10289, December.

Mullenbach, J., S. Wiegreffe, J. Duke, J. Sun, and J. Eisenstein. 2018. Explainable prediction of medical codes from clinical text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1101–1111, New Orleans, Louisiana, June. Association for Computational Linguistics.

Nadeem, M., A. Bethke, and S. Reddy. 2020. StereoSet: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456.

Rauh, C. and J. Schwalbach. 2020. The ParlSpeech V2 data set: Full-text corpora of 6.3 million parliamentary speeches in the key legislative chambers of nine representative democracies. Harvard Dataverse.

Ribeiro, M. T., S. Singh, and C. Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 1135–1144, New York, NY, USA. Association for Computing Machinery.

Sanguinetti, M., G. Comandini, E. Di Nuovo, S. Frenda, M. A. Stranisci, C. Bosco, C. Tommaso, V. Patti, R. Irene, et al. 2020. HaSpeeDe 2 @ EVALITA2020: Overview of the EVALITA 2020 hate speech detection task. In EVALITA 2020 Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, pages 1–9. CEUR.

Sanguinetti, M., F. Poletto, C. Bosco, V. Patti, and M. Stranisci. 2018. An Italian Twitter corpus of hate speech against immigrants. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).

Scheufele, D. A. 2006. Framing as a theory of media effects. Journal of Communication, 49(1):103–122.

Stamatatos, E. 2017. Authorship attribution using text distortion. In 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of Conference, 1:1138–1149.

Sánchez-Junquera, J., B. Chulvi, P. Rosso, and S. P. Ponzetto. 2021. How do you speak about immigrants? Taxonomy and StereoImmigrants dataset for identifying stereotypes about immigrants. Applied Sciences, 11(8).

Sánchez-Junquera, J., L. Villaseñor-Pineda, M. Montes-y-Gómez, P. Rosso, and E. Stamatatos. 2020. Masking domain-specific information for cross-domain deception detection. Pattern Recognition Letters, 135:122–130.

Tajfel, H., A. A. Sheikh, and R. C. Gardner. 1964. Content of stereotypes and the inference of similarity between members of stereotyped groups. Acta Psychologica, 22(3):191–201.

Tessler, H., M. Choi, and G. Kao. 2020. The anxiety of being Asian American: Hate crimes and negative biases during the COVID-19 pandemic. American Journal of Criminal Justice, 45(4):636–646.

