
141028 Parlor Slides

DESCRIPTION

Partial report of results of empirical study "Gender/Genre: Gender differences in disciplinary language." Study used methods from statistics and natural language processing to examine lexical and quasi-syntactic features of writing in a professional genre.

Image: © flickr/srqpix CC BY 2.0

GENDER/GENRE: GENDER DIFFERENCES IN PROFESSIONAL WRITING

Brian N. Larson 29 October 2014

Current Research in Writing Studies

www.Rhetoricked.com @Rhetoricked

Housekeeping

•  www.Rhetoricked.com (these slides + some additional)

•  Communicate with me:
   –  @Rhetoricked
   –  [email protected]

•  Research supported by:
   –  Graduate Research Partnership Program fellowship (U of M CLA), 2012
   –  James I. Brown Summer Research Fellowship, 2014

Gender, sex, and research constructs

•  When I talk about my own data, I’ll refer to
   –  Gender F authors/writers
   –  Gender M authors/writers

•  These categories may or may not correspond to other researchers’
   –  {woman, female, feminine}
   –  {man, male, masculine}

•  That’s the subject of another talk (or for Q&A)

Many researchers have asked

•  Do men and women communicate differently?
•  Much work inspired by Robin Lakoff (1975)
•  Scholarly and popular works by Deborah Tannen (e.g. 1990 [2001]) and others
•  Much of this research in oral/face-to-face communication

Writing: Process and product

•  In writing studies, we can (roughly) divide process and product
   –  Do men and women produce writing using different processes?
   –  Is the writing they produce distinguishable based on author gender?

Previous studies: Process research

•  Focus on interpersonal communications in mixed-gender contexts
   –  Lay, 1989 (Schuster); Rehling, 1996; Raign & Sims, 1993; Tong & Klecun, 2004; Wolfe & Alexander, 2005; Brown & Burnett, 2006; Wolfe & Powell, 2006, 2009.

Previous studies: Product research

•  In technical and professional communication
   –  Sterkel, 1988 (20 stylistic characteristics)
   –  Smeltzer & Werbel, 1986 (16 stylistic and evaluative measures)
   –  Tebeaux, 1990 (quality of responses)
   –  Allen, 1994 (markers of authoritativeness)

•  Manual methods, small samples

Enter computational methods

•  Natural language processing (NLP)
•  Allows processing of large quantities of text data
•  Studies that attracted my attention
   –  Koppel, Argamon & Shimoni, 2002 (machine-learning algorithms)
   –  Argamon et al., 2003 (statistical analysis)
   –  I’ll focus on Argamon et al. in this talk

Argamon et al. 2003

•  Used 500 published texts from the British National Corpus (BNC)
•  Mean 34,000 words (‘tokens’) per text
•  Statistical analysis showed correspondence to Biber’s (1995) “informational/involved” dimension

Gender in computer-mediated communication (CMC)

•  CMC popular for NLP studies
   –  Data are readily available
   –  Data are voluminous

•  Examples (MLA = machine-learning algorithm)
   –  Herring & Paolillo, 2006 (blog posts, stat analysis)
   –  Yan & Yan, 2006 (blog posts, MLA analysis)
   –  Argamon et al., 2007 (blog posts, MLA analysis)
   –  Rao et al., 2010 (Twitter, MLA analysis)
   –  Burger et al., 2011 (Twitter, MLA analysis)

Rationale: Why is the question important?

•  Lend support to one or more theories of gender
   –  ‘Two cultures’ (Maltz & Borker, 1982)
   –  ‘Standpoint’ (Barker & Zifcak, 1999)
   –  ‘Performative’ (Butler 1993, 1999, 2004)
   –  Others

•  Sorting out methodological problems, particularly use of gender as a variable

Study design goals

•  Research questions
   –  Did Gender F and Gender M writers in a disciplinary genre in which they are being trained use lexical and quasi-syntactic stylistic features with relative frequencies that varied with their genders?
   –  If so, did the differences appear in interpretable patterns?

•  Examine a corpus of texts
   –  All of the same genre
   –  Where we can be confident of single authorship
   –  Where author gender is self-identified

Data collection

•  Major writing project at end of first year of law school
   –  Students address hypothetical problem (writing in same ‘genre’)
   –  Students not allowed to collaborate
   –  Plagiarism difficult (but still possible)

•  Students self-identified gender*
•  193 texts (mean word tokens = 3764)

*This study IRB-approved (UMN Study #1202E10685)

Text genre: Memorandum regarding motion to dismiss

•  Written to hypothetical court
•  Supporting or opposing a motion before the court
•  High-level organization is formulaic

Memorandum Sections

•  Caption**
•  Introduction/summary*
•  Facts
•  Legal standard of review*
•  Argument
•  Conclusion
•  Signature block**

*Not always present. **I did not analyze (content is highly formulaic).

Feature (“variable”) selection

•  For now, those of Argamon et al. 2003
•  Relative frequencies of
   –  429 “function words” (Argamon used 405)
   –  45 parts of speech from the Penn Treebank tagset (Argamon used 76 BNC POS tags)
   –  100 common part-of-speech (POS) bigrams
   –  500 common POS trigrams

‘Part-of-speech’ tags? ‘Bigrams & trigrams’?

•  First, ‘tokenize’ each sentence (automated):
   –  ‘My aunt’s pen is on the table.’

POS tags

•  Purple words are function words
•  Tag the parts of speech (automated)
•  Then calculate relative frequency of function words and POS tags (automated)

POS bigrams and trigrams

•  A bigram or trigram is a 2- or 3-token ‘window’ on the sentence.
   –  Automated calculation
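The ‘window’ idea can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the Penn Treebank tags below were assigned by hand to the example sentence (an assumption, since the tagged slide image is not reproduced here), and the helper name `ngrams` is hypothetical.

```python
from collections import Counter

# Hand-tagged version of the example sentence. The tags are assigned by
# hand for illustration (an assumption), not output from the study's tagger.
TAGGED = [
    ("My", "PRP$"), ("aunt", "NN"), ("'s", "POS"), ("pen", "NN"),
    ("is", "VBZ"), ("on", "IN"), ("the", "DT"), ("table", "NN"), (".", "."),
]


def ngrams(items, n):
    """Return every n-item window over the sequence, in order."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Keep only the POS tags, then slide 2- and 3-tag windows over them.
tags = [tag for _, tag in TAGGED]
pos_bigrams = ngrams(tags, 2)
pos_trigrams = ngrams(tags, 3)

# Counting the windows gives the raw counts that relative frequencies use.
bigram_counts = Counter(pos_bigrams)
trigram_counts = Counter(pos_trigrams)
```

A sentence of nine tokens yields eight bigrams and seven trigrams, so the denominators for bigram and trigram frequencies differ slightly from the token count.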

Feature (“variable”) selection

•  First-person pronouns (total)
   –  Singular: I, me, my, mine, myself.
   –  Plural: We, us, our, ours, ourselves.
•  Second-person pronouns: You, your, yours, yourself.
•  Third-person pronouns (total)
   –  Singular (total)
      •  Feminine: She, her, hers, herself.
      •  Masculine: He, him, his, himself.
   –  Plural: They, them, their, theirs, themselves.
•  Contractions: Including all instances of n’t, ’ld, ’ve, etc.
•  All relative frequencies calculated (automated)
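As a rough sketch of how these pronoun and contraction features might be counted, the Python below uses the word lists from this slide. Everything else is an assumption made for illustration: the regular-expression tokenizer, the set of contraction endings (the slide says only "etc."), and the function name `pronoun_profile`. It is not the study's actual code.

```python
import re
from collections import Counter

# Pronoun classes copied from the slide.
PRONOUN_CLASSES = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself"},
    "third_sing_fem": {"she", "her", "hers", "herself"},
    "third_sing_masc": {"he", "him", "his", "himself"},
    "third_plural": {"they", "them", "their", "theirs", "themselves"},
}

# Crude tokenizer (assumption): a word plus an optional apostrophe clitic.
TOKEN_RE = re.compile(r"[a-z]+(?:['’][a-z]+)?")
# Contraction endings are a partial guess; possessive 's is left out.
CONTRACTION_RE = re.compile(r"(?:n['’]t|['’](?:ve|ll|d|re|m))$")


def pronoun_profile(text):
    """Return {feature: count / total tokens} for one text."""
    tokens = TOKEN_RE.findall(text.lower())
    counts = Counter()
    for tok in tokens:
        base = re.split(r"['’]", tok)[0]  # "we've" -> "we"
        for name, words in PRONOUN_CLASSES.items():
            if base in words:
                counts[name] += 1
        if CONTRACTION_RE.search(tok):
            counts["contractions"] += 1
    total = len(tokens)
    features = list(PRONOUN_CLASSES) + ["contractions"]
    return {f: (counts[f] / total if total else 0.0) for f in features}


profile = pronoun_profile("I think they didn't see her. We've told them.")
```

Note that a list-based count treats every "her" as a pronoun, including possessive uses; a POS tagger would be needed to separate them.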

Each student’s text is represented by variables

•  A series of numerical values expressing each feature (variable), i.e., the relative frequency of:
   –  Function words / total tokens
   –  POS tags / total tokens
   –  Bigrams / total bigrams*
   –  Trigrams / total trigrams*
   –  Pronouns
   –  Automated calculation

*Multiplied by a factor.
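The representation described above amounts to dividing counts by totals. A minimal Python sketch follows, assuming a hypothetical helper `relative_frequencies` and toy data; the slide does not state the value of the scaling factor, so the `factor=100.0` used here is purely illustrative.

```python
from collections import Counter


def relative_frequencies(items, vocabulary, factor=1.0):
    """Map each entry of `vocabulary` to factor * count / len(items)."""
    counts = Counter(items)
    total = len(items)
    if total == 0:
        return {v: 0.0 for v in vocabulary}
    return {v: factor * counts[v] / total for v in vocabulary}


# Toy text (not from the corpus) and a tiny subset of function words.
tokens = "the court held that all of the claims and all of the defenses fail".split()
function_words = ["all", "the", "of", "and", "that"]
word_freqs = relative_frequencies(tokens, function_words)

# Toy POS tag sequence; bigrams are adjacent tag pairs.
tags = ["DT", "NNS", "CC", "DT", "NNS"]
bigrams = list(zip(tags, tags[1:]))
# The factor of 100 is an assumption; the slide says only "a factor".
bigram_freqs = relative_frequencies(bigrams, [("NNS", "CC")], factor=100.0)
```

Each text then becomes one row of such values, one column per feature, which is the form the later statistical comparison needs.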

Example 1

•  Tokens of the function word-type “all” in paper 1007 account for less than 7/100 of 1% of all tokens in that paper.


Example 2

•  Bigrams made up of a plural common noun (NNS) followed by a coordinating conjunction (CC) accounted for 1/10 of 1% of bigrams in paper 1009.


Mean relative frequencies calculated

•  For each feature
   –  Mean frequency (SD) for Gender F authors
   –  Mean frequency (SD) for Gender M authors
   –  Statistical significance assessed with Mann-Whitney U test (expressed as p-value)

•  A priori threshold for significance: 0.05
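For readers unfamiliar with the test, here is a self-contained Python sketch of a two-sided Mann-Whitney U test using the normal approximation with a tie correction and no continuity correction. It is an illustration under stated assumptions, not the software used in the study; the function name and the sample values are hypothetical, and for small samples an exact test from a statistics package would be preferable.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, approximate two-sided p-value).

    Ties receive average ranks and the variance is tie-corrected.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        t = j - i                     # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05  # a priori threshold from the slide

# Hypothetical relative frequencies of one feature (not study data).
gender_f = [0.0007, 0.0011, 0.0009, 0.0013, 0.0010]
gender_m = [0.0012, 0.0015, 0.0011, 0.0016, 0.0014]
u_stat, p_value = mann_whitney_u(gender_f, gender_m)
significant = p_value < ALPHA
```

The test compares ranks rather than raw values, so it does not assume the relative frequencies are normally distributed.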

What Argamon et al. 2003 found: Men

•  Males used significantly more
   –  Determiners: a, the, these
   –  Determiner+noun bigrams: the books, a dog, these Tories
   –  Attributive-adjective+noun bigrams: great leaders, old form
   –  Prepositions: at, from, for, of, behind
   –  Its

What Argamon et al. 2003 found: Women

•  Females used significantly more
   –  Pronouns (all)
      •  1st person sing.: I, my, mine
      •  2nd person: you, yours
      •  3rd person: they, them, theirs
   –  Present tense verbs: walks, eradicates
   –  Contractions
   –  Negation with “not”

Informational/involved

•  Biber (1995) labeled this a dimension of register variation after running factor analyses on feature frequencies to identify sets of co-varying features as “dimensions”

•  Consistent with popular conceptions and works such as Tannen (1990 [2001]) that characterize women as “affiliative” and men as “informative”


What I found: Nouns & determiners

•  Nouns
   –  Some categories showed non-significant Gender F preference (weakly contradicting Argamon)

•  Determiners and determiner+noun
   –  Only significant: DET-NNP (proper noun)
   –  But all showed non-significant Gender M preference
   –  (Overall, weakly supporting Argamon)

What I found: Adjectives & prepositions

•  Attributive-adjective+noun
   –  Non-significant Gender M preference (weakly supporting Argamon)

•  Prepositions
   –  Non-significant Gender M preference (weakly supporting Argamon)

What I found: Pronouns (i.e., a mess)

•  All pronouns: Non-significant Gender M preference (weakly contradicting Argamon)

•  1st p sing., 2nd p., 3rd p. overall, 3rd s. fem: Non-significant Gender F preference (weakly supporting Argamon)

•  3rd p. plural: Significant Gender M preference (contradicting Argamon)

•  Its: Non-significant Gender F preference (weakly contradicting Argamon)


What I found: Verbs, contractions, “not”

•  Present-tense verbs
   –  Significant Gender M preference for 3rd p. singular (contradicting Argamon)
   –  Non-significant Gender M preference for the rest (weakly contradicting Argamon)

•  Contractions: Non-significant Gender F preference (weakly supporting Argamon)
•  Negation with “not”: (weakly supporting Argamon)

The take-away?

•  Statistics: The non-significant differences should probably be regarded as non-significant
   –  In that case, M-informational/F-involved is not confirmed in this study

•  If the non-significant differences are real, evidence for M-informational/F-involved is still mixed, especially in pronouns and present-tense verbs

Explaining the findings with relevance theory

•  Relevance theory (Sperber & Wilson 1995) recognizes the effects of habituation

•  If boys and girls are acculturated to writing in certain genres and certain topics in their youths . . .

•  . . . they may unconsciously habituate to certain (appropriate) word choices

•  . . . and may not be completely free to vary their word choices consciously later.


Situating the findings within gender & language theories

•  Findings weakly support or contradict
   –  Two sociolinguistic cultures view (Maltz & Borker 1982; Tannen 1990 [2001])
   –  Intersectionality/performativity views (Barker & Zifcak 1999; Butler; many others)

•  Some gendered linguistic habits appeared to resist retraining and conscious efforts to conform to register conventions . . .
•  . . . others were apparently overcome.

I’m left with more questions than answers . . .

•  But you are entitled to ask some questions now . . .


THANK YOU!

•  www.Rhetoricked.com (these slides + some additional)

•  Communicate with me:
   –  @Rhetoricked
   –  [email protected]

•  Research supported by:
   –  Graduate Research Partnership Program fellowship (U of M CLA), 2012
   –  James I. Brown Summer Research Fellowship, 2014

Works cited

Allen, J. (1994). Women and authority in business/technical communication scholarship: An analysis of writing... Technical Communication Quarterly, 3(3), 271.
Argamon, S., Koppel, M., Fine, J., & Shimoni, A. R. (2003). Gender, genre, and writing style in formal written texts. Text, 23(3), 321–346.
Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2007). Mining the Blogosphere: Age, gender and the varieties of self-expression. First Monday, 12(9). Retrieved from http://firstmonday.org/issues/issue12_9/argamon/index.html
Armstrong, C. L., & McAdams, M. J. (2009). Blogs of information: How gender cues and individual motivations influence perceptions of credibility. Journal of Computer-Mediated Communication, 14(3), 435–456.
Barker, R. T., & Zifcak, L. (1999). Communication and gender in workplace 2000: creating a contextually-based integrated paradigm. Journal of Technical Writing & Communication, 29(4), 335.
Biber, D. (1995). Dimensions of register variation: a cross-linguistic comparison. Cambridge; New York: Cambridge University Press.
Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python (1st ed.). O’Reilly Media.
Brown, S. M., & Burnett, R. E. (2006). Women hardly talk. Really! Communication practices of women in undergraduate engineering classes (pp. T3F1–T3F9). Presented at the 9th International Conference on Engineering Education, San Juan, Puerto Rico: International Network for Engineering Education & Research. Retrieved from http://ineer.org/Events/ICEE2006/papers/3219.pdf
Burger, J., Henderson, J., Kim, G., & Zarrella, G. (2011). Discriminating gender on Twitter. Bedford, MA: MITRE Corporation. Retrieved from http://www.mitre.org/work/tech_papers/2011/11_0170/

Butler, J. (1993). Bodies that matter: on the discursive limits of “sex.” New York: Routledge.
Butler, J. (1999). Gender trouble. New York: Routledge.
Butler, J. (2004). Undoing gender. New York: Routledge.
Cunningham, H., Maynard, Diana, Bontcheva, K., Tablan, V., Aswani, N., Roberts, I., … Peters, W. (2012, December 28). Developing Language Processing Components with GATE Version 7 (a User Guide). GATE: General Architecture for Text Engineering. Retrieved January 1, 2013, from http://gate.ac.uk/sale/tao/split.html
Cunningham, H., Tablan, V., Roberts, A., & Bontcheva, K. (2013). Getting More Out of Biomedical Documents with GATE’s Full Lifecycle Open Source Text Analytics. PLoS Computational Biology, 9(2), e1002854.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1), 10–18.
Herring, S. C., & Paolillo, J. C. (2006). Gender and genre variation in weblogs. Journal of Sociolinguistics, 10(4), 439–459.
Koppel, M., Argamon, S., & Shimoni, A. R. (2002). Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4), 401–412.
Lakoff, R. T. (1975/2004). Language and Woman’s Place: Text and Commentaries. (M. Bucholtz, Ed.) (Revised and expanded ed.). New York: Oxford University Press.
Lay, M. M. (1989). Interpersonal conflict in collaborative writing: What we can learn from gender studies. Journal of Business and Technical Communication, 3(2), 5–28.
Maltz, D. N., & Borker, R. (1982). A cultural approach to male-female miscommunication. In J. J. Gumperz (Ed.), Language and social identity (pp. 196–216). Cambridge U.K.: Cambridge University Press.
Pakhomov, S. V., Hanson, P. L., Bjornsen, S. S., & Smith, S. A. (2008). Automatic classification of foot examination findings using clinical notes and machine learning. Journal of the American Medical Informatics Association, 15, 198–202.
Raign, K. R., & Sims, B. R. (1993). Gender, persuasion techniques, and collaboration. Technical Communication Quarterly, 2(1), 89–104.
Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010). Classifying latent user attributes in Twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents (pp. 37–44). Toronto, ON, Canada: ACM.
Rehling, L. (1996). Writing together: Gender’s effect on collaboration. Journal of Technical Writing and Communication, 26(2), 163–176.
Smeltzer, L. R., & Werbel, J. D. (1986). Gender differences in managerial communication: Fact or folk-linguistics? Journal of Business Communication, 23(2), 41–50.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and Cognition (2nd ed.). Wiley-Blackwell.
Sterkel, K. S. (1988). The relationship between gender and writing style in business communications. Journal of Business Communication, 25(4), 17–38.
Tannen, D. (2001). You Just Don’t Understand: Women and Men in Conversation. William Morrow Paperbacks.
Tebeaux, E. (1990). Toward an understanding of gender differences in written business communications: A suggested perspective for future research. Journal of Business and Technical Communication, 4(1), 25–43.

Tong, A., & Klecun, E. (2004). Toward accommodating gender differences in multimedia communication. Professional Communication, IEEE Transactions on, 47(2), 118–129.
Wolfe, J., & Alexander, K. P. (2005). The computer expert in mixed-gendered collaborative writing groups. Journal of Business and Technical Communication, 19(2), 135–170.
Wolfe, J., & Powell, B. (2006). Gender and expressions of dissatisfaction: A study of complaining in mixed-gendered student work groups. Women & Language, 29(2), 13–20.
Wolfe, J., & Powell, E. (2009). Biases in interpersonal communication: How engineering students perceive gender typical speech acts in teamwork. Journal of Engineering Education, 98(1), 5–16.
Yan, X., & Yan, L. (2006). Gender classification of weblog authors. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs (pp. 228–230).