DESCRIPTION
Presentation of my bachelor thesis in Information Science. It provides an overview of my attempt to use parsimonious language models on parliamentary proceedings to derive characteristic words for left-wing and right-wing parties, and to compare the occurrences of these words in subtitles of programmes broadcast by Dutch public broadcasting organizations.
Political slant in public broadcasting
Author: Bart de Goede
Supervisors: Dr. Maarten Marx
Dr. Johan van Doornik
June 23, 2011
Why?
Automatically identify political slant in Dutch public broadcasting
Gentzkow & Shapiro (2010)
Gentzkow, M. and Shapiro, J. M. (2010). What drives media slant? Evidence from U.S. daily newspapers. Econometrica, 78(1):35–71.
Econometric research: compare the language use of news outlets to political language
Conclusion: ‘An economically significant demand for news slanted towards one’s own political ideology exists.’
Operationalization
Find characteristic words for Republicans and Democrats in Congress Proceedings.
Count relative frequencies of these words in newspapers
Compare occurrence of words between newspapers
Differences
Dutch versus English
Television instead of newspapers
More political parties
Other technique to derive characteristic words
Other comparison method(s)
Television
Subtitles for the hearing impaired (http://tt888.nl)
Data complete from January 2008 to February 2011
Problem: Hardly any useful metadata
Television
Solution: TV guide

                           Before             After
Broadcasts with title      16.995             32.491
Unique titles              4.560 --> 2.702    2.238
Broadcast frequency > 2    1.104              1.064
Television
Pauw & Witteman: 895.935 words
Nova: 362.844 words
NOS Journaal: 12.609.620 words
NOS Jeugdjournaal: 1.383.728 words
Netwerk: 879.635 words
Goedemorgen Nederland: 760.658 words
EenVandaag: 1.556.642 words
DWDD: 1.626.929 words
Buitenhof
DWDD
EenVandaag
Goedemorgen Nederland
Het Elfde Uur
Holland Doc
Knevel en Van den Brink
Netwerk
Nieuwsuur
NOS Jeugdjournaal
NOS Journaal
Nova
Ochtendspits
Pauw & Witteman
PowNews
SchoolTV Weekjournaal
Sinterklaasjournaal
Tegenlicht
Uitgesproken
Vragenuurtje
Zembla
Political groups
Hirst, G., Riabinin, Y., Graham, J., and Boizot-Roche, M. Text to Ideology or Text to Party Status?
Parliamentary period with the greatest overlap with the TV data set: Balkenende IV
Ideology: government - opposition, not left - right (Hirst et al., 2010)
Political groups
Government (CDA, PvdA and ChristenUnie)
Left wing opposition (GroenLinks, SP)
Right wing opposition (PVV, VVD)
Parsimonious language models
Hiemstra, D., Robertson, S., and Zaragoza, H. (2004). Parsimonious language models for information retrieval. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’04, pages 178–185, New York, NY, USA. ACM.
E-step: $e_t = tf(t,D) \cdot \frac{\lambda P(t|D)}{(1-\lambda) P(t|C) + \lambda P(t|D)}$
M-step: $P(t|D) = \frac{e_t}{\sum_t e_t}$
Parsimonious language models
Probability distribution from word frequencies per document
Compare distribution with collection of documents
Choose terms that are substantially more frequent than expected
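The estimation described above can be sketched in a few lines of Python. This is a minimal illustration of the two update formulas for e_t and P(t|D), not the code used in the thesis: the function name `parsimonious_lm`, the values of `lam`, `iterations` and `threshold`, the pruning of low-probability terms after each round, and the toy word counts are all assumptions made for the example.

```python
from collections import Counter

def parsimonious_lm(doc_tokens, collection_probs, lam=0.1,
                    iterations=50, threshold=1e-4):
    """Estimate P(t|D) by iterating the two updates for e_t and P(t|D).

    collection_probs maps each term to P(t|C). lam, iterations and
    threshold are illustrative values, not taken from the slides.
    """
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    # Start from the maximum-likelihood estimate of P(t|D).
    p_doc = {t: count / total for t, count in tf.items()}
    for _ in range(iterations):
        # First update: e_t = tf(t,D) * lam*P(t|D) / ((1-lam)*P(t|C) + lam*P(t|D))
        e = {}
        for t, p in p_doc.items():
            p_c = collection_probs.get(t, 1e-12)  # tiny floor for unseen terms (assumption)
            e[t] = tf[t] * (lam * p) / ((1 - lam) * p_c + lam * p)
        # Second update: P(t|D) = e_t / sum_t e_t
        norm = sum(e.values())
        p_doc = {t: v / norm for t, v in e.items()}
        # Drop terms that fell below the threshold, then renormalize.
        kept = {t: p for t, p in p_doc.items() if p >= threshold}
        if not kept:
            break
        norm = sum(kept.values())
        p_doc = {t: p / norm for t, p in kept.items()}
    return p_doc

# Toy data: 'voorzitter' and 'de' are common in the whole collection,
# 'zorg' is much more frequent in this document than expected.
collection_probs = {"voorzitter": 0.5, "de": 0.4, "zorg": 0.05, "belasting": 0.05}
doc = ["voorzitter"] * 5 + ["de"] * 4 + ["zorg"] * 6
model = parsimonious_lm(doc, collection_probs)
```

In this toy example the probability mass moves to 'zorg', because it is the only term that occurs more often in the document than the collection model predicts.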
Parsimonious language models
Filter out corpus-specific stopwords (‘voorzitter’)
Remove noise
Comparison
Two methods: estimated probability and Kullback-Leibler divergence
‘For each political group, estimate the probability that an arbitrary word in a TV programme is one of their characteristic words’
‘Calculate the risk of returning a document to the query’
$\hat{P}(q|TV) = \sum_{t \in q} \frac{tf_{t,TV}}{|TV|}$
$KL(M_d \| M_q) = \sum_{t \in V} P(t|M_q) \cdot \log \frac{P(t|M_q)}{P(t|M_d)}$
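Both comparison measures are easy to compute once the word counts and models are available. The following is a small sketch under stated assumptions, not the thesis implementation: the function names, the `epsilon` floor used when a term is missing from the programme model, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def estimated_probability(characteristic_words, tv_tokens):
    """Probability that an arbitrary word in the programme is one of the
    group's characteristic words: sum over t in q of tf(t,TV) / |TV|."""
    tf = Counter(tv_tokens)
    return sum(tf[t] for t in set(characteristic_words)) / len(tv_tokens)

def kl_divergence(p_query, p_doc, epsilon=1e-9):
    """Sum over t of P(t|Mq) * log(P(t|Mq) / P(t|Md)).

    Terms with P(t|Mq) = 0 contribute nothing, so only the terms of the
    query model are visited. epsilon is an assumed floor that avoids
    division by zero for terms missing from the programme model.
    """
    return sum(
        p * math.log(p / max(p_doc.get(t, 0.0), epsilon))
        for t, p in p_query.items()
        if p > 0
    )

# Toy data: four subtitle tokens and one characteristic word.
tv_tokens = ["zorg", "de", "zorg", "belasting"]
share = estimated_probability(["zorg"], tv_tokens)

# Two toy models over the same two terms.
group_model = {"a": 0.5, "b": 0.5}
programme_model = {"a": 0.25, "b": 0.75}
divergence = kl_divergence(group_model, programme_model)
```

A lower divergence means the programme's language model is closer to the group's model; identical models give a divergence of zero.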
Results
Right never wins
Casual evaluation does not suggest that the right-wing words are ‘strange’
Government and left results are close
Comparison with regular Dutch does suggest a slight preference for left-wing words
Conclusions
Language in Dutch public broadcasting is not particularly left-wing (only a slight preference was found)
Descriptive right-wing words are used less
This might be due to PVV influence; further investigation is needed
Questions?