Introduction to Language Technology

Christian Hardmeier

2015-11-10

Course goals

Typical language technology products, services, and techniques

The most important subfields of language technology

Problems and methods in parsing, grammar and style checking, document retrieval, information extraction, machine translation, speech recognition, and speech synthesis

Implementing elementary examples

Finding and presenting information about a language technology problem

Course structure: Lectures

F1 10 Nov CH Introduction

F2 11 Nov CH Language models, writing support

F3 18 Nov MD The analogue forms of language

F4 25 Nov BM Text classification

F5 2 Dec CH Machine translation

F6 7 Dec BM Language technology and text research

F7 9 Dec JN Dialogue systems

F8 14 Dec JN Information retrieval

F9 16 Dec CH Poster presentations and wrap-up


Course structure: Labs

1 Words and n-grams
Lab: 11 Nov or 12 Nov
Submission: 18 Nov

2 Tokenization and part-of-speech tagging
Lab: 18 Nov or 20 Nov
Submission: 29 Nov

3 Sentiment analysis
Lab: 25 Nov or 26 Nov
Submission: 2 Dec

Course structure: Assignments

Assignment 1: Investigate a language technology service or product.
Short paper (3–6 pages) and poster presentation

9 Dec: Paper submission
13 Dec: Poster submission
16 Dec: Poster presentation

Assignment 2:
25 Nov to 2 Dec, on lectures 1–4

Assignment 3:
14 Dec to 20 Dec, on lectures 5–8

Many names

Språkteknologi (Swedish for "language technology")

Datorlingvistik (Swedish for "computational linguistics")

Language technology

Computational linguistics (CL)

Natural language processing (NLP)

Natural language engineering (NLE)


Neighbouring fields

Language technology

Linguistics: lexicography, syntax, pragmatics, translation theory

Computer science: algorithms/data structures, programming, HPC

Formal logic

Mathematics

Machine learning

Artificial intelligence

Statistics

Why is it hard?

Human language has properties that are hard to formalise in a way that a computer "understands":

Ambiguity

Vagueness

Incompleteness

Variation

Productivity

Redundancy

(Some slides borrowed from Mattias Nilsson.)

Ambiguity

Can affect individual words as well as structures and reference relations

Lexical ambiguity

Word-class ambiguity:
fiskar: verb ("fishes") or noun ("fish", plural)
var: verb ("was"), determiner ("each"), or noun ("pus")

Ambiguity within a word class (polysemy):
bank (noun)
1. elongated underwater shoal
2. credit institution

Structural ambiguity
Jag såg mannen i parken med kikaren. ("I saw the man in the park with the binoculars.")

Referential ambiguity
She dropped the vase on the floor and broke it.
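Word-class ambiguity can be made concrete with a toy lexicon. The sketch below is only an illustration: the `LEXICON` dictionary, its class names, and the helpers `possible_tags` and `count_readings` are hypothetical, and the entries cover just the two example words from the slide. It shows how quickly the number of candidate analyses grows when each word is ambiguous on its own.

```python
from math import prod

# Hypothetical toy lexicon: each word form maps to the word classes it can have.
# Only the two examples from the slide are listed.
LEXICON = {
    "fiskar": {"verb", "noun"},
    "var": {"verb", "determiner", "noun"},
}


def possible_tags(word):
    """Return the word classes listed for a word, or a single 'unknown' class."""
    return LEXICON.get(word.lower(), {"unknown"})


def count_readings(tokens):
    """Number of distinct tag sequences if every word is tagged independently."""
    return prod(len(possible_tags(token)) for token in tokens)


print(sorted(possible_tags("fiskar")))
print(count_readings(["var", "fiskar"]))  # 3 classes * 2 classes = 6 sequences
```

A real tagger has to pick one of these sequences using context, which is where the methods discussed later in the course come in.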


Vagueness, incompleteness

Vagueness
Human language contains vague or relative concepts.
Where, for example, is the boundary between

afternoon and evening, night and morning
a big and a small dog

Incompleteness
Understanding a linguistic context often requires general background knowledge about what the world looks like and how it works.

There was a ball at the castle. The great hall was festively decorated.

Variation, productivity

Variation
Synonymy: tjej, flicka, tös, jänta (all roughly "girl")
Paraphrase

Han bor i Uppsala. ("He lives in Uppsala.")
I Uppsala bor han. ("In Uppsala he lives.")
Det är i Uppsala som han bor. ("It is in Uppsala that he lives.")
I Uppsala är det som han bor. ("In Uppsala it is that he lives.")
Han bor i den stad som har Sveriges äldsta universitet. ("He lives in the city that has Sweden's oldest university.")

Productivity
Language is constantly "topped up" with new words and with new ways of using old words.
A language consists of an infinite set of sentences.

In Noah's ark there was at least a donkey and a horse and a cow and a ...

We often express ourselves indirectly, e.g. with irony, exaggeration and understatement, metaphors, pragmatic sentences.

Redundancy

In human language, one and the same element of meaning is often signalled in several ways.

Igår sken solen ("Yesterday the sun shone"): past time is expressed both by the tense of the verb and by the adverb igår ("yesterday").

En ny cykel ("a new bicycle"): indefiniteness and number are signalled on every word.

The need for redundancy in the speech signal increases under unsatisfactory external conditions (noise and other disturbances).


What does it look like in real texts?

Which text did you get?
Look at the text and try to find examples of

Ambiguity
Vagueness
Incompleteness
Variation
Productivity
Redundancy

Are there any other special properties that could be hard for language technology systems to handle?


Mol Pharm. 2014 Oct 15. [Epub ahead of print]

Moxifloxacin Loaded Nanoemulsions having Tocopheryl Succinate as Integral Component Improves Pharmacokinetics and Enhances Survival in E Coli Induced Complicated Intra Abdominal Infection.

Shukla P, Verma AK, Dwivedi P, Yadav A, Gupta PK, Rath SK, Mishra PR.

Abstract

In the present work a novel nanoemulsion laden with moxifloxacin has been developed for effective management of complicated Intra-abdominal infections. Moxifloxacin nanoemulsion fabricated using high pressure homogenization was evaluated for various pharmaceutical parameters, pharmacokinetics and pharmacodynamics in rats with E coli induced sepsis. The developed nanoemulsion MONe6 (size 168±28 d.nm and ZP -24.78±0.45mV respectively) was effective for intracellular delivery and sustaining the release of MOX. MONe6 demonstrated improved plasma (AUC MONe6/MOX= 2.38 fold) and tissue pharmacokinetics of MOX (AUC MONe6/MOX= 2.63 and 1.47 times in lung and liver respectively). Calculated PK/PD index correlated well with reduction in bacterial burden in plasma as well as tissues. Enhanced survival on treatment with MONe6 (65.44 %) and as compared to control group (8.22 %) was result of reduction in lipid peroxidation, neutrophil migration and cytokine levels (TNF-α and IL1b) as compared to untreated groups in rat model of E coli induced sepsis. Parenteral nanoemulsions of MOX hold a promising advantage in the therapy of E. coli induced complicated intra-abdominal infections and is helpful in the prevention of further complication like septic shock and death.

PMID: 25317848 [PubMed - as supplied by publisher]


Where Young College Graduates Are Choosing to Live
New York Times, 20 Oct 2014

When young college graduates decide where to move, they are not just looking at the usual suspects, like New York, Washington and San Francisco. Other cities are increasing their share of these valuable residents at an even higher rate and have reached a high overall percentage, led by Denver, San Diego, Nashville, Salt Lake City and Portland, Ore., according to a report published Monday by City Observatory, a new think tank.

And as young people continue to spurn the suburbs for urban living, more of them are moving to the very heart of cities — even in economically troubled places like Buffalo and Cleveland. The number of college-educated people age 25 to 34 living within three miles of city centers has surged, up 37 percent since 2000, even as the total population of these neighborhoods has slightly shrunk.

Some cities are attracting young talent while their overall population falls, like Pittsburgh and New Orleans. And in a reversal, others that used to be magnets, like Atlanta and Charlotte, are struggling to attract them at the same rate.


Anna-Lena @AnnalenaLauren
Moscow correspondent for Hufvudstadsbladet and Svenska Dagbladet

Anna-Lena @AnnalenaLauren · 18 h
Mot Donetsk, än en gång. Taas matkalla Donetskiin. Once again on the road to Donetsk.

Minna Holopainen @HolopainenMinna · 18 Oct (retweeted by Anna-Lena)
Svd skuuppaa: sen mukaan sukellusvene-epäilyyn liittyy havaintoja radioliikenteestä sekä torstaina että eilisiltana svd.se/nyheter/inrike…

Anna-Lena @AnnalenaLauren · 18 Oct
Min Hbl-krönika om demokratiseringens sk frammarsch i världen. Den ser inte ut exakt som vi tror att den gör. hbl.fi/opinion/utrike …

Jon Kyst @jonkyst · 17 Oct (retweeted by Anna-Lena)
Ugens vigtigste begivenhed i Rusland var, at oppositionens ledere @navalny og @khodorkovsky begge afviste, at Krim skal tilbage til Ukraine.

Anna-Lena @AnnalenaLauren · 16 Oct
Min recension i SvD av bla Berners bok "Härskarna i Kreml". Härlig skildring av ryskt 90-tal,en tid av tankens frihet. svd.se/kultur/underst…

Kerstin Kronvall @KKronvall · 14 Oct (retweeted by Anna-Lena)
Det laddas upp inför parlamentsvalet i Ukraina 26.10. Stora marscher i Charkiv och Kiev i kväll. Bland andra är extremhögerns folk ute.


Athough people enjoyment differs from stage to stage, youth enjoys more as compared with the older people is definitely reasonable to agree with the statement. I observed in my life at part, that young people enjoys more.

In modern world, due to the development of technology, they are more fascinated to new kind of objects, such as mobiles, computer, internet, etc., and by using them that were available enjoying life.

Youngers are more affectionated to the day to day development of technology, by using those things they enjoying more. Older people are not that much aware of these technologies. Because technology is developing rapidly in these day to day life. Younger ones are more energetic than older people, they can go and do whatever they want. They make their friends as a part to enjoy in most of the public places, they usually attends party and meetings, and have capability to think and improve by adapting new technologies they enjoys more.

136
00:16:41,880 --> 00:16:43,792
Well, what? He’s not happy?

137
00:16:43,880 --> 00:16:46,759
He can’t be, can he, if he’s, you know,

138
00:16:46,840 --> 00:16:48,797
messing around.

139
00:16:50,520 --> 00:16:52,432
You gonna see him again?

140
00:16:55,080 --> 00:16:57,549
Do you think I should?

141
00:16:57,640 --> 00:17:01,236
- Want me to be honest?
- No. No.

142
00:17:07,640 --> 00:17:09,916
When you gonna take that thing off?

143
00:17:10,000 --> 00:17:13,232
It’s too tight. I’ve got to get it cut off.

144
00:17:13,320 --> 00:17:15,039
Mm.


Digitization

Det mörknar mer och mer: december-
mörker i augusti. De svarta rosenbladen
ha redan skrynklat sig. Men kortlapparna
på mitt bord lysa i gälla skrattande färger
i allt det grå som för att påminna om att
de en gång blefvo uppfunna för att skingra
en sjuk och galen furstes svårmod. Men


Tokenization and sentence segmentation

Det mörknar mer och mer : decembermörker i augusti .

De svarta rosenbladen ha redan skrynklat sig .

Men kortlapparna på mitt bord lysa i gälla skrattande färger i allt det grå som för att påminna om att de en gång blefvo uppfunna för att skingra en sjuk och galen furstes svårmod .

Men ...
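The step from running text to the tokenized sentences above can be sketched with a regular expression. This is a minimal illustration, not the method used to produce the slide: the token pattern `TOKEN_RE` and the set `SENTENCE_END` are assumptions, and the sketch neither rejoins words hyphenated across line breaks nor handles abbreviations such as "t.ex.".

```python
import re

# Assumed token pattern: a run of word characters (optionally with internal
# hyphens), or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")

# Assumed set of sentence-final punctuation marks.
SENTENCE_END = {".", "!", "?"}


def tokenize(text):
    """Split raw text into word tokens and punctuation tokens."""
    return TOKEN_RE.findall(text)


def split_sentences(tokens):
    """Group tokens into sentences, closing a sentence at each end mark."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without a final end mark
        sentences.append(current)
    return sentences


text = ("Det mörknar mer och mer: decembermörker i augusti. "
        "De svarta rosenbladen ha redan skrynklat sig.")
for sentence in split_sentences(tokenize(text)):
    print(" ".join(sentence))
```

Printing the tokens joined by spaces gives the same layout as on the slide, with punctuation marks as separate tokens.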

Syntactic analysis

DT  JJ      NN           VB  AB     VB         PN   MAD
De  svarta  rosenbladen  ha  redan  skrynklat  sig  .

[Dependency tree over the sentence; arc labels: DT, AT, SS, TA, VG, OO, IP]
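A tagged sentence like the one above is commonly stored as a list of (token, tag) pairs. The sketch below uses the tokens and part-of-speech tags exactly as they appear on the slide; the helper `words_with_tag` is a hypothetical name added for illustration.

```python
# Tokens and part-of-speech tags from the slide, aligned one to one.
tokens = ["De", "svarta", "rosenbladen", "ha", "redan", "skrynklat", "sig", "."]
tags = ["DT", "JJ", "NN", "VB", "AB", "VB", "PN", "MAD"]

# Pair every token with its tag.
tagged = list(zip(tokens, tags))


def words_with_tag(tagged_sentence, tag):
    """Return all tokens that carry the given part-of-speech tag."""
    return [token for token, token_tag in tagged_sentence if token_tag == tag]


print(tagged)
print(words_with_tag(tagged, "VB"))  # ['ha', 'skrynklat']
```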

Deeper analysis

times in the novel or are responsible for less than 1% of the named entity mentions in the novel.

We assigned undirected edges between vertices that represent adjacency in quoted speech fragments. Specifically, we set the weight of each undirected edge between two character vertices to the total length, in words, of all quotes that either character speaks from among all pairs of adjacent quotes in which they both speak – implying face to face conversation. We empirically determined that the most accurate definition of “adjacency” is one where the two characters’ quotes fall within 300 words of one another with no attributed quotes in between. When such an adjacency is found, the length of the quote is added to the edge weight, under the hypothesis that the significance of the relationship between two individuals is proportional to the length of the dialogue that they exchange. Finally, we normalized each edge’s weight by the length of the novel.
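One possible reading of the edge-weight procedure in the excerpt can be sketched as follows. This is not the authors' code: the `Quote` record, the function name, the way the 300-word gap is measured (from the end of one quote to the start of the next), and the choice to count each quote at most once per edge are all assumptions, and the example quotes are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Quote(NamedTuple):
    speaker: str  # character the quote is attributed to
    start: int    # word offset where the quote begins
    length: int   # length of the quote in words


def conversation_weights(quotes, novel_length, max_gap=300):
    """Edge weights from adjacent quotes spoken by two different characters.

    `quotes` must contain only attributed quotes, sorted by `start`, so that
    consecutive items have no attributed quote between them.
    """
    counted = defaultdict(set)  # edge -> indices of quotes already counted
    for i in range(len(quotes) - 1):
        first, second = quotes[i], quotes[i + 1]
        gap = second.start - (first.start + first.length)
        if first.speaker != second.speaker and gap <= max_gap:
            edge = frozenset((first.speaker, second.speaker))
            counted[edge].update((i, i + 1))
    # Sum the quote lengths per edge and normalise by the length of the novel.
    return {
        edge: sum(quotes[i].length for i in indices) / novel_length
        for edge, indices in counted.items()
    }


# Invented example data.
quotes = [
    Quote("Fanny", 0, 10),
    Quote("Edmund", 50, 20),
    Quote("Fanny", 100, 30),
    Quote("Mary", 1000, 5),  # too far from the previous quote to be adjacent
]
weights = conversation_weights(quotes, novel_length=2000)
print(weights)
```

With these invented quotes, only the Fanny/Edmund edge is created, with weight (10 + 20 + 30) / 2000.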

An example network, automatically constructed in this manner from Jane Austen’s Mansfield Park, is shown in Figure 1. The width of each vertex is drawn to be proportional to the character’s share of all the named entity mentions in the book (so that protagonists, who are mentioned frequently, appear in larger ovals). The width of each edge is drawn to be proportional to its weight (total conversation length).

We also experimented with two alternate methods for identifying edges, for purposes of a baseline:

1. The “correlation” method divides the text into 10-paragraph segments and counts the number of mentions of each character in each segment (excluding mentions inside quoted speech). It then computes the Pearson product-moment correlation coefficient for the distributions of mentions for each pair of characters. These coefficients are used for the edge weights. Characters that tend to appear together in the same areas of the novel are taken to be more socially connected, and have a higher edge weight.

2. The “spoken mention” method counts occurrences when one character refers to another in his or her quoted speech. These counts, normalized by the length of the text, are used as edge weights. The intuition is that characters who refer to one another are likely to be in conversation.
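The "correlation" baseline from the list above can be sketched in a few lines. The mention counts are invented, the character names A, B and C are placeholders, and returning 0.0 for a character with a constant mention count is an assumption made here to avoid a division by zero.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series counts as uncorrelated
    return cov / sqrt(var_x * var_y)


# Invented mention counts, one number per 10-paragraph segment.
mentions = {
    "A": [3, 0, 2, 5],
    "B": [6, 0, 4, 10],
    "C": [0, 4, 1, 0],
}

# One edge weight per unordered pair of characters.
edge_weights = {
    (a, b): pearson(mentions[a], mentions[b])
    for a, b in combinations(sorted(mentions), 2)
}
for pair, weight in edge_weights.items():
    print(pair, round(weight, 3))
```

Characters A and B are mentioned in the same segments and get a weight close to 1, while A and C tend to appear in different segments and get a negative weight.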

Figure 1: Automatically extracted conversation network for Jane Austen’s Mansfield Park.

4.4 Evaluation

To check the accuracy of our method for extracting conversational networks, we conducted an evaluation involving four of the novels (The Sign of the Four, Emma, David Copperfield and The Portrait of a Lady). We did not use these texts when developing our method for identifying conversations. For each book, we randomly selected 4-5 chapters from among those with significant amounts of quoted speech, so that all excerpts from each novel amounted to at least 10,000 words. We then asked three annotators to identify all the conversations that occur in all 44,000 words. We requested that the annotators include both direct and indirect (unquoted) speech, and define “conversation” as in the beginning of Section 4, but exclude “retold” conversations (those that occur within other dialogue).

We processed the annotation results by breaking down each multi-way conversation into all of its unique two-character interactions (for example, a conversation between four people indicates six bilateral interactions). To calculate inter-annotator agreement, we first compiled a list of all possible interactions between all characters in each text. In this model, each annotator contributed a set of “yes” or “no” decisions, one for every character pair. We then applied the kappa measurement for agreement in a binary classification problem (Co-

(Elson et al., ACL 2010)
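The two evaluation steps in the excerpt, breaking a multi-way conversation into character pairs and measuring agreement on yes/no decisions, can be sketched as below. The excerpt is cut off before it names the kappa variant, so Cohen's kappa for two annotators is an assumption here (the paper used three annotators), and the function names and label lists are invented.

```python
from itertools import combinations


def bilateral_interactions(participants):
    """All unordered character pairs in one multi-way conversation."""
    unique = sorted(set(participants))
    return {frozenset(pair) for pair in combinations(unique, 2)}


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' binary (True/False) decisions."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_a = sum(labels_a) / n  # share of "yes" decisions, annotator A
    p_b = sum(labels_b) / n  # share of "yes" decisions, annotator B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return 1.0  # both annotators gave the same constant answer
    return (observed - expected) / (1 - expected)


# A conversation between four people yields six bilateral interactions.
print(len(bilateral_interactions(["A", "B", "C", "D"])))

# Invented yes/no decisions, one per character pair.
annotator_1 = [True, True, False, False]
annotator_2 = [True, False, False, False]
print(cohen_kappa(annotator_1, annotator_2))
```

In the invented example the annotators agree on three of four pairs; chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5).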

Attribute  Strindberg                  Ibsen
Gender     Pronouns 3rd: Female        Pronouns 3rd: Female
           Honorifics: Female          Family words: Female
           Determiners: Male           Modals: Male
Age        Nouns singular: Old         Family words: Young
           Family words: Young         Verbs sing. pres. non-3rd: Young
           Modals: Young               Prepositions: Old
Class      For/with: Lower             For/with: Lower
           Verbs past part.: Upper     Honorifics: Lower
           Honorifics: Lower           Nouns singular: Lower

Table 6: Most useful features for gender, age, and class for each author, determined by examining the coefficients of classifiers that performed above baseline during cross-author testing. The pairs in the table consist of a linguistic feature and the label indicated by more frequent use of that feature (e.g. for Strindberg, third-person pronoun usage contributed to predicting gender, with greater usage indicating a female character). Features marked in bold are shared between authors for a given attribute.

validation, if the trained classifier for a given fold performed above the baseline of its own test set, then we record its three most heavily weighted features. At the end, we have a tally of which features most often appeared in the top three features for such better-performing classifiers. We can use this to compare which features were more consistently involved for each author and attribute pair, as shown in Table 6.
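The tallying procedure described in this paragraph can be sketched with a counter. The fold records, feature names, and numbers below are invented, and ranking features by the absolute value of their coefficient is an assumption about what "most heavily weighted" means.

```python
from collections import Counter


def tally_top_features(folds, top_n=3):
    """Count how often each feature is among the top_n weights of a fold
    whose classifier beat the baseline of its own test set."""
    tally = Counter()
    for fold in folds:
        if fold["accuracy"] <= fold["baseline"]:
            continue  # only better-performing classifiers contribute
        weights = fold["weights"]
        # Assumption: rank features by the absolute value of their weight.
        ranked = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
        tally.update(ranked[:top_n])
    return tally


# Invented cross-validation results.
folds = [
    {"accuracy": 0.7, "baseline": 0.6,
     "weights": {"family": 2.0, "modals": -1.5, "pron3": 1.0, "det": 0.1}},
    {"accuracy": 0.5, "baseline": 0.6,
     "weights": {"family": 0.3, "modals": 0.2, "pron3": 0.1, "det": 0.9}},
    {"accuracy": 0.8, "baseline": 0.6,
     "weights": {"family": 0.5, "honorifics": 0.9, "pron3": -1.2, "det": 0.2}},
]
print(tally_top_features(folds).most_common())
```

The second fold is skipped because it does not beat its baseline, so its heavily weighted `det` feature never enters the tally.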

Some of the useful features are more intuitive than others. For example, as mentioned in an earlier section, it seems reasonable that family words may relate to depictions of gender roles of the time period in which the plays were written, with women being expected to take on social roles more confined to the home. This appears to be true for Ibsen, but not for Strindberg. We also see family words suggesting young characters for both authors’ texts. It seems intuitive that authors may have chosen to depict children as spending more time around family members, including using family terminology as terms of address. The use of honorifics is also as predicted earlier in the paper: lower class characters use more higher terms of address, presumably when interacting with their superiors. Another interesting result is the frequency of third-person pronouns being the most useful predictor of gender, indicating female characters for both authors. Possibly, women may have spoken more about other people than men did in these texts.

Some other results are not as easy to explain. For example, the use of the prepositions for and with was consistently the most useful predictor of lower class characters (which could explain why the models performed comparably on opposite authors in Table 5). An interesting result was the more frequent use of singular, present tense, non-third person verbs among young characters in the Ibsen texts. This suggests that young characters used more verbs centered around I and you in the present tense. One possible explanation is that children were depicted as being more involved in their own personal world, speaking less about people they were not directly interacting with in a given moment.

5 Conclusion

We have presented a dataset of translated plays by August Strindberg and Henrik Ibsen, along with computational models for predicting the sociolinguistic attributes of gender, age, and social class of characters using the aggregation of their textual lines-to-be-spoken. We compared the performance and important features of the models in both intra- and inter-author testing using a cross-author validation procedure, finding that models generally performed above challenging baselines for their own authors, but less so for the other, as one would expect. The exception was the social class variable, which was consistently slightly above baseline regardless of the author used for testing. While this could indicate that the translated Strindberg and Ibsen texts conveyed social class using similar linguistic cues, this remains a topic for future exploration, given the class imbalance for that attribute. We also examine some indicative features for each attribute and author pair, identifying similarities and differences be-


(Bullard and Ovesdotter Alm, CLfL 2014)


Language technology methods

Before the 1990s, the field was dominated by rationalist (deductive, rule-based, symbolic) methods.

Linguistic theory and logical rule systems shape the technology.
Computational linguists try to program human linguistic knowledge into the computer.

Today, the field is dominated by empirical (inductive, data-driven, statistical) methods.

Corpus data and statistical models shape the technology.
Computational linguists try to program the computer so that it learns knowledge about language from corpus data.

Machine learning is very effective, but it requires good models!
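The contrast between the two approaches can be illustrated with a toy part-of-speech tagger. Everything in the sketch is invented for illustration: the hand-written lexicon, the tiny tagged corpus, and the function names. The first tagger contains knowledge typed in by a person; the second estimates the most frequent tag per word from data.

```python
from collections import Counter, defaultdict

# Rule-based: a person writes the linguistic knowledge into the program.
HAND_WRITTEN_LEXICON = {"solen": "NN", "sken": "VB"}


def rule_based_tag(word):
    """Look the word up in the hand-written lexicon."""
    return HAND_WRITTEN_LEXICON.get(word, "UNKNOWN")


# Data-driven: the program derives comparable knowledge from a tagged corpus.
def train_most_frequent_tag(tagged_corpus):
    """Learn, for every word, the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}


# Invented training data: "fiskar" occurs twice as a noun and once as a verb.
corpus = [("fiskar", "VB"), ("fiskar", "NN"), ("fiskar", "NN"), ("var", "VB")]
model = train_most_frequent_tag(corpus)

print(rule_based_tag("solen"))  # NN, because a person listed it
print(model["fiskar"])          # NN, because that tag is most frequent in the data
```

Even this very simple statistical model embodies a modelling decision (one tag per word, ignoring context), which is the sense in which machine learning still needs good models.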

Next steps

2015-11-11, 10–12 (tomorrow):
Language models, writing support

Dickinson et al., ch. 2

2015-11-11, 13–15 or 2015-11-12, 13–15:
Lab on words and n-grams

Read the lab instructions!

Assignment 1

Present a language technology application

Description

Economic aspects:
Who provides the service?
Who uses it?
How does it make money?

Technological aspects:
How does the service work?
Which technologies does it use?


Assignment 1

Short paper, 3–4/4–6 pages (+ references)
Formatted as an ACL short paper.
Scientific style with source citations etc.
Submission on 10 December.

Poster in A3 format (max. 2 sheets)
Submission on 14 December.
Presentation on 17 December.

Scientific style?

Distinguish clearly between your own contributions and those of others.

Give a clear account of the sources you have used.

A multifactorial analysis of explicitation in translation

Sandrine Zufferey and Bruno Cartoni
Utrecht University / University of Geneva

The search for translation universals has been an important topic in translation studies over the past decades. In this paper, we focus on the notion of explicitation through a multifaceted study of causal connectives, integrating four different variables: the role of the source and the target languages, the influence of specific connectives and the role of the discourse relation they convey. Our results indicate that while source and target languages do not globally influence explicitation, specific connectives have a significant impact on this phenomenon. We also show that in English and French, the most frequently used connectives for explicitation share a similar semantic profile. Finally, we demonstrate that explicitation also varies across different discourse relations, even when they are conveyed by a single connective.

Keywords: translation universals, explicitation, discourse connectives, causality, corpus-based translation studies

1. Introduction

An important topic in translation studies over the past decades has been the identification of ‘translation universals,’ that is properties of translated texts triggered by the process of translation (e.g., Vanderauwera 1985; Mauranen and Kujamäki 2004). Many potential translation universals have been formulated, such as simplification (Laviosa-Braithwaite 1997), conventionalization (Baker 1993) and under-representation of target-language specific items (Tirkkonen-Condit 2004). But one of the most well known hypotheses about translation universals, called the “explicitation hypothesis,” was formulated by Blum-Kulka (1986, 292):

[The explicitation hypothesis] postulates an observed cohesive explicitness from SL to TL regardless of the increase traceable to differences between the two linguistic and textual systems involved. It follows that explicitation is viewed here as inherent in the process of translation.

Target 26:3 (2014), 361–384. doi 10.1075/target.26.3.02zuf
issn 0924–1884 / e-issn 1569–9986 © John Benjamins Publishing Company


382 Sandrine Zufferey and Bruno Cartoni

5. Absolute uses designates uses of a connective in a self-contained answer, as in the following exchange: Marie: « Pierre n’est-il pas formidable » ? — Jeanne: « En effet ».

6. The absolute numbers for each relation are: justification = 319 and confirmation = 71.

References

Asher, Nicholas. 1993. Reference to Abstract Objects in Discourse. Dordrecht: Kluwer. DOI: 10.1007/978-94-011-1715-9

Baker, Mona. 1993. “Corpus Linguistics and Translation Studies: Implications and Applications.” In Text and Technology: In Honour of John Sinclair, ed. by Mona Baker, Gill Francis, and Elene Tognini-Bonelli, 233–250. Amsterdam: John Benjamins.

Baker, Mona. 1995. “Corpora in Translation Studies: An Overview and Some Suggestions for Future Research.” Target 7 (2): 223–243. DOI: 10.1075/target.7.2.03bak

Becher, Viktor. 2010a. “Abandoning the Notion of ‘Translation-Inherent’ Explicitation: Against a Dogma of Translation Studies.” Across Languages and Cultures 1 (1): 1–28. DOI: 10.1556/Acr.11.2010.1.1

Becher, Viktor. 2010b. “Towards a More Rigorous Treatment of the Explicitation Hypothesis in Translation Studies.” Trans-Kom 1: 1–25.

Becher, Viktor. 2011. “When and Why Do Translators Add Connectives?” Target 23 (1): 26–47. DOI: 10.1075/target.23.1.02bec

Blum-Kulka, Shoshana. 1986. “Shifts of Cohesion and Coherence in Translation.” In Interlingual and Intercultural Communication, ed. by Juliana House, and Shoshana Blum-Kulka, 17–35. Tübingen: Narr.

Cartoni, Bruno, and Thomas Meyer. 2012. “Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies.” In Proceedings of Language Resources and Evaluation Conference, 2132–2137. Istanbul, Turkey.

Cartoni, Bruno, Sandrine Zufferey, and Thomas Meyer. 2013. “Using the Europarl Corpus for Linguistic Research.” Belgian Journal of Linguistics 27: 23–42. DOI: 10.1075/bjl.27.02car

Charolles, Michel, and Benjamin Fagard. 2012. “En effet en français contemporain: de la confirmation à la justification/explication.” Le français Moderne 80 (2): 137–164.

Danlos, Laurence. 2012. “Formalisation des Conditions d’Emploi des Connecteurs ‘en réalité’ et ‘(et) en effet’.” In Proceedings of 3e Congrès Mondial de Linguistique Française, 493–508. Lyon, France.

Degand, Liesbeth. 2000. “Contextual Constraints on Causal Sequencing in Informational Texts.” Functions of Language 7 (2): 173–201. DOI: 10.1075/fol.7.2.02deg

Degand, Liesbeth, and Henk Pander Maat. 2003. “A Contrastive Study of Dutch and French Causal Connectives on the Speaker Involvement Scale.” In Usage-Based Approaches to Dutch: Lexicon, Grammar, Discourse, ed. by Arie Verhagen, and Jeroen van de Weijer, 175–199. Utrecht: LOT.

Degand, Liesbeth, and Benjamin Fagard. 2012. “Competing Connectives in the Causal Domain. French Car and Parce Que.” Journal of Pragmatics 34 (2): 154–168. DOI: 10.1016/j.pragma.2011.12.009


Tips and instructions

http://stp.lingfil.uu.se/~matsd/pub/akupp.pdf

Introduction to writing academic papers in language technology

http://www.im.uu.se/student/guide-till-harvardsystemet/

Instructions for a common way of handling citations and references

Different ways of finding scientific literature

If you have already found an article or a book, you can

look in its bibliography to find older literature

look it up in, for example, Google Scholar to find newer literature

Different ways of finding scientific literature

If you are starting from scratch, you can, for example,

search the web (with caution)

search the library
http://www.ub.uu.se/soktips-och-sokteknik/

search databases, for example

ACL Anthology (language technology)
http://www.aclweb.org/anthology/

MT Archive (machine translation)
http://www.mt-archive.info/

MLA International Bibliography (linguistics)
through the university library's web page

ask someone for help ...