Data available from LAOS:

Example: LAOS text #341, NLS Ms 34.4.3

Year:

The From Inglis To Scots (FITS) Project and Older Scots phonology

FITS (AHRC grant number AH/L004542/1) is a four-year project at the Angus McIntosh Centre for Historical Linguistics.

Focus: the sound/spelling history of early Scots as evidenced in root morphemes of Germanic origin

Main RQ: What phonological facts underlie the diversity of spelling attested in Scots of the period 1380-1500?

Main output: a freely available, fully searchable online database which establishes, quantifies and visualises relations between units of sound and their spellings.

Possible user-defined questions:
• What sound(s) did the digraph <ch> represent in 15th-century Scots?
• When and where is theta-hardening ([θ] > [t]) attested in early Scots spellings?
• What are the reflexes of Old English /f/ in 15th-century Scots?

Historical corpus phonology: can it be done?

Variation in non-standardised alphabetic systems, such as those of pre-modern Europe, has long been exploited to reconstruct diachronic and diatopic alternants in phonological histories (e.g. McIntosh 1956; Laing & Lass 2003). However, electronic corpora for the history of language are rarely built with phonological questions in mind. Historical sound substance is mediated by a graphic system which makes it difficult to interpret the basic facts. The building of historical phonological corpora, while possible, requires a fair degree of preliminary analysis in order to establish the potential sound-spelling mappings of the language. While this may be a painstaking first step, it is surprising no bespoke tools have thus far been developed to assist in the process.

The original dataset: A Linguistic Atlas of Older Scots (‘LAOS’, Williamson 2008)
• c. 1,250 ‘local documents’: Burgh records, charters, deeds, wills, etc.
• c. 400,000 words
• Mostly localised and dated 1380-1500
• Diplomatically transcribed and lexico-grammatically tagged

Bibliography

Aitken, A. J. & Caroline Macafee. 2002. The Older Scots vowels: A history of the stressed vowels of Older Scots from the beginnings to the eighteenth century. Edinburgh: The Scottish Text Society.

Alcorn, Rhona, Benjamin Molineaux, Joanna Kopaczyk, Vasilios Karaiskos, Bettelou Los & Warren Maguire. 2017. ‘The emergence of Scots: Clues from Germanic *a reflexes’ in J. Cruickshank and R. McColl Millar (eds.) Before the Storm: Papers from the Forum for Research on the Languages of Scotland and Ulster triennial meeting, Ayr 2015, pp. 1-32. Aberdeen: FRLSU.

CoNE. 2013. A Corpus of Narrative Etymologies from Proto-Old English to Early Middle English and accompanying Corpus of Changes, compiled by Roger Lass, Margaret Laing, Rhona Alcorn & Keith Williamson [http://www.lel.ed.ac.uk/ihd/CoNE/CoNE.html]. Edinburgh: Version 1.1, 2013-, © The University of Edinburgh.

Maguire, Warren, Rhona Alcorn, Benjamin Molineaux, Joanna Kopaczyk, Daisy Smith, Vasilios Karaiskos & Bettelou Los. Forthcoming. ‘Investigating evidence for final [v]-devoicing in Older Scots’.

McIntosh, Angus. 1956. ‘The analysis of Middle English texts’. Transactions of the Philological Society 55(1): 26-55.

Molineaux, Benjamin, Joanna Kopaczyk, Warren Maguire, Rhona Alcorn, Vasilios Karaiskos & Bettelou Los. 2016. ‘Tracing L-vocalisation in early Scots’. Papers in Historical Phonology 1, pp. 187-217.

Molineaux, Benjamin, Joanna Kopaczyk, Warren Maguire, Rhona Alcorn, Vasilios Karaiskos & Bettelou Los. Forthcoming. ‘An emergent 15c Scots spelling norm: contrastive voicing in dental fricatives’.

Johnston, Paul. 1997. ‘Older Scots phonology and its regional variation’. In Charles Jones (ed.) The Edinburgh history of the Scots language, 47-111. Edinburgh: Edinburgh University Press.

Kopaczyk, Joanna, Benjamin Molineaux, Vasilios Karaiskos, Rhona Alcorn, Bettelou Los & Warren Maguire. 2018. ‘Towards a grapho-phonologically parsed corpus of medieval Scots: Database design and technical solutions’, Corpora 13(2).

Laing, Margaret & Roger Lass. 2003. ‘Tales of 1001 nists: The phonological implications of litteral substitution sets in some thirteenth-century South-West Midland texts’, English Language and Linguistics 7(2), pp. 257-278.

LAOS. 2008. A Linguistic Atlas of Older Scots, Phase 1: 1380-1500. Compiled by Keith Williamson. Retrieved from http://www.lel.ed.ac.uk/ihd/laos1/laos1.html. The University of Edinburgh.

Grapho-phonological parsing: Mapping spellings to sounds

We assume that our source materials were set down by scribes “capable of sophisticated and subtle linguistic analysis” (Laing & Lass 2003: 258), so we expect there to be a systematic connection (albeit not necessarily a one-to-one match) between orthographic choices and underlying sound systems. Each variant spelling of the root-morphemes in the LAOS corpus is broken up into a sequence of graphemic units, preserving their morphological/graphological context.
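The segmentation step can be illustrated with a minimal sketch. It assumes a greedy longest-match strategy over a small inventory of multi-letter units; both the strategy and the inventory are illustrative assumptions, not the FITS procedure, which rests on manual analysis (Kopaczyk et al. 2018). The function name `segment` is hypothetical.

```python
# Hypothetical inventory of multi-letter graphemic units. Only digraphs
# mentioned on the poster are listed; the real FITS inventory is larger.
MULTI_LETTER_UNITS = {"ch", "th"}

def segment(spelling, units=MULTI_LETTER_UNITS):
    """Split a spelling into graphemic units, trying the longest unit first."""
    longest = max(len(u) for u in units)
    result = []
    i = 0
    while i < len(spelling):
        for size in range(longest, 0, -1):
            chunk = spelling[i:i + size]
            # A single letter is always accepted as a fallback unit.
            if size == 1 or chunk in units:
                result.append(chunk)
                i += len(chunk)
                break
    return result

print(segment("dochter"))  # ['d', 'o', 'ch', 't', 'e', 'r']
```

A greedy match is only a starting point: it cannot decide on its own whether a letter sequence such as <ch> is one unit or two in a given morpheme, which is where the analyst's judgement comes in.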

Each grapheme is then assigned a plausible sound value by triangulating on a number of factors (see Kopaczyk et al. 2018 for details).

The Medusa: Grapho-phonological sets visualisation*

Geographical pinpointing of attestations

Viewing attestations in context (texts)

Mapping sounds to sources: The diachronic dimension

Since a sizeable amount of well-described data is available for the Germanic sources of Older Scots (Old English, Norse and Middle Dutch), we can identify most of the likely historical antecedents of our target morphemes. This allows us to pinpoint particular diachronic trajectories for sounds and morphemes, which also helps us improve the accuracy of our proposed sound values for the Older Scots period. We attempt to match each sound in the Older Scots layer to the attested form in the relevant (usually northern) dialects of Old English, as well as Norse and Middle Dutch. Where there is a mismatch between the source and the corpus form, we propose a change, drawing on existing literature and the general distribution of our attested variants.

What can you do with a grapho-phonologically parsed corpus?

The corpus allows for a fine-grained examination of the phonotactic and morphotactic distribution of individual sound-spelling pairings as well as variation in their values over time, space and text. It further allows users to:
• Select specific sound, orthographic and grammatical environments
• Define temporal and spatial domains for search results
• Trace etymological sources morpheme-by-morpheme and sound-by-sound
• Link etymological sources to corpus attestations via a Corpus of Changes
• Further investigate forms via links to the online Dictionary of the Scots Language and OED
• Access full source texts for context-checking and creating scribal profiles
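As a rough illustration of what defining temporal and spatial domains for a search might look like, the sketch below filters a list of attestation records. The record layout, the `search` function and its parameters are assumptions made for illustration; they do not describe the actual FITS database schema or interface. The spellings are taken from the poster's <ch> examples, but the years and places are invented placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attestation:
    form: str      # attested spelling of the root morpheme
    grapheme: str  # graphemic unit under study
    sound: str     # proposed sound value
    year: int      # date of the source text
    place: str     # localisation of the source text

def search(records, grapheme=None, sound=None, years=None, places=None):
    """Return the records that satisfy every constraint that is not None."""
    hits = []
    for r in records:
        if grapheme is not None and r.grapheme != grapheme:
            continue
        if sound is not None and r.sound != sound:
            continue
        if years is not None and not (years[0] <= r.year <= years[1]):
            continue
        if places is not None and r.place not in places:
            continue
        hits.append(r)
    return hits

# Invented records: years and places are placeholders, not corpus data.
records = [
    Attestation("dochter", "ch", "x", 1420, "PlaceA"),
    Attestation("worchy", "ch", "ð", 1455, "PlaceB"),
    Attestation("chalys", "ch", "ʧ", 1490, "PlaceA"),
]

for hit in search(records, grapheme="ch", years=(1400, 1460)):
    print(hit.form, hit.sound, hit.year, hit.place)
```

Treating every constraint as optional keeps the same function usable for a question about one grapheme across the whole period or about one place in a single decade.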

*Medusa II is under development and will allow mapping of source segments to attestations in our corpus

We start with individual tokens and establish patterns across the entire dataset. As more data is entered in the database, the initial assumptions are reevaluated. Gradually, we establish a network of relationships between the graphemic units and their plausible underlying sounds. We use a bespoke visualisation tool called Medusa.
• Sound substitution sets identify sounds associated with a specific grapheme
• Graphemic substitution sets identify graphemes associated with a particular sound.
Many sounds and graphemes belong to more than one set.
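The two kinds of substitution set are two views of the same list of grapheme-sound pairings, as the following minimal sketch suggests. The function name `substitution_sets` and the data layout are illustrative assumptions, not the Medusa implementation; the sample pairings are a small selection of those mentioned on this poster.

```python
from collections import defaultdict

# (grapheme, sound) pairings: a small sample based on pairings mentioned
# on the poster, not the full FITS data.
pairings = [
    ("ch", "x"), ("ch", "ç"), ("ch", "θ"), ("ch", "ʧ"), ("ch", "k"), ("ch", "ð"),
    ("th", "θ"), ("th", "ð"), ("y", "ð"), ("þ", "ð"),
]

def substitution_sets(pairs):
    """Group pairings into sound substitution sets (keyed by grapheme)
    and graphemic substitution sets (keyed by sound)."""
    sounds_for = defaultdict(set)
    graphemes_for = defaultdict(set)
    for grapheme, sound in pairs:
        sounds_for[grapheme].add(sound)
        graphemes_for[sound].add(grapheme)
    return dict(sounds_for), dict(graphemes_for)

sounds_for, graphemes_for = substitution_sets(pairings)
print(sorted(sounds_for["ch"]))     # sounds written with <ch>
print(sorted(graphemes_for["ð"]))   # graphemes used for [ð]
```

Because each pairing is entered in both mappings, a grapheme such as <ch> and a sound such as [ð] each end up in several sets, which is the overlap the text describes.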

A corpus of Changes

Following the example in CoNE (Lass et al. 2013), we give a detailed description of each one of the changes invoked to map the proposed source form to the plausible FITS sound value.
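The idea of invoking a named change wherever source and corpus form disagree can be sketched as a lookup over aligned sound sequences. Everything here is an assumption for illustration: the `diagnose` function, the one-to-one alignment, and the tiny table of changes, which lists only two changes named on this poster and is no substitute for the actual Corpus of Changes. The input sequences are schematic, not real etyma.

```python
# Hypothetical lookup from (source sound, attested sound) to a change label.
# Only changes named on the poster are listed here.
CHANGES = {
    ("θ", "t"): "theta-hardening",
    ("v", "f"): "final [v]-devoicing",
}

def diagnose(source, attested, changes=CHANGES):
    """Compare two equal-length sound sequences position by position and
    report each mismatch together with the change that could account for it."""
    if len(source) != len(attested):
        raise ValueError("this sketch assumes a one-to-one alignment")
    report = []
    for i, (src, att) in enumerate(zip(source, attested)):
        if src != att:
            report.append((i, src, att, changes.get((src, att), "unexplained")))
    return report

# Schematic sequences, not real etyma.
print(diagnose(["l", "u", "v"], ["l", "u", "f"]))
# [(2, 'v', 'f', 'final [v]-devoicing')]
```

A real mapping would also need to handle insertions, deletions and context-sensitive changes, which a position-by-position comparison cannot express.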

A graphemic substitution set: [ð] & [θ], morpheme initially

The Corpus of Changes:

What have we found out so far?
1. Our period has few localisable innovations, as might be expected from a relatively new dialect (Alcorn et al. 2017).
2. Changes described elsewhere as quick and complete (such as L-vocalisation) may progress slowly over time, phonological environments and the lexicon (Molineaux et al. 2016).
3. Older Scots as a whole innovated distinct spelling conventions such as <y> for [ð] vs. <th> for [θ] (Molineaux et al. forthcoming).
4. Some changes advanced during our period and later probably reversed (especially in the face of Anglicisation), such as the case of pre-inflectional devoicing of fricatives (Maguire et al. forthcoming).

Proportion of medial <y> (orange), <th> (grey) and <þ> (yellow) for etymological [ð] by decade. Black line = data density.
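The figures behind a chart of this kind can be computed along the following lines. This is a sketch under stated assumptions: the function `proportions_by_decade` and the (year, spelling) token format are hypothetical, and the sample tokens are invented to show the arithmetic, not taken from the corpus.

```python
from collections import Counter, defaultdict

def proportions_by_decade(tokens):
    """tokens: iterable of (year, spelling) pairs.
    Returns (shares, density): shares[decade][spelling] is the proportion of
    that spelling in the decade, density[decade] is the token count."""
    counts = defaultdict(Counter)
    for year, spelling in tokens:
        counts[year // 10 * 10][spelling] += 1
    shares, density = {}, {}
    for decade in sorted(counts):
        total = sum(counts[decade].values())
        density[decade] = total
        shares[decade] = {s: n / total for s, n in counts[decade].items()}
    return shares, density

# Invented tokens, for illustration only.
tokens = [(1401, "y"), (1405, "th"), (1409, "y"), (1412, "þ")]
shares, density = proportions_by_decade(tokens)
print(shares)
print(density)
```

Reporting the token count next to the proportions matters because a share based on a handful of tokens in a sparsely documented decade is much less reliable than one based on hundreds.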


A sound substitution set: <ch>

Table view (+ data extraction)

Based on all FITS morphemes with <ch>
• [x] = aucht, dochter, loch …
• [ç] = nicht, echt, hech …
• [θ] = bach, lench, muoch, strencht …
• [ʧ] = chalys, cheike, chekin, cheis …
• [k] = chorn, chynde, chechyne …
• [ð] = worchy, nechtir, skachlaß …

Data capture tool

Grapho-phonological parsing: Corpus annotation for historical phonology

B. MOLINEAUX¹, J. KOPACZYK², V. KARAISKOS¹, D. SMITH¹, W. MAGUIRE¹, R. ALCORN¹ & B. LOS¹
¹ The University of Edinburgh; ² The University of Glasgow

[ð]-morphemes: thus, there, those, thence, etc.

[θ]-morphemes: three, thief, think, thaw, thank etc.

The FITS Toolbox

