View
17
Download
0
Category
Preview:
Citation preview
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
149
Received20September2017.
Accepted30October2017.
VARIATIONOFTHESECONDPERSONSINGULAROFTHESIMPLEPAST
TENSEINTWITTER:HICISTEVS.HICISTES‘YOUDID’1
AntonioRUIZTINOCO
UniversidadSophia*∗
a-ruiz@sophia.ac.jp
Abstract
Inthisstudy,thegeographicaldistributionofthe-saddedtothesimplepasttenseofindicative
inseveralhighfrequencyverbsisanalyzed,i.e.hicistesvshiciste.Fordoingso,ageocorpusformedby
geolocatedtweetshasbeenused.Thedataobtainedsuggestthatthetraditionalexplanationofaddition
byanalogywithotherverbtensesisstillvalid,especiallyinruralareas,althoughitisnottheonlyfactor.
Inurbanareasofvoseo,asisthecaseoftheRíodelaPlatavariety,thereisnogreatdifferenceinthe
ratioof-saddedwiththesurroundingruralareas,ashadbeenconsideredinpreviousstudies.
Keywords
morphologicalvariation,geocorpus,socialnetworks,GIS
LAVARIACIÓNDELASEGUNDAPERSONADELPLURALDELPRETÉRITOPERFECTO:HICISTEVS.HICISTES
Resumen
Enelpresenteestudioseanalizaladistribucióngeográficadela-sañadidaalpretéritosimplede
indicativoenvariosverbosdealta frecuencia, comoenhicistes vshiciste.Paraello sehautilizadoun
geocorpusformadoportuitsgeolocalizados.Losdatosobtenidossugierenquelaexplicacióntradicional
1ThisstudyisarevisedversionofacommunicationintheXLIIICongressoftheJapaneseAssociationofHispanists held at theUniversity of Kanagawa, Japan, onOctober 7 and 8, 2017. Thiswork has beendeveloped under the FFI2013-41077-P project, funded by the Spanish Ministerio de Economía yCompetitividadandbyJSPSKAKENHIGrantNumber15K02527.*FacultyofForeignStudiesDepartmentofHispanicStudies,7-1Kioicho,Chiyoda,Tokyo102-0094.
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
150
de añadidura por analogía con otros tiempos verbales sigue siendo válida, especialmente en áreas
rurales, aunque no es el único factor. En zonas urbanas de voseo como es el caso de la variedad
rioplatense,noseapreciaunagrandiferenciaderatiode-sañadidasconlaszonasruralescircundantes,
comosehabíaconsideradoenestudiosanteriores.
Palabrasclave
variaciónmorfológica,geocorpus,redessociales,SIG
1.Introduction
The formof the2ndpersonof thepast tenseending in -s ashicistes ‘youdid’
occurs frequently and is a phenomenon of variation on which there are abundant
studiesinalmostallSpanish-speakingcountries,suchastheworksbyCharlesE.Kany
(1945) and John Lipski (1994). Lipski considers it themost acceptable non-standard
formintheareasofvoseo.ThephenomenonhasalsobeenstudiedbyMaríaVaquero
(1998). Other more recent quantitative approaches are the works by Sonia Barnes
(2012)andClaudiaParodi (2015) inSpanish-speakingareasoftheUnitedStates,etc.
Ralph Penny (2002) attributes hicistes to the analog extension of the final -s in the
other 2nd person forms. Although this form is not academically accepted, it is
frequently used in the informal style.On the other hand, according to FragoGracia
(2003), it is observed in low and middle social classes in both Spanish and Latin
AmericanSpanish.TheRAE(2010)considersthenon-standardformnon-correcttoday
althoughitmentionsthatitisfrequentinhistoricaltexts.Ingeneral,itisaruralform,
discreditedandstigmatized.Manyofthestudiesofthisphenomenonhavebeenbased
onspokenSpanishindifferentperiods,includinghistoricalSpanishandlimitedtosome
geographicalareas.Thus,itisdifficulttoobtainaglobalvision.
Thiswork analyzes the geographical distribution of the -s added to the simple
pasttenseof indicative inasampleofgeolocatedtweetscontainingformsofseveral
highfrequencyverbsinTwittermicroblogs.Fordoingso,ageocorpusofapproximately
32milliontweetsfromalltheSpanish-speakingareasandcollectedbetweenJanuary
2016andOctober2017hasbeenused.Whencomparingtheoccurrencesofstandard
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
151
andnon-standardforms,differencesinfrequencyofusedependingoneachverbhave
beenfound.Thesedifferencescannotbeattributedsolelytothespeaker’sperception
of an analogy with the 2nd person of other verbal tenses. Moreover, relative
frequenciesinareasofvoseodonotseemtofavortheiruseespecially.
InCatalan,therearealsoseveralcasesofcontrastbetween-sand∅,inthiscase
in relation to the 2nd person of the plural of the imperative of certain verbs. This
persondoesnotshowtheusual2ndpersonmorpheme -s inmostverbs,although it
appearsinsome(dir‘tosay’,ser‘tobe’,estar‘tobe’,fer‘todo’,etc.).Insomedialects
there isacontrastbetweenthepresenceandthedeletionof -s,especiallywhenthe
verbisfollowedbyaclitic.Thus,“Laflexióverbalenelsdialectescatalans”,byAntoni
M. Alcover & Francesc de B. Moll, records the following forms for the 2nd person
singular of the imperative of the verb dir ‘to say’: digues – dis, according to the
dialects,butdigue’mvsdis-me,digue-livsdigues-li–dis-li,whentheyarefollowedby
apronoun.Theresultsofanongoingresearchwillshowiftheformsaremaintainedin
the respective dialectal areas, if the action of the standardmay have increased the
solutionswith-sandiftherearepossiblepointsincommonwithSpanish.
The data has been inserted into a MySQL relational database. For the
preparationoftherespectivedistributionmapsandthevisualizationoftheresultsof
theanalysis,opensourcesoftwareQGIShasbeenused.
The resulting corpus allows us to compare the use of such forms in the same
periods of time in distant areas, which would not have been possible through
traditionalmethodsofdatacollection. Inthiswaythephenomenoncanbeobserved
fromdifferentperspectives.
2.Geocorpus
The data comes from a geocorpus of collected tweets that have the following
characteristics:
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
152
a.ThedatahasbeenobtainedthroughdirectconnectiontoStreamingAPI1.1of
Twitter,2whichisapproximately1%ofalltweetsthatcanbeobtainedfreely.
b.Onlygeocodedtweets,3whichareafractionofthetotal,havebeencollected,
andforwardedtweets,retweets,havebeenexcludedbecausetheyareduplicatesand
donotnecessarilycoincidewiththeplaceoforigin.
c. From January 1, 2016 to October 1, 2017, 24 hours a day, approximately
32,400,000 tweets have been collected; so, a high degree of synchrony can be
considered.
d. For the preparation of this geocorpus and according to our objectives, a
modifiedversionof140devStreamingAPIFramework,4programmedinPHPbyAdam
Greenhasbeenused.
e. The tweets are arranged in aMySQL database,5accessiblewith a password
overtheInternet.
3.Dataselection
Forouranalysiswehavechosentenverbsinthe2ndsingularpersonalformof
the simple past tense with high frequencies. We have used the corpus of Spanish
(Web-Dialects)6by Mark Davies (2016) applying the search formula “(*ste.[VIS*])”
withwhichweobtainthefrequencyofthestandardverbformsofthesimplepastof
tense indicativeending in -ste,orderedbyfrequency.Theform fuiste,corresponding
tothepasttenseoftheverbsser‘tobe’andir‘togo’,hasbeeneliminated.InFigure1
thelistofobtainedverbsisshown,fromwhichwechoosethetenmostfrequentones:
hiciste‘youdid’,dijiste‘yousaid’,diste‘yougave’,tuviste‘youhad’,dejaste‘youleft’,
llegaste ‘youarrived’,pusiste ‘youput’,escribiste ‘youwrote’,pudiste‘youwereable
to’andestuviste‘youwere’.
2https://developer.twitter.com/en/docs/tutorials/consuming-streaming-data3Thatis,containingthefieldsoflongitudeandlatitudeoftheplaceofemission.4http://140dev.com/free-twitter-api-source-code-library/5https://www.mysql.com/6https://www.corpusdelespanol.org/web-dial/
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
153
Figure1.Frequentverbsinthe2ndpersonfromthesimplepast
Although the Corpus of Spanish is lemmatized, it does not distinguish non-
standard forms as hicistes ‘you did’, etc., so they have to be searched individually.
Figure 2 shows the absolute and relative frequencies of hicistes in the Spanish-
speakingcountries.
Figure2.Frequenciesoftheformhicistes‘youdid’intheCorpusofSpanish(Web-Dialects)
383143524607480655785848
71948413
1080014650
23352
0 5000 10000 15000 20000 25000
PERDISTEESTUVISTEPUDISTE
ESCRIBISTEPUSISTE
LLEGASTEDEJASTETUVISTEDISTEDIJISTEHICISTE
Frequentverbs(second-personsingular,preterite)
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
154
Moreover,Figure2showsthatthefrequencypermillionwordsvariesbetween
0.03p.m.fromtheDominicanRepublic,althoughofasingleoccurrence,and0.86p.m.
from Nicaragua. These data suggest that there is a considerable difference in the
extentofthephenomenonamongSpanish-speakingcountries,butitisnotpossibleto
seemoredetaileddistributionsduetothemethodologyusedforthiscorpus.
4.Interface
Tosearchthegeocorpusinthedatabase,theAdminer7interfacehasbeenused,
asshowninFigure3.
Figure3.Adminerinterfaceshowingpartialsearchresults
Adminer has been used because it allows us to easily apply all the formulas
availableinMySQL,includingregularexpressions.Italsoallowstoexporttheresultsto 7https://www.adminer.org/
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
155
various formats for further processing. In Figure 4 some data and several fields are
showninMS-Excel,includingthetext,theexactdateandthecoordinatesoftheplace
fromwhereeachtweetwassent.
Figure4.Partialdataofdistes‘yougave’inMS-Excel
5.Dataobtained
The data obtained from all the Spanish-speaking countries of the ten selected
verbsisshowninTable1.Thereare33,921standardforms,961non-standard,andthe
ratioperverbvariesbetween1,97%forhiciste/sand4,18%fordiste/s.
VERBFORM without-s with-s ratio%
estuviste/s 1613 49 3.03
pudiste/s 1060 30 2.83
escribiste/s 255 8 3.13
pusiste/s 1290 39 3.02
llegaste/s 6500 229 3.52
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
156
VERBFORM without-s with-s ratio%
dejaste/s 3841 106 2.75
tuviste/s 1570 34 2.16
diste/s 3987 167 4.18
dijiste/s 4603 117 2.54
hiciste/s 9202 182 1.97
Total 33921 961 2.83
Table1.Frequenciesobtainedfromtheformswith-sandwithout-s
ThedistributionofoccurrencesbycountryisshowninTable2.
Table2.Distributionofnon-standardformsbycountry
Table 2 shows that Argentina, Spain and Venezuela have most occurrences;
whereas,CostaRica,ElSalvador,etc.havefewest.Forthatreason,weneedmoredata
2
3
3
4
6
7
12
13
19
22
32
43
49
51
51
76
141
176
229
0 50 100 150 200 250
CostaRica
ElSalvador
Honduras
DominicanRepublic
Paraguay
Guatemala
Nicaragua
Chile
PuertoRico
Peru
UnitedStates
Ecuador
Mexico
Panama
Uruguay
Colombia
Venezuela
Spain
Argentina
Non-standardforms(occurrences)
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
157
sothatthesefigurescanbesignificant.Lowfiguresarenotonlyduetothelowertotal
population,butalsotheymaybeduetothesmallerpopulationthatusesTwitter.
Table3showstheproportionofuseofthetwoforms,whichgivesusanewview
of the phenomenon. Thus, the highest proportion is found in Puerto Rico (18.45%),
Panama(14.74%),Nicaragua(13.33%),Venezuela(7.6%),etc.CountriessuchasSpain
(5.31%)orArgentina(1.51%)havelowerratiosasawhole,asshowninFigure5.
Table3.Distributionbycountryofnon-standardformsvsstandardformsratios
0,77
1,01
1,11
1,16
1,51
2,64
2,67
3,31
3,32
3,61
4,29
4,6
5,31
5,57
6,86
7,6
13,33
14,74
18,45
0 2 4 6 8 10 12 14 16 18 20
DominicanRepublic
Chile
Mexico
Paraguay
Argentina
Guatemala
CostaRica
Colombia
Peru
ElSalvador
Honduras
UnitedStates
Spain
Uruguay
Ecuador
Venezuela
Nicaragua
Panama
PuertoRico
StandardvsNon-standardformsratios%
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
158
Figure5.Absoluteresults(occurrences)vs.relativeresults(ratio)
6.Visualizationofthedata
For the visualization of the data,QGIS8 ver. 2.18, a free and open source GIS
applicationhasbeenused. Itcontainsadatamanagementsystemasadatabase.This
GIS system allows various forms of data selection and visualization. For example,
Figures6aand6bshowasmallreddotforeachoccurrenceofthestandard(6a)and
non-standard (6b) forms, proving that bothoccur in all Spanish-speaking areas,with
higher absolute frequency in the most populated areas, such as Argentina, Spain,
Venezuela,Colombiaandothergeographicalareas.
Figure6a.Standardformswithout-s Figure6b.Non-standardformswith-s
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
159
For a better visualization, the zones of higher absolute frequencies can be
presentedwithheatmaps,asinFigures7aand7b.Thismethodisaformofclustering,
since it distinguishes the zones of higher absolute frequencybymeansof a scale of
colors.Theparametersforthecolorscalecanbeeasilymodifiedbyvaryingtheradius
ofinfluence(bymeters,kilometers,etc.)ofeachoftheoccurrences.Figure7ashows
the heatmap corresponding to the distribution of the standard forms and Figure 7b
showstheformsfinishingwith-s.Inthesemapswecanseesomeareas(inred)where
non-standardoccurrencesare concentrated, suchas southernSpain,Ríode laPlata,
Venezuela,etc.
Tobettervisualizethedatawithgreatergranularitywecanreducethemapto
a specific geographical area as shown in Figures 8a and 8b, of Spain and Argentina
respectively,andfordoingsowehavemodifiedtheradiusofinfluence.
Figure7a.Standardformswithout-s Figure7b.Non-standardformswith-s
Figure8a.Non-standardforms(Spain) Figure8b.Non-standardforms(Argentina)
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
160
Figure 8a shows that the geographical areas in Spain withmost non-standard
forms areMadrid and Seville, as well as the area around the Bay of Cadiz and the
province of Malaga. Although Barcelona, Zaragoza and Valencia have a larger
populationthanthesesouthernareas,itisanindicationoffrequentuse.InCatalonia,
AsturiasandGaliciathereisnonoticeableconcentrationofoccurrences.Analogously,
wecanobserve inFigure8bthat theareaofGreaterBuenosAires (about15million
inhabitants)andMontevideoconcentratemanymoreoccurrencesthancitieswithalso
alargepopulationsuchasSantiagodeChile(approximately5millioninhabitants).
The absolute and relative data for each country can also be visualized using
choropleth maps, with scales of colors like Figures 9 and 10. In Figure 9, Spain,
Argentina,Mexico,Venezuela,Colombia,etc.arethecountrieswiththedarkestblue
color,thatis,withmoreoccurrencesthanothers.Figure10showsthecountrieswith
the highest ratio as shown in Table 3. In the choroplethmaps, the same color or a
graduatedscaleisusuallyusedforeachcountry(province,city,etc.)tomarkregional,
provincialdifferences,etc.Boliviaiswhiteduetothelackofdata.
Figure9.Choroplethmapsoftheoccurrencesofnon-standardforms
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
161
Figure10.Choroplethmapoftheratiosofnon-standardforms
Analogously to Figure 10,we can visualize the ratios by provinces, usingQGIS
withmapswithlessextensiveadministrativedivisions,asshowninFigure11forSpain.
This figure shows that the highest ratios are concentrated in the Southwest of the
country,especially intheprovincesofCordoba,Huelva,CaceresandCadizaswellas
theCanaryIslands,beingvirtuallynon-existentorwithverylowratiosintherestofthe
country. The areas that include cities with a larger population such as Madrid,
Barcelona,ValenciaandZaragozaarenotfoundinzonesthataddthe-s,exceptSeville,
in themiddle of an area that usually adds the -s. These areas coincide largelywith
southernvarietiessuchasAndalusiaandExtremadura,stigmatizedinSpain.
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
162
Figure11.Choroplethmapoftheratiosofnon-standardformsintheprovincesofSpain
Withthehelpofgeographicdata,ofteninshapefile8format,withdetailsofthe
contours of any area, such as countries, cities, neighborhoods, etc., much more
detailedmaps can be prepared such as the following. In Figure 12 of the extensive
provinceofBuenosAires,theFederalDistrict,inredonthemap,containsthehighest
concentrationofoccurrencesofnon-standardforms.However,inthemapofFigure13
thesurroundingareasofferthehighestratioofnon-standardforms,asinSpain,these
beingmorefrequentinruralareasthaninurbanareas.
8https://en.wikipedia.org/wiki/Shapefile
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
163
Figure12.Thematicmapofnon-standardformsintheprovinceofBuenosAires
Figure13.Thematicmapoftheratiosofnon-standardformsintheprovinceofBuenosAires
AlthoughLipski(1994)saidthatthenon-standardformwasmoreacceptable,at
present,withdataofdifferentnaturesuchastweets,suchacceptanceisnotobserved,
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
164
since the ratios are relatively low with exceptions such as Baradero, La Plata, San
Vicente, etc. The data also suggests that the ratio is especially low in the Federal
District, themosturbanareaof theprovince. InBarnes (2012),usingdata fromoral
transcription of the Corpus de Referencia del Español Actual (CREA), 9 the Habla
Popular10byLopeBlanch(1976)andtheCorpusdelEspañolbyMarkDavies,theratios
oscillate between 13% and 16%, whereas in the geocorpus, which is very different
from this last study, the ratios vary between 0.77% in the Dominican Republic and
18.45%inPuertoRico.Itshouldnotbeforgottenthatthereareseveralareasinwhich
dataarescarceandthereforetheyarenotsignificant.
7.Relationshipofform,ratioandfrequency
Accordingtothetheoryoftheadditionof-sbyanalogywithotherverbtensesin
whichaproportionofspeakersusesthenon-standardform,therelationbetweenthe
frequency of the two forms and their ratios should be similar for each verb and for
each geographical area. Figure 14 shows a graph with the linear regression of the
standardformwiththevariantendingin-s.Thepointscorrespondtodifferentverbs.
9http://corpus.rae.es/creanet.html10http://www.iifilologicas.unam.mx/elhablamexico/index.php?page=habla-popular
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
165
Figure14.Linearregressionbetweenthefrequencyofthestandardandnon-standardforms
Figure15.Variationoftheratiosofeachverbbetweenthetwoforms
The graphs in Figures 14 and 15 suggest that, despite possible differences in
regionalvarieties,theanalogyhypothesisisreasonabletosomeextent,althoughsome
ratiodifferencesalsosuggestthatotherunknownfactorscanintervene.Theycouldbe
related to the coexistence of voseo forms, the degree of stigmatization in different
areas,etc.
4930
8
39
229
106
34
167
117
182
0
50
100
150
200
250
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
Non-standardforms
occurrences
frequency--standardforms
frequencyandvariation
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
166
8.Conclusions
The -s added to the 2nd person of the singular of the perfect past tense is a
phenomenon spread throughout the geography of Spanish with different ratios
regardingthestandardform.Thisphenomenonappearsmorefrequentlyinruralareas,
eveninvoseoareas.Theratiosbygeographiczonesaredifferent,butwithinlimitsthat
arenottooextreme.Inaddition,theratiosofsuchformsarealsovariableaccordingto
theverb.
The use of geocorpora of tweets for the study of linguistic variation adds
additionalinformationtotraditionalstudiesbasedonsurveys,etc.Ontheotherhand,
geocoded data allows the quick and accurate preparation of synchronous variants
mapsusingGIStechnology.
References
BARNES,Sonia (2012)“¿Quédijistes?:AVariationistReanalysisofNon-Standard -sonSecond
Singular Preterit Verb Forms in Spanish”, Selected Proceedings of the 2010 Hispanic
LinguisticsSymposium,Somerville,MA:CascadillaPress,38-47.
DAVIES,Mark (2016-)Corpusdel Español: Twobillionwords, 21 countries.Availableonlineat
http://www.corpusdelespanol.org/web-dial/.(Web/Dialects)
DÍAZCOLLAZOS, AnaMaría (2015)Desarrollo sociolingüístico del voseo en la región andina de
Colombia(1555-1976),Berlín:MoutondeGruyter.
FRAGOGRACIA,JuanAntonio(2003)ElEspañoldeAmérica,Cádiz:UniversidaddeCádiz.
KANY,CharlesE.(1945)American-SpanishSyntax,2nded.Chicago:UniversityofChicagoPress.
LIPSKI,JohnM.(1994)LatinAmericanSpanish,LondonandNewYork:Longman.
PENNY,Ralph(2002)AhistoryoftheSpanishlanguage,Cambridge:CambridgeUniversityPress.
PARODI,Claudia(2015)“DialectosdelespañoldeAmérica:MéxicoyCentroamérica”, inJavier
Gutiérrez Rexach (coord.), Enciclopedia de Lingüística Hispánica, Vol. 2, London:
Routledge,375-386.
©Universitat de Barcelona
Dialectologia.Specialissue,VII(2017),149-167.ISSN:2013-2247
167
REAL ACADEMIA ESPAÑOLA (2010) Nueva gramática de la lengua española. Manual, Madrid:
EspasaLibros,2010.
VAQUERODERAMÍREZ,María(1998)ElespañoldeAméricaII:Morfosintaxisyléxico,Madrid:Arco
Libros.
©Universitat de Barcelona
Recommended