HMS: A Predictive Text Entry Method using Bigrams
Hasselgren, Jon; Montnemery, Erik; Svensson, Markus; Nugues, Pierre

Published in: Proceedings of the Workshop on Language Modeling for Text Entry Methods, 2003

Citation for published version (APA): Hasselgren, J., Montnemery, E., Svensson, M., & Nugues, P. (2003). HMS: A Predictive Text Entry Method using Bigrams. In Proceedings of the Workshop on Language Modeling for Text Entry Methods (pp. 43-49).

General rights: Unless other specific re-use rights are stated, the following general rights apply: copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.
• You may freely distribute the URL identifying the publication in the public portal.

Read more about Creative Commons licenses: https://creativecommons.org/licenses/

Take-down policy: If you believe that this document breaches copyright, please contact us providing details, and we will remove access to the work immediately and investigate your claim.



HMS: A Predictive Text Entry Method Using Bigrams

Jon Hasselgren, Erik Montnemery, Pierre Nugues, Markus Svensson
Lund Institute of Technology
Department of Computer Science
Box 118
S-221 00 Lund, Sweden
{d99jh, d99em, d99msv}@efd.lth.se, [email protected]

Abstract

Due to the emergence of SMS messages, the significance of effective text entry on limited-size keyboards has increased. In this paper, we describe and discuss a new method to enter text more efficiently using a mobile telephone keyboard. This method, which we called HMS, predicts words from a sequence of keystrokes using a dictionary and a function combining bigram frequencies and word length.

We implemented the HMS text entry method on a software-simulated mobile telephone keyboard and we compared it to a widely available commercial system. We trained the language model on a corpus of Swedish news and we evaluated the method. Although the training corpus does not reflect the language used in SMS messages, the results show a decrease by 7 to 13 percent in the number of keystrokes needed to enter a text. These figures are very encouraging even though the implementation can be optimized in several ways. The HMS text entry method can easily be transferred to other languages.

1 Introduction

The entry of text in computer applications has traditionally been carried out using a 102-key keyboard. These keyboards allow users to input characters in a completely unambiguous way using single keys or sometimes key combinations.

However, in the last few years, mobile telephones have introduced a new demand for text entry methods. Mobile telephones are usually optimized for size and weight. As a result, the keyboard is reduced to a minimal 12-button keyboard (Figure 1).

Figure 1: The 12-button keyboard of a Nokia 3410.

The reduced keyboard makes it hard for the user to enter text in an efficient way because s/he has to use multiple tapping or long key combinations to display and disambiguate the characters. Albeit tedious, the multiple tapping method was the most commonly implemented in mobile telephones until some time ago. To spare the user these elements of frustration, a new class of text entry methods has appeared. It uses dictionaries in an attempt to resolve the word ambiguity and requires, in most cases, only one keystroke per character.

This paper proposes a method that supplements the dictionary with word and bigram probabilities. The method uses the last written word to improve the prediction of the current word and to decrease the number of needed keystrokes even further. This method, which we refer to as HMS in the rest of the text, uses the frequencies of common bigrams that we extracted from a corpus of texts.

2 Current Text Entry Methods

In this section, we summarize the text entry methods currently in use and some methods under development. All the mentioned methods use a keyboard with 12 buttons.

As a measurement of the efficiency of the different text entry methods, we will use the number of keystrokes per character, or KSPC (MacKenzie, 2002). A completely unambiguous keyboard enables a KSPC of 1; text prediction methods may reduce this number even further.
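In code, KSPC is simply the ratio of logged key presses to produced characters; a minimal sketch (the function name is ours, not MacKenzie's):

```python
def kspc(total_keystrokes: int, total_characters: int) -> float:
    """Keystrokes per character: the average number of key presses
    needed to produce one character of the final text."""
    return total_keystrokes / total_characters

# Multi-tap entry of "dog" takes 5 presses for 3 characters:
assert kspc(5, 3) > 1.0
```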

2.1 Multi-Press Methods

The multi-press methods require more than one keystroke to enter a character. These methods allow for unambiguous typing of characters. They can be used alone or as a fallback for systems using more complex text entry methods. The multi-press methods are well suited to type words not contained in the dictionary.

2.1.1 The Multi-Tap Method

The first and still most common way to enter text on a mobile telephone is the multi-tap method. Since 'a', 'b' and 'c' share the same key, the user presses it once to enter an 'a', twice to enter a 'b', and three times to enter a 'c'. To enter the word dog, the user presses the sequence of keys "36664".

As two consecutive characters of a word can share a same key, as for example the word "no" where both 'n' and 'o' are assigned to 6, a timeout is needed to determine when to stop shifting the letters and display a new character.

This method results in a KSPC of 2.0342 if English text is entered (MacKenzie, 2002).
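The multi-tap scheme above can be sketched in a few lines. The key layout is the standard 12-button assignment; the helper names are ours:

```python
# Standard 12-button keypad letter assignment.
KEYPAD = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}

# Map each letter to its key and the number of taps needed.
LETTER_TO_TAPS = {
    letter: (key, position + 1)
    for key, letters in KEYPAD.items()
    for position, letter in enumerate(letters)
}

def multitap_sequence(word: str) -> str:
    """Return the digit sequence for multi-tap entry, e.g. 'dog' -> '36664'."""
    return ''.join(key * presses for key, presses in
                   (LETTER_TO_TAPS[c] for c in word))

assert multitap_sequence('dog') == '36664'
```

Note that `multitap_sequence('no')` yields '66666', illustrating why a timeout (or an explicit pause key) is needed between consecutive letters on the same key.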

2.1.2 Remapped Keyboard

On current mobile telephone keyboards, characters are assigned alphabetically to keys. This is not optimal given that, for instance, the most frequent character in English, 'e', is displayed using two taps. Remapped keyboards assign a single key to the most frequent characters. The remaining characters are grouped into sets that share a same key. This method decreases the KSPC because frequent characters are entered with only one keystroke.

The program MessagEase (Saied, 2001) of EXideas uses the idea of the remapped keyboard technique. MessagEase results in a KSPC of 1.8210 (MacKenzie, 2002).

2.2 Single-Press Methods

The single-press methods try to reduce the KSPC to roughly one. They resort to a dictionary as a means of resolving the ambiguity of the input.

2.2.1 The Predictive Text Entry Method

With the predictive text entry method, the user presses one key per character and the program matches the key sequence to words in a dictionary (Haestrup, 2001). In that sense, although its name suggests otherwise, this method merely aims at disambiguating the words rather than predicting them. Even if several characters are mapped to the same key, in many cases, only one word is possible given the sequence. This method makes it possible to reduce the KSPC to roughly 1. If the key sequence corresponds to two or more words, the user can browse through the resulting word list and choose the word s/he intended to write.

The user, for example, enters the word come by first pressing 2. The program will then propose the word a because it matches the entered sequence. When the user presses 6, 6, and 3, the program might propose the words an, con and finally come. The words bone, bond, and anod (and some more) also fit the given sequence. The user can access these words by pressing a next-key.
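The disambiguation step in the example above can be sketched as follows, matching only words whose key sequence exactly equals the typed sequence, as most implementations do (the word list and function names are ours):

```python
# Standard 12-button keypad letter assignment.
KEYPAD = {'2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
          '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'}
CHAR_TO_KEY = {c: k for k, chars in KEYPAD.items() for c in chars}

def to_keys(word: str) -> str:
    """Map a word to its keypad digit sequence, e.g. 'come' -> '2663'."""
    return ''.join(CHAR_TO_KEY[c] for c in word)

def candidates(sequence: str, dictionary: list[str]) -> list[str]:
    """Words whose key sequence equals the typed sequence."""
    return [w for w in dictionary if to_keys(w) == sequence]

words = ['come', 'bone', 'bond', 'an', 'con', 'dog']
assert candidates('2663', words) == ['come', 'bone', 'bond']
```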

Many new mobile telephones use this method. The most widely used implementation is T9 by Tegic (Grover et al., 1998). Other implementations are eZiText by Zi Corporation (Zi Corporation, 2002) and iTAP by Motorola (Lexicus Division, 2002). Most implementations only match words with the same length as the key sequence, resulting in a KSPC of slightly greater than 1 when the user types words that are contained in the dictionary.

Some implementations propose words longer than the tapped sequence based on probability information for the words. These implementations can reach a KSPC smaller than 1.

2.2.2 WordWise

WordWise, developed by Eatoni Ergonomics, uses an auxiliary key. A character on a key is selected explicitly by simultaneously pressing the key corresponding to the character and the auxiliary key indicating the position of the character on the key. This decreases the number of matching words for a key sequence considerably because the user explicitly disambiguates some characters in the sequence.

A drawback is that two keys must be pressed concurrently. With a limited-space keyboard, this can prove difficult for some users.

2.2.3 LetterWise

LetterWise (MacKenzie et al., 2001), also by Eatoni Ergonomics, is a different approach, which eliminates the need for a large dictionary. It only considers the letter digram probabilities. In English, the letter 't' is often followed by 'h' and hardly ever by 'g'. The program selects the most probable letter knowing the previous one. The user can browse and change the characters by pressing a 'Next' key.

The LetterWise method has a KSPC of 1.1500 (MacKenzie, 2002). One of its main advantages is the small amount of memory needed. Another advantage is the fact that it is just as easy to enter words which are not in a dictionary. Therefore, this could be a suitable fallback method instead of the multi-tap methods, to produce faster text input.

3 Predictive Text Entry Using Bigrams

Prediction may further improve the performance of text entry with a limited keyboard. With it, the suggested words may be longer than the currently typed input.

We propose to use word bigrams, i.e. two consecutive words, to give a better text prediction, see inter alia (Shannon, 1948), (Jelinek, 1997), and (Manning and Schütze, 1999). The list of bigrams is stored in memory together with their frequency of occurrence and it is accessed simultaneously with the character input.

Given a previously written word, the most probable subsequent words are extracted from the bigram list. Using the maximum likelihood estimate, the probability of the bigram w_{i-1} w_i given the word w_{i-1} is computed as:

P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1})    (1)

Since the previously written word w_{i-1} is always known and constant, it is sufficient to use the frequency of the bigrams and set aside C(w_{i-1}).
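Equation (1) amounts to dividing a bigram count by the count of its first word; a minimal sketch (function name and toy counts are ours):

```python
from collections import Counter

def bigram_prob(bigram_counts: Counter, unigram_counts: Counter,
                prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(w_i | w_{i-1}):
    C(w_{i-1} w_i) / C(w_{i-1})."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

# Toy counts: "the dog" seen 3 times, "the cat" once.
bigrams = Counter({('the', 'dog'): 3, ('the', 'cat'): 1})
unigrams = Counter({'the': 4})
assert bigram_prob(bigrams, unigrams, 'the', 'dog') == 0.75
```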

In practice, bigrams must be combined with a dictionary. Sparse data from the development corpus and memory constraints make it impossible to store an exhaustive list of bigrams. To choose the words to propose, we used a variation of the Katz model (Katz, 1987). The Katz model takes the longest available N-gram and uses correction terms to normalize the probabilities. In the case of bigrams, the probabilities can be expressed as:

P(w_i | w_{i-1}) =
    P_ML(w_i | w_{i-1})    if C(w_{i-1} w_i) > 0
    α · P(w_i)             if C(w_{i-1} w_i) = 0    (2)

where α is the correction term.

In our implementation, the bigrams are always prioritized over the unigrams. The Katz back-off model is well suited for our implementation as it allows for a small memory footprint of the bigram list, while still ensuring that the system will support entering of all words in the dictionary.

In addition to the bigram frequencies, the word length is a useful criterion to present the matching words to the user. This additional parameter is justified by the navigation through a list of words with the keys available on mobile telephones.

Bigram probabilities used alone produce a list of possible words and rank them without regard to the effort needed to select the intended word. Since browsing the list is carried out using one scrolling key, it may take a couple of keystrokes to reach the word. Even if corpus frequencies suggest a longer word being preferred to a shorter one, a presentation by decreasing frequencies may be inadequate.

The list navigation is in fact easier in some cases using character input keys. A single keystroke can resolve a great deal of ambiguity because there is a total of 8 keys to choose from, compared to the unique scrolling key to cycle the list of suggested words. That is why the list of proposed words is rescored and short words are given an additional weight.
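The rescoring step can be illustrated with a simple multiplicative boost for short words; the combination function below is a hypothetical stand-in, since the paper does not specify its exact form:

```python
def rescore(suggestions: list[tuple[str, float]],
            length_weight: float = 2.0) -> list[str]:
    """Rank (word, bigram frequency) pairs so that shorter words,
    which are cheaper to finish or scroll past, get an extra boost.
    The weighting scheme is a hypothetical stand-in for the paper's
    unspecified combination of frequency and word length."""
    scored = sorted(suggestions,
                    key=lambda wf: wf[1] * (1 + length_weight / len(wf[0])),
                    reverse=True)
    return [word for word, _ in scored]

# A slightly rarer short word can now outrank a more frequent long one:
assert rescore([('international', 10), ('in', 9)]) == ['in', 'international']
```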

4 Implementation

We implemented a software prototype of the HMS method we described in this paper. We chose the Java programming language because of its extensive packages that allow for rapid development. Another advantage is Java's platform independence, which should, in theory, make it possible to run the program on any modern mobile telephone.

The program was designed to run on a handheld device, i.e. on the client side of the mobile network. The memory of a mobile telephone is very limited and a disadvantage of this strategy is the memory footprint of the language models we use. A possible workaround would be to implement the HMS software on an application server. All the users would then share the language models with possible customizations. Modern mobile telephone infrastructures enable a real-time round trip of the typed characters and thus the interactive suggestion of matching words.

The program computes a list of word suggestions every time a key is pressed and the best suggestion is displayed simultaneously on the screen: the top white window in our Java program (Figure 2). The user can browse the list of suggestions using the up and down keys.

4.1 Program Design

The program is divided into two parts: a user interaction module and a lexical database module.

The user interaction module currently consists of a Graphical User Interface (GUI) whose layout closely resembles that of a mobile telephone. The simulated keyboard layout makes it possible to compare the HMS prototype with software running on mobile telephones.

The lexical database module contains the core of the program. It is responsible for the generation of a list of suggested words given the user input so far. The modules communicate with each other using an interface. Thus, the two parts are independent and one may modify the user interaction module in particular to fit different platforms without having to modify the module concerning the word guessing algorithm.

Figure 2: Screenshot of the HMS Java prototype.

4.2 Data Structures

A compact encoding of the bigram and unigram lists has a significant impact on achieving efficient word proposal.

The data structure we used is comparable to that of a letter tree or trie (de la Briandais, 1959). However, the nodes of the new tree structure correspond to an input key instead of a character as in the classical tries. For instance, the characters {p, q, r, s} are associated to a single node. Thus, the tree structure enables representing the keystroke ambiguity and makes it easier to traverse the tree. It also introduces the need to store a complete list of words that match a keystroke sequence in the leaves, resulting in a somewhat higher memory overhead.

Searching this type of tree is straightforward. The keys pressed so far by the user are used as input and the tree is traversed one level down based on every key pressed. When the traversal is completed, the resulting sub-tree includes all possible suggested words for the typed key combination.
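A minimal sketch of such a key-indexed trie (class and function names are ours); for brevity this version stores exact-length matches at the node reached by the full sequence rather than collecting the whole sub-tree:

```python
class KeyTrieNode:
    """Trie node indexed by keypad digits rather than characters:
    one node covers all letters that share a key."""
    def __init__(self):
        self.children = {}  # digit -> KeyTrieNode
        self.words = []     # words whose full key sequence ends here

# Standard 12-button keypad letter assignment.
CHAR_TO_KEY = {c: k for k, chars in
               {'2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
                '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'}.items()
               for c in chars}

def insert(root: KeyTrieNode, word: str) -> None:
    node = root
    for key in (CHAR_TO_KEY[c] for c in word):
        node = node.children.setdefault(key, KeyTrieNode())
    node.words.append(word)

def lookup(root: KeyTrieNode, sequence: str) -> list[str]:
    """Traverse one level per pressed key; the reached node holds
    every word whose key sequence equals the typed one."""
    node = root
    for key in sequence:
        if key not in node.children:
            return []
        node = node.children[key]
    return node.words

root = KeyTrieNode()
for w in ['come', 'bone', 'con']:
    insert(root, w)
assert lookup(root, '2663') == ['come', 'bone']
```

Collecting all words in the reached sub-tree, rather than only at the reached node, would yield the longer-word suggestions described in Section 3.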


For the bigrams, a slightly different structure is needed. Since the previously written word has been chosen from the list of suggested words, it can no longer be considered ambiguous. One cannot simply build a tree of bigrams using the proposed structure because the tree itself is ambiguous. A collection of trees, one tree for each preceding word, was used. For performance reasons, a hash table was used to manage the collection.

4.3 Training the Language Model

We used a dictionary of 118,000 inflected Swedish words and we trained the language model – unigrams and bigrams – on the Stockholm-Umeå (SU) Corpus (Ejerhed et al., 1992). The SU corpus is a POS-annotated, balanced corpus of Swedish news reports, novels, etc. To keep the memory size of the program under 100 megabytes, we retained only 195,000 bigrams from the corpus. The SU corpus does not reflect the language of SMS messages, which differs greatly from that of "classical" written Swedish. This results in a non-optimal language model. We chose it because of the unavailability of a large-enough public SMS corpus.

When the input of a single word is completed, its corresponding bigram and unigram probabilities are updated. It results in a learning system, which adapts to every user's style of writing. To increase the speed of adaptation, language frequencies derived from the user input have higher priorities than what has been learned from the training corpus. If the system were implemented as a server application, the personal adaptation would become more complex. However, a separate table with user-customized parameters can be saved either locally or on the server.
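The adaptive update can be sketched as a weighted count increment; the weight value and function name are hypothetical choices of ours, illustrating how user-derived frequencies can quickly outrank corpus-derived ones:

```python
def commit_word(prev: str, word: str,
                bigrams: dict, unigrams: dict,
                user_weight: int = 5) -> None:
    """After the user commits a word, bump its unigram and bigram
    counts by more than one occurrence (weight value hypothetical),
    so the model adapts quickly to the user's style of writing."""
    unigrams[word] = unigrams.get(word, 0) + user_weight
    bigrams[(prev, word)] = bigrams.get((prev, word), 0) + user_weight

# One corpus occurrence, then one user entry:
bigrams, unigrams = {('hej', 'du'): 1}, {'du': 1}
commit_word('hej', 'du', bigrams, unigrams)
```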

All corpora and dictionaries used with the software have been in Swedish so far. However, the HMS program does not carry out any language-specific parsing or semantic analysis. Hence, the method could be transferred to any language provided that a sufficient corpus exists.

5 Evaluation

As an evaluation of the efficiency of our implementation, we made an initial comparative test between the HMS program and the Nokia 3410, which uses the T9 system.

As we said in the previous section, we could not train a language model optimized for an SMS application. This certainly biased the evaluation of the entry methods in our disfavor. Therefore, we chose to evaluate both programs with a test set consisting of a sample of SMS messages and short texts from newspapers.

A total of nine testers entered the texts. They first had the possibility to get accustomed to both the HMS and the T9 methods. The testers were encouraged to compose a short arbitrary SMS message of 50-100 characters containing everyday language. They also chose an excerpt of a newspaper article of approximately the same length as the typed SMS message from the Aftonbladet Swedish newspaper website. The keystroke count was recorded and used to calculate the KSPC parameter.

The entry of new words, i.e. words missing from the dictionary, uses the same technique in the HMS and T9 methods. If the text selected by a tester contained words not present in at least one of the dictionaries, the tester was asked to choose a new text.

Table 1 shows the results we obtained in keystrokes per character.

Table 1: Test results.

Method        Type of text   KSPC
T9            SMS            1.0806
HMS Bigrams   SMS            1.0108
T9            News           1.0088
HMS Bigrams   News           0.8807

In this relatively limited evaluation, the HMS entry method shows a KSPC smaller than that of the T9 system in both tests: news and SMS texts. The improvement is of, respectively, 7 and 13 percent. The better result for the bigram method is mainly due to two reasons. First, the utilization of the previously written word to predict the next word results in an improvement of the prediction compared to the methods relying only on dictionaries such as T9. Secondly, the fact that words are actually predicted before all characters are entered improves even further the performance of HMS over T9.

6 Discussion

The difference in KSPC between the SMS and news texts with our method is to a large extent due to the corpus, which does not fit the more casual language of the typical SMS texts. The T9 method, on the other hand, is optimized for typing SMS texts.

Another reason that may contribute to the observed difference is that the news texts in general contain longer words. The mean word length in our test is about 4 characters for the SMS texts and 5 characters for the news texts. At a given state of a word input, a few more keystrokes, often one or two, reduce dramatically the selection ambiguity, and the identification of a longer word will need fewer keystrokes in proportion to its length than a shorter one. This corresponds to a smaller KSPC for longer words. Figure 3 shows the KSPC according to the word length and the falling curve for longer words.

Figure 3: KSPC versus the mean word length in the HMS bigram method.

Ambiguity is also reduced for longer words when the length of the key sequence is exactly that of the word. The possible words for a given longer key sequence are often fewer than for a shorter one. This explains why the T9 system also shows a result better for the news texts than for the SMS messages. However, the T9 can never reach a KSPC less than 1 since it doesn't predict words longer than the given sequence.

Figure 4: KSPC versus mean word length in the T9 system.

Other significant differences between the SMS and news texts play a role in the final results. For example, the SMS texts show a higher frequency of certain characters such as question marks, slashes and exclamation marks, which results in a higher KSPC. This fact can explain the surprisingly high KSPC for some texts. This property affects both methods to the same extent, though.

7 Conclusion and Perspectives

We implemented a new text entry method adapted to mobile telephone keyboards and we compared it to the T9 method widely available on commercial devices. The HMS method is based on language models that we trained on the SU corpus.

The training corpus was, to a great extent, collected from Swedish news wires and didn't fit our application very well. This is largely due to the language used in SMS messages, which tends to include abbreviations and slang absent from the SU corpus. However, the results we obtained with the HMS method show a decrease by 7 to 13 percent in the number of keystrokes needed to enter a text. These figures are very encouraging even though the implementation can be optimized in several ways.

Further evaluation could be conducted with an automatic test system. It would produce results that would eliminate all input mistakes. It would also enable computing the optimal solution in terms of KSPC given the two options left to the user at a given point of the word entry: either type a new letter key or scroll the list of possible words.

It would also be very interesting to evaluate the KSPC of the bigram method after training the system with a better-suited corpus. We expect the KSPC to be significantly lower than with the present corpus. It is worth once again pointing out that even with the non-optimal corpus, the results of the bigram method are on par or superior.

We also observed that the language model adapts more quickly to the users' individual ways of expressing themselves than other systems. It thus increases the gain over time.

At the time we wrote this paper, we could not gain access to a large corpus of SMS messages. However, we intend to collect texts from Internet chat rooms and message boards, where the language shows strong similarities to SMS language. We expect a better language model and an improved KSPC from this new corpus.

A problem with the bigram method is its large memory footprint compared to that of dictionary-based systems. However, this should not be a problem on the next generation of mobile telephones like GPRS and 3G. The language models could be off-loaded on an application server and the low round-trip time of the network system should enable a real-time interaction between the server and the user terminal to carry out the word selection.

References

Zi Corporation. 2002. eZiText. Technical report, http://www.zicorp.com.

R. de la Briandais. 1959. File searching using variable length keys. In Proceedings of the Western Joint Computer Conference, volume 15, pages 285–298, New York. Institute of Radio Engineers.

Lexicus Division. 2002. iTap. Technical report, Motorola, http://www.motorola.com/lexicus, December.

Eva Ejerhed, Gunnel Källgren, Ola Wennstedt, and Magnus Åström. 1992. The linguistic annotation system of the Stockholm-Umeå project. Technical report, University of Umeå, Department of General Linguistics.

Dale L. Grover, Martin T. King, and Clifford A. Kushler. 1998. Reduced keyboard disambiguating computer. U.S. Patent no. 5,818,437.

Jan Haestrup. 2001. Communication terminal having a predictive editor application. U.S. Patent no. 6,223,059.

Frederick Jelinek. 1997. Statistical Methods for Speech Recognition. The MIT Press, Cambridge, Massachusetts.

Slava M. Katz. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(3):400–401, March.

I. Scott MacKenzie, Hedy Kober, Derek Smith, Terry Jones, and Eugene Skepner. 2001. LetterWise: Prefix-based disambiguation for mobile text input. Technical report, Eatoni, http://www.eatoni.com/research/lw-mt.pdf.

I. Scott MacKenzie. 2002. KSPC (keystrokes per character) as a characteristic of text entry techniques. In Proceedings of the Fourth International Symposium on Human Computer Interaction with Mobile Devices, pages 195–210, Heidelberg, Germany. Springer-Verlag.

Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, Massachusetts.

Nesbat B. Saied. 2001. Fast, full text entry using a physical or virtual 12-button keypad. Technical report, EXideas, http://www.exideas.com/ME/whitepaper.pdf.

Claude E. Shannon. 1948. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 623–656, July–October.