Speech-Enabled Hybrid Multilingual Translation for Mobile Devices

Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 41–44, Gothenburg, Sweden, April 26-30 2014. ©2014 Association for Computational Linguistics


Krasimir Angelov, University of Gothenburg

[email protected]

Björn Bringert, Google Inc

[email protected]

Aarne Ranta, University of [email protected]

Abstract

This paper presents an architecture and a prototype for speech-to-speech translation on Android devices, based on GF (Grammatical Framework). From the user’s point of view, the advantage is that the system works off-line and yet has a lean size; it also gives, as a bonus, grammatical information useful for language learners. From the developer’s point of view, the advantage is the open architecture that permits the customization of the system to new languages and for special purposes. Thus the architecture can be used for controlled-language-like translators that deliver very high quality, which is the traditional strength of GF. However, this paper focuses on a general-purpose system that allows arbitrary input. It covers eight languages.

1 Introduction

Many popular applications (apps) on mobile devices are about language. They range from general-purpose translators to tourist phrase books, dictionaries, and language learning programs. Many of the apps are commercial and based on proprietary resources and software. The mobile APIs (both Android and iOS) make it easy to build apps, and this provides an excellent way to exploit and demonstrate computational linguistics research, an opportunity that is perhaps not used as much as it could be.

GF (Grammatical Framework; Ranta, 2011) is a grammar formalism designed for building multilingual grammars and interfacing them with other software systems. Both multilinguality and interfacing are based on the use of an abstract syntax, a tree structure that captures the essence of syntax and semantics in a language-neutral way. Translation in GF is organized as parsing the source language input into an abstract syntax tree and then linearizing the tree into the target language. Here is an example of a simple question, as modelled by an abstract syntax tree and linearized to four languages, which use different syntactic structures to express the same content:

Query (What Age (Name "Madonna"))
English: How old is Madonna?
Finnish: Kuinka vanha Madonna on?
French: Quel âge a Madonna?
Italian: Quanti anni ha Madonna?
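The parse-then-linearize pipeline can be illustrated with a minimal Python sketch. This is not GF code and not the PGF runtime: the classes `Name`, `WhatAge` and `Query`, the `TEMPLATES` table, and the functions `parse`, `linearize` and `translate` are all hypothetical names introduced for illustration, and each "grammar" is reduced to one hand-written template per language.

```python
from dataclasses import dataclass


# Language-neutral abstract syntax for the single question type above.
@dataclass(frozen=True)
class Name:
    text: str


@dataclass(frozen=True)
class WhatAge:
    person: Name


@dataclass(frozen=True)
class Query:
    question: WhatAge


# Toy concrete syntaxes: one template per language (an assumption of this
# sketch; real GF concrete syntaxes are compositional grammars).
TEMPLATES = {
    "English": "How old is {name}?",
    "Finnish": "Kuinka vanha {name} on?",
    "French": "Quel âge a {name}?",
    "Italian": "Quanti anni ha {name}?",
}


def parse(sentence: str, language: str) -> Query:
    """Map a sentence of the given language to an abstract syntax tree."""
    prefix, suffix = TEMPLATES[language].split("{name}")
    has_name = len(sentence) > len(prefix) + len(suffix)
    if sentence.startswith(prefix) and sentence.endswith(suffix) and has_name:
        name = sentence[len(prefix):len(sentence) - len(suffix)]
        return Query(WhatAge(Name(name)))
    raise ValueError(f"not covered by the {language} toy grammar: {sentence!r}")


def linearize(tree: Query, language: str) -> str:
    """Render an abstract syntax tree in the given language."""
    return TEMPLATES[language].format(name=tree.question.person.text)


def translate(sentence: str, source: str, target: str) -> str:
    """Translation is parsing followed by linearization."""
    return linearize(parse(sentence, source), target)


if __name__ == "__main__":
    for target in TEMPLATES:
        print(target + ":", translate("How old is Madonna?", "English", target))
```

Because the tree is shared, adding a language to this sketch means adding one template, and every existing language can then be translated to and from it.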

In recent years much focus in GF has been put on cloud applications (Ranta et al., 2010) and on mobile apps, for both Android (Détrez and Enache, 2010) and iOS (Djupfeldt, 2013). They all implement text-based phrasebooks, whereas Alumäe and Kaljurand (2012) have built a speech-enabled question-answering system for Estonian. An earlier speech translation system in GF is presented in Bringert (2008).

All embedded GF systems are based on a standardized run-time format of GF, called PGF (Portable Grammar Format; Angelov et al. 2009, Angelov 2011). PGF is a simple “machine language”, to which the much richer GF source language is compiled by the GF grammar compiler. Since PGF is simple, it is relatively straightforward to write interpreters that perform parsing and linearization with PGF grammars. The first mobile implementations were explicitly designed to work on small devices with limited resources. Thus they work fine for small grammars (with up to hundreds of rules and lexical entries per language), but they do not scale well to open-domain grammars requiring a lexicon size of tens of thousands of lemmas. Moreover, they do not support out-of-grammar input, and have no means of choosing between alternative parse results, which in a large grammar can easily amount to thousands of trees.

A new, more efficient and robust run-time system for PGF was later written in C (Angelov, 2011). Its performance is competitive with the state of the art in grammar-based parsing (Angelov and Ljunglöf, 2014). This system uses statistical disambiguation and supports large-scale grammars, such as an English grammar covering most of the Penn Treebank. In addition, it is lean enough to be embedded as an Android application even with full-scale grammars, running even on devices as old as the Nexus One from early 2010.

Small grammars limited to natural language fragments, such as a phrasebook, are usable when equipped with predictive parsing that can suggest the next words in context. However, there is no natural device for word suggestions with speech input. The system must then require the user to learn the input language; alternatively, it can be reduced to simple keyword spotting. This can be useful in information retrieval applications, but hardly in translation. Any useful speech-enabled translator must have wide coverage, and it cannot be restricted to just translating keywords.
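The idea of suggesting the next words in context can be sketched as follows. This is only an illustration under strong assumptions: the fragment is given as an explicit list of phrases (`PHRASES`, invented here), whereas a predictive parser derives the continuations from the grammar itself; the function name `suggest_next_words` is likewise hypothetical.

```python
# A tiny "controlled language", enumerated explicitly for illustration.
PHRASES = [
    "how old is Madonna",
    "how old are you",
    "how far is the airport",
    "where is the airport",
]


def suggest_next_words(prefix: str) -> list:
    """Return the words that can follow the typed prefix within the fragment."""
    typed = prefix.split()
    suggestions = set()
    for phrase in PHRASES:
        words = phrase.split()
        # The phrase must start with exactly the words typed so far
        # and must still have at least one word left to offer.
        if words[:len(typed)] == typed and len(words) > len(typed):
            suggestions.add(words[len(typed)])
    return sorted(suggestions)


if __name__ == "__main__":
    print(suggest_next_words(""))
    print(suggest_next_words("how"))
    print(suggest_next_words("how old"))
```

With typed input such suggestions can be shown after every keystroke; with speech there is no comparable point at which to display them, which is the limitation noted above.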

In this paper, we show a mobile system that has a wide coverage and translates both text and speech. The system is modular and could be easily adapted to traditional GF applications as well: since the PGF format is the same, one can combine any grammar with any run-time PGF interpreter.

The rest of the paper is organized as follows: Section 2 describes the system’s functionalities from the user’s point of view. Section 3 explains the technology from the developer’s point of view. Section 4 presents some preliminary results on the usability of the system, and discusses some ways of improving it. Section 5 concludes.

A proper quantitative evaluation of the translation quality must wait for another occasion, and will be more properly done in a context that addresses hybrid GF-based translation as a research topic. Early attempts in this area have not yet converged into a stable methodology, but we believe that setting translation in the context of a practical use case, as here, can help identify what issues to focus on.

2 Functionalities

The app starts with the last-used language pair preselected for input and output. It waits for speech input, which is invoked by touching the microphone icon. Once the input is finished, it appears in text on the left side of the screen. Its translation appears below it, on the right, and is also rendered as speech (Figure 1 (a)).

Figure 1: Translation between various languages with (a) speech (b) text input.

The source and target languages are selected by the two drop-down lists on the top of the screen. The icon with two arrows to the right of the language selectors allows the two languages to be swapped quickly.

Speech recognition and text-to-speech (TTS) are done using public Android APIs. On most devices, these make use of Google’s speech recognizer and synthesizer, which are available in both online and offline versions. The offline engines tend to have a reduced choice of languages and reduced quality compared to the online engines, but do not require an internet connection.

Alternatively, the user can select the keyboard mode. The microphone icon is then changed to a keyboard icon, which opens a software keyboard and shows a text field for entering a new phrase. Once the phrase is translated, it is shown on the screen and also sent to TTS (Figure 1 (b)).

If the input consists of a single lexical unit, the user can open a dictionary description for the word. The resulting screen shows the base form of the word, followed by a list of possible translations. The target language is shown on the top of the screen and it can be changed to see the translations in the other languages (Figure 2 (a)). Touching one of the translations opens a full-form inflection table together with other grammatical information about the word, such as gender and verb valency (Figure 2 (b)).

Finally, the translator also works as an input mode for other apps such as SMS. It provides a soft keyboard, which is similar to the standard Android keyboard, except that it has two more keys allowing the entered phrase to be translated in-place from inside any other application.


Figure 2: (a) Results of dictionary lookup. (b) Valency and the inflection table for a Bulgarian verb.

3 Technology

3.1 Run-time processing

The core of the system is the C runtime for PGF (Angelov, 2011). The runtime is compiled to native code with the Android NDK and is called via a foreign function interface from the user interface, which is implemented in Java.

The main challenge in using the runtime on mobile devices is that even the latest models are still several times slower than a modern laptop. For instance, just loading the grammars for English and Bulgarian on a mobile device initially took about 28 seconds, while the same task is a negligible operation on a normal computer. We spent considerable time on optimizing the grammar loader and the translator in general. Now the same grammar, when loaded sequentially, takes only about 5-6 seconds. Furthermore, we made the grammar loader parallel, i.e. it loads each language in parallel. The user interface runs in yet another thread, so while the grammar is loading, the user can already start typing or uttering a sentence. In addition, we made it possible to load only those languages that are actually used, i.e. only two at a time instead of all eight at once.
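The loading strategy described above (one background task per language, only the languages in use, and a caller that is never blocked until it actually needs a grammar) can be sketched in Python. This is an illustration only, not the app's Java/C implementation: `load_language` is a stand-in that sleeps instead of reading a grammar, and the class `LazyGrammarLoader` with its methods is a hypothetical design.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_language(name):
    """Stand-in for loading one concrete grammar (simulated by a short sleep)."""
    time.sleep(0.05)
    return {"language": name, "loaded": True}


class LazyGrammarLoader:
    """Loads only requested languages, each in its own background task.

    Assumes request() and get() are called from a single thread (e.g. the
    user-interface thread); the bookkeeping dict is not otherwise protected.
    """

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}  # language name -> Future

    def request(self, *languages):
        """Start loading the given languages without waiting for them."""
        for name in languages:
            if name not in self._pending:
                self._pending[name] = self._pool.submit(load_language, name)

    def get(self, name):
        """Return the loaded language, blocking only if it is not ready yet."""
        self.request(name)
        return self._pending[name].result()

    def requested(self):
        return sorted(self._pending)


if __name__ == "__main__":
    loader = LazyGrammarLoader()
    loader.request("English", "Bulgarian")  # returns immediately
    # ... the user can already type or speak here ...
    print(loader.get("English")["loaded"])
    print(loader.requested())
```

In this sketch the six languages that are never requested are never loaded, which mirrors the choice of loading two languages at a time instead of all eight.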

Parsing is a challenge in itself. As the grammars grow bigger, there tends to be more and more need for disambiguation. This is performed by a statistical model, where each abstract syntax tree node has a weight. We used the method of Angelov and Ljunglöf (2014) to find the best tree.
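How per-node weights rank competing trees can be shown with a small sketch. It assumes that a node's weight is the negative log probability of its abstract syntax function and that a tree's cost is the sum over its nodes; the probabilities, function names and candidate trees below are invented. The sketch simply scores an explicit list of candidates, which is a simplification and not the search procedure of Angelov and Ljunglöf (2014).

```python
import math

# Hypothetical probabilities of abstract syntax functions.
PROBS = {
    "Pred": 0.6,
    "Compound": 0.1,
    "time_N": 0.02,
    "fly_V": 0.01,
    "fly_N": 0.005,
}


def tree_cost(tree):
    """Cost of a tree = sum of -log(probability) over all its nodes.

    A tree is a tuple (function_name, child_tree, ...).
    """
    function, *children = tree
    return -math.log(PROBS[function]) + sum(tree_cost(child) for child in children)


def best_tree(candidates):
    """Pick the candidate with the lowest cost, i.e. the highest probability."""
    return min(candidates, key=tree_cost)


if __name__ == "__main__":
    # Two competing analyses of the same two-word input.
    clause = ("Pred", ("time_N",), ("fly_V",))
    compound = ("Compound", ("time_N",), ("fly_N",))
    for tree in (clause, compound):
        print(tree[0], round(tree_cost(tree), 3))
    print("best:", best_tree([clause, compound])[0])
```

Summing negative logarithms rather than multiplying probabilities keeps the scores numerically stable for large trees.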

Moreover, since any sound grammar is likely to fail on some input, there is need for robustness. This has been solved by chunking the input into maximal parsable bits. As a result, the translations are not always grammatically correct, because dependencies between chunks, such as agreement, get lost. Errors of this kind are familiar to anyone who has used a statistical system such as Google Translate. In the GF system it is easy to avoid them, provided the parse is complete.

Language    Lemmas
Bulgarian   26664
Chinese     17050
English     65009
Finnish     57036
French      19570
German       9992
Hindi       33841
Swedish     24550

Table 1: Lexical coverage (lemmas)
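Chunking into maximal parsable bits can be sketched as a greedy longest-match segmentation. The greedy left-to-right strategy is an assumption of this sketch, not a description of the runtime's actual algorithm, and `can_parse` is a stand-in that looks word sequences up in an invented set (`PARSABLE`) instead of calling a parser.

```python
# Word sequences that the stand-in "parser" accepts (invented for illustration).
PARSABLE = {
    ("my", "father"),
    ("my", "father", "is", "Swedish"),
    ("the", "bell"),
}


def can_parse(words):
    """Stand-in for a real parser: True if the word sequence is covered."""
    return tuple(words) in PARSABLE


def chunk(words):
    """Split the input into chunks, always taking the longest parsable span."""
    chunks = []
    start = 0
    while start < len(words):
        # Try the longest candidate span first, then shrink it.
        for end in range(len(words), start, -1):
            if can_parse(words[start:end]):
                chunks.append(words[start:end])
                start = end
                break
        else:
            # No parsable span starts here: emit the word as its own chunk.
            chunks.append([words[start]])
            start += 1
    return chunks


if __name__ == "__main__":
    print(chunk("my father is Swedish hmm the bell".split()))
```

Each chunk is then translated on its own, which is why agreement and other dependencies that cross a chunk boundary can be lost.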

3.2 The language component

The language-specific component of the app is the PGF grammar, which contains both the grammars proper and the probabilistic model of the abstract syntax. The app can be adapted to a different PGF grammar by changing a few lines of the source code. Hence any grammar written in GF is readily usable as the language component of an app. But here we focus on the large-scale grammar meant for robust translation.

The core of the grammar is the GF Resource Grammar Library (Ranta, 2009), which currently covers 29 languages. Of these, 8 have been extended with more syntax rules (about 20% in addition to the standard library) and a larger lexicon. Table 1 shows the list of languages together with the size of the lexicon for each of them. The abstract syntax is based on English lemmas and some split word senses of them. The other languages, having fewer words than English, are thus incomplete. Unknown words are rendered by either showing them in English (if included in the English lexicon) or just returning them verbatim (typical for named entities).
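The fallback for unknown words can be written as a three-step lookup. The sketch below assumes that each lexicon is a mapping from an abstract lemma identifier to a word form; the identifiers, the two miniature lexicons and the function `render` are hypothetical and only illustrate the order of the fallbacks.

```python
# Miniature lexicons keyed by abstract lemma identifiers (invented entries).
ENGLISH = {"house_N": "house", "serendipity_N": "serendipity"}
SWEDISH = {"house_N": "hus"}  # incomplete: has fewer entries than English


def render(token, target_lexicon, english_lexicon=ENGLISH):
    """Render one lexical item in the target language with fallbacks.

    `token` is either an abstract lemma identifier or a raw input word
    that was not recognized at all (e.g. a named entity).
    """
    if token in target_lexicon:
        return target_lexicon[token]   # normal case: translated word
    if token in english_lexicon:
        return english_lexicon[token]  # missing in target: show the English word
    return token                       # unknown everywhere: return verbatim


if __name__ == "__main__":
    for token in ("house_N", "serendipity_N", "Gothenburg"):
        print(token, "->", render(token, SWEDISH))
```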

The lexicon has been bootstrapped from various freely available sources, such as linked WordNets and the Wiktionary. Parts of the lexicon have been checked or completely written manually.

4 First results

The most striking advantage of the translation app is its lean size: currently just 18 MB for the whole set of 8 languages, allowing translation for 56 language pairs. This can be compared with the size of about 200 MB for just one language pair in Google’s translation app used off-line. The Apertium off-line app is between these two, using around 2 MB per language pair.
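The figure of 56 language pairs follows from counting ordered (source, target) pairs of 8 distinct languages, 8 × 7 = 56. The short computation below checks this and derives an average size per pair, assuming that the reported total of 18 is in megabytes; the per-pair average is only an illustrative number, since all pairs share the same grammar file.

```python
from itertools import permutations

LANGUAGES = [
    "Bulgarian", "Chinese", "English", "Finnish",
    "French", "German", "Hindi", "Swedish",
]

# Ordered pairs: translating A -> B and B -> A count as two pairs.
pairs = list(permutations(LANGUAGES, 2))

TOTAL_SIZE_MB = 18  # reported app size, assumed to be megabytes
size_per_pair_mb = TOTAL_SIZE_MB / len(pairs)

if __name__ == "__main__":
    print(len(pairs))                  # 56
    print(round(size_per_pair_mb, 2))  # 0.32
```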


The speed is still an issue. While the app now loads smoothly on modern hardware (such as Nexus 5 phones), translation is usually much slower than in the Google and Apertium apps. The speed depends heavily on the complexity of the source language, with Finnish and French being the slowest, and on sentence length. Only with short sentences (under ten words) from Bulgarian, Chinese, English, and Swedish does the translator deliver satisfactory speed. On the other hand, long sentences entered via speech are likely to contain speech recognition errors, which makes their translation pointless anyway.

Translating single words is based on a simpler algorithm (dictionary lookup) and is therefore immediate; together with the grammatical information displayed, this makes single-word translation the most mature feature of the app so far.

The translation quality and coverage are reasonable in phrasebook-like short and simple sentences. The app has exploited some idiomatic constructions of the earlier GF phrasebook (Détrez and Enache, 2010), so that it can correctly switch the syntactic structure and translate e.g. "how old are you" to French as "quel âge as-tu". In many other cases, the results are unidiomatic word-to-word translations but still grammatical. For instance, "hur mycket är klockan", which should give "what is the time", returns "how mighty is the bell". Such short idioms are typically correct in Google’s translation app, and collecting them into the GF resources will be an important future task.

On the plus side, grammar-based translation is more predictable than statistical translation. Thus (currently) when using Google Translate from Swedish to English, both "min far är svensk" and its negation "min far är inte svensk" come out as the positive sentence "my father is Swedish". With grammar-based translation, such semantic errors can be avoided.

5 Conclusion

We have presented a platform for mobile translation apps based on GF grammars, statistical disambiguation, and chunking-based robustness, enhanced by Android’s off-the-shelf speech input and output. The platform is demonstrated by a system that translates fairly open text between 8 languages, with reasonable performance for short sentences but slow parsing for longer ones, which moreover have lower quality due to more parse errors.

The processing modules, user interface, and the language resources are available as open source software and thereby usable for the community for building other systems with similar functionalities. As the app is a front end to a grammatical language resource, it can also be used for other language-aware tasks such as learning apps; this is illustrated in the demo app by the display of inflection tables. The app and its sources are available via http://www.grammaticalframework.org.

References

Tanel Alumäe and Kaarel Kaljurand. 2012. Open and extendable speech recognition application architecture for mobile environments. The Third International Workshop on Spoken Languages Technologies for Under-resourced Languages (SLTU 2012), Cape Town, South Africa.

Krasimir Angelov and Peter Ljunglöf. 2014. Fast statistical parsing with parallel multiple context-free grammars. In European Chapter of the Association for Computational Linguistics, Gothenburg.

Krasimir Angelov, Björn Bringert, and Aarne Ranta. 2009. PGF: A Portable Run-Time Format for Type-Theoretical Grammars. Journal of Logic, Language and Information, 19(2), pp. 201–228.

Krasimir Angelov. 2011. The Mechanics of the Grammatical Framework. Ph.D. thesis, Chalmers University of Technology.

Björn Bringert. 2008. Speech translation with Grammatical Framework. In Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications, pages 5–8, Manchester, UK, August. Coling 2008 Organizing Committee.

Grégoire Détrez and Ramona Enache. 2010. A framework for multilingual applications on the android platform. In Swedish Language Technology Conference.

Emil Djupfeldt. 2013. Grammatical framework on the iphone using a C++ PGF parser. Technical report, Chalmers University of Technology.

Aarne Ranta, Krasimir Angelov, and Thomas Hallgren. 2010. Tools for multilingual grammar-based translation on the web. In Proceedings of the ACL 2010 System Demonstrations, ACLDemos ’10, pages 66–71, Stroudsburg, PA, USA. Association for Computational Linguistics.

Aarne Ranta. 2009. The GF resource grammar library. Linguistic Issues in Language Technology.

Aarne Ranta. 2011. Grammatical Framework: Programming with Multilingual Grammars. CSLI Publications, Stanford. ISBN-10: 1-57586-626-9 (Paper), 1-57586-627-7 (Cloth).
