
Using Speech Input and Output in Logo Programming

Peter Tomcsányi
Department of Informatics Education, Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovak Republic
[email protected]

Abstract
Speech technology is a quickly developing new technology. In our new Logo implementation called Imagine (http://www.logo.com/catalogue/titles/imagine/index.html) we wanted to make this new technology available for children. In this article we describe how speech input and output can be used in Logo programs to make them more interesting for all kinds of users.

Keywords: Speech Output, Speech Input, Speech Engine, Imagine, Logo

1. Introduction

Imagine is a new Logo implementation (see Blaho, Kalas, Tomcsányi (1999) and Blaho, Kalas (2001)). During its development we were influenced by a number of new computer technologies. Speech engines for Windows are one such technology. Although speech-related technology has been around for some years, only recently has it become available to a broader audience of users and software developers.

Windows defines a standard interface between an application and a speech engine, called SAPI (Speech API). Using this software interface, any program can use the functionality of a third-party speech engine which implements speech input and output.

Imagine is able to co-operate with any SAPI 4.0 compliant speech engine. Some engines can be found on the Internet for free, others can be bought. In Windows 2000 an output engine is already installed automatically. Sometimes when you get a game that uses speech output, you also get a speech engine without even knowing about it. Imagine gives the power of these engines to the Logo programmer.

Most speech engines are English, but there are also engines for other major languages such as French, Spanish, Russian or German. Speech engines for languages spoken by a relatively small number of people are not available yet, but this may change in the next few years.

2. Speech output

Speech output is also called Text-to-Speech conversion. It turns written text into spoken words. Imagine implements a Say command to use speech output and SetVoice to select a voice (as several voices in several languages may be installed at the same time).
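A minimal sketch of the two commands is shown below. It assumes that SetVoice accepts the number of an installed voice; depending on the installed engines it may instead take the voice's name, so the input shown here is an assumption.

setvoice 1                          ; assumption: select the first installed voice
say [Hello, I am a talking turtle]  ; the selected voice speaks the text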

Speech output can be used in several ways. When saying constant texts, the same effect could be achieved by playing a WAV file containing the recorded text. The power of speech output is utilised better when it is used to say variable texts; then it brings something new compared with using WAV files.
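For instance (a small sketch with a hypothetical variable score), the spoken sentence can be composed at run time from a value that is not known in advance, which would be impractical to pre-record:

make "score 7
(say [Your score is] :score)   ; speaks "Your score is 7"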

Talking dice

This example is based on the Web dice project, which can be found at:

http://www.logo.com/imagine/project_gallery/dice.HTM


Let's create a turtle and give it a dice shape. The shape is an animated one, so we must set it to manual animation mode because we do not want the turtle to animate automatically.

Then we define the onClick event of this turtle to change the frame items randomly 3 to 8 times, play a short sound each time, and finally announce the resulting frame item number:

repeat 3 + random 6 [setframeitem 1 + random 6 wait 20 play [S0 I115 T120 L4 O2 32F]]
(say [You threw] frameitem)
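Attaching this body to the turtle's onClick event might look as follows. This is a sketch: the name t1 of the dice turtle is an assumption, and setevent is used here the same way as for the button and page events later in this article.

t1'setevent "onClick [
  repeat 3 + random 6 [setframeitem 1 + random 6 wait 20 play [S0 I115 T120 L4 O2 32F]]
  (say [You threw] frameitem)
]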

So now the project is ready: each time the dice is clicked, it is thrown and the speech output announces the thrown number.

3. Speech input

The more exciting thing is speech input. In general there are two levels of speech recognition.

The Command and Control functionality accepts a fixed set of voice commands - in Imagine we call it the voice menu. When the speech engine recognises a spoken command, it notifies the application, which can react accordingly.

In Continuous Dictation the speech engine tries to recognise each spoken word and transform it into written text.

Continuous dictation demands much more computing power and is much more sensitive to the way each person speaks. A period of training is usually needed to adapt the engine to each individual speaker.

Therefore in Imagine we implemented only command and control speech recognition. The implementation enables the user to add recognition of spoken commands to any Logo program.

Speech input in Logo? So let's command the turtle!

The easiest thing to start with is trying to command the turtle by voice, which means creating a voice menu of basic commands.

In Imagine each page can have its own voice menu. The menu is active while the particular page is shown. Another voice menu (called mainVoiceMenu) is defined for the whole Main Window and it is active all the time in addition to the voice menu of the active page.


A voice menu in Imagine is a list consisting of pairs:

[phrase1 action1 phrase2 action2 ...]

Each phrase is a word or list and each action is a list of Logo instructions to be executed when the phrase is recognised.

To implement the voice menu for basic turtle movements we will define this voice menu:

[left [lt 45] right [rt 45] forward [fd 50] back [bk 50] [pen up] [pu] [pen down] [pd]]

The menu can be set using a command:

page1'setVoiceMenu [left [lt 45] right [rt 45] forward [fd 50] back [bk 50] [pen up] [pu] [pen down] [pd]]

Or it can be written into the "change me" dialogue of the current page (without the outer brackets).

Our experience with both kids and adults shows that even such a simple application of voice commands is very interesting for them: they feel much more in control of the turtle than when they must type the commands, and it is also more fun.

It is usually even more fun when commands to start and stop a continuous movement are added to the menu. Then the turtle can start moving and the user can modify its path with the left and right commands; the forward and back commands are not needed any more. Adding colour and line-thickness commands is also interesting.

We define two helper procedures:

Page1'to startMoving
  if done? "move [(every 30 [fd 1] "move)]
end

Page1'to stopMoving
  cancel "move
end

Then we can define a longer voice menu:


[[red pen] [setpc "red] [black pen] [setpc "black] [big pen] [setpw 10] [small pen] [setpw 1]
 right [rt 45] left [lt 45] [pen up] [pu] [pen down] [pd]
 clean [clean] start [startMoving] stop [stopMoving]]

Handling command inputs

In the above examples we can also see one of the constraints of voice menus: spoken phrases cannot include variable parts, i.e. variable inputs to commands. The command and control speech engine interface cannot recognise parameters given to commands; it can recognise only exact phrases. This disadvantage can be compensated for in several ways, for example:

Define more voice commands doing the same thing with different parameters:

[[big forward] [fd 100] forward [fd 50] [small forward] [fd 10]]

Define special voice commands for changing the parameter for all subsequent commands of the same kind:

[[angle 45] [make "angle 45] [angle 90] [make "angle 90] left [lt :angle] right [rt :angle]]

Define an alternative (not speech) way of setting the parameter for all subsequent commands of the same kind. For example, we can create a slider named angle with values from 0 to 360; the actions for the phrases left and right would then be [lt angle] and [rt angle] respectively, as sketched below.
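A menu using such a slider might look like this (a sketch, assuming a slider object named angle exists on the page and that calling angle as a reporter returns its current value, as described above):

page1'setVoiceMenu [left [lt angle] right [rt angle] forward [fd 50]]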

Listen just when I speak to you

Even in the above simple examples we quickly run into the basic problem of real-world speech input: if the person commanding a turtle is not alone in a quiet room, the computer may also hear the voices of others, and the speaker tends to speak not only to the computer but also to other people.

The first problem can be partially solved by better hardware (some microphones can more or less successfully eliminate background noise) and partially by organisational means (placing the computers farther apart, not using speech input with big groups of pupils, etc.).

The second problem can be avoided by designing the program in such a way that it will listen only when the user wants.

Imagine has a global switch for turning listening to voice commands on or off: the MainWindow object has a setting acceptVoiceMenu whose value can be true or false. By default it is true, which means that if the voiceMenu setting is not empty, speech commands are recognised and executed. When set to false, Imagine does not execute voice commands even if there are active voiceMenu and mainVoiceMenu settings.

The basic way to set the acceptVoiceMenu setting is using the main menu. Its Options/Accept Voice Menu command directly corresponds to the acceptVoiceMenu setting.

To make switching voice menus on and off easier for the user of a particular Logo program, we can put a switch button on the current page which turns the acceptVoiceMenu setting on and off according to the state of that button:
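A possible sketch of such a button follows. The button name b2 is arbitrary, and the setter setAcceptVoiceMenu and the reporter acceptVoiceMenu are assumptions based on Imagine's usual setting/setter naming, not commands confirmed above.

b2'setcaption [Listen on/off]
b2'setevent "onPush [
  ; toggle the global listening switch (setter/reporter names are assumed)
  ifelse acceptVoiceMenu [setAcceptVoiceMenu "false] [setAcceptVoiceMenu "true]
]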


Another technique is defining special voice commands to switch listening on and off. The trick is that we cannot switch listening off completely, because then no command for switching it back on could be recognised. Instead we switch between two menus: one menu (the sleep menu) contains just one phrase, whose action redefines the voice menu to the full menu, and one command of the full menu switches back to the sleep menu.

In the following example we will demonstrate a slightly modified approach: the full menu reverts to the sleep menu after some period of silence. If no command has been recognised for some time, the computer stops listening to the whole set of commands and listens only to the single command which can wake it up again.

In Page1 we define three new procedures:

page1'to switchOn
  setvoicemenu [left [Do [lt 90]] right [Do [rt 90]] forward [Do [fd 30]]]
  indicator'setshape [setpc red setpenwidth 60 dot]
end

page1'to switchOff
  setvoicemenu [computer [switchOn]]
  indicator'setshape [setpc black setpenwidth 60 dot]
end

page1'to Do :x
  cancel [switchOff]
  run :x
  after 2000 [switchOff]
end

Procedure switchOn switches listening to the full menu and sets the shape of a turtle called Indicator to a red filled circle.

Procedure switchOff switches listening to the sleep menu, which contains just one phrase: Computer. Its corresponding action calls switchOn. Then switchOff sets the indicator's shape to a black circle.

The Do procedure is used to run a command from the full menu. After each command is executed it launches a process which does nothing for 2 seconds and then calls switchOff. But before any next command is executed, the switchOff process is cancelled, so its effect can take place only if the procedure Do has not been invoked for more than 2 seconds.

Then we define the voiceMenu of page1:

page1'setVoiceMenu [computer [switchOn]]

Then create a turtle somewhere in the corner and name it Indicator. Now you can try the whole program.

Note that after saying Computer the computer waits for the first command indefinitely, because the mechanism of switching off after two seconds is controlled by the Do procedure, which is invoked only when a command is recognised. We think that waiting indefinitely for the first command is a good feature. If you do not like it, you can modify the switchOn procedure to start a switchOff process in its last line:

page1'to switchOn
  setvoicemenu [left [Do [lt 90]] right [Do [rt 90]] forward [Do [fd 30]]]
  indicator'setshape [setpc red filledcircle 30]
  after 2000 [switchOff]
end

Commanding multiple turtles

Now let's make a program which commands multiple turtles.

Create three turtles and give them names: John, Mary and Annie.

Then we define the voiceMenu just slightly differently than in our last turtle-commanding example:

[[red pen] [setpc "red] [black pen] [setpc "black] [big pen] [setpw 10] [small pen] [setpw 1]
 right [rt 45] left [lt 45] [pen up] [pu] [pen down] [pd]
 clean [clean] start [startMoving] stop [stopMoving]
 John [setPageWho "John] Mary [setPageWho "Mary] Annie [setPageWho "Annie]
 Nobody [setPageWho []]]

The added commands change the active turtle to John, Mary or Annie accordingly. These new commands do not use the tell command, because tell would change the active turtle only in the current process, the one invoked by the speech command. We want to change the active turtle globally within the page, i.e. to force also processes started in the future to use the new active turtle; therefore setPageWho must be used. The content of the pageWho setting of the current page becomes who for all new processes started by the voiceMenu. The command Nobody deactivates all turtles.

We must somehow make the active turtle evident to the user. Let's make it blink. For this we define a procedure startBlink for Page1 which starts a process that every 200 ms hides the active turtle, waits 200 ms and shows it again. Note that during the execution of this procedure the active turtle may change, therefore we must store the name of the active turtle in a local variable w.

page1'to startBlink
  every 200 [
    let "w pagewho
    if :w <> [] [ask :w [ht] wait 200 ask :w [st]]
  ]
end

Procedure startBlink must run all the time. Therefore it has to be invoked from the startup procedure of the MainWindow object:

to startup
  startBlink
end


Then we must define slightly modified startMoving and stopMoving procedures. They must take into account that there can now be more than one move process running, so the processes must be named according to the currently active turtle.

Page1'to startMoving
  if done? word "move who [(every 30 [fd 1] word "move who)]
end

Page1'to stopMoving
  cancel word "move who
end

We tried this activity with the author's son (aged 11) and his friend (aged 10). They are not native English speakers, so it took some time to adjust their pronunciation so that the speech engine understood them correctly. But when they finally succeeded in commanding the turtles, they were quite excited that the computer understood them.

Other uses of speech input

In all the above examples we just commanded turtles. This is the obvious idea that comes to anybody's mind when speech input is combined with Logo.

There are many more possible uses of speech input in which the user does not speak turtle commands but provides other input to a program.

Our final example is a memory game. Its objective is to show a number of things, then hide all of them, then show all but one of them and ask: "What's missing?". The player must say the name of the missing thing.

First we create 20 turtles with different shapes, named according to their shapes. We also want each turtle to say its name when it is clicked. So we first create one turtle, give it the shape of a cat, change its name to Cat and define its onClick event:
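A minimal sketch of that onClick event is shown below. Here the name is written literally; the original project may instead have used a reporter for the turtle's own name, so that copied turtles inherit the behaviour without editing.

Cat'setevent "onClick [say [Cat]]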

Then we copy the turtle to the clipboard and paste it 19 times, find a nice picture for each copy and rename the turtles accordingly. This is an example of using cloning in Logo, as discussed in more detail in Tomcsányiová (2001).


Then we must program the hiding of one thing and waiting for the correct answer. We would otherwise need to construct, each time, a voice menu containing the 20 names with different actions (19 saying "no" and one saying "yes"), so we will use a different approach here. If an action in the voiceMenu setting is an empty list, then after the corresponding phrase is recognised an onVoiceCommand event is triggered on the object containing the voice menu (Page1 in our case). That event gets the heard phrase in the variable :voicecommand and can react accordingly.

Create a button (automatically named b1) and a text box (automatically named text1) on the page. Define the caption and the onPush action of b1:

b1'setcaption "Play
b1'setevent "onPush [takeThing]

and define two procedures for the button:

to createVoiceMenu :x
  ifelse empty? :x [op :x] [op fput first :x fput [] createVoiceMenu bf :x]
end

to takeThing
  text1'setvalue "
  ask all [ht]
  wait 300
  make "hidden pick all
  setvoicemenu createVoiceMenu all
  ask butmember :hidden all [st]
  say [Tell me, what's missing!]
end

The function createVoiceMenu outputs a voice menu containing all words of the input list :x, with each action being an empty list.
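For instance (a sketch with hypothetical turtle names), applied to a list of three names it produces a menu in which every phrase maps to the empty action:

show createVoiceMenu [Cat Dog Fish]
; prints: [Cat [] Dog [] Fish []]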

The main procedure of the button is takeThing. It empties the content of text1, hides all turtles, picks one turtle at random and assigns its name to the global variable hidden. Then it creates the voice menu, shows all turtles except the picked one and announces the task.

When the user says anything, it is either a phrase on the voice menu generated from the names of all turtles, or something unknown. In either case an onVoiceCommand event of Page1 is triggered, so we need to define a reaction:

Page1'setevent "onVoiceCommand [evaluateAnswer]

to evaluateAnswer
  text1'setvalue :voicecommand
  if :voicecommand = [] [say [Sorry?] stop]
  ifelse :voicecommand = :hidden [say [You are right!]] [(say [No,] :hidden [was missing.])]
  ask :hidden [st]
  setvoicemenu []
end

The procedure evaluateAnswer puts the heard phrase into the text box text1. If the heard phrase is an empty list (which means the speech engine registered that something was said but it did not match any phrase on the current menus), the program asks "Sorry?"; otherwise it checks whether the answer was correct and reacts accordingly. Finally it shows the hidden thing and erases the voice menu.


4. Conclusion

In this article we wanted to show some basic approaches to utilising speech input and output. Although speech output was already present in some Logo implementations, speech input is, as far as we know, a new phenomenon in the Logo world. Therefore we had no prior experience with using this technology in Logo-like environments.

We wanted to give some introductory examples of how to use this new and exciting technology. We hope that these examples will be interesting for all kinds of Logo users and will help them start using speech technology in their programs and gather more experience with it in Logo-like microworlds.

During the development of Imagine we made several trials with small groups of children as well as adults. The basic result is that the use of speech commands attracts users. In one of our few practical trials we used the Commanding multiple turtles program with two 11-year-old boys. For them it was a challenge to see how the turtles react to their commands. As their native language was not English, the other challenge was to pronounce the English commands well enough to be understood by the computer.

On the other hand there are still some problems: there are technical difficulties when speech is used in a noisy environment, the speech engines need powerful hardware to run on, and only a small number of languages is supported yet. In fact we had only English recognition engines, and therefore we were not able to try any activities with children who cannot pronounce at least a few English words.

5. References

Blaho A, Kalas I, Tomcsányi P (1999) OpenLogo - A New Implementation of Logo. In Proceedings of Eurologo 1999, Sofia, 1999.

Blaho A, Kalas I (2001) Object Metaphor Helps Create Simple Logo Projects. In Proceedings of Eurologo 2001.

Tomcsányiová M (2001) Cloning in Logo Programming. In Proceedings of Eurologo 2001.
