
COMMUNICATION DEVICES

Acoustic Modems for Ubiquitous Computing

Sound offers features not available with other short-range, low-bandwidth communication technologies, such as radio and infrared, enabling communication among small computing devices and humans in a ubiquitous computing environment.

Cristina Videira Lopes, University of California, Irvine
Pedro M.Q. Aguiar, Technical University of Lisbon

Sound has been largely ignored or dismissed in digital communications. Given that computer systems have traditionally been aimed at long-distance and high-bit-rate communications, this dismissal is understandable. The data rates for sound are relatively low compared to media such as radio, and the sounds can be annoying. Designers of most current digital communication systems assume applications need communicating devices to exchange large amounts of information, and so they focus on maximizing the transmission bit rate. This is the case, for example, with transferring files over a local area network (LAN) or video over the Internet. As small, mobile computational devices become more pervasive, however, the need for short-range, localized communications is as urgent as the need for long-distance communications.

Ubiquitous computing [1] refines the very notion of computers and computation, envisioning a world in which small devices with embedded computational and communication capabilities interact with other devices and with humans for purposes other than traditional desktop work. Efforts such as IrDA, Bluetooth, and, more recently, impulse radio, or ultrawideband signaling [2,3], target the short-range communications critical in ubicomp. Ubicomp also breaks the assumption that computing applications need high bandwidth. In fact, interconnected devices' communication needs are diverse, ranging from very low to very high bit rates, and from short-range, localized messaging to long-range connections.

Motivated by the specific characteristics of the aerial acoustic communication paradigm used by humans and other animals, we examine the use of device-to-device aerial acoustic communications in ubicomp applications. Unlike radio, sound is contained within walls, which can help solve interference problems at the physical layer. And, unlike infrared, sound doesn't need line of sight and so can go around corners, which can be useful for moving or hidden devices. Unlike both media, sound can also expose the communication to humans when necessary, as well as some location aspects of the communication. Another reason to revisit acoustic modems 40 years after their invention is that CPUs are several orders of magnitude faster than they were in the 1960s. Thus, we can now implement, entirely in software, cheap acoustic modems that use ordinary microphones and speakers.

The Digital Voices project explores device communications using audible sound [4]. (The "Sound in Device Communications" sidebar gives an overview of the use of sound in device communications.) Within the project, we investigate bridges between human and computer communications and explore the existence of a pervasive infrastructure for sound in the audible band, including desktop computers, palm devices, memo recorders, and televisions, to support no-cost short-range communications in ubicomp applications. Hence our focus on audible sound: we could also use ultrasound, but it requires hardware that few existing audio devices have.

Sound in Device Communications

Ultrasonic remote controls, invented in the 1950s, were one of the earliest attempts to use sound in machine communications and are still used in commercial products. The first commercial ultrasonic remote control, Zenith's Space Command, was a purely mechanical device that used four rods—two for channel up and down, one for power on or off, and one for sound on or off—instead of batteries. Designers cut the rods to lengths that would generate sounds of slightly different frequencies.

We categorize the traditional uses of sound in computer systems into three groups:

• Long-distance point-to-point computer communications over existing telephone networks (phone-line modems)
• Underwater communications
• Speech recognition/synthesis and other nonspeech auditory displays that enhance human-machine interactions

Telephone modems

Modems originally allowed long-distance point-to-point communication using the voice band in ordinary telephone networks. The first modems were acoustically coupled: a user placed the telephone receiver into a modem handset and the modem sent tones to the telephone. Early acoustic modems transmitted at 300 bits per second.

Direct-connect modems, which interface directly with the telephone line, have replaced these acoustic modems. They are less bulky, give a better connection, and avoid the background noise problems of acoustic modems. Modern modems transmit at bit rates of up to 56 Kbps.

Underwater communications

Banned from above-ground computer networks, acoustic modems found a niche in underwater communications. Although water is a poor medium for radio propagation, it's very good for sound propagation. Since World War II, submarine crews have used ultrasound to detect objects, position, and communicate. Recent advances in digital signal processing hardware, driven by the needs of autonomous underwater vehicles, have made acoustic modems for underwater wireless communications possible. These modems use the upper audible and lower ultrasound bands (12 kHz to 30 kHz) and transmit at a maximum bit rate of 19.2 Kbps. A survey of underwater communications is available elsewhere [1].

Human-computer interaction

Human-computer communications use sound in speech recognition and auditory displays [2]. Speech recognition and synthesis is a well-established field of research and development; we focus on the less examined nonspeech interfaces.

Auditory displays help users monitor and comprehend the data the sounds represent. Any data source—from databases to real-time stock market quotes—can provide information to display. Many systems have used nonspeech sound for auditory displays, either as stand-alone interfaces or graphical user interface enhancements.

Auditory display of computer communications takes many forms. Anyone with root access to a Unix machine, for example, can feed the Ethernet card traffic into the loudspeakers, generating a series of noises that reflect the electric impulses of messages arriving. Several digital artists have created artwork reflecting the sounds of network traffic.

The modems we describe in this article aren't displays for humans to hear; rather, they are displays for computers to read, but they can convey information to human listeners.

Recent work

In recent years, researchers have examined the area of information hiding in audio (a survey by Stefan Katzenbeisser and Fabien Petitcolas, for example, contains extensive references to work in the field [3]). The music industry has been exploring audio watermarking as a way to preserve ownership of music in electronic format. Interest in this kind of communication is growing, and much work remains.

Vadim Gerasimov and Walter Bender used the aerial acoustic channel for device-to-device communications [4]. They evaluated simple modulation techniques according to the data rate, computational overhead, noise tolerance, and disruption level, and report a maximum data rate of 3.4 Kbps using a frequency of 18 kHz [4].

SIDEBAR REFERENCES

1. M. Stojanovic, "Recent Advances in Underwater Acoustic Communications," IEEE J. Oceanic Eng., vol. 21, no. 2, 1996, pp. 125–136.
2. G. Kramer, ed., Auditory Display, vol. XVIII of SFI Studies in the Sciences of Complexity, Addison-Wesley, 1994.
3. S. Katzenbeisser and F. Petitcolas, eds., Information Hiding: Techniques for Steganography and Digital Watermarking, Artech House, 2000.
4. V. Gerasimov and W. Bender, "Things that Talk: Using Sound for Device-to-Device and Device-to-Human Communication," IBM Systems J., vol. 39, nos. 3–4, 2000, pp. 530–546.

Digital Voices overview

When we first started exploring sound, we analyzed common modulation techniques and the sounds they produce. Some modulation parameters strongly affect how humans perceive sound quality. Varying those parameters lets us obtain many different types of acoustic messages, ranging from modem noise to music.

In this article, we describe our experience designing audible-range acoustic modems for ubicomp using two criteria:

• The messages of these communication systems must be pleasant to people. They should sound like music or other familiar sounds.
• The systems must be deployed on ordinary hardware and use the existing infrastructure for voice to avoid extra cost. Software should perform the coding and decoding.

Our experiments show that our goals are attainable. This article reports those experiments and lays out the groundwork for future research.

Application areas

Ubicomp applications can and should exploit all sorts of short-range communication media, including radio, infrared, and sound. Because these media complement each other, developers should choose the medium appropriate for each application.

Applications that don't require high bit rates can use sound. A developer might choose sound instead of radio or infrared for an application that

• Can easily be deployed in the existing voice infrastructure, avoiding the expense of extending the hardware with radio or infrared transmitters
• Requires or benefits from human awareness of the communication
• Requires short-range broadcasts enclosed within walls and windows

One application of Digital Voices, developed at Xerox PARC, uses sound to safely transmit preauthentication information in wireless networks [5]. To preauthenticate information using the wireless (radio) network in an office building, a laptop user first goes to the room where the base station is located. Once there, the laptop and base station exchange cryptographic keys via microphones and speakers, with all exchanges remaining secure within the room. When the laptop has the key, as well as other context-dependent information, the user can roam the building using crypto over the "leaky" wireless network.

Another application is related to steganography. We've implemented communication systems that produce acoustic messages using natural sounds, such as bird songs or crickets. To the human ear, these sounds are indistinguishable from those produced by animals, yet they carry coded messages. Agencies conducting secret operations might be interested in such an application, given that these sounds can be broadcast by traditional media and can be picked up anywhere in the world with a laptop or PDA.

A related application area is entertainment. One of our communication systems sounds like R2D2, the robot from the Star Wars movies. For demonstration purposes, we edited parts of the movies, replacing the original R2D2 sounds with our sounds containing messages. An R2D2 toy augmented with a microprocessor and a microphone can receive instructions or programs from the movies or from TV. Using sound as the communication medium is not only fun but also easy to implement without having to replace or augment the TV.

Ad hoc networks that relate back to the Internet are a fourth candidate area for Digital Voices. Memo-recording devices (such as most cell phones) could store and exchange URLs in a form that's more reliable and compressed than human speech. For example, a radio program could broadcast the URL for a special discount during a commercial. Listeners could record it on their cell phones and later play it back to the desktop computer.

Figure 1. Spectrogram of the sound of a multifrequency B-ASK (binary amplitude shift key) modulated message with a 100-ms symbol duration. Carrier frequencies are those of the musical notes of the pentatonic scale. The message is a sequence of 7-bit ASCII characters in which the higher-order bit is always 0. This sound resembles music played by soprano flutes.

Human factors


Because this work targets short-range, low-bit-rate aerial communications that might interest people, why don't we use human speech? In short, human speech is not always the most efficient encoding, and speech recognition engenders a huge computational overhead in the decoding devices. For example, the text www.amazon.com/exec/obidos/ASIN/0262640414/qid=1034295226/sr=2-1/ref=sr_2_1/002-2920555-7785632 is human-readable but not meant for humans to read, much less pronounce. At most, humans might be interested in knowing the Web site to which the URL refers. These sorts of short messages are central to several ubicomp applications. Standard techniques in digital communications engineering are simpler and more efficient than human speech.

Early in our experiments, we found that the straightforward application of standard modulation techniques used in nonaudible modems is inappropriate for audible modems, because people associate those sounds with machine communications and tend to get annoyed by them. When designing aerial acoustic communication systems, we must account for human perception and make sounds that are acceptable to people when they need to be aware of them, and imperceptible when they do not.

Human factors affect sound design in two types of acoustic messaging systems:

• Systems using very short duration sounds that are seldom transmitted (say, occasional blips under 1 second)
• Systems using longer duration sounds

In the first case, human perception factors in sound design are important but not crucial; however, in the second case, human perception is critical, both as a limitation and an opportunity for creative artists. Although the Digital Voices project uses both types of sounds, this article focuses on modems for long-duration sounds.

Musically oriented variations of common modulation techniques

We adapt common modulation techniques, such as amplitude shift key (ASK) and frequency shift key (FSK), to support our aerial acoustic communication systems. Samples of the sounds our modems produce are available at www.ics.uci.edu/~lopes/dv/dv.html.

Amplitude shift key

In ASK, information is coded in the amplitude of a sinusoidal carrier. To improve robustness, we use binary (B-ASK) signaling—that is, the signal coding a bit is either zero or a sinusoid segment with a specified fixed-value amplitude. To increase the sound's frequency diversity, we use multifrequency coding schemes—that is, the coder transmits several bits in the same time slot.

Pentatonic scale. We use an 8-frequency coding scheme, starting at 1,000 Hz. Frequency selection determines the sound's perceptual quality. Although using major or minor chords, or sets of tones, worked well, selecting frequencies according to a pentatonic scale [6] achieved a much more pleasant sound. Symbol duration impacts the sound produced and, obviously, the transmission bit rate. Figure 1 shows the spectrogram of the sound of the messages for a 100-ms symbol duration. The sound resembles a piece of music played by several soprano flutes, and the data rate is 80 bps. For a 20-ms symbol duration, the sound is similar to that of grasshoppers, and the data rate is 400 bps.
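To make the scheme concrete, here is a minimal sketch of how one such multifrequency B-ASK symbol could be synthesized: each bit of an 8-bit symbol switches one carrier on or off for the duration of the time slot. The specific carrier frequencies, the sampling rate, and the class and method names are illustrative assumptions rather than the project's actual implementation.

```java
// Minimal sketch of multifrequency B-ASK symbol synthesis (illustrative only).
// Each bit of an 8-bit symbol gates one sinusoidal carrier for the symbol duration.
public class BAskModulator {
    // Assumed carrier set: a pentatonic-style selection starting near 1,000 Hz.
    static final double[] CARRIERS_HZ =
        {1046.5, 1174.7, 1318.5, 1568.0, 1760.0, 2093.0, 2349.3, 2637.0};
    static final double SAMPLE_RATE = 8000.0;   // assumed audio sampling rate
    static final double SYMBOL_SECONDS = 0.1;   // 100-ms symbols -> 80 bps with 8 carriers

    // Returns the audio samples coding one 8-bit symbol.
    static double[] modulateSymbol(int symbol) {
        int n = (int) (SAMPLE_RATE * SYMBOL_SECONDS);
        double[] samples = new double[n];
        for (int bit = 0; bit < CARRIERS_HZ.length; bit++) {
            if (((symbol >> bit) & 1) == 0) continue;   // a 0 bit leaves its carrier silent
            double w = 2 * Math.PI * CARRIERS_HZ[bit] / SAMPLE_RATE;
            for (int i = 0; i < n; i++) {
                // Equal fixed amplitude per carrier, scaled down to avoid clipping.
                samples[i] += Math.sin(w * i) / CARRIERS_HZ.length;
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        double[] sound = modulateSymbol('A');   // 7-bit ASCII, high-order bit 0
        System.out.println("Generated " + sound.length + " samples for 'A'");
    }
}
```

Summing the per-bit carriers at fixed amplitudes is what gives the chord-like, flute character described above; 8 bits per 100-ms slot yields the 80-bps rate.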

Harmonic ASK. A second sound design uses 128 frequencies, all 70-Hz harmonics, starting at 700 Hz. For a 100-ms symbol duration, the resulting sound is like an electronic-music drum beat. The large number of carriers results in a higher bit rate—1,280 bps—than the previous example.

Frequency shift key

In FSK, the modem fixes the sinusoidal carrier's amplitude and codes the message in its frequency. The number of bits per symbol determines the number of distinct frequencies allowed for the carrier. Our designs use FSK either alone or with ASK.

Harmonic FSK. To transmit 8-bit symbols, we use 256 20-Hz harmonic frequencies starting at 1,000 Hz. For a 20-ms symbol duration, the sound resembles a single grasshopper, with a bit rate of 400 bits per second. When we increase the symbol duration to 100 ms, the sound is like a bird, and the data rate drops to 80 bps. Figure 2 shows the resulting spectrogram.

Although the data rates for modems using the pentatonic scale and harmonic FSK are the same (because both code 8 bits per symbol), the resulting sounds are quite different, as Figures 1 and 2 indicate. In the pentatonic scale, several tones usually occur in each time slot, whereas in harmonic FSK, only one tone sounds in each time slot. The human ear cannot distinguish differences in tone for small symbol duration values, such as 20 ms. Differences only become clear at 100 ms.


Chords. In this example, we code 7-bit symbols. We use the seven tones of the A major diatonic scale as the keys to form chords (A is 440 Hz). The chords can be either major or minor (indicated by the presence of the third major or third minor) and can include the seventh tone.

Given a 7-bit value b6b5b4b3b2b1b0, we establish a mapping that simultaneously uses FSK and ASK:

• b6 determines whether the chord includes the seventh tone
• b5 determines the mode (major or minor)
• b4b3b2 determine the chord's key
• b1b0 determine which inversion to use (we assume four possible inversions, the fourth being the same as the first but one octave above)

For 7-bit ASCII characters, this mapping produces sounds that humans can identify as corresponding to numbers versus letters (presence or absence of the seventh tone) and capital (minor) versus lowercase (major) letters. To enhance the sound, we include a silence every seven symbols. The result is a sequence of familiar chords with a 4/4 rhythm that, although in arbitrary sequence, makes the message sound like an ordinary musical composition. With this coding scheme, we get a 35-bps bit rate (for 7-bit ASCII characters, this means 5 characters per second) for a 200-ms symbol duration.
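The bit-to-chord mapping is easy to picture as a small lookup, sketched below: the three key bits select a root in the A major scale, one bit selects major or minor, one bit adds the seventh, and the two low bits pick the inversion. The semitone table, the choice of a minor seventh, and the way the inversion is applied are assumptions made for illustration; the article doesn't specify these details.

```java
// Illustrative sketch of the 7-bit-to-chord mapping (assumed details): b4b3b2
// pick the key, b5 the mode, b6 the seventh, b1b0 the inversion.
import java.util.ArrayList;
import java.util.List;

public class ChordMapper {
    // Seven keys of the A major diatonic scale, in semitones above A4 (440 Hz).
    static final int[] KEY_SEMITONES = {0, 2, 4, 5, 7, 9, 11};   // A B C# D E F# G#

    static double freq(int semitonesAboveA4) {
        return 440.0 * Math.pow(2.0, semitonesAboveA4 / 12.0);
    }

    // Returns the chord frequencies coding one 7-bit symbol.
    static List<Double> chordFor(int symbol) {
        boolean seventh = ((symbol >> 6) & 1) == 1;   // b6
        boolean minor   = ((symbol >> 5) & 1) == 1;   // b5
        int key         = (symbol >> 2) & 0x7;        // b4b3b2
        int inversion   = symbol & 0x3;               // b1b0

        int root = KEY_SEMITONES[key];
        List<Integer> tones = new ArrayList<>();
        tones.add(root);
        tones.add(root + (minor ? 3 : 4));            // third (minor or major)
        tones.add(root + 7);                          // fifth
        if (seventh) tones.add(root + 10);            // seventh (assumed minor seventh)

        // Apply the inversion by raising the lowest notes one octave.
        for (int i = 0; i < inversion && i < tones.size(); i++) {
            tones.set(i, tones.get(i) + 12);
        }
        List<Double> freqs = new ArrayList<>();
        for (int t : tones) freqs.add(freq(t));
        return freqs;
    }

    public static void main(String[] args) {
        System.out.println("Chord for 'a': " + chordFor('a'));
    }
}
```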

Robustness through redundancy. To increase the robustness of our chord-coding scheme, we use a redundant code. The modem sends the redundant information on the chord key's fourth and sixth harmonics, encoding the same information but in a different form. It encodes bits b4b3b2 (the key) in the harmonics' frequency values—for example, if C is the key, the presence of C′′ (the fourth harmonic) and G′′ (the sixth harmonic) will verify that. The modem encodes the inversion, the mode, and the presence or absence of the seventh tone (the remaining four bits) in the time slot at which the two harmonics are played—phase modulation. The result is the same musical composition overlaid with a xylophone-like melodic line that sounds slightly out of tempo.

Frequency hopping

Frequency hopping changes a channel's carrier frequency as a function of time. Researchers have used this mechanism extensively since World War II, both to send messages that enemy receivers cannot detect (unless they know the hopping sequence) and to minimize collisions in multi-user radio channels (such as in cell phone networks). In aerial acoustic communications, frequency hopping is a key factor in producing melodic messages. We've implemented two frequency hopping designs.

Melody. This design's hopping code is a melodic line from the movie Close Encounters of the Third Kind: B′–C#′–A′–A–E–E. The tones last for about one second—that is, the system hops every second—and are included in the final signal. The data is B-FSK-modulated using the hopping code frequency's fourth and eighth harmonics, representing 0 and 1, respectively. The symbol duration is 10 ms, and the bit rate is 100 bps. The result is a relatively slow melodic line with fast temporal variations in the two higher harmonics of each tone.

Bach's piece. The second design uses Johann Sebastian Bach's Badinerie, shown in Figure 3, as its hopping code. We use B-ASK on the base notes' first four harmonics. The result is the well-known melody line with fast temporal variations. The duration of each symbol is 32 ms, resulting in a bit rate of 125 bps.
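A rough sketch of the hopping idea, under assumed parameters, follows the Melody design: a slowly changing base tone plays the hopping code, while every 10-ms slot adds that tone's fourth or eighth harmonic to code a 0 or a 1. The sampling rate, amplitudes, and names are illustrative.

```java
// Minimal sketch of the frequency-hopping idea (assumed parameters): a slow
// hopping tone carries the melody, while each 10-ms slot adds the tone's
// 4th or 8th harmonic to code a 0 or 1 (B-FSK on the harmonics).
public class HoppingModulator {
    static final double SAMPLE_RATE = 22050.0;   // assumed sampling rate
    static final double HOP_SECONDS = 1.0;       // one hopping tone per second
    static final double SYMBOL_SECONDS = 0.01;   // 10-ms data symbols -> 100 bps

    // hopFreqsHz: the hopping code (for example, the melody notes); bits: the data.
    static double[] modulate(double[] hopFreqsHz, int[] bits) {
        int samplesPerHop = (int) (SAMPLE_RATE * HOP_SECONDS);
        int samplesPerSym = (int) (SAMPLE_RATE * SYMBOL_SECONDS);
        double[] out = new double[hopFreqsHz.length * samplesPerHop];
        for (int i = 0; i < out.length; i++) {
            double base = hopFreqsHz[i / samplesPerHop];        // current hop tone
            int sym = i / samplesPerSym;                        // current data slot
            double harmonic = (bits[sym % bits.length] == 0 ? 4 : 8) * base;
            out[i] = 0.6 * Math.sin(2 * Math.PI * base * i / SAMPLE_RATE)
                   + 0.4 * Math.sin(2 * Math.PI * harmonic * i / SAMPLE_RATE);
        }
        return out;
    }
}
```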

Synthesized musical instruments

Musical instrument synthesis is an active research area in the computer music field, and methods for instrument synthesis have been studied and used for decades [7]. Our approach modulates information in piano, bell, and clarinet sounds. To do this, we synthesize the instruments' sounds and then map the information to musical notes.


Figure 2. Spectrogram of the sound coding an FSK-modulated 8-bit text string. The sinusoidal carrier frequencies are 20-Hz harmonics and the symbol duration is 100 ms. This acoustic message sounds like a bird.

The challenge of designing air modems is reproducing the instruments' main sound characteristics so the synthesized acoustic waveforms can code the information without requiring specific and complex coding schemes and, consequently, very expensive receivers. As earlier work describes [8], to achieve this goal we synthesized the musical instruments' audio spectra by either explicitly choosing the sound's spectral components (in the case of the piano) or using frequency modulation techniques (in the case of the bells and clarinet) [9,10].

The modulation scheme assigns each symbol to a synthesized musical note of a given fundamental frequency. The scheme chooses the notes so their spectra do not collide. This way, the receiver can perceive the transmitted notes by simply detecting their fundamental frequencies, just like a standard sinusoidal receiver. In addition to its computational simplicity, the receiver doesn't even need to know which instrument the transmitter is synthesizing; the instrument being synthesized can change at any time without affecting the transmission. The drawback of this versatility is the low spectral efficiency and consequent very low bit rate.

Bio-inspired modulation techniques

Several sound designs target the emulation of communication systems used by imaginary and real creatures. Examples of these sounds are at http://birds.cornell.edu/brp.

Imaginary creatures

We made R2D2, the robot from the Star Wars movies, talk by mapping characters to R2D2-like sounds so the sounds would convey text messages. To achieve this, we first made a time-frequency analysis of R2D2's sounds. Using that analysis, we identified three major sound types: beeps, chirps, and grunts. The R2D2ness of the sounds consists of a mixture of the three sound types, with beeps slightly prevalent. To preserve that nature, we defined the mapping from text symbols to acoustic signals according to Table 1.

With this symbol mapping, our R2D2 modem transmits at an average bit rate of 35 bps. Obviously, the actual bit rate depends on the text message's structure (percentage of punctuation characters and numbers). Figure 4 illustrates the R2D2 modulator by showing the spectrogram of a text message.
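The dispatch implied by Table 1 might look like the sketch below: letters become fixed-frequency beeps, digits become amplitude-scaled low-frequency tones (M-ary ASK), and space and punctuation become chirps. Only the three signal classes and their durations come from the table; every frequency, amplitude, and name in the code is an assumption.

```java
// Sketch of a character dispatcher following Table 1 (frequencies and chirp
// shapes are illustrative assumptions; only the three signal classes and the
// durations come from the table).
public class R2D2Modulator {
    static final double SAMPLE_RATE = 22050.0;   // assumed sampling rate

    static double[] tone(double freqHz, double seconds) {
        int n = (int) (SAMPLE_RATE * seconds);
        double[] s = new double[n];
        for (int i = 0; i < n; i++) s[i] = Math.sin(2 * Math.PI * freqHz * i / SAMPLE_RATE);
        return s;
    }

    // Chirp sweeping linearly from f0 to f1 over the given duration.
    static double[] chirp(double f0, double f1, double seconds) {
        int n = (int) (SAMPLE_RATE * seconds);
        double[] s = new double[n];
        double phase = 0;
        for (int i = 0; i < n; i++) {
            double f = f0 + (f1 - f0) * i / (double) n;   // instantaneous frequency
            phase += 2 * Math.PI * f / SAMPLE_RATE;
            s[i] = Math.sin(phase);
        }
        return s;
    }

    static double[] scale(double[] s, double a) {
        for (int i = 0; i < s.length; i++) s[i] *= a;
        return s;
    }

    static double[] encode(char c) {
        if (c >= 'A' && c <= 'Z')       // letters: FSK beeps, 26 frequencies, 0.1 s
            return tone(2000 + 100 * (c - 'A'), 0.1);
        if (c >= '0' && c <= '9')       // digits: M-ary ASK on a low frequency, 0.2 s
            return scale(tone(600, 0.2), (c - '0' + 1) / 10.0);
        // Everything else (space, punctuation): chirps, 0.25 s.
        return chirp(1500, 3500 + 50 * c, 0.25);
    }
}
```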

Animals

We also used sounds from nature, such as bird and cricket sounds. As with the R2D2 modem, we first made a time-frequency analysis of the sounds. We then used the analysis to synthesize sounds resembling the original ones, modulating information in amplitude, frequency, and phase.

Bird. We analyzed a real bird's song, obtained from a bioacoustic database, by looking for different sound patterns. We synthesized the patterns, embedding information in their amplitude and frequency.

Figure 5 illustrates the process. Figure 5a displays the time evolution of the amplitude of the bird song's acoustic signal. Figure 5b shows three typical segments of that song and the corresponding spectrograms, and Figure 5c shows a synthesized signal emulating the bird song. We modulated the information, represented by a sequence of ASCII characters, in the amplitude and frequency of the song's acoustic units. This coding and modulation scheme yields a 35-bps transmission bit rate.

Cricket. Cricket songs have a particular characteristic structure consisting of sequences of three fast sound impulses, as Figure 6a shows.

Figure 3. Excerpt from Johann Sebastian Bach's Orchestral Suite No. 2 in B minor (Badinerie), which we used as the frequency hopping code.

TABLE 1. R2D2 symbol mapping scheme.

Symbol           Acoustic signal type                                     Duration (sec)
A–Z              FSK—that is, beeps of 26 different frequencies           0.1
Space, ?, !, .   Chirps starting and ending at different frequencies      0.25
0–9              Multi-ary (M-ary) ASK using relatively low frequencies   0.2


To produce sounds with this structure, our cricket modulator codes the information, represented by ASCII characters, in amplitude and phase. Figure 6b represents the time evolution of the acoustic signal corresponding to a modulated sound. The figure clearly shows the amplitude modulation—each impulse has one of two distinct magnitudes.

In phase modulation, the impulse triples occur once in a specified time period, in this case 0.5 seconds. The modulator divides the time period into four equal slots, letting the information symbols determine in which of the four slots to transmit the impulse triples. This type of phase modulation randomizes the sound, making it more natural. Although subtler, the phase modulation effect is still visible in Figure 6b. This modem encodes information at a bit rate of 22 bps.
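The mechanics reduce to picking, for each frame, an impulse-triple amplitude and one of four time slots. The sketch below shows only that bookkeeping; the impulse shape, carrier, spacing, and bits per frame are assumptions, and the sketch does not reproduce the exact parameter choices that give the 22-bps rate.

```java
// Simplified sketch of the cricket-style modulator: per 0.5-s frame, data bits
// pick the impulse-triple amplitude and which of four time slots it occupies.
// Impulse shape, carrier, and spacing are assumptions, not the article's values.
public class CricketModulator {
    static final double SAMPLE_RATE = 22050.0;
    static final double FRAME_SECONDS = 0.5;   // one impulse triple per frame
    static final int SLOTS = 4;                // four possible positions

    // amplitudeBit selects one of two magnitudes; slotBits (0..3) selects the slot.
    static double[] frame(int amplitudeBit, int slotBits) {
        int n = (int) (SAMPLE_RATE * FRAME_SECONDS);
        double[] out = new double[n];
        double amp = amplitudeBit == 0 ? 0.4 : 0.9;
        int slotStart = slotBits * (n / SLOTS);
        for (int k = 0; k < 3; k++) {                                 // three fast impulses
            int start = slotStart + k * (int) (0.02 * SAMPLE_RATE);   // 20 ms apart (assumed)
            for (int i = 0; i < (int) (0.005 * SAMPLE_RATE); i++) {   // 5-ms impulse (assumed)
                out[start + i] = amp * Math.sin(2 * Math.PI * 4000 * i / SAMPLE_RATE);
            }
        }
        return out;
    }
}
```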

Practical matters

Our experiments show that constructing aerial acoustic modems in software and using the existing hardware infrastructure for sound is feasible. In fact, our modems produce sounds that are relatively pleasing to humans over short time periods. We have informally demonstrated the modems to more than 250 people, ranging from audiences of 20 to 50 people to one-on-one demonstrations.


Figure 4. Spectrogram of the sound coding the message "This is R2D2!" The first time bin is a hail frequency. Subsequent bins stand for T-H-I-S-space-I-S-.... Notice in particular the longer bins corresponding to the chirps that code the spaces and the !, and the low-frequency sounds coding the 2s.

Figure 5. Graphic representations of a bird song: (a) time evolution of the bird's acoustic signal; (b) magnified representative segments with corresponding spectrograms; and (c) time evolution of the synthesized bird sound, resembling the real bird's song.

In none of these informal demonstrations did anyone associate the sounds with computer communications, and most people were surprised when we decoded the sounds into messages. In the one-on-one demonstrations, 90 percent of the people had trouble believing the decodings, and some looked for a hidden trick or for infrared and radio links. Some sound designs "fool" human listeners better than others. For example, the bird and the cricket are among the least suspicious sounds because, to the human ear, they sound exactly like their natural counterparts.

Receivers

Earlier work describes the receiver implementation [4,8]. Most of our designs require simple signal-processing techniques. Many of our modulations are based on the presence or absence of certain frequencies, and as such, the receivers simply perform a number of computations of the form

R_i = Σ_{n=0}^{N} r(n) exp(j2πf_i n),

where r(n) is the received signal, f_i the expected frequency, and N the number of samples in a symbol duration. Although an explanation of this technique falls outside this article's scope, this equation illustrates the computational overhead at the receiver for the simple designs.
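For a single expected frequency, this computation is just one complex correlation (a single DFT bin) over the symbol window, as in the sketch below; the sampling-rate argument, the detection threshold, and the class and method names are illustrative assumptions rather than the Digital Voices code.

```java
// Minimal correlation detector for one expected frequency over one symbol
// window (a single DFT bin). Sampling rate and threshold are assumptions.
public class ToneDetector {
    // Returns |sum_n r(n) * exp(-j * 2*pi*f * n / fs)| over the N samples of a symbol.
    static double correlate(double[] r, double freqHz, double sampleRate) {
        double re = 0, im = 0;
        double w = 2 * Math.PI * freqHz / sampleRate;
        for (int n = 0; n < r.length; n++) {
            re += r[n] * Math.cos(w * n);
            im -= r[n] * Math.sin(w * n);
        }
        return Math.sqrt(re * re + im * im);
    }

    // A B-ASK receiver would compare each carrier's correlation against a threshold.
    static boolean carrierPresent(double[] symbolSamples, double freqHz, double sampleRate) {
        double energy = correlate(symbolSamples, freqHz, sampleRate);
        return energy > 0.1 * symbolSamples.length;   // assumed threshold
    }
}
```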

Experimental environment and error rates

In our experiments, we've been using desktops and laptops with processing speeds above 800 MHz running Windows and have been satisfied with their performance. The decoder for all the coding schemes is based on a simple correlation receiver [11]. For the pentatonic scale and bird song designs, we implement the decoders in Java, and we can decode all messages in real time with practically no errors. We get these results while running several other applications, such as Microsoft Outlook and Internet Explorer. We've also implemented the pentatonic design in C++ for two versions of the iPAQ—the 36 series (a StrongARM processor at 206 MHz) and the 39 series (an XScale processor at 400 MHz)—both running Windows. In these implementations, the receiver program first records the message and then decodes it. The 36-series iPAQ takes about 6 seconds to decode a 20-character (160-bit) message, whereas the 39 series takes about 2 seconds. Again, the error rate is practically zero.

Error rates, in general, depend on many factors. As expected, error rates are lower for lower bit rates at the source. For example, among the modems described in the "Musically oriented variations of common modulation techniques" section, the harmonic ASK design (bit rate of 1,280 bps) produced error rates of around 30 percent, whereas the pentatonic scale and harmonic FSK designs (400-bps bit rates) generated no errors in a relatively quiet office environment. In general, our modems operate with almost no errors at bit rates up to 800 bps.


Figure 6. Graphic representations of a cricket's song: (a) acoustic signal of the song, showing sequences of three fast sound impulses, and (b) the acoustic signal produced by the cricket modulator.

Limitations

The channel we are dealing with includes the air—the transmission medium—as well as the hardware used for producing and capturing the sound. Because we use ordinary speakers, microphones, and sound cards, this channel is far from ideal, and channel characteristics vary from device to device. Most of the existing hardware can handle sounds for humans (voice, music, and so on) with a quality acceptable to the human ear. The human ear is almost deaf compared to artificial decoders, especially those modeled after traditional digital communication receivers. Therefore, channels that target humans can cope with several simplifications and tolerate several imperfections.

For example, even though the human audible sound band extends to 20 kHz, many existing audio channels, such as telephones, TVs, and desktop computers, are band-limited to lower frequencies—typically 4, 8, and 12 kHz, respectively. This occurs because the lower bands can carry speech and music signals with acceptable perceptual quality. We considered these limitations when designing our modems: all of the modems operate below 12 kHz, with almost all of them operating below 8 kHz (the exceptions are the harmonic ASK and Bach designs), and several operating below 4 kHz.

Experiments involving video equipment and TVs attest to the feasibility of these modems. We've recorded images with our sounds embedded (pentatonic scale sounds) using an ordinary video camera and later played the recording on a regular TV set while running the decoder on a laptop. The decoder performed with no errors.

These channels' limitations are not restricted to band. The existing hardware is usually nonuniform with respect to frequency response and often introduces nonlinearities in the channel. This makes it harder to implement the decoders, because we cannot rely on consistent responses when we vary the devices.

These problems are solvable, however. For example, the transmission power at certain frequencies is not always within the same limits for all hardware. We include calibration sequences at the beginning of the signals and normalize the decoding to those values. This solves two problems at once: the inconsistency of the hardware response and the effect of the distance between speaker and microphone.
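The article doesn't detail the calibration format, but one plausible sketch, assuming a preamble of known, symbol-length duration in which every carrier is played at a nominal level, is to measure each carrier there and normalize later detections against those measurements. The sketch reuses the correlate method from the receiver sketch above; all names and the threshold are assumptions.

```java
// Sketch of per-carrier calibration (assumed format): the signal starts with a
// preamble, assumed to last one symbol duration, in which every carrier is
// played at a known level; the receiver measures each carrier there and later
// divides symbol correlations by those reference values, compensating for
// uneven hardware response and for speaker-microphone distance.
public class Calibrator {
    final double[] reference;   // per-carrier magnitude measured on the preamble

    Calibrator(double[] preamble, double[] carriersHz, double sampleRate) {
        reference = new double[carriersHz.length];
        for (int i = 0; i < carriersHz.length; i++) {
            reference[i] = ToneDetector.correlate(preamble, carriersHz[i], sampleRate);
        }
    }

    // Normalized detection: a carrier counts as "on" if it reaches, say, half of
    // the level observed during calibration (the 0.5 threshold is an assumption).
    boolean present(double[] symbol, double[] carriersHz, double sampleRate, int i) {
        double m = ToneDetector.correlate(symbol, carriersHz[i], sampleRate);
        return m > 0.5 * reference[i];
    }
}
```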

In designing aerial acoustic modems for ubicomp, we must accept a trade-off between data rate and aesthetics: the faster the transmission, the more likely the sounds will be annoying, and vice versa. Rather than seeing this as a setback, we see it as a challenge for designing pleasant sounds that are robust enough to survive transmission. We expect that sound designers—from music composers to HCI developers—will affect the kinds of sounds acoustic modems will use.

We envision applications of these acoustic modems in several domains, including security (information warfare), entertainment, ubiquitous Internet, and medical informatics. This will require a lot more work, and we've clearly only started exploring the possibilities.

Future work might examine the effect of other sounds on the data rate. Aerial acoustic channels can be very noisy. The sound of a TV in the living room, for example, competes with the sounds of people talking, children playing, objects falling on the floor, and so on. When transmission frequencies collide, error rates increase. Hotel rooms and offices are usually much quieter than homes, with only occasional bursts of sound as objects move.

Microphone quality can also affect data rate. The more sensitive the microphone, the more noise it picks up. Thus, a cheap microphone closer to the sound source might be better than a high-quality microphone, as it will naturally filter out most of the ambient noise.

We also need good noise models for these channels. The additive white Gaussian noise model, used widely in communications, misses these channels' most important noise components. We need to study the channels more carefully so we can design sounds that are robust with respect to the most likely ambient noise for each application.

Future work might also explore the role of human perception and its effect on sound design. What kinds of sounds can transmit information to machines while "leaking" some of the information to people? On the other hand, what sounds are easiest to ignore? For this, we can borrow some findings from auditory studies, but we plan to conduct our own studies, especially with respect to disclosing auditory information to people.

ACKNOWLEDGMENTS
Many thanks to our group of students who have implemented some of the modems: Ling Bao, Patricio de la Cuadra, Natacha Domingues, João Lacerda, Peter Hebden, and Jon Froehlich. Also thanks to Diana Smetters and the security group at PARC for interesting discussions concerning security applications. Part of this work was performed while Lopes was a PARC research staff member.


REFERENCES

1. M. Weiser, "The Computer for the 21st Century," Scientific Am., vol. 265, no. 3, Sept. 1991, pp. 94–104.
2. J. Foerster et al., "Ultra-Wideband Technology for Short- or Medium-Range Wireless Communications," Intel Technology J., 2nd quarter 2001; http://intel.com/technology/itj/q22001/articles/art_4.htm.
3. R. Scholtz and M. Win, "Impulse Radio," Proc. IEEE Int'l Symp. Personal, Indoor, and Mobile Radio Comm., IEEE Press, 1997.
4. C. Lopes and P. Aguiar, "Aerial Acoustic Communications," Proc. IEEE Int'l Workshop Applications of Signal Processing to Audio and Acoustics, IEEE Press, 2001, pp. 219–222.
5. D. Balfanz et al., "Talking to Strangers: Authentication in Ad Hoc Wireless Networks," Proc. Network and Distributed Systems Security Symp., Internet Soc., 2002; http://www.isoc.org/isoc/conferences/ndss/02/proceedings/papers/balfan.pdf.
6. G. Jones, Music Theory, Harper & Row, 1974.
7. J. Moorer, "Signal Processing Aspects of Computer Music—A Survey," Computer Music J., vol. 1, no. 1, 1977, pp. 4–37.
8. N. Domingues et al., "Aerial Communications Using Piano, Clarinet, and Bells," Proc. IEEE Int'l Workshop Multimedia Signal Processing, IEEE Press, 2002, pp. 460–463.
9. J. Chowning, "The Synthesis of Complex Audio Spectra by Means of Frequency Modulation," J. Audio Eng. Soc., vol. 21, no. 7, 1973, pp. 526–534.
10. J. McClellan, R. Schafer, and M. Yoder, DSP First: A Multimedia Approach, Prentice-Hall, 1999.
11. B. Sklar, Digital Communications, Prentice-Hall, 1988.


THE AUTHORS

Cristina Videira Lopes is an assistant professor in the School of Information and Computer Science, University of California, Irvine. She conducts research in ubiquitous computing and in programming languages. She has a PhD in computer science from Northeastern University. She is a member of the IEEE and the ACM. Contact her at the School of Information and Computer Science, Univ. of Calif., Irvine, CA 92697; [email protected].

Pedro M.Q. Aguiar is a researcher in the Institute for Systems and Robotics, Lisbon, Portugal, and an auxiliary professor at the Instituto Superior Técnico (IST), Technical University of Lisbon. His research interests are in the statistical processing of multimedia signals. Aguiar has a PhD in electrical and computer engineering from IST. He is a member of the IEEE. Contact him at Instituto Superior Técnico / ISR, Av. Rovisco Pais, 1049-001 Lisboa, Portugal; [email protected].
