Remote voice Web browser for people with sight impairment

Computer Science

Computer Networks

Piotr Leszczyński

Book No. s4207

Remote voice Web browser for people with sight impairment

Zdalna głosowa przeglądarka WWW dla osób niewidomych

Engineering Thesis

Written under the advice of

Ph.D. Eng. Przemysław Skurowski

Bytom September 2009

Contents

1 Introduction
2 A brief review of speech synthesis
  2.1 Human speech synthesis
  2.2 Text-To-Speech systems overview
    2.2.2 Concatenation Speech Systems
    2.2.3 Articulator Speech Systems
    2.2.4 History
3 Application modeling and implementation
  3.1 Application concept
  3.2 Functional requirements
  3.3 Non-Functional requirements
  3.4 Feasibility analysis
  3.5 Technical limitations
    3.5.1 Accessibility
    3.5.2 Speech synthesis
    3.5.3 Interpretation of Web pages
4 Technology
  4.1 Java
  4.2 Java Web Start
  4.3 FreeTTS speech system
  4.4 Netbeans framework
5 Design
  5.1 Communication
  5.2 Client
    5.2.1 Modular design
    5.2.2 Portability
    5.2.3 Model-View-Controller architecture pattern
    5.2.4 Class model
    5.2.5 Sequence diagram
    5.2.6 Dependency model
    5.2.7 Interface
    5.2.8 Compatibility
  5.3 Server
    5.3.1 Class model
    5.3.2 Compatibility
  5.4 Development challenges
6 Testing
  6.1 Live web testing
    6.1.1 Method
    6.1.2 Results
    6.1.3 Feedback
  6.2 Synthetic testing
    6.2.1 Method
    6.2.2 The results
7 Conclusions
8 Summary in Polish
9 Bibliography
A Installation and use
  A.1 Installation
  A.2 Use
B Client compatibility list
C Server compatibility list

1 Introduction

Human beings possess five senses; according to Aristotelian psychology these are sight, hearing, smell, taste and touch [1]. When interfacing with computers we mainly use only sight and hearing. There are projects introducing smell into the equation, but those have not gone mainstream yet.

Though the main burden of communication lies on sight, we use hearing to augment multimedia and to enhance system and application communication. In most cases it would be impossible even to enter the operating system without the sense of sight, not to mention doing anything else. That is why the situation of people with sight impairment is so difficult when it comes to interfacing with computers. It often requires a very expensive set of software, with top-of-the-line products costing between $500 and $1300 [2]. The set combines a screen reader, whose task is to identify and interpret the output of the computer screen (a task done by the monitor and the human brain in the case of people without sight impairment), with a text-to-speech or Braille output device. Text-to-speech is the preferred and most natural method of representing interpreted text.

I decided to take on the problem of the accessibility, in public places such as schools, public administration offices, airports and libraries, of computers adapted for people with sight impairment, and specifically their access to the World Wide Web. There are major obstacles to adapting computers for use by sight-impaired persons. The first is the high cost of the software, which in places like schools, universities and libraries with hundreds of computers could rise to astronomical levels; few schools can afford it when they cannot even afford all the computers they need. The second is that the software needs to be installed, configured and maintained, which adds to the already high costs. These are the issues I wanted to either solve or alleviate.

The idea behind this project was to provide a Web-based application that identifies and interprets a WWW page and presents the output to the user with text-to-speech technology: an application that needs no installation or configuration, is easy for persons with sight impairment to run and handle, and can be deployed on any computer connected to the Internet regardless of its architecture or operating system. This document presents the end result of that idea, and a working application is provided on the attached disk. I hope you find it interesting and useful.


2 A brief review of speech synthesis

This chapter introduces the basic information needed to understand how speech synthesis works on the human and mechanical levels.

2.1 Human speech synthesis

Modern researchers believe humans possessed speech abilities as early as 300,000 years ago, after the evolution of the Neanderthals [13]. Yet human speech synthesis has been a documented subject of research for only about a century, and the biggest breakthroughs have come within that time, some of them in the last 20 years.

Two major centers are responsible for human speech production: the lungs and the larynx, with its vocal cords and glottis. Humans create sound when air pumped by the lungs moves over the vocal cords and is made to vibrate. Turning sound into speech is a much more complicated process, though; it involves creating phonation in the glottis and shaping it into different vowels and consonants. The resulting sound is then modified by complex movements of the lips, tongue and soft palate, whose purpose is to filter out some frequencies and make others resonate. [14]


Fig. 2.1: Human vocal tract 1

2.2 Text-To-Speech systems overview

The simplest description of a Text-To-Speech system is an application that can produce a speech sequence from supplied text. There are generally three kinds of speech synthesis methods: concatenative, articulatory and formant. Despite their differences, they all follow the model of human speech production.

Fig. 2.2: Human speech model 2

1 Human vocal tract, picture taken from the May-June 2008 issue of Duke Magazine.


2.2.2 Concatenation Speech Systems

Concatenation is the most popular and widely used method of speech synthesis. There are two variants. The first concatenates single words or parts of sentences from a database to create speech. Such systems are called “Voice Response Systems”, and their usability is limited to situations where a rich vocabulary is not needed and the sentence structure is strictly predefined, for example automated telephone systems or arrival and departure announcement systems at train stations. This method generally produces some of the best speech quality, at the cost of versatility.

The second variant concatenates diphones or phones, the two smallest segments of speech in English needed to pronounce text [3]. This method allows almost anything in the vocabulary to be pronounced, at the cost of lower output quality. Prime examples are the open source solutions Festival and FreeTTS, and Ivona, a commercial solution made in Poland that is considered the best voice quality synthesizer in the world [4][5].

2.2.3 Articulator Speech Systems

It is the most complex and most natural-sounding method of speech synthesis. It tries to mimic the human vocal tract as closely as today's technology allows by creating a computational model of every element of the vocal tract and its articulation processes. Speech is generated by simulating airflow through the model. There are a few working articulatory speech systems: gnuspeech, an open source system, and NeXT, a commercial speech system currently belonging to Apple, both having their origins in a system originally developed by Trillium Sound Research.

2 Human speech model, picture taken from The Scientist and Engineer's Guide to Digital Signal Processing by Steven W. Smith, Ph.D., http://www.dspguide.com/ch22/6.htm

              Concatenative             Formant                  Articulatory
              Speech Synthesis          Speech Synthesis         Speech Synthesis

Complexity    Simplest                  Moderate                 Complex

Quality       Variation from poor to    Natural and good         Very good quality and
              good speech quality       quality speech           most natural speech
              with minor sound
              artefacts

Fig. 2.3: Comparison of speech synthesis methods

2.2.4 History

One of the first practical applications of speech synthesis was the British Telephone Company's speaking clock, an equivalent of the Polish Zegarynka. It concatenated words from optical storage to form a sequence of the form "At the third stroke, the time from BT will be (hour) (minute) and (second) seconds". It is still in service after 72 years, having gone through storage method upgrades and four different voices, with Sara Mendes da Costa as its latest and permanent voice [5][7].

In 1939 Bell Laboratories developed the Voder (Voice Operating Demonstrator), a mechanical device operated with pedals and mechanical keys. It is considered to be the first true speech synthesizer. It was based on the Vocoder (Voice Coder), a device that analyzed speech in order to reconstruct an approximation of it. The Voder required a very skilled operator, but its output could sound almost like speech. Despite its shortcomings, it was the first device that showed the potential of artificial speech systems and opened the way for further developments. [7]


Fig. 2.4: Voder schematics 3

3 Voder schematics, picture taken from the “History and Development of Speech synthesis” article at http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html


3 Application modeling and implementation

This chapter introduces the analysis and implementation phases of the development of the Remote Voice Web Browser, a client-server application providing universal Web access for people with sight impairment on any PC-class computer connected to the Internet, regardless of its architecture and operating system.

First, the project requirements and their analysis against the needs of people with sight impairment are presented. They are followed by the technical limitations and a summary of the technology used in the development of the application.

3.1 Application concept

Fig. 3.1: Application concept

1. A person with sight impairment asks a nearby bystander to open a Web page on a public computer.
2. The bystander opens the Web page containing the application.
3. A request to deploy the client application is made to the deployment server.
4. The client application is deployed.
5. The application starts and guides the person with sight impairment through its use.
6. The application is instructed to open a Web page.
7. The task is delegated to the application server.
8. The application server requests the content of the Web page from a WWW server somewhere on the Internet.
9. The content of the requested Web page is returned to the application server.
10. The content of the Web page is parsed into text, normalized and formatted for speech synthesis, then sent back to the client application.
11. The client application parses the content further and outputs it to the user as synthesized speech.

3.2 Functional requirements

The Voice Web Browser application has to fulfill the following functional requirements:

- Interpret any general Web page
- Create a voice navigation table for each Web page
- Output interpreted Web pages through the voice system
- Provide a voice navigation system
- Provide a voice feedback system
- Manage focus and navigation of GUI elements as required by the needs of people with sight impairment
- Work as a Web-based application
- Work without installation and configuration
- Be platform independent
- Be easy to use and intuitive for people with sight impairment

3.3 Non-Functional requirements

The Voice Web Browser application has to fulfill the following non-functional requirements:

- Provide a modular design based on the NetBeans API
- Support easy maintainability and modifiability
- The client side application should be efficient enough to run on office-type computers
- The server side application should scale to hundreds of users on a typical home WWW server
- Be based on Open Source libraries
- Be distributed under an Open Source license (only on condition that the thesis supervisor and the University authorities give their permission)

3.4 Feasibility analysis

- Platform independence should be achieved by writing the client and server applications in an interpreted, platform-independent language. Java was chosen as the most suitable solution, accompanied by pure Java libraries for the subsystems.
- Interpretation of general Web pages should be achieved with the help of a text Web browser. After a short review Lynx was chosen because it is a long-standing standard text browser in website development.
- All sound systems and the sound output of Web pages should be achieved with the help of the Java Speech API and a suitable speech synthesizer. After analysis of the requirements, the FreeTTS speech synthesizer API was chosen because it meets most of them.
- Modularity should be achieved with the help of the NetBeans API and by following design patterns such as Model-View-Controller.
- Web deployability should be achieved with a technology that provides the portability of Web-based applications while retaining the capabilities of desktop applications. Java Web Start proved to be the most suitable choice.

3.5 Technical limitations

3.5.1 Accessibility

The major technical challenge was to provide an application that could work on any public computer without the need for installation and configuration. The solution was a Web-based application, but each Web technology has its limitations, and finding a suitable one was a challenge in itself.

Java applets were one option, but they have several limitations compared to desktop applications. Applets cannot:

- read or write to the local file system
- make connections except to the server from which they were deployed
- access native libraries
- create processes on the local machine

[15]

The solution to the problem proved to be Java Web Start. It possesses all the benefits of Web-based applications while retaining the capabilities of desktop applications. More information about Java Web Start can be found in the technology section of this thesis.

3.5.2 Speech synthesis

The design assumed a free open source speech synthesizer, but the number of such synthesizers is limited and they had to meet specific requirements. In the end most of the requirements could not be met by any open source synthesizer, so they were reduced to providing average speech quality, being written in Java and being compatible with Java Web Start technology. The solution was FreeTTS, an open source concatenation-based speech synthesizer.

3.5.3 Interpretation of Web pages

This was the most challenging barrier. While interpreting RSS feeds is fairly easy, general interpretation of Web pages for speech synthesis could be the topic of a master's thesis in itself. It is a complicated process that is partially taken care of by large applications like Web browsers. Another part of the problem is that general Web pages are written for graphical display of data, not for sound reproduction. The solution to this problem was very limited and was provided by an old technology: text Web browsers dating back to the early 1990s. I used Lynx, a text-based Web browser developed in 1992 by a team of students at the University of Kansas to distribute campus information.
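As an illustration of how a server process might obtain the text rendering of a page from Lynx, the following minimal sketch starts Lynx with its -dump option and collects the output. It is not the code of the application itself: the class and method names are hypothetical, and it assumes that a lynx executable is on the PATH and that its output can be read as UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; assumes a "lynx" executable is available on the PATH.
final class LynxDump {

    /** Builds the Lynx command line for the given Web address. */
    static List<String> buildCommand(String url) {
        // Accept only http(s) addresses so that user input can never be
        // mistaken for a Lynx command-line option.
        if (!url.startsWith("http://") && !url.startsWith("https://")) {
            throw new IllegalArgumentException("Not an http(s) address: " + url);
        }
        // -dump renders the page as plain text on standard output.
        return Arrays.asList("lynx", "-dump", url);
    }

    /** Runs Lynx for the given address and returns the rendered text. */
    static String render(String url) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(url));
        builder.redirectErrorStream(true); // read stderr through the same pipe
        Process process = builder.start();
        StringBuilder text = new StringBuilder();
        // UTF-8 is an assumption; the real encoding depends on the Lynx setup.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("lynx exited with code " + exitCode);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "http://example.com/";
        System.out.println(String.join(" ", buildCommand(url)));
        // On a machine with Lynx installed: System.out.print(render(url));
    }
}
```

Passing the address as a separate argument to ProcessBuilder, instead of building a shell command string, avoids shell interpretation of user input.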


4 Technology

This section describes all the technologies used in development of the Web

Browser application.

4.1 Java

“A high-level programming language developed by Sun Microsystems.

Java was originally called OAK, and was designed for handheld devices

and set-top boxes. Oak was unsuccessful so in 1995 Sun changed the

name to Java and modified the language to take advantage of the

burgeoning World Wide Web.

Java is an object-oriented language similar to C++, but simplified to

eliminate language features that cause common programming errors. Java

source code files (files with a .java extension) are compiled into a format

called bytecode (files with a .class extension), which can then be executed

by a Java interpreter. Compiled Java code can run on most computers

because Java interpreters and runtime environments, known as Java

Virtual Machines (VMs), exist for most operating systems, including UNIX,

the Macintosh OS, and Windows. Bytecode can also be converted directly

into machine language instructions by a just-in-time compiler (JIT).

Java is a general purpose programming language with a number of

features that make the language well suited for use on the World Wide

Web. Small Java applications are called Java applets and can be

downloaded from a Web server and run on your computer by a Java-

compatible Web browser, such as Netscape Navigator or Microsoft

Internet Explorer.” [8]


Java has since matured and is now suited to desktop, corporate and server applications, and even games. The list of Java-compatible Web browsers also includes Mozilla Firefox, Apple Safari, Google Chrome and many browsers embedded in Linux graphical desktop environments.

4.2 Java Web Start

Java Web Start is a technology for deploying Java applications over the Internet, based on the Java Network Launching Protocol (JNLP). JNLP provides a browser-independent architecture for deploying applications. The programmer is only required to write an XML file with the .jnlp extension describing all the needed JAR files and their locations; everything else is done on the client machine by the Web Start application, which is installed with every modern Java distribution. It is a very powerful system that gives Web-based applications the capabilities of Java desktop applications while retaining the portability of Web-based applications. Java Web Start can also update the Java Runtime Environment on the client machine if the deployed application requires it, thus ensuring that the application works properly on all client machines. [9]

4.3 FreeTTS speech system

FreeTTS is a speech synthesis system written in Java and based on Flite, a synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University. [16]

FreeTTS has drawbacks: it does not support JSML or any other markup language, integrating it with the NetBeans API requires rewriting its source code, the quality of speech is average, and adding a Polish voice while using Java Web Start proved impossible. Even so, it remains the free open source speech synthesizer written in Java with the best voice quality.

4.4 Netbeans framework

NetBeans is a generic framework for Swing applications that provides a flexible and reliable application architecture. The framework saves time by relieving the developer from writing boilerplate code for tabbed views, menus, explorer and palette views, saving state, connecting actions to menu items, toolbar items and keyboard shortcuts, and window management. It also encourages good design patterns, for example through the Lookup API, the dependency system and many elements built on the MVC model. [10]

The NetBeans API provides many out-of-the-box components that can be reused and make development much quicker and easier, along with other benefits such as the module system, which provides easy modifiability and maintainability. Some of the benefits of the NetBeans API are:

Modular Runtime Container

The NetBeans runtime container provides lifecycle services to Swing applications and allows a set of modules to be composed into a single Swing application. This modularity lets developers organize the application code into separate, versioned modules. A module can use code from another module's exposed packages only if it has explicitly declared a dependency on it. This model helps greatly when developing or maintaining large applications developed by teams of engineers. End users benefit as well: because modules are pluggable, they can be installed into the running application. In summary, the NetBeans runtime container provides an environment for modules that handles their lifecycle and enables them to interact with each other. [17]

Loose Coupling & Context Sensitivity Management

NetBeans provides the Lookup API, an equivalent of the JDK 6 ServiceLoader class and an implementation of the Service Locator design pattern. It offers the same functionality but is better suited to the NetBeans platform, providing dependency injection among other benefits. It enables modules to communicate with each other in a type-safe, uncoupled way: objects defined in one module can be used in another without the two modules depending on each other. [17]
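The idea of a type-safe, uncoupled lookup can be illustrated with a small stand-alone sketch. It is not the NetBeans Lookup API (in NetBeans the corresponding call is Lookup.getDefault().lookup(SomeService.class)); MiniLookup and Speaker are hypothetical names used only to show the Service Locator pattern with the standard library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a service registry; not the NetBeans Lookup class.
final class MiniLookup {
    private final List<Object> instances = new ArrayList<Object>();

    /** Registers a service instance, as a providing module would. */
    void add(Object instance) {
        instances.add(instance);
    }

    /** Unregisters a service instance, as when its module is removed. */
    void remove(Object instance) {
        instances.remove(instance);
    }

    /** Returns the first registered instance of the requested type, or null. */
    <T> T lookup(Class<T> type) {
        for (Object candidate : instances) {
            if (type.isInstance(candidate)) {
                return type.cast(candidate); // type-safe, no unchecked cast
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MiniLookup lookup = new MiniLookup();
        lookup.add(new Speaker() {
            public String name() { return "primary synthesizer"; }
        });
        // The caller knows only the Speaker interface, not the implementation.
        Speaker speaker = lookup.lookup(Speaker.class);
        System.out.println(speaker.name());
    }
}

// Hypothetical service interface shared between modules.
interface Speaker {
    String name();
}
```

A client asks only for an interface, so the providing implementation can be swapped or removed without changing the client code.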

System FileSystem

The NetBeans System FileSystem offers the ability to install folders and files, for example settings, into the application filesystem, where they can be read by all the modules in the application. [17]

Window System

The NetBeans Window System API provides, out of the box, a multi-window GUI with tabs and modes whose windows can be maximized and minimized, docked and undocked, and dragged and dropped. It also takes care of the interactions between all the windows in the system. [17]

Data Management

The NetBeans Nodes API provides a generic model for Swing components such as JList and JTable, which can be used in every component without rewriting the model. Nodes can also be used to display data in several Swing components provided by the NetBeans Explorer & Property Sheet API. [17]


5 Design

The application consists of two parts, a server and a client. The server application retrieves, interprets, normalizes and formats Web pages for speech synthesis and sends them to the client application. The client application's task is to turn the interpreted data into speech and present it to the user through a voice interface consisting of voice navigation and voice feedback systems.

5.1 Communication

Communication between the server and the client applications is based on HTTP (Hypertext Transfer Protocol) with the MIME type application/x-java-serialized-object, which is used for transferring serialized Java objects. HTTP is a stateless protocol, so the server does not need to store session data about users. The server itself is also designed as stateless: all resources are released after each transmission. A stateless server combined with a stateless protocol results in lower resource usage and higher efficiency on the server.

Fig. 5.1: Communication diagram
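A minimal sketch of this exchange is given below. It shows how a serializable request object could be written and read with the standard object streams, and how the same streams could be attached to an HTTP connection. The PageRequest class, the method names and the servlet address passed to exchange are assumptions made for illustration; the actual classes of the application are not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.HttpURLConnection;
import java.net.URL;

final class SerializedTransfer {
    static final String MIME_TYPE = "application/x-java-serialized-object";

    /** Serializes an object into a byte array. */
    static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        return buffer.toByteArray();
    }

    /** Restores an object from a byte array. */
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    /** Posts a request object to a servlet and reads the object it answers with. */
    static Object exchange(URL servlet, Serializable request)
            throws IOException, ClassNotFoundException {
        HttpURLConnection connection = (HttpURLConnection) servlet.openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // a request body will be sent
        connection.setRequestProperty("Content-Type", MIME_TYPE);
        try (ObjectOutputStream out = new ObjectOutputStream(connection.getOutputStream())) {
            out.writeObject(request);
        }
        try (ObjectInputStream in = new ObjectInputStream(connection.getInputStream())) {
            return in.readObject();
        } finally {
            connection.disconnect(); // no state is kept between requests
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory round trip; exchange() needs a running servlet to talk to.
        Object copy = fromBytes(toBytes(new PageRequest("http://example.com/")));
        System.out.println(((PageRequest) copy).address);
    }
}

// Hypothetical request object carrying the address of the page to interpret.
final class PageRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    final String address;

    PageRequest(String address) {
        this.address = address;
    }
}
```

Both sides must have the same class definitions on their class path, and a server should only deserialize data from clients it trusts.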

5.2 Client

5.2.1 Modular design

The client part of the application is based on a modular system. This design allows easy maintainability and modifiability and prevents spaghetti code. Existing modules can be removed at runtime and new modules can be installed without stopping the application. The GUI modules are decoupled from the model modules and the logic modules. Any module can be removed and the application will still work, only without the capability that the removed module provided. With Lookup, a module can even be removed and another module can automatically take over its tasks without any changes to the code. These are just the main benefits of the modular design.

5.2.2 Portability

The application is written entirely in Java and all the external libraries are pure Java. There are no dependencies on system resources, so the application should run on any system and hardware architecture that supports Java. The only limitation is that fully correct operation requires the ability to play two sound streams at the same time, which some Linux distributions, for example, may lack. The application was tested on Windows XP, Windows Vista and Windows 7 by myself and on Mac OS X 10.5.8 by Fabrizio Giudici, and proved to work flawlessly.

5.2.3 Model-View-Controller architecture pattern

Model-View-Controller is an architectural pattern designed to decouple the model functionality from the presentation and control logic of the application. It allows different presentation layers to share the same data model; otherwise the model would have to be rewritten for every presentation layer, increasing the engineers' workload. Using the pattern not only saves time but also makes the application easier to implement and maintain. [18]

Fig. 5.2: Model-View-Controller overview diagram 4

4 Model-View-Controller overview diagram, picture taken from the Sun Java Blueprints, http://java.sun.com/blueprints/patterns/MVC-detailed.html

5.2.4 Class model

This section contains the UML class model of the client part of the application. The model shows packages, classes and the relations between them and serves as an overview of the application design.

Fig. 5.3: Class diagram of the client application

Model and NodeModel form the Model part of the Model-View-Controller architectural pattern. Their purpose is to shape and encapsulate the data on which the application operates.

- The Model module encapsulates the Web data received from the server part of the application.
- NodeModel encapsulates and provides MyNode objects, which inherit from AbstractNode, a part of the NetBeans API responsible for the presentation layer.

The Communication, Controller and VoiceSynthesizer modules form the Controller part of the MVC architectural pattern, which is responsible for the business logic of the application.

- The Communication module is responsible for data transfer between the server and the client parts of the application. Data is transferred as serialized objects both ways using the URL class. The address of the Web page to interpret is sent to the server on a user request, and the server sends back the interpreted and formatted Web page. Both the server and the client then close their sessions; in addition, the server releases all its resources, following a stateless architecture.
- The Controller module carries out the business logic of the application and is an intermediate layer between the GUI and the Model. Its tasks include, for example, informing modules of new tasks and controlling communication and voice synthesis.
- The VoiceSynthesizer module is responsible for interfacing with FreeTTS and the Java Speech API, controlling the synthesizer parameters and speech flow, and creating synthesizer objects.


GUIOptions, GUIBrowser and GUIExplorer are the View parts of the MVC architectural pattern. Their task is to provide a GUI that is used to display and input information. Since the users have sight impairment, the GUIs are used only for inputting data and for providing a template and event management for the sound navigation system.

- GUIBrowser is the main interface window. It allows the user to type the Web address and the inner Web page number, and to pause and resume the speech queue. It also listens to the mouse wheel to adjust the sound volume.
- GUIOptions is the options panel of the application. It shows the current Web server address and allows it to be changed; in the future it will contain more settings as the application grows. It stores settings in the user's home directory.
- GUIExplorer is responsible for presenting Web data in the form of Nodes. Together with the property sheet it is mainly used for development purposes, such as adjusting Web data formatting, but in the future it will be reused for a Web page template creator intended to increase the quality of interpreted Web pages.
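Storing the server address in the user's home directory, as the options panel does, could be sketched with java.util.Properties as follows. The file name, the property key and the class name are hypothetical; the application's actual storage format is not described here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class ServerSettings {
    // Hypothetical property key and file name, chosen for illustration only.
    static final String KEY = "server.address";

    /** Location of the settings file inside the user's home directory. */
    static Path defaultFile() {
        return Paths.get(System.getProperty("user.home"), ".voicebrowser.properties");
    }

    /** Writes the server address to the given settings file. */
    static void save(Path file, String address) throws IOException {
        Properties properties = new Properties();
        properties.setProperty(KEY, address);
        try (OutputStream out = Files.newOutputStream(file)) {
            properties.store(out, "Voice Web Browser settings");
        }
    }

    /** Reads the server address, or returns the fallback if nothing is stored. */
    static String load(Path file, String fallback) throws IOException {
        if (!Files.exists(file)) {
            return fallback;
        }
        Properties properties = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            properties.load(in);
        }
        return properties.getProperty(KEY, fallback);
    }

    public static void main(String[] args) throws IOException {
        // Reads only; nothing is written to the home directory by this demo.
        System.out.println(load(defaultFile(), "http://localhost:8080/"));
    }
}
```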

5.2.5 Sequence diagram

Scenario 1 Use Case

The user opens a Web page (Fig. 5.4).

Scenario 2 Use Case

The user listens to speech synthesis, changes the speaking rate and volume, pauses speech synthesis, then resumes it, but cancels the speech before the Web page finishes (Fig. 5.5).

Fig. 5.4: Sequence diagram presenting first scenario Use Case

Fig. 5.5: Sequence diagram presenting second scenario Use Case
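The pause, resume and cancel steps of the second scenario can be pictured as operations on a queue of sentences waiting to be spoken. The sketch below models only that state handling and leaves out the synthesizer itself; SpeechQueue and its methods are hypothetical names, not the classes of the application or of FreeTTS.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the speech queue; the synthesizer itself is left out.
final class SpeechQueue {
    private final Deque<String> pending = new ArrayDeque<String>();
    private boolean paused;

    /** Adds a sentence to the end of the queue. */
    void enqueue(String sentence) {
        pending.addLast(sentence);
    }

    /** Stops handing out sentences until resume() is called. */
    void pause() {
        paused = true;
    }

    /** Continues with the sentence that was next when pause() was called. */
    void resume() {
        paused = false;
    }

    /** Drops everything that has not been spoken yet. */
    void cancel() {
        pending.clear();
        paused = false;
    }

    /** Returns the next sentence to speak, or null when paused or empty. */
    String poll() {
        return paused ? null : pending.pollFirst();
    }

    public static void main(String[] args) {
        SpeechQueue queue = new SpeechQueue();
        queue.enqueue("First sentence.");
        queue.enqueue("Second sentence.");
        System.out.println(queue.poll()); // handed to the synthesizer
        queue.pause();
        System.out.println(queue.poll()); // nothing is spoken while paused
        queue.resume();
        queue.cancel();                   // the rest of the page is dropped
        System.out.println(queue.poll());
    }
}
```

In a real client the queue would be drained by a separate thread feeding the synthesizer, which would require synchronization that this single-threaded sketch omits.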

5.2.6 Dependency model

The dependency model shows the dependencies between modules in the application. The application is divided into eight modules in three layers. There are no mutual dependencies, which together with the modular system means that removing one module affects only one place in the application instead of the whole system, as happens in common “spaghetti” architecture applications.

Fig. 5.6: Client application dependency model and layer separation
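The property that there are no mutual dependencies can be checked mechanically with a depth-first search over the module graph. The sketch below is one way to do it; the module names come from the class model, but the edges used in main are assumed for illustration and do not reproduce Fig. 5.6.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DependencyCheck {

    /** Returns true if the "module depends on modules" graph contains a cycle. */
    static boolean hasCycle(Map<String, List<String>> dependencies) {
        Set<String> finished = new HashSet<String>();
        Set<String> onPath = new HashSet<String>();
        for (String module : dependencies.keySet()) {
            if (visit(module, dependencies, finished, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String module, Map<String, List<String>> dependencies,
                                 Set<String> finished, Set<String> onPath) {
        if (onPath.contains(module)) {
            return true;  // reached a module that is still being explored
        }
        if (finished.contains(module)) {
            return false; // already explored without finding a cycle
        }
        onPath.add(module);
        List<String> targets = dependencies.get(module);
        if (targets != null) {
            for (String target : targets) {
                if (visit(target, dependencies, finished, onPath)) {
                    return true;
                }
            }
        }
        onPath.remove(module);
        finished.add(module);
        return false;
    }

    public static void main(String[] args) {
        // Assumed edges for illustration only.
        Map<String, List<String>> dependencies = new HashMap<String, List<String>>();
        dependencies.put("GUIBrowser", Arrays.asList("Controller"));
        dependencies.put("Controller", Arrays.asList("Communication", "VoiceSynthesizer"));
        dependencies.put("Communication", Arrays.asList("Model"));
        System.out.println("cycle found: " + hasCycle(dependencies));
    }
}
```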

5.2.7 Interface

The interface is designed with people with sight impairment in mind. It
consists of two text fields, three buttons, four sliders, a text area and
two hidden panels for development purposes.

[Figure: main interface window. Callouts: high-contrast text area showing Web pages in a large font; text field for inputting the Web address; pause and resume buttons for controlling speech synthesis; text field for inner Web page navigation; voice synthesizer settings; options button.]

Web address field – used for inputting the Web address of a desired Web page

Inner Web page field – used for navigation inside a Web page

Pause and resume buttons – used for controlling speech synthesis

Speech synthesizer settings – used for controlling the settings of the main and interface synthesizers, such as volume or speaking speed

Options button – used for opening the options panel of the application

5.2.8 Compatibility

The application is compatible with all operating systems and hardware
supported by Java SE 6.0+ and Java Web Start. See Annex B for the
current compatibility list from Sun resources.

5.3 Server

5.3.1 Class model

This section contains the UML class model of the server part of the
application. The model shows packages, classes and their relationships.
Unlike the client, the server application is not designed as a modular
application. It consists of a servlet, which is the view part of the MVC
model and is responsible for communication with the client application,
and a controller class for business logic such as opening the Lynx
process and normalizing and formatting Web data.

Fig. 5.7: Server applications UML model

5.3.2 Compatibility

Server side compatibility consists of two things:

Lynx compatibility: Lynx is supported by most Linux distributions and by Windows versions through Cygwin

Servlet compatibility: see Annex C for the list of Servlet compatible WWW servers

5.4 Development challenges

Focus management on Netbeans platform

Focus management is very important for applications designed with
people with sight impairment in mind. The application has to work
exactly as planned, and focus is a key player here. It has to start with
the right window active, in the right tab, with the predefined component
focused in every circumstance. The transfer of focus has to happen
according to a planned traversal policy. I used the Netbeans API in the
development for its many advantages, but focus management was not one of
them. It was a very hard task to complete. There is not much
documentation on this, and most of it just points to Swing focus
management. The solution to this problem took a lot of time to complete
but taught me a lot about Swing application design and architecture.
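
To make the idea of a planned traversal policy concrete, here is a minimal sketch of a fixed-order policy written against the standard java.awt.FocusTraversalPolicy class. It is plain AWT/Swing and is not the Netbeans-specific solution used in the project; the class name OrderedFocusPolicy and the choice of a wrap-around order are assumptions for illustration.

```java
import java.awt.Component;
import java.awt.Container;
import java.awt.FocusTraversalPolicy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Fixed-order focus policy; the component order is supplied by the caller. */
class OrderedFocusPolicy extends FocusTraversalPolicy {
    private final List<Component> order;

    OrderedFocusPolicy(Component... components) {
        if (components.length == 0) {
            throw new IllegalArgumentException("at least one component is required");
        }
        this.order = new ArrayList<>(Arrays.asList(components));
    }

    @Override
    public Component getComponentAfter(Container root, Component current) {
        int i = order.indexOf(current);
        // An unknown component gives -1, so traversal starts at the first entry.
        return order.get((i + 1) % order.size());
    }

    @Override
    public Component getComponentBefore(Container root, Component current) {
        int i = order.indexOf(current);
        // The first entry and unknown components wrap around to the last entry.
        return order.get(i <= 0 ? order.size() - 1 : i - 1);
    }

    @Override
    public Component getFirstComponent(Container root) {
        return order.get(0);
    }

    @Override
    public Component getLastComponent(Container root) {
        return order.get(order.size() - 1);
    }

    @Override
    public Component getDefaultComponent(Container root) {
        // The component that should hold focus when the window opens.
        return order.get(0);
    }
}
```

In a plain Swing window such a policy would be installed with setFocusTraversalPolicy on the container that acts as the focus cycle root. Listing the components explicitly makes the Tab order independent of the layout, which matters when the user cannot see where the focus went.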

Implementation of FreeTTS on Netbeans platform Web Start application

The FreeTTS library creates its own classloader that tries to find all
the needed jars. Providing a custom classloader is not the job of a
library; it is a bad design, and it is even worse when building an
application on the Netbeans API, which has strict dependency policy
enforcement and a system of wrapper libraries. Basically, FreeTTS would
not work without a source code rewrite. This issue took weeks to
resolve, including many attempts to change the default class loader in
freetts.jar, all ending in disaster due to lack of experience with class
loaders. I finally resolved the issue after weeks of tinkering with
Netbeans.

6 Testing

6.1 Live web testing

6.1.1 Method

In order to test performance, find bugs in the client and server
applications, and get professional feedback, I created a suitable native
Web environment by inviting members of the Netbeans mailing group
[email protected] into the testing phase. The group was created for
computer science faculty students who have taken a 16-hour training
course in the Netbeans API, and it is intended for exchanging new
application ideas and sharing knowledge about the API. I took the course,
which was organized by Silesian JUG, in May.

GlassFish, an open source application server, was set up for the members
of the group. They were asked to try the application over the Internet
and provide feedback while I was monitoring the server. Over 30 different
IP addresses connected from all over the world, including a Polish
student whose master thesis was an aid system for blind and
sight-impaired people.

6.1.2 Results

The application server crashed 10 minutes after the tests started due to
too many open local processes. After the problem was fixed by closing all
the streams opened by the processes and destroying the processes after
each transaction, the server continued to work without any crashes for
five straight days, while its parameters and CPU and memory usage were
continuously observed.
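
The fix described above follows a common pattern for external processes in Java: drain the output, close every stream, and destroy the process in a finally block so nothing stays open after the request. The sketch below shows that pattern under stated assumptions: the class name ProcessRunner, the UTF-8 decoding and the exact Lynx invocation (the -dump option, which prints the rendered page as text) are illustrative choices, not details taken from the thesis server code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: runs an external program and always releases its resources. */
class ProcessRunner {

    /** Runs the command and returns its combined output; streams and process are always released. */
    static String run(List<String> command) {
        Process process = null;
        try {
            process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout so one reader drains both
                    .start();
            process.getOutputStream().close(); // the child receives no input from us
            StringBuilder output = new StringBuilder();
            // Assumption: the output is decoded as UTF-8.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            process.waitFor();
            return output.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the process", e);
        } finally {
            if (process != null) {
                process.destroy(); // harmless if the process has already exited
            }
        }
    }

    /** Assumed Lynx invocation for turning a Web page into plain text. */
    static String dumpPage(String url) {
        return run(Arrays.asList("lynx", "-dump", url));
    }
}
```

Without the cleanup in the finally block, every request can leave pipes and a child process behind, and a busy server eventually runs out of file descriptors or process slots.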

6.1.3 Feedback

“I'm sitting with Toni in Geneva right before the next training day and
we've just been listening to my blog in your application! Great! However,
we thought that the app should be more visually pleasing... but then we
realized the target audience is blind. :-) So, absence of progress bar
integration isn't a problem (except, maybe the progress bar could be
integrated anyway and then elevator should be played during progress of
accessing the requested site).”5

“Started correctly on Mac OS X 10.5.8 and it's speaking right now
Pretty cool. It's the first Java talking application that I try.”6

5 Geertjan Wielenga – a technical writer and trainer for Netbeans

6 Fabrizio Giudici - Java Architect, Project Manager

6.2 Synthetic testing

6.2.1 Method

A profiler tool was used for the performance measurements. It monitors
important information about the runtime behavior of an application, such
as CPU performance, memory usage and thread states, while imposing low
overhead. The parameters of the performance tests are listed below.

Hardware specification

CPU: Intel T4200 @ 2.0 GHz

Memory: 3.00 GB

System: Microsoft Vista

Software specification

Operating System: Windows Vista Home Basic

Java Development Kit: 1.6.0 update 14

Netbeans Platform: 6.7.1

Profiler Calibration results

Approximate time in one methodEntry()/methodExit() call pair:

When getting absolute timestamp only: 2.7497 microseconds

When getting thread CPU timestamp only: 1.1512 microseconds

When getting both timestamps: 3.682 microseconds

Approximate time in one methodEntry()/methodExit() call pair in sampled
instrumentation mode: 0.2211 microseconds

Profiler performance test settings

Scope: Entire application

Filter: Profile project and subprojects

Method tracking: Exact call tree and timing

Exclude time spent in Thread.sleep() and Object.wait()

Limit number of profiled threads: 32

Instrumentation scheme: Total

Instrument: Method.invoke()

Testing scenario

Open a predefined list of websites and have them read aloud while performing minor GUI
operations such as pausing, resuming and changing the volume.

http://www.theinquirer.net/ and its sub pages

http://www.bbc.co.uk/ and its sub pages

http://mmorpg.com/ and its sub pages

http://www.gazeta.pl/ and its sub pages

6.2.2 The results

The tests took an hour and a half, and their purpose was to show the
behavior of the application during normal use. During the tests several
important factors were taken into consideration, following a model of
application profiling and two testing patterns. The first pattern assumed
reading the full content of a Web page before opening a new one. The
second took only 3-4 minutes and assumed opening new Web pages in short
consecutive intervals to simulate browsing. The test results and their
analysis follow.

6.2.2.1 Surviving generations / Garbage Collector CPU time

The surviving generations metric is based on how many garbage collections
the objects allocated on the JVM heap have survived since the start of
the application: the age of an object is the number of collections it has
survived, and the metric counts the distinct ages present on the heap.
Generally the number of surviving generations rises during application
startup, but it stabilizes once the application is done loading and all
the temporary objects are destroyed. If the number of surviving
generations continues to rise instead of stabilizing, it may mean that
objects put on the heap are not being removed from it by the Java garbage
collector. This is called a memory leak, and it is unwanted because it
may lead to inefficient resource management, stability problems or even
out-of-memory crashes.

Fig. 6.1: Surviving generations and Garbage Collector diagram showing the first 90 seconds of

application execution

During the first two minutes of application execution the graph shows a
rise in surviving generations, as expected from a launching application.
The results then begin to stabilize, not showing any signs of memory
leaks.

Fig. 6.2: Surviving generations and Garbage Collector diagram showing the full 90 minutes of testing

cycle

During the next hour and a half the graph shows a steady number of
surviving generations. It was not until the application had been
executing for an hour and ten minutes that a sharp rise was noted. The
rise soon stabilized and was not a sign of a memory leak. It happened due
to the heavy load created by a change in testing pattern, namely the
consecutive opening of Web pages in short intervals.

6.2.2.2 Total heap size / Used heap size

The heap is an area of memory used for storing data, for example objects,
at runtime.

Fig. 6.3: Total/Unused heap size chart

The graph shows two things. The heap has enough margin to allow garbage
collection to work effectively, and the heap size is consistent, staying
at a fairly even level between 50 and 60 megabytes.
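
The two quantities plotted in such a chart can also be read from inside a running JVM. The following sketch uses the standard java.lang.Runtime methods totalMemory() and freeMemory(); the class name HeapSnapshot is hypothetical, and the sketch is not how the profiler itself collects its data.

```java
/** Hypothetical snapshot of the total and used heap size of the current JVM. */
class HeapSnapshot {
    final long totalBytes;
    final long usedBytes;

    HeapSnapshot(long totalBytes, long usedBytes) {
        this.totalBytes = totalBytes;
        this.usedBytes = usedBytes;
    }

    /** Reads the current heap figures from the running JVM. */
    static HeapSnapshot take() {
        Runtime runtime = Runtime.getRuntime();
        long total = runtime.totalMemory();
        // Used heap is what is currently reserved minus what is still free inside it.
        // The two reads are not atomic, so the result is clamped at zero.
        long used = Math.max(0L, total - runtime.freeMemory());
        return new HeapSnapshot(total, used);
    }

    double totalMegabytes() {
        return totalBytes / (1024.0 * 1024.0);
    }

    double usedMegabytes() {
        return usedBytes / (1024.0 * 1024.0);
    }
}
```

The gap between the two values is the margin the garbage collector has to work with; a used size that keeps approaching the total size would be a warning sign.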

6.2.2.3 Threads / Loaded classes

Fig. 6.4: Number of Threads and loaded classes chart

The graph shows the number of active threads and loaded classes. The

data is consistent and does not show anything out of order.
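
For reference, the same two counters are exposed by the standard java.lang.management API. The sketch below is only an illustration of where the numbers come from, with a hypothetical class name RuntimeCounters; it is not the mechanism the profiler uses.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical helper exposing the live thread and loaded class counts of the JVM. */
class RuntimeCounters {

    /** Number of live threads, including daemon threads. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Number of classes currently loaded in the JVM. */
    static int loadedClasses() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }
}
```

Logging these values periodically gives a quick check for thread leaks or runaway class loading without attaching a profiler.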

6.2.2.4 CPU performance analysis

This graph shows the relative CPU time taken by each method in the
application during the one hour and thirty minutes of testing.

Fig. A.1: Details of the two speech synthesizer threads that used the most CPU time during testing, and of the only module thread that used significant CPU time, which runs the communication method sendAndReciveData

Fig. 6.5: Top 35 packages that have used the most CPU time

The results show that roughly 80% or more of the relative CPU time was
taken by FreeTTS and the Java Sound API, while only 5% was taken by
network communication. This shows that the application itself introduces
very little overhead and, considering that the application shows no
memory leaks, it suggests that the architecture is performance efficient.

7 Conclusions

The project was designed to provide Web access for people with sight
impairment in limited environments such as public places. Even with the
numerous technical limitations and a few development problems, the
project can be considered successful. It has fulfilled all the
requirements set for it with one minor exception: to run the application,
a sighted person is needed to type the Web address of the application in
a Web browser. This is simple, takes seconds and can be done by anybody
nearby, which is not a problem in public places. I am working on a
solution, but because public computers have limited access, combined with
the already limited access of Java Web Start to local resources, it is a
very difficult task and not high on my priority list.

While the project was successful, there are a few things that can be
improved. I want to continue the project as my master thesis, or at least
release it under an open source license, so that the application can be
improved and one day become a real solution to the problems of many
people with sight impairment.

The voice synthesizer supports only English and is of average quality. In
addition, it does not support JSML, which limits its use. Without JSML
support it is not possible to control the way the synthesizer speaks or
even to introduce small pauses, and thread operations on it are unsafe
and unpredictable. A solution would be to use the Ivona synthesizer,
which provides one of the best commercial voices on the market, but the
SDK alone costs over 200 euros, so it would require support from the
University or from an organization helping people with sight impairment.

General Web parsing could be improved too. Web pages are read from top to
bottom, which is not the way a sighted person would read a Web page.
Another problem is that much of the content of the main page, such as
menus and other elements, is replicated on the sub pages, which degrades
the experience and wastes the time of people using the application. The
solution to these problems would be a template creator for Web pages that
adapts them further to the needs of the application. The editor would
allow volunteers or students to write templates for the most popular
pages. I started building it on the Netbeans API with the Visual Library
and even finished part of it, but development was put on hold after I
realized that FreeTTS does not support JSML and lacks any sort of control
mechanism for the output. I would definitely like to restart development
with the use of Ivona.

8 Summary (translated from Polish)

Human beings have five senses; according to Aristotle's psychology these
are sight, hearing, smell, taste and touch [1]. Interaction with
computers relies mainly on sight and hearing. There are projects that
introduce scent generation for computers, but they have not yet matured
to commercial implementation.

The main burden of communication between human and computer rests on the
sense of sight, while hearing is mostly used only to augment this
communication. In most cases it would be impossible even to start an
operating system without the sense of sight.

This is why the situation of people with sight impairment is so difficult
when it comes to interacting with computers. It often requires very
expensive software packages, whose prices range from 500 to 1300 dollars
for high-end packages [2]. This software is a combination of a screen
reader and an output device. The screen reader performs the
interpretation and analysis, tasks that are normally carried out by the
computer monitor and the human brain. The output device is a speech
synthesizer or a Braille output device.

I decided to take on the problem of the general availability of computers
adapted for blind users, with a focus on Internet access in public places
such as schools, public administration offices, airports and libraries.
There are many serious obstacles to adapting computers in public places
to the needs of blind people. The first is the high cost of purchasing
the software, which in places such as schools, universities and libraries
with a hundred computers can even exceed the cost of the IT
infrastructure itself, not to mention that many schools cannot afford it
if they are unable even to buy all the computers they need, at least in
countries such as Poland. The next issue is that the software must be
installed, configured and maintained, all of which raises costs that are
already high. These are the issues I wanted to address in this project.

The idea of this project was to deliver a Web application that interprets
any Web page regardless of its format and conveys its content as
synthetic speech: an application that needs no installation or
configuration, is easy for blind people to start and operate, and can be
run on any computer connected to the Internet regardless of operating
system or architecture.

9 Bibliography

1. Kaufmann Kohler, Isaac Broyde. Senses, The Five. Jewish Encyclopedia. [Online] http://www.jewishencylopedia.com/view.jsp?artid=479&letter=S.

2. Screen Readers. Enable Mart. [Online] http://www.enablemart.com/Catalog/Screen-Readers.

3. George H. Shames, Elisabeth H. Wiig. Human Communication Disorders. s.l. : Bell & Howell Company, 1982. 0-675-09837-8.

4. Christina L. Bennett, Alan W Black. The Blizzard Challenge 2006. festvox. [Online] http://festvox.org/blizzard/bc2006/eval_blizzard2006.pdf.

5. Robert A. J. Clark, Monika Podsiadło, Mark Fraser, Catherine Mayo, Simon King. Statistical analysis of the Blizzard Challenge 2007 listening test results. festvox. [Online] http://www.festvox.org/blizzard/bc2007/blizzard_2007/full_papers/blz3_003.pdf.

6. Speaking Clock. Telephones Uk. [Online] http://www.telephonesuk.co.uk/speaking_clock.htm.

7. Lemmetty, Sami. Review of Speech Synthesis Technology. Helsinki University of Technology. [Online] http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html.

8. WebMediaBrands Inc. Java. Webopedia. [Online] http://www.webopedia.com/TERM/J/Java.html.

9. Sun Microsystems, Inc. Java Platform, Standard Edition (Java SE) - Java Web Start Overview. Developer Resources for Java Technology. [Online] http://java.sun.com/javase/technologies/desktop/javawebstart/overview.html.

10. Sun Microsystems, Inc. Netbeans platform. Netbeans. [Online] http://bits.netbeans.org/dev/javadoc/index.html.

11. Sun Microsystems, Inc. Java SE 6 Release Notes - Supported System Configurations. Developer Resources for Java Technology. [Online] http://java.sun.com/javase/6/webnotes/install/system-configurations.html.

12. Hunter, Jason. Standalone Servlet Engines. Servlets. [Online] http://www.servlets.com/engines/.

13. Science Blog. Earlier Human Speech? Science Blog. [Online] http://www.scienceblog.com/community/older/1998/B/199801121.html.

14. Vorländer, Michael. Auralization. Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality. s.l. : Springer, 2008.

15. Michael Girdley, Kathryn A. Jone. Web Programming with Java

16. FreeTTS 1.2 - A speech synthesizer written entirely in the Java programming language. SourceForge. [Online] http://freetts.sourceforge.net/docs/index.php#what_is_freetts.

17. Sun Microsystems. Netbeans platform. Netbeans. [Online] http://platform.netbeans.org/description.html.

A Installation and use

A.1 Installation

The client application does not require installation. A person without
sight impairment is required to open the application, but it takes only
three easy steps that anyone can do in roughly 10-20 seconds. The
following steps have to be taken:

Open a Web browser and type the application's WWW address in the address
field, for example www.webbrowser.org/webbrowser.jnlp

Wait for the application to finish loading; the first load can take up to
2 minutes on a 1 Mbit connection

Accept the security pop-up; it can be disabled for every subsequent use by
ticking “always trust content from this publisher”

A.2 Use

After the application is opened, people with sight impairment can take
over. The application is very easy to use, and users are guided by a
voice navigation and feedback system.

Mouse wheel is used for increasing or decreasing the sound volume.

Tab key is used to change between:

Web address field - used for inputting the address of a Web page. Every
letter typed on the keyboard is echoed back through the voice feedback
system, and hitting enter commits the typed address. The user is informed
whether the address is correct or the WWW server is offline.

Link number field - used for inner Web page navigation. The user can
access any inner part of the Web page by typing its corresponding number.
Those numbers are spoken during the main Web page speech output. The
field is automatically enabled and focused if the Web page was
successfully retrieved.

Pause button – used to pause the Web page speech output. Activated by
pressing enter.

Resume button – used to resume the Web page speech output. Activated by
pressing enter.
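
The mouse wheel volume control listed above maps naturally onto a standard java.awt.event.MouseWheelListener. The following is a minimal sketch under stated assumptions: the class name VolumeWheelListener, the step of 0.05 per wheel notch and the 0.0 to 1.0 volume range are illustrative and not taken from the thesis code.

```java
import java.awt.event.MouseWheelEvent;
import java.awt.event.MouseWheelListener;

/** Hypothetical wheel-to-volume mapping; step size and range are assumptions. */
class VolumeWheelListener implements MouseWheelListener {
    // Assumed volume change per wheel notch.
    static final float STEP = 0.05f;

    private float volume;

    VolumeWheelListener(float initialVolume) {
        this.volume = clamp(initialVolume);
    }

    @Override
    public void mouseWheelMoved(MouseWheelEvent event) {
        volume = adjust(volume, event.getWheelRotation());
    }

    /** Rotating the wheel away from the user (negative rotation) raises the volume. */
    static float adjust(float current, int wheelRotation) {
        return clamp(current - wheelRotation * STEP);
    }

    /** Keeps the volume inside the assumed 0.0 to 1.0 range. */
    static float clamp(float value) {
        return Math.max(0.0f, Math.min(1.0f, value));
    }

    float volume() {
        return volume;
    }
}
```

Such a listener would be registered on the main window with addMouseWheelListener, so the wheel works regardless of which field currently has the focus inside that window.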

B Client compatibility list

Solaris Operating System, 32-bit and 64-bit

Platform: Solaris Sparc (32)
Operating system versions and desktop managers:
    Solaris 10: JDS-2 (Gnome-Metacity), CDE-dtwm
    Solaris 9: Gnome-Metacity 2.4.34 or later, CDE-dtwm
    Solaris 8: CDE-dtwm, Openwin-olwm
Browsers: Mozilla 1.4x, 1.7+
JRE: 32-bit Install
JDK: 32-bit Install

Platform: Solaris x86 (32)
Operating system versions and desktop managers:
    Solaris 10: Gnome-Metacity, CDE
    Solaris 9: Gnome-Metacity, CDE
    Solaris 8: CDE, Openwin
    OpenSolaris: GNOME 2.24.0 (browser: Firefox 3)
Browsers: Mozilla 1.4x, 1.7+
JRE: 32-bit Install
JDK: 32-bit Install

Windows 32-bit

Platform: Windows Intel IA32
Desktop managers: Windows/Active for Windows
Operating system versions and browsers:
    Windows XP Professional, Windows XP Home, Windows Server 2003: IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3
    Windows 2000 Professional, Windows 2000 Server: IE 6 SP1+, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3
    Windows Vista, Windows Server 2008: IE 7 or IE 8
JRE: 32-bit Install, Disk space
JDK: 32-bit Install, Disk space

Windows 64-bit

Platform: Windows x64, 32-bit mode
Desktop managers: Windows/Active for Windows
Operating system versions and browsers:
    Windows XP: IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3
    Windows Server 2003: IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3
    Windows Vista, Windows Server 2008: IE 7 or IE 8
JRE: 32-bit Install, Disk space
JDK: 32-bit Install, Disk space

Platform: Windows x64, 64-bit mode
Desktop managers: Windows/Active for Windows
Operating system versions: Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008
Browsers:
    64bit OS, 32bit Browsers: IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 – 3+
    64bit mode, 64bit Browsers: IE 7 or IE 8
JRE: 64-bit Install, 32-bit Install, Disk space
JDK: 64-bit Install, 32-bit Install, Disk space

Linux 32-bit

Platform: Linux IA32
Operating system versions and desktop managers:
    Red Hat 2.1, Red Hat Enterprise Linux 3.0, 4.0, 5.0 - 5.2: Gnome1.4-sawfish 1.0 or later, Gnome 2.2 - metacity 2.4 or later
    Suse Enterprise Linux Server 8, Suse Enterprise Linux Server 9, Suse Enterprise Linux Server 10, Suse Enterprise Linux Desktop: Gnome2.0.5-Metacity 2.6.2 or later (default: 2.4)
    Turbo Linux 10 (ONLY Chinese and Japanese Locale. No english.): Gnome-sawfish 1.0 or later
Browsers: Mozilla 1.4.x or 1.7+, Firefox 1.06 - 3
JRE: 32-bit Install
JDK: 32-bit Install

Linux 64-bit

Platform: Linux x64, 32-bit mode
Operating system versions and desktop managers:
    Suse Enterprise Linux Server 8, Suse Enterprise Linux Server 9, Suse Enterprise Linux Server 10, Suse Enterprise Linux Desktop: Gnome2.0.5-Metacity 2.6.2 or later (default: 2.4)
    Red Hat Enterprise Linux 3.0, 4.0, 5.0 - 5.2: Gnome2.0.5-Metacity 2.6.2 or later (default: 2.4)
    Turbo Linux 10 (ONLY Chinese and Japanese Locale. No english.): Gnome-sawfish 1.0 or later
Browsers: Mozilla 1.4.x or 1.7+, Firefox 1.06 - 3
JRE: 32-bit Install
JDK: 32-bit Install

Platform: Linux x64, 64-bit mode
Operating system versions and desktop managers:
    Suse Enterprise Linux Server 8, Suse Enterprise Linux Server 9, Suse Enterprise Linux Server 10, Suse Enterprise Linux Desktop: Gnome2.0.5-Metacity 2.6.2 or later (default: 2.4)
    Red Hat Enterprise Linux 3.0, 4.0, 5.0: Gnome 2.2 - metacity 2.4 or later
Browsers:
    64bit OS, 32bit Browsers: Mozilla 1.4.x or 1.7+, Firefox 1.06 - 3
    64bit mode, 64bit Browsers:
JRE: 64-bit Install, 32-bit Install
JDK: 64-bit Install, 32-bit Install

Fig. A.2: Server application compatibility list [16]

C Server compatibility list

Tomcat server

IBM's WebSphere Application Server

BEA Weblogic Application Server

Caucho's Resin Server

Adobe's JRun Web Server

Orion Application Server

Oracle Application Server

ATG Dynamo Application Server

Pramati J2EE Server

Borland AppServer

Jetty Server

The World Wide Web Consortium's Jigsaw Server

Zeus Web Server

iPlanet (Netscape) Web Server Enterprise Edition

iPlanet (Netscape) Web Server Enterprise Edition for Linux

Netscape Enterprise Server 3.5.1 and 3.6

GemStone/J Application Server

Gefion Software's LiteWebServer

CtO-Jstar

M5 Web Server

Servertec's iServer

Lotus's Domino Go WebServer

Paperclips Java Servlet Server 2.0

jo! Web Server

KonaSoft Enterprise Server

NGASI (Next Generation Application Server)

Avenida Web Server

vqServer

Serfler

WebEasy WEASEL Application Server

Tandem's iTP WebServer

Novocode's NetForge

Enhydra [12]
