Computer Science
Computer Networks
Piotr Leszczyński
Book No. s4207
Remote voice Web browser for people with
sight impairment
Zdalna głosowa przeglądarka WWW dla osób
niewidomych
Engineering Thesis
Written under the advice of
Ph.D. Eng. Przemysław Skurowski
Bytom September 2009
Contents
1 Introduction
2 A brief review of speech synthesis
2.1 Human speech synthesis
2.2 Text-To-Speech systems overview
2.2.2 Concatenation Speech Systems
2.2.3 Articulator Speech Systems
2.2.4 History
3 Application modeling and implementation
3.1 Application concept
3.2 Functional requirements
3.3 Non-Functional requirements
3.4 Feasibility analysis
3.5 Technical limitations
3.5.1 Accessibility
3.5.2 Speech synthesis
3.5.3 Interpretation of Web pages
4 Technology
4.1 Java
4.2 Java Web Start
4.3 FreeTTS speech system
4.4 Netbeans framework
5 Design
5.1 Communication
5.2 Client
5.2.1 Modular design
5.2.2 Portability
5.2.3 Model-View-Controller architecture pattern
5.2.4 Class model
5.2.5 Sequence diagram
5.2.6 Dependency model
5.2.7 Interface
5.2.8 Compatibility
5.3 Server
5.3.1 Class model
5.3.2 Compatibility
5.4 Development challenges
6 Testing
6.1 Live web testing
6.1.1 Method
6.1.2 Results
6.1.3 Feedback
6.2 Synthetic testing
6.2.1 Method
6.2.2 The results
7 Conclusions
8 Summary in Polish
9 Bibliography
A Installation and use
A.1 Installation
A.2 Use
B Client compatibility list
C Server compatibility list
1 Introduction
Human beings possess five senses. According to Aristotelian psychology,
these senses are sight, hearing, smell, taste and touch [1]. When
interfacing with computers we mainly use only sight and hearing. There are
projects introducing smell into the equation, but those have not gone
mainstream yet.
Although the main burden of communication lies on sight, we use hearing to
augment multimedia and to enhance system and application communication. In
most cases it would be impossible even to enter the operating system
without the sense of sight, not to mention doing anything else. That is
why the situation of people with sight impairment is so difficult when it
comes to interfacing with computers. It often requires a very expensive
set of software, with top-of-the-line products costing between $500 and
$1300 [2]. Such a set combines a screen reader, whose task is to identify
and interpret the output of a computer screen (a task performed by the
computer monitor and the human brain in the case of people without sight
impairment), with a text-to-speech or Braille output device.
Text-to-speech is the preferred and most natural method of presenting the
interpreted text.
I decided to take on the problem of making computers in public places such
as schools, public administration offices, airports and libraries
accessible to people with sight impairment, specifically for World Wide
Web access. There are major obstacles to adapting computers for use by
sight impaired persons. The first is the high cost of the software, which
in places like schools, universities and libraries with hundreds of
computers could rise to astronomical levels. Few schools can afford it
when they cannot even afford all the computers they need.
The next issue is that the software needs to be installed, configured and
maintained, which adds to the already high costs. Those were the issues
I wanted to either solve or alleviate.
The idea behind this project was to provide a Web based application that
would identify and interpret a WWW page and present the output to the
user with text-to-speech technology: an application that would need no
installation or configuration, would be easy for persons with sight
impairment to run and handle, and could run on any computer connected to
the Internet regardless of its architecture or operating system. I would
like to present the end result of that idea in this document, together
with a working application on the attached disk. I hope you find it
interesting and useful.
2 A brief review of speech synthesis
This chapter introduces the basic information needed to understand how
speech synthesis works on the human and mechanical levels.
2.1 Human speech synthesis
Modern researchers believe that humans possessed speech abilities as early
as 300,000 years ago, after the evolution of the Neanderthals [13]. Yet
human speech synthesis has been a documented subject of research for only
about a century, and the biggest breakthroughs happened within the last
hundred years, some of them only in the last 20 years.
Two major centers are responsible for the process of human speech
creation: the lungs and the larynx, with its vocal cords and glottis.
Humans create sound when the air pumped by the lungs moves over the vocal
cords and is made to vibrate. Changing sound into speech is a much more
complicated process, though; it involves the creation of phonation in the
glottis and its modification into different vowels and consonants. The
resulting speech is then shaped by complex movements of the lips, tongue
and soft palate, whose purpose is to filter out some frequencies and
resonate others. [14]
Fig. 2.1: Human vocal tract 1
2.2 Text-To-Speech systems overview
The simplest description of a Text-To-Speech system is an application that
can reproduce a speech sequence from supplied text. There are generally
three kinds of speech synthesis methods: concatenative, articulatory and
formant. Despite their differences, they all follow the model of human
speech production.
Fig. 2.2: Human speech model 2
1 Human vocal tract, picture taken from the May-June 2008 issue of Duke Magazine.
2.2.2 Concatenation Speech Systems
Concatenation is the most popular and widely used method of speech
synthesis. There are two variants. The first one concatenates single words
or parts of sentences from a database to create the speech. Such systems
are called “Voice Response Systems” and their usability is limited to
situations where a rich vocabulary is not needed and the sentence
structure is strictly predefined, for example automated telephone systems
or arrival and departure announcements at train stations. This variant
generally produces some of the best speech quality at the cost of
versatility.
The second one concatenates diphones or phones, the two smallest segments
of speech needed to pronounce English text [3]. This variant allows
pronouncing almost anything in the vocabulary at the cost of lower output
quality. Prime examples are Festival and FreeTTS, both open source
solutions, and Ivona, a commercial solution made in Poland that is
considered the best voice quality synthesizer in the world [4][5].
2.2.3 Articulator Speech Systems
It is the most complex and natural sounding method of speech synthesis.
It tries to mimic the human vocal tract as closely as today’s technology
allows by creating a computational model of every element of the vocal
tract and its articulation processes. Speech is generated by simulating
airflow through the model. There are a few working articulatory speech
systems: gnuspeech, an open source system, and NeXT, a commercial speech
system currently belonging to Apple. Both have their origins in a system
originally developed by Trillium Sound Research.
2 Human speech model, picture taken from The Scientist and Engineer's Guide to Digital Signal Processing by Steven W. Smith, Ph.D.
http://www.dspguide.com/ch22/6.htm
|            | Concatenative Speech Synthesis | Formant Speech Synthesis | Articulatory Speech Synthesis |
| Complexity | Simplest | Moderate | Complex |
| Quality    | Variation from poor to good speech quality with minor sound artefacts | Natural and good quality speech | Very good quality and most natural speech |
Fig. 2.3: Comparison of speech synthesis methods
2.2.4 History
One of the first practical applications of speech synthesis was the
British Telephone Company’s speaking clock, an equivalent of the Polish
Zegarynka. It concatenated words from optical storage to form the sequence
"At the third stroke, the time from BT will be (hour) (minute) and
(second) seconds". It is still in service after 72 years, with upgraded
storage methods and four different voices over the years; Sara Mendes da
Costa is its latest and permanent voice [5][7].
In 1939 Bell Laboratories developed the Voder (Voice Operating
Demonstrator), a mechanical device operated with pedals and mechanical
keys. It is considered to be the first true speech synthesizer. It was
based on the Vocoder (Voice Coder), a device used for analyzing speech in
order to reconstruct an approximation of it. The Voder required a very
skilled operator, but its output could almost sound like speech. Despite
its shortcomings, it was the first device that showed the potential of
artificial speech systems and opened the way for further developments. [7]
Fig. 2.4: Voder schematics 3
3 Voder schematics, picture taken from the “History and Development of Speech synthesis” article at
http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html
3 Application modeling and implementation
This chapter introduces the analysis and implementation phases of the
development of the Remote Voice Web Browser, a Client-Server application
providing universal Web access for people with sight impairment on any PC
class computer connected to the Internet, regardless of its architecture
and operating system.
First, the project requirements and their analysis with respect to the
needs of people with sight impairment are presented. They are followed by
the technical limitations and a summary of the technologies used in the
development of the application.
3.1 Application concept
Fig. 3.1: Applications concept
1. A person with sight impairment asks a nearby bystander to open a
Web page on a public computer
2. The bystander opens a Web page containing the application
3. A request to deploy the client application is made to the deployment
server
4. The client application is deployed
5. The application starts and guides the person with sight impairment
through its use
6. The application is instructed to open a Web page
7. The task is delegated to an application server
8. A request is made to a WWW server somewhere on the Internet for
the content of the Web page
9. The content of the requested Web page is returned to the application
server
10. The content of the Web page is parsed into text, normalized and
formatted for speech synthesis, then sent back to the client
application
11. The content of the requested Web page is parsed further and then
output to the user as synthesized speech.
3.2 Functional requirements
The Voice Web Browser application has to fulfill the following functional
requirements:
- Interpret any general Web pages
- Create a voice navigation table for each Web page
- Output interpreted Web pages through voice system
- Provide a voice navigation system
- Provide a voice feedback system
- Manage focus and navigation of GUI elements as required by
  the needs of people with sight impairment
- Work as a Web based application
- Work without installation and configuration
- Be platform independent
- Be easy to use and intuitive for people with sight impairment
3.3 Non-Functional requirements
The Voice Web Browser application has to fulfill the following
non-functional requirements:
- Provide a modular design based on the Netbeans API
- Support easy maintainability and modifiability
- The client side application should be efficient enough to meet the
  performance constraints of office type computers
- The server side application should scale to hundreds of users on a
  typical home WWW server
- Be based on Open Source libraries
- Be distributed under an Open Source license (only on condition that
  the thesis supervisor and University authorities give their permission)
3.4 Feasibility analysis
- Platform independence should be achieved by writing the client and
  server applications in an interpreted, platform independent language.
  Java was chosen as the most suitable solution, accompanied by native
  Java libraries for the subsystems.
- Interpretation of any general Web pages should be achieved with the
  help of a text Web browser. After a short review Lynx was chosen,
  because it is known as a long-time standard text browser for website
  development.
- All the sound systems and the sound output of Web pages should be
  achieved with the help of the Java Speech API and a feasible speech
  synthesizer. After analysis of the requirements the FreeTTS speech
  synthesizer API was chosen, because it met most of them.
- Modularity should be achieved with the help of the Netbeans API and by
  following design patterns like Model-View-Controller.
- Web deployability should be achieved with a technology that provides
  the portability of Web based applications while retaining the
  capabilities of desktop applications. Java Web Start proved to be the
  most suitable choice.
3.5 Technical limitations
3.5.1 Accessibility
The major technical limitation was providing an application that could
work on any public computer without the need for installation and
configuration. The solution was a Web based application but each Web
technology has its limitations and finding a suitable one was another
challenge in itself.
Java applets were one solution but they have several limitations compared
to desktop applications. Applets cannot:
- Read or write to the local file system
- Make connections except to the servers from which they were deployed
- Access native libraries
- Create processes on the local machine
[15]
The solution to the problem proved to be Java Web Start. It possesses all
the benefits of Web based applications while retaining the capabilities of
desktop applications. More information about Java Web Start can be found
in the technology section of this thesis.
3.5.2 Speech synthesis
The design assumed the use of a free open source speech synthesizer.
Their number is limited, though, and the synthesizer had to meet specific
requirements. In the end most of the needs could not be met by open
source synthesizers, so the requirements were reduced to providing average
speech quality, being written in Java and being compatible with Java Web
Start technology. The solution was FreeTTS, an open source concatenation
based speech synthesizer.
3.5.3 Interpretation of Web pages
This was the most challenging barrier. While interpreting RSS feeds is
fairly easy, general interpretation of Web pages for speech synthesis
could be the topic of a master's thesis in itself. It is a complicated
process, partially handled by large applications like Web browsers.
Another part of the problem is that general Web pages are written for
graphical display of data, not for sound reproduction. The solution to
this problem was very limited and was provided by an old technology: text
Web browsers dating from the early 1990s. I used Lynx, a text based Web
browser developed in 1992 by a team at the University of Kansas to
distribute campus information.
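To make the idea more concrete, the following is a minimal sketch of how the text produced by a text browser could be turned into speakable text and a numbered navigation table. It is not taken from the thesis code. It assumes the usual layout of Lynx dump output, where links appear inline as markers such as [1] and their addresses are listed under a "References" heading at the end. The class name LynxDumpParser and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the thesis code.
class LynxDumpParser {
    // Matches reference entries such as "   3. http://example.org/page".
    private static final Pattern REFERENCE =
            Pattern.compile("^\\s*(\\d+)\\.\\s+(\\S+)\\s*$");
    // Matches inline link markers such as "[3]" in the page body.
    private static final Pattern MARKER = Pattern.compile("\\[(\\d+)\\]");

    /** Builds a number-to-address table from the "References" list (assumed layout). */
    static Map<Integer, String> navigationTable(String dump) {
        Map<Integer, String> table = new LinkedHashMap<>();
        boolean inReferences = false;
        for (String line : dump.split("\\r?\\n")) {
            if (line.trim().equals("References")) {
                inReferences = true;
                continue;
            }
            if (inReferences) {
                Matcher m = REFERENCE.matcher(line);
                if (m.matches()) {
                    table.put(Integer.parseInt(m.group(1)), m.group(2));
                }
            }
        }
        return table;
    }

    /** Returns the page body with markers spoken as "link N" and whitespace collapsed. */
    static String speakableText(String dump) {
        int cut = dump.indexOf("\nReferences");
        String body = cut >= 0 ? dump.substring(0, cut) : dump;
        String spoken = MARKER.matcher(body).replaceAll("link $1, ");
        return spoken.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String dump = "   [1]Home [2]News\n\n   Welcome to the site.\n\n"
                + "References\n\n   1. http://example.org/\n   2. http://example.org/news\n";
        System.out.println(speakableText(dump));
        System.out.println(navigationTable(dump));
    }
}
```

With such a table, a number entered by the user could be mapped back to the address of the corresponding link, which is one plausible way to support navigation inside a page by number.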
4 Technology
This section describes all the technologies used in development of the Web
Browser application.
4.1 Java
“A high-level programming language developed by Sun Microsystems.
Java was originally called OAK, and was designed for handheld devices
and set-top boxes. Oak was unsuccessful so in 1995 Sun changed the
name to Java and modified the language to take advantage of the
burgeoning World Wide Web.
Java is an object-oriented language similar to C++, but simplified to
eliminate language features that cause common programming errors. Java
source code files (files with a .java extension) are compiled into a format
called bytecode (files with a .class extension), which can then be executed
by a Java interpreter. Compiled Java code can run on most computers
because Java interpreters and runtime environments, known as Java
Virtual Machines (VMs), exist for most operating systems, including UNIX,
the Macintosh OS, and Windows. Bytecode can also be converted directly
into machine language instructions by a just-in-time compiler (JIT).
Java is a general purpose programming language with a number of
features that make the language well suited for use on the World Wide
Web. Small Java applications are called Java applets and can be
downloaded from a Web server and run on your computer by a Java-
compatible Web browser, such as Netscape Navigator or Microsoft
Internet Explorer.” [8]
With today’s progress in Java development, the language is suited for
desktop, corporate and server applications and even for games. The list of
Java-compatible Web browsers also includes Mozilla Firefox, Apple Safari,
Google Chrome and many embedded browsers of Linux graphical desktop
environments.
4.2 Java Web Start
Java Web Start is a technology for deploying Java applications over the
Internet, based on the Java Network Launching Protocol API (JNLP). JNLP
provides a browser-independent architecture for deploying applications.
The programmer only needs to write an XML file with a .jnlp extension
describing all the required jars and their locations. Everything else is
done on the client machine by the Web Start application, which is
installed with every modern Java distribution. It is a very powerful
system that gives Web based applications the capabilities of Java desktop
applications while retaining the portability of Web based applications.
Java Web Start can also update the Java Runtime Environment on the client
machine if the deployed application requires it, which helps the
application work properly on all client machines. [9]
4.3 FreeTTS speech system
FreeTTS is a speech synthesis system written in Java and based on Flite, a
synthesis engine developed at Carnegie Mellon University. Flite is derived
from the Festival Speech Synthesis System from the University of Edinburgh
and the FestVox project from Carnegie Mellon University. [16]
FreeTTS has its drawbacks: it does not support JSML or any other markup
language, using it with the Netbeans API requires rewriting its source
code, the quality of speech is average, and implementing a Polish voice
under Java Web Start proved impossible. It is nevertheless the free open
source speech synthesizer written in Java with the best voice quality.
4.4 Netbeans framework
Netbeans is a generic framework for Swing applications that provides a
flexible and reliable application architecture. It saves time by relieving
the developer from writing the boilerplate code for tabbed views, menus,
explorer and palette views, saving state, connecting actions to menu
items, toolbar items and keyboard shortcuts, and window management. It
also encourages the use of good design patterns, for example through the
Lookup API, the dependency system and the many elements built on the MVC
model. [10]
The Netbeans API provides many out-of-the-box components that can be
reused during development and make it much quicker and easier. It also
brings other benefits, such as the module system, which provides easy
modifiability and maintainability. Some of the benefits of the Netbeans
API are:
Modular Runtime Container
The Netbeans runtime container provides lifecycle services to Swing
applications and allows a set of modules to be composed into a single
Swing application. This modularity lets developers organize the
application code into separate, versioned modules. Only modules that have
explicitly declared a dependency can use code from the exposed packages of
another module. This model helps greatly when developing or maintaining
large applications built by teams of engineers. There are benefits for the
end users of the application as well: because modules are pluggable, users
can install them into the running application. In summary, the NetBeans
runtime container provides an environment for modules that handles their
lifecycle and enables them to interact with each other. [17]
Loose Coupling & Context Sensitivity Management
NetBeans provides the Lookup API, an equivalent of the JDK 6 ServiceLoader
class, which is an implementation of the Service Locator design pattern.
The Lookup API offers the same functionality but is better suited to the
Netbeans platform, providing dependency injection among other benefits. It
enables modules to communicate with each other in a type-safe, loosely
coupled way: objects defined in one module can be used in another without
the modules depending on each other. [17]
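The Service Locator idea can be illustrated with a small, self-contained sketch. The code below is a deliberately simplified stand-in written for this text and is not the NetBeans Lookup API or the JDK ServiceLoader; the names MiniLookup and SpeechService are hypothetical. It shows only the core of the pattern: a consumer asks for an interface and receives whatever implementation is currently registered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of the Service Locator pattern (hypothetical names).
class MiniLookup {
    /** Hypothetical service interface that a GUI module might consume. */
    interface SpeechService {
        String describe();
    }

    private final Map<Class<?>, List<Object>> services = new HashMap<>();

    /** Registers an implementation under the interface it provides. */
    <T> void register(Class<T> type, T implementation) {
        services.computeIfAbsent(type, k -> new ArrayList<>()).add(implementation);
    }

    /** Removes an implementation, as if its module had been uninstalled. */
    <T> void unregister(Class<T> type, T implementation) {
        List<Object> list = services.get(type);
        if (list != null) {
            list.remove(implementation);
        }
    }

    /** Returns the first registered implementation, or null if none is installed. */
    <T> T lookup(Class<T> type) {
        List<Object> list = services.get(type);
        return (list == null || list.isEmpty()) ? null : type.cast(list.get(0));
    }

    public static void main(String[] args) {
        MiniLookup lookup = new MiniLookup();
        SpeechService primary = () -> "primary synthesizer";
        SpeechService fallback = () -> "fallback synthesizer";
        lookup.register(SpeechService.class, primary);
        lookup.register(SpeechService.class, fallback);
        System.out.println(lookup.lookup(SpeechService.class).describe());
        // Removing the first provider lets the second one take over,
        // without any change to the code that performs the lookup.
        lookup.unregister(SpeechService.class, primary);
        System.out.println(lookup.lookup(SpeechService.class).describe());
    }
}
```

The consumer never names a concrete class, so a provider can be replaced without touching the code that uses the service.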
System FileSystem
The NetBeans filesystem makes it possible to install folders and files,
for example settings, into the application filesystem, where they can be
read by all the modules in the application. [17]
Window System
The NetBeans Window System API provides a multi-window GUI with tabs and
modes that can be maximized, minimized, docked, undocked and dragged and
dropped out of the box. It also takes care of the interactions between all
the windows in the system. [17]
Data Management
The NetBeans Nodes API provides a generic model for Swing components such
as JList and JTable, which can be used in every component without
rewriting the model. Nodes can also be used to display data in several
Swing components provided by the Netbeans Explorer & Property Sheet
API. [17]
5 Design
The application consists of two parts, a server and a client. The server
application retrieves, interprets, normalizes and formats Web pages for
speech synthesis and sends them to the client applications. The client
application's task is to turn the interpreted data into speech and present
it to the user through a voice interface consisting of voice navigation
and voice feedback systems.
5.1 Communication
The communication between the server and the client applications is
based on HTTP (Hypertext Transfer Protocol) with the MIME type
application/x-java-serialized-object, which is used for transferring
serialized Java objects. HTTP is a stateless protocol: it does not need to
store session data about users. The server is designed with a stateless
architecture as well; all resources are released after the transmission.
A stateless server combined with a stateless protocol results in lower
resource usage and higher efficiency on the server.
Fig. 5.1: Communication diagram
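The exchange described in this section can be sketched with standard Java classes only. The code below is an illustration under stated assumptions, not the thesis implementation: the PageRequest payload class, the SerializedTransport class and the use of an HTTP POST are hypothetical choices. The main method only demonstrates the serialization round trip and does not contact a server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical transport helper, not part of the thesis code.
class SerializedTransport {
    /** Hypothetical request payload: the address of the page to interpret. */
    static class PageRequest implements Serializable {
        private static final long serialVersionUID = 1L;
        final String address;

        PageRequest(String address) {
            this.address = address;
        }
    }

    /** Serializes an object into a byte array. */
    static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        return buffer.toByteArray();
    }

    /** Restores an object from a byte array. */
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    /** Posts one serialized object and reads one back; no session state is kept. */
    static Object exchange(URL servlet, Serializable request)
            throws IOException, ClassNotFoundException {
        HttpURLConnection connection = (HttpURLConnection) servlet.openConnection();
        connection.setDoOutput(true);
        connection.setRequestMethod("POST"); // assumption: the request travels as a POST body
        connection.setRequestProperty("Content-Type", "application/x-java-serialized-object");
        try (OutputStream body = connection.getOutputStream()) {
            body.write(toBytes(request));
        }
        try (ObjectInputStream in = new ObjectInputStream(connection.getInputStream())) {
            return in.readObject();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        PageRequest copy = (PageRequest) fromBytes(toBytes(new PageRequest("http://example.org/")));
        System.out.println(copy.address);
    }
}
```

Each call to exchange opens a connection, sends one object, reads one object and disconnects, which matches the stateless style described above. Deserializing data from an untrusted peer is a known security risk in Java, so a sketch like this is only appropriate between components that trust each other.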
5.2 Client
5.2.1 Modular design
The client part of the application is based on a modular system. This
design makes the application easy to maintain and modify and prevents
spaghetti code. Existing modules can be removed at runtime and new modules
can be installed without stopping the application. The GUI modules of the
application are decoupled from the Model modules and the logic modules.
Any module can be removed and the application will still work, only
without the capability provided by the removed module. With Lookup it is
even possible to remove a module and have another module automatically
take over its tasks without any changes to the code. Those are just the
main benefits of the modular design.
5.2.2 Portability
The application is written entirely in Java and all the external libraries
are native Java. There are no dependencies on system resources. The
application should run on any system and hardware architecture supporting
Java. The only limitation could be the ability to play two sound streams
at the same time, which the application needs to work completely correctly
and which may be missing, for example, on some Linux distributions. The
application was tested on Windows XP, Windows Vista and Windows 7 by
myself and on Mac OS X 10.5.8 by Fabrizio Giudici, and it proved to work
flawlessly.
5.2.3 Model-View-Controller architecture pattern
Model-View-Controller is an architectural pattern designed to decouple the
model functionality from the presentation and control logic of the
application. It allows different presentation layers to share the same
data model; otherwise the model would have to be written again for every
presentation layer, increasing the work of the engineers. Using the
pattern not only saves time but also makes the application easier to
implement and maintain. [18]
Fig. 5.2: Model-View-Controller overview diagram 4
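As a minimal illustration of the pattern, the sketch below lets two independent presentation layers share one model. It is written for this text and does not reproduce the thesis classes; the names MvcSketch, PageModel and PageController are hypothetical, and the two listeners merely stand in for a text area and a speech queue.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

// Hypothetical MVC illustration, not part of the thesis code.
class MvcSketch {
    /** Model: holds the page text and notifies listeners; knows nothing about views. */
    static class PageModel {
        private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
        private String text = "";

        void addListener(PropertyChangeListener listener) {
            changes.addPropertyChangeListener(listener);
        }

        void setText(String newText) {
            String old = text;
            text = newText;
            changes.firePropertyChange("text", old, newText);
        }
    }

    /** Controller: turns an incoming event into a model update. */
    static class PageController {
        private final PageModel model;

        PageController(PageModel model) {
            this.model = model;
        }

        void pageLoaded(String content) {
            model.setText(content.trim());
        }
    }

    public static void main(String[] args) {
        PageModel model = new PageModel();
        List<String> spoken = new ArrayList<>();
        // Two "views" observe the same model: a display stand-in and a speech stand-in.
        model.addListener(e -> System.out.println("display: " + e.getNewValue()));
        model.addListener(e -> spoken.add((String) e.getNewValue()));
        new PageController(model).pageLoaded("  Hello  ");
        System.out.println("queued for speech: " + spoken);
    }
}
```

Adding a further view requires only another listener; the model and the controller stay unchanged, which is the saving the pattern is meant to provide.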
5.2.4 Class model
This section contains the UML class model of the client part of the
application. The model shows packages, classes and the relations between
them and serves as an overview of the application design.
4 Model-View-Controller overview diagram, picture taken from the Sun Java Blueprints
http://java.sun.com/blueprints/patterns/MVC-detailed.html
Fig. 5.3: Class diagram of the client application
Model and NodeModel are the Model part of the Model-View-Controller
architectural pattern. Their purpose is to shape and encapsulate the data
on which the application operates.
- The Model module encapsulates the Web data received from the server
  part of the application.
- NodeModel encapsulates and provides MyNode objects, which inherit from
  AbstractNode, a part of the Netbeans API responsible for the
  presentation layer.
The Communication, Controller and VoiceSynthesizer modules are the
Controller part of the MVC architectural pattern, which is responsible for
the business logic of the application.
- The Communication module is responsible for the data transfer between
  the server and the client parts of the application. Data is transferred
  as serialized objects both ways using the URL class. The address of a
  Web page to interpret is sent to the server on a user request, and the
  server sends back the interpreted and formatted Web page. Both the
  server and the client then close their sessions; in addition, the
  server releases all its resources, following a stateless architecture.
- Controller carries out the business logic of the application and is an
  intermediate layer between the GUI and the Model. Its tasks include,
  for example, informing modules of new tasks and controlling the
  communication and the voice synthesis.
- The VoiceSynthesizer module is responsible for interfacing with the
  FreeTTS and Java Speech APIs, controlling the synthesizer parameters
  and the speech flow, and acting as a factory for synthesizer objects.
GUIOptions, GUIBrowser and GUIExplorer are the View parts of the MVC
architectural pattern. Their task is to provide a GUI that is used to
display and input information. Since the users have sight impairment, the
GUIs are only used for inputting data and for providing a template and
event management for the sound navigation system.
- GUIBrowser is the main interface window. It allows the user to type
  the Web address and the inner Web number and to pause and resume the
  speech queue. It also listens to the mouse wheel to adjust the sound
  volume.
- GUIOptions is an options panel for the application. It shows the
  current Web server address and allows changing it; in the future it
  will contain more settings as the application grows. It stores the
  settings in the user's home directory.
- The GUIExplorer module is responsible for presenting Web data in the
  form of Nodes. It is mainly used for development purposes, together
  with the property sheet, for example for adjusting Web data formatting.
  In the future it will be reused for a Web page template creator
  intended to increase the quality of interpreted Web pages.
5.2.5 Sequence diagram
Scenario 1 Use Case
The user opens a Web page (Fig. 5.4).
Scenario 2 Use Case
The user listens to the speech synthesis, changes the speaking rate and
volume, pauses the speech synthesis and then resumes it, but cancels the
speech before the Web page finishes (Fig. 5.5).
Fig. 5.4: Sequence diagram presenting first scenario Use Case
Fig. 5.5: Sequence diagram presenting second scenario Use Case
5.2.6 Dependency model
The dependency model shows the dependencies between the modules of the
application. The application is divided into eight modules in three
layers. There are no mutual dependencies, which, together with the modular
system, means that removing one module affects only one place in the
application instead of the whole system, as it would in the common
“spaghetti” architecture.
Fig. 5.6: Client applications dependency model and layer separation
5.2.7 Interface
Interface is designed with people with sight impairment in mind. It
consists of two text fields, three buttons , 4 sliders, a text area and 2
hidden panels for development purposes.
[Figure: annotated screenshot of the interface, with callouts for the high-contrast text area showing Web pages in a large font, the text field for entering the Web address, the pause and resume buttons for controlling speech synthesis, the text field for inner Web page navigation, the voice synthesizer settings and the options button]
Web address field – used for inputting Web address of a desired
Web page
Inner Web page field – used for navigation inside a Web page
Pause and resume buttons – used for controlling speech synthesis
Speech synthesizer settings – used for controlling the settings of the
main and interface synthesizers, such as volume or speaking speed
Options button – used for entering the options panel of the application
5.2.8 Compatibility
The application is compatible with all operating systems and hardware
supported by Java SE 6.0+ and Java Web Start. See Annex B for the
current compatibility list from Sun resources.
5.3 Server
5.3.1 Class model
This section contains the UML class model of the server part of the
application. The model shows packages, classes and their
relationships. Unlike the client part, the server application is not
designed as a modular application. It consists of a servlet, which acts as
the view part of the MVC model and is responsible for communication with
the client application, and a controller class for business logic such as
opening the Lynx process and normalizing and formatting Web data.
Fig. 5.7: Server applications UML model
5.3.2 Compatibility
Server-side compatibility depends on two things:
Lynx compatibility: Lynx is supported by most Linux distributions and
by Windows versions through Cygwin
Servlet compatibility: see Annex C for the list of Servlet-compatible
WWW servers
5.4 Development challenges
Focus management on Netbeans platform
Focus management is very important for applications designed with
people with sight impairment in mind. The application has to work
exactly as planned, and focus is a key factor here. The application
has to start with the right window active, in the right tab, with the
predefined component focused in every circumstance. The transfer of
focus has to follow a planned traversal policy. I used the Netbeans
API in the development for its many advantages, but focus management
was not one of them. It was a very hard task to complete. There is
not much documentation on the subject, and most of it just points to
Swing focus management. Solving this problem took a lot of time, but
it taught me a lot about Swing application design and architecture.
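A planned traversal policy of the kind described above can be expressed in plain AWT/Swing by subclassing java.awt.FocusTraversalPolicy. The sketch below is a minimal illustration only: the class name FixedOrderPolicy is hypothetical, and it does not show how the policy was integrated with Netbeans platform windows, which the text describes as the difficult part.

```java
import java.awt.Component;
import java.awt.Container;
import java.awt.FocusTraversalPolicy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical policy that cycles focus through components in a fixed order. */
final class FixedOrderPolicy extends FocusTraversalPolicy {
    private final List<Component> order;

    FixedOrderPolicy(List<? extends Component> order) {
        if (order.isEmpty()) {
            throw new IllegalArgumentException("order must not be empty");
        }
        this.order = new ArrayList<>(order);
    }

    @Override
    public Component getComponentAfter(Container root, Component current) {
        int i = order.indexOf(current);
        // Unknown components fall back to the first component in the order.
        return i < 0 ? order.get(0) : order.get((i + 1) % order.size());
    }

    @Override
    public Component getComponentBefore(Container root, Component current) {
        int i = order.indexOf(current);
        return i < 0 ? order.get(0) : order.get((i - 1 + order.size()) % order.size());
    }

    @Override
    public Component getFirstComponent(Container root) {
        return order.get(0);
    }

    @Override
    public Component getLastComponent(Container root) {
        return order.get(order.size() - 1);
    }

    @Override
    public Component getDefaultComponent(Container root) {
        // The component that should receive focus when traversal starts.
        return order.get(0);
    }
}
```

In a plain Swing window the policy would be installed with setFocusTraversalPolicy(...) on a container that is a focus cycle root, passing the address field, link number field and buttons in the intended order.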
Implementation of FreeTTS on Netbeans platform Web Start
application
The FreeTTS library creates its own class loader that tries to find
all the jars it needs. Providing a custom class loader is not the job
of a library; it is bad design in general, and it is even worse when
the application is built on the Netbeans API, which enforces a strict
dependency policy and uses a system of library wrappers. In practice
FreeTTS would not work without a source code rewrite. This issue took
weeks to resolve, including many attempts to change the default class
loader in freetts.jar, all of which ended in disaster due to my lack
of experience with class loaders. I finally resolved the issue after
weeks of tinkering with Netbeans.
6 Testing
6.1 Live web testing
6.1.1 Method
In order to test performance, find bugs in the client and server
applications and get professional feedback, I created a suitable live
Web environment by inviting members of the Netbeans mailing group
[email protected] to take part in the testing phase. The group was
created for computer science students who have taken a 16-hour
training course in the Netbeans API, and it is meant for exchanging new
application ideas and sharing knowledge about the API. I took the course,
which was organized by Silesian JUG, in May.
GlassFish, an open source application server, was set up for the members
of the group. They were asked to try the application over the Internet and
provide feedback while I monitored the server. Over 30 different IP
addresses connected from all over the world, including a Polish student
whose master's thesis was an aid system for blind and sight-impaired people.
6.1.2 Results
The application server crashed 10 minutes after the tests started due to
too many open local processes. After the problem was fixed, by committing
all the streams opened by the processes and destroying the processes after
each transaction, the server ran without any crashes for five straight
days, during which its parameters and CPU and memory usage were observed
continuously.
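The fix described above can be sketched as follows. This is an illustration under assumptions, not the server's actual code: "committing" the streams is interpreted here as closing them, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that always releases the resources of an external process. */
final class ProcessCleanup {

    /**
     * Reads everything the process writes to its standard output, then closes
     * all three process streams and destroys the process, even on failure.
     * Callers are assumed to merge stderr into stdout (redirectErrorStream),
     * otherwise a full stderr pipe could block the process.
     */
    static String readAndRelease(Process process) throws IOException {
        InputStream stdout = process.getInputStream();
        InputStream stderr = process.getErrorStream();
        OutputStream stdin = process.getOutputStream();
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stdout.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            closeQuietly(stdout);
            closeQuietly(stderr);
            closeQuietly(stdin);
            process.destroy();
        }
    }

    // Closing must not mask an exception thrown while reading.
    private static void closeQuietly(Closeable stream) {
        try {
            stream.close();
        } catch (IOException ignored) {
            // Nothing useful can be done if closing fails.
        }
    }
}
```

A caller might start Lynx with new ProcessBuilder("lynx", "-dump", url).redirectErrorStream(true).start() and pass the result to readAndRelease; the exact Lynx command line used by the server is not given in the text, so this invocation is an assumption.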
6.1.3 Feedback
“I'm sitting with Toni in Geneva right before the next training day and
we've just been listening to my blog in your application! Great! However,
we thought that the app should be more visually pleasing... but then we
realized the target audience is blind. :-) So, absence of progress bar
integration isn't a problem (except, maybe the progress bar could be
integrated anyway and then elevator should be played during progress of
accessing the requested site).”5
“Started correctly on Mac OS X 10.5.8 and it's speaking right now
Pretty cool. It's the first Java talking application that I try.”6
5 Geertjan Wielenga – a technical writer and trainer for Netbeans
6 Fabrizio Giudici – Java Architect, Project Manager
6.2 Synthetic testing
6.2.1 Method
A profiler tool was used for the performance measurements. It monitors
important information about the runtime behavior of applications, such as
CPU performance, memory usage and thread states, while imposing low
overhead. Below are the parameters for the performance tests.
Hardware specification
CPU: Intel T4200 @ 2.0 GHz
Memory: 3.00 GB
System: Microsoft Vista
Software specification
Operating System: Windows Vista Home Basic
Java Development Kit: 1.6.0 update 14
Netbeans Platform: 6.7.1
Profiler Calibration results
Approximate time in one methodEntry()/methodExit() call pair:
When getting absolute timestamp only: 2.7497 microseconds
When getting thread CPU timestamp only: 1.1512 microseconds
When getting both timestamps: 3.682 microseconds
Approximate time in one methodEntry()/methodExit() call pair in sampled
instrumentation mode: 0.2211 microseconds
Profiler performance test settings
Scope: Entire application
Filter: Profile project and subprojects
Method tracking: Exact call tree and timing
Exclude time spent in Thread.sleep() and Object.wait()
Limit number of profiled threads: 32
Instrumentation scheme: Total
Instrument: Method.invoke()
Testing scenario
Open a predefined list of websites and browse them while performing minor GUI operations
such as pausing, resuming and changing the volume.
http://www.theinquirer.net/ and its sub pages
http://www.bbc.co.uk/ and its sub pages
http://mmorpg.com/ and its sub pages
http://www.gazeta.pl/ and its sub pages
6.2.2 The results
The tests took an hour and a half, and their purpose was to show the
behavior of the application during normal use. Several important factors
were taken into consideration, following a model of application profiling
and two testing patterns. The first pattern assumed reading the full
content of a Web page before opening a new one. The second took only
3-4 minutes and assumed opening new Web pages at short consecutive
intervals to simulate the browsing part of the test. The test results and
their analysis follow.
6.2.2.1 Surviving generations / Garbage Collector CPU time
Surviving generations shows how many garbage collections the objects
allocated on the JVM heap have survived since the start of the
application. Generally the number of surviving generations rises during
application startup, but it stabilizes once the application has finished
loading and all the temporary objects have been destroyed. If the number of
surviving generations continues to rise instead of stabilizing, it might
mean that objects put on the heap are not being removed from it by the Java
garbage collector. This is called a memory leak and is unwanted behavior
that may lead to inefficient resource management, stability problems or
even out-of-memory crashes.
Fig. 6.1: Surviving generations and Garbage Collector diagram showing the first 90 seconds of
application execution
During the first two minutes of application execution the graph shows a
rise in surviving generations, as expected from a launching application.
The results then begin to stabilize, showing no signs of memory leaks.
Fig. 6.2: Surviving generations and Garbage Collector diagram showing the full 90 minutes of testing
cycle
During the next hour and a half the graph shows a steady number of
surviving generations. Only after the application had been running for an
hour and ten minutes was a sharp rise noted. The rise soon stabilized
and was not a sign of a memory leak. It was caused by the heavy load
created by the change in testing pattern to opening Web pages
consecutively at short intervals.
6.2.2.2 Total heap size / Used heap size
The heap is an area of memory used for storing data, for example objects,
at runtime.
Fig. 6.3: Total/Unused heap size chart
The graph shows two things. The heap has enough spare margin for the
garbage collector to work effectively, and the heap size is
consistent, staying at a fairly even level between 50 and 60 megabytes.
6.2.2.3 Threads / Loaded classes
Fig. 6.4: Number of Threads and loaded classes chart
The graph shows the number of active threads and loaded classes. The
data is consistent and does not show anything out of the ordinary.
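The quantities charted in this and the previous subsection (heap size, thread count, loaded classes, garbage collector time) can also be sampled from inside a running JVM through the standard java.lang.management API. The sketch below is only an illustration of that API and is not how the measurements in this chapter were taken; the class name RuntimeSnapshot is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical one-shot sample of basic JVM runtime figures. */
final class RuntimeSnapshot {
    final long usedHeapBytes;
    final long committedHeapBytes;
    final int liveThreads;
    final int loadedClasses;
    final long gcTimeMillis;

    private RuntimeSnapshot(long usedHeapBytes, long committedHeapBytes,
                            int liveThreads, int loadedClasses, long gcTimeMillis) {
        this.usedHeapBytes = usedHeapBytes;
        this.committedHeapBytes = committedHeapBytes;
        this.liveThreads = liveThreads;
        this.loadedClasses = loadedClasses;
        this.gcTimeMillis = gcTimeMillis;
    }

    /** Samples the current JVM state. */
    static RuntimeSnapshot take() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long gcTime = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcTime += t;
            }
        }
        return new RuntimeSnapshot(
                heap.getUsed(),
                heap.getCommitted(),
                ManagementFactory.getThreadMXBean().getThreadCount(),
                ManagementFactory.getClassLoadingMXBean().getLoadedClassCount(),
                gcTime);
    }
}
```

Taking a snapshot at regular intervals and logging the fields would give a rough, low-overhead counterpart to the profiler charts.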
6.2.2.4 CPU performance analysis
This graph shows how much relative CPU time each method in the
application took during the one hour and thirty minutes of testing.
Fig. A.1: Details of the two speech synthesizer threads that used the most CPU time during testing and the only module thread that used significant CPU time, the communication method sendAndReciveData
Fig. 6.5: Top 35 packages that have used the most CPU time
The results show that about 80% of the relative CPU time was taken by
FreeTTS and the Java Sound API, while only 5% was taken by network
communication. This shows that the application itself introduces very
little overhead and, given that no memory leaks were observed, suggests
that the architecture is efficient.
7 Conclusions
The project was designed to provide Web access for people with sight
impairment in limited environments such as public places. Even with
numerous technical limitations and a few development problems, the
project can be considered successful. It has fulfilled all the requirements
set for it, with one minor exception: to run the application, a sighted
person has to type the Web address of the application into a Web browser.
This is simple, takes seconds and can be done by anybody nearby, which is
not a problem in public places. I am working on a solution, but because
public computers have limited access, combined with the already limited
access of Java Web Start to local resources, it is a very difficult task
and not high on my priority list.
While the project was successful, there are a few things that can be
improved. I want to continue the project as my master's thesis, or at least
release it under an open source license, to improve the application so that
it can one day become a real solution for many people with sight
impairment.
The voice synthesizer supports only English and is of average quality. In
addition, it does not support JSML, which limits its use. Without JSML
support it is not possible to control the way the synthesizer speaks or
even to introduce small pauses, and Thread operations on it are unsafe and
unpredictable. A solution would be to use the Ivona synthesizer. It
provides one of the best commercial voices on the market, but the SDK alone
costs around 200 euros or more and would require support from the
University or from an organization helping people with sight impairment.
General Web parsing could be improved too. Web pages are read from top to
bottom, which is not the way a sighted person would read a Web page.
Another problem is that much of the content of the main page, for example
menus and other elements, is replicated on the sub pages, which degrades
the experience and wastes the time of people using the application. A
solution to these problems would be a template creator for Web pages that
adapts them further to the needs of the application. The editor would allow
volunteers or students to write templates for the most popular pages. I
started building it on the Netbeans API with the Visual Library and even
finished part of it, but development was put on hold after I realized that
FreeTTS does not support JSML and lacks any control mechanism for its
output. I would definitely like to restart development using Ivona.
8 Summary (translated from Polish)
Human beings have five senses; according to Aristotle's psychology these
are sight, hearing, smell, taste and touch [1]. Interaction with computers
relies mainly on sight and hearing. There are projects that introduce scent
generation for computers, but they have not yet matured enough for
commercial implementation.
The main burden of communication between a human and a computer rests on
the sense of sight, while hearing is mostly used only to augment that
communication. In most cases it would be impossible even to start the
operating system without sight. This is why the situation of people with
sight impairment is so difficult when it comes to interacting with
computers. It often requires very expensive software packages, whose prices
range from 500 to 1300 dollars for high-end packages [2]. Such software
combines a screen reader with an output device. The screen reader performs
the interpretation and analysis, tasks that are normally carried out by the
computer monitor and the human brain. The output device is a speech
synthesizer or a Braille output device.
I decided to address the problem of the general availability of computers
adapted for blind people, with a view to Internet access in public places
such as schools, public administration offices, airports and libraries.
There are many serious obstacles to adapting computers in public places to
the needs of blind people. The first is the high cost of purchasing the
software, which in places such as schools, universities or libraries with a
hundred computers can exceed even the cost of the IT infrastructure itself,
not to mention that many schools cannot afford it if they are not even able
to buy all the computers they need, at least in countries such as Poland.
The next issue is that the software has to be installed, configured and
maintained, all of which raises costs that are already high. These are the
issues I wanted to address in this project.
The idea of this project was to deliver a Web application that interprets
any Web page regardless of its format and conveys its content as synthetic
speech: an application that needs no installation or configuration, is easy
for blind people to launch and use, and can be run on any computer
connected to the Internet regardless of operating system or architecture.
9 Bibliography
1. Kaufmann Kohler, Isaac Broyde. Senses, The Five. Jewish
Encyclopedia. [Online]
http://www.jewishencylopedia.com/view.jsp?artid=479&letter=S.
2. Screen Readers. Enable Mart. [Online]
http://www.enablemart.com/Catalog/Screen-Readers.
3. George H. Shames, Elisabeth H. Wiig. Human Communication
Disorders. s.l. : Bell & Howel Company, 1982. 0-675-09837-8.
4. Christina L. Bennett, Alan W Black. The Blizzard Challenge 2006.
festvox. [Online]
http://festvox.org/blizzard/bc2006/eval_blizzard2006.pdf.
5. Robert A. J. Clark, Monika Podsiadło, Mark Fraser, Catherine
Mayo, Simon King. Statistical analysis of the Blizzard Challenge 2007
listening test results. festvox. [Online]
http://www.festvox.org/blizzard/bc2007/blizzard_2007/full_papers/blz3_003.pdf.
6. Speaking Clock. Telephones Uk. [Online]
http://www.telephonesuk.co.uk/speaking_clock.htm.
7. Lemmetty, Sami. Review of Speech Synthesis Technology. Helsinki
University of Technology. [Online]
http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html.
8. WebMediaBrands Inc. Java. Webopedia. [Online]
http://www.webopedia.com/TERM/J/Java.html.
9. Sun Microsystems, Inc. Java Platform, Standard Edition (Java SE) -
Java Web Start Overview. Developer Resources for Java Technology.
[Online]
http://java.sun.com/javase/technologies/desktop/javawebstart/overview.html.
10. Sun Microsystems, Inc. Netbeans platform. Netbeans. [Online]
http://bits.netbeans.org/dev/javadoc/index.html.
11. Sun Microsystems, Inc. Java™ SE 6 Release Notes - Supported
System Configurations. Developer Resources for Java Technology.
[Online] http://java.sun.com/javase/6/webnotes/install/system-configurations.html.
12. Hunter, Jason. Standalone Servlet Engines. Servlets. [Online]
http://www.servlets.com/engines/.
13. Science Blog. Earlier Human Speech? Science Blog. [Online]
http://www.scienceblog.com/community/older/1998/B/199801121.html.
14. Vorländer, Michael. Auralization. Fundamentals of Acoustics,
Modelling, Simulation, Algorithms and Acoustic Virtual Reality. s.l. :
Springer, 2008.
15. Michael Girdley, Kathryn A. Jone. Web Programming with Java
16. FreeTTS 1.2 - A speech synthesizer written entirely in the Java™
programming language. SourceForge. [Online]
http://freetts.sourceforge.net/docs/index.php#what_is_freetts.
17. Sun Microsystems. Netbeans platform. Netbeans. [Online]
http://platform.netbeans.org/description.html.
A Installation and use
A.1 Installation
The client application does not require installation. Opening the
application requires a person without sight impairment, but it takes only
two easy steps that anyone can do in roughly 10-20 seconds. The following
steps have to be taken:
Open a Web browser and type the application's WWW address in the
address field, for example www.webbrowser.org/webbrowser.jnlp
Wait for the application to finish loading; the first time, it can take
up to 2 minutes on a 1 Mbit connection
Accept the security pop-up; you can disable it for every subsequent use by
ticking “always trust content from this publisher”
A.2 Use
After the application is opened, people with sight impairment can take
over. The application is very easy to use, and users are guided by a voice
navigation and feedback system.
The mouse wheel is used to increase or decrease the sound volume.
The Tab key is used to switch between:
Web address field - used for entering the address of a Web
page. Every letter typed on the keyboard is echoed back
through the voice feedback system, and pressing Enter
commits the typed address. The user is informed whether the
address is correct or the WWW server is offline.
Link number field - used for inner Web page navigation. The user
can access any inner part of the Web page by typing its
corresponding number. These numbers are spoken during
the main Web page speech output. The field is automatically
enabled and focused if the Web page was successfully
retrieved.
Pause button – used to pause the Web page speech output.
Activated by pressing Enter
Resume button – used to resume the Web page speech
output. Activated by pressing Enter
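The link number field relies on a mapping from spoken numbers to link targets. The text does not describe how the application builds this mapping, so the sketch below is an assumption: it supposes that the page text resembles the output of lynx -dump, where links are marked as [1], [2] and so on and listed at the end under a "References" heading. The class name LinkIndex is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical lookup table from link number to address, built from dump-style text. */
final class LinkIndex {
    // Matches reference lines such as "   3. http://example.org/page".
    private static final Pattern ENTRY =
            Pattern.compile("^\\s*(\\d{1,9})\\.\\s+(\\S+)\\s*$");

    /** Parses the "References" section of the text into number -> address pairs. */
    static Map<Integer, String> parse(String dump) {
        Map<Integer, String> links = new LinkedHashMap<>();
        boolean inReferences = false;
        for (String line : dump.split("\\r?\\n")) {
            if (line.trim().equals("References")) {
                inReferences = true;
                continue;
            }
            if (!inReferences) {
                continue; // numbered lines in the page body are not links
            }
            Matcher m = ENTRY.matcher(line);
            if (m.matches()) {
                links.put(Integer.valueOf(m.group(1)), m.group(2));
            }
        }
        return links;
    }
}
```

When the user types a number in the link number field, the client could look it up in this map and request the corresponding address; an unknown number simply yields null from Map.get.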
B Client compatibility list
Platform | Operating system version | Desktop managers | Browsers | JRE | JDK
Solaris™ Operating System, 32-bit and 64-bit
Solaris SPARC (32) | Solaris 10 | JDS-2 (Gnome-Metacity), CDE-dtwm | Mozilla 1.4x, 1.7+ | 32-bit Install | 32-bit Install
Solaris SPARC (32) | Solaris 9 | Gnome-Metacity 2.4.34 or later, CDE-dtwm | as above | as above | as above
Solaris SPARC (32) | Solaris 8 | CDE-dtwm, Openwin-olwm | as above | as above | as above
Solaris x86 (32) | Solaris 10 | Gnome-Metacity, CDE | Mozilla 1.4x, 1.7+ | 32-bit Install | 32-bit Install
Solaris x86 (32) | Solaris 9 | Gnome-Metacity, CDE | as above | as above | as above
Solaris x86 (32) | Solaris 8 | CDE, Openwin | as above | as above | as above
Solaris x86 (32) | OpenSolaris | GNOME 2.24.0 | Firefox 3 | as above | as above
Windows 32-bit
Windows Intel IA32 | Windows XP Professional | Windows/Active for Windows | IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3 | 32-bit Install, Disk space | 32-bit Install, Disk space
Windows Intel IA32 | Windows XP Home | as above | as above | as above | as above
Windows Intel IA32 | Windows Server 2003 | as above | as above | as above | as above
Windows Intel IA32 | Windows 2000 Professional | as above | IE 6 SP1+, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3 | as above | as above
Windows Intel IA32 | Windows 2000 Server | as above | as above | as above | as above
Windows Intel IA32 | Windows Vista | as above | IE 7 or IE 8 | as above | as above
Windows Intel IA32 | Windows Server 2008 | as above | as above | as above | as above
Windows 64-bit
Windows x64, 32-bit mode | Windows XP | Windows/Active for Windows | IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3 | 32-bit Install, Disk space | 32-bit Install, Disk space
Windows x64, 32-bit mode | Windows Server 2003 | as above | IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3 | as above | as above
Windows x64, 32-bit mode | Windows Vista | as above | IE 7 or IE 8 | as above | as above
Windows x64, 32-bit mode | Windows Server 2008 | as above | as above | as above | as above
Windows x64, 64-bit mode | Windows XP | Windows/Active for Windows | 64-bit OS, 32-bit browsers: IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3+ | 64-bit Install, 32-bit Install, Disk space | 64-bit Install, 32-bit Install, Disk space
Windows x64, 64-bit mode | Windows Server 2003 | as above | as above | as above | as above
Windows x64, 64-bit mode | Windows Vista | as above | 64-bit mode, 64-bit browsers: IE 7 or IE 8 | as above | as above
Windows x64, 64-bit mode | Windows Server 2008 | as above | as above | as above | as above
Linux 32-bit
Linux IA32 | Red Hat 2.1, Red Hat Enterprise Linux 3.0, 4.0, 5.0 - 5.2 | Gnome 1.4 - sawfish 1.0 or later; Gnome 2.2 - metacity 2.4 or later | Mozilla 1.4.x or 1.7+, Firefox 1.06 - 3 | 32-bit Install | 32-bit Install
Linux IA32 | Suse Enterprise Linux Server 8, Suse Enterprise Linux Server 9, Suse Enterprise Linux Server 10, Suse Enterprise Linux Desktop | Gnome 2.0.5 - Metacity 2.6.2 or later (default: 2.4) | as above | as above | as above
Linux IA32 | Turbo Linux 10 (Chinese and Japanese locales only, no English) | Gnome-sawfish 1.0 or later | as above | as above | as above
Linux 64-bit
Linux x64, 32-bit mode | Suse Enterprise Linux Server 8, Suse Enterprise Linux Server 9, Suse Enterprise Linux Server 10, Suse Enterprise Linux Desktop | Gnome 2.0.5 - Metacity 2.6.2 or later (default: 2.4) | Mozilla 1.4.x or 1.7+, Firefox 1.06 - 3 | 32-bit Install | 32-bit Install
Linux x64, 32-bit mode | Red Hat Enterprise Linux 3.0, 4.0, 5.0 - 5.2 | Gnome 2.0.5 - Metacity 2.6.2 or later (default: 2.4) | as above | as above | as above
Linux x64, 32-bit mode | Turbo Linux 10 (Chinese and Japanese locales only, no English) | Gnome-sawfish 1.0 or later | as above | as above | as above
Linux x64, 64-bit mode | Suse Enterprise Linux Server 8, Suse Enterprise Linux Server 9, Suse Enterprise Linux Server 10, Suse Enterprise Linux Desktop | Gnome 2.0.5 - Metacity 2.6.2 or later (default: 2.4) | 64-bit OS, 32-bit browsers: Mozilla 1.4.x or 1.7+, Firefox 1.06 - 3; 64-bit mode, 64-bit browsers: (none listed) | 64-bit Install, 32-bit Install | 64-bit Install, 32-bit Install
Linux x64, 64-bit mode | Red Hat Enterprise Linux 3.0, 4.0, 5.0 | Gnome 2.2 - metacity 2.4 or later | as above | as above | as above
Fig. A.2: Client application compatibility list [16]
C Server compatibility list
Tomcat server
IBM's WebSphere Application Server
BEA Weblogic Application Server
Caucho's Resin Server
Adobe's JRun Web Server
Orion Application Server
Oracle Application Server
ATG Dynamo Application Server
Pramati J2EE Server
Borland AppServer
Jetty Server
The World Wide Web Consortium's Jigsaw Server
Zeus Web Server
iPlanet (Netscape) Web Server Enterprise Edition
iPlanet (Netscape) Web Server Enterprise Edition for Linux
Netscape Enterprise Server 3.5.1 and 3.6
GemStone/J Application Server
Gefion Software's LiteWebServer
CtO-Jstar
M5 Web Server
Servertec's iServer
Lotus's Domino Go WebServer
Paperclips Java Servlet Server 2.0
jo! Web Server
KonaSoft Enterprise Server
NGASI (Next Generation Application Server)
Avenida Web Server
vqServer
Serfler
WebEasy WEASEL Application Server
Tandem's iTP WebServer
Novocode's NetForge
Enhydra [12]