8/4/2019 Speech Recog Report - For Merge
PART-I
Preliminary Investigation
1.1 Introduction
This report is a system analysis and design project that studies Global
Positioning System (GPS) software receiver technology. In this project we studied
how a GPS receiver works and how it processes the received signal to obtain the
desired time and position. We start with GPS and its various components,
processes, and the receiver tracking system. Such a system makes it possible
to track the location of any object that carries a GPS receiver. The receiver
converts the incoming signal to digital samples, and the process draws on many
models and theories that make GPS successful.
GPS is used in a large number of areas, for example mobile phone
tracking, vehicle tracking systems, information services using automated calls,
defence, and robotics. It supports human-computer interaction and
is also a practical everyday use of satellite communication.
The ultimate goal of the technology is a system that can
determine time and location with 100% accuracy. Even after years of
research in this area, the best GPS software applications still cannot determine
location with 100% accuracy. Some applications achieve over 95% accuracy in
position when environmental factors are constant.
Computer software that tracks the location of real-world objects lets users
work with position information derived from the satellites.
2. Objective
To study the Global Positioning System and the hardware
components and software used for it. In this project our aim is to study:
Working of a mobile phone GPS receiver
Hardware components of GPS
Software used for a GPS receiver
Algorithms used by the software
3. Problem definition
Software GPS receivers can provide full access to the baseband
signal processing inside the receiver channels. Thus, the software receiver
has become a key component when investigating and
developing advanced GPS signal processing techniques.
This presentation considers a pure software GPS receiver developed in the
PLAN group of the University of Calgary. It consists of receiver channels that
decode the signals from the satellites.
The receiver performs the following tasks:
Selecting one or more satellites
Acquiring GPS signals
Measuring and tracking
Recovering navigation data
4. Working of GPS
For those who are unfamiliar with the term, GPS stands for Global
Positioning System, and is a way of locating a receiver in three-dimensional
space anywhere on the Earth, and even in orbit about it.
GPS is arguably one of the most important inventions of our time, and has
so many different applications that many technologies and ways of working
are continually being improved in order to make the most of it.
To understand exactly why it is so useful and important, we should first
look at how GPS works and, more importantly, at the technological
achievements that have driven the development of this fascinating positioning
system.
4.1 Signals
In order for GPS to work, a network of satellites was placed into orbit
around planet Earth, each broadcasting a specific signal, much like a
normal radio signal. This signal can be received by a low-cost, low-technology
aerial, even though the signal is very weak.
Rather than carrying an actual radio or television program, the signals that
are broadcast by the satellites carry data that is passed from the aerial,
decoded, and used by the GPS software.
The information is specific enough that the GPS software can identify the
satellite, its location in space, and calculate the time that the signal took to
travel from the satellite to the GPS receiver.
Using different signals from different satellites, the GPS software is able to
calculate the position of the receiver. The principle is very similar to that
which is used in orienteering: if you can identify three places on your map,
take a bearing to where they are, and draw three lines on the map, then you
will find out where you are on the map.
The lines will intersect, and, depending on the accuracy of the bearings, the
triangle that they form where they intersect will approximate your position,
within a margin of error.
GPS software performs a similar kind of exercise, using the known
positions of the satellites in space, and measuring the time that the signal
has taken to travel from the satellite to Earth.
The result of the trilateration (the term used when distances are used
instead of bearings) of at least three satellites, assuming that the clocks are
all synchronized, enables the software to calculate, within a margin of error,
where the device is located in terms of its latitude (North-South),
longitude (East-West), and distance from the center of the Earth.
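The map analogy can be made concrete with a small two-dimensional sketch of trilateration. It is only an illustration: the landmark coordinates, the function name trilaterate_2d, and the assumption of perfectly measured distances are all made up for this example, and a real receiver works in three dimensions with noisy measurements.

```python
import math

def trilaterate_2d(beacons, distances):
    """Locate a point from its distances to three known 2D positions.

    Subtracting the first circle equation from the other two removes the
    squared unknowns and leaves a 2x2 linear system, solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    r1, r2, r3 = distances
    a, b = 2 * (x2 - x1), 2 * (y2 - y1)
    c = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    d, e = 2 * (x3 - x1), 2 * (y3 - y1)
    f = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a * e - b * d
    if abs(det) < 1e-12:
        raise ValueError("beacons are collinear; position is ambiguous")
    return ((c * e - b * f) / det, (a * f - c * d) / det)

# Hypothetical landmarks on a map (km) and a true position to recover.
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
truth = (3.0, 4.0)
distances = [math.dist(truth, p) for p in beacons]
x, y = trilaterate_2d(beacons, distances)
print(round(x, 6), round(y, 6))
```

With exact distances the three circles meet in one point. With measurement errors they would not, which is why the text speaks of a margin of error.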
4.2 Timing & Correction
In a perfect world, the accuracy should be absolute, but there are many
different factors which prevent this. Principally, it is impossible to ensure
that the clocks are all synchronized.
Since the satellites each contain atomic clocks which are extremely
accurate, and certainly accurate with respect to each other, we can assume
that most of the problem lies with the clock inside the GPS unit itself.
Keeping the cost of the technology down to a minimum is a key part of the
success of any consumer device, and it is simply not possible to fit each
GPS unit with an atomic clock costing tens of thousands of dollars. Luckily,
in creating the system, the designers designed GPS to work whether the
receiver's clock is accurate or not.
There are a few solutions. However, the solution that was chosen uses a
fourth satellite to provide a cross check in the trilateration process.
Trilateration from three signals should pinpoint the location exactly, so if
the receiver clock is wrong, the range from a fourth satellite will not pass
through the location calculated from the first three.
This indicates to the GPS software that there is a discrepancy, and so it
performs an additional calculation to find a value that it can use to adjust all
the signals so that the four ranges intersect at a single point.
Usually, this is as simple as subtracting the same time offset (a second, for
example) from each of the calculated travel times of the signals. Thus, the GPS
software can also update its own internal clock, which means that not only do we
have an accurate positioning device, but also, in effect, an atomic clock in the
palm of our hands.
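The idea of finding one common offset that makes four ranges agree can be sketched as a small numerical solve. This is an illustrative sketch, not the method of any particular receiver: the satellite coordinates, the 1 ms clock error, and the function names solve_linear and solve_position are assumptions chosen for the example. The unknowns are the three position coordinates and the receiver clock bias, expressed in metres.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def solve_position(sats, pseudoranges, iterations=20):
    """Newton iteration for (x, y, z, clock_bias_m) from four pseudoranges."""
    est = [0.0, 0.0, 0.0, 0.0]  # start at the Earth's center, zero bias
    for _ in range(iterations):
        jac, resid = [], []
        for sat, rho in zip(sats, pseudoranges):
            dist = math.dist(sat, est[:3])
            # Predicted pseudorange minus measured pseudorange.
            resid.append(dist + est[3] - rho)
            jac.append([(est[i] - sat[i]) / dist for i in range(3)] + [1.0])
        step = solve_linear(jac, [-r for r in resid])
        est = [e + s for e, s in zip(est, step)]
        if max(abs(s) for s in step) < 1e-6:
            break
    return est

# Hypothetical satellite positions (m, Earth-centered) and a receiver on the
# surface whose clock runs 1 ms fast. All values are made up for illustration.
sats = [
    (26.6e6, 0.0, 0.0),
    (20.0e6, 15.0e6, 8.0e6),
    (20.0e6, -14.0e6, 9.0e6),
    (19.0e6, 2.0e6, -18.0e6),
]
true_pos = (6.371e6, 0.0, 0.0)
true_bias_s = 1e-3
pseudoranges = [math.dist(s, true_pos) + C * true_bias_s for s in sats]
x, y, z, bias_m = solve_position(sats, pseudoranges)
print([round(v, 3) for v in (x, y, z)], bias_m / C)
```

The recovered bias divided by the speed of light is the correction that the receiver would apply to its own clock.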
4.3 Mapping
Knowing where the device is in space is one thing, but it is fairly useless
information without something to compare it with. Thus, the mapping part
of any GPS software is very important; it is how GPS works out possible
routes, and allows the user to plan trips in advance.
In fact, it is often the mapping data which elevates the price of the GPS
solution; it must be accurate and updated reasonably frequently. There are,
however, several kinds of map, and each is intended for different users,
with different needs.
Road users, for example, require that their mapping data contains accurate
information about the road network in the region that they will be traveling
in, but will not require detailed information about the lie of the land; they
do not really worry about the height of hills and so forth.
On the other hand, hiking GPS users might wish to have a detailed map of
the terrain, rivers, hills and so forth, and perhaps tracks and trails, but not
roads. They might also like to adorn their map with specific icons of things
that they find along the way and that they wish to keep a record of, not to
mention waypoints: locations to make for on their general route.
Finally, marine users need very specific information relating to the sea bed,
navigable channels, and other pieces of maritime data that enable them to
navigate safely. Of course, the sea itself is reasonably featureless, but
underneath quite some detail is needed to be sure that the boat will not
become grounded.
Fishermen also use marine GPS to locate themselves and track the
movement of shoals of fish, both in real time and to predict where they will
be the next day. The advent of GPS fixing has also meant that co-operative
fishing has become much easier, where there are several boats all relaying
their locations to each other while they locate the best fishing waters.
Special kinds of marine GPS, known as fishfinders, also combine several
functions in one to help fishermen. A fishfinder comprises GPS and also
sonar, along with advanced tracking functions and storage for various kinds
of fishing and maritime information.
5. Requirements of GPS
5.1 Hardware components
Antenna
RF Board
RF Front End
RF/IF down-conversion board (with FPGA)
DSP Board
DSP
5.2 Software components
Firmware
RF Board FPGA
DSP Board FPGA
SW
Signal Processing SW
Navigation SW
Hardware components:
2: Architecture of Signal Tap
Antenna
The GPS antenna combines a planar antenna and a frequency converter,
which translates the high-frequency phase-modulated spread spectrum
signal of the GPS system to an intermediate frequency. This way a
standard coaxial cable (e.g. RG58) can be used for the connection with the
GPS clock, and a distance of up to 300 meters (with RG58) or even 700
meters (with a low-loss cable type like RG213) between receiver and
antenna is possible without an additional amplifier.
Ambient temperature: -40 ... 65 °C
Warranty: three-year warranty
RoHS status of the product: this product is fully RoHS compliant.
WEEE status of the product: this product is handled as a B2B category product. In order to
secure a WEEE compliant waste disposal it has to be returned to the
manufacturer. Any transportation expenses for returning this product (at
2 RF BOARD
RF board stands for radio-frequency printed circuit board. The
operating frequency of an RF board is normally between 300 MHz and 3 GHz, or
higher, so a standard FR4 board normally cannot meet the requirements. Special
materials are needed to achieve the high frequency, and boards made from them
are called RF boards. An RF board performs well at high frequency
because of its low dielectric tolerance and low material loss.
RF boards are ideal for applications with higher operating frequency
requirements. Right now, we normally use the following material:
The fabrication process is similar to that of FR4, but the copper plating is
more complex: because of the material characteristics, it is much harder to
metallize the through holes, and the other process steps are also more complex
than for FR4, so unique handling methods and experienced workers are needed.
3 RF FRONT:
In a radio receiver circuit, the RF front end is a generic term for all the
circuitry between the antenna and the first intermediate frequency (IF)
stage. It consists of all the components in the receiver that process the
signal at the original incoming radio frequency (RF), before it is converted
to a lower intermediate frequency (IF). In microwave and satellite receivers
it is often called the low-noise block (LNB) or low-noise downconverter
(LND) and is often located at the antenna, so that the signal from the
antenna can be transferred to the rest of the receiver at the more easily
handled intermediate frequency.
For most super-heterodyne architectures, the RF front end consists of:
An impedance matching circuit to match the input impedance of the
receiver with the antenna, so the maximum power is transferred
from the antenna;
A 'gentle' band-pass filter (BPF) to reduce input noise and image
frequency response;
An RF amplifier, often called the low-noise amplifier (LNA). Its
primary responsibility is to increase the sensitivity of the receiver by
amplifying weak signals without contaminating them with noise, so
they are above the noise level in succeeding stages. It must have a
very low noise figure (NF).
The mixer, which mixes the incoming signal with the signal from a
local oscillator (LO) to convert the signal to the intermediate
frequency (IF).
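The role of the mixer and the band-pass filter in the list above can be illustrated with a short calculation. An ideal mixer produces the sum and difference of its two input frequencies, and the difference is taken as the IF. The GPS L1 carrier frequency of 1575.42 MHz is a published constant; the local oscillator value below is a hypothetical choice made only for this example.

```python
GPS_L1_HZ = 1575.42e6  # GPS L1 carrier frequency

def mixer_products(f_rf_hz, f_lo_hz):
    """Return the difference (IF) and sum frequencies of an ideal mixer."""
    return abs(f_rf_hz - f_lo_hz), f_rf_hz + f_lo_hz

def image_frequency(f_rf_hz, f_lo_hz):
    """Frequency on the other side of the LO that maps to the same IF."""
    return 2 * f_lo_hz - f_rf_hz

f_lo = 1571.328e6  # hypothetical LO frequency, for illustration only
f_if, f_sum = mixer_products(GPS_L1_HZ, f_lo)
f_img = image_frequency(GPS_L1_HZ, f_lo)
print(f"IF = {f_if / 1e6:.3f} MHz, image at {f_img / 1e6:.3f} MHz")
# prints "IF = 4.092 MHz, image at 1567.236 MHz"
```

A signal at the image frequency would land on the same IF as the wanted signal, which is why the band-pass filter ahead of the mixer has to reject it.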
RF/IF DOWN CONVERSION:
The LBC-4000 L-Band IF to 70 MHz IF (140 MHz optional) indoor
converter is a 1RU 19-inch chassis with two front-panel-accessible
up converter or down converter modules. It contains two diode OR-ed
internal power supplies, for increased reliability, and
microprocessor-based Monitor & Control (M&C) functions.
The LBC-4000 up converter module translates a 70 MHz IF input
signal (140 MHz optional) up to a user-selected frequency at L-Band
(950 to 2000 MHz). The L-Band output can drive the input of the
Comtech EF Data MBT-4000 block up converter or other RF equipment
with an L-Band input.
The LBC-4000 down converter module translates an L-Band (950 to
2000 MHz) IF input signal down to a user-selected frequency in the
70 MHz (140 MHz optional) IF band. The LBC-4000 can be locked to an
internal reference or an external 5 or 10 MHz reference signal. The
LBC-4000 is an excellent choice for interfacing legacy 70 or 140 MHz
equipment to quad-band or tri-band block converters.
DSP BOARD:
DSP boards or digital signal processor computer boards are central to the
implementation of high-performance industrial systems. They collect and
process digital data from many sources, and distribute the results to other
elements of the system. There are three main sources of data in a real
system: signals (in and out from the DSP processor), messages to
communicate with system controllers, and messages to communicate with
other DSP boards. Important features of DSP boards include a fast
processor and good communication channels as DSP boards need to collect
and distribute data from/to many different sources.
Computer backplane or bus choices for DSP boards include PCI, ISA or
EISA, PCMCIA, PC/104, Mac PCI, Sun SBus, PMC bus, PXI bus,
Multibus, STD bus, VME bus, VXI or MXI bus, and the DT-Connect I and II
interface. PCI is a local bus system designed for high-end computer
systems. ISA is a standard for I/O buses that was set back in 1984, when
IBM was the standard. PCMCIA devices (PC Cards) are credit-card-sized
peripherals predominantly used in laptop computers. PC/104 gets its name
from the desktop personal computers designed by IBM (PCs), and from the
number of pins used to connect the cards together (104). Mac PCI is a local
bus standard developed by the Intel Corporation. Designed by Sun in 1989,
the SBus board was the standard I/O interconnect for Sun computers,
which typically run under the Solaris or SunOS flavor of the UNIX
operating system. The PMC bus is actually a form factor, not a bus: it is
electrically the same as the PCI bus, but the shape of the card and the bus
connectors are different. PXI is a superset of CompactPCI and adds timing
and triggering functions, imposes requirements for documenting
environmental tests, and establishes a standard Windows-based software
framework. The STD bus is often referred to as the "Blue Collar Bus" because
of its rugged design and small size. The STD bus was originally designed
for factory and industrial environments. It uses 16-bit architecture. VME
bus is a 32-bit bus used in industrial, commercial and military
applications. Motorola developed the VME standard, with others, in the
late 1970s. DT-connect I and II is Data Translation's DT-Connect
Interface.
Important processor or DSP performance specifications to consider for DSP
boards include number of processors, clock speed, floating point
performance, integer performance, operations, maximum addressable
memory, and operating temperature. General features and options to
consider when looking for DSP boards include real-time clock, interrupt
controller, memory management unit, dual port memory, and direct
memory access. Communications options include serial I/O ports, parallel
I/O ports, on-board A/D converter, and on-board D/A converter. Some DSP
boards can accept daughter boards and some DSP boards are daughter
boards. An important environmental parameter to consider when searching
for DSP boards is the operating temperature.
DSP:-
Digital signal processing algorithms typically require a large number of
mathematical operations to be performed quickly and repetitively on a set
of data. Signals (perhaps from audio or video sensors) are constantly
converted from analog to digital, manipulated digitally, and then converted
again to analog form, as diagrammed below. Many DSP applications have
constraints on latency ; that is, for the system to work, the DSP operation
must be completed within some fixed time, and deferred (or batch)
processing is not viable.
A simple digital processing system
Most general-purpose microprocessors and operating systems can execute
DSP algorithms successfully, but are not suitable for use in portable devices
such as mobile phones and PDAs because of power supply and space
constraints. A specialized digital signal processor, however, will tend to
provide a lower-cost solution, with better performance, lower latency, and
no requirements for specialized cooling or large batteries.
The architecture of a digital signal processor is optimized specifically for
digital signal processing. Most also support some of the features of an
applications processor or microcontroller, since signal processing is rarely
the only task of a system. Some useful features for optimizing DSP
algorithms are outlined below.
SOFTWARE COMPONENTS:-
FIRMWARE:
Firmware is software that is embedded in hardware. You can update the
firmware in most GPS receivers. Firmware is the software that controls
how hardware works and responds to inputs. It's called firmware instead of
software because users generally aren't supposed to play around with it. But
you're not just any old user, are you? Almost all electronic hardware
contains some form of firmware. A television remote control contains
firmware that controls what signals are sent via IR depending on what
button is pressed. A cell phone contains a lot of firmware controlling cell
access, phone books, security, and much, much more.
A GPS contains a lot of firmware controlling many of the key functions of
the device (as shown in Figure 6-1):
Reception of satellite data
Decoding of positional information
Processing of data
Conversion of data into different formats
Interpretation and display of information
External communication with devices
Storing and managing route/waypoint data
RFPGA:-
The FPGA (Field-Programmable Gate Array)
implementation of an adaptive filter for narrowband
interference excision in Global Positioning Systems is
described. The algorithm implemented is a delayed LMS
(Least Mean Squares) adaptive algorithm improved by
incorporating a leakage factor, rounding and constant
resetting of the filter weights. This was necessary as the
original adaptive algorithm had stability problems: the
filter weights did not remain fixed, and tended to drift
until they overflowed, causing the filter response to
degrade. Each model was first tested in Simulink,
implemented in VHDL (VHSIC Hardware Description
Language) and then downloaded to an FPGA board for
final testing. Experimental measurements of anti-jam
margins were obtained.
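The leakage idea can be illustrated with a minimal software sketch. Note the assumptions: this is a plain leaky LMS linear predictor written in floating point, not the delayed LMS variant and not the fixed-point VHDL design described in the text, and the tap count, step size, leakage value, and test signal are all made up for the example. The filter predicts each sample from earlier samples. A narrowband interferer is predictable and a wideband signal is not, so the prediction error is the cleaned output.

```python
import math
import random

def leaky_lms_excise(samples, taps=8, delay=1, mu=5e-4, leak=1e-2):
    """Remove narrowband interference with a leaky LMS linear predictor."""
    weights = [0.0] * taps
    output = []
    for n, x in enumerate(samples):
        # Delayed input vector: x[n-delay], x[n-delay-1], ...
        past = [samples[n - delay - k] if n - delay - k >= 0 else 0.0
                for k in range(taps)]
        prediction = sum(w * p for w, p in zip(weights, past))
        error = x - prediction
        # Leakage shrinks the weights a little on every step, which keeps
        # them from drifting without bound.
        weights = [(1.0 - mu * leak) * w + mu * error * p
                   for w, p in zip(weights, past)]
        output.append(error)
    return output

def power(seq):
    """Mean squared value of a sequence."""
    return sum(v * v for v in seq) / len(seq)

# Hypothetical test signal: unit-variance wideband noise plus a strong tone.
rng = random.Random(0)
n_samples = 4000
wideband = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
jammer = [10.0 * math.cos(2 * math.pi * 0.1 * n) for n in range(n_samples)]
received = [w + j for w, j in zip(wideband, jammer)]
cleaned = leaky_lms_excise(received)
print(power(received[2000:]), power(cleaned[2000:]))
```

The second printed power should be far smaller than the first, because most of the tone has been removed while the wideband part passes through.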
Single channel adaptive filtering techniques have been
shown to be an effective technique for mitigating
multiple narrowband interferences to GPS systems
(Robert, 1999, Landry et al., 1997). Since they can be
seamlessly inserted between the existing GPS antenna
and receiver,
they offer a cost effective solution that involves minimum
system disruption. However, to become a fully practical
solution the size and power demands of their hardware
implementation should be minimised. FPGAs (Field-
Programmable Gate Arrays) offer the potential for
achieving the goals of small size, weight and power
consumption and in this paper the implementation of an
adaptive filter using an FPGA device is described.
In Section 2 an experimental system, termed mini-
GISMO, is described and an overview of the system
architecture is presented. The use of interpolation and
decimation filters within the FPGA is also described.
The main adaptive algorithm implemented is the delayed
LMS (Least Mean Squares) adaptive algorithm (Haykin,
2002). As discussed in Section 3 this algorithm is well
suited to FPGA implementations. However, particularly
in the presence of strong interferences, the original
adaptive algorithm had stability problems (Sethares et al.,
1986), as on convergence, the filter weights did not
remain fixed, and tended to drift until they overflowed,
causing the filter response to degrade. In Section 4 it is
shown that incorporating a leakage term (Nascimento et
al., 1999) and rounding instead of truncating resulted in
the weights remaining near the optimal values. However,
this solution introduced memory effects, which produced
a second null when the interference frequency was
changed. Resetting the weights every second removed
this problem and appeared to have the least effect on
stability, as a short pulse in the output every second didn't
cause any undesirable results in this algorithm. Also, the
bit allocations were optimised to reduce the quantisation
error. By reducing the quantisation noise power, a smaller
leakage factor is required to stabilise the adaptive
algorithm, resulting in a slower drift of the weight towards
DIGITAL SIGNAL :-
Digital signal processing has traditionally been done using enhanced
microprocessors. While the high volume of generic product provides a low-cost
solution, the performance falls seriously short for many applications. Until recently,
the only alternatives were to develop custom hardware (typically board level or
ASIC designs), buy expensive fixed-function processors (e.g. an FFT chip), or use
an array of microprocessors.
Signal processing:
The antenna preamplifier of a GPS receiver generally converts the incoming signal (see Figure 1 below) to a signal of a lower frequency. This intermediate frequency is obtained by mixing the
incoming signal with a pure sinusoidal signal generated by the local oscillator (the quartz "clock").
This beat frequency is the difference between the original (Doppler-shifted) received
carrier frequency and the local oscillator frequency. The intermediate or beat frequency is then processed by
the signal tracking e
NAVIGATIONAL SIGNAL PROCESSING:
Digital signal processing is the processing of digitised discrete-time
sampled signals. Processing is done by general-purpose computers or by
digital circuits such as ASICs, field-programmable gate arrays or
specialized digital signal processors (DSP chips). Typical arithmetical
operations include fixed-point and floating-point, real-valued and
complex-valued, multiplication and addition. Other typical operations supported by
the hardware are circular buffers and look-up tables. Examples of
algorithms are the Fast Fourier transform (FFT), finite impulse response
(FIR) filter, infinite impulse response (IIR) filter, and adaptive filters such
as the Wiener and Kalman filters.
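Two of the operations named above, the circular buffer and the multiply-accumulate at the heart of an FIR filter, can be shown together in a few lines. This is a teaching sketch rather than DSP-grade code: the function name fir_filter and the two-tap averaging coefficients are chosen only for illustration, and collections.deque with a maxlen stands in for the hardware circular buffer.

```python
from collections import deque

def fir_filter(samples, coefficients):
    """Apply an FIR filter using a fixed-length circular buffer of inputs."""
    history = deque([0.0] * len(coefficients), maxlen=len(coefficients))
    output = []
    for x in samples:
        history.appendleft(x)  # newest sample first; the oldest drops off
        # Multiply-accumulate over the buffer gives one output sample.
        output.append(sum(c * h for c, h in zip(coefficients, history)))
    return output

smoothed = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
print(smoothed)  # [0.5, 1.5, 2.5, 3.5]
```

Each output is the average of the current and previous input, with the buffer starting out filled with zeros.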
Statistical signal processing: analyzing and extracting information from
signals and noise based on their stochastic properties
Audio signal processing: for electrical signals representing sound,
such as speech or music
Speech signal processing: for processing and interpreting spoken
words
Image processing: in digital cameras, computers, and various
imaging systems
Video processing: for interpreting moving pictures
Array processing: for processing signals from arrays of sensors
Time-frequency signal processing: for processing non-stationary
signals
Filtering: used in many fields to process signals
Software based receiver:
Global Navigation Satellite Systems have become a necessary tool for navigation and positioning
in both civilian and military applications. The Global Positioning System (GPS) is a
satellite-based navigation system. It is based on the computation of the range from the receiver to
multiple satellites by multiplying the time delay that a GPS signal needs to travel from the
satellites to the receiver by the velocity of light. GPS has already been used widely in both the civilian
and military communities for positioning, navigation, timing and other position-related
applications. The system has already proved its reliability, availability and good accuracy for
many applications. Because of this, other regions such as Europe are going to launch
a new satellite-based navigation system called Galileo. There is also a proposal to launch the Quasi
Zenith Satellite System for navigation in Japan.
It is necessary to simulate and analyze new signal structures for the development of new
satellite-based navigation systems. In the research community, many researchers come up with
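The range computation described above, time delay multiplied by the velocity of light, is simple enough to check by hand. The short sketch below does so; the travel time of 70 ms is a hypothetical value picked to be of the right order for a GPS satellite, and the function name is made up for this example.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

def range_from_delay(delay_s):
    """Distance travelled by the signal during the measured time delay."""
    return delay_s * SPEED_OF_LIGHT_M_S

delay_s = 0.070  # hypothetical measured travel time in seconds
range_m = range_from_delay(delay_s)
error_per_microsecond_m = range_from_delay(1e-6)
print(f"range = {range_m / 1000:.1f} km")
print(f"1 microsecond of timing error = {error_per_microsecond_m:.1f} m")
```

The second figure shows why timing matters so much: an error of one millionth of a second in the delay corresponds to roughly 300 m of range.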
new ideas and algorithms for improving GPS accuracy by mitigating or minimizing various types
of errors and effects such as multipath. However, it is quite difficult to implement user-developed
algorithms in current hardware-based GPS receivers, because these receivers
contain ASICs that give the user very little flexibility. Thus, it is necessary to have
software-based GPS receivers, at least in the research community, for easy and quick
implementation, simulation and analysis of algorithms, parameters and threshold values. Since
CPU processing power is increasing while its cost is falling, it is now possible to build real-time
software-based GPS receivers, at least for static or low-dynamic environments. As predicted by
Moore's Law, CPU power keeps increasing; we hope that this trend will continue
and that it will therefore become possible to develop real-time, all-environment software-based GPS
receivers. In this paper, we briefly introduce the architecture of a software-based GPS receiver (SGR), its signal processing
technique, and give some examples of simulation using an SGR.
2 SOFTWARE-BASED GPS RECEIVER ARCHITECTURE
The architecture of a conventional GPS receiver is shown in Figure 1. It consists of an RF front-end
and a signal processor, both built on IC chips. The outputs of the signal processor
are either displayed directly on the receiver display unit or fed to a PC for further processing or
integration with other devices. Since the signal processing is all done inside the hardware
chips, users have limited access to change parameters or install new algorithms. Figure 2 shows
the architecture of a software-based GPS receiver (SGR). It consists of an RF front-end device,
which is still a hardware component. The rest of the signal processing is done in a high-level
programming language such as C/C++ or Matlab. Comparing Figure 1 and Figure 2, the only
difference is the replacement of hardware components by software tools for signal
processing. We still need the RF front-end because present CPUs cannot
process the signal directly from the antenna at 1.5 GHz. Figure 3 shows the merits and demerits
of hardware-based and software-based receivers. A hardware-based receiver is the fastest in
signal processing but has the least flexibility, whereas a software-based
receiver has the highest flexibility but is the slowest in processing speed. There are
also products using FPGA-based receivers, which are a compromise between the two.
GPS SIGNAL PROCESSING:
The L1 band GPS signal is transmitted at about 1.5 GHz. Since the receiver cannot process the signal
directly at this frequency, the RF front-end device down-converts it from 1.5 GHz to a much lower
frequency of about 4 MHz. This frequency is called the Intermediate Frequency (IF). During this
conversion process, the signal is also digitized (A/D conversion) at 1-bit, 2-bit or higher resolution and
sampled at some frequency, e.g. 16 MHz. We use the down-converted signal for further
processing. The first task of signal processing is to identify the visible satellites by finding each
satellite's code phase and Doppler frequency. The code phase gives the beginning of the C/A
code.
Since the satellites are moving all the time (and the receiver may also be moving), we
always have some Doppler frequency. The rough estimation of code phase and
Doppler frequency is called acquisition. Basically, for acquisition, we generate the C/A code for the
satellite and modulate it onto the carrier wave. This receiver-generated signal is then correlated
with the incoming signal, and the correlation value is evaluated to decide whether the satellite is
visible. If we decide that the satellite is visible, then its code phase and Doppler frequency
are noted. Once we complete acquisition successfully, we know which satellites are visible at
that time.
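The acquisition search described above can be sketched in a few lines. This is only an illustration under stated assumptions: a random +/-1 sequence stands in for a real C/A code, the incoming signal is simulated rather than recorded, and the names acquire, FS and F_IF are hypothetical. The IF (4 MHz), sampling frequency (16 MHz) and 1000 Hz Doppler step are the example values from the text. The correlation over all code phases is computed with FFTs, which is one common way to do it, not necessarily the way any particular receiver does.

```python
import numpy as np

FS = 16e6            # sampling frequency (example value from the text)
F_IF = 4e6           # intermediate frequency (example value from the text)
CHIP_RATE = 1.023e6  # C/A code chipping rate
CODE_LEN = 1023      # chips in one 1 ms code period

rng = np.random.default_rng(0)
# Assumption: a random +/-1 sequence stands in for a real C/A code.
chips = rng.choice([-1.0, 1.0], size=CODE_LEN)

n = int(FS * 1e-3)                       # samples in 1 ms of data
t = np.arange(n) / FS
code = chips[np.floor(t * CHIP_RATE).astype(int) % CODE_LEN]

def acquire(signal, code, doppler_step=1000.0, doppler_max=5000.0):
    """Search all code phases and Doppler bins; return the strongest cell."""
    code_fft_conj = np.conj(np.fft.fft(code))
    best = (0.0, 0, 0.0)
    for doppler in np.arange(-doppler_max, doppler_max + doppler_step, doppler_step):
        # Remove the carrier (IF + trial Doppler) from the incoming signal.
        baseband = signal * np.exp(-2j * np.pi * (F_IF + doppler) * t)
        # Circular cross-correlation with the local code for every code phase.
        corr = np.abs(np.fft.ifft(np.fft.fft(baseband) * code_fft_conj)) ** 2
        k = int(np.argmax(corr))
        if corr[k] > best[0]:
            best = (float(corr[k]), k, float(doppler))
    return best[1], best[2]

# Simulated incoming signal: delayed code on a Doppler-shifted carrier plus noise.
true_phase, true_doppler = 3000, 2000.0
incoming = (np.roll(code, true_phase)
            * np.cos(2 * np.pi * (F_IF + true_doppler) * t)
            + rng.normal(0.0, 1.0, n))

code_phase, doppler = acquire(incoming, code)
print(code_phase, doppler)
```

The returned code phase is in samples, and the Doppler estimate is only as fine as the search step; tracking then refines both.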
In the next step, we track the visible satellites continuously to fine-tune the code phase and
Doppler frequency. This process is called tracking. The tracking process removes the C/A code
and carrier wave from the GPS signal, so the remaining signal represents navigation
data and some noise. Thus, from the tracking output, we can extract the navigation data parameters
that are necessary to compute the pseudorange from the receiver to the satellite. Please refer to [2] for
details on GPS signal processing. Figure 4 (a) shows raw GPS data collected from the antenna and
down-converted to IF. This data looks just like noise, and no information can be extracted unless we
perform acquisition and tracking on it. This is because the GPS signal level is
below the noise level, i.e. the signal is weaker than the noise. Figure 4 (b) shows the result of
acquisition from the raw data shown in Figure 4 (a). The acquisition output shows the code phase
(beginning point of the C/A code) and Doppler frequency. Figure 5 shows tracking results. The
tracking result extracts navigation data bits as shown in Figure 6, which are simply the
sequence
SGR AS RESEARCH AND SIMULATION TOOL
We mentioned earlier that an SGR is much more flexible than a
conventional receiver. Here we discuss, with some examples, how
this flexibility is used to extract information that is
otherwise not available from a conventional GPS receiver. Figure 7 shows
some of the fundamental parameters of signal processing in an SGR. The IF
frequency and sampling frequency are fixed for a particular front-end device. By changing these two values, we can use
the same software tool with different types of front-end device that
acquire the GPS signal from the
antenna. Below we discuss some of these flexibilities point by
point.
Weak Signal Processing:
The Doppler frequency search step, the acquisition integration time (in code periods), the noise bandwidth
and the tracking integration time (in code periods) depend on the signal
quality. If the signal level is normal, we can use a 1000 Hz Doppler frequency
step and a 1 ms code period integration time for acquisition.
However, if the signal is weak, then we need
to reduce the Doppler frequency search step and increase the
code period integration time in acquisition. For example, if we integrate raw
data for 3 ms for acquisition, then we need to reduce the Doppler frequency
search step to 300 Hz. This increases processing time but helps us
detect weak signals. We also need to increase the integration time in the
tracking loop. This type of signal processing by changing parameter values
is not possible in a conventional GPS receiver.
Figure 7: Basic parameters that can be changed by a user in SGR for various types of signal processing
and simulation.
Figure 8 shows an example of
increasing the integration time from 1 ms to 3 ms. When the integration time is
1 ms, the correlation peak is not clear enough to make a decision about satellite
visibility. But when the integration time is increased to 3 ms, we can see a very
clear correlation peak and we can decide that a particular satellite is
now detected.
Figure 8: (a) Signal acquisition using 1 ms integration time. The
result is not so clear, with multiple peaks. (b) Signal acquisition using 3 ms
integration time with the same data as in (a). Now the correlation peak is quite
clear and a decision can be made regarding the visibility of the satellite.
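The trade-off between integration time and Doppler search step can be made concrete with a small calculation. The sketch below assumes the rule of thumb that the step is roughly the reciprocal of the integration time, which reproduces the 1000 Hz figure for 1 ms and gives about 333 Hz for 3 ms (the 300 Hz quoted above is slightly more conservative). The function name doppler_search_plan and the 10 kHz total search range are hypothetical choices for illustration.

```python
def doppler_search_plan(integration_ms, doppler_range_hz=10000.0):
    """Return (search step in Hz, number of Doppler bins) for one satellite.

    Assumption: step ~ 1 / integration time, a common rule of thumb.
    """
    step_hz = 1000.0 / integration_ms
    bins = round(doppler_range_hz / step_hz) + 1
    return step_hz, bins

for ms in (1, 3):
    step, bins = doppler_search_plan(ms)
    print(f"{ms} ms integration: step {step:.0f} Hz, {bins} Doppler bins")
```

Longer integration therefore costs twice: each correlation uses more samples, and more Doppler bins have to be searched.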
Multipath Mitigation Technique
In spite of continuing improvements in GPS receiver and antenna technology,
multipath has remained a major source of error in GPS positioning. In
order to minimize the error due to multipath, we need to understand
multipath behaviour and the corresponding signal characteristics. To
understand the effect of multipath, we can analyze the signal using
various types of correlators (narrow, wide etc.) by defining the chip delay (listed in
Figure 7) between the early and late chips. We can compute the correlation peak
for every code period. A correlation peak would appear as a perfect triangle had
there been no effect from multipath. Due to multipath, the two sides of the
triangle are neither symmetrical nor straight lines. The shape
and amplitude of the triangle are deformed by the amount of multipath and
by other noise. Thus, by analysing the correlation peak (triangular shape), we
can estimate the amount of multipath and hence develop a technique
to minimize or mitigate it. In this regard, we are
conducting research using left-hand and right-hand circularly polarized
GPS antennas to analyze how the reflected signal (which accounts
for multipath) affects the correlation peak. Refer to [1] for details of this
experiment. Figure 9 shows a correlation peak obtained by
processing the raw GPS signal shown in Figure 4.
Figure 9: Correlation peak
computed from the raw GPS signal for 0.5 chip delay. The peak shape is
not a perfect triangle due to the effect
from multipath.
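The effect of multipath on the correlation triangle, and the role of the early-late chip spacing, can be illustrated with an idealized model. The sketch below assumes a perfect triangular correlation function, a single in-phase reflection with hypothetical amplitude 0.5 and delay 0.3 chips, and a simple early-minus-late discriminator; the function names are illustrative only. It compares the tracking point found with a wide (1.0 chip) and a narrow (0.1 chip) correlator spacing.

```python
import numpy as np

def triangle(tau):
    """Ideal code correlation peak: a unit triangle one chip wide on each side."""
    return np.maximum(0.0, 1.0 - np.abs(tau))

def correlation(tau, mp_amp, mp_delay):
    """Direct signal plus one reflected copy (amplitude mp_amp, delay in chips)."""
    return triangle(tau) + mp_amp * triangle(tau - mp_delay)

def tracking_bias(spacing, mp_amp, mp_delay):
    """Code phase error (chips) at which early and late correlators balance."""
    tau = np.linspace(-0.5, 0.5, 10001)
    early = correlation(tau - spacing / 2, mp_amp, mp_delay)
    late = correlation(tau + spacing / 2, mp_amp, mp_delay)
    return float(tau[np.argmin(np.abs(early - late))])

bias_wide = tracking_bias(1.0, mp_amp=0.5, mp_delay=0.3)
bias_narrow = tracking_bias(0.1, mp_amp=0.5, mp_delay=0.3)
bias_clean = tracking_bias(1.0, mp_amp=0.0, mp_delay=0.3)
print(bias_wide, bias_narrow, bias_clean)
```

In this idealized case the narrow spacing pulls the tracking point much less than the wide spacing, which is why being able to change the chip delay in software is useful for multipath studies.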
Remote Sensing using GPS Signal :
Recently, GPS signals have been used for remote sensing.
GPS signals are transmitted at 1.2 GHz and 1.5 GHz in two different
bands, which is similar to microwave remote sensing. GPS signals are
transmitted with right-hand circular polarization. When this signal is
reflected by an object, the polarization may change from right-hand
to left-hand and vice versa. Thus, by observing the reflected
signal with two different types of antenna, one with right-hand
and one with left-hand polarization, we can infer the type of object that reflects
the GPS signal. Using this technique, soil moisture and wind velocity
have been estimated. Refer to [3] for details on this research. To
conduct this type of analysis, we need a software-based receiver so
that we can process the received signal with different parameter
values using our own algorithms. The reflected signals are much
weaker than the direct signal, and hence a conventional receiver cannot
be used. Also, we need to compute many intermediate values, such as the
shape of the correlation peak and its amplitude, rather than the position of the GPS antenna itself. This is possible only in software-based receivers.
Besides the analysis and simulation listed above, we need a software-based receiver for analyzing noise and interference (jamming), simulating new codes, studying the limitation of navigation data length and many other things. In the current GPS signal, the navigation data length is limited to 20 ms. This imposes a restriction on data integration beyond 20 ms during the tracking process.
However, for tracking a very weak signal, we do need to integrate over a longer data period. Thus we need to see what will happen if we change the navigation data length from 20 ms to something else in our new design. On the other hand, we can also have a data-less
component of the signal in one of the phases of the signal, which is now implemented in the new forthcoming GPS signals. This assists the receiver in processing weak signals and hence makes the receiver capable of indoor positioning. All of this can be simulated if we have software-based receivers. In an SGR, we can generate different types of signals for interference analysis. This helps us understand how different types of signal with different levels of strength affect GPS signal processing. For example, we can
simulate the effect of a TV signal on GPS, or we can analyze the effect of other GNSS signals on GPS or vice versa. Again, these are possible only in software-based receivers.
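The 20 ms restriction mentioned above can be seen with a toy calculation. The sketch below assumes noise-free 1 ms correlator outputs of +1 or -1 and a data bit that flips sign at the 20 ms boundary; the function names are hypothetical. Summing coherently across the bit boundary cancels the signal, while integrating each 20 ms bit separately and combining the magnitudes does not.

```python
import numpy as np

BIT_MS = 20  # navigation data bit length from the text

# Hypothetical 1 ms correlator outputs over two data bits of opposite sign.
prompt = np.concatenate([np.ones(BIT_MS), -np.ones(BIT_MS)])

def coherent(samples):
    """Coherent integration: add the samples, then take the magnitude."""
    return float(abs(samples.sum()))

def per_bit_then_combine(samples, block=BIT_MS):
    """Integrate coherently inside each data bit, then add the magnitudes."""
    blocks = samples.reshape(-1, block)
    return float(np.abs(blocks.sum(axis=1)).sum())

print(coherent(prompt[:BIT_MS]))     # one bit, coherent
print(coherent(prompt))              # two bits, coherent across the flip
print(per_bit_then_combine(prompt))  # two bits, combined after each bit
```

A data-less signal component has no such sign flips, so coherent integration can continue past 20 ms.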
part -11
Flow chart of GPS working:
Models for GPS:
SIGNAL ANALYSIS TOOL:
Some speech recognizers support the ability to dynamically adjust to the voice of a speaker, and often the ability to store adaptation data for that voice for future use. The speaker data may also include lists of words more often spoken by the user.
Speech Recognizer configuration:
It holds some standard settings and functions for the recognizer.
Lexicons: A lexicon holds the pronunciation of the words referenced by the grammar.
Other speech processing capabilities: These include language recognition, speaker identification and verification. These capabilities may be associated with the recognizer.
1.7.3 Speech Recognizer :-
It is software that performs the tasks involved in speech recognition. Speech recognizer software may be available free of charge or may have to be purchased, and it varies from platform to platform.
e.g., For Windows:
Dragon NaturallySpeaking, Microsoft Speech Recognition, Voice Assist for Windows from Creative Labs.
For Linux:
IN CUBE Pure Speech
Myers HMM Software
For integrated circuit and dedicated hardware:
Speech commander Voice control system Recognition
1.8 Applications
Speech recognition is an emerging technology in computer science. It
has some weaknesses, but despite them it is used in many areas to solve
problems. These are listed below:
Playing back simple information: In many call centres, customers
require quick information and do not actually want to speak to
a live operator. Speech recognition is useful for providing such
quick information.
Call steering: By introducing speech recognition, you can allow
callers to choose a self-service route or alternatively say what
they want and be directed to the correct department or
individual.
Defence uses: Speech recognition is also used in defence
applications. It is used to quickly perform some action by
responding to voice rather than to button presses or other
input methods.
Artificial intelligence: Used in many applications of artificial
intelligence, and most useful in robotics for interacting with robots
and machines. In fact, speech recognition is a part of artificial
intelligence.
Hands-free computing: Speech recognition is used for hands-free computing because it can provide a user interface in which
the user interacts with the computer by dictation.
Language learning: A person who wants to learn a new
language can use a speech recognition system.
People with disabilities: People with physical disabilities can
benefit from a speech recognition system. It is especially useful
for computer users who have difficulty using their hands or who are
paralysed.
Court reporting: For replacing the court reporter with a computer.
1.9 Feasibility Study
Economic feasibility:
To design a speech recognition system we require the following things:-
1. Microphone (Rs. 600 - 5000)
2. Sound card (Rs. 1200 - 25000)
3. Computer (min. 400 MHz processor, Rs. 15000 or above)
4. Good programmers.
At the minimum prices this comes to Rs. 16800 plus the programmers' pay, hence it is economically feasible.
Social Feasibility:
The system can be made socially acceptable by restricting its vocabulary
to socially appropriate words.
Technical feasibility:
The microphones available today are sufficient.
The processing speed of today's processors is more than enough.
The sound cards available can perform A/D conversion very
efficiently
PART -II
System Analysis
2.1 Components of speech recognition system:
FIG 2.1 : components of speech recognition system
Speech recognition is done in three steps:
representation,
modelling and
searching
Three models are used to recognize speech, and one of the three is
used to match the correct word. These models are:-
Acoustic Model
Lexical Model
Language Models
2.1.1 Acoustic Model:
In this type of model we have a stored pattern representing each
word. The technique matches these stored patterns against the pattern
obtained after processing the input, and selects the stored pattern that has the
minimum acoustical difference from it. Every processed
pattern has an associated probability of occurring in speech,
and the word with the maximum probability is chosen. This type of
model uses pattern matching for recognizing speech.
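As a rough illustration of the pattern matching idea, the sketch below stores one feature vector per word and picks the word whose stored pattern is closest to the input. Everything in it is an assumption made for the example: the three-element feature vectors, the word list and the use of Euclidean distance as the "acoustical difference". Real recognizers use much richer features and models.

```python
import numpy as np

# Hypothetical stored patterns: one small feature vector per word.
templates = {
    "yes": np.array([1.0, 0.2, 0.1]),
    "no": np.array([0.1, 0.9, 0.3]),
    "stop": np.array([0.2, 0.3, 1.0]),
}

def recognize(features):
    """Return the word whose stored pattern is nearest to the input features."""
    distances = {word: float(np.linalg.norm(features - pattern))
                 for word, pattern in templates.items()}
    return min(distances, key=distances.get), distances

word, distances = recognize(np.array([0.15, 0.8, 0.35]))
print(word, distances)
```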
2.1.2 Lexical Model:-
Lexical means related to words or a dictionary. It is a neural
network based approach to modelling the lexicon of the language with a limited
amount of training data. The training data is necessarily a database of the
language together with the phone set of the language. The neural network learns
how the phones of the language vary with different instances of context.
The trained network is capable of recognizing the pronunciation of a word
given its native phonetic composition.
Example:
Consider the following words:
START S-T-AA-R-TD
STARTING S-T-AA-R-DX-IX-NG
STARTED S-T-AA-R-DX-IX-DD
STARTUP S-T-AA-R-T-AX-PD
START-UP S-T-AA-R-T-AX-PD
FIG 2.2 Lexical Tree Structure of above words
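A lexical tree for the words above can be sketched as a prefix tree keyed on phones, so that words sharing the same initial phones share the same branch. The nested-dictionary representation and the function names build_tree and count_nodes are illustrative choices, not taken from any particular recognizer.

```python
# Pronunciations copied from the example above.
lexicon = {
    "START": "S T AA R TD",
    "STARTING": "S T AA R DX IX NG",
    "STARTED": "S T AA R DX IX DD",
    "STARTUP": "S T AA R T AX PD",
    "START-UP": "S T AA R T AX PD",
}

def build_tree(lexicon):
    """Build a prefix tree in which each node is one phone."""
    root = {"children": {}, "words": []}
    for word, pronunciation in lexicon.items():
        node = root
        for phone in pronunciation.split():
            node = node["children"].setdefault(phone, {"children": {}, "words": []})
        node["words"].append(word)   # the word ends at this node
    return root

def count_nodes(node):
    """Count phone nodes below the given node."""
    return sum(1 + count_nodes(child) for child in node["children"].values())

tree = build_tree(lexicon)
total_phones = sum(len(p.split()) for p in lexicon.values())
print(total_phones, count_nodes(tree))
```

Sharing the common prefix S-T-AA-R means the search evaluates far fewer phone nodes than it would if every word were stored separately.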
2.1.3 Language Model :-
The language model attempts to convey the behaviour of the language. It
aims to predict the occurrence of specific word sequences possible in the
language. From the perspective of the recognition system, the language
model helps narrow down the search space for a valid combination of
words. Most speech recognition systems use stochastic language
models (SLMs). SLMs use the N-gram language model, in which it is assumed that the probability
of occurrence of a word depends only on the past N-1 words.
Language Models help a speech recognizer figure out how likely a word
sequence is, independent of the acoustics. A lot of candidates can be
eliminated and it is possible to give other words higher probabilities. This
lets the recognizer make the right guess when two different sentences sound
the same.
For example:
"It's fun to recognize speech"
"It's fun to wreck a nice beach"
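The way an N-gram model separates two such similar-sounding sentences can be sketched with a bigram model (N = 2). The tiny training corpus below is invented purely for illustration, add-one smoothing is an assumption made to keep unseen word pairs from getting zero probability, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy training corpus.
corpus = [
    "it is fun to recognize speech",
    "it is hard to recognize speech",
    "we recognize speech with a computer",
    "a nice beach is fun",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, len(vocab)

def log_prob(sentence, model):
    """Log probability of a sentence under the bigram model (add-one smoothing)."""
    unigrams, bigrams, vocab_size = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
               for a, b in zip(words[:-1], words[1:]))

model = train_bigrams(corpus)
score_speech = log_prob("it is fun to recognize speech", model)
score_beach = log_prob("it is fun to wreck a nice beach", model)
print(score_speech, score_beach)
```

With this corpus the first sentence scores higher, so a recognizer that hears acoustically similar input would prefer it.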
Another type of language model is Hidden Markov Model.
2.2 Flow chart of the System
Working of the Speech Recognition System
FIG 2.3 Flow Chart of the System
In the matching and comparison step, we may obtain two or more
candidate units (words, phones or utterances), depending upon the approach in use.
These matched units are stored in memory and various models are applied
to select the appropriate unit, which forms the recognized output. Depending
upon the result of the matching and comparison step, the corresponding action
can be performed.
2.3 Data Flow Diagrams For Speech Recognition:
2.3.1 Level 0 DFD:
FIG 2.4 : Level 0 Data Flow diagram for speech Recognition
2.3.2 Level 1 DFD:
FIG 2.5 : Level 1 Data Flow Diagram for Speech Recognition
2.3.3 Level 2 DFD:
FIG 2.6 : Level 2 Data Flow Diagram for Pattern matching
2.4 Training data types:-
The speech grammar can be designed in different ways, which vary with
the size of the grammar and the accuracy with which you want
your speech to be recognized. These models are:-
1. Whole-word models :-
Whole-word features are stored in the grammar, so while extracting
the features of the sound signal, the whole-word features are
calculated and compared. This type of model is suitable for small
vocabulary recognition. With the whole-word model a high accuracy rate
can be attained.
2. Phone models :-
In this model the small set of speech sounds that can be distinguished by the
speakers of a particular language is used for the speech grammar. This is
suitable for large vocabulary recognition. The accuracy rate for this
model is very low.
3. Syllable models :-
In this model units larger than a phone are used for feature extraction.
This model can be used for a large language grammar with a high accuracy
rate.
PART-III
System Design
3.1 Interface Design :
This project report is a case study of an existing speech recognition system. In
this project we have taken as reference Dragon NaturallySpeaking, a
software package for speech recognition on Windows. The latest version of
this software is version 10.0, which has the following interface design:-
Icon Design:
FIG 3.1 : Startup Interface Design:
FIG 3.2 : Training Interface Design:
3.2 Using the interface of Dragon NaturallySpeaking :-
When you start Dragon NaturallySpeaking, you have to perform the following steps:-
FIG 3.3 : Create a user profile for a user. The interface is as shown:
After creating a user, if you have selected training, you have to dictate
a short text into your specified microphone.
This helps the software to recognize the user.
After this, the software prompts the user to check the microphone by
dictating a little text.
After this the software will create some user files and prepare the
software for first use.
3.3 Utilities of Dragon NaturallySpeaking software:
3.3.1 Tasks Performed by Dragon NaturallySpeaking software :-
Dragon NaturallySpeaking software performs the following tasks:-
Speech to text conversion
Built-in commands which perform certain tasks
Dictation
3.3.2 Additional tools in Dragon NaturallySpeaking software:-
- Add a new user
- Add a new command
- Managing users (creating, deleting, other changes)
- Train a user
This software is 97% accurate, which is why it is the most widely used speech
recognition package.
3.4 Technical Features of Dragon NaturallySpeaking:
- Sampling: 512 samples (16 kHz sampling rate)
- 30 ms window for frequency domain analysis.
- It is programmed in the C language and uses the Hidden Markov Model and
Viterbi search.
- It contains the following basic files:
-mdef.c definition of basic phones on the basis of HMM in
the form of a matrix.
-dict.c It is a pronunciation dictionary
-lextree.c Lexical tree Search
-hmm.h contains implementation of HMM using Viterbi
Search.
PART-IV
Appendices
Appendix A
Hidden Markov Model (HMM):-
An HMM is a statistical model in which the system being modelled is
assumed to be a Markov process with unobserved states. Hidden
Markov models are especially known for their application in temporal
pattern recognition such as speech, handwriting and gesture recognition. The input and output of an HMM are:
Input: A sequence of feature vectors.
Output: The words with the highest probability of having been spoken.
An HMM consists of the following four things:-
States (words, phones or syllables)
State transition probabilities
Symbol emission probabilities
Observations (features of the signal)
In an HMM we find the most probable states (words) on the basis of the
observations (audio input).
FIG A.1 Probabilistic parameters of a hidden Markov model (example)
x: states
y: possible observations
a: state transition probabilities
b: output probabilities
The diagram below shows the general architecture of an instantiated HMM. Each oval shape represents a random variable that can adopt any of a
number of values. The random variable $x(t)$ is the hidden state at
time $t$ (with the model from the above diagram, $x(t) \in \{x_1, x_2, x_3\}$). The
random variable $y(t)$ is the observation at time $t$ ($y(t) \in \{y_1, y_2, y_3, y_4\}$). The
arrows in the diagram (often called a trellis diagram) denote conditional
dependencies.
From the diagram, it is clear that the conditional probability distribution of
the hidden variable $x(t)$ at time $t$, given the values of the hidden variable $x$ at
all times, depends only on the value of the hidden variable $x(t-1)$. This is
called the Markov property. Similarly, the value of the observed
variable $y(t)$ depends only on the value of the hidden variable $x(t)$ (both at
time $t$).
FIG A.2
There are three main functions in an HMM:
1. Evaluation :-
Given the observation sequence $O$ and the model $\lambda$, how
do we efficiently compute $P(O \mid \lambda)$, the probability of the observation
sequence given the model?
Enumerate all possible state sequences $S$ of length $T$
Sum up all probabilities of these sequences
Probability of path S (calculate for all paths):
State sequence probability.
2. Decode :-
Find the sequence of hidden states that most probably
generated an observed sequence.
Given the parameters of the model and a particular output
sequence, find the state sequence that is most likely to have
generated that output sequence.
This requires finding a maximum over all possible state
sequences.
3. Learning:-
Adjust the model parameters to maximize the joint
probability.
First make an initial guess of the parameters (which may be
entirely wrong).
Refine it by assessing its worth, attempting to reduce the
errors provoked when it is fitted to the given data.
Feed sample speech data along with the phonemes of the spoken words.
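The evaluation and decoding functions can be sketched on a very small HMM. The two-state, three-symbol model below uses made-up toy probabilities and hypothetical names (pi for the initial state probabilities, A for the transition matrix a, B for the output matrix b); it is not taken from any speech recognizer. Evaluation is done twice, by enumerating every state sequence as described above and by the forward algorithm, which gives the same number with far less work. Decoding uses the Viterbi search. Learning is not shown.

```python
import itertools
import numpy as np

# Toy model (assumed values): 2 hidden states, 3 possible observations.
pi = np.array([0.6, 0.4])                 # initial state probabilities
A = np.array([[0.7, 0.3],                 # a: state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],            # b: output probabilities per state
              [0.1, 0.3, 0.6]])
obs = [0, 1, 2]                           # an observed symbol sequence

def evaluate_brute_force(obs):
    """Sum the probability of the observations over every state sequence."""
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return float(total)

def evaluate_forward(obs):
    """Forward algorithm: same result without enumerating all paths."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

def viterbi(obs):
    """Most likely hidden state sequence and its probability."""
    delta = pi * B[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] * A       # scores[i, j]: best path in i, then i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[:, o]
    path = [int(delta.argmax())]
    for pointers in reversed(back):       # follow the back-pointers
        path.append(int(pointers[path[-1]]))
    return path[::-1], float(delta.max())

p_brute = evaluate_brute_force(obs)
p_forward = evaluate_forward(obs)
best_path, best_prob = viterbi(obs)
print(p_brute, p_forward, best_path, best_prob)
```

Brute-force enumeration grows exponentially with the sequence length, which is why practical systems use the forward and Viterbi recursions instead.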
Appendix B
Digitizing the Analog signal:
The signal must be in digital form so that the computer can process it. An analog signal is converted to a digital signal using the following steps:
The bandlimited signal is first sampled, converting the analog signal
into a discrete-time continuous-amplitude signal.
The amplitude of each sample is quantised into $2^n$ levels, where $n$ is
the number of bits used to represent a sample.
The discrete amplitude levels are represented or encoded into distinct
binary words, each of $n$ bits.
This process is shown in the following figure:
FIG B.1 : Block Diagram for digitizing an analog signal
The process of converting a continuous-time continuous-amplitude signal to a
discrete-time continuous-amplitude signal is called Sampling.
The process of converting a discrete-time continuous-amplitude signal to a discrete-time
discrete-amplitude signal is called Quantization.
Sampling is done by multiplying the input signal by a periodic train of
unit-amplitude pulses as shown:
FIG B.2 : Sampling Analog Signal
This sampling is carried out with a sampling frequency of at least $2F_M$, where $F_M$ is the
maximum frequency component of the input signal. The signal can then be
accurately reconstructed at the receiver end.
Quantisation of the signal is done using a very small step size; the
quantised value moves up one step whenever the signal value increases past the next level, as shown:
FIG B.3 Quantization of Signal
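The three digitizing steps (sampling, quantising into 2^n levels, encoding into n-bit words) can be sketched as follows. The 1 kHz test tone, the sampling frequency of eight times that tone, the 3-bit word length, the uniform quantiser and the function name digitize are all assumptions chosen to keep the example small.

```python
import numpy as np

def digitize(signal, n_bits, v_min=-1.0, v_max=1.0):
    """Quantise samples into 2**n_bits uniform levels and encode them as bits."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / levels
    # Index of the quantisation level each sample falls into.
    idx = np.clip(np.floor((signal - v_min) / step), 0, levels - 1).astype(int)
    quantized = v_min + (idx + 0.5) * step          # mid-point of each level
    words = [format(int(i), f"0{n_bits}b") for i in idx]
    return idx, quantized, words

f_max = 1000.0                 # highest frequency in the test signal (Hz)
fs = 8 * f_max                 # sampling frequency, above the 2 * f_max minimum
t = np.arange(16) / fs         # sampling instants
x = np.sin(2 * np.pi * f_max * t)

idx, xq, words = digitize(x, n_bits=3)
print(words)
print(float(np.max(np.abs(x - xq))))   # worst-case quantisation error
```

Using more bits per sample shrinks the step size and therefore the quantisation error, at the cost of more data per second.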
Appendix C
History
The foundation of speech recognition was the Turing test proposed by
Alan Turing (1950). The Turing test asks whether a computer can
think. It has three participants: one computer and
two humans. The participants are separated from each other by a wall
and talk to each other. One of the human participants is an
interrogator, and each of the remaining two tries to prove that it is the human and the other is
not. This test led many developers to do research on speech
recognition.
AT&T Bell Laboratories developed a primitive device that could recognize
speech in the 1950s.
In the 1960s, researchers turned their focus towards a series of smaller
goals that would aid in developing the larger speech recognition system. As
a first step, developers created a device that would use discrete speech.
In the 1970s, work began on continuous speech recognition, which does not require the
user to pause between words. This technology became functional
during the 1980s and is still being developed and refined today.
Technological advances have made speech recognition software and devices more functional and user friendly. Today speech recognition has
an accuracy of more than 90%. The error rates of various types of recognition
are:
FIG C.1 : Error rates of different types of speech recognition
Conclusion
In this project we discussed the basic concepts that are used in speech
recognition. Speech recognition engines all work in a similar manner. The
following things can be concluded from the study:-
Knowledge of the language and of its linguistics is
required.
Most speech recognition packages use the Hidden Markov Model
in implementing speech recognition.
Recognized speech can be used in many ways, e.g., speech to
text conversion, speech production, language learning, information
extraction, etc.
This project allows us to differentiate between the accuracies that can
be achieved by applying different models.
We can specify the best hardware and software requirements for
speech recognition. We can efficiently use Dragon NaturallySpeaking
software with the following specification:
Intel Pentium IV processor of 1.5 GHz or above speed.
1 GB of RAM
Windows XP or a later version of Windows.
Creative microphones with an ambient noise removal technique.
Intel Sound Card with 16 kHz of Sample rate and signal to
noise ratio of 100 dB.
Bibliography
[1] Speech and Language Processing, 2nd edition, by Jurafsky & Martin
[2] Schaum's Outlines: Discrete Mathematics, 3rd edition, by Seymour
Lipschutz and Marc Lipson
[3] Principles of Digital Communication by Taub and Schilling
[4] http://www.faqs.org/docs/Linux-HOWTO/Speech-Recognition-HOWTO.html
[5] http://en.wikipedia.org/wiki/Speech_recognition
[6] http://cslu.cse.ogi.edu/HLTsurvey/ch1node4.html
[7] www.ee.ic.ac.uk/hp/staff/pnaylor/notes/recog.pdf
Index
A
acoustic model 22
artificial intelligence 17
B
Bibliography 43
C
call steering 17
Comparison 10
court reporting 18
D
defence uses 17
digitizing 40
discourse analysis 5
dragon natural
speaking 32 34
E
economical feasibility 19
F
feature extraction 8
Filtering 10
Framing 8
H
History 43
hidden markov
model 36-39
K
knowledge based
approach 10
L
language learning 18
language model 23
lexical model 22
Lexicon 16
M
matching 10
Memory 14
Microphone 13
Morphology 5
N
neural network
approach 11
P
pattern matching
approach 10
people with
disabilities 18
phone model 30
phonetics 5
Phonology 5
play back of
information 17
Pragmatics 5
pre filtering 7
probabilistic model 11
processor 14
Q
quantization 41
S
sampling 41
semantics 5
social feasibility 19
sound card 13
spectral features 9
speech grammar 15
speech recognizer 16
syllable model 30
Syntax 5
T
technical feasibility 19
temporal features 9
W
whole word model 30
windowing 8
word detection 7