
Spatial approximate string search Doc


1. INTRODUCTION TO PROJECT

Keyword search over large amounts of data is an important operation in a wide range of domains. Felipe et al. recently extended its study to spatial databases, where keyword search has become a fundamental building block for a growing number of real-world applications, and proposed the IR2-Tree. A main limitation of the IR2-Tree is that it supports only exact keyword search. In practice, keyword search that retrieves approximate string matches is often required. Since an exact match is a special case of an approximate string match, keyword search by approximate string matching has a much larger pool of applications. Approximate string search is necessary when users have a fuzzy search condition, when they make a spelling error while submitting the query, or when the strings in the database contain some degree of uncertainty or error. In the context of spatial databases, approximate string search could be combined with any type of spatial query. In this work, we focus on range queries and call such queries Spatial Approximate String (SAS) queries. A common scenario in location-based services, in Euclidean space, is: find all objects within a spatial range r (specified by a rectangular area) whose description is similar to “theatre”. We denote SAS queries in Euclidean space as ESAS queries. Similarly, Figure 2 extends SAS queries to road networks (referred to as RSAS queries): given a query point q and a network distance r on a road network, we want to retrieve all objects within distance r of q whose description is similar to “theatre”, where the distance between two points is the length of the shortest path between them.
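To make the ESAS and RSAS definitions concrete, the following is a minimal brute-force sketch in Java. It is not the IR2-Tree or any other index; it simply scans every object. It assumes that “similar” means a Levenshtein edit distance of at most a threshold tau, and, for RSAS, that objects sit on road-network vertices. The class and method names (SasQueries, esas, rsas) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Brute-force sketch of ESAS and RSAS queries (no index structure).
// ASSUMPTION: "similar" means Levenshtein edit distance <= tau.
final class SasQueries {

    // Hypothetical object type: a point (x, y) with a text description.
    static final class SpatialObject {
        final int id;
        final double x, y;
        final String description;

        SpatialObject(int id, double x, double y, String description) {
            this.id = id;
            this.x = x;
            this.y = y;
            this.description = description;
        }
    }

    // Classic dynamic-programming edit distance, keeping only two rows of the table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(prev[j] + 1, cur[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // ESAS: ids of objects inside the rectangle [minX, maxX] x [minY, maxY]
    // whose description is within edit distance tau of the query string.
    static List<Integer> esas(List<SpatialObject> objects, double minX, double minY,
                              double maxX, double maxY, String query, int tau) {
        List<Integer> result = new ArrayList<>();
        for (SpatialObject o : objects) {
            boolean inRange = o.x >= minX && o.x <= maxX && o.y >= minY && o.y <= maxY;
            if (inRange && editDistance(o.description, query) <= tau) result.add(o.id);
        }
        return result;
    }

    // RSAS, simplified: objects sit on vertices (ASSUMPTION); network distance is the
    // shortest-path length, found with Dijkstra's algorithm pruned at radius r.
    // graph maps vertex -> (neighbour -> edge length).
    static List<Integer> rsas(Map<Integer, Map<Integer, Double>> graph, int q, double r,
                              Map<Integer, String> descriptionAt, String query, int tau) {
        Map<Integer, Double> dist = new HashMap<>();
        PriorityQueue<double[]> pq = new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));
        dist.put(q, 0.0);
        pq.add(new double[] {q, 0.0});
        while (!pq.isEmpty()) {
            double[] top = pq.poll();
            int u = (int) top[0];
            if (top[1] > dist.get(u)) continue; // stale queue entry
            for (Map.Entry<Integer, Double> e : graph.getOrDefault(u, Collections.emptyMap()).entrySet()) {
                double nd = top[1] + e.getValue();
                if (nd <= r && nd < dist.getOrDefault(e.getKey(), Double.POSITIVE_INFINITY)) {
                    dist.put(e.getKey(), nd);
                    pq.add(new double[] {e.getKey(), nd});
                }
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int v : dist.keySet()) { // every vertex in dist is within distance r of q
            String d = descriptionAt.get(v);
            if (d != null && editDistance(d, query) <= tau) result.add(v);
        }
        Collections.sort(result);
        return result;
    }
}
```

For example, with an object described as “theater” at (1, 1) and one described as “theatre” at (9, 9), an ESAS query over the rectangle [0, 5] × [0, 5] for “theatre” with tau = 2 returns only the first object: it lies inside the range, and its edit distance to “theatre” is 2. An index such as the IR2-Tree exists precisely to avoid this full scan.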

2. LITERATURE SURVEY

2.1 Introduction

2.1.1 Theoretical Background

The main aim of the project is to provide privacy for Spatial Approximate String search, using the compressed sensing technique for string searching. Security is provided for social networking as the data is transmitted from the user to the admin.

2.2 Technical Background:

JSP:


JavaServer Pages is a simple yet powerful technology for creating and maintaining dynamic-content web pages. Based on the Java programming language, JavaServer Pages offers proven portability, open standards, and a mature, reusable component model. The JavaServer Pages architecture enables the separation of content generation from content presentation. This separation not only eases maintenance headaches; it also allows web team members to focus on their areas of expertise. Web page designers can concentrate on layout, and web application designers on programming, with minimal concern about impacting each other’s work.

Introduction

JSP technology enables you to mix regular, static HTML with dynamically generated content from servlets. Separating the static HTML from the dynamic content provides a number of benefits over servlets alone.

JSP compared to ASP

JSP and ASP are fairly similar in the functionality that they provide. JSP may have a slightly higher learning curve. Both allow embedded code in an HTML page and session variables. Whereas ASP is mostly found on Microsoft platforms (i.e., NT), JSP can operate on any platform that conforms to the J2EE specification. JSP allows component reuse through JavaBeans and EJBs; ASP provides the use of COM/ActiveX controls.

JSP compared to servlets

A servlet is a Java class that provides a special server-side service. It is hard to write HTML code in servlets, because you need many println statements to generate the HTML.

Description

JSP pages look like HTML, but they get compiled into Java servlets the first time they are invoked. The resulting servlet is a combination of the HTML from the JSP file and the embedded dynamic content specified by the JSP tags. That is not to say that a JSP page must contain HTML. Some contain only Java code; this is particularly useful when the JSP page is responsible for a particular task such as maintaining application flow.

Everything in a JSP page can be broken into two categories:

1. Elements that are processed on the server.

2. Template data, i.e. everything other than elements, which the JSP engine does not interpret and passes through to the output unchanged.

JSP Architecture


JSP pages are built on top of Sun’s servlet technology. A JSP page is essentially an HTML page with special JSP tags embedded in it, and these tags can contain Java code. The JSP file extension is .jsp rather than .htm or .html. The JSP engine parses the .jsp file and creates a Java servlet source file. It then compiles the source file into a class file. This is done only the first time the page is requested, which is why a JSP page is usually slower on its first access. On every later request, the already compiled servlet is executed, so it returns faster.

Steps Required For a JSP Request

1. The user goes to a web site made using JSP and requests a JSP page. The web browser makes the request via the internet.

2. The JSP request gets sent to the web server.

3. The web server recognizes that the file required is special (.jsp) and therefore passes the JSP file to the JSP servlet engine.

4. If the JSP file is being called for the first time, the JSP file is parsed; otherwise, go to step 7.

5. The next step is to generate a special servlet from the JSP file. All the HTML required is converted to println statements.

6. The servlet source code is compiled into a class.

7. The servlet is instantiated, calling the init and service methods.

8. HTML from the servlet output is sent via the internet.

9. HTML results are displayed on the user’s web browser.
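The “translate once, run many times” behaviour of steps 4 to 7 can be modelled in a few lines of plain Java. This is a hypothetical, simplified sketch (the class JspEngineModel and its methods are invented for illustration), not real container code: it caches one compiled “servlet” per page path and counts how often the expensive translation step runs. It ignores recompilation after a .jsp file is edited, which real containers typically handle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of the JSP request steps; not real container code.
final class JspEngineModel {
    // Compiled "servlets", keyed by JSP path; each maps a request parameter to HTML.
    private final Map<String, Function<String, String>> compiled = new HashMap<>();
    private int translations = 0; // how many times steps 5-6 ran

    String handle(String path, String name) {
        Function<String, String> servlet = compiled.get(path);
        if (servlet == null) {                   // step 4: first request for this page?
            servlet = translateAndCompile(path); // steps 5-6, done only once per page
            compiled.put(path, servlet);
        }
        return servlet.apply(name);              // steps 7-8: run the servlet, return its HTML
    }

    // Stand-in for parsing the .jsp file and compiling the generated servlet.
    private Function<String, String> translateAndCompile(String path) {
        translations++;
        return name -> "<html><body>Hello, " + name + " (" + path + ")</body></html>";
    }

    int translations() {
        return translations;
    }
}
```

Requesting the same page twice triggers only one translation; the second request reuses the cached servlet, which is the reason the first access is the slow one.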

Servlets are server-side Java programs that can be deployed on a web server. The Servlet interface provides the basic framework for coding servlets.

JavaServer Pages, SQL, HTML Forms and Databases

This section examines how to communicate with a database from Java. We have already seen how to interface an HTML form with a JSP page. Now we have to see how that JSP page can talk to a database.

The objectives of this section are to understand how to:

1. Administratively register databases.

2. Connect a JSP page to an Access database.

3. Insert records in a database using JSP.

4. Insert data from an HTML form into a database using JSP.

5. Delete records from a database based on criteria from an HTML form.

6. Retrieve data from a database using JSP (result sets).

7. Apply SQL operations such as sort, create table, remove table, delete, and Access-based arithmetic functions.


JSP Declarations

Declarations are used to define page-level variables and methods. They are placed within the <%! and %> tags, and each variable declaration ends with a semicolon.

Example:

<%!
int i = 0;
int j = 0;
int z = 0;
%>
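Because declarations are placed in the body of the servlet class generated from the page, a variable declared with <%! %> behaves like an instance field: it keeps its value across requests and is shared by concurrent requests, whereas a variable declared inside a scriptlet is local to a single request. The Java sketch below models this with a hypothetical class, CounterPageModel; it is not actual container output, which is container-specific.

```java
// Hypothetical, simplified model of where JSP code ends up in the generated servlet.
// Only the field-vs-local distinction is modelled; real generated code differs.
final class CounterPageModel {
    // <%! int hits = 0; %> -> a declaration becomes an instance field,
    // shared by every request that this single servlet instance serves.
    private int hits = 0;

    // The page's scriptlets run inside the per-request service method.
    // synchronized: concurrent requests would otherwise race on the shared field.
    synchronized String service(String user) {
        int local = 0; // <% int local = 0; %> -> a scriptlet variable is a fresh local per request
        hits++;
        local++;
        return user + ": hits=" + hits + ", local=" + local;
    }
}
```

Two successive calls return “Ann: hits=1, local=1” and then “Bob: hits=2, local=1”: the declared counter persists between requests while the scriptlet variable starts again at zero.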

JSP Scriptlets

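Scriptlets consist of valid Java code snippets enclosed within the <% and %> JSP tags.

Example:

To accept the user name and display the name 10 times:

<HTML><BODY>
<%
// Read the user name from the request, e.g. page.jsp?name=Ann
String name = request.getParameter("name");
if (name == null) {
    name = "guest";
}
// Escape HTML special characters before echoing user input back to the page
name = name.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
for (int i = 0; i < 10; i++) {
    out.println(name + "<BR>");
}
%>
</BODY></HTML>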

JSP Expressions

Used to insert values directly into the output.

Example: <%= msg %>

JSP Implicit Objects

Implicit objects are predefined variables that can be used in JSP expressions and scriptlets. Objects in a JSP page can be created:

1. Implicitly, by using directives.

2. Explicitly, by using standard actions.

3. Directly, by declaring objects within scriptlets.

The implicit variables include:

page: represents the current instance of the JSP page.

request: represents the HttpServletRequest object, used to retrieve the request data.

response: represents the HttpServletResponse object, used to write the HTML response output.

JSP Actions

<jsp:getProperty>: retrieves the property of the specified bean and writes it to the output. Attributes used are name and property.

<jsp:setProperty>: sets the property of the specified bean. Attributes used are name, property, value and param.

<jsp:forward>: forwards a request to a different page. The attribute used is page.

<jsp:param>: used as a sub-element of <jsp:include> and <jsp:forward> to pass additional request parameters. Attributes used are name and value.

<jsp:include>: inserts a file into a particular JSP page. Attributes used are page and flush.
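<jsp:getProperty> and <jsp:setProperty> operate on a JavaBean, which the page first creates with the standard <jsp:useBean> action. A minimal sketch of such a bean follows; the class QueryBean, its properties, the request parameter kw and the package name sas are all hypothetical. In a deployed web application the bean class must be public and belong to a named package; both are left out here only so the sketch compiles on its own.

```java
import java.io.Serializable;

// Hypothetical bean for the JSP actions above. A page could use it as:
//   <jsp:useBean id="query" class="sas.QueryBean" />
//   <jsp:setProperty name="query" property="keyword" param="kw" />
//   <jsp:getProperty name="query" property="keyword" />
// When deploying, declare the class public and put it in a named package (e.g. the hypothetical "sas").
class QueryBean implements Serializable {
    private static final long serialVersionUID = 1L;

    private String keyword = "";
    private int maxDistance = 2;

    public QueryBean() { } // beans need a public no-argument constructor

    // The actions find these accessors through the JavaBeans naming rules (getX / setX).
    public String getKeyword() { return keyword; }

    public void setKeyword(String keyword) { this.keyword = keyword; }

    // For primitive properties, jsp:setProperty converts the String parameter (here to int).
    public int getMaxDistance() { return maxDistance; }

    public void setMaxDistance(int maxDistance) { this.maxDistance = maxDistance; }
}
```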

JavaScript

JavaScript is a general-purpose, prototype-based, object-oriented scripting language developed jointly by Sun and Netscape and meant for the WWW. It is designed to be embedded in diverse applications and systems without consuming much memory. JavaScript borrows most of its syntax from Java but also inherits from Awk and Perl, with some indirect influence from Self in its object prototype system.

JavaScript is dynamically typed; that is, programs do not declare variable types, and the type of a variable is unrestricted and can change at runtime. Source code can be generated at runtime and evaluated against an arbitrary scope. Typical implementations compile by translating source into a specified bytecode format, to check syntax and source consistency. Note that the ability to generate and interpret programs at runtime implies the presence of a compiler at runtime.


JavaScript is a high-level scripting language that does not depend on or expose particular machine representations or operating system services. It provides automatic storage management, typically using a garbage collector.

Features

JavaScript is embedded into HTML documents and is executed within them.

JavaScript is browser dependent.

JavaScript is an interpreted language that can be interpreted by the browser at run time.

JavaScript is a loosely typed language.

JavaScript is an object-based language.

JavaScript is an event-driven language and supports event handlers, for example to specify the functionality of a button.

Advantages

JavaScript can be used for client-side applications.

JavaScript provides the means to work with multi-frame windows for presenting web content.

JavaScript provides basic data validation before data is sent to the server, e.g. login and password checking, checking whether the values entered are correct, or checking whether all fields in a form are filled, which reduces network traffic.

It creates interactive forms and client-side lookup tables.

Servlets

Servlets provide a Java(TM)-based solution to the problems currently associated with doing server-side programming, including inextensible scripting solutions, platform-specific APIs, and incomplete interfaces.

Servlets are objects that conform to a specific interface and can be plugged into a Java-based server. Servlets are to the server side what applets are to the client side: object bytecodes that can be dynamically loaded off the net. They differ from applets in that they are faceless objects (without graphics or a GUI component). They serve as platform-independent, dynamically loadable, pluggable helper bytecode objects on the server side that can be used to dynamically extend server-side functionality.

Use Servlets instead of CGI Scripts


Servlets are an effective replacement for CGI scripts. They provide a way to generate dynamic documents that is both easier to write and faster to run. Servlets also address the problem of doing server-side programming with platform-specific APIs: they are developed with the Java Servlet API, a standard Java extension. So use servlets to handle HTTP client requests. For example, servlets can process data posted over HTTPS using an HTML form, including purchase order or credit card data. A servlet like this could be part of an order-entry and processing system, working with product and inventory databases, and perhaps an online payment system.

Architecture of the Servlet Package

The javax.servlet package provides interfaces and classes for writing servlets. The architecture

of the package is described below.

The Servlet Interface

The central abstraction in the Servlet API is the Servlet interface. All servlets implement this interface, either directly or, more commonly, by extending a class that implements it, such as HttpServlet.

The Servlet interface declares, but does not implement, methods that manage the servlet and its communications with clients. Servlet writers provide some or all of these methods when developing a servlet.

Client Interaction

When a servlet accepts a call from a client, it receives two objects:

A ServletRequest, which encapsulates the communication from the client to the server.

A ServletResponse, which encapsulates the communication from the servlet back to the client.

ServletRequest and ServletResponse are interfaces defined by the javax.servlet package.

The ServletRequest Interface

The ServletRequest interface allows the servlet access to:

Information such as the names of the parameters passed in by the client, the protocol (scheme) being used by the client, and the names of the remote host that made the request and the server that received it.

The input stream, ServletInputStream. Servlets use the input stream to get data from clients that use application protocols such as the HTTP POST and PUT methods.

Interfaces that extend the ServletRequest interface allow the servlet to retrieve more protocol-specific data. For example, the HttpServletRequest interface contains methods for accessing HTTP-specific header information.

The ServletResponse Interface

The ServletResponse interface gives the servlet methods for replying to the client. It:

Allows the servlet to set the content length and MIME type of the reply.  

Provides an output stream, ServletOutputStream, and a Writer through which the servlet can

send the reply data.

Interfaces that extend the ServletResponse interface give the servlet more protocol-specific

capabilities. For example, the HttpServletResponse interface contains methods that allow the

servlet to manipulate HTTP-specific header information.

Additional Capabilities of HTTP Servlets

The classes and interfaces described above make up a basic Servlet. HTTP servlets have

some additional objects that provide session-tracking capabilities. The servlet writer can use

these APIs to maintain state between the servlet and the client that persists across multiple

connections during some time period. HTTP servlets also have objects that provide cookies.

The servlet writer uses the cookie API to save data with the client and to retrieve this data.

The classes mentioned in the Architecture of the Servlet Package section are used in the SimpleServlet example as follows:

SimpleServlet extends the HttpServlet class, which implements the Servlet interface.

SimpleServlet overrides the doGet method in the HttpServlet class. The doGet method is called when a client makes a GET request (the default HTTP request method), and results in a simple HTML page being returned to the client.

Within the doGet method:

The user's request is represented by an HttpServletRequest object.

The response to the user is represented by an HttpServletResponse object.

Because text data is returned to the client, the reply is sent using the Writer object obtained from the HttpServletResponse object.

Servlet Lifecycle

Each servlet has the same life cycle:

A server loads and initializes the servlet

The servlet handles zero or more client requests  

The server removes the servlet

Initializing a Servlet

When a server loads a servlet, the server runs the servlet's init method. Initialization completes before

client requests are handled and before the servlet is destroyed.

Even though most servlets are run in multi-threaded servers, servlets have no concurrency issues

during servlet initialization.

The server calls the init method once, when the server loads the servlet, and will not call the init

method again unless the server is reloading the servlet. The server cannot reload a servlet until after

the server has destroyed the servlet by calling the destroy method.

The init Method

The init method provided by the HttpServlet class initializes the servlet and logs the initialization. To do initialization specific to your servlet, override the init() method following these rules:

If an initialization error occurs that renders the servlet incapable of handling client requests, throw an UnavailableException. An example of this type of error is the inability to establish a required network connection.

Do not call the System.exit method.

Initialization Parameters

The second version of the init method calls the getInitParameter method. This method takes the parameter name as an argument and returns a String representation of the parameter's value.

The specification of initialization parameters is server-specific. In the Java Web Server, the parameters are specified when a servlet is added and then configured in the Administration Tool. For an explanation of the Administration screen where this setup is performed, see the Administration Tool: Adding Servlets online help document.

If, for some reason, you need to get the parameter names, use the getInitParameterNames method.

Destroying a Servlet

Servlets run until the server destroys them, for example at the request of a system administrator. When a server destroys a servlet, the server runs the servlet's destroy method. The method is run once; the server will not run that servlet again until after the server reloads and reinitializes the servlet.

When the destroy method runs, another thread might be running a service request. The

Handling Service Threads at Servlet Termination section shows you how to provide a

clean shutdown when there could be long-running threads still running service

requests.

Using the Destroy Method

The destroy method provided by the HttpServlet class destroys the servlet and logs the

destruction. To destroy any resources specific to your servlet, override the destroy

method. The destroy method should undo any initialization work and synchronize

persistent state with the current in-memory state.

The following example shows the destroy method that accompanies the init method

shown previously:

import javax.servlet.GenericServlet;

public class BookDBServlet extends GenericServlet {

    private BookstoreDB books;

    // ... the init method

    @Override
    public void destroy() {
        // Allow the database to be garbage collected
        books = null;
    }
}

A server calls the destroy method after all service calls have been completed, or a server-

specific number of seconds have passed, whichever comes first. If your servlet handles any

long-running operations, service methods might still be running when the server calls the

destroy method. You are responsible for making sure those threads complete. The destroy

method shown above expects all client interactions to be completed when the destroy method

is called, because the servlet has no long-running operations.
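The text leaves it to the servlet writer to make sure long-running service threads complete. One common way to do this is to count the service calls that are currently running, have destroy wait until that count drops to zero, and let long-running calls poll a shutdown flag so they can stop early. The Java sketch below shows that pattern; the class ShutdownAwareService and its methods are hypothetical, and the Servlet API is left out so the example runs standalone. In a real servlet, service() would wrap the actual request handling and destroy() would be the servlet's destroy method.

```java
// Hypothetical sketch of the "count active calls, then wait in destroy" pattern,
// written without the Servlet API so it runs standalone.
final class ShutdownAwareService {
    private int activeCalls = 0; // service calls currently running
    private boolean shuttingDown = false;

    private synchronized void enterService() { activeCalls++; }

    private synchronized void leaveService() {
        activeCalls--;
        notifyAll(); // wake destroy() if it is waiting for this call to finish
    }

    synchronized boolean isShuttingDown() { return shuttingDown; }

    synchronized int activeCalls() { return activeCalls; }

    // A long-running request: counts itself in and out, and checks the shutdown
    // flag between units of work so it can stop early.
    void service(int workUnits) {
        enterService();
        try {
            for (int i = 0; i < workUnits && !isShuttingDown(); i++) {
                try {
                    Thread.sleep(10); // stand-in for one unit of real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return; // the finally block still decrements the counter
                }
            }
        } finally {
            leaveService();
        }
    }

    // destroy(): ask running calls to stop, then wait until none are left.
    synchronized void destroy() {
        shuttingDown = true;
        while (activeCalls > 0) {
            try {
                wait(); // releases the lock so leaveService() can run
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Because the server may call destroy after only a server-specific number of seconds, long-running calls should check isShuttingDown() often enough to finish within that limit.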

Servlet-client Interaction

Handling HTTP Clients

An HTTP Servlet handles client requests through its service method. The service

method supports standard HTTP client requests by dispatching each request to a method

designed to handle that request. For example, the service method calls the doGet method shown

earlier in the simple example servlet.

Requests and Responses

Methods in the HttpServlet class that handle client requests take two arguments:

1. An HttpServletRequest object, which encapsulates the data from the client.

2. An HttpServletResponse object, which encapsulates the response to the client.

HttpServletRequest Objects


An HttpServletRequest object provides access to HTTP header data, such as any cookies

found in the request and the HTTP method with which the request was made. The

HttpServletRequest object also allows you to obtain the arguments that the client sent as part of

the request.

HttpServletResponse Objects

An HttpServletResponse object provides two ways of returning data to the user:

The getWriter method returns a Writer, and the getOutputStream method returns a ServletOutputStream. Use the getWriter method to return text data to the user, and the getOutputStream method for binary data.

HTML

HTML (HyperText Markup Language) is a language used to create hypertext documents that have hyperlinks embedded in them. It consists of tags embedded in the text of a document. With HTML we can build web pages or web documents. It is basically a formatting language and not a programming language. The browser reading the document interprets the markup tags to help format the document for subsequent display to a reader. HTML is a language for describing structured documents. HTML is platform independent. WWW (World Wide Web) pages are written using HTML. HTML tags control, in part, the representation of a WWW page when it is viewed with a web browser. The browser interprets the HTML tags in the web document and displays it. Different browsers may show the same data differently.

Example code:

<HTML>
<HEAD>
<TITLE>this is an html title</TITLE>
</HEAD>
<BODY>
………
</BODY>
</HTML>


Advantages

An HTML document is small and hence easy to send over the net. It is small because it does not include formatting information.

HTML documents are cross-platform compatible and device independent. We only need an HTML-readable browser to view them. Specific font names, locations, etc. are not required.

Apache Tomcat

Introduction to Tomcat

Tomcat is the official Reference Implementation for the Java Servlet 2.2 and JavaServer Pages 1.1 technologies, which complement each other. Tomcat is a servlet container with a JSP environment. A servlet container is a runtime shell that manages and invokes servlets on behalf of users. Developed under the Apache license in an open and participatory environment, Tomcat is intended to be a collaboration of the best-of-breed developers from around the world.

Tomcat and Servlets

As mentioned above, Tomcat is the reference implementation for the Java Servlet 2.2 technology and therefore conforms to the Servlet API Specification, Version 2.2, which describes the programming environment that all servlet containers must provide.

That specification may be used to understand the web application directory structure and deployment file, the methods of mapping request URLs to servlets, container-managed security, and the syntax of web.xml, the Web Application Deployment Descriptor.

Installation

Tomcat will operate under any Java Development Kit (JDK) environment that provides a

JDK 1.1 or JDK 1.2 compatible platform. The JDK is required so that your servlets, other

classes, and JSP pages can be compiled.


Once you have downloaded the required file, unzip it to a directory of your choice. (In Microsoft Lab 5 at UWI the file is extracted directly to the C drive, C:\.) A sub-directory named jakarta-tomcat is created, and this is the root directory of the Tomcat hierarchy.

Tomcat 5.x

Implements the Servlet 2.4 and JSP 2.0 specifications. Reduced garbage collection, improved performance and scalability. Native Windows and UNIX wrappers for platform integration. Faster JSP parsing.


Database Tables

It is the age of information technology, and data and databases play a key role in this age. A layperson today needs no introduction to databases: whether it is a personal telephone directory or a bank passbook, databases are omnipresent. In this section we learn about database management systems in general, with an emphasis on the relational model of the DBMS.

The conventional data processing approach is to develop a program (or many programs) for each application. This results in one or more data files for each application. Some of the data may be common between files. However, one application may require the file to be organized on a particular field, while another application may require the file to be organized on another field. A major drawback of the conventional method is that the storage access methods are built into the program. Therefore, though the same data may be required by two applications, the data will have to be stored in two different places, because each application depends on the way the data is stored.

There are various drawbacks of the conventional data file processing environment. Some of them are listed below:

Data Redundancy

Some data elements like name, address, identification code, are used in various applications. Since

data is required by multiple applications, it is stored in multiple data files. In most cases, there is a

repetition of data. This is referred to as data redundancy, and leads to various other problems.

Data Integrity Problems

Data redundancy is one reason for the problem of data integrity. Since the same data is stored in

different places, it is inevitable that some inconsistency will creep in.

Data Availability Constraints

When data is scattered in different files, the availability of information from a combination of

files is constrained to some extent.

Database Management System

A database management system (DBMS) consists of a collection of interrelated data and a set of

programs to access the data. The collection of data is usually referred to as the database. A Database

system is designed to maintain large volumes of data. Management of data involves:

Defining the structures for the storage of data

Providing the mechanisms for the manipulation of the data

Providing for the security of the data against unauthorized access


Users of the DBMS

Broadly, there are three types of DBMS users:

The application programmer

The end user

The database administrator (DBA)

The application programmer writes application programs that use the database. These programs

operate on the data in the database. These operations include retrieving information, inserting data,

deleting or changing data.

The end user interacts with the system either by invoking an application program or by writing their

queries in a database query language. The database query language allows the end user to perform all

the basic operations (retrieval, deletion, insertion and updating) on the data.

The DBA has to coordinate the functions of collecting information about the data to be stored,

designing and maintaining the database and its security. The database must be designed and

maintained to provide the right information at the right time to authorized people. These

responsibilities belong to the DBA and his staff.

Advantages Of a DBMS

The major advantage that the database approach has over the conventional approach is that a database

system provides centralized control of data. Most benefits accrue from this notion of centralized

control.

Redundancy Can Be Controlled

Unlike the conventional approach, each application does not have to maintain its own data files.

Centralized control of data by the DBA avoids unnecessary duplication of data and effectively

reduces the total amount of data storage required. It also eliminates the extra processing necessary to

trace the required data in a large mass of data present. Any redundancies that exist in the DBMS are

controlled and the system ensures that these multiple copies are consistent.

Inconsistency Can Be Avoided

Since redundancy is reduced, inconsistency can also be avoided to some extent. The DBMS guarantees that the database is never inconsistent by ensuring that a change made to any entry automatically applies to the other entries as well. This process is known as propagating updates.


The data can be shared

A database allows the data under its control to be shared by any number of application programs or users. Sharing of data does not merely imply that existing applications can share the data in the database; it also means that new applications can be developed to operate using the same database.

Standards Can Be Enforced

Since there is centralized control of data, the database administrator can ensure that standards are maintained in

the representation of the stored data formats. This is particularly useful for data interchange, or migration of data

between two systems.

Security Restrictions Can Be Applied

The DBMS guarantees that only authorized persons can access the database. The DBA defines the

security checks to be carried out. Different checks can be applied to different operations on the same

data. For instance, a person may have the access rights to query on a file, but may not have the right to

delete or update that file. The DBMS allows such security checks to be established for each piece of

data in the database.

Integrity Can Be Maintained

Centralized control can also ensure that adequate checks are incorporated in the DBMS to provide data integrity. Data integrity means that the data contained in the database is both accurate and consistent. Inconsistency between two entries can lead to integrity problems. However, even if there is no redundancy, the data can still be inconsistent. For example, a student may have enrolled in 10 courses in a semester when the maximum number of courses one can enroll in is 7. Another example could be that of a student enrolling in a course that is not being offered that semester. Such problems can be avoided in a DBMS by establishing certain integrity checks to be carried out whenever any update operation is done. These checks can be specified at the database level, in addition to the application programs.

Data Independence

In non-database systems, the requirements of the application dictate the way in which the data is stored and the access techniques used. Besides, knowledge of the data organization and of the access techniques is built into the logic and code of the application. These systems are data dependent. Consider this example: suppose the university has an application that processes the student file. For performance reasons, the file is indexed on the roll number. The application would be aware of the existing index, and the internal structure of the application would be built around this knowledge. Now suppose that, for some reason, the file is to be indexed on the registration date. In this case it is impossible to change the structure of the stored data without affecting the application too. Such an application is a data dependent one.

Features Of RDBMS

The ability to create multiple relations and enter data into them

An interactive query language

Retrieval of information stored in more than one table

Database Design

Having identified all the data in the system, it is necessary to arrive at the logical database design.

Database design involves designing the conceptual model of the database. This model is independent

of the physical representation of data. Before actually implementing the database, the conceptual

model is designed using various techniques.

The requirements of all users are taken into account to decide the actual data that needs to be stored in the system. Once the conceptual model is designed, it can be mapped to the DBMS/RDBMS that is actually being used. Two widely used approaches are Entity-Relationship (E/R) modeling and normalization.

The E/R model is an object-based model and is based on a perception of the real world as a collection of objects, or entities, and the relationships among them. E/R modeling is generally used as a top-down approach for new systems.

Entity

An entity is an object, place, or event about which data can be stored in the system. A physical object can be an employee, a customer, or machinery. An abstract object can be a department or accounting. An event can be a registration or an application form. A place can be a city or a state. Before it is implemented as a table, it is known as an entity. An entity is denoted by a rectangle.

Attribute

An attribute describes an entity. For example, an employee entity can have the attributes empno, ename, sal, hiredate, etc. An attribute is represented by a circle.


Relation

A "relation" is a two-dimensional table. It consists of rows, which represent records, and columns, which show the attributes of the entity. A relation is also called a file; it consists of a number of records, which are also called tuples. A record consists of a number of attributes, which are also known as fields; the set of values an attribute may take is its domain.

In order for a relational structure to be useful and manageable, the relation tables must first be “normalized”.

Some of the properties of a relation are:

No duplication - no two records are identical.

Unique key - each relation has a unique key by which its records can be accessed.

Order - there is no significant order of data in the table.

If we want the names of all employees whose grade is 20, we can scan the employee relation, checking the grade of each record. Here the unique key is the employee number.
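As a hypothetical illustration of these properties, the Java sketch below stores an employee relation in a map keyed by the unique key (employee number). The names `EmployeeRelationSketch` and `Employee`, and the chosen fields, are assumptions for the example. Keying on `empNo` rejects duplicate records, a `HashMap` makes no ordering promise, and finding employees by grade requires a full scan because no index on grade is assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory employee relation keyed by its unique key, empNo. */
public class EmployeeRelationSketch {
    static final class Employee {
        final int empNo;
        final String name;
        final int grade;

        Employee(int empNo, String name, int grade) {
            this.empNo = empNo;
            this.name = name;
            this.grade = grade;
        }
    }

    // HashMap gives no iteration-order guarantee, like the rows of a relation.
    private final Map<Integer, Employee> rows = new HashMap<>();

    /** Rejects a record whose key already exists, so no two records can be identical. */
    public boolean insert(Employee e) {
        return rows.putIfAbsent(e.empNo, e) == null;
    }

    /** Direct access through the unique key. */
    public Employee findByKey(int empNo) {
        return rows.get(empNo);
    }

    /** Full scan of the relation, checking the grade of every record. */
    public List<String> namesWithGrade(int grade) {
        List<String> names = new ArrayList<>();
        for (Employee e : rows.values()) {
            if (e.grade == grade) {
                names.add(e.name);
            }
        }
        return names;
    }
}
```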

Normalization

Normalization is a process of simplifying the relationship between data elements in a record. It is the transformation of complex data stores into a set of smaller, stable data structures. Normalized data structures are simpler, more stable, and easier to maintain.

Purpose for Normalization

To permit simple retrieval of data in response to query and report requests.

To simplify the maintenance of the data through updates, insertions and deletions.

To reduce the need to restructure or reorganize data when new application requirements arise.

Steps of Normalization

It consists of three basic steps:

First Normal Form, which decomposes all data groups into two-dimensional records.

Second Normal Form, which eliminates any relationships in which data elements do not fully depend on the primary key of the record.

Third Normal Form, which eliminates any relationships that contain transitive dependencies.

Fig 3.2 Steps involved in the process of normalization
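The first two steps can be illustrated on a hypothetical student file in which each record holds a repeating group of courses. The Java sketch below is an illustration under assumptions, not part of the original design: the class and field names are invented. Step 1 flattens the repeating group into fixed-length rows keyed by (roll number, course); Step 2 observes that the student name depends only on the roll number, which is only part of that key, and moves it into a separate relation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical walk-through of 1NF and 2NF on a small student file. */
public class NormalizationSketch {
    /** Unnormalized record: the course list is a repeating group. */
    static final class StudentRecord {
        final int rollNo;
        final String name;
        final List<String> courses;

        StudentRecord(int rollNo, String name, List<String> courses) {
            this.rollNo = rollNo;
            this.name = name;
            this.courses = courses;
        }
    }

    /** Step 1 (1NF): one fixed-length row {rollNo, name, course} per course; key = (rollNo, course). */
    static List<String[]> toFirstNormalForm(List<StudentRecord> records) {
        List<String[]> rows = new ArrayList<>();
        for (StudentRecord r : records) {
            for (String course : r.courses) {
                rows.add(new String[] {String.valueOf(r.rollNo), r.name, course});
            }
        }
        return rows;
    }

    /** Step 2 (2NF): name depends only on rollNo, so it moves to STUDENT(rollNo, name). */
    static Map<String, String> studentRelation(List<String[]> firstNf) {
        Map<String, String> students = new LinkedHashMap<>();
        for (String[] row : firstNf) {
            students.put(row[0], row[1]);
        }
        return students;
    }

    /** What remains is ENROLLMENT(rollNo, course), fully dependent on its whole key. */
    static List<String> enrollmentRelation(List<String[]> firstNf) {
        List<String> enrollments = new ArrayList<>();
        for (String[] row : firstNf) {
            enrollments.add(row[0] + "," + row[2]);
        }
        return enrollments;
    }
}
```

After Step 2 each student's name is stored once, so renaming a student no longer requires updating one row per course.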


ORACLE

Introduction

Oracle is a relational database management system, which organizes data in the form of tables. Oracle is one of many database servers based on the RDBMS model, which manages data with respect to three specific things: data structures, data integrity, and data manipulation. With Oracle's cooperative server technology, we can realize the benefits of open, relational systems for all applications. Oracle makes efficient use of system resources on a wide range of hardware architectures to deliver good performance, price/performance, and scalability. Any DBMS that is to be called an RDBMS has to satisfy Dr. E. F. Codd's rules.

Oracle is a comprehensive operating environment that packs the power of a mainframe relational database management system into the user's microcomputer. It provides a set of functional programs that users can use as tools to build structures and perform tasks. Because applications developed on Oracle are completely portable to other versions, a programmer can create a complex application in a single-user environment and then move it to a multi-user platform. Users do not have to be experts to appreciate Oracle, but the better a user understands the program, the more productively and creatively he or she can use the tools it provides.

Relational Database Management System

Oracle the right tool

Oracle gives you security and control

Database management tools

An Oracle database can be described at two different levels:

Physical Structure

Logical Structure

Physical Structure

a) One or more data files

b) Two or more log files

c) One control file

Logical Structure

a) Table spaces

b) Segments

c) Extents

d) Data Blocks

The data files contain all user data in the form of tables, indexes, and views. The log files record the changes made to the database so that it can be recovered after a failure. The control file records the physical structure of the database and the media information needed to open and manage the data files. If the control file is damaged, the server will not be able to open or use the database, even if the data files themselves are undamaged.

Features of Oracle

Oracle is portable:

The Oracle RDBMS is available on a wide range of platforms, ranging from PCs to supercomputers, and as a multi-user network loadable module (NLM) for Novell NetWare. If you develop an application on one system, you can run the same application on other systems without any modifications.

Oracle is Compatible:

Oracle commands can be used for communicating with IBM DB2, a mainframe RDBMS that is different from Oracle; that is, Oracle is compatible with DB2. Oracle is a high-performance, fault-tolerant DBMS that is specially designed for online transaction processing and for handling large database applications.

Oracle Tools

Oracle is an RDBMS, which stores and displays data in the form of tables. A table consists of rows and columns, and a single row is called a record. Oracle is a modular system that consists of the Oracle database (DB Manager) and several tools (functional programs).

Oracle tools do four major kinds of work:

Database management

Data access and manipulation

Programming

Connectivity.

Data Access and Manipulation Tools

These are the tools used for communicating with the database manager for data access and manipulation. They can be used not only for access and manipulation but also to design or use an application. Each tool provides a separate entry point and a unique approach to the Oracle system. The tools are firmly based on ANSI standard SQL.


SQL*PLUS

SQL*Plus provides direct access to the Oracle RDBMS. You can use SQL commands to define, control, manipulate, and query data. All users, such as DBAs, high-level system developers, and others, can talk directly to the Oracle RDBMS.

Connectivity Tools

The connectivity tools help in connecting to Oracle databases through a network and to other database systems. SQL*Plus allows access to IBM DB2 (an IBM mainframe RDBMS) and SQL/DS (SQL/Data System) databases directly, using the normal Oracle commands without any modifications.

SQL

The name SQL stands for Structured Query Language. SQL is a data access language; like any other language, it is used for communication. SQL communicates with the database manager, which could be Oracle, DB2, SQLBase, Ingres, or any RDBMS that supports the SQL language. These database systems understand SQL.

SQL is easy to learn. Although SQL is a computer programming language, it is much simpler than traditional programming languages like COBOL, BASIC, FORTRAN, or APL, because SQL is a non-procedural language.

Features of SQL

SQL uses a free-form, English-like structure (a non-mathematical syntax) for its commands.

SQL Processing Capabilities

SQL is composed of a data definition language, a data manipulation language, and a data control language. These three languages support the complete spectrum of relational data processing activity. In fact, most SQL-based products provide all access to the data through SQL.

Data definition language: DDL allows the creation, deletion, and modification of data structures for the system. These structures include tables, databases, and indexes.

Ex: CREATE, DROP, ALTER.


Data manipulation language: These commands are used to manipulate the data in tables, either directly or through views. There are four standard DML statements: SELECT, INSERT, UPDATE, and DELETE.

Data control language: These commands are used to control the usage of and access to data. The most commonly used ones are GRANT and REVOKE.

SQL Data Manipulation Statements

A transaction is a sequence of SQL statements that Oracle treats as a unit, so that all changes brought about by the statements are either made permanent or undone at the same time. To preserve the consistency of the database, PL/SQL lets you use the COMMIT, ROLLBACK, and SAVEPOINT statements. The COMMIT statement makes permanent any changes made during the current transaction; until you commit your changes, other users cannot see them. The ROLLBACK statement ends the current transaction and undoes any changes made since the transaction began. The SAVEPOINT statement marks the current point in the processing of a transaction.
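The effect of these statements can be modeled with a small in-memory Java sketch. This is a hypothetical illustration, not how Oracle implements transactions: `TransactionSketch` keeps a committed copy that other users would see and a private working copy for the current transaction. It also assumes the usual rollback-to-savepoint behaviour, in which changes made after the named savepoint are undone, later savepoints are discarded, and the transaction itself stays open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of COMMIT, ROLLBACK and SAVEPOINT semantics. */
public class TransactionSketch {
    private final Map<String, Integer> committed = new HashMap<>();  // visible to other users
    private Map<String, Integer> working = new HashMap<>();          // current transaction only
    private final List<String> savepointNames = new ArrayList<>();
    private final List<Map<String, Integer>> savepointSnapshots = new ArrayList<>();

    public void update(String key, int value) { working.put(key, value); }

    public Integer readOwn(String key) { return working.get(key); }

    public Integer readCommitted(String key) { return committed.get(key); }

    /** Marks the current point by snapshotting the working copy. */
    public void savepoint(String name) {
        savepointNames.add(name);
        savepointSnapshots.add(new HashMap<>(working));
    }

    /** Undoes changes made after the savepoint and discards later savepoints. */
    public void rollbackTo(String name) {
        int i = savepointNames.lastIndexOf(name);
        if (i < 0) throw new IllegalArgumentException("unknown savepoint " + name);
        working = new HashMap<>(savepointSnapshots.get(i));
        while (savepointNames.size() > i + 1) {
            savepointNames.remove(savepointNames.size() - 1);
            savepointSnapshots.remove(savepointSnapshots.size() - 1);
        }
    }

    /** Makes all changes permanent and visible, ending the transaction. */
    public void commit() {
        committed.clear();
        committed.putAll(working);
        clearSavepoints();
    }

    /** Undoes every change since the transaction began. */
    public void rollback() {
        working = new HashMap<>(committed);
        clearSavepoints();
    }

    private void clearSavepoints() {
        savepointNames.clear();
        savepointSnapshots.clear();
    }
}
```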

3. SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

Keyword search over a large amount of data is an important operation in a wide range of domains. Felipe et al. have recently extended its study to spatial databases, where keyword search becomes a fundamental building block for an increasing number of real-world applications, and proposed the IR2-Tree. A main limitation of the IR2-Tree is that it only supports exact keyword search.

LIMITATIONS WITH EXISTING SYSTEM

An exact keyword is required to search for results.

3.2 PROPOSED SYSTEM

For RSAS queries, the baseline spatial solution is based on Dijkstra's algorithm. Given a query point q, the query range radius r, and a string predicate, we expand from q on the road network using Dijkstra's algorithm until we reach the points at distance r from q, and verify the string predicate either in a post-processing step or on the intermediate results of the expansion. We denote this approach as the Dijkstra solution. Its performance degrades quickly when the query range enlarges and/or the data on the network increases. This motivates us to find a novel method that avoids unnecessary road network expansions by combining the pruning from both the spatial and the string predicates simultaneously.
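The Dijkstra baseline described above can be sketched in Java as follows. Several simplifying assumptions are made for illustration and are not from the original: data strings are attached to network nodes (rather than lying at arbitrary positions on edges), the string predicate is taken to be "Levenshtein edit distance at most τ", and the predicate is verified on intermediate results as each node is settled. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch of the Dijkstra baseline for RSAS range queries. */
public class DijkstraRsasSketch {
    private static final class Edge {
        final int to;
        final double length;
        Edge(int to, double length) { this.to = to; this.length = length; }
    }

    private final List<List<Edge>> adjacency = new ArrayList<>();
    // Simplification: each data string is attached to a network node.
    private final Map<Integer, List<String>> stringsAtNode = new HashMap<>();

    public DijkstraRsasSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) adjacency.add(new ArrayList<>());
    }

    public void addRoad(int u, int v, double length) {  // undirected road segment
        adjacency.get(u).add(new Edge(v, length));
        adjacency.get(v).add(new Edge(u, length));
    }

    public void addString(int node, String text) {
        stringsAtNode.computeIfAbsent(node, k -> new ArrayList<>()).add(text);
    }

    /** Levenshtein distance using a two-row dynamic program. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitute = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(substitute, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] swap = prev; prev = cur; cur = swap;
        }
        return prev[b.length()];
    }

    /** Strings within network distance r of q whose edit distance to sigma is at most tau. */
    public List<String> query(int q, double r, String sigma, int tau) {
        double[] dist = new double[adjacency.size()];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[q] = 0;
        // Queue entries are {node, distance}, ordered by distance.
        PriorityQueue<double[]> queue = new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[1]));
        queue.add(new double[] {q, 0});
        List<String> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            double[] top = queue.poll();
            int u = (int) top[0];
            double d = top[1];
            if (d > dist[u]) continue;  // stale entry: node already settled at a shorter distance
            if (d > r) break;           // every remaining node is farther than r
            for (String s : stringsAtNode.getOrDefault(u, Collections.emptyList())) {
                if (editDistance(s, sigma) <= tau) result.add(s);  // verify the string predicate
            }
            for (Edge e : adjacency.get(u)) {
                double nd = d + e.length;
                if (nd < dist[e.to]) {
                    dist[e.to] = nd;
                    queue.add(new double[] {e.to, nd});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DijkstraRsasSketch net = new DijkstraRsasSketch(4);
        net.addRoad(0, 1, 2);
        net.addRoad(1, 2, 3);
        net.addRoad(0, 3, 10);
        net.addString(0, "museum");
        net.addString(1, "theatre");
        net.addString(2, "theater");
        net.addString(3, "theatre");
        System.out.println(net.query(0, 5, "theatre", 2));  // [theatre, theater]
    }
}
```

Because nodes leave the priority queue in non-decreasing distance order, the loop can stop at the first node whose distance exceeds r. Everything inside that radius is still expanded and string-checked, which is why the cost grows with r and with the amount of data on the network.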

We demonstrate the efficiency and effectiveness of our proposed methods for SAS queries using a comprehensive experimental evaluation. For ESAS queries, our experimental evaluation covers both synthetic and real data sets of up to 10 million points and 6 dimensions. For RSAS queries, our evaluation is based on two large, real road network datasets that contain up to 175,813 nodes, 179,179 edges, and 2 million points on the road network. In both cases, our methods significantly outperformed the respective baseline methods.

ADVANTAGES IN PROPOSED SYSTEM

o The system returns matching results even when the query keywords are not exact (for example, when they contain spelling errors).

3.3. SYSTEM REQUIREMENTS

HARDWARE REQUIREMENTS:

Processor : Intel Pentium IV (3.00 GHz)

Memory : 512 MB

Hard disk : 100 GB

SOFTWARE REQUIREMENTS:

Operating system : Windows XP/7/8

Language : Java, HTML

Database : Oracle


After analyzing the requirements of the task to be performed, the next step is to analyze the problem and understand its context. The first activity in this phase is studying the existing system, and the other is understanding the requirements and domain of the new system. Both activities are equally important, but the first serves as the basis for the functional specifications and, subsequently, for a successful design of the proposed system. Understanding the properties and requirements of a new system is more difficult and requires creative thinking, and understanding the existing running system is also difficult; an improper understanding of the present system can lead to diversion from the solution.

3.4 SOFTWARE REQUIREMENT SPECIFICATION

SCOPE OF THE PROJECT

The software, Site Explorer, is designed for the management of web sites from a remote location.

Purpose: The main purpose of preparing this document is to give a general insight into the analysis and requirements of the existing system or situation and to determine the operating characteristics of the system.

Scope: This document plays a vital role in the software development life cycle (SDLC), and it describes the complete requirements of the system. It is meant for use by the developers and will be the basis during the testing phase. Any changes made to the requirements in the future will have to go through a formal change approval process.

DEVELOPERS RESPONSIBILITIES OVERVIEW:

The developer is responsible for:

Developing a system that meets the SRS and fulfills all the requirements of the system.

Demonstrating the system and installing it at the client's location after acceptance testing is successful.

Submitting the required user manual describing the system interfaces, along with the other documents of the system.

Conducting any user training that might be needed for using the system.


Maintaining the system for a period of one year after installation.

3.4 FUNCTIONAL REQUIREMENTS

Functional requirements are very important system requirements in a software engineering process (or, at the micro level, a subpart of requirements engineering), such as technical specifications, system design parameters and guidelines, data manipulation, data processing, and calculation modules.

Functional requirements are in contrast to other software design requirements, referred to as non-functional requirements, which are primarily based on parameters of system performance, software quality attributes, reliability and security, cost, constraints in design/implementation, etc.

The key goal of determining functional requirements in software product design and implementation is to capture the required behavior of a software system in terms of functionality and the technology implementation of the business processes.

The functional requirements document (also called functional specifications or functional requirements specification) defines the capabilities and functions that a system must be able to perform successfully.

Functional Requirements should include:

Descriptions of data to be entered into the system

Descriptions of operations performed by each screen

Descriptions of work-flows performed by the system

Descriptions of system reports or other outputs

Who can enter data into the system

How the system meets applicable regulatory requirements

The functional specification is designed to be read by a general audience. Readers should

understand the system, but no particular technical knowledge should be required to

understand the document.


Examples of Functional Requirements

Functional requirements should include functions performed by specific screens, outlines of

work-flows performed by the system and other business or compliance requirements the

system must meet.

Interface requirements

Field accepts numeric data entry

Field only accepts dates before the current date

Screen can print on-screen data to the printer

Business Requirements

Data must be entered before a request can be approved

Clicking the Approve Button moves the request to the Approval Workflow

All personnel using the system will be trained according to internal training strategies

Regulatory/Compliance Requirements

The database will have a functional audit trail

The system will limit access to authorized users

The spreadsheet can secure data with electronic signatures

Security Requirements

Members of the Data Entry group can enter requests but cannot approve or delete them

Members of the Managers group can enter or approve requests but cannot delete them

Members of the Administrators group cannot enter or approve requests but can delete them
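The security requirements above form a small permission matrix, which a hypothetical Java sketch can encode directly. The enum names and the `isPermitted`/`check` methods are illustrative assumptions; the only facts taken from the specification are which group may enter, approve, or delete requests.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical encoding of the group/permission matrix from the security requirements. */
public class RequestPermissionsSketch {
    enum Group { DATA_ENTRY, MANAGERS, ADMINISTRATORS }

    enum Action { ENTER, APPROVE, DELETE }

    private static final Map<Group, Set<Action>> ALLOWED = new EnumMap<>(Group.class);

    static {
        ALLOWED.put(Group.DATA_ENTRY, EnumSet.of(Action.ENTER));
        ALLOWED.put(Group.MANAGERS, EnumSet.of(Action.ENTER, Action.APPROVE));
        ALLOWED.put(Group.ADMINISTRATORS, EnumSet.of(Action.DELETE));
    }

    /** True if members of the group may perform the action on a request. */
    static boolean isPermitted(Group group, Action action) {
        return ALLOWED.get(group).contains(action);
    }

    /** Refuses a forbidden action, e.g. a manager trying to delete a request. */
    static void check(Group group, Action action) {
        if (!isPermitted(group, action)) {
            throw new SecurityException(group + " may not " + action + " requests");
        }
    }
}
```

Keeping the matrix in one table, rather than scattering checks across screens, makes it straightforward to compare the code against the specification line by line.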

The functional specification describes what the system must do; how the system does it is

described in the Design Specification.

If a User Requirement Specification was written, all requirements outlined in the user

requirement specification should be addressed in the functional requirements.


3.5 NON FUNCTIONAL REQUIREMENTS

All the other requirements which do not form a part of the above specification are categorized

as Non-Functional Requirements.

A system may be required to present the user with a display of the number of records in a

database. This is a functional requirement.

How up-to-date this number needs to be is a non-functional requirement. If the

number needs to be updated in real time, the system architects must ensure that the system is

capable of updating the displayed record count within an acceptably short interval of the

number of records changing. Sufficient network bandwidth may also be a non-functional

requirement of a system.

Other examples:

Accessibility

Availability

Backup

Certification

Compliance

Configuration Management

Documentation

Disaster Recovery

Efficiency (resource consumption for given load)

Effectiveness (resulting performance in relation to effort)

Extensibility (adding features, and carrying forward customizations at the next major version upgrade)

Failure Management

Interoperability

Maintainability

Modifiability

Open Source

Operability


Performance

Platform compatibility

Price

Portability

Quality (e.g. Faults Discovered, Faults Delivered, Fault Removal Efficacy)

Recoverability

Resilience

Resource constraints (processor speed, memory, disk space, network bandwidth etc.)

Response time

Robustness

Scalability (horizontal, vertical)

Security

Software, tools, standards etc.

Stability

Safety

Supportability

Testability

Usability by target user community

Accessibility is a general term used to describe the degree to which a product, device,

service, or environment is accessible by as many people as possible. Accessibility can be

viewed as the "ability to access" and possible benefit of some system or entity. Accessibility

is often used to focus on people with disabilities and their right of access to the system.

Availability is the degree to which a system, subsystem, or equipment is operable and in a

committable state at the start of a mission, when the mission is called for at an unknown, i.e.,

a random, time. Simply put, availability is the proportion of time a system is in a functioning

condition.

Expressed mathematically, availability is 1 minus the unavailability.
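As a standard worked form of this definition (not part of the original report), availability A can be written in terms of uptime and downtime, or in steady state in terms of mean time between failures (MTBF) and mean time to repair (MTTR); the unavailability U is its complement. The second line shows, for example, how much downtime a 99.9% availability target allows in a 30-day month of 43,200 minutes.

```latex
A = \frac{T_{\mathrm{up}}}{T_{\mathrm{up}} + T_{\mathrm{down}}}
  = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}},
\qquad U = 1 - A

T_{\mathrm{down}} \le (1 - 0.999) \times 43\,200\ \text{min} = 43.2\ \text{min}
```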

A backup or the process of backing up refers to making copies of data so that these

additional copies may be used to restore the original after a data loss event. These additional

copies are typically called "backups."


Certification refers to the confirmation of certain characteristics of an object, system, or

organization. This confirmation is often, but not always, provided by some form of external review, education, or assessment.

Compliance is the act of adhering to, and demonstrating adherence to, a standard or

regulation.

Configuration management (CM) is a field that focuses on establishing and maintaining

consistency of a system's or product's performance and its functional and physical attributes

with its requirements, design, and operational information throughout its life.

Documentation may refer to the process of providing evidence ("to document something")

or to the communicable material used to provide such documentation (i.e. a document).

Less commonly, documentation may also refer to tools aimed at identifying documents, or to the field of study devoted to documents and bibliographies.

Disaster recovery is the process, policies and procedures related to preparing for recovery or

continuation of technology infrastructure critical to an organization after a natural or human-

induced disaster.

Disaster recovery planning is a subset of a larger process known as business continuity planning and should include planning for the resumption of applications, data, hardware, communications (such as networking), and other IT infrastructure.

Extensibility (sometimes confused with forward compatibility) is a system design principle

where the implementation takes into consideration future growth. It is a systemic measure of

the ability to extend a system and the level of effort required to implement the extension.

Extensions can be through the addition of new functionality or through modification of

existing functionality. The central theme is to provide for change while minimizing impact to

existing system functions.

Interoperability is a property referring to the ability of diverse systems and organizations to

work together (inter-operate). The term is often used in a technical systems engineering

sense, or alternatively in a broad sense, taking into account social, political, and

organizational factors that impact system to system performance.

Maintainability is the ease with which a software product can be modified in order to:


correct defects

meet new requirements

make future maintenance easier, or

cope with a changed environment.

Open source describes practices in production and development that promote access to the end product's source materials, typically its source code.

Operability is the ability to keep equipment, a system or a whole industrial installation in a

safe and reliable functioning condition, according to pre-defined operational requirements.

In a computing systems environment with multiple systems this includes the ability of

products, systems and business processes to work together to accomplish a common task.

Computer performance is characterized by the amount of useful work accomplished by a

computer system compared to the time and resources used.

Depending on the context, good computer performance may involve one or more of the

following:

Short response time for a given piece of work

High throughput (rate of processing work)

Low utilization of computing resource(s)

High availability of the computing system or application

Fast (or highly compact) data compression and decompression

High bandwidth / short data transmission time

Price, in economics and business, is the result of an exchange: from that trade, a numerical monetary value is assigned to a good, service, or asset.

Portability is one of the key concepts of high-level programming. It is the property of a code base that allows existing code to be reused, instead of creating new code, when moving software from one environment to another. When one is targeting several platforms with the same application, portability is the key issue for reducing development cost.


Quality: The common element of the business definitions is that the quality of a product or

service refers to the perception of the degree to which the product or service meets the

customer's expectations. Quality has no specific meaning unless related to a specific function

and/or object. Quality is a perceptual, conditional and somewhat subjective attribute.

Reliability may be defined in several ways:

The idea that something is fit for purpose with respect to time;

The capacity of a device or system to perform as designed;

The resistance to failure of a device or system;

The ability of a device or system to perform a required function under stated

conditions for a specified period of time;

The probability that a functional unit will perform its required function for a specified

interval under stated conditions.

The ability of something to "fail well" (fail without catastrophic consequences).

Resilience is the ability to provide and maintain an acceptable level of service in the face of

faults and challenges to normal operation.

These services include:

supporting distributed processing

supporting networked storage

maintaining service of communication services such as

o video conferencing

o instant messaging

o online collaboration

access to applications and data as needed

Response time perceived by the end user is the interval between

(a) The instant at which an operator at a terminal enters a request for a response from

a computer and

(b) The instant at which the first character of the response is received at a terminal.


In a data system, the system response time is the interval between the receipt of the end of

transmission of an inquiry message and the beginning of the transmission of a response

message to the station originating the inquiry.

Robustness is the quality of being able to withstand stresses, pressures, or changes in

procedure or circumstance. A system or design may be said to be "robust" if it is capable of

coping well with variations (sometimes unpredictable variations) in its operating environment

with minimal damage, alteration or loss of functionality.

The concept of scalability applies to technology and business settings. Regardless of the setting, the base concept is consistent: the ability of a business or technology to accept increased volume without impacting the system.

In telecommunications and software engineering, scalability is a desirable property of a

system, a network, or a process, which indicates its ability to either handle growing amounts

of work in a graceful manner or to be readily enlarged.

Security is the degree of protection against danger, loss, and criminals.

Security has to be compared and contrasted with other related concepts: Safety, continuity,

reliability. The key difference between security and reliability is that security must take into

account the actions of people attempting to cause destruction.

Security as a state or condition is resistance to harm. From an objective perspective, it is a

structure's actual (conceptual and never fully knowable) degree of resistance to harm.

Stability means that most of the objects will remain stable over time and will not need changes.

Safety is the state of being "safe", the condition of being protected against physical, social,

spiritual, financial, political, emotional, occupational, psychological, educational or other

types or consequences of failure, damage, error, accidents, harm or any other event which

could be considered non-desirable. This can take the form of being protected from the event

or from exposure to something that causes health or economic losses. It can include protection of people or of possessions.

Supportability (also known as serviceability) is one of the aspects of RASU (Reliability, Availability, Serviceability, and Usability). It refers to the ability of technical support personnel to install, configure, and monitor products, identify exceptions or faults, debug or isolate faults to their root cause, and provide hardware or software maintenance in pursuit of solving a problem and restoring the product to service. Incorporating features that facilitate serviceability typically results in more efficient product maintenance, reduces operational costs, and maintains business continuity.

Testability, a property applying to an empirical hypothesis, involves two components: (1) the logical property that is variously described as contingency or defeasibility, which means that counterexamples to the hypothesis are logically possible, and (2) the practical feasibility of observing a reproducible series of such counterexamples if they do exist. In short, it refers to the capability of equipment or a system to be tested.

Usability denotes the ease with which users can employ a tool or other human-made object to achieve a particular goal. In human-computer interaction and computer science, usability refers to the elegance and clarity with which the interaction with a computer program or a web site is designed.

MODULES:


Implementation is the stage of the project when the theoretical design is turned into a working system. Thus it can be considered the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work and be effective.

The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, design of methods to achieve changeover, and evaluation of changeover methods.

1. User Module:

In this module, users are authenticated, and access to the details presented in the ontology system is secured. Before accessing or searching the details, a user must have an account; otherwise, the user must register first.
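A minimal Java sketch of the register-before-search rule follows. It is hypothetical: the class and method names, the use of PBKDF2 (`PBKDF2WithHmacSHA256` from the standard Java cryptography API), and the iteration count are assumptions chosen for illustration, since the report does not say how credentials are stored. Passwords are kept only as salted hashes, and a login succeeds only for a registered user who supplies the matching password.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical register/login gate placed in front of the search feature. */
public class UserModuleSketch {
    private static final int ITERATIONS = 100_000;  // assumed PBKDF2 work factor
    private static final int KEY_BITS = 256;

    private static final class Account {
        final byte[] salt;
        final byte[] hash;
        Account(byte[] salt, byte[] hash) { this.salt = salt; this.hash = hash; }
    }

    private final Map<String, Account> accounts = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Derives a salted password hash with PBKDF2-HMAC-SHA256. */
    private static byte[] derive(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        }
    }

    /** Creates an account; returns false if the user name is already taken. */
    public boolean register(String user, char[] password) {
        if (accounts.containsKey(user)) return false;
        byte[] salt = new byte[16];
        random.nextBytes(salt);
        accounts.put(user, new Account(salt, derive(password, salt)));
        return true;
    }

    /** Only a registered user with the right password may go on to search. */
    public boolean login(String user, char[] password) {
        Account account = accounts.get(user);
        return account != null && MessageDigest.isEqual(account.hash, derive(password, account.salt));
    }
}
```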

2. key:

The key of the common index can be made from the index word given by the data owner and the file. A secure index and a search scheme are used to enable fast similarity search in the context of data. In such a context, it is critical not to sacrifice the confidentiality of sensitive data while providing functionality. We provided a rigorous security definition and proved the security of the proposed scheme under that definition to ensure confidentiality.

3. Edit Distance Pruning:


Computing edit distance exactly is a costly operation. Several techniques have been proposed for quickly identifying candidate strings within a small edit distance of a query string. All of them are based on q-grams and a q-gram counting argument. For a string s, its q-grams are produced by sliding a window of length q over the characters of s. To deal with the special cases at the beginning and the end of s, where fewer than q characters are available, one may introduce special characters, such as “#” and “$”, which are not in the alphabet Σ. This conceptually extends s by prefixing it with q - 1 occurrences of “#” and suffixing it with q - 1 occurrences of “$”. Hence, each q-gram of s has exactly q characters, and a string of length n has n + q - 1 q-grams. For example, with q = 2 the string “theatre” is extended to “#theatre$”, whose 2-grams are #t, th, he, ea, at, tr, re, and e$.
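The padding rule and the counting argument can be sketched in Python. The helper names are hypothetical; the bound used in `passes_count_filter` is the standard q-gram count filter: a padded string of length n has n + q - 1 q-grams and one edit operation can destroy at most q of them, so two strings within edit distance k must share at least max(|s|, |t|) + q - 1 - kq q-grams, counted as a multiset.

```python
from collections import Counter


def qgrams(s: str, q: int) -> list[str]:
    """Slide a window of length q over s padded with q-1 '#' in front and q-1 '$' behind."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def shared_qgrams(s: str, t: str, q: int) -> int:
    """Size of the multiset intersection of the q-grams of s and t."""
    return sum((Counter(qgrams(s, q)) & Counter(qgrams(t, q))).values())


def passes_count_filter(s: str, t: str, q: int, k: int) -> bool:
    """Necessary (not sufficient) condition for edit_distance(s, t) <= k.

    Each edit operation destroys at most q of the n + q - 1 padded q-grams,
    so strings within k edits share at least max(|s|, |t|) + q - 1 - k*q.
    """
    return shared_qgrams(s, t, q) >= max(len(s), len(t)) + q - 1 - k * q
```

For example, “theatre” and “theater” (edit distance 2) share 5 bigrams, which meets the bound of 4 for k = 2, so the pair survives the filter and goes on to exact verification. The filter may keep false positives, but it never discards a true match.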

4. Search:

We provide a specific application of the proposed similarity-searchable encryption scheme to clarify its mechanism. The server performs the search on the index for each component and sends back the corresponding encrypted bit vectors, which it builds using the respective LIKE command. Finally, we illustrated the performance of the proposed scheme with an empirical analysis on real data.
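A plain-text sketch of how a range query with an approximate string predicate (an ESAS query) can combine these pieces: a rectangle test, the q-gram count filter as a cheap pruning step, and exact edit-distance verification only for the survivors. It assumes a linear scan over hypothetical `SpatialObject` records; it does not model the encrypted bit vectors, the server/client split, or a spatial index such as an R-tree that a real system would use to avoid scanning every object.

```python
from collections import Counter
from typing import NamedTuple


class SpatialObject(NamedTuple):
    oid: int
    x: float
    y: float
    text: str


def qgrams(s: str, q: int) -> Counter:
    """Multiset of q-grams of s, padded with '#' and '$'."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute or match
        prev = cur
    return prev[-1]


def esas_range_query(objects: list[SpatialObject], rect: tuple[float, float, float, float],
                     query: str, tau: int, q: int = 2) -> list[int]:
    """Ids of objects inside rect = (xmin, ymin, xmax, ymax) with edit_distance(text, query) <= tau."""
    xmin, ymin, xmax, ymax = rect
    query_grams = qgrams(query, q)
    result = []
    for obj in objects:
        # 1) Spatial predicate: cheap rectangle containment test.
        if not (xmin <= obj.x <= xmax and ymin <= obj.y <= ymax):
            continue
        # 2) q-gram count filter: prune strings that cannot be within tau edits.
        shared = sum((qgrams(obj.text, q) & query_grams).values())
        if shared < max(len(obj.text), len(query)) + q - 1 - tau * q:
            continue
        # 3) Verification: the costly exact edit distance, only for survivors.
        if edit_distance(obj.text, query) <= tau:
            result.append(obj.oid)
    return result
```

With objects at (1, 1) labelled “theater” and at (2, 2) labelled “theatre”, a query for “theatre” in the rectangle (0, 0, 5, 5) with τ = 2 returns both ids, while τ = 0 returns only the exact match.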

4. SYSTEM DESIGN

4.1 UML Diagrams:


UML is a method for describing the system architecture in detail using blueprints.

UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems.

UML is a very important part of developing object-oriented software and of the software development process.

UML uses mostly graphical notations to express the design of software projects. Using the UML helps project teams communicate, explore potential designs, and validate the architectural design of the software.

Definition:

UML is a general-purpose visual modeling language that is used to specify, visualize,

construct, and document the artifacts of the software system.

UML is a language:

It provides a vocabulary and rules for communication and operates on conceptual and physical representations. Hence, it is a modeling language.

UML Specifying:

Specifying means building models that are precise, unambiguous, and complete. In particular, the UML addresses the specification of all the important analysis, design, and implementation decisions that must be made in developing and deploying a software-intensive system.

UML Visualization:

The UML includes both graphical and textual representations. It makes the system easy to visualize and understand.

UML Constructing:

UML models can be directly connected to a variety of programming languages, and the UML is sufficiently expressive and unambiguous to permit the direct execution of models.

UML Documenting:

UML provides a variety of documents in addition to raw executable code.

The use case view of a system encompasses the use cases that describe the behavior of the system as seen by its end users, analysts, and testers.

The design view of a system encompasses the classes, interfaces, and collaborations

that form the vocabulary of the problem and its solution.

The process view of a system encompasses the threads and processes that form the

system's concurrency and synchronization mechanisms.

The implementation view of a system encompasses the components and files that are

used to assemble and release the physical system. The deployment view of a system

encompasses the nodes that form the system's hardware topology on which the system

executes.

Uses of UML:

The UML is intended primarily for software-intensive systems. It has been used effectively in domains such as:
i) Enterprise Information Systems
ii) Banking and Financial Services
iii) Telecommunications
iv) Transportation
v) Defense/Aerospace
vi) Retail
vii) Medical Electronics
viii) Scientific Fields
ix) Distributed Web-based Services

Building blocks of UML:

The vocabulary of the UML encompasses three kinds of building blocks:

Things

Relationships

Diagrams

Things:

Things are the abstractions that are first-class citizens in a model. There are four kinds of things:

Structural Things, Behavioral Things, Grouping Things, Annotational Things

Relationships:

Relationships tie the things together. The relationships in the UML are:

Dependency, Association, Generalization, Realization

UML Diagrams:

A diagram is the graphical presentation of a set of elements, most often rendered as a

connected graph of vertices (things) and arcs (relationships).

There are two types of diagrams: structural diagrams and behavioral diagrams.

Structural Diagrams:

The UML's four structural diagrams exist to visualize, specify, construct, and document the static aspects of a system. The static parts of a system can be viewed using one of the following diagrams. Structural diagrams consist of the Class Diagram, Object Diagram, Component Diagram, and Deployment Diagram.

Behavioral Diagrams:

The UML's five behavioral diagrams are used to visualize, specify, construct, and document the dynamic aspects of a system. They are roughly organized around the major ways in which the dynamics of a system can be modeled.

Behavioral diagrams consist of:
a) Use Case Diagram
b) Sequence Diagram
c) Collaboration Diagram
d) Statechart Diagram
e) Activity Diagram

4.2 Use-Case diagram:

A use case is a set of scenarios that describe an interaction between a user and a system. A use case diagram displays the relationships among actors and use cases. The two main components of a use case diagram are use cases and actors.

An actor represents a user or another system that will interact with the system being modeled. A use case is an external view of the system that represents some action the user might perform in order to complete a task.


Fig 1: USECASE DIAGRAM

Contents:

Use cases

Actors

Dependency, Generalization, and association relationships

System boundary

4.3 Class Diagram:


Class diagrams are widely used to describe the types of objects in a system and their relationships. Class diagrams model class structure and contents using design elements such as classes, packages, and objects. Class diagrams describe three different perspectives when designing a system: conceptual, specification, and implementation. These perspectives become evident as the diagram is created and help solidify the design. Class diagrams are arguably the most used UML diagram type, and the class diagram is the main building block of any object-oriented solution. It shows the classes in a system, the attributes and operations of each class, and the relationships between classes. In most modeling tools, a class has three parts: the name at the top, attributes in the middle, and operations or methods at the bottom. In large systems with many classes, related classes are grouped together to create class diagrams. Different relationships between classes are shown by different types of arrows.

Fig 2: CLASS DIAGRAM
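To illustrate how a three-part class box maps onto code, here is a hypothetical pair of classes (not taken from the project's Fig 2): the class name corresponds to the top compartment, the fields to the middle compartment, and the methods to the bottom compartment. The `matches` field models a one-to-many association between a search result and spatial objects.

```python
from dataclasses import dataclass, field


@dataclass
class SpatialObject:
    # Top compartment: the class name, SpatialObject.
    # Middle compartment: attributes.
    oid: int
    x: float
    y: float
    description: str

    # Bottom compartment: operations.
    def inside(self, xmin: float, ymin: float, xmax: float, ymax: float) -> bool:
        """True if the object's location falls in the axis-aligned rectangle."""
        return xmin <= self.x <= xmax and ymin <= self.y <= ymax


@dataclass
class SearchResult:
    query: str
    # Association: one SearchResult refers to many SpatialObject instances.
    matches: list[SpatialObject] = field(default_factory=list)
```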

4.4 Sequence Diagram

Sequence diagrams in UML show how objects interact with each other and the order in which those interactions occur. Note that they show the interactions for a particular scenario. The processes are represented vertically, and interactions are shown as arrows.

Fig 3: SEQUENCE DIAGRAM

4.6 Activity diagram:


Activity diagrams describe the workflow behavior of a system. Activity diagrams are similar to state diagrams because activities are the state of doing something. The diagrams describe the state of activities by showing the sequence of activities performed. Activity diagrams can show activities that are conditional or parallel.

How to Draw: Activity Diagrams

Activity diagrams show the flow of activities through the system. Diagrams are read from top to bottom and have branches and forks to describe conditions and parallel activities. A fork is used when multiple activities occur at the same time. For example, a fork after activity1 indicates that both activity2 and activity3 occur at the same time. If activity2 is followed by a branch, the branch describes which activities will take place based on a set of conditions. Every branch is eventually followed by a merge that marks the end of the conditional behavior started by that branch. After the merge, all of the parallel activities must be combined by a join before transitioning into the final activity state.
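The fork, branch, merge, and join described above can be mirrored in code. This sketch is illustrative only: the activity names are placeholders, and a thread pool stands in for the fork and join bars. `activity2` and `activity3` run concurrently, `activity2` branches on a condition and merges back into a single result, and the final activity runs only after both flows have been joined.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(condition: bool) -> list[str]:
    """Hypothetical activities following the fork/branch/merge/join pattern."""
    log = ["activity1"]

    def activity2() -> str:
        # Branch: choose one arm based on the condition.
        if condition:
            arm = "activity4"
        else:
            arm = "activity5"
        # Merge: both arms leave through the same single return point.
        return "activity2->" + arm

    def activity3() -> str:
        return "activity3"

    # Fork: activity2 and activity3 are submitted to run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f2 = pool.submit(activity2)
        f3 = pool.submit(activity3)
        # Join: wait for both parallel flows before continuing.
        log.extend([f2.result(), f3.result()])
    log.append("final")
    return log
```

Collecting the results in a fixed order after the join keeps the log deterministic even though the two activities may finish in either order.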


When to Use: Activity Diagrams

Activity diagrams should be used in conjunction with other modeling techniques such as interaction diagrams and state diagrams. The main reason to use activity diagrams is to model the workflow behind the system being designed. Activity diagrams are also useful for analyzing a use case by describing what actions need to take place and when they should occur, describing a complicated sequential algorithm, and modeling applications with parallel processes.

Fig 4.1: ACTIVITY DIAGRAM FOR USER


Fig 4.2: ACTIVITY DIAGRAM FOR ADMIN

4.7 Data Flow Diagram

Fig 5: Context level diagram

Fig 5.2: LEVEL 0 USER LEVEL DIAGRAM

Fig 5.4: LEVEL 1 ADMIN DIAGRAM

Fig 5.5: LEVEL 1 DFD DIAGRAM

The data flow diagrams show Users and the Admin signing in to and out of Spatial Approximate String Search; the user processes are Register, Log in, Search files, View files, Update profile, Download files, and Logout, each backed by the database.

5. IMPLEMENTATION

Implementation is the stage of the project when the theoretical design is turned into a working system. Thus, it can be considered the most critical stage in achieving a successful new system and in giving users confidence that the new system will work and be effective.


The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, design of methods to achieve changeover, and evaluation of those changeover methods.

Implementation is the process of converting a new system design into operation. It is

the phase that focuses on user training, site preparation and file conversion for installing a

candidate system. The important factor that should be considered here is that the conversion

should not disrupt the functioning of the organization.

5.2 SAMPLE CODE:

6. TESTING

6.1 Introduction

The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies, and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests, and each test type addresses a specific testing requirement.

TYPES OF TESTS

Unit testing

Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application, done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at the component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
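A minimal example of a unit test in this spirit, using Python's standard `unittest` module. The unit under test is an assumed Levenshtein edit-distance function; each test case has a clearly defined input and expected result and targets one path of the logic (identical strings, empty strings, substitutions, symmetry).

```python
import unittest


def edit_distance(s: str, t: str) -> int:
    """Unit under test: Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


class EditDistanceTest(unittest.TestCase):
    """Each test pins one path through the logic with a defined input and expected result."""

    def test_identical_strings(self):
        self.assertEqual(edit_distance("theatre", "theatre"), 0)

    def test_empty_string_costs_one_edit_per_character(self):
        self.assertEqual(edit_distance("", "abc"), 3)
        self.assertEqual(edit_distance("abc", ""), 3)

    def test_swapped_letters_cost_two_substitutions(self):
        self.assertEqual(edit_distance("theatre", "theater"), 2)

    def test_symmetry(self):
        self.assertEqual(edit_distance("kitten", "sitting"),
                         edit_distance("sitting", "kitten"))
```

Saved as, for example, `test_edit_distance.py`, it can be run with `python -m unittest test_edit_distance`.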

Fig 6.1: Testing (unit, module, sub-system, system, and acceptance testing, grouped into component testing, integration testing, and user testing)

Integration testing

Integration tests are designed to test integrated software components to determine whether they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
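One way to check that separately unit-tested components combine correctly is to compare the integrated pipeline against a simple oracle. In this sketch (all names are hypothetical), the q-gram count filter and the edit-distance verifier are composed into `filtered_search`, and a seeded random test checks that it returns exactly the same strings as a brute-force scan. A mismatch would expose an interface defect, such as a wrong threshold passed between the two components.

```python
import random
from collections import Counter


def qgram_counter(s: str, q: int) -> Counter:
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def count_filter(s: str, t: str, q: int, k: int) -> bool:
    """Component 1: q-gram count filter (may keep false positives, never drops true matches)."""
    shared = sum((qgram_counter(s, q) & qgram_counter(t, q)).values())
    return shared >= max(len(s), len(t)) + q - 1 - k * q


def edit_distance(s: str, t: str) -> int:
    """Component 2: exact Levenshtein verifier."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def filtered_search(strings: list[str], query: str, k: int, q: int = 2) -> list[str]:
    """Integrated pipeline: filter first, then verify the survivors."""
    return [s for s in strings if count_filter(s, query, q, k) and edit_distance(s, query) <= k]


def brute_force_search(strings: list[str], query: str, k: int) -> list[str]:
    """Oracle: verify every string."""
    return [s for s in strings if edit_distance(s, query) <= k]


def integration_check(trials: int = 200, seed: int = 7) -> bool:
    """Randomized check that the combined components agree with the oracle."""
    rng = random.Random(seed)
    alphabet = "abc"  # a tiny alphabet makes near matches common
    for _ in range(trials):
        strings = ["".join(rng.choice(alphabet) for _ in range(rng.randint(0, 6)))
                   for _ in range(20)]
        query = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 6)))
        k = rng.randint(0, 2)
        for q in (2, 3):
            if filtered_search(strings, query, k, q) != brute_force_search(strings, query, k):
                return False
    return True
```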

Functional test

Functional tests provide systematic demonstrations that functions tested are available as

specified by the business and technical requirements, system documentation, and user

manuals.


Functional testing is centered on the following items:

Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
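A sketch of functional checks organized around identified classes of valid and invalid input. The `validate_query` function and its rules (a well-ordered rectangle, a non-empty query string, and a non-negative integer threshold τ) are assumptions made for illustration; each case pairs an input class with its expected outcome, accepted or rejected.

```python
def validate_query(rect, text, tau) -> list[str]:
    """Hypothetical validator for a range query; returns error messages (empty list = valid)."""
    errors = []
    if not (isinstance(rect, tuple) and len(rect) == 4
            and all(isinstance(v, (int, float)) for v in rect)):
        errors.append("rect must be (xmin, ymin, xmax, ymax)")
    else:
        xmin, ymin, xmax, ymax = rect
        if xmin > xmax or ymin > ymax:
            errors.append("rect corners are out of order")
    if not isinstance(text, str) or not text.strip():
        errors.append("query text must be a non-empty string")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(tau, int) or isinstance(tau, bool) or tau < 0:
        errors.append("tau must be a non-negative integer")
    return errors


# Identified classes of valid input: each must be accepted.
VALID_INPUTS = [((0, 0, 10, 10), "theatre", 2), ((1.5, 1.5, 1.5, 1.5), "park", 0)]

# Identified classes of invalid input: each must be rejected.
INVALID_INPUTS = [
    ((10, 0, 0, 10), "theatre", 1),  # corners swapped
    ((0, 0, 10), "theatre", 1),      # wrong number of coordinates
    ((0, 0, 10, 10), "   ", 1),      # blank query text
    ((0, 0, 10, 10), "theatre", -1), # negative threshold
]


def run_functional_checks() -> bool:
    accepted = all(validate_query(*case) == [] for case in VALID_INPUTS)
    rejected = all(validate_query(*case) != [] for case in INVALID_INPUTS)
    return accepted and rejected
```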

Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage of identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.

System Test

System testing ensures that the entire integrated software system meets requirements.

It tests a configuration to ensure known and predictable results. An example of system testing

is the configuration oriented system integration test. System testing is based on process

descriptions and flows, emphasizing pre-driven process links and integration points.

White Box Testing

White box testing is testing in which the software tester has knowledge of the inner workings, structure, and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black-box level.

Black Box Testing

Black box testing is testing the software without any knowledge of the inner workings, structure, or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. In black box testing, the software under test is treated as a black box: you cannot “see” into it. The test provides inputs and responds to outputs without considering how the software works.


6.2 Unit Testing:

Unit testing is usually conducted as part of a combined code and unit test phase of the

software lifecycle, although it is not uncommon for coding and unit testing to be conducted as

two distinct phases.

Test strategy and approach

Field testing will be performed manually and functional tests will be written in detail.

Test objectives

All field entries must work properly.

Pages must be activated from the identified link.

The entry screen, messages and responses must not be delayed.

Features to be tested

Verify that the entries are of the correct format

No duplicate entries should be allowed

All links should take the user to the correct page.

6.3 Integration Testing

Software integration testing is the incremental integration testing of two or more

integrated software components on a single platform to produce failures caused by interface

defects.

The task of the integration test is to check that components or software applications,

e.g. components in a software system or – one step up – software applications at the company

level – interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects were encountered.

6.4 Acceptance Testing

User Acceptance Testing is a critical phase of any project and requires significant

participation by the end user. It also ensures that the system meets the functional

requirements.

Test Results: All the test cases mentioned above passed successfully. No defects were encountered.


CONCLUSION


This paper presents a comprehensive study of spatial approximate string queries in both Euclidean space and road networks. We use the edit distance as the similarity measure for the string predicate and focus on range queries as the spatial predicate. We also address the problem of selectivity estimation for queries in Euclidean space. Future work includes examining spatial approximate substring queries, designing methods that are more update-friendly, and solving the selectivity estimation problem for RSAS queries.


APPENDIX A

REFERENCES

[1] S. Acharya, V. Poosala, and S. Ramaswamy. Selectivity estimation in spatial databases. In SIGMOD, pages 13–24, 1999.

[2] S. Alsubaiee, A. Behm, and C. Li. Supporting location-based approximate-keyword queries. In GIS, pages 61–70, 2010.

[3] A. Arasu, S. Chaudhuri, K. Ganjam, and R. Kaushik. Incorporating string transformations in record matching. In SIGMOD, pages 1231–1234, 2008.

[4] A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, pages 918–929, 2006.

[5] N. Beckmann, H. P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles. In SIGMOD, pages 322–331, 1990.

[6] A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations (extended abstract). In STOC, pages 327–336, 1998.

[7] X. Cao, G. Cong, and C. S. Jensen. Retrieving top-k prestige-based relevant spatial web objects. Proc. VLDB Endow., 3:373–384, 2010.

[8] K. Chakrabarti, S. Chaudhuri, V. Ganti, and D. Xin. An efficient filter for approximate membership checking. In SIGMOD, pages 805–818, 2008.

[9] S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani. Robust and efficient fuzzy match for online data cleaning. In SIGMOD, pages 313–324, 2003.
