
BARKATULLAH UNIVERSITY INSTITUTE OF TECHNOLOGY

BARKATULLAH UNIVERSITY, BHOPAL

Department of Computer Science & Engineering

PROJECT REPORT

ON

FREQUENT USER ACCESS PATTERN

Submitted for the fulfillment of the requirement for the award of the Degree of Bachelor of Engineering (B.E.)

2009-2010 of Barkatullah University, Bhopal

By

ANIMESH DUBEY    ATUL RATHORE    DINESH AHIRWAR    MAHESH PATIL    SANJAY KASDE

Under the Guidance Of

Mr. Amit Baghel


BARKATULLAH UNIVERSITY INSTITUTE OF TECHNOLOGY

BARKATULLAH UNIVERSITY, BHOPAL

Department of Computer Science & Engineering

CERTIFICATE

YEAR 2009-2010

This is to certify that Animesh Dubey, Atul Rathore, Dinesh Ahirwar, Mahesh Patil and Sanjay Kasde, students of the final year of the B.E. Computer Science and Engineering branch, have successfully completed this project work titled “FREQUENT USER ACCESS PATTERN” in fulfillment of the requirement for the award of the degree of Bachelor of Engineering (Computer Science & Engineering) in the year 2009-2010 by Barkatullah University, Bhopal.


Mr. Amit Baghel (Project Guide & Coordinator)

Mr. Divakar Singh (Head of the Department, Computer Sc. & Engg.)

Dr. R. K. Pandey (Director)


BARKATULLAH UNIVERSITY INSTITUTE OF TECHNOLOGY

BARKATULLAH UNIVERSITY, BHOPAL

Department of Computer Science & Engineering

DECLARATION

YEAR 2009-2010

This project report entitled “FREQUENT USER ACCESS PATTERN” submitted in the fulfillment of the Bachelor of Engineering Degree in Computer Science & Engineering of Barkatullah University, Bhopal during the year 2009-2010 is an authentic record of our project work. To the best of our knowledge this is our original work and was not submitted earlier for award of any other degree, diploma or certificate.

ANIMESH DUBEY    ATUL RATHORE    DINESH AHIRWAR    MAHESH PATIL    SANJAY KASDE


ACKNOWLEDGEMENT

We wish to acknowledge our profound sense of gratitude to our project guide

Mr. AMIT BAGHEL, Department of Computer Science and Engineering,

Barkatullah University Institute of Technology, Barkatullah University, Bhopal for

their remarkable guidance and continued encouragement during the preparation of this

project. Indeed it was a matter of great felicity and privilege for us to work under their

aegis. We express our thankfulness to them for their dedicated inspiration, lively interest

and patience through our errors, without which it would have been impossible to bring the

project to near completion.

We also wish to thank Mr. DIVAKAR SINGH, Head of Department, Computer

Science and Engineering, Barkatullah University Institute of Technology, Barkatullah

University who directly or indirectly helped us during the course of this project.

We would like to thank DR. R.K.PANDEY, Director, Barkatullah University

Institute of Technology, Barkatullah University Bhopal for all the encouragement and

facilities provided to us. Last but not the least we would like to thank our parents for their

support and cooperation.

Regardless of the source we wish to express our gratitude to those who may have

contributed to this work, even though anonymously.

ANIMESH DUBEY ATUL RATHORE DINESH AHIRWAR MAHESH PATIL SANJAY KASDE


Table Of Contents

1. PROJECT
   1.1 Introduction
   1.2 Objectives
   1.3 Technical description of the project
   1.4 Algorithm
   1.5 Scope

2. HARDWARE & SOFTWARE REQUIREMENTS
   2.1 Hardware Requirements
   2.2 Software Requirements

3. SYSTEM ANALYSIS & DESIGN
   3.1 Requirement Analysis
   3.2 Feasibility

4. SOFTWARE SELECTION CRITERIA
   4.1 Data Model
   4.2 Data Flow Diagram

5. TABLES

6. OUTPUT

7. LIMITATIONS

8. BIBLIOGRAPHY


PROJECT


INTRODUCTION

The main aim of this project is to discover the frequent access patterns of website users while they browse the site. The patterns thus generated are analysed, and frequent access patterns are derived on the basis of the FAP algorithm. These patterns are then fed to the FAP tree, which generates the frequent pattern tree. The project proposes a method that can discover users' frequent access patterns underlying their web browsing behaviour. It first introduces the concept of an access pattern based on a user's access path, and then puts forward a revised algorithm (FAP-Mining), based on the FP-tree algorithm, to mine frequent access patterns. The algorithm first constructs a frequent access pattern tree and then mines users' frequent access patterns from the tree.

The tree is then used to cache the frequently accessed pages according to their occurrence in the tree, which improves space efficiency in the caching process.
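As a rough, illustrative sketch only (not the report's actual FAP-Mining implementation, which is described later in the Algorithm section), a frequent access pattern tree can be pictured as a prefix tree over user access paths in which shared prefixes accumulate counts; the class and method names below are our own:

using System.Collections.Generic;

class PatternNode
{
    public string Page;
    public int Count;
    public Dictionary<string, PatternNode> Children = new Dictionary<string, PatternNode>();

    public PatternNode(string page) { Page = page; }
}

class FapTree
{
    private PatternNode root = new PatternNode("<root>");

    // Insert one user's access path (the ordered list of pages the user visited)
    public void AddPath(IList<string> path)
    {
        PatternNode node = root;
        foreach (string page in path)
        {
            PatternNode child;
            if (!node.Children.TryGetValue(page, out child))
            {
                child = new PatternNode(page);
                node.Children[page] = child;
            }
            child.Count++;   // paths sharing a prefix share (and increment) these nodes
            node = child;
        }
    }
}

Mining then walks this tree and reports the paths whose counts exceed a chosen support threshold.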


OBJECTIVE

This project proposes a method that can discover users' frequent access patterns underlying their web browsing behaviour. Firstly, it introduces the concept of an access pattern based on a user's access path; secondly, it puts forward a revised algorithm (FAP-Mining), based on the FP-tree algorithm, to mine frequent access patterns. The new algorithm first constructs a frequent access pattern tree and then mines users' frequent access patterns from the tree. Moreover, the algorithm is accurate and scalable for mining frequent access patterns of different lengths.

The main objective of the project is to find the frequent access patterns followed by the website users and thereby help the website owner to modify the contents of his website:

1. Mining the web access log records to discover the different access patterns of different users.

2. Improving the link structure of web pages, enhancing web server performance and providing characteristic service for customers.

3. Preprocessing web server log data to get user access paths from user session files (a sketch of this step is given below).

4. Mining frequent access patterns of different lengths.

5. Caching the frequent pages of the website to increase server performance.

All these objectives are well covered in our project, which is hence a comprehensive tool for analysing user access behaviour and tuning the website for its users.
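As a minimal, hypothetical sketch of objective 3 (the log record format, the field positions and the grouping key are assumptions, not taken from the report), raw log lines can be grouped per user into ordered access paths:

using System.Collections.Generic;

class LogPreprocessor
{
    // Assumed record format per line: "<userId> <timestamp> <requestedPage>"
    public static Dictionary<string, List<string>> BuildAccessPaths(IEnumerable<string> logLines)
    {
        Dictionary<string, List<string>> paths = new Dictionary<string, List<string>>();
        foreach (string line in logLines)
        {
            string[] fields = line.Split(' ');
            if (fields.Length < 3) continue;          // skip malformed records
            string user = fields[0];
            string page = fields[2];

            if (!paths.ContainsKey(user))
                paths[user] = new List<string>();
            paths[user].Add(page);                    // pages kept in order of access
        }
        return paths;                                 // one access path per user
    }
}

Splitting a user's requests into separate sessions by an inactivity timeout is omitted here for brevity.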


TECHNICAL DESCRIPTION


FRONT END

ASP.NET 2.0

VISUAL STUDIO 2005

C#

ASP.NET

It is a web application framework developed and marketed by Microsoft to allow programmers

to build dynamic web sites, web applications and web services. It was first released in January

2002 with version 1.0 of the .NET Framework, and is the successor to Microsoft's Active Server

Pages (ASP) technology. ASP.NET is built on the Common Language Runtime (CLR), allowing

programmers to write ASP.NET code using any supported .NET language. The ASP.NET SOAP

extension framework allows ASP.NET components to process SOAP messages.

History

After the release of Internet Information Services 4.0 in 1997, Microsoft began researching

possibilities for a new web application model that would solve common complaints about ASP,

especially with regard to separation of presentation and content and being able to write "clean"

code.[1] Mark Anders, a manager on the IIS team, and Scott Guthrie, who had joined Microsoft in

1997 after graduating from Duke University, were tasked with determining what that model would

look like. The initial design was developed over the course of two months by Anders and Guthrie,

and Guthrie coded the initial prototypes during the Christmas holidays in 1997.

The initial prototype was called "XSP"; Guthrie explained in a 2007 interview that,

"People would always ask what the X stood for. At the time it really didn't stand for anything. XML

started with that; XSLT started with that. Everything cool seemed to start with an X, so that's what

we originally named it." The initial prototype of XSP was done using Java,[3] but it was soon

decided to build the new platform on top of the Common Language Runtime (CLR), as it offered

an object-oriented programming environment, garbage collection and other features that were

seen as desirable features that Microsoft's Component Object Model platform didn't support.

Guthrie described this decision as a "huge risk", as the success of their new web development

platform would be tied to the success of the CLR, which, like XSP, was still in the early stages of

development, so much so that the XSP team was the first team at Microsoft to target the CLR.

With the move to the Common Language Runtime, XSP was re-implemented in C#

(known internally as "Project Cool" but kept secret from the public), and the name changed to

ASP+, as by this point the new platform was seen as being the successor to Active Server

Pages, and the intention was to provide an easy migration path for ASP developers.[4]

Mark Anders first demonstrated ASP+ at the ASP Connections conference in Phoenix, Arizona

on May 2, 2000. Demonstrations to the wide public and initial beta release of ASP+ (and the rest

of the .NET Framework) came at the 2000 Professional Developers Conference on July 11, 2000

in Orlando, Florida. During Bill Gates' keynote presentation, Fujitsu demonstrated ASP+ being

used in conjunction with COBOL,[5] and support for a variety of other languages was announced,

including Microsoft's new Visual Basic .NET and C# languages, as well as Python and Perl

support by way of interoperability tools created by ActiveState.

Once the ".NET" branding was decided on in the second half of 2000, it was decided

to rename ASP+ to ASP.NET. Mark Anders explained on an appearance on The MSDN Show

that year that, "The .NET initiative is really about a number of factors, it's about delivering

software as a service, it's about XML and web services and really enhancing the Internet in terms

of what it can do ... we really wanted to bring its name more in line with the rest of the platform

pieces that make up the .NET framework."

After four years of development, and a series of beta releases in 2000 and 2001,

ASP.NET 1.0 was released on January 5, 2002 as part of version 1.0 of the .NET Framework.

Even prior to the release, dozens of books had been written about ASP.NET, and Microsoft

promoted it heavily as part of their platform for web services. Guthrie became the product unit

manager for ASP.NET, and development continued apace, with version 1.1 being released on

April 24, 2003 as a part of Windows Server 2003. This release focused on improving ASP.NET's

support for mobile devices.


Characteristics

Pages

ASP.NET web pages, known officially as "web forms", are the main building block for application

development.[8] Web forms are contained in files with an ".aspx" extension; these files typically

contain static (X)HTML markup, as well as markup defining server-side Web Controls and User

Controls where the developers place all the required static and dynamic content for the web

page. Additionally, dynamic code which runs on the server can be placed in a page within a block

<% -- dynamic code -- %> which is similar to other web development technologies such as PHP, JSP,

and ASP, but this practice is generally discouraged except for the purposes of data binding since

it requires more calls when rendering the page.

Note that this sample uses code "inline", as opposed to code-behind.

<%@ Page Language="C#" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<script runat="server">

protected void Page_Load(object sender, EventArgs e)

{

Label1.Text = DateTime.Now.ToLongTimeString();

}

</script>

<html xmlns="http://www.w3.org/1999/xhtml">

<head runat="server">

<title>Sample page</title>

</head>

<body>

<form id="form1" runat="server">

<div>

The current time is: <asp:Label runat="server" id="Label1" />

Page | 12

Page 13: Major Project Report by Vikash Kumar

</div>

</form>

</body>

</html>

Code-behind model

Microsoft recommends dealing with dynamic program code by using the code-behind model,

which places this code in a separate file or in a specially designated script tag. Code-behind files

typically have names like MyPage.aspx.cs or MyPage.aspx.vb while the page file is

MyPage.aspx (same filename as the page file (ASPX), but with the final extension denoting the

page language). This practice is automatic in Microsoft Visual Studio and other IDEs. When using

this style of programming, the developer writes code to respond to different events, like the page

being loaded, or a control being clicked, rather than a procedural walk through the document.

ASP.NET's code-behind model marks a departure from Classic ASP in that it encourages

developers to build applications with separation of presentation and content in mind. In theory,

this would allow a web designer, for example, to focus on the design markup with less potential

for disturbing the programming code that drives it. This is similar to the separation of the

controller from the view in model-view-controller frameworks.

Example

<%@ Page Language="C#" CodeFile="SampleCodeBehind.aspx.cs" Inherits="Website.SampleCodeBehind"

AutoEventWireup="true" %>

The above tag is placed at the beginning of the ASPX file. The CodeFile property of the

@ Page directive specifies the file (.cs or .vb) acting as the code-behind while the Inherits

property specifies the Class the Page derives from. In this example, the @ Page directive is

included in SampleCodeBehind.aspx, then SampleCodeBehind.aspx.cs acts as the code-behind

for this page:

using System;

namespace Website
{
    public partial class SampleCodeBehind : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            Response.Write("Hello, world");
        }
    }
}

In this case, the Page_Load() method is called every time the ASPX page is requested.

The programmer can implement event handlers at several stages of the page execution process

to perform processing.

User controls

User controls are encapsulated sections of a page, held in files with an ".ascx" extension, that can be registered and reused as controls on ASP.NET pages. An event bubbling mechanism provides the ability to pass an event fired by a user control up to its containing page.
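ASP.NET's Control class exposes RaiseBubbleEvent/OnBubbleEvent for this; a simpler and very common pattern is for the user control to expose its own event that the containing page subscribes to. A hedged sketch (the control name NavMenu and the event LinkClicked are illustrative, not from the report):

// NavMenu.ascx.cs -- hypothetical user control code-behind
using System;
using System.Web.UI;

public partial class NavMenu : UserControl
{
    // Event the containing page can handle
    public event EventHandler LinkClicked;

    protected void AnyLink_Click(object sender, EventArgs e)
    {
        // Pass the notification up to the containing page
        if (LinkClicked != null)
            LinkClicked(this, EventArgs.Empty);
    }
}

// In the containing page's code-behind:
//   NavMenu1.LinkClicked += new EventHandler(NavMenu1_LinkClicked);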

Custom Controls

Programmers can also build Custom controls for ASP.NET applications. Unlike User controls,

these controls don't have an ASCX markup-file, having all their code compiled into a DLL-file.

Such Custom controls can be used across multiple web-applications and Visual Studio projects

(which is not allowed with User controls). By using a Register directive, the control is loaded from

the DLL.
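A minimal, hypothetical sketch of such a compiled control (the namespace, class and property names are ours); once built into a DLL it is loaded onto a page with a Register directive, as described above:

using System.Web.UI;
using System.Web.UI.WebControls;

namespace MyControls
{
    public class GreetingLabel : WebControl
    {
        private string userName;

        public string UserName
        {
            get { return userName; }
            set { userName = value; }
        }

        // A custom control renders its own markup; there is no .ascx file
        protected override void RenderContents(HtmlTextWriter writer)
        {
            writer.Write("Hello, " + (userName ?? "guest"));
        }
    }
}

// On a page:
//   <%@ Register TagPrefix="my" Namespace="MyControls" Assembly="MyControls" %>
//   <my:GreetingLabel ID="Greeting1" runat="server" UserName="Animesh" />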

Rendering technique

ASP.NET uses a visited composites rendering technique. During compilation, the template

(.aspx) file is compiled into initialization code which builds a control tree (the composite)

representing the original template. Literal text goes into instances of the Literal control class, and

server controls are represented by instances of a specific control class. The initialization code is

combined with user-written code (usually by the assembly of multiple partial classes) and results

in a class specific for the page. The page doubles as the root of the control tree.

Actual requests for the page are processed through a number of steps. First, during the

initialization steps, an instance of the page class is created and the initialization code is executed.

This produces the initial control tree which is now typically manipulated by the methods of the

page in the following steps. As each node in the tree is a control represented as an instance of a


class, the code may change the tree structure as well as manipulate the properties/methods of

the individual nodes. Finally, during the rendering step a visitor is used to visit every node in the

tree, asking each node to render itself using the methods of the visitor. The resulting HTML

output is sent to the client.

After the request has been processed, the instance of the page class is discarded and with it the

entire control tree. This is usually a source of confusion among novice ASP.NET programmers

who rely on class instance members that are lost with every page request/response cycle.

State management

ASP.NET applications are hosted by a web server and are accessed using the stateless HTTP

protocol. As such, if an application uses stateful interaction, it has to implement state

management on its own. ASP.NET provides various functions for state management.

Conceptually, Microsoft treats "state" as GUI state; problems may arise if an application needs to

keep track of "data state", for example, a finite state machine which may be in a transient state

between requests (lazy evaluation) or which takes a long time to initialize.

Application state

Application state is held by a collection of shared user-defined variables. These are set and

initialized when the Application_OnStart event fires on the loading of the first instance of the

applications and are available till the last instance exits. Application state variables are accessed

using the Applications collection, which provides a wrapper for the application state variables.

Application state variables are identified by name.
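A hedged sketch of how such a variable might be used (the name HitCount and the Label1 control are illustrative; in Global.asax the start-up handler is usually named Application_Start):

// Global.asax -- runs when the first instance of the application loads
void Application_Start(object sender, EventArgs e)
{
    Application["HitCount"] = 0;          // shared, user-defined, identified by name
}

// In any page's code-behind -- the same variable is visible to all users
protected void Page_Load(object sender, EventArgs e)
{
    Application.Lock();                   // serialize concurrent updates
    Application["HitCount"] = (int)Application["HitCount"] + 1;
    Application.UnLock();
    Label1.Text = "Total hits so far: " + Application["HitCount"];
}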

Session state

Server-side session state is held by a collection of user-defined session variables, which are

persisted during a user session. These variables, accessed using the Session collection, are

unique to each session instance. The variables can be set to be automatically destroyed after a

defined time of inactivity, even if the session does not end. Client-side user session is maintained

by either a cookie or by encoding the session ID in the URL itself.
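A small, hedged sketch of a per-user session variable, in the spirit of this project's access paths (the key name VisitedPages is illustrative; the code-behind needs using System.Collections.Generic):

protected void Page_Load(object sender, EventArgs e)
{
    // Each session instance gets its own copy of this variable
    if (Session["VisitedPages"] == null)
        Session["VisitedPages"] = new List<string>();

    // Record the page the user is visiting in this request
    ((List<string>)Session["VisitedPages"]).Add(Request.Path);
}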

ASP.NET supports three modes of persistence for session variables:

In Process Mode

When using In Process Mode, the session variables are maintained within the ASP.NET

process. This is the fastest way; however, in this mode the variables are destroyed when

the ASP.NET process is recycled or shut down.

ASPState Mode

In this mode, ASP.NET runs a separate Windows service that maintains the state

variables. As state management happens outside the ASP.NET process and .NET

Remoting must be utilized by the ASP.NET engine to access the data there is a negative

impact on performance compared to the In Process mode. This mode allows an ASP.NET

application to be load-balanced and scaled across multiple servers and because the state

management service runs independently of ASP.NET, the session variables can persist

across ASP.NET process shutdowns. However, since session state server runs as a

single instance it is still a single point of failure so far as session state is concerned. The

session-state service cannot be load balanced and there are restrictions on types that can

be stored in a session variable.

SqlServer Mode

In this mode, the state variables are stored in a database allowing session variables to be

persisted across ASP.NET process shutdowns. The main advantage of this mode is that it

allows the application to balance load on a server cluster, sharing sessions between

servers. This is the slowest method of session state management in ASP.NET.

View state

View state refers to the page-level state management mechanism, utilized by the HTML pages

emitted by ASP.NET applications to maintain the state of the web form controls and widgets. The

state of the controls is encoded and sent to the server at every form submission in a hidden field

known as __VIEWSTATE. The server sends back the variable so that when the page is re-rendered, the controls render at their last state. At the server side, the application may change

the viewstate, if the processing requires a change of state of any control. The states of individual

controls are decoded at the server, and are available for use in ASP.NET pages using the

ViewState collection.

The main use for this is to preserve form information across postbacks. View state is turned on

by default and normally serializes the data in every control on the page regardless of whether it is

actually used during a postback. This behavior can (and should) be modified, however, as View

state can be disabled on a per-control, per-page, or server-wide basis.
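A brief, hedged sketch of using the ViewState collection to preserve a value across postbacks (the property name PostCount is illustrative), and of disabling view state on a control that does not need it:

// Round-trips in the hidden __VIEWSTATE field of the form
protected int PostCount
{
    get { return ViewState["PostCount"] == null ? 0 : (int)ViewState["PostCount"]; }
    set { ViewState["PostCount"] = value; }
}

protected void Page_Load(object sender, EventArgs e)
{
    if (IsPostBack)
        PostCount = PostCount + 1;
}

// Disabling view state per control in the markup:
//   <asp:Label ID="StatusLabel" runat="server" EnableViewState="false" />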


Developers need to be wary of storing sensitive or private information in the View state of a page

or control, as the base64 string containing the view state data can easily be de-serialized. By

default, View state does not encrypt the __VIEWSTATE value. Encryption can be enabled on a

server-wide (and server-specific) basis, allowing for a certain level of security to be maintained.

Server-side caching

ASP.NET offers a "Cache" object that is shared across the application and can also be used to

store various objects. The "Cache" object holds the data only for a specified amount of time and

is automatically cleaned after the cache time-limit elapses.
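A hedged sketch of how the project might cache its list of frequently accessed pages (the key name, the 30-minute expiration and the LoadFrequentPagesFromLog helper are all illustrative assumptions, not the report's code):

// Inside a page's code-behind (using System; using System.Data;)
protected void Page_Load(object sender, EventArgs e)
{
    System.Data.DataTable frequentPages = (System.Data.DataTable)Cache["FrequentPages"];
    if (frequentPages == null)
    {
        frequentPages = LoadFrequentPagesFromLog();        // hypothetical helper
        Cache.Insert("FrequentPages", frequentPages,
                     null,                                  // no cache dependency
                     DateTime.Now.AddMinutes(30),           // absolute expiration
                     System.Web.Caching.Cache.NoSlidingExpiration);
    }
    // frequentPages is now shared by all requests until it expires
}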

Template engine

When first released, ASP.NET lacked a template engine. Because the .NET framework is object-oriented and allows for inheritance, many developers would define a new base class that inherits

from "System.Web.UI.Page", write methods here that render HTML, and then make the pages in

their application inherit from this new class. While this allows for common elements to be reused

across a site, it adds complexity and mixes source code with markup. Furthermore, this method

can only be visually tested by running the application - not while designing it. Other developers

have used include files and other tricks to avoid having to implement the same navigation and

other elements in every page.

ASP.NET 2.0 introduced the concept of "master pages", which allow for template-based page

development. A web application can have one or more master pages, which, beginning with

ASP.NET 3.5, can be nested. Master templates have place-holder controls, called

ContentPlaceHolders to denote where the dynamic content goes, as well as HTML and

JavaScript shared across child pages.

Child pages use those ContentPlaceHolder controls, which must be mapped to the place-holder

of the master page that the content page is populating. The rest of the page is defined by the

shared parts of the master page, much like a mail merge in a word processor. All markup and

server controls in the content page must be placed within the ContentPlaceHolder control.

When a request is made for a content page, ASP.NET merges the output of the content page

with the output of the master page, and sends the output to the user.
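A minimal sketch of the pairing described above (the file names and the placeholder ID are illustrative):

<%-- Site.master --%>
<%@ Master Language="C#" %>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
    <form id="form1" runat="server">
        <h1>Shared site header</h1>
        <asp:ContentPlaceHolder ID="MainContent" runat="server" />
    </form>
</body>
</html>

<%-- Default.aspx, a content page using the master --%>
<%@ Page Language="C#" MasterPageFile="~/Site.master" %>
<asp:Content ID="Content1" ContentPlaceHolderID="MainContent" runat="server">
    <p>Page-specific content goes here.</p>
</asp:Content>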


The master page remains fully accessible to the content page. This means that the content page

may still manipulate headers, change title, configure caching etc. If the master page exposes

public properties or methods (e.g. for setting copyright notices) the content page can use these

as well.

Directory structure

In general, the ASP.NET directory structure can be determined by the developer's preferences.

Apart from a few reserved directory names, the site can span any number of directories. The

structure is typically reflected directly in the URLs. Although ASP.NET provides means for

intercepting the request at any point during processing, the developer is not forced to funnel

requests through a central application or front controller.

The special directory names (from ASP.NET 2.0 on) are

App_Browsers 

holds site-specific browser definition files.

App_Code 

This is the "raw code" directory. The ASP.NET server automatically compiles files (and

subdirectories) in this folder into an assembly which is accessible in the code of every

page of the site. App_Code will typically be used for data access abstraction code, model

code and business code. Also any site-specific http handlers and modules and web

service implementation go in this directory. As an alternative to using App_Code the

developer may opt to provide a separate assembly with precompiled code.

App_Data 

default directory for databases, such as Access mdb files and SQL Server mdf files. This

directory is usually the only one with write access for the application.

App_LocalResources 

Contains localized resource files for individual pages of the site. E.g. a file called

CheckOut.aspx.fr-FR.resx holds localized resources for the French version of the

CheckOut.aspx page. When the UI culture is set to French, ASP.NET will automatically find

and use this file for localization.

App_GlobalResources 


Holds resx files with localized resources available to every page of the site. This is where

the ASP.NET developer will typically store localized messages etc. which are used on

more than one page.

App_Themes 

holds alternative themes of the site.

App_WebReferences 

holds discovery files and WSDL files for references to web services to be consumed in the

site.

Bin 

Contains compiled code (.dll files) for controls, components, or other code that you want to

reference in your application. Any classes represented by code in the Bin folder are

automatically referenced in your application.

Performance

ASP.NET aims for performance benefits over other script-based technologies (including Classic

ASP) by compiling the server-side code to one or more DLL files on the web server. This

compilation happens automatically the first time a page is requested (which means the developer

need not perform a separate compilation step for pages). This feature provides the ease of

development offered by scripting languages with the performance benefits of a compiled binary.

However, the compilation might cause a noticeable but short delay to the web user when the

newly-edited page is first requested from the web server, but won't again unless the page

requested is updated further.

The ASPX and other resource files are placed in a virtual host on an Internet Information

Services server (or other compatible ASP.NET servers; see Other Implementations, below). The

first time a client requests a page, the .NET framework parses and compiles the file(s) into

a .NET assembly and sends the response; subsequent requests are served from the DLL files.

By default ASP.NET will compile the entire site in batches of 1000 files upon first request. If the

compilation delay is causing problems, the batch size or the compilation strategy may be

tweaked.

Developers can also choose to pre-compile their "codebehind" files before deployment, using MS

Visual Studio, eliminating the need for just-in-time compilation in a production environment. This

also eliminates the need of having the source code on the web server.


Microsoft Visual Studio

It is an Integrated Development Environment (IDE) from Microsoft. It can be used to

develop console and graphical user interface applications along with Windows Forms

applications, web sites, web applications, and web services in both native code together with

managed code for all platforms supported by Microsoft Windows, Windows Mobile, Windows CE,

.NET Framework, .NET Compact Framework and Microsoft Silverlight.

Visual Studio includes a code editor supporting IntelliSense as well as code refactoring.

The integrated debugger works both as a source-level debugger and a machine-level debugger.

Other built-in tools include a forms designer for building GUI applications, web designer, class

designer, and database schema designer. It accepts plug-ins that enhance the functionality at

almost every level—including adding support for source-control systems (like Subversion and

Visual SourceSafe) and adding new toolsets like editors and visual designers for domain-specific

languages or toolsets for other aspects of the software development lifecycle (like the Team

Foundation Server client: Team Explorer).

Visual Studio supports different programming languages by means of language

services, which allow the code editor and debugger to support (to varying degrees) nearly any

programming language, provided a language-specific service exists. Built-in languages include

C/C++ (via Visual C++), VB.NET (via Visual Basic .NET), C# (via Visual C#), and F# (as of Visual

Studio 2010[3]). Support for other languages such as M, Python, and Ruby among others is

available via language services installed separately. It also supports XML/XSLT, HTML/XHTML,

JavaScript and CSS. Individual language-specific versions of Visual Studio also exist which

provide more limited language services to the user: Microsoft Visual Basic, Visual J#, Visual C#,

and Visual C++.


Microsoft provides "Express" editions of its Visual Studio 2010 components Visual

Basic, Visual C#, Visual C++, and Visual Web Developer at no cost. Visual Studio 2010, 2008

and 2005 Professional Editions, along with language-specific versions (Visual Basic, C++, C#,

J#) of Visual Studio 2005 are available for free to students as downloads via Microsoft's

DreamSpark program.

Architecture

Visual Studio does not support any programming language, solution or tool intrinsically. Instead,

it allows plugging in various types of functionality. Specific functionality is coded as a VSPackage.

When installed, the functionality is available as a Service. The IDE provides three services:

SVsSolution, which provides the ability to enumerate projects and solutions; SVsUIShell, which

provides windowing and UI functionality (including tabs, toolbars and tool windows); and

SVsShell, which deals with registration of VSPackages. In addition, the IDE is also responsible

for coordinating and enabling communication between services. All editors, designers, project

types and other tools are implemented as VSPackages. Visual Studio uses COM to access the

VSPackages. The Visual Studio SDK also includes the Managed Package Framework (MPF),

which is a set of managed wrappers around the COM-interfaces that allow the Packages to be

written in any CLI compliant language. However, MPF does not provide all the functionality

exposed by the Visual Studio COM interfaces. The services can then be consumed for creation

of other packages, which add functionality to the Visual Studio IDE.

Support for programming languages is added by using a specific VSPackage called a Language

Service. A language service defines various interfaces which the VSPackage implementation can

implement to add support for various functionalities. Functionalities that can be added this way

include syntax coloring, statement completion, brace matching, parameter information tooltips,

member lists and error markers for background compilation. If the interface is implemented, the

functionality will be available for the language. Language services are to be implemented on a

per-language basis. The implementations can reuse code from the parser or the compiler for the

language. Language services can be implemented either in native code or managed code. For

native code, either the native COM interfaces or the Babel Framework (part of Visual Studio

SDK) can be used. For managed code, the MPF includes wrappers for writing managed

language services.


Visual Studio does not include any source control support built in but it defines the MSSCCI

(Microsoft Source Code Control Interface) by implementing which source control systems can

integrate with the IDE. MSSCCI defines a set of functions that are used to implement various

source control functionality. MSSCCI was first used to integrate Visual SourceSafe with Visual

Studio 6.0 but was later opened up via the Visual Studio SDK. Visual Studio .NET 2002 used

MSSCCI 1.1, and Visual Studio .NET 2003 used MSSCCI 1.2. Both Visual Studio 2005 and 2008

use MSSCCI Version 1.3, which adds support for rename and delete propagation as well as

asynchronous opening.

Visual Studio supports running multiple instances of the environment (each with its own set of

VSPackages). The instances use different registry hives (see MSDN's definition of the term

"registry hive" in the sense used here) to store their configuration state and are differentiated by

their AppId (Application ID). The instances are launched by an AppId-specific .exe that selects

the AppId, sets the root hive and launches the IDE. VSPackages registered for one AppId are

integrated with other VSPackages for that AppId. The various product editions of Visual Studio

are created using the different AppIds. The Visual Studio Express edition products are installed

with their own AppIds, but the Standard, Professional and Team Suite products share the same

AppId. Consequently, one can install the Express editions side-by-side with other editions, unlike

the other editions which update the same installation. The professional edition includes a

superset of the VSPackages in the standard edition and the team suite includes a superset of the

VSPackages in both other editions. The AppId system is leveraged by the Visual Studio Shell in

Visual Studio 2008.

Features

Code editor

[Figure: the Visual Studio code editor showing IntelliSense suggestions and a docked Task List window]

Visual Studio, like any other IDE, includes a code editor that supports syntax highlighting and

code completion using IntelliSense for not only variables, functions and methods but also

language constructs like loops and queries. IntelliSense is supported for the included languages,

as well as for XML and for Cascading Style Sheets and JavaScript when developing web sites

and web applications. Auto complete suggestions are popped up in a modeless list box, overlaid

on top of the code editor. In Visual Studio 2008 onwards, it can be made temporarily semi-transparent to see the code obstructed by it. The code editor is used for all supported languages.


The Visual Studio code editor also supports setting bookmarks in code for quick navigation.

Other navigational aids include collapsing code blocks and incremental search, in addition to

normal text search and reg ex search. The code editor also includes a multi-item clipboard and a

task list. The code editor supports code snippets, which are saved templates for repetitive code

and can be inserted into code and customized for the project being worked on. A management

tool for code snippets is built in as well. These tools are surfaced as floating windows which can

be set to automatically hide when unused or docked to the side of the screen. The Visual Studio

code editor also supports code refactoring including parameter reordering, variable and method

renaming, interface extraction and encapsulation of class members inside properties, among

others.

Visual Studio features background compilation (also called incremental compilation). As code is

being written, Visual Studio compiles it in the background in order to provide feedback about

syntax and compilation errors, which are flagged with a red wavy underline. Warnings are

marked with a green underline. Background compilation does not generate executable code,

since it requires a different compiler than the one used to generate executable code. Background

compilation was initially introduced with Microsoft Visual Basic but has now been expanded for all

included languages.

Debugger

Visual Studio includes a debugger that works both as a source-level debugger and as a machine-level debugger. It works with both managed code as well as native code and can be used for

debugging applications written in any language supported by Visual Studio. In addition, it can

also attach to running processes and monitor and debug those processes. If source code for the

running process is available, it displays the code as it is being run. If source code is not available,

it can show the disassembly. The Visual Studio debugger can also create memory dumps as well

as load them later for debugging. Multi-threaded programs are also supported. The debugger can

be configured to be launched when an application running outside the Visual Studio environment

crashes. The debugger allows setting breakpoints (which allow execution to be stopped

temporarily at a certain position) and watches (which monitor the values of variables as the

execution progresses). Breakpoints can be conditional, meaning they get triggered when the

condition is met. Code can be stepped over, i.e., run one line (of source code) at a time. It can

either step into functions to debug inside it, or step over it, i.e., the execution of the function body

isn't available for manual inspection. The debugger supports Edit and Continue, i.e., it allows

code to be edited as it is being debugged (32 bit only; not supported in 64 bit). When debugging,

if the mouse pointer hovers over any variable, its current value is displayed in a tooltip ("data

tooltips"), where it can also be modified if desired. During coding, the Visual Studio debugger lets

certain functions be invoked manually from the Immediate tool window. The parameters to the

method are supplied at the Immediate window.

Designer

Visual Studio includes a host of visual designers to aid in the development of applications. These

tools include:

[Figures: Visual Studio 2005 in Designer view; the WPF Designer in Visual Studio; the Visual Studio Web Designer in the code editor; Visual Studio 2005 in Class Designer view]

Windows Forms Designer

The Windows Forms designer is used to build GUI applications using Windows Forms. It

includes a palette of UI widgets and controls (including buttons, progress bars, labels,

layout containers and other controls) that can be dragged and dropped on a form surface.

Layout can be controlled by housing the controls inside other containers or locking them to


the side of the form. Controls that display data (like textbox, list box, grid view, etc.) can be

data-bound to data sources like databases or queries. The UI is linked with code using an

event-driven programming model. The designer generates either C# or VB.NET code for

the application.

WPF Designer

The WPF designer, codenamed Cider, was introduced with Visual Studio 2008. Like the

Windows Forms designer it supports the drag and drop metaphor. It is used to author user

interfaces targeting Windows Presentation Foundation. It supports all WPF functionality

including databinding and automatic layout management. It generates XAML code for the

UI. The generated XAML file is compatible with Microsoft Expression Design, the

designer-oriented product. The XAML code is linked with code using a code-behind model.

Web designer/development

Visual Studio also includes a web-site editor and designer that allows web pages to be

authored by dragging and dropping widgets. It is used for developing ASP.NET

applications and supports HTML, CSS and JavaScript. It uses a code-behind model to link

with ASP.NET code. From Visual Studio 2008 onwards, the layout engine used by the web

designer is shared with Microsoft Expression Web. There is also ASP.NET MVC support

for MVC technology as a separate download and Dynamic Data project available from

Microsoft.

Class designer

The Class Designer is used to author and edit classes (including their members and their

access) using UML modeling. The Class Designer can generate C# and VB.NET code

outlines for the classes and methods. It can also generate class diagrams from hand-written classes.

Data designer

The data designer can be used to graphically edit database schemas, including typed

tables, primary and foreign keys and constraints. It can also be used to design queries

from the graphical view.

Mapping designer

From Visual Studio 2008 onwards, the mapping designer is used by LINQ to SQL to

design the mapping between database schemas and the classes that encapsulate the


data. The newer ORM solution, ADO.NET Entity Framework, replaces and

improves on the older technology.

C#

It is pronounced "see sharp" and is a multi-paradigm programming language encompassing

imperative, functional, generic, object-oriented (class-based), and component-oriented

programming disciplines. It was developed by Microsoft within the .NET initiative and later

approved as a standard by Ecma (ECMA-334) and ISO (ISO/IEC 23270). C# is one of the

programming languages designed for the Common Language Infrastructure.

C# is intended to be a simple, modern, general-purpose, object-oriented programming language.

Its development team is led by Anders Hejlsberg. The most recent version is C# 4.0, which was

released on April 12, 2010.

Design goals

1. C# language is intended to be a simple, modern, general-purpose, object-oriented

programming language.

2. The language, and implementations thereof, should provide support for software

engineering principles such as strong type checking, array bounds checking, detection

of attempts to use uninitialized variables, and automatic garbage collection. Software

robustness, durability, and programmer productivity are important.

3. The language is intended for use in developing software components suitable for

deployment in distributed environments.

4. Source code portability is very important, as is programmer portability, especially for

those programmers already familiar with C and C++.

5. Support for internationalization is very important.

6. C# is intended to be suitable for writing applications for both hosted and embedded

systems, ranging from the very large that use sophisticated operating systems, down to

the very small having dedicated functions.


7. Although C# applications are intended to be economical with regard to memory and

processing power requirements, the language was not intended to compete directly on

performance and size with C or assembly language.

History

During the development of the .NET Framework, the class libraries were originally written using a

managed code compiler system called Simple Managed C (SMC). In January 1999, Anders

Hejlsberg formed a team to build a new language at the time called Cool, which stood for "C-like

Object Oriented Language". Microsoft had considered keeping the name "Cool" as the final name

of the language, but chose not to do so for trademark reasons. By the time the .NET project was

publicly announced at the July 2000 Professional Developers Conference, the language had

been renamed C#, and the class libraries and ASP.NET runtime had been ported to C#.

C#'s principal designer and lead architect at Microsoft is Anders Hejlsberg, who was previously

involved with the design of Turbo Pascal, Embarcadero Delphi (formerly CodeGear Delphi and

Borland Delphi), and Visual J++. In interviews and technical papers he has stated that flaws in

most major programming languages (e.g. C++, Java, Delphi, and Smalltalk) drove the

fundamentals of the Common Language Runtime (CLR), which, in turn, drove the design of the

C# programming language itself.

James Gosling, who created the Java programming language in 1994, and Bill Joy, a co-founder

of Sun Microsystems, the proprietor of Java, called C# an "imitation" of Java; Gosling further

claimed that "[C# is] sort of Java with reliability, productivity and security deleted." Klaus Kreft and

Angelika Langer (authors of a C++ streams book) stated in a blog post that "Java and C# are

almost identical programming languages. Boring repetition that lacks innovation," "Hardly

anybody will claim that Java or C# are revolutionary programming languages that changed the

way we write programs," and "C# borrowed a lot from Java - and vice versa. Now that C#

supports boxing and unboxing, we'll have a very similar feature in Java." Anders Hejlsberg has

argued that C# is "not a Java clone" and is "much closer to C++" in its design.


BACK END

SQL SERVER 2005

SQL Server 2005 (codenamed Yukon), released in October 2005, is the successor to SQL

Server 2000. It included native support for managing XML data, in addition to relational data. For

this purpose, it defined an xml data type that could be used either as a data type in database

columns or as literals in queries. XML columns can be associated with XSD schemas; XML data

being stored is verified against the schema. XML is converted to an internal binary data type

before being stored in the database. Specialized indexing methods were made available for XML

data. XML data is queried using XQuery. CLR integration was one of the main features of this edition:

one could write SQL Server code as managed code, that is, code which is

executed by the CLR (Common Language Runtime). SQL Server 2005 added some extensions to the

T-SQL language to allow embedding XQuery queries in T-SQL. In addition, it also defines a new

extension to XQuery, called XML DML, that allows query-based modifications to XML data. SQL

Server 2005 also allows a database server to be exposed over web services using TDS packets

encapsulated within SOAP (protocol) requests. When the data is accessed over web services,

results are returned as XML.

For relational data, T-SQL has been augmented with error handling features (try/catch) and

support for recursive queries (Common Table Expressions). SQL Server 2005 has also been

enhanced with new indexing algorithms and better error recovery systems. Data pages are

checksummed for better error resiliency, and optimistic concurrency support has been added for

better performance. Permissions and access control have been made more granular and the


query processor handles concurrent execution of queries in a more efficient way. Partitions on

tables and indexes are supported natively, so scaling out a database onto a cluster is easier.

SQL CLR was introduced with SQL Server 2005 to let it integrate with the .NET Framework.

SQL Server 2005 introduced "MARS" (Multiple Active Result Sets), a method of allowing usage

of database connections for multiple purposes.
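A hedged C# sketch of MARS: with MultipleActiveResultSets=True in the connection string, a second command can execute while a data reader is still open on the same connection (the server, database and table names below are illustrative):

using System.Data.SqlClient;

string connStr = "Data Source=.;Initial Catalog=WebLogDB;" +
                 "Integrated Security=True;MultipleActiveResultSets=True";
using (SqlConnection conn = new SqlConnection(connStr))
{
    conn.Open();
    SqlCommand pages  = new SqlCommand("SELECT PageUrl FROM AccessLog", conn);
    SqlCommand counts = new SqlCommand("SELECT COUNT(*) FROM AccessLog", conn);

    using (SqlDataReader reader = pages.ExecuteReader())
    {
        // Without MARS this second command would fail while the reader is still open
        object totalHits = counts.ExecuteScalar();
    }
}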

Architecture

Protocol layer

Protocol layer implements the external interface to SQL Server. All operations that can be

invoked on SQL Server are communicated to it via a Microsoft-defined format, called Tabular

Data Stream (TDS). TDS is an application layer protocol, used to transfer data between a

database server and a client. Initially designed and developed by Sybase Inc. for their Sybase

SQL Server relational database engine in 1984, and later by Microsoft in Microsoft SQL Server,

TDS packets can be encased in other physical transport dependent protocols, including TCP/IP,

Named pipes, and Shared memory. Consequently, access to SQL Server is available over these

protocols. In addition, the SQL Server API is also exposed over web services.

Data storage

The main unit of data storage is a database, which is a collection of tables with typed columns.

SQL Server supports different data types, including primary types such as Integer, Float,

Decimal, Char (including character strings), Varchar (variable length character strings), binary

(for unstructured blobs of data), Text (for textual data) among others. The rounding of floats to

integers uses either Symmetric Arithmetic Rounding or Symmetric Round Down (Fix) depending

on arguments: SELECT Round(2.5, 0) gives 3.

Microsoft SQL Server also allows user-defined composite types (UDTs) to be defined and used.

It also makes server statistics available as virtual tables and views (called Dynamic Management

Views or DMVs). In addition to tables, a database can also contain other objects including views,

stored procedures, indexes and constraints, along with a transaction log. A SQL Server database

can contain a maximum of 2^31 objects, and can span multiple OS-level files with a maximum file

size of 2^20 TB. The data in the database are stored in primary data files with an extension .mdf.

Secondary data files, identified with an .ndf extension, are used to store optional metadata. Log

files are identified with the .ldf extension.

Storage space allocated to a database is divided into sequentially numbered pages, each 8 KB in

size. A page is the basic unit of I/O for SQL Server operations. A page is marked with a 96-byte

header which stores metadata about the page including the page number, page type, free space

on the page and the ID of the object that owns it. Page type defines the data contained in the

page - data stored in the database, index, allocation map which holds information about how

pages are allocated to tables and indexes, change map which holds information about the

changes made to other pages since last backup or logging, or contain large data types such as

image or text. While page is the basic unit of an I/O operation, space is actually managed in

terms of an extent which consists of 8 pages. A database object can either span all 8 pages in an

extent ("uniform extent") or share an extent with up to 7 more objects ("mixed extent"). A row in a

database table cannot span more than one page, so is limited to 8 KB in size. However, if the

data exceeds 8 KB and the row contains Varchar or Varbinary data, the data in those columns

are moved to a new page (or possibly a sequence of pages, called an Allocation unit) and

replaced with a pointer to the data.

For physical storage of a table, its rows are divided into a series of partitions (numbered 1 to n).

The partition size is user defined; by default all rows are in a single partition. A table is split into

multiple partitions in order to spread a database over a cluster. Rows in each partition are stored

in either B-tree or heap structure. If the table has an associated index to allow fast retrieval of

rows, the rows are stored in-order according to their index values, with a B-tree providing the

index. The data is stored in the leaf nodes of the B-tree, with the other nodes storing the index values for the

leaf data reachable from the respective nodes. If the index is non-clustered, the rows are not

sorted according to the index keys. An indexed view has the same storage structure as an

indexed table. A table without an index is stored in an unordered heap structure. Both heaps and

B-trees can span multiple allocation units.

Buffer management

SQL Server buffers pages in RAM to minimize disc I/O. Any 8 KB page can be buffered in-memory, and the set of all pages currently buffered is called the buffer cache. The amount of

memory available to SQL Server decides how many pages will be cached in memory. The buffer

cache is managed by the Buffer Manager. Either reading from or writing to any page copies it to


the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than

the on-disc version. The page is updated on the disc by the Buffer Manager only if the in-memory

cache has not been referenced for some time. While writing pages back to disc, asynchronous

I/O is used whereby the I/O operation is done in a background thread so that other operations do

not have to wait for the I/O operation to complete. Each page is written along with its checksum

when it is written. When reading the page back, its checksum is computed again and matched

with the stored version to ensure the page has not been damaged or tampered with in the

meantime.

Logging and Transaction

SQL Server ensures that any change to the data is ACID-compliant, i.e., it uses transactions to

ensure that any operation either totally completes or is undone if it fails, but never leaves the

database in an intermediate state. Using transactions, a sequence of actions can be grouped

together, with the guarantee that either all actions will succeed or none will. SQL Server

implements transactions using a write-ahead log. Any changes made to any page will update the

in-memory cache of the page, simultaneously all the operations performed will be written to a log,

along with the transaction ID which the operation was a part of. Each log entry is identified by an

increasing Log Sequence Number (LSN), which ensures that no event overwrites another. SQL

Server ensures that the log will be written onto the disc before the actual page is written back.

This enables SQL Server to ensure integrity of the data, even if the system fails. If both the log

and the page were written before the failure, the entire data is on persistent storage and integrity

is ensured. If only the log was written (the page was either not written or not written completely),

then the actions can be read from the log and repeated to restore integrity. If the log wasn't

written then integrity is also maintained although the database state remains unchanged as if the

transaction never occurred. If it was only partially written, then the actions associated with the

unfinished transaction are discarded. Since the log was only partially written, the page is

guaranteed to have not been written, again ensuring data integrity. Removing the unfinished log

entries effectively undoes the transaction. SQL Server ensures consistency between the log and

the data every time an instance is restarted.
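As a hedged sketch from the application side (the connection string and the AccessLog table are illustrative), a group of changes can be committed or rolled back as one unit, relying on the write-ahead logging described above:

using System.Data.SqlClient;

string connStr = "Data Source=.;Initial Catalog=WebLogDB;Integrated Security=True";
using (SqlConnection conn = new SqlConnection(connStr))
{
    conn.Open();
    SqlTransaction tx = conn.BeginTransaction();
    try
    {
        SqlCommand cmd = new SqlCommand(
            "INSERT INTO AccessLog (UserId, PageUrl) VALUES (@u, @p)", conn, tx);
        cmd.Parameters.AddWithValue("@u", 42);
        cmd.Parameters.AddWithValue("@p", "/Default.aspx");
        cmd.ExecuteNonQuery();

        tx.Commit();     // the log record and the data change become permanent together
    }
    catch
    {
        tx.Rollback();   // the logged actions are undone; the database stays consistent
        throw;
    }
}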

Concurrency and locking

SQL Server allows multiple clients to use the same database concurrently. As such, it needs to

control concurrent access to shared data, to ensure data integrity - when multiple clients update


the same data, or clients attempt to read data that is in the process of being changed by another

client. SQL Server provides two modes of concurrency control: pessimistic concurrency and

optimistic concurrency. When pessimistic concurrency control is being used, SQL Server controls

concurrent access by using locks. Locks can be either shared or exclusive. Exclusive lock grants

the user exclusive access to the data - no other user can access the data as long as the lock is

held. Shared locks are used when some data is being read - multiple users can read from data

locked with a shared lock, but not acquire an exclusive lock. The latter would have to wait for all

shared locks to be released. Locks can be applied on different levels of granularity - on entire

tables, pages, or even on a per-row basis on tables. For indexes, it can either be on the entire

index or on index leaves. The level of granularity to be used is defined on a per-database basis

by the database administrator. While a fine grained locking system allows more users to use the

table or index simultaneously, it requires more resources, so it does not automatically translate into a

higher-performing solution. SQL Server also includes two more lightweight mutual exclusion

solutions - latches and spinlocks - which are less robust than locks but are less resource

intensive. SQL Server uses them for DMVs and other resources that are usually not busy. SQL

Server also monitors all worker threads that acquire locks to ensure that they do not end up in

deadlocks - in case they do, SQL Server takes remedial measures, which in many cases is to kill

one of the threads entangled in a deadlock and roll back the transaction it started. To implement

locking, SQL Server contains the Lock Manager. The Lock Manager maintains an in-memory

table that manages the database objects and locks, if any, on them along with other metadata

about the lock. Access to any shared object is mediated by the lock manager, which either grants

access to the resource or blocks it.

SQL Server also provides the optimistic concurrency control mechanism, which is similar to the

multiversion concurrency control used in other databases. The mechanism allows a new version

of a row to be created whenever the row is updated, as opposed to overwriting the row, i.e., a

row is additionally identified by the ID of the transaction that created the version of the row. Both

the old as well as the new versions of the row are stored and maintained, though the old versions

are moved out of the database into a system database identified as Tempdb. When a row is in the

process of being updated, any other requests are not blocked (unlike locking) but are executed

on the older version of the row. If the other request is an update statement, it will result in two

different versions of the rows - both of them will be stored by the database, identified by their

respective transaction IDs.
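
Snapshot isolation must first be enabled on the database before readers can use row versions instead of locks. The sketch below, assuming a database named FapDb and the Product table from the later schema, shows one way a client could opt in from ADO.NET; it is an illustration, not code from this project.

using System.Data;
using System.Data.SqlClient;

class SnapshotDemo
{
    static void Main()
    {
        string connStr = "Data Source=.;Initial Catalog=FapDb;Integrated Security=True";
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();
            // One-time setup: let readers see row versions kept in tempdb.
            new SqlCommand("ALTER DATABASE FapDb SET ALLOW_SNAPSHOT_ISOLATION ON", conn).ExecuteNonQuery();

            // A reader running under snapshot isolation is not blocked by concurrent writers.
            using (SqlTransaction tx = conn.BeginTransaction(IsolationLevel.Snapshot))
            {
                SqlCommand query = new SqlCommand(
                    "SELECT Prod_qty FROM Product WHERE Prod_no = 'P0001'", conn, tx);
                object qty = query.ExecuteScalar();
                tx.Commit();
            }
        }
    }
}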


Data retrieval

The main mode of retrieving data from an SQL Server database is querying for it. The query is

expressed using a variant of SQL called T-SQL, a dialect Microsoft SQL Server shares with

Sybase SQL Server due to its legacy. The query declaratively specifies what is to be retrieved. It

is processed by the query processor, which figures out the sequence of steps that will be

necessary to retrieve the requested data. The sequence of actions necessary to execute a query

is called a query plan. There might be multiple ways to process the same query. For example, for

a query that contains a join statement and a select statement, executing join on both the tables

and then executing select on the results would give the same result as selecting from each table

and then executing the join, but result in different execution plans. In such case, SQL Server

chooses the plan that is supposed to yield the results in the shortest possible time. This is called

query optimization and is performed by the query processor itself.

SQL Server includes a cost-based query optimizer which tries to optimize on the cost, in terms of

the resources it will take to execute the query. Given a query, the query optimizer looks at the

database schema, the database statistics and the system load at that time. It then decides which

sequence to access the tables referred in the query, which sequence to execute the operations

and what access method to be used to access the tables. For example, if the table has an

associated index, whether the index should be used or not - if the index is on a column which is

not unique for most of the rows (low "selectivity"), it might not be worthwhile to use the index

to access the data. Finally, it decides whether to execute the query concurrently or not. While a

concurrent execution is more costly in terms of total processor time, the fact that the execution is actually split across different processors might mean it finishes faster. Once a query plan is

generated for a query, it is temporarily cached. For further invocations of the same query, the

cached plan is used. Unused plans are discarded after some time.
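
One practical consequence of plan caching is that queries issued with parameters are easier for the server to match against an already cached plan than queries whose literal values change on every call. A small hedged sketch, with an assumed WebAccess table, of how client code could issue such a query:

using System.Data.SqlClient;

class PlanReuseDemo
{
    static int CountSessions(SqlConnection conn, string userName)
    {
        // The command text is identical for every call; only the parameter value
        // changes, so the cached query plan can be reused instead of recompiled.
        SqlCommand cmd = new SqlCommand(
            "SELECT COUNT(*) FROM WebAccess WHERE User_Name = @user", conn);
        cmd.Parameters.AddWithValue("@user", userName);
        return (int)cmd.ExecuteScalar();
    }
}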

SQL Server also allows stored procedures to be defined. Stored procedures are parameterized

T-SQL queries, that are stored in the server itself (and not issued by the client application as is

the case with general queries). Stored procedures can accept values sent by the client as input


parameters, and send back results as output parameters. They can call defined functions, and

other stored procedures, including the same stored procedure (up to a set number of times).
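
A stored procedure is invoked from a client by name rather than by sending the query text. The sketch below calls a hypothetical procedure named GetVisitCount with one input and one output parameter; the procedure itself is not part of this project's schema and is shown only to illustrate the parameter mechanism.

using System.Data;
using System.Data.SqlClient;

class StoredProcDemo
{
    static int GetVisitCount(SqlConnection conn, string userName)
    {
        SqlCommand cmd = new SqlCommand("GetVisitCount", conn);
        cmd.CommandType = CommandType.StoredProcedure;

        // Input parameter supplied by the client.
        cmd.Parameters.AddWithValue("@user", userName);

        // Output parameter filled in by the server.
        SqlParameter visits = cmd.Parameters.Add("@visits", SqlDbType.Int);
        visits.Direction = ParameterDirection.Output;

        cmd.ExecuteNonQuery();
        return (int)visits.Value;
    }
}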

SQL CLR

Microsoft SQL Server 2005 includes a component named SQL CLR ("Common Language

Runtime") via which it integrates with .NET Framework. Unlike most other applications that

use .NET Framework, SQL Server itself hosts the .NET Framework runtime, i.e., memory,

threading and resource management requirements of .NET Framework are satisfied by SQLOS

itself, rather than the underlying Windows operating system. SQLOS provides deadlock detection

and resolution services for .NET code as well. With SQL CLR, stored procedures and triggers

can be written in any managed .NET language, including C# and VB.NET. Managed code can

also be used to define UDT's (user defined types), which can persist in the database. Managed

code is compiled to .NET assemblies and after being verified for type safety, registered at the

database. After that, they can be invoked like any other procedure. However, only a subset of the

Base Class Library is available, when running code under SQL CLR. Most APIs relating to user

interface functionality are not available.

When writing code for SQL CLR, data stored in SQL Server databases can be accessed using

the ADO.NET APIs like any other managed application that accesses SQL Server data.

However, doing that creates a new database session, different from the one in which the code is

executing. To avoid this, SQL Server provides some enhancements to the ADO.NET provider

that allows the connection to be redirected to the same session which already hosts the running

code. Such connections are called context connections and are set by setting context connection

parameter to true in the connection string. SQL Server also provides several other enhancements

to the ADO.NET API, including classes to work with tabular data or a single row of data as well

as classes to work with internal metadata about the data stored in the database. It also provides

access to the XML features in SQL Server, including XQuery support. These enhancements are

also available in T-SQL procedures as a consequence of the introduction of the new XML datatype (with its query, value and nodes functions).
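
The fragment below is a minimal sketch of a SQL CLR stored procedure that uses a context connection; the WebAccess table name is an assumption, and the class would have to be compiled into an assembly and registered with the database before it could be called.

using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public class ClrProcedures
{
    [SqlProcedure]
    public static void TopSequences()
    {
        // "context connection=true" reuses the session that invoked the procedure
        // instead of opening a new connection (and a new session) to the server.
        using (SqlConnection conn = new SqlConnection("context connection=true"))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                "SELECT TOP 10 User_Name, Web_Access_Sequence_Path FROM WebAccess", conn);
            // Stream the result set straight back to the caller.
            SqlContext.Pipe.ExecuteAndSend(cmd);
        }
    }
}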


ALGORITHM


1.1 Introduction

Today Web Mining is one of the popular issues in data mining. Web Mining can be broadly

defined as the discovery and analysis of useful information from the World Wide Web. [1] In terms of its research domains, web mining can be divided into three branches: Web Content Mining, Web Structure Mining and Web Usage Mining. Web content mining describes the discovery of useful information from the web content/data/documents. The web content encompasses a broad range of data such as text, image, audio and video. Web structure mining is the process of discovering

knowledge from the World Wide Web organization and links across the webs. Web usage mining

is the process of extracting interesting patterns or knowledge from various web access log

records.

During the process of web usage mining, the rules and patterns in web log records are explored

and analyzed mainly by means of these techniques relating to artificial intelligence, data mining,

database theory and so on. In general, web usage mining consists of three phases, namely data

preprocessing, pattern discovery and pattern analysis, as shown in Figure 1-1. The main purpose of

data preprocessing is to extract useful data from raw web log and then to map these data into the

abstract data form necessary for pattern discovery; Pattern discovery finds out various rules and

patterns, taking advantage of all sorts of data mining techniques such as statistical analysis,

association rule analysis, classification, clustering, and sequential pattern analysis and so on;

Pattern analysis filters out the “useless” rules discovered in the period of pattern discovery, and

then extracts the “interesting” rules and patterns for the end users.

In this paper, mining web access log records discovers different access patterns of different

users. In this way, web designers and maintainers can easily improve the link structure of web

pages, enhance web server performance and provide characteristic service for customers. For

example, corporations that deal with e-commerce can use this method to optimize customer

strategy, discover target customers, and let all kinds of customers realize their excellent and


characteristic services. Consequently, the mining of web user access patterns can be widely applied in many industries.

Figure 1-1 Web Usage Mining

2. Problem Description: User access pattern is a concept based on the user access path,

which is an important sequential pattern in web usage mining. After preprocessing web server

log data, we can get user access paths from user session files. Figure 2-1 shows the user's (Uid1) browsing path through which Uid1 accesses a certain web site: A-B-C-D-B-G-E-H-G-C-A-I-K-I-D.

Figure 2-1. The User's (Uid1) Browsing Path


The link structure between web pages can be mapped into a directed graph: Sitemap=(N, E).

N = {p1, p2, …, pi, …, pn} (n = 1, 2, …) denotes the collection of all pages of the web site, 1 ≤ i ≤ n, and |N| (= n) denotes the number of pages. E = {<p1, p2>, <p4, p7>, …, <pi, pj>, …} denotes the set of all the hyperlinks of all pages of the web site. <pi, pj> denotes that there is a hyperlink from page pi to pj, i ≠ j, 1 ≤ i, j ≤ n.

Definition 1. ∀ xi, xi+1 ∈ N (1 ≤ i ≤ k-1), <xi, xi+1> ∈ E, the duple AP = (userid, {x1, x2, …, xi, …, xk-1, xk}) (k ≥ 2) is called userid's access pattern.

Example 1. In Figure 2-1, (Uid1, {A, B, C, D}) is one of Uid1's access patterns.

The difference between the user access pattern defined here and the user access path is that the access

pattern is always the forward reference of access path, whereas user access path includes

forward and backward references of web pages. For instance, the access path of page H shown

in Figure 2-1 is A-B-C-D-B-G-E-H, whereas the corresponding access pattern is APH = (Uid1, {A, B,

G, E, H}). So, it is concluded that access pattern presents users' browsing behavior more simply

and clearly than access path and it facilitates the analysis and mining of users' access laws as

well.

Definition 2. ∀ x ∈ N, the duple AP = (userid, {x}) is called userid's meta-access pattern.

Example 2. In Figure 2-1, (Uid1, {A}) is one of Uid1's meta-access patterns.

Definition 3. If access pattern AP = (userid, {x1, x2, …, xi, …, xs-1, xs}) is known, we say that {x1, x2, …, xi, …, xk-1, xk} (k ≤ s) is a length-k base of AP.

Example 3. {A, B}, {B, C}, and {C, D} are the length-2 bases of AP = (Uid1, {A, B, C, D}); {A, B, C} and {B, C, D} are the length-3 bases of AP = (Uid1, {A, B, C, D}).
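
Since a length-k base is simply a run of k consecutive pages of the pattern, it can be enumerated with a sliding window. The C# sketch below (illustrative helper names, not project code) reproduces Example 3 when called with {A, B, C, D} and k = 2.

using System;
using System.Collections.Generic;

class AccessPatternBases
{
    // Enumerate the length-k bases of an access pattern, i.e. every run of
    // k consecutive pages, as in Definition 3.
    static List<string[]> LengthKBases(string[] pages, int k)
    {
        List<string[]> bases = new List<string[]>();
        for (int start = 0; start + k <= pages.Length; start++)
        {
            string[] b = new string[k];
            Array.Copy(pages, start, b, 0, k);
            bases.Add(b);
        }
        return bases;
    }

    static void Main()
    {
        foreach (string[] b in LengthKBases(new string[] { "A", "B", "C", "D" }, 2))
            Console.WriteLine(string.Join("-", b));   // prints A-B, B-C, C-D
    }
}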

Definition 4. Given AP1 = (userid1, {x1, x2, …, xi, …, xs-1, xs}) and AP2 = (userid2, {y1, y2, …, yj, …, yt-1, yt}) (1 ≤ i ≤ s, 1 ≤ j ≤ t, s ≤ t), iff userid1 = userid2, x1 = yj, x2 = yj+1, …, xi = yj+i-1, …, xs = yj+s-1, we say that AP1 is a sub-pattern of AP2, which is denoted as AP1 ⊆ AP2.


Definition 5. If access pattern AP = (userid, {x1, x2, …, xi, …, xk-1, xk}) (k = 2, 3, …) is known and, for every xs ∈ N, there does not exist an access pattern (userid, {x1, x2, …, xi, …, xk-1, xk, xs}), then AP is called userid's longest access pattern.

Definition 6. If it is known that access pattern AP = (userid, {x1, x2, …, xi, …, xk-1, xk}) (k = 2, 3, …), we say that the minimal access count among the access counts of all pages contained in AP is the access count of AP, which is denoted as [AP].

Definition 7. If it is known that access pattern AP = (userid, {x1, x2, …, xi, …, xk-1, xk}) (k = 2, 3, …), then Sup_AP = [AP] / [Sup_userid], where [Sup_userid] presents the total of the access counts of all the longest access patterns whose user name is userid.

Definition 8.

(I) For an access pattern AP = (userid, {x1, x2, …, xi, …, xk-1, xk}) (k = 2, 3, …), if Sup_AP ≥ min_sup (the threshold of support) and the number of pages contained in AP is n, we say that AP is a frequent length-n access pattern. All the sub-patterns of a frequent access pattern are also frequent access patterns.

(II) For the longest access pattern AP = (userid, {x1, x2, …, xi, …, xk-1, xk}) (k = 2, 3, …), if Sup_AP ≥ min_sup, we say that AP is the longest frequent access pattern. This paper emphatically

discusses how to mine frequent access patterns that satisfy minimum support (minimum support

is the minimal threshold of support defined by experience.). Moreover, it must be indicated that

the support of access pattern defined here is different from the concept of support defined in

association rule. The support of association rule represents the relation among those items in the

same item set, whereas the support of access pattern reflects universality of certain access

pattern in the entire users' browsing behaviors. Consequently, it is necessary to distinguish

between the two similar concepts.
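
Under the reading of Definitions 6 and 7 given above, computing the support of a candidate pattern only needs the per-page access counts and the total count of the user's longest patterns. A small hedged sketch (the method and variable names are illustrative):

using System.Collections.Generic;

class SupportSketch
{
    // Definition 6: the access count of a pattern is the minimum access count
    // of the pages it contains.
    static int PatternCount(string[] pattern, Dictionary<string, int> pageCounts)
    {
        int min = int.MaxValue;
        foreach (string page in pattern)
            if (pageCounts[page] < min) min = pageCounts[page];
        return min;
    }

    // Definition 7 (as reconstructed above): support is the pattern's access
    // count divided by the total access count of the user's longest patterns.
    static double Support(string[] pattern, Dictionary<string, int> pageCounts, int supUserid)
    {
        return (double)PatternCount(pattern, pageCounts) / supUserid;
    }
}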

3. FAP-Mining Algorithm


FP-growth is an algorithm with good functionality when it is used in mining association rules and

sequential patterns. However, there is no ordering among the elements of an itemset in association rule mining, whereas access pattern mining requires a sequential order of page accesses. Thus FP-growth has to be revised before being applied to mining users' frequent access patterns. In this paper, the

new algorithm is called Frequent Access Pattern Mining (FAP-Mining). The FAP-Mining is divided

into two steps. Section 3.1 is Step One, which constructs frequent access pattern tree (FAP tree)

according to access paths derived from user session files, and records the access counts of each

page. Section 3.2 is Step Two, where the function of FAP-growth is used to mine both long and

short access patterns on the FAP tree.

3.1 The Construction of FAP-Tree

Algorithm: FAP_Tree(tree, p). Construct frequent access tree.

Input: The set of user access path p.

Output: The set of user access patterns.

Procedure FAP-Tree(T, p);

{

create-tree(T);

/Construct the root of FAP-Tree signed with ”null” /

while p<>nil do

{

if p.name is the same as the name of T’s ancestor (n) then

{

n.count:=n.count+ 1 ;

T:=n;

}

else

if p.name is the same as the name of T’s child(c) then

{

c. count :=c.count+ 1 ;

T:=c;


}else

insert-tree(T, p);

/insert the new node of p into T, as a child of the current node /

p:=p.next;

}

}

In order to facilitate frequent access pattern generation and FAP tree traversal, a page header

table, sorted in ascending order of the access counts of each page, is built so that each page points to its occurrences in the tree via a chain of node-links. Table 3-1 shows an episode of the access paths of a certain user

contained in the user session file. According to access paths in Table 3-1, the function of FAP-

Tree constructs frequent access tree shown in Figure 3-1.

Figure 3-1 FAP Tree
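
The FAP_Tree procedure above can be mirrored almost line for line in the project's implementation language. The following C# sketch is an illustration of the node structure and of the ancestor/child/insert cases, with illustrative class and member names rather than the project's actual code.

using System.Collections.Generic;

class FapNode
{
    public string Name;
    public int Count;
    public FapNode Parent;
    public List<FapNode> Children = new List<FapNode>();
}

class FapTreeBuilder
{
    // Insert one access path into the tree: climb back to a matching ancestor,
    // descend into a matching child, or create a new child node.
    public static void Insert(FapNode root, IList<string> path)
    {
        FapNode current = root;   // root is the "null"-labelled node
        foreach (string page in path)
        {
            FapNode ancestor = FindAncestor(current, page);
            if (ancestor != null) { ancestor.Count++; current = ancestor; continue; }

            FapNode child = current.Children.Find(delegate(FapNode c) { return c.Name == page; });
            if (child != null) { child.Count++; current = child; continue; }

            FapNode node = new FapNode();
            node.Name = page;
            node.Count = 1;
            node.Parent = current;
            current.Children.Add(node);
            current = node;
        }
    }

    static FapNode FindAncestor(FapNode node, string page)
    {
        for (FapNode a = node; a != null; a = a.Parent)
            if (a.Name == page) return a;
        return null;
    }
}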

3.2. FAP-growth


The mining of the FAP tree proceeds as follows. At first, according to the access counts of each page-node in the FAP tree, the function FAP-growth generates each length-1 frequent access pattern (as an initial suffix), then constructs its prefix pattern base, and finally builds a longer access pattern by connecting every prefix pattern base with its suffix. If the longer access pattern satisfies min_sup, it becomes a new frequent access pattern. FAP-growth repeats this procedure until all frequent length-1 access patterns have been processed. In short, the FAP-growth method transforms the problem of mining long frequent access patterns into searching for shorter ones (as initial suffixes) and then concatenating the suffixes.

Algorithm: FAP-growth(tree, a). Mine frequent access patterns.

Input: FAP tree, min_sup = 2 (the minimal access count that satisfies the support threshold).

Output: the set of all the frequent access patterns: a.

Procedure FAP-growth(tree, a);
{
  for each ai with ai.count >= min_sup do
  / ai is a member of the page header table /
  {
    generate access pattern p = ai;
    a := a ∪ p;
    p := ai.next;
    / p points to the first location of ai in the FAP tree /
    while (p <> null) and (p.count >= min_sup) do
    {
      look for ai's prefix access pattern base,
      then construct access pattern βi by connecting ai's
      prefix access pattern base with ai itself;
      if Sup_βi >= min_sup then
        a := a ∪ βi;
      p := p.next;
      / p points to the next location of ai in the FAP tree /
    }
  }
}

The mining of the FAP tree shown in Figure 3-1 is summarized in Table 3-2. Taking the page-node D in the page header table as an example, we illustrate the mining process of FAP-growth(tree, a):

(1) D.count ≥ min_sup, so generate the frequent length-1 access pattern p = (Userid, {D}), and a := a ∪ p;

(2) p (= D.next) points to the first location of the page-node D in the FAP tree, which lies in the branch A-B-C-D;

(3) p.count (= 2) ≥ min_sup, so construct the set of D's prefix access pattern bases ({C}:3, {B, C}:3, {A, B, C}:3), then build the access pattern β by connecting D's prefix access pattern base with D (shown in Figure 3-1). If Sup_β ≥ min_sup, β is a frequent access pattern.

(4) p (= p.next) points to the next location of the page-node D in the FAP tree, which lies in another branch; because p.count (= 1) does not satisfy min_sup, the function does not mine this branch. The function then proceeds to the next page-node in the page header table once the page-node D has been processed.
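
The prefix-base step used in (3) above is easy to express directly on the tree nodes. The sketch below is an illustration only (it assumes a simple node type like the one sketched in Section 3.1, not the project's actual classes): it reads the branch above one occurrence of the suffix page and concatenates it with the suffix to form a candidate pattern, which is kept only if the occurrence count reaches min_sup.

using System.Collections.Generic;

class FapGrowthSketch
{
    // Minimal node type assumed for this illustration.
    class Node { public string Name; public int Count; public Node Parent; }

    // Build a candidate access pattern from one occurrence of a suffix page:
    // its prefix access pattern base (the pages on the branch above it)
    // connected with the suffix page itself.
    static List<string> CandidateFromOccurrence(Node occurrence)
    {
        List<string> pattern = new List<string>();
        for (Node a = occurrence.Parent; a != null && a.Name != null; a = a.Parent)
            pattern.Insert(0, a.Name);      // prefix access pattern base
        pattern.Add(occurrence.Name);       // ... connected with its suffix
        return pattern;
    }

    // The candidate is kept as a frequent access pattern only if this
    // occurrence's count reaches the support threshold.
    static bool IsFrequent(Node occurrence, int minSup)
    {
        return occurrence.Count >= minSup;
    }
}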

4. Conclusion


The mining of frequent access patterns from users' web browsing behaviour is worth applying to many fields such as the design and maintenance of web sites, e-commerce and education. The FAP-Mining method proposed in this paper is a feasible way of extracting users' access patterns from users' access paths on a certain web site. If improved, this method could be widely applied in many fields. Next, we will use a large amount of data to test the functionality of this method, and make further exploration of the analysis of association rules and access patterns in users' browsing behaviour.


SCOPE


Web Usage Mining

Web usage mining is the type of Web mining activity that involves the automatic discovery of user

access patterns from one or more Web servers. As more organizations rely on the Internet and

the World Wide Web to conduct business, the traditional strategies and techniques for market

analysis need to be revisited in this context. Organizations often generate and collect large

volumes of data in their daily operations. Most of this information is usually generated

automatically by Web servers and collected in server access logs. Other sources of user

information include referrer logs, which contain information about the referring pages for each

page reference, and user registration or survey data gathered via tools such as CGI scripts.

Analyzing such data can help these organizations to determine the lifetime value of customers,

cross marketing strategies across products, and effectiveness of promotional campaigns, among

other things. Analysis of server access logs and user registration data can also provide valuable

information on how to better structure a Web site in order to create a more effective presence for

the organization. In organizations using intranet technologies, such analysis can shed light on

more effective management of workgroup communication and organizational infrastructure.

Finally, for organizations that sell advertising on the World Wide Web, analyzing user access

patterns helps in targeting ads to specific groups of users.

Most of the existing Web analysis tools provide mechanisms for reporting user activity in the

servers and various forms of data filtering. Using such tools, for example, it is possible to

determine the number of accesses to the server and the individual files within the organization's

Web space, the times or time intervals of visits, and domain names and the URLs of users of the

Web server. However, in general, these tools are designed to handle low- to moderate-traffic

servers, and furthermore, they usually provide little or no analysis of data relationships among the

accessed files and directories within the Web space.


1.Pros

Web mining essentially has many advantages which makes this technology attractive to

corporations including the government agencies. This technology has enabled ecommerce to do

personalized marketing, which eventually results in higher trade volumes. The government

agencies are using this technology to classify threats and fight against terrorism. The predicting

capability of the mining application can benefit society by identifying criminal activities. The

companies can establish better customer relationship by giving them exactly what they need.

Companies can understand the needs of the customer better and they can react to customer

needs faster. The companies can find, attract and retain customers; they can save on production

costs by utilizing the acquired insight of customer requirements. They can increase profitability by

target pricing based on the profiles created. They can even identify a customer who might defect to a competitor; the company can then try to retain that customer by providing promotional offers, thus reducing the risk of losing the customer.

2.Cons

Web mining the technology itself doesn’t create issues, but this technology when used on data of

personal nature might cause concerns. The most criticized ethical issue involving web mining is

the invasion of privacy. Privacy is considered lost when information concerning an individual is

obtained, used, or disseminated, especially if this occurs without their knowledge or consent. The

obtained data will be analyzed, and clustered to form profiles; the data will be made anonymous

before clustering so that no individual can be linked directly to a profile. But usually the group

profiles are used as if they are personal profiles. Thus these applications de-individualize the

users by judging them by their mouse clicks. De-individualization, can be defined as a tendency

of judging and treating people on the basis of group characteristics instead of on their own

individual characteristics and merits. This trend has increased the amount of data being captured

and traded, increasing the likelihood of one's privacy being invaded. The companies which buy the data are obliged to make it anonymous, and these companies are considered the authors of any

specific release of mining patterns. They are legally responsible for the contents of the release;

any inaccuracies in the release will result in serious lawsuits, but there is no law preventing them

from trading the data.

Some mining algorithms might use controversial attributes like sex, race, religion, or sexual

orientation to categorize individuals. These practices might be against the anti-discrimination


legislation. The applications make it hard to identify the use of such controversial attributes, and

there is no strong rule against the usage of such algorithms with such attributes. This process

could result in denial of service or a privilege to an individual based on his race, religion or sexual

orientation, right now this situation can be avoided by the high ethical standards maintained by

the data mining company. The collected data is being made anonymous so that, the obtained

data and the obtained patterns cannot be traced back to an individual. It might look as if this

poses no threat to one's privacy, but in fact much additional information can be inferred by the application by combining separate, seemingly innocuous pieces of data about the user.


HARDWARE AND SOFTWARE REQUIREMENTS


HARDWARE REQUIREMENTS

1. Processor Requirement :-
   a. Minimal:- Intel Pentium IV 2.0 GHz or similar

2. Primary Memory Requirement :-
   a. Minimal:- 256 MB 400 MHz SDRAM

3. Secondary Memory Requirement :-
   a. Minimal:- 15 MB

SOFTWARE REQUIREMENTS

1. Operating System

a. Preferable:- Windows XP Service Pack 2 or Higher

2. Visual Studio 2005

3. .NET Framework 2.0

4. SQL Server 2005


SYSTEM ANALYSIS AND DESIGN


REQUIREMENT ANALYSIS

FUNCTIONAL & NON-FUNCTIONAL REQUIREMENTS

Requirements are also divided in these two groups. These requirements are: -

1. FUNCTIONAL REQUIREMENT: - A functional requirement describes the interaction between the system and its environment, i.e., the relationship between the inputs and outputs of the system. Functional requirements specify which output should be produced from a given input. For each functional requirement, a detailed description of all the data inputs and their source, the units of measure, and the range of valid inputs must be specified.

For our project, the functional requirements may be stated as follows:

1 People: In this project four people are involved in project development, and the project guide continually monitors as well as guides us in developing the project.

2 Hardware Requirements:

Processor Requirement :-
 o Minimal:- Intel Pentium IV 2.0 GHz or similar
 o Optimal:- Intel Core 2 Duo 2.0 GHz or similar

Primary Memory Requirement :-
 o Minimal:- 256 MB 400 MHz SDRAM
 o Optimal:- 1 GB 800 MHz DDR2 RAM

Secondary Memory Requirement :-
 o Minimal:- 15 MB
 o Optimal:- 25 MB


3 Software Requirements:

Operating System

o Preferable:- Windows XP Service Pack 2 or Higher

Visual Studio 2005

.NET Framework 2.0

SQL Server 2005

2. NON-FUNCTIONAL REQUIREMENT: - Non-functional requirements specify limits, meaning a restriction is attached to the requirement and one has to satisfy that limit.

1. Time: The project has been completed within two and a half months.

2. Input: The software registers new students' data into the database and/or updates their records when the administrator enters a password matching the Scholar Number.

3. GENERAL TYPES OF REQUIREMENTS : -

1. Physical Environment: Physical environment requirements, such as where the equipment is to be located and whether there are any environmental restrictions like temperature, humidity or magnetic interference, are to be gathered prior to the development phase of the system. No special physical requirements are needed in our project.

2. Interface: Interface requirements, such as whether the input comes from one or more systems, whether the output goes to one or more systems, and whether there is any prescribed medium the data must use, should be gathered. In our project the input comes from the user as text and is stored in the database.

3. User & Human Factors: User and human factor requirements cover who will use the system, whether there will be different types of users, the skill level of each type of user, what kind of training is required for each user, and how easy it will be for a user to understand and use the system. In this project no special training is required to use the software.

4. Functional: Functional requirements cover what the system will do, when it will do it, whether there are several modes of operation, how and when the system can be changed, and whether there are constraints on speed and response time.

5. Documentation: Documentation requirements cover how much documentation is required and whether it should be online, in book format, or both.

6. Data: Data requirements cover the format of both the input and the output, how accurate they must be, and whether any data should be retained for any period of time.


FEASIBILITY

1) Economic Feasibility

Economic consideration is crucial for any system. These considerations include a broad range of cost-benefit analyses and long-term corporate income and strategies. The cost/benefit analysis proved to be positive for our project.

2) Technical Feasibility:

a) Resource Availability:

Are the hardware and software resources required to develop the application available?

b) Technology:

Do the system designers have the skills and guidance to achieve their goal?

3) Legal Feasibility:-

Legal feasibility considers any infringement, violation or liability that could result from development of the system. There is no law or rule that would be violated by the development of this software.


SOFTWARE SELECTION CRITERIA


DATA MODEL

SOFTWARE ENGINEERING

1.4 SOFTWARE ENGINEERING APPROACH

Software engineering is a layered technology. The foundation for software

engineering is the process layer. Software engineering process is the glue that holds

the technology layers together and enables rational and timely development of

computer software. Process defines a framework that must be established for

effective delivery of software engineering technology. The software process forms

the basis for management control of software projects and establishes the context in

which technical methods are applied, work products (models, documents, data,

reports, forms, etc.) are produced, milestones are established, quality is ensured and

change is properly managed.

Software engineering methods provide the technical "how-to's" for building software.

Methods encompass a broad array of tasks that include communication,

requirements analysis, design modeling, program construction, testing and support.

Software engineering relies on a set of basic principles that govern each area of the

technology and include modeling activities and other descriptive technique.

A process framework establishes the foundation for a complete software process by

identifying a small number of framework activities that are applicable to all software

projects, regardless of their size or complexity. In addition, the process framework

encompasses a set of umbrella activities that are applicable across the entire

software process. Each framework activity is populated by a set of software

engineering actions – a collection of related tasks that produce a major software

engineering work product. The following generic process framework is applicable to

our software project:-

Communication: These framework activities involve heavy communication and

collaboration with the customer (and other stakeholders) and encompass

requirements gathering and other related activities.


Planning: This activity establishes a plan for the software engineering work that

follows. It describes the technical tasks to be conducted, the risks that are likely,

the resources that will be required, the work product to be produced and a work

schedule.

Modeling: This activity encompasses the creation of models that allow the

developer and customer to better understand software requirement and the

design that will achieve those requirements.

Construction: This activity combines code generation (either manual or automated) and the testing that is required to uncover errors in the code.

Deployment: The software (as a complete entity or as a partially completed

increment) is delivered to the customer who evaluates the delivered product and

provides feedback based on the evaluation.


1.4.1 Process Model

Process models define a distinct set of activities, actions, tasks, milestones and

work products that are required to engineer high quality software. These process

models are not perfect, but they do provide a useful roadmap for software

engineering work. Software engineers and their managers adapt a prescriptive

process model to their needs and then follow it. In addition, the people who have

requested the software have a role to play as the process model is followed.

A process model provides stability, control and organization to an activity that can, if left uncontrolled, become quite chaotic. The process models are also referred to as

“Rigorous process models” because they often encompass the capability suggested

by CMMI. The terminology and detail of each process model differ, but the generic

framework activities remain reasonably consistent.

From the point of view of the software engineer, the work products are the

programs, documents and data that are produced as a consequence of the activity

and tasks defined by the process. There are a number of software process

assessment mechanism that enable organizations to determine the “Maturity” of

their software process. However, the quality, timeliness and long term viability of

the product you built are the best indicators of the efficacy of the process that you

use.

The project follows a Simple Incremental Model, as shown below:


Fig 1.2: The SIMPLE INCREMENTAL MODEL of Software Development

It follows the following steps:

Requirement Analysis

It involves gathering information about the existing tools and techniques that can be used for the development of a network-based search engine and deciding which one is the best to follow. It also involves deciding on the right programming language that can be used to implement the algorithm.

Design

It involves drawing up a well-established design consisting of various DFDs, i.e., Data Flow Diagrams. They show how data flows through the system. It

also involves various Use Case and Class Diagrams which show the interaction of

users with the system and behavior of various classes. Many other diagrams such

as sequence diagrams and collaboration diagrams are also drawn.

Coding


This phase involves the actual coding. It is done mostly in Java, but some modules are designed using Visual C# and the user interface is developed using JSP servlets.

The most important part of this phase is the documentation. A well documented

code is prepared.

Testing

After the coding part is over, various test cases are prepared to test the

deliverable web application. It includes various types of testing such as Integration Testing and Alpha Testing.

System Data Flow Diagram


TABLES


1. Product entry table : It includes information about all the items available at our site.

Field Name      Data Type   Width
Prod_no         Character   5
Prod_name       Character   15
Prod_qty        Varchar2    4
Prod_price      Varchar2    7
Prod_cat        Character   10

Product entry table

2. Password entry table : It includes information about the passwords of customers at our site.

Field Name      Data Type   Width
Cust_Id         Character   5
LoginPassword   Varchar2    10
HintQuestion    Varchar2    20
HintAnswer      Varchar2    10

Password entry table


3. Web Access Table : It includes information about all the Navigational Patterns of

Members at our site.

Field Name Data Type Width Key

User_Name Varchar 50 PK

Session_Name Varchar 50

Web_Access_Sequence_Path Varchar 200

Frequent_Subsequence_Path Varchar 200

Web Access Table
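
When a user's session ends, one row following the schema above has to be written into this table. The fragment below is a hedged ADO.NET sketch; the physical table name "WebAccess" is an assumption, since the report lists only the logical name.

using System.Data.SqlClient;

class WebAccessLogger
{
    // Record one finished session: the raw access path and the frequent
    // subsequence computed for it.
    static void SaveSession(SqlConnection conn, string user, string session,
                            string accessPath, string frequentSubsequence)
    {
        SqlCommand cmd = new SqlCommand(
            "INSERT INTO WebAccess (User_Name, Session_Name, " +
            "Web_Access_Sequence_Path, Frequent_Subsequence_Path) " +
            "VALUES (@u, @s, @path, @freq)", conn);
        cmd.Parameters.AddWithValue("@u", user);
        cmd.Parameters.AddWithValue("@s", session);
        cmd.Parameters.AddWithValue("@path", accessPath);
        cmd.Parameters.AddWithValue("@freq", frequentSubsequence);
        cmd.ExecuteNonQuery();
    }
}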

4. BinaryTree Table : It stores the tree built from the navigational patterns of members at our site.

Field Name Data Type Width Key


Tree_Id Number 10 PK

NodeName Varchar 50

Left_Child Number 10

Right_Child Number 10

Node_Count Number 10

BinaryTree Table

For Example:

User_Name   Session_Name   Web_Access_Sequence_Path   Frequent_Subsequence_Path
UserA       S1             pqspr                      pqpr
UserB       S2             tptqrp                     pqrp
UserB       S3             qpqupt                     qpqp
UserA       S4             puqrur                     pqprr


OUTPUT


HOME PAGE

This is the home page of the website, which is accessed by the users and the administrator. It displays links to the various pages of the project.


BOOKS

It is the page which displays information about books.


ELECTRONICS

It is the page which displays information about electronic items.


ADMIN

It is the administrator login page, which displays the login id and password tabs. The administrator can log in to this page to manage the Frequent Access Pattern.


CHANGE PASSWORD

It is the Change Password page, which displays the admin name and password tabs. The administrator can log in to this page to add more admin user ids and passwords.


WAP

The web access pattern page displays the web access patterns stored as a result of user accesses. It allows the administrator to generate the Frequent Access Pattern.


TREE TABLE

It generates the TREE corresponding to the Frequent Access Pattern. It is used to generate data for the cached pages.


WAP TREE

It generates the TREE Pattern corresponding to the Frequent Access Pattern.


CACHED PAGES

It displays the cached pages derived from the frequently accessed page patterns.


LIMITATIONS AND FUTURE ENHANCEMENTS


LIMITATIONS OF THE PROJECT: The project is implemented thoroughly and is of immense practical importance and use. It can be of use to the establishment for better space and cache management.

Even so, the project has some limitations:

1. The project is constrained by the availability of space on the website on which it is loaded.

2. The project requires the server to have a .NET and C# environment.

3. The database must be consistent with the web access pattern.

4. Users must log out of the website to ensure the data is saved in the database, as the project does not support incomplete transactions.

Future Enhancements: The project can be modified to include many features for various other applications.

1. The project can be linked to the Research and Development Department of the enterprise to provide current access patterns of the users.

2. The project can be modified to include online shopping and sales for the company.

3. The database administrator can be linked to the online database to check for space efficiency.


BIBLIOGRAPHY


1. Xidong Wang, Yiming Ouyang, Xuegang Hu, Yan Zhang, "Discovery of User Frequent Access Patterns on Web Usage Mining", School of Computer and Information, Hefei University of Technology, Tunxi Road 193, Hefei, Anhui, China.

2. Elmasri, Navathe, Somayajulu, Gupta, Fundamentals of Database Design, pp. 5-15.

3. M. Morris Mano, Computer System Organisation, pp. 446, 462-469.

4. ASP.NET (NIIT), 9th Edition, Ch. 5, 6, 7.
