The National University of Lesotho
Department of Mathematics and Computer Science
COMPUTER SCIENCE PROJECT
C S 4 4 0 3
2011/12, ACADEMIC YEAR.
Mathematics and Computer Science Digital Library System
MACSDL
June 25th 2012.
Compiled & submitted by:
1. Mosola, N.N – 200800142
2. Koali, M.S – 200800572
3. Senatsi, K.V – 200800535
Supervisor: Mr. L.Poulo
MATHEMATICS AND COMPUTER SCIENCE DIGITAL LIBRARY SYSTEM
ABSTRACT:
The world is evolving rapidly, and the pace of technological change keeps accelerating. In an
information age, free access to information is in high demand, leading people to share what they
have and obtain what they do not possess. Because knowledge is now routinely shared amongst
individuals, such information needs to be managed. The Faculty of Science and Technology is one
of the faculties of the National University of Lesotho and comprises a few departments.
The department of Mathematics and Computer Science (MACS) seeks to have an information
management system, where large pools of information can be stored, accessed freely and
managed adequately. A web-based MACS digital library (DL) system is the answer. MACS DL
will manage shared digital information objects for students and lecturers to enhance learning at
NUL. The system will advance the use of Information and Communication Technology (ICT) on the
NUL campus, where both students and their lecturers need information on a daily basis.
Information retrieval will be at the heart of the system, while a repository of digital objects
is kept.
Following the prototyping process model and the software life cycle, the system will be developed
to address this need. This document discusses all the relevant steps taken while the system is
under development.
ACKNOWLEDGEMENTS
Thanks to the Mathematics and Computer Science department at NUL, the project was indeed an
eye opener; many great lessons were learned from it, and they are surely ones to build on in the
future. Working together on this project has made us a unit, and we hope to work together again.
We were a great team, an incredible team indeed, and with you a new 'Computing' era is born.
Big and ongoing thanks go to our supervisor and thesis advisor, Mr. Lebeko Poulo, for
introducing us to the fascinating subject of 'Digital Libraries (DLs) and Information Retrieval
(IR)'. Even in this four credit hour course, we learned more about DLs and IR systems than we
could have learned in a lifetime in any other field of study. Thank you for giving us a chance to
work on such a fascinating and challenging project! Our warm and kind regards go to the student
union in the MACS department; this project would not have been possible without their
willingness to spare minutes with us during the requirements elicitation, testing and evaluation
phases of this endeavor.
CHAPTER ONE Introduction
Building a digital library (DL) is inevitably expensive and resource-intensive. Before
embarking on such a project, it is important to consider some basic principles underlying the
design, implementation and maintenance of a DL. The principles applied in building this project,
hereinafter called MACS DL, apply not only to this endeavor but are also essential to building
the large digital libraries that we know today. Good examples are the ACM Digital Library
(http://portal.acm.org/dl), the New Zealand Digital Library (http://www.nzdl.org/cgi-bin/library)
and the National Science Digital Library (http://nsdl.org/), to mention but a few.
A digital library is “a focused collection of digital objects, including text, video, and audio, along
with methods for access and retrieval, and for selection, organization, and maintenance of the
collection.” Cf. Witten, Ian and David Bainbridge (2002), How to Build a Digital Library,
Morgan Kaufmann, p. 6.
Brief background of DLs
As the need to make information and resources globally accessible arose, digital library systems
(DLS) were born. Their importance has grown beyond that of traditional libraries because they
digitally preserve collections of valuable resources and information on the Web for educational
and research purposes. The basic idea was to create a web-based, easily accessible
collection of digital information whose organization and management would be automated to
address the inefficiency of traditional libraries. MACS DL is no exception: its development
follows the same principles.
What is MACS DL?
MACS DL is an educational portal built for use at the National University of Lesotho (NUL),
under the department of Mathematics and Computer Science (MACS) in the Faculty of Science
and Technology (FOST), to enhance the mode of course delivery and provide facilities to
academics in this faculty. MACS DL provides the NUL community mentioned above with services
such as file sharing, document browsing, searching of textual materials, storage of an unlimited
number of digital objects on the server for current and future purposes, and information
retrieval (IR).
Motivation
The higher education industry in Lesotho is experiencing an unprecedented growth rate. This
trend is largely a result of new enabling technologies that have facilitated the virtual delivery of
academic programs. This has in turn led to libraries becoming key success factors in the virtual
academic environment.
As students at NUL, we have observed that the well-known Thomas Mofolo library on the NUL
premises is not adequately equipped to provide services to students, researchers and NUL staff
in general. With that in mind, we aim to promote, support, manage and disseminate high quality
research, development and innovation in information, library and related fields.
Project aims
We aim to encourage and facilitate the development of information strategies in higher education
communities such as NUL. The main reason for building a digital library system is to provide
unlimited, free and remote access to information for multiple users around the NUL campus.
Problem Statement
The National University of Lesotho (NUL) has a vision to be a leading African university. In the
Faculty of Science and Technology (FOST), the department of Mathematics and Computer
Science (MACS) has a vision to facilitate learning and to give both students and lecturers a
better way of managing and conducting their academic work. Currently, MACS does not have an
academic portal that manages textual digital objects to enhance learning. Students and lecturers
rely on internet search engines such as Google and Yahoo for any academic material they
need. The MACS department requires a more direct digital library that encourages file sharing
for easy access to the materials used in the department.
Proposed solution
The proposed solution is a well-managed digital library system that will serve as a repository of
rich, high-demand information contributed by students and lecturers in the MACS department. Our
hope is that this will increase the availability of student research for scholars, empower
lecturers and students to conduct research and advance digital library technology worldwide.
MACS DL shall be a repository that archives any textual objects for current and future reference
to enhance learning at NUL and provide free access to information. The fundamental reason for
building a digital library for the MACS department at NUL is the belief that it will provide
better delivery of information than was possible in the past.
Why a [MACS] DL?
Some of the advantages of DLs include, but are not limited to, the following:
DLs bring the libraries closer to users: Information is more easily accessible, which
increases its usage. This is very different from a traditional library, like Thomas Mofolo,
which users need to visit physically.
Searching and browsing capabilities: Computer systems are better than manual methods
for finding information. DLs offer efficient and advanced search, information retrieval
and browsing techniques that enable users to search for the information they need and to
browse the retrieved material with relative ease.
Information sharing: Placing digital information on a network makes it available to
everyone. A MACS DL maintained on the NUL site will be a vast improvement over
expensive, physical duplication of little-used material, which is sometimes inaccessible
without travelling to the location where it is stored, like Thomas Mofolo library.
Availability of information: MACS DL's doors will never close; MACS DL's collections
can be used when library buildings (i.e. Thomas Mofolo library) are closed.
Materials are never checked out, misplaced, or stolen!
Project Plan
The MACS DL system is sub-divided into two major parts, namely:
The DL: This is, by and large, the more important of the two. The DL is a focused collection
of digital objects organized and maintained in a proper manner. The DL will contain a
pool of electronic versions of books and journals.
Search engine: This will assist in information retrieval (IR) and file indexing (FI).
The plan is to have a successfully implemented DL with an incorporated search engine that
enables users of the DL to retrieve the information they require. Upon successful completion of
these two parts, the system will be deemed to have met the users' requirements, discussed later
in this document.
CHAPTER TWO
System Requirements Specification (SRS)
System functions and purpose:
The system is a managed digital library portal for higher education and research purposes
to be used at NUL under FOST, in the MACS department by both students and lecturers.
MACS DL manages textual collections. The system allows intended users to upload
materials to the server, browse through the collection, sort the collection, search for any
material on the server, and download material from the collection.
Hardware and Software requirements:
1. Hardware:
Computers with a minimum secondary disk space of 20GB and primary memory
of 128MB.
2. Software:
Apache Tomcat web server, MySQL database server and Java Integrated
Development Environment (IDE).
Performance specification:
Through the choice of data structures and algorithm design, each module/function of the
system has been optimized to avoid burdening the processor with prolonged processes.
User interface (UI):
The system will provide interactive interfaces that are easy to learn and use, enabling it
to be used effectively and efficiently. The system UI follows basic human-computer
interaction principles and designs.
System data:
Any data captured into the system, e.g. users' information, is stored in normalized
relational database schemas to conform to data integrity and consistency rules. The
system provides tight information security measures so that only users with credentials
can access the system data.
System design constraints:
As a design constraint, MACS DL only manages textual digital objects. The system uses
the English language only, bearing in mind that the intended users are academics who
understand the language.
Requirements Engineering (RE)
Requirements engineering establishes a solid base for the design and construction of any system.
Without it, the resulting software would have a higher probability of not meeting users' needs.
To build elegant software that actually solves users' problems and meets the SRS described
earlier, the developers conducted an extensive study around the NUL campus to gather views
from the targeted/potential users. The following RE steps were followed:
Inception: This is where the scope and nature of the system were defined.
1. Scope: A web based digital library system that manages textual objects.
2. Nature: The system is an academic portal helping students and researchers to
share materials and search for any texts on the server.
Elicitation: This step helps to define what is actually required. Interviews with NUL students
and lecturers in the MACS department were conducted to help the developers elicit the users'
requirements, identify the problem properly and propose elements of the solution. The
following table and diagram were used to elicit users' requirements.
Figure 2.1 Use Case scenarios
Use Case Number | Use Case Name | Use Case description
1 | Browsing | Accessing subsets of data by categorical classification, e.g. browse by author name, alphabetical order, title, date, etc.
2 | Searching | Indexing, information retrieval and querying
3 | Annotate | Adding commentary, generalization and reviews
4 | Upload/Submission | Adding new digital objects to the DLS
5 | Download | Saving a digital object to local storage media
Fig 2.2 Use case diagram
[Diagram: the CLIENT actor is linked to the use cases SEARCH, BROWSE, SUBMIT, ANNOTATE and DOWNLOAD.]
Elaboration: The basic requirements gathered were refined and modified to suit the design
of the system under development; as a result, an analysis model was produced.
Negotiation: The priorities of the system were clarified. For example, as one of the priorities,
the system must be able to upload documents to the server, enable information retrieval and
download material from the server. During this step, different approaches to solving the
identified problem were proposed, and a preliminary set of solution requirements was
negotiated amongst the developers.
Specification: From the elaboration and negotiation steps, a detailed specification of the
system was developed once enough information had been gathered.
Validation: In an iterative manner, following prototyping as a standalone process model, users
of the system were frequently visited to make sure that what was being developed conformed to
what they required.
Management: Throughout the project's life cycle, changes to the initially gathered
requirements were managed through the iteration of steps that the prototyping model allows.
As a result of the above RE steps, prototypes were built as an end product.
To translate users' needs into technical requirements, the development team used Quality
Function Deployment (QFD), which emphasizes what is valuable to the users, and identified the
following three types of requirements:
Types of Requirements identified
1. Normal requirements: These were stated by the users during the interviews
conducted, and provided developers with an understanding of what should be
developed.
2. Expected requirements: These were not explicitly mentioned by the users but
were identified by the developers, e.g. ease of searching.
3. Exciting requirements: These were identified by the developers, as they go
beyond the users' expectations.
Requirements gathering Techniques used
A number of techniques were used to gather requirements from the target population. The
following were of great importance in the requirements gathering phase:
Stratified sampling: A small group of students in the MACS department was chosen to
represent the entire MACS student union. In the communication and planning phase of
the prototyping model followed by the developers, ten (10) students were sampled.
Observing users: Sampled students were observed as they carried out their daily
activities, using Google and Yahoo as internet search engines to search for material they
required on the internet. The same students were then observed as they used the MACS DL
search engine.
Interviews: Face-to-face interviews with the sampled population of students were
conducted. Initially, a pilot study was carried out to make certain that the methods
proposed by the developers were viable and that, in the long run, the solution would be
appreciated. As the developers needed concrete answers, and proof for future reference
that interviews were conducted, a video of students using the MACS DL prototype was
recorded. This video is in the possession of the developers and shall be made available
to the supervisor.
CHAPTER THREE Project Risk Analysis and Management
SWOT analysis was used to determine the strengths, weaknesses, opportunities and threats of
this project. The following table depicts the outcomes of this extensive risk assessment.
Table 3.1 SWOT analysis
No. | Strengths | Weaknesses | Opportunities | Threats
1 | Skilled project team members in programming web based IR systems | Apache Tomcat web server storage | New technology | Existing DLs and search engines, such as Google scholar, 4Shared, etc
2 | Availability of required resources | Not enough metadata can be found in digital objects | Integration with external search engines, such as Web resources | -
3 | New technology facilitating information sharing | Computer illiterate end users | Search engine development | Information overload
A thorough study in assessing the risks related to embarking on a project of this nature was
conducted by the developers and the above table shows the results of risk analysis using SWOT
analysis.
Other software engineering methods of risk assessment, management and mitigation were
employed to analyze the uncertainties that could put the project at risk. These
involve:
Identifying technical risks for the MACS DL project
Identifying technology risks for the MACS DL project
Identifying staff risks for the MACS DL project
From the above, the development team came up with the following risk analysis table:
A scale of 1 to 5 was used to estimate the impact of the risk on the project, and the following
mappings were concluded:
1 = catastrophic, 2 = critical, 3 = marginal, 4 = negligible, 5 = low
Table 3.2 Risk Table
Risk | Management | Mitigation | Monitoring | Impact
Supervisor not readily available on campus | Use groupware systems to contact supervisor | This risk was inevitable (i.e. unavoidable) | Fortnightly progress reports must be sent to the supervisor | 2
Ambiguous project scope | Refine project scope | Request clear definition of scope | Brain-storming sessions | 1
Developers not familiar with technology used | Seek related sources from supervisor and specialists | Quickly learn how to use the technologies required | Project progress report | 3
Requirements change | Elicit requirements, build prototypes | Iteratively refine the requirements to track changes | Perform requirements engineering | 3
Project team member drops out of the project | Ensure timely and consistent check-ups on team members | Meet with team members regularly discussing the project | Measure effectiveness of mitigation, e.g. ensure that every member is doing some work on the project | 5
The risks identified were then assessed using the methods described above. In a round-robin
fashion, the developers assigned an impact value to each risk until an agreement was
reached; the agreed values are shown in the tables above.
A project is at risk whenever an event is identified that could dent its schedule. The
schedule of this project was affected by some of the risks identified above; for example, in the
requirements elicitation phase, numerous iterations over the identified requirements were
necessary.
CHAPTER FOUR Object Oriented Analysis Design (OOA)
1. Class Responsibility Collaborator (CRC) Modeling
CRC modeling provides a simple means of identifying and organizing classes. NB: CRC
modeling is not an official part of the Unified Modeling Language; it uses a collection of index
cards that represent classes. Using this modeling, the developers were able to identify potential
classes that they could use as the building blocks of the MACS DL system.
Benefits identified
Portability: No computers are needed, so CRC cards can be used anywhere during
brainstorming sessions.
Tangible: They allow participants to experience first-hand how the system will work.
Limited size: Index cards can only hold a limited amount of information compared to
class diagrams. This enforces a high-level analysis.
Fig. 4.1 Class Responsibility Collaborator (CRC) Cards
Class Name: Searcher
Class Type: Internal Entity
Responsibilities: Generates query; Filters query; Locates query; Retrieves results
Collaborators: Inverted File, Retriever, Ranker
Class Name: Inverted File
Class Type: External Entity
Responsibilities: Inserts data into an index; Maintains index
Collaborators: Searcher
Class Name: Retriever
Class Type: External Entity
Responsibilities: Displays ranked results
Collaborators: Ranker, Searcher
Class Name: Ranker
Class Type: External Entity
Responsibilities: Ranks data
Collaborators: Retriever, Searcher
Class Name: Browser
Class Type: External Entity
Responsibilities: Retrieving data from links
Collaborators: Searcher, Retriever
Class Name: Downloader
Class Type: External Entity
Responsibilities: Copies data to local storage
Collaborators: Searcher, Retriever
Class Name: Digital Library
Class Type: External Entity
Responsibilities: Processes a given request
Collaborators: Searcher, Browser, Downloader, Up-loader
Class Name: Up-loader
Class Type: External Entity
Responsibilities: Copies digital objects from local storage into the collection
Collaborators:
2. Class Diagrams
The system consists of the following classes, depicted in class diagrams.
Fig 4.2 Class diagrams
SEARCHER
  -SearchQuery : string
  -Results : string
  +GenarateQuery() : string
  +FilterQuery() : string
  +LocateQuery() : void
  +RetrieveResults() : string
Inverted_File
  -Query : string
  -Size : int
  +InsertData() : void
  +Maintains() : void
RETRIEVER
  -Data : string
  +RankResults() : void
  +DisplayResults() : void
Ranker
  -Data : string
  +RankData() : void
BROWSER
  -Link : string
  +BrowseLink() : string
DOWNLOADER
  -File : string
  +DownloadFile() : void
UPLOADER
  -Data : string
  +CopyFile() : void
DIGITAL_LIBRAY
  -Request : string
  +ProcessRequest() : void
OOA continued…
3. Data flow diagram (DFD)
DFDs show how data is captured as input, transformed in the processes and output as results to
the users.
Fig 4.3 DFD
[Diagram: the CLIENT exchanges data with the BROWSE, SEARCH, SUBMIT, ANNOTATE and DOWNLOAD processes, which are connected to the INDEX and RETRIEVE processes and the DATABASE SERVER. Labelled flows: a request for a digital object, query/results, a digital object, new digital object, indexed object, comment(s) on digital object, feedback, download request.]
CHAPTER FIVE Data Structures and Algorithms
The following data structures were used in the development of the DL. The developers
studied various candidate data structures and judged the following to be the best fit:
Hash Table: This data structure stores all index terms. A hash table location references the
posting list for a specific index term.
Why Hash Table?
A hash table provides efficient searching: finding the posting list for an index term takes
O(1) time on average.
Posting list (implemented using linked list)
This is a linked list in which each node encapsulates a term frequency (the number of
times the term appears in the document) and a document id (the document filename). A
new node is added to the list every time a document containing the term is indexed.
This operation runs in O(1).
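The two structures described above can be sketched in Java as follows. This is a minimal illustration rather than the MACS DL source: the class name PostingIndexSketch and the methods addPosting and postingsFor are hypothetical, and java.util.HashMap and java.util.LinkedList are assumed to stand in for the hash table and the linked posting list.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative, not taken from MACS DL.
class PostingIndexSketch {

    // One posting-list node: the document id (file name) and the term frequency.
    static final class Posting {
        final String docId;
        final int termFrequency;

        Posting(String docId, int termFrequency) {
            this.docId = docId;
            this.termFrequency = termFrequency;
        }
    }

    // Hash table mapping each index term to its posting list.
    private final Map<String, LinkedList<Posting>> table = new HashMap<>();

    // Appends a node for (term, docId); the hash lookup is O(1) on average
    // and appending to the tail of a linked list is O(1).
    void addPosting(String term, String docId, int termFrequency) {
        table.computeIfAbsent(term, t -> new LinkedList<>())
             .addLast(new Posting(docId, termFrequency));
    }

    // Returns the posting list for a term, or an empty list if the term is not indexed.
    LinkedList<Posting> postingsFor(String term) {
        return table.getOrDefault(term, new LinkedList<>());
    }

    public static void main(String[] args) {
        PostingIndexSketch index = new PostingIndexSketch();
        index.addPosting("science", "D1", 1);
        index.addPosting("science", "D2", 1);
        for (Posting p : index.postingsFor("science")) {
            System.out.println(p.docId + ";" + p.termFrequency);
        }
    }
}
```

Appending at the tail keeps documents in the order in which they were indexed, which is one plausible reading of "a new node is added"; inserting at the head would be equally cheap.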
Algorithms:
Indexing
0. Read an index object from disk.
0.1. Extract the entire text from the given document.
0.2. Break the text into tokens/terms.
0.3. Filter the stop-words from the terms.
0.4. Stem each term, applying the stemming process.
0.5. For each stemmed version of the term:
Begin
0.5.1. If the term does not exist
0.5.1.1. Store the term into the hash table.
0.5.1.2. Create a corresponding posting list for the term, with a node for this document.
0.5.2. Else
0.5.2.1. Add a node in the posting list.
End
0.6. Save the index object to disk.
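A minimal Java sketch of steps 0.2 to 0.5 is given below, under several assumptions: the names IndexingSketch, STOP_WORDS, stem and the sample sentences are hypothetical, the stop-word list is deliberately tiny, the stemmer is an identity placeholder, and a map from document id to term frequency stands in for the linked posting list so that repeated occurrences in one document are merged. Reading and saving the index (steps 0 and 0.6) are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the indexing steps; names are illustrative, not MACS DL's own.
class IndexingSketch {

    // Assumed, deliberately tiny stop-word list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "of", "on", "in", "the", "about"));

    // term -> (document id -> term frequency); insertion order of documents is kept.
    final Map<String, Map<String, Integer>> index = new HashMap<>();

    // Placeholder for step 0.4; a real indexer would call a Porter stemmer here.
    static String stem(String term) {
        return term;
    }

    void indexDocument(String docId, String text) {
        // Step 0.2: lower-case the text and split it on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            // Step 0.3: skip empty tokens and stop words.
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            String term = stem(token);
            // Step 0.5: create the posting list on first sight, then count the occurrence.
            index.computeIfAbsent(term, t -> new LinkedHashMap<>())
                 .merge(docId, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        IndexingSketch sketch = new IndexingSketch();
        sketch.indexDocument("D1", "Mathematics and Computer Science department");
        sketch.indexDocument("D2", "Department of Social Science");
        System.out.println(sketch.index);
    }
}
```

With the two sample sentences, the stop words "and" and "of" are dropped and five distinct terms remain, two of which ("science" and "department") occur in both documents.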
Searching and Ranking
1. Read an index object from disk.
1.1. Break the user query into tokens/terms.
1.2. Stem each term, applying the stemming process.
1.3. For each stemmed term:
Begin
1.3.1. If the term exists
1.3.1.1. Retrieve its posting list and compute its weight in relation to the query vector
Q and all document vectors (D1…Dn), where n is the number of documents in the
collection.
1.3.2. Else
1.3.2.1. The term weight is zero.
End
1.4. For each document
Begin
1.4.1. Compute the score (using the ranking formula).
End
1.5. Sort documents according to their score.
1.6. Return sorted documents (the document with the highest score is the most relevant
document to the given query).
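The search and ranking steps can be sketched in Java as below. The ranking formula is not reproduced in this part of the report, so a plain tf-idf sum, score(d) = Σ tf(t, d) · ln(N / df(t)), is used purely as an assumed stand-in. The names SearchSketch, sampleIndex, addPosting and search are hypothetical, stemming is omitted, and the index is built in memory instead of being read from disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; a tf-idf sum is ASSUMED in place of the report's ranking formula.
class SearchSketch {

    // Tiny hand-built index (term -> document id -> term frequency) for the demo.
    static Map<String, Map<String, Integer>> sampleIndex() {
        Map<String, Map<String, Integer>> index = new HashMap<>();
        for (String term : new String[] {"mathematics", "computer", "science", "department"}) {
            addPosting(index, term, "D1");
        }
        for (String term : new String[] {"department", "social", "science"}) {
            addPosting(index, term, "D2");
        }
        return index;
    }

    static void addPosting(Map<String, Map<String, Integer>> index, String term, String docId) {
        index.computeIfAbsent(term, t -> new HashMap<>()).merge(docId, 1, Integer::sum);
    }

    // Returns document ids sorted by descending score for the given query.
    static List<String> search(Map<String, Map<String, Integer>> index, int totalDocs, String query) {
        Map<String, Double> scores = new HashMap<>();
        for (String term : query.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            Map<String, Integer> postings = index.get(term);
            if (postings == null) {
                continue; // the term is not indexed, so its weight is zero
            }
            // Inverse document frequency: terms found in fewer documents weigh more.
            double idf = Math.log((double) totalDocs / postings.size());
            for (Map.Entry<String, Integer> posting : postings.entrySet()) {
                scores.merge(posting.getKey(), posting.getValue() * idf, Double::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        System.out.println(search(sampleIndex(), 2, "social science"));
    }
}
```

Under this assumed formula a term that occurs in every document contributes nothing to the score, so for the query "social science" only "social" separates the two sample documents.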
System Design and Engineering
This section discusses how the system was engineered. Equipped with the requirements from the RE
phase, the classes identified in the OOA phase, and the data structures and algorithms to
implement, the developers now had to design and engineer the system to meet the requirements.
Indexing Documents
Overview:
Searching, indexing and ranking techniques are at the core of the implementation of this piece of
work. This chapter discusses the efficiency of the algorithms used for searching, indexing and
ranking documents. Indexing extracts terms from a document when it is uploaded to the server, to
indicate what the document is about or to summarize its content. This process takes the extracted
terms and places them in an inverted index (inverted file) data structure. Searching pertains to
posing a query and awaiting results from the digital library (DL) system. Information retrieval is
the process of identifying the most relevant information that satisfies the given search query.
The point of using an index is to increase the speed and efficiency of searches of the document
collection. Without indexing, every search would have to scan the documents sequentially, so
search time would grow with the total size of the collection. An inverted index contains two
parts: an index of terms, generally called the term index, which stores a distinct list of terms
found in the document collection and, for each term, a posting list, which is simply a list of
documents that contain the term. When documents are submitted to the DL system, punctuation is
removed, all terms are converted to lower case, and stop words are removed. Stop words are terms
with little information content, e.g. conjunctions. This strategy is discussed in depth later in
this document.
Suppose there are two documents, D1 and D2. D1 has the following contents: “Mathematics
and Computer Science department”, whilst D2 contains: “Department of Social Science”.
Key terms: Information retrieval (IR), Inverted Index (II), ranking, stop words, stemming, term
weight, posting list.
Table 5.1 Inverted file structure analogy
Term | Document;Term frequency
Mathematics | 1;1
Computer | 1;1
Science | 1;1, 2;1
Department | 1;1, 2;1
Social | 2;1
Inverted Index architecture
Fig 5.2 Inverted Index architecture.
[Diagram: a Document enters the Index Builder; the Text Extractor, using the stop words list, produces tokens; the Stemmer produces stemmed tokens; the Inverted Index stores them in a Hash Table with Posting Lists.]
Indexing documents
A document is uploaded through an interface to add it to the collection. The index Builder class
is instantiated and constructed with the document name. The document is then indexed using the
indexDocument method, which lets the Text Extractor instance extract the text from the
document, break it into tokens and filter out the stop words. The stemmer instance then
stems the tokens. The inverted Index class is then instantiated to store the stemmed terms
in a hash table, and a posting list is created for each term. The entire process forms the inverted
index.
Posting List(s) [implemented as linked lists]
A posting list indicates, for a given term, which documents contain the term. Typically, a
linked list data structure is used to store the entries in a posting list. This is because in most
retrieval operations, a user enters a query and all documents that contain the query terms are
obtained. This is done by hashing on the term in the index and finding the associated posting
list. Once the posting list is obtained, a simple scan of the linked list yields all the documents
that satisfy the query.
Index Builder
The index builder drives the indexing process. The index builder loops through all the document
objects and calls the indexDocument method to add each document to the inverted index. Once
all the documents have been processed, the writeIndextoDisk method is used to store the
invertedIndex object to disk. The stored index is read every time a new document is uploaded, in
order to check for duplicates. The index classes were made SERIALIZABLE (Java object
serialization) so that each time the program runs, the inverted index is read from disk; all data
in it is therefore available at runtime, and any changes are saved back to the inverted index file.
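The persistence step can be sketched with standard Java object serialization. Only the method name writeIndextoDisk comes from the text; its signature, the InvertedIndex layout, the readIndexFromDisk helper and the temporary file name are assumptions made for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical sketch of saving and restoring the index with java.io serialization.
class IndexStoreSketch {

    // Assumed index layout: term -> (document id -> term frequency).
    static final class InvertedIndex implements Serializable {
        private static final long serialVersionUID = 1L;
        final HashMap<String, HashMap<String, Integer>> terms = new HashMap<>();
    }

    // Writes the whole index object to the given file.
    static void writeIndextoDisk(InvertedIndex index, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(index);
        }
    }

    // Reads the index back, or starts with an empty one if nothing has been saved yet.
    static InvertedIndex readIndexFromDisk(File file) throws IOException, ClassNotFoundException {
        if (!file.exists() || file.length() == 0) {
            return new InvertedIndex();
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (InvertedIndex) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("macsdl-index", ".ser");
        file.deleteOnExit();
        InvertedIndex index = new InvertedIndex();
        index.terms.computeIfAbsent("science", t -> new HashMap<>()).put("D1", 1);
        writeIndextoDisk(index, file);
        InvertedIndex restored = readIndexFromDisk(file);
        System.out.println(restored.terms);
    }
}
```

Declaring a serialVersionUID keeps previously saved index files readable after compatible changes to the class; without it, recompiling the class may invalidate old files.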
Applying stemming process c.f. Porter’s stemming algorithm
Stemming simply refers to changing all term forms to canonical versions. For example, studying,
studies, and studied all map to study. Stemming reduces words by stripping off suffixes,
converting them to neutral stems that are devoid of tense, number, and, in some languages, case
and gender information. This relaxes the match between query terms and words in the documents
so that, for example, libraries is deemed equivalent to library. Stemming is not appropriate for
all queries, particularly those involving names and other very specific words.
The process aims to avoid mapping words with different roots to the same term. Porter's stemming
algorithm has been used to provide this service to the MACS DL system.
Below is a description of Porter's stemming algorithm, which can be found at the following
URLs: http://snowball.tartus.org/text/introduction.html,
http://snowball.tartus.org/algorithms/lovins/stemmer.html.
Porter's stemming algorithm defines five successively applied steps of word transformation.
Each step consists of a set of rules of the form <condition> <suffix> → <new suffix>. For
example, the rule (m > 0) EED → EE means “if the part of the word before the EED ending
contains at least one vowel-consonant sequence, change the ending to EE”. Words such as
“agreed” therefore become “agree”, while “feed” remains unchanged because the condition is
not satisfied; another production rule would then be tried.
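The single rule discussed above can be sketched in Java as follows. This is not a full Porter stemmer: only the measure m and the (m > 0) EED → EE rule are shown, input is assumed to be lower case, and the names PorterRuleSketch, isConsonant, measure and applyEedRule are hypothetical. The measure m counts the vowel-consonant sequences in the stem, following Porter's definition in which y counts as a vowel when it follows a consonant.

```java
// Hypothetical sketch of one Porter rule, (m > 0) EED -> EE; not a complete stemmer.
class PorterRuleSketch {

    // A consonant is any letter other than a, e, i, o, u, and other than y preceded by a consonant.
    static boolean isConsonant(String word, int i) {
        char c = word.charAt(i);
        if ("aeiou".indexOf(c) >= 0) {
            return false;
        }
        if (c == 'y') {
            return i == 0 || !isConsonant(word, i - 1);
        }
        return true;
    }

    // The measure m: how many times a run of vowels is followed by a consonant.
    static int measure(String stem) {
        int m = 0;
        boolean previousWasVowel = false;
        for (int i = 0; i < stem.length(); i++) {
            boolean consonant = isConsonant(stem, i);
            if (consonant && previousWasVowel) {
                m++;
            }
            previousWasVowel = !consonant;
        }
        return m;
    }

    // Applies (m > 0) EED -> EE, where m is measured on the part before "eed".
    static String applyEedRule(String word) {
        if (word.endsWith("eed")) {
            String stem = word.substring(0, word.length() - 3);
            if (measure(stem) > 0) {
                return stem + "ee";
            }
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(applyEedRule("agreed"));
        System.out.println(applyEedRule("feed"));
    }
}
```

For "agreed" the stem "agr" has m = 1, so the rule fires; for "feed" the stem "f" has m = 0, so the word is left unchanged.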
The algorithm is very concise, having only about sixty (60) rules, and is very readable for a
programmer. It is also very efficient in terms of computational complexity compared to other
affix-based and/or statistical stemming algorithms such as N-gram stemming and Hidden Markov
Model (HMM) stemming, to mention but a few, although HMM algorithms are beneficial in fields
such as machine translation and natural language processing, where numerous languages form the
data set.
A flaw of classical stemmers like Porter's stemming algorithm is that they often conflate
words with similar syntax but completely different semantics. For example,
“news” and “new” are both stemmed to “new” although they belong to two quite different
categories.
Dr. Porter not only published the standard implementation of his work, written in the C and Java
programming languages, but also developed a whole stemmer framework called Snowball. This
framework provides a stemmer definition script language and a translator to ANSI C and Java.
Its main purpose is to enable programmers to develop their own stemmers for other character
sets or languages. Currently there are implementations for Romance, Germanic, Uralic and
Scandinavian languages as well as English, Russian, and Turkish on the websites given.
We chose Porter's stemming algorithm because of its efficiency in dealing with English
corpora, and it really helped in paving the way for developing MACS DL.
Applying Stop words removal
Stop words make up a large fraction of the text in most documents. Eliminating such words from
consideration speeds up processing and saves a large amount of disk space in indexes, usually without
damaging retrieval effectiveness. A list of words filtered out during automatic indexing because
they make poor index terms is called a stop word list or a negative dictionary. These are words
such as: a, and, on, in, the, about, etc. In MACS DL, articles, prepositions,
conjunctions and similar words are removed from the documents before indexing. The screenshot for this
section depicts an inverted index object after indexing two documents, as output by the indexing module.
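To make the filtering step concrete, the following is a minimal Java sketch of stop-word removal. It is not the MACS DL implementation: the StopWords class in Appendix B loads its list from a file, whereas this sketch assumes a tiny hard-coded list, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopWordFilterSketch {

    // Tiny illustrative stop list (an assumption); a real list is much longer.
    static final Set<String> STOP_WORDS =
            Set.of("a", "and", "on", "in", "the", "about", "to", "of");

    // Lower-cases the text, splits it on non-alphanumeric characters,
    // and keeps only the tokens that are not stop words.
    static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Prints [introduction, digital, libraries]
        System.out.println(removeStopWords("Introduction to digital libraries"));
    }
}
```

A hash-based set keeps the membership test constant-time per token, which matters because the filter runs over every token of every uploaded document.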
CHAPTER SIX Searching, Browsing, Ranking and Information Retrieval (IR)
IR aims to retrieve large amounts of data as fast as possible from different kinds of information
stored in more than one form, be it visual, audio or textual. The user retrieves information
by posing a query, and the information retrieval module retrieves all the
information that satisfies the query. This is in contrast to a database system, where an
exact answer is retrieved from a database object that matches a query issued with a select statement. IR
systems do not retrieve a definite answer, but produce a ranking of documents that seem to contain
information relevant to the query given to the system. This ranking relies on the index built during indexing, which
was covered earlier in this document. The MACS DL information retrieval mechanism has been
engineered to return only the results that best match the provided query, filtering out unwanted
results.
Methodology
Several different types of IR mechanisms exist, but the MACS DL system employs a method called
inverted file indexing. This is widely regarded as the most efficient index structure for text query evaluation,
which suits the system because it was developed for textual digital objects.
IR systems high level architecture
The general scheme in Figure 6.1 shows the essential structure of a classical IR system. In
the first phase, the preprocessing mechanism, the raw documents of the corpus are processed into
tokenized documents and then indexed as a list of postings per term. In the second phase the
user gives a query to represent his or her "information need". The query is then transformed into a system
query and its relevant documents are retrieved from the index. The retrieved documents are
ranked according to their relevance to the query and returned to the user through a user interface,
discussed later in this document.
Figure 6.1: Classical IR system architecture
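The first phase of this architecture, indexing tokenized documents as postings per term, can be sketched in Java as follows. This is an illustrative sketch only, not the Index class from Appendix B (which uses linked posting list nodes); the class name, method names and the use of nested maps are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class InvertedIndexSketch {

    // term -> (document id -> term frequency in that document)
    private final Map<String, Map<String, Integer>> postings = new TreeMap<>();

    // Adds every token of a document to the posting list of its term.
    void addDocument(String docId, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Returns the postings for a term, or an empty map if the term is unknown.
    Map<String, Integer> postingsFor(String term) {
        return postings.getOrDefault(term, Map.of());
    }

    // Document frequency: the number of documents that contain the term.
    int documentFrequency(String term) {
        return postingsFor(term).size();
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("doc1", List.of("digital", "library"));
        index.addDocument("doc2", List.of("digital", "library", "digital"));
        System.out.println(index.postingsFor("digital")); // {doc1=1, doc2=2}
    }
}
```

At query time, the second phase only needs to look up the posting lists of the query terms rather than scan every document.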
Term Weighting
This text retrieval module, like the rest, has been designed around a comparison of content
identifiers attached both to stored texts and to the user's information queries. A formal
representation of the term vectors is obtained by including in each vector all possible content
terms allowed in the system and adding term weight assignments to distinguish between
terms. If wk represents the weight of term k in document D or query Q, and t terms in all are
available for content representation, the term vectors for document D and query Q can be
written as:
D = (t0, w0; t1, w1; ...; tn, wn) and Q = (q0, w0; q1, w1; ...; qr, wr).
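The text does not state how the individual weights are computed. One widely used scheme in vector-space retrieval is tf-idf, where a term's weight grows with its frequency in the document and shrinks with the number of documents containing it. The Java sketch below assumes that scheme purely for illustration; it is not claimed to be the weighting MACS DL uses, and the names are hypothetical.

```java
final class TfIdfSketch {

    // Assumed weighting: w = tf * ln(N / df), where tf is the term frequency in
    // the document, df the number of documents containing the term, and N the
    // total number of indexed documents.
    static double weight(int termFrequency, int documentFrequency, int totalDocuments) {
        if (termFrequency == 0 || documentFrequency == 0) {
            return 0.0; // term absent from the document or from the collection
        }
        return termFrequency * Math.log((double) totalDocuments / documentFrequency);
    }

    public static void main(String[] args) {
        // A term occurring twice in one of three documents gets a positive weight.
        System.out.println(weight(2, 1, 3));
        // A term occurring in every document carries no discriminating weight.
        System.out.println(weight(1, 3, 3));
    }
}
```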
Searching process
Searching is the most important part of the DL system, since all information is retrieved through the
search process. The search returns results based on their relevance to the query provided.
Finally, the related documents are displayed on an output interface as links on a web page.
The following screenshot shows the result of searching after three documents were indexed
correctly.
Document 1: a document on digital libraries
Document 2: a document on digital libraries and information retrieval
Document 3: a document on distributed databases
Query = Introduction to digital libraries
The computation of the term weights and term frequencies for an uploaded document gave
the following output:
Ranking retrieved documents
Ranking uses similarity to order the output triggered by a
query, starting with the items most likely to satisfy the query, so that
the most likely relevant items are displayed first. To rank a document retrieved by a query, the similarity between
the two has to be calculated. The formula below is used to measure the similarity between a query and an
item.
Ranking is done in two phases:
Coarse-grain ranking: documents are sorted according to the frequency of the
query tokens. A document that contains all the query terms is ranked first.
Fine-grain ranking: depends on the weights of terms. In this phase, the similarity
function is calculated between document and query.
This module sorts the retrieved documents based on their relevance to the query posed, using the
following formula:
The following screenshot depicts the result of a query, with the results ranked according to their
relevance to the query posted.
In ranking, an artificial measure is used to gauge the similarity of each document to the query,
and a fixed number of the closest matching documents are returned as answers.
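The two ranking phases can be sketched in Java as shown below. The similarity measure is an assumption: cosine similarity between weighted term vectors is used here because it is a common choice in vector-space retrieval, not because the report confirms it. The class, the method names and the sample weights are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RankingSketch {

    static final class Scored {
        final String docId;
        final int matchedTerms;   // coarse-grain key
        final double similarity;  // fine-grain key
        Scored(String docId, int matchedTerms, double similarity) {
            this.docId = docId;
            this.matchedTerms = matchedTerms;
            this.similarity = similarity;
        }
    }

    static double norm(Map<String, Double> vector) {
        double sum = 0.0;
        for (double w : vector.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    // Assumed similarity measure: cosine of the angle between the two vectors.
    static double cosine(Map<String, Double> doc, Map<String, Double> query) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : query.entrySet()) {
            dot += e.getValue() * doc.getOrDefault(e.getKey(), 0.0);
        }
        double norms = norm(doc) * norm(query);
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    // Returns at most 'limit' document ids, best match first.
    static List<String> rank(Map<String, Map<String, Double>> docs,
                             Map<String, Double> query, int limit) {
        List<Scored> scored = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : docs.entrySet()) {
            int matched = 0;
            for (String term : query.keySet()) {
                if (e.getValue().containsKey(term)) {
                    matched++;
                }
            }
            if (matched > 0) { // documents sharing no query term are not retrieved
                scored.add(new Scored(e.getKey(), matched, cosine(e.getValue(), query)));
            }
        }
        // Coarse grain first (more query terms matched), then fine grain (similarity).
        scored.sort((a, b) -> {
            if (a.matchedTerms != b.matchedTerms) {
                return Integer.compare(b.matchedTerms, a.matchedTerms);
            }
            int bySimilarity = Double.compare(b.similarity, a.similarity);
            return bySimilarity != 0 ? bySimilarity : a.docId.compareTo(b.docId);
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < scored.size() && i < limit; i++) {
            result.add(scored.get(i).docId);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> docs = Map.of(
                "doc1", Map.of("digital", 1.0, "libraries", 1.0),
                "doc2", Map.of("digital", 1.0, "libraries", 1.0,
                               "information", 1.0, "retrieval", 1.0),
                "doc3", Map.of("distributed", 1.0, "databases", 1.0));
        Map<String, Double> query = Map.of("digital", 1.0, "libraries", 1.0);
        System.out.println(rank(docs, query, 10)); // [doc1, doc2]
    }
}
```

In this example both doc1 and doc2 contain every query term, so the coarse phase ties them and the fine phase places doc1 first because its vector points more closely in the query's direction.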
Metadata browsing
Browsing is often described as the other side of the coin from searching, but really the two are at
opposite ends of a spectrum. Searching is purposeful, whereas browsing tends to be casual.
Terms such as random, informal, unsystematic, and without design are used to capture the
unplanned nature of browsing and, often, the lack of a specific goal. Searching implies that you
know what you're looking for, whereas browsing implies that you'll know it when you see it.
The metadata provided with the documents in a collection can support different browsing
activities. Information collections that are entirely devoid of metadata can still be searched, which is
one of the real strengths of full-text searching, but they cannot be browsed in any meaningful way
unless some additional data is present. The structure that is implicit in metadata is the key to
providing browsing facilities. Here are some examples of browsing:
Lists: The simplest structure is an ordered list. It can be
alphabetical, in ascending or descending order.
Dates: An automatically generated selector gives a choice of years, months and days that
can be used to browse metadata.
Name: Offers users the flexibility of browsing collections by author name, for
example Deitel.
Title: Users can browse collections using the titles of the documents in a pool of
collections, for example Advanced Java Programming.
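The browsing structures listed above can be sketched in Java over a small metadata record. This is a hypothetical illustration, not MACS DL code: the DocumentMetadata fields, the method names and the sample values (including the years) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MetadataBrowseSketch {

    // Hypothetical metadata record with the fields mentioned in the text.
    static final class DocumentMetadata {
        final String title;
        final String author;
        final int year;
        DocumentMetadata(String title, String author, int year) {
            this.title = title;
            this.author = author;
            this.year = year;
        }
    }

    // "Lists" and "Title": titles in ascending or descending alphabetical order.
    static List<String> titlesAlphabetical(List<DocumentMetadata> docs, boolean ascending) {
        Comparator<DocumentMetadata> byTitle =
                Comparator.comparing((DocumentMetadata d) -> d.title, String.CASE_INSENSITIVE_ORDER);
        List<DocumentMetadata> sorted = new ArrayList<>(docs);
        sorted.sort(ascending ? byTitle : byTitle.reversed());
        List<String> titles = new ArrayList<>();
        for (DocumentMetadata d : sorted) {
            titles.add(d.title);
        }
        return titles;
    }

    // "Name": titles of all documents by a given author (case-insensitive).
    static List<String> titlesByAuthor(List<DocumentMetadata> docs, String author) {
        List<String> titles = new ArrayList<>();
        for (DocumentMetadata d : docs) {
            if (d.author.equalsIgnoreCase(author)) {
                titles.add(d.title);
            }
        }
        return titles;
    }

    // "Dates": titles of all documents from a given year.
    static List<String> titlesByYear(List<DocumentMetadata> docs, int year) {
        List<String> titles = new ArrayList<>();
        for (DocumentMetadata d : docs) {
            if (d.year == year) {
                titles.add(d.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        List<DocumentMetadata> docs = List.of(
                new DocumentMetadata("Digital Libraries", "Arms", 2000),
                new DocumentMetadata("Advanced Java Programming", "Deitel", 2010));
        System.out.println(titlesAlphabetical(docs, true));
        System.out.println(titlesByAuthor(docs, "Deitel"));
    }
}
```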
CHAPTER SEVEN User Interface (UI) Design
A user interface describes how users of the system interact with it. Human Computer Interaction
(HCI) basics and principles have been employed in developing the MACS DL user interfaces, to
enable users to have a seamless interaction with the system. Common interface styles that were
used are:
Menus
Forms
Principles of UI design
Consistency: The system is expected to be consistent. MACS DL achieves consistency in
the choice of colors used and in its interfaces and styles.
Learnability: The system should be easy to learn. MACS DL is easy to
use, providing labels and the necessary information to guide users on how best to utilize it.
Informative feedback: The system should provide informative feedback to users after an
operation is performed. MACS DL adheres to this principle: each time
a query is posed, the system provides feedback along with the results displayed.
Provide error prevention and handling: The system must have mechanisms to prevent
users from committing errors and, if any occur, to handle them. MACS DL
is designed to prevent user errors and system crashes.
Off-load the short-term memory: Reduce the number of steps users have to remember
when carrying out an operation. MACS DL was designed with interfaces whose links
and labels make operations easy for users to remember.
Provide short-cuts for users: The system provides hyperlinks as a form of shortcut to
navigate web pages.
System dialogue yielding closure: The system informs its users about its current state at
each instance. For example, after a query is posed, the system returns the results with a
message that reads “RESULTS MATCHING THE QUERY” to yield closure of the IR
operation.
Provide internal locus of control: The system allows users to be in control of it. Every
operation the system performs is triggered by users. For example, documents retrieved
are only downloaded when clicked.
Technologies used in the UI development
JavaServer Pages (JSP) and servlets to make the system web based
JavaScript
eXtensible Markup Language (XML) to allow file formats
HyperText Markup Language (HTML)
Cascading Style Sheets (CSS) to provide presentable documents with minimal effort
eXtensible Stylesheet Language (XSL) for supporting XML and HTML that are XML
compliant.
CHAPTER EIGHT System Testing and Evaluation
The system was tested for errors after each module was completed. Testing is the process
of exercising a program with the specific intent of finding errors prior to delivery to the end user.
The system was thoroughly tested mainly to assess the following:
Errors
Requirements conformance
Performance
Quality
Who did the testing?
The developing team did most of the testing while independent testers were also invited to test
the system.
Testing strategies
Unit test
Integration test
White box test
Validation test
System test
Regression test
The following table depicts some of the modules and criteria used in the testing phase.
Table 8.1 Test results
Test case | Test strategy | Description | Results
GUI functionality | Unit test | Testing action performed when buttons and controls are clicked | PASS
Code snippets | Integration test | Integrating modules to form a complete system | PASS
System performance | White box test | Accessing the system concurrently | PASS
System functionality | Integration and Unit test | Integrating system modules and testing each of them for functionality | PASS
Databases connectivity | Integration test | Integrating a third party software, Oracle 10g database server | PASS
Human Computer Interaction | Unit test | Each module was tested for interaction with users | PASS
Textual objects | Validation test | Testing if the material uploaded is text or not | PASS
Error handling | Regression test | Testing system errors | PASS
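As an illustration of the "Textual objects" validation test in Table 8.1, the Java sketch below checks whether an uploaded file is of a supported text type. It rests on two assumptions: that the check is based on the file extension, and that the supported formats are the ones suggested by the TextExtractor method names in Appendix B (pdf, docx, ppt, txt). The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Set;

final class UploadValidationSketch {

    // Extensions assumed from the TextExtractor methods pdfFile, docxFile, pptFile, txtFile.
    static final Set<String> TEXT_EXTENSIONS = Set.of("pdf", "docx", "ppt", "txt");

    // Returns true if the file name ends in one of the supported extensions.
    static boolean isSupportedTextDocument(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return false; // no extension at all
        }
        String extension = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TEXT_EXTENSIONS.contains(extension);
    }

    public static void main(String[] args) {
        System.out.println(isSupportedTextDocument("notes.PDF")); // true
        System.out.println(isSupportedTextDocument("photo.jpg")); // false
    }
}
```

An extension check is only a first filter; inspecting the file content would be needed to reject a mislabelled upload.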
System Evaluation
The system was evaluated to determine whether users accept it as a usable tool. Direct observation and pilot study
evaluation techniques were used to find out the users' views during this phase.
Direct observation: The developers directly observed a sample of users as they
evaluated the system. Users had to perform all the operations that are implemented in the
MACS DL system and evaluate the results.
Pilot study: A small group of users was asked a set of questions regarding the system.
The pilot study was conducted using a questionnaire, and users provided their evaluation
heuristics. Some of the questions asked were: Is the system usable? Is the system useful?
Evaluation results
The results obtained from the system evaluation phase were used to enhance the system's
functionality to make it more effective and efficient. The results were collected to guide the
developers on how to improve the system and the users on how best to use it.
The following is an in-depth analysis of the results obtained from the evaluation phase:
Direct observation of users:
We investigated the factors that influence the perceived ease of use and usefulness of
digital libraries among NUL students. Data were collected from undergraduate students at NUL.
Individual undergraduate students formed the population sample and, using a stratified
sampling method, students at locations around the NUL campus, such as the Thomas Mofolo library and
classrooms, were each handed a questionnaire.
Evaluation results and analysis
Out of one hundred and fifty questionnaires that were distributed, only sixty-nine were returned,
giving a response rate of 46%. Based on the study, 60% of the respondents were Computer
Science and Engineering students, 20% were from social sciences and 10% from humanities.
Table 8.2
A scale of 1 to 4, ranking as follows was used to grade the scores:
1 = Best, 2 = Good, 3 = Not sure, 4 = Bad
Item evaluated | Score | Answer
HCI (Usability) | 1 | Best
HCI (Functionality) | 1 | Best
Project functionality | 1 | Best
System training
There will be no need to train users, as the system is usable and easy to learn.
Furthermore, the MACS DL system resembles the existing web-based digital
library systems in use today, which NUL MACS department students are already
accustomed to using.
Conclusions and future prospects
The MACS DL project was a success and an exciting endeavor that served as an eye opener to
the developers in their academic careers, as plenty of new computing concepts were learned during
the execution of this project. The system is ready for deployment and use in an organization as
large as NUL.
The system covers major parts of a search engine implementation, such as stop-word removal,
stemming, automatic indexing and searching. To make the system a complete search engine, we
could add other components such as clustering and thesaurus expansion. We could also extend the
system to other digital object collections such as videos and images. The system currently takes a long
time to upload large documents; in the future, new implementation strategies could be
employed to make this faster.
APPENDIX A Acronyms
A1. MACS – Mathematics and Computer Science
A2. NUL – National University of Lesotho
A3. FOST – Faculty of Science and Technology
A4. IR – Information Retrieval
A5. FI – File indexing
A6. II – Inverted Index
A7. UI – User Interface
A8. CRC – Class Responsibility Collaborator
A9. DFD – Data Flow Diagram
A11. RE – Requirements Engineering
A12. SRS – System Requirements Specification
APPENDIX B Programs
//Cascading Style sheet
#header1
{
height:200px;
padding-top: 20px;
padding: 0px 0px 0px 0px;
width: 900px;
background-repeat:no-repeat;
background-position:top;
padding-bottom: 3px;
}
#logos
{
font-family: Arial,sans-serif;
color:#FFFFFF;
font-size:18px;
font-style:italic;
padding: 15px 0px 0px 135px;
background:url(images/buka.jpg) left top no-repeat;
height: 200px;
}
*
{
border: 0;
margin: 0;
}
#uploader-button
{
font-family: Arial, Helvetica, sans-serif;
font-size: 12px;
font-weight:normal;
color: #ffffff;
width: 60px;
height: 21px;
background: url(images/read.gif);
background-repeat:no-repeat;
background-position:left top;
border: none;
float:right;
}
img
{
border: 0px;
}
body{
font: 12px Arial, Helvetica, sans-serif;
color: #000000;
background: url(images/body_bg.jpg) top repeat-x #FFFFFF;
line-height: 20px;
}
#bg{
background: url(images/bg.jpg) center top no-repeat;
}
/* search */
#search
{
float:right;
padding-right:45px;
padding-top:1px
}
#search form
{
margin: 0;
}
#search fieldset
{
margin: 0;
padding: 0;
border: none;
}
#search input
{
float: left;
font: 11px Georgia, "Times New Roman", Times, serif;
}
#search-text
{
width: 230px;
height: 19px;
padding-top: 4px;
padding-left: 10px;
padding-right: 12px;
border: none;
background: url(images/search.png);
background-repeat:no-repeat;
background-position:left top;
color: #000000;
}
#search-submit
{
width: 40px;
height: 23px;
background: url(images/search2.png);
background-repeat:no-repeat;
background-position:left top;
border: none;
}
/*MENU*/
#menu
{
width:650px;
height:55px;
}
#menu ul
{
list-style:none;
padding-left:0px;
}
#menu li
{
display:inline;
}
#menu ul li a
{
font-family: Arial,sans-serif;
font-size: 18px;
font-weight:normal;
color: #008ae8;
float: left;
width: 85px;
height: 30px;
display: block;
text-align: left;
text-decoration: none;
padding-top: 5px;
padding-left:40px;
background: url(images/menu_bg.png);
background-repeat:no-repeat;
background-position:10px 5px;
}
#menu a:hover
{
width: 85px;
height: 35px;
color: #093285;
text-decoration: none;
background: url(images/menu_hov.png);
background-repeat:no-repeat;
background-position:10px 5px;
}
#left_part
{
width: 100px;
float:left;
padding: 0px 0px 0px 0px;
}
.main_top
{
background: url(images/main_top.png) no-repeat top;
height: 15px;
}
.main_bot
{
background: url(images/main_bot.png) no-repeat top;
height: 15px;
width:750px;
padding-bottom: 10px;
}
.main_bg1
{
background: url(images/main_bg.png);
padding-left: 8px;
color: black;
padding-right: 7px;
font-family: Tahoma;
}
/*main page*/
#main
{
width: 900px;
margin: 0px auto;
background:url(images/main.jpg) right top no-repeat;
}
#main2
{
width: 750px;
height: 400px;
margin-left: 8px;
clear:both;
/*background: url(images/left_bg.jpg);*/
background-repeat:repeat-y;
background-position:left;
}
#header {
width:900px;
height: 100px;
}
#logo {
padding: 0px 0px 0px 0px;
height: 113px;
}
#logo H2 {
font-family: Arial, Helvetica, sans-serif;
color:#000000;
font-size:18px;
font-style:italic;
}
#logo a {
text-decoration: none;
text-transform: lowercase;
font-style: italic;
font-size: 16px;
color: #000000;
}
#logo H2 a{
font-size: 12px;
font-family: Arial, Helvetica, sans-serif;
font-weight:100;
}
/* buttons */
#buttons
{
text-align:center;
height: 30px;
margin: 0px auto;
padding: 0px 0px 0px 0px;
background: url(images/buttons.png);
width: 600px;
}
#buttons a
{
font-family: Georgia, "Times New Roman", Times, serif;
font-size: 18px;
display: block;
float: left;
text-decoration: none;
color: #0059FF;
text-align: center;
padding-top: 0px;
font-weight:100;
width: 170px;
}
#buttons .but:hover {
text-decoration:underline;
}
.top { height:334px;
padding-top: 10px;
padding-left: 10px;
background:url(images/top.jpg) left top no-repeat;
}
.top_bot {
background: url(images/top_bot.jpg) left top no-repeat;
height: 28px}
#content
{
width: 876px;
margin: 0px auto;
background: #E6F6FF;
padding: 0px 12px 5px 12px;
line-height: 22px;
background-repeat:repeat-y;
text-align: left;
background-position:left;
}
#content_razd {
background: url(images/content_razd.gif) 586px repeat-y ;
}
#content_top {
width: 900px;
background: url(images/content_top.png) 0px top no-repeat ;
height: 10px;
}
#content_bot {
width: 900px;
background: url(images/content_bot.png) 0px bottom no-repeat ;
height: 9px;
}
.float_l {
float:left;}
.col {
width: 265px;
float:left;
padding: 0px 0px 0px 0px;}
.col_razd {
background:url(images/col_text.gif) center repeat-y;
height: 124px;
width: 40px;
float:left;
margin-top: 35px;
}
h1 {
padding: 0px 0px 5px 0px;
font-family: Georgia, "Times New Roman", Times, serif;
font-size: 16px;
font-weight: bold;
color:#051B93;}
#left{
width: 558px;
float: left;
color:#000000;
margin-left: 0px;
}
.text{
padding: 0px 0px 15px 0px;
}
.img_l { float:left;
margin: 6px 15px 40px 0px;
}
.img_r { float: right;
margin: 9px 10px 3px 10px;
}
.span_cont { color: #07249F;
font-size:12px;
font-weight:bold;
}
#content H2{
font-family: Georgia, "Times New Roman", Times, serif;
font-size:16px;
font-weight: bold;
color: #07249F;
text-align: left;
padding: 5px 0px 5px 0px;
}
.read_r{
text-align: right;
padding: 0px 8px 0px 0px;
background: url(images/read.gif) right 3px no-repeat;
}
.razd_g {
background: url(images/razd_g.gif) 0px 2px repeat-x;
height: 5px;
}
.read_r a {
font-size:12px;
color: #ffffff;
text-decoration: none;
padding-right: 9px;
}
.next {
width: 100%;
text-align: right;
padding: 0px 0px 0px 0px;}
.next a{
color:#FFFFFF;
text-decoration: none; }
.next a:hover {
text-decoration: underline; }
.more {
text-align:right;}
.more a {
color: #009FFF;
text-decoration:none;
}
#right{
float: right;
width: 270px;
}
.span_dat {
color: #002380;
text-decoration: underline;}
#bottom {
background: #E6F6FF;
margin: 0px auto;
color:#000000;
padding: 0px 0px 0px 15px;
}
#b_col1 {
width: 220px;
float: left;
margin-left: 0px;
}
#b_col2 {
width: 180px;
float: left;
margin-left: 57px;
}
#b_col3 {
width: 160px;
float: left;
margin-left: 20px;
text-align: left;
}
#b_col4 {
width: 184px;
float: left;
margin-left: 35px;
text-align: left;
}
.a_icons {
color:#FF0000;
text-decoration:none;}
.a_icons:hover {
text-decoration: underline;}
#bottom ul {
list-style:none;
padding: 0px 0px 0px 0px;}
#bottom li {
padding: 8px 0px 0px 0px;
}
#bottom ul a:hover {
text-decoration:underline;
}
#bottom ul a {
color:#000000;
text-decoration:none;
font-weight: 100;}
.fu_i {
padding: 0px 14px 0px 0px;
vertical-align: middle ;
}
#b_col2 ul {
list-style:none;
padding: 0px 0px 0px 0px;}
#b_col2 li {
padding: 4px 0px 0px 18px;
background: url(images/fish2.gif) 0px 11px no-repeat;}
#b_col2 a {
color:#FFFFFF;
}
#footer{
font-size: 11px;
color: #000000;
text-align: center;
padding: 20px 0px 0px 0px;
height: 60px;
text-align: center;
margin: 0px auto;
}
#footer a{
color: #000000;
font-size: 11px;
text-decoration: none;
}
#footer a:hover{
color: #000000;
font-size: 11px;
text-decoration: underline;
}
/* ------------------------------------------------------------------------
DO NOT CHANGE THE FOLLOWING
------------------------------------------------------------------------- */
div.pp_overlay {background: #000;display: none;left: 0;position: absolute;top: 0;width: 100%;z-index: 9500;}
div.pp_pic_holder {display: none;position: absolute;width: 100px;z-index: 10000;}
//Java source code for Index class, Index.java
package InvertedIndex;
import InvertedIndex.Index.PostingListNode;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Hashtable;
public class Index implements Serializable {
public class documentVector implements Serializable {
public String docId;
public double score;
public ArrayList docVector;
public documentVector() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public documentVector(String documentId) {
//compiled code
throw new RuntimeException("Compiled Code");
}
}
public class PostingList implements Serializable {
public PostingListNode first;
public int documentFrequency;
public PostingList() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void Add(PostingListNode Node) {
//compiled code
throw new RuntimeException("Compiled Code");
}
}
public class PostingListNode implements Serializable {
public String documentId;
public int docReference;
public int termFrequency;
public PostingListNode next;
public PostingListNode() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public PostingListNode(String docId, int tf, int docRef) {
//compiled code
throw new RuntimeException("Compiled Code");
}
}
private ArrayList PostingLists;
private int count;
private Hashtable<String, Integer> IndexTerms;
public int numOfdocuments;
public ArrayList docVectors;
public ArrayList queryVector;
private ArrayList queryterms;
public Hashtable<String, Integer> documents;
private String stopwordsPath;
public Index(String stopwords_Path) {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void addIndexTerm(String termId, String docId, int tf) {
throw new RuntimeException("Compiled Code");
}
public void Search(String query) throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void getVectors() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void computeScores() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void sortDocuments() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public ArrayList RetrieveAnswer(String query) throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
}
//Source code for class IndexBuilder
package InvertedIndex;
import java.util.ArrayList;
import java.util.Hashtable;
public class IndexBuilder {
public Index invertedIndex;
private String document;
private String response;
private Hashtable<String, Integer> termsfrequency;
public ArrayList QueryResults;
private TextExtractor Extractor;
public IndexBuilder(String stopwords_path) throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
public IndexBuilder(String docId, String stopwords_path) {
//compiled code
throw new RuntimeException("Compiled Code");
}
private int frequency(ArrayList tokens, String term) {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void indexDocument() throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void SaveIndexToDisk(String path) throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void ReadIndexFromDisk(String path) throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void AnswerQuery(String query) throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
public static void main(String[] args) throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
}
//Java source code for class StemText
package InvertedIndex;
import java.util.ArrayList;
class StemText {
private char[] b;
private int i;
private int i_end;
private int j;
private int k;
private static final int INC = 50;
public StemText() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void add(char ch) {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void add(char[] w, int wLen) {
//compiled code
throw new RuntimeException("Compiled Code");
}
public String toString() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public int getResultLength() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public char[] getResultBuffer() {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final boolean cons(int i) {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final int m() {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final boolean vowelinstem() {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final boolean doublec(int j) {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final boolean cvc(int i) {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final boolean ends(String s) {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final void setto(String s) {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final void r(String s) {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final void step1() {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final void step2() {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final void step3() {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final void step4() {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final void step5() {
//compiled code
throw new RuntimeException("Compiled Code");
}
private final void step6() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void stem() {
//compiled code
throw new RuntimeException("Compiled Code");
}
public ArrayList stemIndexTerms(ArrayList textTokens) {
//compiled code
throw new RuntimeException("Compiled Code");
}
}
//Source code for class stopwords
package InvertedIndex;
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Hashtable;
public class StopWords {
public Hashtable<String, Integer> stopWords;
private BufferedReader stopWordsFile;
private int count;
public StopWords(String path) throws IOException {
//compiled code
throw new RuntimeException("Compiled Code");
}
}
//Source code for class TextExtractor
package InvertedIndex;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
public class TextExtractor {
private File file;
private String filename;
public String textFromFile;
public ArrayList Tokens;
private String stopwordsPath;
public TextExtractor(String stopwords_Path) {
//compiled code
throw new RuntimeException("Compiled Code");
}
public TextExtractor(String Filename, String stopwords_Path) {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void ExtractText() throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
public void indexTerms() throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
private void pdfFile() throws Exception {
//compiled code
throw new RuntimeException("Compiled Code");
}
private void docxFile() throws IOException, ParserConfigurationException, SAXException {
//compiled code
throw new RuntimeException("Compiled Code");
}
private void pptFile() throws IOException {
//compiled code
throw new RuntimeException("Compiled Code");
}
private void txtFile() throws IOException {
//compiled code
throw new RuntimeException("Compiled Code");
}
}
//Source code for
//JSP source code for the home page interface
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Mathematics & Computer Science Digital Library System</title>
<meta name="keywords" content="" />
<meta name="description" content="" />
<script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script>
<script type="text/javascript" src="lib/jquery.tools.js"></script>
<script type="text/javascript" src="lib/jquery.custom.js"></script>
<link href="styles.css" rel="stylesheet" type="text/css" />
<link href="style.css" rel="stylesheet" type="text/css" />
</head>
<script language="JAVASCRIPT" type="TEXT/JAVASCRIPT">
function confirmMessage()
{
//display an alert telling the user that the file was uploaded
alert("File successfully uploaded to server");
}
$(document).ready(function()
{
var passfield = document.getElementById('password_field_id');
//switch the type only when the password field is present on the page
if (passfield) { passfield.type = 'text'; }
});
function focusCheckDefaultValue(field, type, defaultValue)
{
if (field.value == defaultValue)
{
field.value = '';
}
if (type == 'pass')
{
field.type = 'password';
}
}
function blurCheckDefaultValue(field, type, defaultValue)
{
if (field.value == '')
{
field.value = defaultValue;
}
if (type == 'pass' && field.value == defaultValue)
{
field.type = 'text';
}
else if (type == 'pass' && field.value != defaultValue)
{
field.type = 'password';
}
}
</script>
<body>
<div id="bg">
<div id="main">
<div id="content">
<div class="navi"></div>
<div id ="header1">
<div id="menu">
<ul>
<!--create button links-->
<li id="button1"><a href="macsdl.jsp" title="">Home</a></li>
<li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li>
<li id="button2"><a href="#" title="">Contacts</a></li>
</ul>
</div>
<div id ="logos"></div>
<div id="search">
<form method="get" action="searchResults.jsp">
<fieldset>
<input type="text" name="search" id="search-text" size="25" value ="Search"
onFocus="javascript:focusCheckDefaultValue(this, '', 'Search');"
onBlur="javascript:blurCheckDefaultValue(this, '', 'Search');"
>
<input type="submit" id="search-submit" value="" />
</fieldset>
</form>
</div>
</div>
<br/><br/>
<div align="left">
<img src="images/img11.jpg" class="img_l" align="left" alt="" /><br/><br/>
<span class="span_cont">About MACS DL </span><br />
MACS DL is an educational portal for higher learning, with a large pool of books, journals and other resources: everything you
ever needed.
</div>
<form enctype="multipart/form-data" action="uploadfile.jsp" method="post" >
<br/><br/><br/>
<center>
<table border="2">
<tr>
<td colspan="2">
<p align ="center"><b>Upload and share your files with the NUL community</b></p>
</td>
</tr>
<tr>
<td>
<b>Choose a file to upload:</b>
</td>
<td>
<input name="inputfile" type="file">
</td>
</tr>
<tr>
<td colspan="2">
<p align="right"><input type="submit" id ="uploader-button" value="UPLOAD"
onclick="confirmMessage()"></p>
</td>
</tr>
</table>
</center>
</form>
<div class="razd_g"></div><br />
<div class="col">
<h1>Add to the MACS DL</h1>
<img src="images/col_img1.jpg" class="img_l" alt="" />Add your objects and share with the NUL community by uploading your
files<br/>to the server, download and get stuff you need most!
</div>
<div class="col_razd"></div>
<div class="col">
<h1 class="tit">Browse by date</h1>
<img src="images/col_img2.jpg" class="img_l" alt="" />Browse the collection by date, specify the date and browse freely.
</div>
<div class="col_razd"></div>
<div class="col">
<h1 class="tit">SEARCH MACS DL</h1>
<img src="images/col_img3.jpg" class="img_l" alt="" />Type any query in the above search text field and click the search
button. Get the results instantly!
</div>
<div style="clear: both"></div>
<div style="height:15px; width: 100%"></div>
<div class="razd_g"></div>
<div style="clear: both"></div>
</div>
<div id="content_bot"></div>
<!-- content ends -->
<div style="height:15px; width: 100%"></div>
<!-- bottom end -->
<!-- footer begins -->
<div id="footer">
<p>Copyright 2012</p><p>Design by
<a href="http://www.nul.ls" title="MACS DL">Mosola Napo N</a>
<!--End of notice --></p><!-- end of copyright notice-->
</div>
<!-- footer ends -->
</div>
</div>
</body>
</html>
//Java Source code for Uploading files
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<%@page language="java"%>
<%@page import="InvertedIndex.*"%>
<%@page import ="java.io.File,java.io.FileInputStream,java.io.InputStream"%>
<%@page import="java.io.*,java.util.*, javax.servlet.*" %>
<%@page import="javax.servlet.http.*,javax.servlet.ServletException"%>
<%@page import="org.apache.commons.fileupload.*" %>
<%@page import="org.apache.commons.fileupload.disk.*"%>
<%@page import="org.apache.commons.fileupload.servlet.*" %>
<%@page import="org.apache.commons.io.output.*" %>
<%
//
//
//Upload document to the server.
File file ;
// Verify the content type
String contentType = request.getContentType();
if ((contentType != null) && (contentType.indexOf("multipart/form-data") >= 0))
{
DiskFileItemFactory factory = new DiskFileItemFactory();
String Path="C:/Users/KELVIN/Documents/NetBeansProjects/DigitalLibrarySearch/documents/";
factory.setRepository(new File(Path));
String filename=null;
// Create a new file upload handler
ServletFileUpload upload = new ServletFileUpload(factory);
try
{
// Parse the request to get file items.
List fileItems = upload.parseRequest(request);
// Process the uploaded file items
Iterator i = fileItems.iterator();
while ( i.hasNext () )
{
FileItem fi = (FileItem)i.next();
//keep only the file name; some browsers send the full client-side path
filename=new File(fi.getName()).getName();
file=new File(Path+filename);
fi.write( file ) ;
%>
You have successfully uploaded the file by the name of:<br>
<%=filename%>
<%
}
}catch(Exception ex) {
%>
<%=ex%>
<%
}%>
<%@page import="org.apache.tika.metadata.Metadata"%>
<%@page import="org.apache.tika.parser.AutoDetectParser"%>
<%@page import="org.apache.tika.sax.BodyContentHandler"%>
<%@page import="java.sql.*"%>
<%
try
{
Connection conn=null;
// ResultSet results=null;
Statement stat;
//Class.forName("com.mysql.jdbc.Driver");
//conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/dl",
// "root",
// "admin");
Class.forName("oracle.jdbc.driver.OracleDriver");
conn=DriverManager.getConnection
("jdbc:oracle:thin:dl/admin@localhost:1521/XE");
String resourceLocation = Path+filename;
File file2 = new File(resourceLocation);
InputStream input = new FileInputStream(file2);
Metadata metadata = new Metadata();
BodyContentHandler handler = new BodyContentHandler();
AutoDetectParser parser = new AutoDetectParser();
parser.parse(input, handler, metadata);
String Author= metadata.get(Metadata.AUTHOR);
String Title=metadata.get(Metadata.TITLE);
String last_modified=metadata.get(Metadata.LAST_MODIFIED);
%>
Author:
<%=Author %><b></b>Title:
<%=Title %><b></b>Last_Modified:
<%=last_modified %>
<%
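if(Author!=null&&Title!=null&&last_modified!=null)
{
//bind the values as parameters so that quotes in the metadata cannot alter the SQL
PreparedStatement insert=conn.prepareStatement
("insert into browse values(?,?,?,?)");
insert.setString(1,Author.toLowerCase());
insert.setString(2,Title.toLowerCase());
insert.setString(3,last_modified.toLowerCase());
insert.setString(4,filename);
int count=insert.executeUpdate();
insert.close();
}
conn.close();
}
catch(SQLException exc)
{
//record the failure in the server log instead of ignoring it
application.log("Could not store metadata for "+filename,exc);
}%>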
<%
//
//Index the uploaded document
IndexBuilder index = new IndexBuilder(Path+filename,Path+"stopwords.txt");
index.ReadIndexFromDisk(Path+"invertedIndex.object");
index.indexDocument();
index.SaveIndexToDisk(Path+"invertedIndex.object");
%>
<%
}else
{
%>
No document uploaded!
<%
}
%>
<meta http-equiv="refresh" content="0; URL=http://localhost:8080/DigitalLibrarySearch/macsdl.jsp">
<meta name="keywords" content="automatic redirection">
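The upload page builds the target path by appending the name reported by the browser to the documents directory. The following is a minimal sketch of how that name could be reduced to its last component and checked before the file is written. The class UploadTarget and its two methods are hypothetical helpers introduced for illustration; they are not part of the MACSDL sources or of Commons FileUpload.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of the MACSDL sources.
class UploadTarget {
    // Returns the last path component, treating both '/' and '\' as separators.
    static String baseName(String clientName) {
        int cut = Math.max(clientName.lastIndexOf('/'), clientName.lastIndexOf('\\'));
        return clientName.substring(cut + 1);
    }

    // Resolves the uploaded name inside documentsDir; rejects empty or escaping names.
    static File resolve(File documentsDir, String clientName) throws IOException {
        String name = baseName(clientName);
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            throw new IOException("Invalid upload name: " + clientName);
        }
        File target = new File(documentsDir, name).getCanonicalFile();
        // The canonical parent must be the documents directory itself.
        if (!documentsDir.getCanonicalFile().equals(target.getParentFile())) {
            throw new IOException("Upload name escapes the documents directory: " + clientName);
        }
        return target;
    }
}
```

In the upload loop, the File returned by resolve could be passed to fi.write in place of a path built by string concatenation.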
//Java server page for Browsing: Browse by Author
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Mathematics & Computer Science Digital Library System</title>
<meta name="keywords" content="" />
<meta name="description" content="" />
<script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script>
<script type="text/javascript" src="lib/jquery.tools.js"></script>
<script type="text/javascript" src="lib/jquery.custom.js"></script>
<link href="styles.css" rel="stylesheet" type="text/css" />
<link href="style.css" rel="stylesheet" type="text/css" />
</head>
<script language="JAVASCRIPT" type="TEXT/JAVASCRIPT">
function confirmMessage()
{
//display an alert telling the user that the file was uploaded
{
alert("File successfully uploaded to server");
}
}
$(document).ready(function()
{
var passfield = document.getElementById('password_field_id');
//switch the type only when the password field is present on the page
if (passfield) { passfield.type = 'text'; }
});
function focusCheckDefaultValue(field, type, defaultValue)
{
if (field.value == defaultValue)
{
field.value = '';
}
if (type == 'pass')
{
field.type = 'password';
}
}
function blurCheckDefaultValue(field, type, defaultValue)
{
if (field.value == '')
{
field.value = defaultValue;
}
if (type == 'pass' && field.value == defaultValue)
{
field.type = 'text';
}
else if (type == 'pass' && field.value != defaultValue)
{
field.type = 'password';
}
}
</script>
<body>
<div id="bg">
<div id="main">
<div id="content">
<div class="navi"></div> <!-- automatically create the navigation points depending on the number of items -->
<div id ="header1">
<div id="menu">
<ul>
<li id="button1"><a href="macsdl.jsp" title="">Home</a></li>
<li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li>
<li id="button2"><a href="#" title="">Contacts</a></li>
</ul>
</div>
<div id ="logos"></div>
<div align="center">
<br/>
<center>
<a href="Browse.jsp" >Browse by author</a><br/><br/>
<a href="BrowseByTittle.jsp" >Browse by title</a><br/><br/>
<a href="BrowsebyDate.jsp" >Browse by date</a><br/><br/>
</center>
</div>
</div>
<br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>
<div class="razd_g"></div><br />
<div class="col">
<h1>Add to the MACS DL</h1>
<img src="images/col_img1.jpg" class="img_l" alt="" />Add your objects and share with the NUL community by uploading your
files<br/>to the server, download and get stuff you need most!
</div>
<div class="col_razd"></div>
<div class="col">
<h1 class="tit">Browse by date</h1>
<img src="images/col_img2.jpg" class="img_l" alt="" />Browse the collection by date, specify the date and browse freely.
</div>
<div class="col_razd"></div>
<div class="col">
<h1 class="tit">SEARCH MACS DL</h1>
<img src="images/col_img3.jpg" class="img_l" alt="" />Type any query in the above search text field and click the search
button. Get the results instantly!
</div>
<div style="clear: both"></div>
<div style="height:15px; width: 100%"></div>
<div class="razd_g"></div>
<div style="clear: both"></div>
</div>
<div id="content_bot"></div>
<!-- content ends -->
<div style="height:15px; width: 100%"></div>
<!-- bottom end -->
<!-- footer begins -->
<div id="footer">
<p>Copyright 2012</p><p>Design by
<a href="http://www.nul.ls" title="MACS DL">Mosola Napo N</a>
<!--End of notice --></p><!-- end of copyright notice-->
</div>
<!-- footer ends -->
</div>
</div>
</body>
</html>
//Java server Page for Browsing: Browse by Title
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Mathematics & Computer Science Digital Library System</title>
<meta name="keywords" content="" />
<meta name="description" content="" />
<script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script>
<script type="text/javascript" src="lib/jquery.tools.js"></script>
<script type="text/javascript" src="lib/jquery.custom.js"></script>
<link href="styles.css" rel="stylesheet" type="text/css" />
</head>
<script language="JAVASCRIPT" type="TEXT/JAVASCRIPT">
function confirmMessage()
{
//display an alert telling the user that the file was uploaded
alert("File successfully uploaded to server");
}
$(document).ready(function()
{
var passfield = document.getElementById('password_field_id');
//switch the type only when the password field is present on the page
if (passfield) { passfield.type = 'text'; }
});
function focusCheckDefaultValue(field, type, defaultValue)
{
if (field.value == defaultValue)
{
field.value = '';
}
if (type == 'pass')
{
field.type = 'password';
}
}
function blurCheckDefaultValue(field, type, defaultValue)
{
if (field.value == '')
{
field.value = defaultValue;
}
if (type == 'pass' && field.value == defaultValue)
{
field.type = 'text';
}
else if (type == 'pass' && field.value != defaultValue)
{
field.type = 'password';
}
}
</script>
<body>
<div id="bg">
<div id="main">
<div id="content">
<div class="navi"></div> <!-- automatically create the navigation points depending on the number of items -->
<div id ="header1">
<div id="menu">
<ul>
<li id="button1"><a href="macsdl.jsp" title="">Home</a></li>
<li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li>
<li id="button2"><a href="#" title="">Contacts</a></li>
</ul>
</div>
<div id ="logos"></div>
</div>
<br/><br/>
<div id="main">
<%@page import="java.io.*"%>
<%@page import="java.sql.*,java.util.*" %>
<%
String nam=request.getParameter("Name");
if(nam!=null)
{
Connection conn=null;
ResultSet results=null;
Statement stat;
Class.forName("oracle.jdbc.driver.OracleDriver");
conn=DriverManager.getConnection
("jdbc:oracle:thin:dl/admin@localhost:1521/XE");
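//bind the search text as a parameter so that it cannot alter the SQL statement
PreparedStatement query=conn.prepareStatement("Select reference from browse "+
"Where title Like ?");
query.setString(1,"%"+nam.toLowerCase()+"%");
results = query.executeQuery();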
while (results.next()) {
String filename=results.getString("reference");
%>
<!--embed src="test.pdf" width="800px" height="110px"></embed--->
<!--a href="test.pdf">test</a-->
<br><br><br>
<center>
<h1>Browse Results:</h1>
<div id="main">
<div class="main_top"></div>
<div class="main_bg1">
<tr>
<td>
<a href="downloadfile.jsp?<%=filename%>"><h2><%=filename%> </h2>
</a><br><br>
</td>
</tr>
</div>
<div class="main_bot"></div>
</div>
</center>
<%
}
results.close();
conn.close();
}
else
{%>
<div id="search">
<b>Enter The Title Of The Book:</b>
<form method="get" action="BrowseByTittle.jsp">
<fieldset>
<input type="text" name="Name" id="search-text" size="25" value ="Title"
onFocus="javascript:focusCheckDefaultValue(this, '', 'Title');"
onBlur="javascript:blurCheckDefaultValue(this, '', 'Title');"
>
<input type="submit" id="search-submit" value="" />
</fieldset>
</form>
</div>
<%
}%>
<br/><br/><br/><br/><br/><br/>
<div class="razd_g"></div><br />
<div class="col">
<h1>Add to the MACS DL</h1>
<img src="images/col_img1.jpg" class="img_l" alt="" />Add your objects and share with the NUL community by uploading your
files<br/>to the server, download and get stuff you need most!
</div>
<div class="col_razd"></div>
<div class="col">
<h1 class="tit">Browse by date</h1>
<img src="images/col_img2.jpg" class="img_l" alt="" />Browse the collection by date, specify the date and browse freely.
</div>
<div class="col_razd"></div>
<div class="col">
<h1 class="tit">SEARCH MACS DL</h1>
<img src="images/col_img3.jpg" class="img_l" alt="" />Type any query in the above search text field and click the search
button. Get the results instantly!
</div>
<div style="clear: both"></div>
<div style="height:15px; width: 100%"></div>
<div class="razd_g"></div>
<div style="clear: both"></div>
</div>
<div id="content_bot"></div>
<!-- content ends -->
<div style="height:15px; width: 100%"></div>
</div>
</div>
</div>
</body>
</html>
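The browse pages match the text typed by the user against a column with a LIKE condition. The following is a minimal sketch of the same lookup written as a reusable method, assuming the browse table with the title and reference columns used in the listing above. The class BrowseQuery and its methods are hypothetical. Escaping the wildcards % and _ is an addition that the original page does not perform; it is included so that these characters are matched literally.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the MACSDL sources.
class BrowseQuery {
    // Lower-cases the text, escapes '\', '%' and '_', then wraps it in %...% for a substring match.
    static String containsPattern(String userText) {
        String escaped = userText.toLowerCase()
                .replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_");
        return "%" + escaped + "%";
    }

    // Returns the stored file references whose title contains userText.
    static List<String> referencesByTitle(Connection conn, String userText) throws SQLException {
        // The user text is bound as a parameter and never concatenated into the SQL.
        String sql = "SELECT reference FROM browse WHERE title LIKE ? ESCAPE '\\'";
        List<String> references = new ArrayList<String>();
        try (PreparedStatement query = conn.prepareStatement(sql)) {
            query.setString(1, containsPattern(userText));
            try (ResultSet results = query.executeQuery()) {
                while (results.next()) {
                    references.add(results.getString("reference"));
                }
            }
        }
        return references;
    }
}
```

The try-with-resources blocks close the statement and the result set even when the query fails; closing the connection is left to the caller.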
//Java server page for Browsing: Browse by date
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Mathematics & Computer Science Digital Library System</title>
<meta name="keywords" content="" />
<meta name="description" content="" />
<script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script>
<script type="text/javascript" src="lib/jquery.tools.js"></script>
<script type="text/javascript" src="lib/jquery.custom.js"></script>
<link href="styles.css" rel="stylesheet" type="text/css" />
</head>
<script language="JAVASCRIPT" type="TEXT/JAVASCRIPT">
function confirmMessage()
{
//display an alert telling the user that the file was uploaded
alert("File successfully uploaded to server");
}
$(document).ready(function()
{
var passfield = document.getElementById('password_field_id');
//switch the type only when the password field is present on the page
if (passfield) { passfield.type = 'text'; }
});
function focusCheckDefaultValue(field, type, defaultValue)
{
if (field.value == defaultValue)
{
field.value = '';
}
if (type == 'pass')
{
field.type = 'password';
}
}
function blurCheckDefaultValue(field, type, defaultValue)
{
if (field.value == '')
{
field.value = defaultValue;
}
if (type == 'pass' && field.value == defaultValue)
{
field.type = 'text';
}
else if (type == 'pass' && field.value != defaultValue)
{
field.type = 'password';
}
}
</script>
<body>
<div id="bg">
<div id="main">
<div id="content">
<div class="navi"></div> <!-- automatically create the navigation points depending on the number of items -->
<div id ="header1">
<div id="menu">
<ul>
<li id="button1"><a href="macsdl.jsp" title="">Home</a></li>
<li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li>
<li id="button2"><a href="#" title="">Contacts</a></li>
</ul>
</div>
<div id ="logos"></div>
</div>
<br/><br/>
<div id="main">
<%@page import="java.io.*"%>
<%@page import="java.sql.*,java.util.*" %>
<%
String nam=request.getParameter("Name");
if(nam!=null)
{
Connection conn=null;
ResultSet results=null;
Statement stat;
Class.forName("oracle.jdbc.driver.OracleDriver");
conn=DriverManager.getConnection
("jdbc:oracle:thin:dl/admin@localhost:1521/XE");
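//bind the search text as a parameter so that it cannot alter the SQL statement
PreparedStatement query=conn.prepareStatement("Select reference from browse "+
"Where date_modified Like ?");
query.setString(1,"%"+nam.toLowerCase()+"%");
results = query.executeQuery();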
while (results.next()) {
String filename=results.getString("reference");
%>
<!--embed src="test.pdf" width="800px" height="110px"></embed--->
<!--a href="test.pdf">test</a-->
<br><br><br>
<center>
<h1>Browse Results:</h1>
<div id="main">
<div class="main_top"></div>
<div class="main_bg1">
<tr>
<td>
<a href="downloadfile.jsp?<%=filename%>"><h2><%=filename%> </h2>
</a><br><br>
</td>
</tr>
</div>
<div class="main_bot"></div>
</div>
</center>
<%
}
results.close();
conn.close();
}
else
{%>
<div id="search">
<b>Enter Year Of Publication:</b>
<form method="get" action="BrowsebyDate.jsp">
<fieldset>
<input type="text" name="Name" id="search-text" size="25" value ="Year"
onFocus="javascript:focusCheckDefaultValue(this, '', 'Year');"
onBlur="javascript:blurCheckDefaultValue(this, '', 'Year');"
>
<input type="submit" id="search-submit" value="" />
</fieldset>
</form>
</div>
<%
}%>
<br/><br/><br/><br/><br/><br/>
<div class="razd_g"></div><br />
<div class="col">
<h1>Add to the MACS DL</h1>
<img src="images/col_img1.jpg" class="img_l" alt="" />Add your objects and share with the NUL community by uploading your
files<br/>to the server, download and get stuff you need most!
</div>
<div class="col_razd"></div>
<div class="col">
<h1 class="tit">Browse by date</h1>
<img src="images/col_img2.jpg" class="img_l" alt="" />Browse the collection by date, specify the date and browse freely.
</div>
<div class="col_razd"></div>
<div class="col">
<h1 class="tit">SEARCH MACS DL</h1>
<img src="images/col_img3.jpg" class="img_l" alt="" />Type any query in the above search text field and click the search
button. Get the results instantly!
</div>
<div style="clear: both"></div>
<div style="height:15px; width: 100%"></div>
<div class="razd_g"></div>
<div style="clear: both"></div>
</div>
<div id="content_bot"></div>
<!-- content ends -->
<div style="height:15px; width: 100%"></div>
</div>
</div>
</div>
</body>
</html>
//Java Source code for search results
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Mathematics & Computer Science Digital Library System</title>
<meta name="keywords" content="" />
<meta name="description" content="" />
<script type="text/javascript" src="lib/jquery-1.3.2.min.js"></script>
<script type="text/javascript" src="lib/jquery.tools.js"></script>
<script type="text/javascript" src="lib/jquery.custom.js"></script>
<link href="styles.css" rel="stylesheet" type="text/css" />
</head>
<script language="JAVASCRIPT" type="TEXT/JAVASCRIPT">
function confirmMessage()
{
//display an alert telling the user that the file was uploaded
alert("File successfully uploaded to server");
}
$(document).ready(function()
{
var passfield = document.getElementById('password_field_id');
//switch the type only when the password field is present on the page
if (passfield) { passfield.type = 'text'; }
});
function focusCheckDefaultValue(field, type, defaultValue)
{
if (field.value == defaultValue)
{
field.value = '';
}
if (type == 'pass')
{
field.type = 'password';
}
}
function blurCheckDefaultValue(field, type, defaultValue)
{
if (field.value == '')
{
field.value = defaultValue;
}
if (type == 'pass' && field.value == defaultValue)
{
field.type = 'text';
}
else if (type == 'pass' && field.value != defaultValue)
{
field.type = 'password';
}
}
</script>
<body>
<div id="bg">
<div id="main">
<div id="content">
<div class="navi"></div>
<div id ="header1">
<div id="menu">
<ul>
<li id="button1"><a href="macsdl.jsp" title="">Home</a></li>
<li id="button2"><a href="ByAuthor.jsp" title="">Browse</a></li>
<li id="button2"><a href="#" title="">Contacts</a></li>
</ul>
</div>
<div id ="logos"></div>
<div id="search">
<form method="get" action="searchResults.jsp">
<fieldset>
<input type="text" name="search" id="search-text" size="25" value ="Search"
onFocus="javascript:focusCheckDefaultValue(this, '', 'Search');"
onBlur="javascript:blurCheckDefaultValue(this, '', 'Search');"
>
<input type="submit" id="search-submit" value="" />
</fieldset>
</form>
</div>
</div>
<br/><br/>
<center>
<br/><br/><br/>
<h1>Search Results related to the query</h1>
<div id="main2">
<div class="main_top"></div>
<div class="main_bg1">
<!--p style="line-height: 200%; margin-bottom: 3px" >First Name :</p-->
<tr>
<td>
<%@page import="InvertedIndex.*,java.io.*"%>
<%//Display the documents that match the search query
String Path=
"C:/Users/KELVIN/Documents/NetBeansProjects/DigitalLibrarySearch/documents/";
IndexBuilder invertedIndex =
new IndexBuilder(Path+"stopwords.txt");
invertedIndex.ReadIndexFromDisk(Path+"invertedIndex.object");
invertedIndex.AnswerQuery(request.getParameter("search"));
File filename;
for(int i=0;i<invertedIndex.QueryResults.size();i++)
{
filename=new File((String)invertedIndex.QueryResults.get(i));
String file=filename.getName();
%>
<h1>
<a href="downloadfile.jsp?<%=file%>">
<%=file%>
</a></h1>
<%
}
%>
</td>
</tr><br /><br/>
</div>
<div class="main_bot"></div>
</div>
</center>
<!-- content ends -->
</div>
</div>
</body>
</html>
//Source code for downloading a file from the server, downloadfile.jsp
<%@page import="java.io.*"%>
<%
//the requested file name is passed as the query string: downloadfile.jsp?<name>
String filename=request.getQueryString();
String Path="C:/Users/KELVIN/Documents/NetBeansProjects/DigitalLibrarySearch/documents/";
File file=new File(Path+filename);
BufferedInputStream reader=
new BufferedInputStream(new FileInputStream(file));
try
{
//servlet=response.getOutputStream();
response.setContentType("APPLICATION/OCTET-STREAM");
response.setHeader("Content-Disposition","attachment;filename="+file.getName());
//response.setContentLength((int)file.length());
//read the file one byte at a time until the end of the file (-1)
int nextByte=0;
while((nextByte=reader.read())!= -1)
out.write(nextByte);
reader.close();
out.close();
}
//record any error in the server log instead of ignoring it
catch(Exception error){ application.log("Download failed for "+filename,error); }
%>
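The download page reads the file one byte at a time and passes each byte to the page's character writer. The following is a minimal sketch of a block-wise copy between byte streams, which is the usual way to send binary content. The class StreamCopy and the 8192-byte buffer size are assumptions made for illustration. The target stream could be the one returned by response.getOutputStream(), which is part of the standard Servlet API; whether that call can be combined with the JSP writer depends on the container, so such code is usually placed in a servlet.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of the MACSDL sources.
class StreamCopy {
    // Copies every byte from in to out in blocks and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        // read() returns -1 once the end of the stream is reached.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

The returned byte count can be compared with file.length() to confirm that the whole document was sent.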