
8/7/2019 A Novel Vlog Management Model of Automatic Vlog Annotation and Search With Context and Content Analysis


CHAPTER 1

INTRODUCTION

1.1 ABOUT THE PROJECT

The main aim of this project is to manage video blogs (vlogs) effectively
and make them more conveniently accessible to users.

A video blog (vlog) is a blog which uses video as its primary content, often
accompanied by supporting text, images, and additional metadata to provide
context. Compared with general videos, vlogs have several unique
characteristics. A vlog often provides textual content as a description of
the video. As a medium for communication, a vlog usually has comment entries
conveying its viewers' opinions. Some unique but useful information about a
vlog, such as its submission time, view count, number of comment entries,
and popularity rating, can be easily obtained.

We propose a novel vlog management model which comprises automatic vlog
annotation and user-oriented vlog search. For vlog annotation, we extract
informative keywords from both the target vlog itself and relevant external
resources; besides semantic annotation, we perform sentiment analysis on
comments to obtain the overall evaluation. For vlog search, we present
saliency-based matching to simulate human perception of similarity, and
organize the results by personalized ranking and category-based clustering.


1.2 LITERATURE SURVEY

Blogging has been a textual activity, but text is only one aspect of the
diverse skills needed to understand and manage the different aspects of
modern communication. Broadband connections are likely to stimulate a rapid
increase in audio-visual services on the web, presumably changing the future
conditions for blogging. Videoblogs can facilitate practices which promote
media literacy and collaborative learning through the making of collective
documentaries. Videoblogs with wiki-like functions promise to turn users
into producing collectives rather than individual consumers of audiovisual
content.

Arguably, textual blogging has an effect on journalism by making the process
of gathering information more dynamic, potentially involving the public
before the moment of print publication, or even as providers of content.
When it comes to audiovisual media, users are beginning to abandon
broadcasting in favour of broadband services. This development will probably
continue as new technologies transform the viewing experience into a more
personalized activity, potentially bypassing the traditional broadcaster
completely.


1.2.1 VIDEOBLOGS

Video is a very powerful medium, the amount of video material on the web is
increasing, and weblogs offer new possibilities for collaboration. Together,
these factors make videoblogs one of the most promising tools for fostering
media literacy.

Blogs began as a textual genre of personal publishing, but the genre has
developed visual expressions, such as photoblogs, and has more recently
adopted sound and video. Most bloggers publish short posts and write quite
often, with extensive use of hypertext linking. Linking and commenting make
blogging a kind of collaboration, where individual bloggers write in a
context created through collective, but often unorganised, effort.

Textual blogs have at least three characteristics, apart from usable and
easily accessible software, which have made them easy to use whether as a
"producer" (writer), a "consumer" (reader) or both: textblogs are based on
non-temporal media which is easily controllable, they are easy to cite, and
they are part of a long textual tradition, "re-mediating" many of the
features known from diaries and journals. Even though there are several
substantial differences, the easiest way to explain blogging is often to
begin by referring to "an online diary".

Audiovisual blogs are more difficult to explain: audio- and videoblogs are
based on temporal media, and there is no established tradition to which they
are closely related. Audioblogs can hardly be compared to radio or recorded
sound, and videoblogs are not like television or private filmmaking: in
contrast to broadcasting, blogs are personal, and at the same time they are
shared by people outside the private sphere. In production, presentation and
distribution alike, blogging promises to be close to the opposite of
broadcasting. Looking for the sources of a videoblogging language, we
therefore have to explore other aspects of audiovisual culture.

1.2.1.1 SEVERAL VIDEOBLOG-TRADITIONS

First one has to come to terms with what characterizes a videoblog. We have
to distinguish between several significantly different technical solutions
which claim to be videoblogs, ranging from simply uploading unedited video
files, via playlists, to edited sequences, sometimes with complex
interactivity. It is difficult to define distinct genres, but in general
there seem to be some major traditions, which of course blend into a number
of sub-genres. I make a distinction between "vogs", which are based on
pre-edited sequences with interactive features; video "moblogs", consisting
of relatively short, autonomous video clips; and playlists, which are
collections of references to video files on different servers.

Some of the characteristics of vogs and "vogging" are formulated in a
manifesto written by Adrian Miles, inspired by the Danish filmmaking
initiative Dogme 95. In this tradition, videoblogs are personal publishing
of video, exploring the potential of linking, using technology which is
easily available. Vogs are made of edited sequences which normally include
interactive elements. They are typically made with different kinds of
software on the producer's computer and posted to individual websites. The
other major tradition has emerged along with the introduction of mobile
input devices with an internet connection (smartphones, PDAs, camera
cellphones).

Blogging from mobile devices is particularly interesting in relation to

documentary filmmaking, trying to grasp moments in life provoked or

captured by the presence of a camera. A pioneer within the moblogging

tradition is Steve Mann who has experimented with wearable cameras,

posting images to the Web since 1994 (Mann 1997). Today most moblogs

are based on technology quite similar to textual blogs with posts containing

uploaded pictures or videoclips and additional text. Moblogs are often

hosted by professional service providers where a large number of blogs

share the same infrastructure. Videos on moblogs normally contain

individual videoclips, not edited sequences. Vogs and moblogs are quite

different, both regarding the way they are produced and the way they are

consumed.

Miles makes a good distinction between the two traditions by emphasizing

that "a vog is a video blog where video in a blog must be more than video in

a blog". The posts in vogs are edited and may offer quite complex

interactivity. Therefore those who produce vogs have to combine skills in

the use of software with the ability to manipulate moving images and add
hypertextual interactivity. If we restrict vogs to video content there are
not many voggers around; even including those posting cinematic 2D and 3D
animations, the number is still relatively small.


Even though there are no major technical barriers, some skills are required,
preventing most users from becoming "voggers". Moblogs, on the other hand,
are easy to use; in most cases it is just a matter of uploading video files
to a dedicated webserver. However, this is not only a question of ease of
use. Possibly even more important is the time which the producer has to
invest in order to get his material on the web. When posting is not
time-consuming, bloggers are encouraged to post often, an aspect which has
made a lot of text bloggers and blog readers become avid users. The same
criteria for success apply to videoblogs.

Playlists are perhaps not videoblogs, but they are an interesting genre
because they use technologies which may bridge the gap between vogs and
moblogs. Playlists address individual files on different servers and may
even provide a level of interactivity without manipulating the content in
these files. One way to achieve this is by using SMIL (Synchronized
Multimedia Integration Language), an established XML dialect defined by the
World Wide Web Consortium in order to control different media distributed
through the internet. SMIL seems to be an ideal platform for distributing
various content as "movies" without moving or manipulating the original
source files (video clips, pictures and text). Since SMIL files are based on
an open format (XML) stored as ASCII text, it is quite easy to make
alternative versions of playlists, taking advantage of server-side
applications and the internet's transparent nature.


1.2.2 COLLECTIVE DOCUMENTARIES

Looking for existing genres which videoblogs re-mediate, the closest we get
are some traditions known from documentary filmmaking. One of these is
diary films, which are personal first-person narratives. Another tradition
is found-footage films, which are often based on old private material filmed
by someone other than the filmmaker himself. Found-footage films are part of
a larger tradition known as compilation film, using material from a variety
of sources, including archived material. Any kind of film and video may end
up as found footage: your grandfather's Super-8 movies, old commercials,
parts of feature films, recorded television, etc. Quite a few excellent
filmmakers have made their first movies with material found as leftovers in
a studio or in a film school. William Wees discusses three general ways in
which found footage is most often used:

1. Compilation: Film where the editor cuts together pieces of footage in
order to illustrate a point. The images are intended to represent reality;
this use is typical in television documentaries.

2. Collage: Film which uses found footage to create metaphors, provoke
self-consciousness and encourage critical viewing. The viewer is able to
read images critically with attention to the metaphors.

3. Appropriation: Film where images are reused in order to be decorative.
Representation is about surface, rather than creation of secondary


parts of the production process while others remain almost the same as if the

process was done with an offline computer. After the video-material is

recorded, the videoblogging process can be divided into five stages:

1. "Posting", 2. "Selecting", 3. "Editing", 4. "Storing" and 5. "Re-editing".

1. Posting

The success of blogging is partly a result of cheaper net connections and
the increasing number of computer-literate people using the web. Videoblogs
may become a genre where "href tracks", "sprites" and "interactive" elements
enhance the user's personal experience in ways that are unique to
computer-mediated communication. The considerable downside is that advanced
interactive features increase the amount of work which each user has to put
into his posts. When posting becomes a chore, the most important advantage
of blogging disappears and the media-literacy potential decreases. Following
the moblog tradition of emphasizing simplicity, posting should be as easy as
possible. Users should be encouraged to post short clips of unedited video
material, which is transformed into a unified video format on the server and
becomes a common resource for future editing and citation.

Commenting is a way of posting which is important when building an online
community, for at least two major reasons. First, commenting is the easiest
way to become part of a community without having to be among those who
provide the content in the first place. Second, comments help maintain a
community by making different members aware of each other. Frequent comments
give those who post an explicit confirmation of their public's presence, in
addition to some substantial feedback. This is particularly important in
photo- and videoblogs because these are personal expressions, often even
more vivid than most textual blogs, which in many cases might be considered
varieties of content management.

2. Selecting

A flexible system must allow a media object to be assigned to multiple
categories, allowing hierarchy but without enforcing it. This will in most
cases be an excellent system for a large videoblogging environment, perhaps
even with features combining personal and global categories. Combined with
personal information, the time and date of posting, and possibly
geographical information, this provides metadata for searching or for the
automatic generation of playlists which may be re-edited manually. The
problem with searching and search results is that the information is shown
out of context. In order to get an idea of the quality of a specific clip,
it might be helpful to know how many people have used it in a sequence. If a
lot of people have used it, that is an indication that it might be worth
looking at. Sequences are in fact the best way to view a clip, as long as
they provide both context and an example of how the clip might be trimmed
(where to begin and where to end).
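The selection metadata described above could be modelled roughly as follows.
This is only a minimal sketch under stated assumptions: the class name Clip,
its fields, the slash-separated category paths, and the ranking by reuse
count are all illustrative choices, not part of the original proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    """A posted clip; every field name here is hypothetical."""
    clip_id: str
    author: str
    posted: str  # ISO date string, e.g. "2005-03-01"
    # Free-form categories; "a/b" expresses optional hierarchy.
    categories: set[str] = field(default_factory=set)
    # Ids of the sequences that cite this clip.
    used_in: set[str] = field(default_factory=set)


def in_category(clip: Clip, category: str) -> bool:
    """True if the clip is filed under `category` or one of its sub-categories."""
    return any(c == category or c.startswith(category + "/")
               for c in clip.categories)


def search(clips: list[Clip], category: str) -> list[Clip]:
    """Return matching clips, most frequently reused first."""
    hits = [c for c in clips if in_category(c, category)]
    return sorted(hits, key=lambda c: len(c.used_in), reverse=True)
```

Sorting by the number of citing sequences reflects the idea that heavy reuse
hints at a clip worth looking at; a real system would likely combine it with
date, author, and location filters.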

3. Editing

Both the edit and the link provide context to separated segments, but both
theory and practice addressing editing and linking tend to concentrate on
the segments. Editing techniques which have a duration, like dissolves and
wipes, are easier to identify, but they are almost never used unless the
director wants to call for the viewer's attention. In an analogous manner,
theory and practice in hypertext design emphasize that the nodes in
hypertext should be designed to make navigation as intuitive as possible,
stressing the importance of a unified design in order not to confuse the
user. The editing interface has to display all the clips which the user has
selected during a session. The editing process results in a text document
(SMIL), small enough to allow storage of an almost infinite number of
sequences, which can be made by combining a limited number of video clips.

4. Storing

Collective editing capabilities rely on storage which consumes as little
disk space and bandwidth as possible. Downloading video clips in order to
re-edit a sequence and then uploading a new version would not be effective.
I would like to propose an approach to videoblogs focusing on the simplicity
known from moblogs, combined with an easy-to-use editing interface which
makes it possible to combine clips from different logs into sequences and
store these as SMIL (Synchronized Multimedia Integration Language)
documents. In order to make the system flexible, all references to users,
shots and sequences are stored in a database, and SMIL documents are
generated "on the fly". Because SMIL documents are plain text files which
can be played in QuickTime, it becomes an easy task to generate customized
movies using server-side applications. The online editor must be capable of
combining clips into sequences and storing these as SMIL documents in the
creator's blog. The SMIL document includes references to the video clips and
controls their order, duration, and positioning, as well as additional
text.
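As an illustration of generating such a document "on the fly", the sketch
below builds a small SMIL playlist from a list of shots. It is a minimal
example under assumptions: the function name build_smil, the tuple layout,
and the example URLs are hypothetical, the shot list stands in for rows read
from the database, and the SMIL namespace and layout section are omitted for
brevity. The clipBegin and clipEnd attribute names follow SMIL 2.0.

```python
import xml.etree.ElementTree as ET


def build_smil(shots):
    """Build a SMIL playlist from (url, begin_seconds, end_seconds) tuples."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")  # <seq> plays its children one after another
    for url, begin, end in shots:
        ET.SubElement(seq, "video", {
            "src": url,                # the clip stays on its original server
            "clipBegin": f"{begin}s",  # trim point: where playback starts
            "clipEnd": f"{end}s",      # trim point: where playback stops
        })
    return ET.tostring(smil, encoding="unicode")


# Hypothetical shots, as they might be read from the database.
playlist = build_smil([
    ("http://example.org/ann/clip1.mov", 0, 5),
    ("http://example.org/bob/clip7.mov", 2, 8),
])
```

Because only references and trim points are stored, a re-edit amounts to
writing a new short text document rather than copying any video data.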


5. Re-editing

Re-editing in a videoblogging environment means that any user can take a
sequence or a number of individual video clips into the editor, make a
re-edit and store the result as a new sequence in his own blog. The idea of
making collective documentaries or fiction in this way is intriguing: those
who are unhappy with a version may comment on the original sequence or just
make their own version, possibly adding their own content.

1.2.2.2 CURRENT VIDEO BLOGS

Current video blogs are essentially text blogs with externally linked videos
for each entry. Though the fragments of video content form a cohesive diary,
they are always introduced and navigated to via text. One example of a
current video blog is that of Miles, who teaches the theory and practice of
hypermedia and interactive video at RMIT University, Australia, and uses his
blog to demonstrate some of the ideas. His video blog differs because he
includes timed hyperlinks to other Web resources inside his videos, and has
post-produced speech tracks with timed transcriptions of his speech inside
the QuickTime video files. It is an exceptional video blog, requiring skills
and tools not usually available to the more common blogger. Although those
pioneering the creation of video blogs are finding ample room for
expression, and they obviously enjoy pushing the limits of current
technology, users cite a few problems with current video blogs:

- there is no way to add comments in video form;
- video items cannot be easily found via search engines;
- video items cannot be aggregated easily;
- interesting clips cannot be viewed on their own; instead the video must be
  played back in its entirety; and
- as with most multimedia on the Web, sufficiently high bandwidth for
  reasonable quality video is not widely available.

1.2.2.3 VIDEO BLOG SEARCH

The aim is to make video blogs as easily searchable by Web search engines

as normal Web pages. Web search engines increasingly support scanning of

RSS and Atom feeds, which allows more tightly coupled searches of actual

blog entries. In fact, the original blogging company Blogger was acquired by Google to bring about such developments. Although a multimedia

syndication language would be just as amenable to scanning and indexing by

search engines, it would unfortunately be able to index only the metadata

referring to a video blog entry, not the actual video content itself. Ideally, the

search engine should index the video content itself, by scanning for

embedded transcriptions and timed metadata or even by performing

automated analysis of the video content directly.

1.2.2.4 VIDEO BLOG COMMENTS

The ability of readers to add comments to blog entries is very popular,
allowing friendly advice and spontaneous discussions to take place in remote
corners of the Web. It is easy to imagine how lively these discussions would
be in video format, if any viewer could easily provide feedback for others
to watch and themselves respond to. The technology for such video forums is
already being explored in various areas of the telecommunications industry,
but usually in the context of developing vendor-specific applications for
limited numbers of users. In such environments, users are simply able to
point-and-shoot to reply to commentary posted by others.

1.2.3 WORDNET: A LEXICAL DATABASE FOR ENGLISH

WordNet provides a more effective combination of traditional lexicographic

information and modern computing. WordNet is an online lexical database

designed for use under program control. English nouns, verbs, adjectives,

and adverbs are organized into sets of synonyms, each representing a

lexicalized concept. Semantic relations link the synonym sets.

1.2.3.1 LANGUAGE DEFINITIONS

We define the vocabulary of a language as a set W of pairs (f, s), where a
form f is a string over a finite alphabet, and a sense s is an element from
a given set of meanings. Forms can be utterances composed of a string of
phonemes or inscriptions composed of a string of characters. Each form with
a sense in a language is called a word in that language. A dictionary is an
alphabetical list of words. A word that has more than one sense is
polysemous; two words that share at least one sense in common are said to be
synonymous. A word's usage is the set C of linguistic contexts in which the
word can be used. The syntax of the language partitions C into syntactic
categories. Words that occur in the subset N are nouns, words that occur in
the subset V are verbs, and so on. Within each category of syntactic
contexts are further categories of semantic contexts: the set of contexts in
which a particular f can be used to express a particular s. The morphology
of the language is defined in terms of a set M of relations between word
forms. For example, the morphology of English is partitioned into
inflectional, derivational, and compound morphological relations. Finally,
the lexical semantics of the language is defined in terms of a set S of
relations between word senses. The semantic relations into which a word
enters determine the definition of that word.
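The definitions of polysemy and synonymy above can be made concrete with a
toy vocabulary. The sketch below is illustrative only: the five (form, sense)
pairs and the sense labels are invented for the example and are not taken
from WordNet.

```python
# Toy vocabulary W: a set of (form, sense) pairs. Sense labels are made up.
W = {
    ("bank", "financial_institution"),
    ("bank", "river_side"),
    ("shore", "river_side"),
    ("car", "motor_vehicle"),
    ("automobile", "motor_vehicle"),
}


def senses(form):
    """All senses s such that (form, s) is in W."""
    return {s for f, s in W if f == form}


def is_polysemous(form):
    """A form is polysemous if it has more than one sense."""
    return len(senses(form)) > 1


def are_synonymous(form_a, form_b):
    """Two distinct forms are synonymous if they share at least one sense."""
    return form_a != form_b and bool(senses(form_a) & senses(form_b))
```

Here "bank" is polysemous, and it is synonymous with "shore" only through
the shared river-side sense, which shows why synonymy has to be judged sense
by sense.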

In WordNet, a form is represented by a string of ASCII characters, and a
sense is represented by the set of (one or more) synonyms that have that
sense. WordNet contains more than 118,000 different word forms and more than
90,000 different word senses, or more than 166,000 (f, s) pairs. Some words
in WordNet are polysemous; approximately 40% have one or more synonyms.

WordNet respects the syntactic categories noun, verb, adjective, and adverb
(the so-called open-class words). For example, word forms like "back",
"right", or "well" are interpreted as nouns in some linguistic contexts, as
verbs in other contexts, and as adjectives or adverbs in other contexts;
each is entered separately into WordNet. It is assumed that the closed-class
categories of English (some 300 prepositions, pronouns, and determiners)
play an important role in any parsing system; they are given no semantic
explication in WordNet. WordNet includes the following semantic relations:

- Synonymy is WordNet's basic relation, because WordNet uses sets of
  synonyms (synsets) to represent word senses. Synonymy (syn "same", onyma
  "name") is a symmetric relation between word forms.
- Antonymy (opposing-name) is also a symmetric semantic relation between
  word forms, especially important in organizing the meanings of adjectives
  and adverbs.
- Hyponymy (sub-name) and its inverse, hypernymy (super-name), are
  transitive relations between synsets. Because there is usually only one
  hypernym, this semantic relation organizes the meanings of nouns into a
  hierarchical structure.
- Meronymy (part-name) and its inverse, holonymy (whole-name), are complex
  semantic relations. WordNet distinguishes component parts, substantive
  parts, and member parts.
- Troponymy (manner-name) is for verbs what hyponymy is for nouns, although
  the resulting hierarchies are much shallower.
- Entailment relations between verbs are also coded in WordNet.

An XWindows interface to WordNet allows a user to enter a word form and

to choose a pull-down menu for the appropriate syntactic category. The

menus provide access to the semantic relations that have been coded into

WordNet for that word.

1.2.3.2 CONTEXTUAL REPRESENTATIONS

In information retrieval, a query intended to elicit material relevant to one

sense of a polysemous word may elicit unwanted material relevant to other

senses of that word. For example, in computer-assisted instruction, a student

asking the meaning of a word should be given its meaning in that context,

not a list of alternative senses from which to pick. WordNet lists the

alternatives from which choices must be made. WordNet would be much

more useful if it incorporated the means for determining appropriate senses,

allowing the program to evaluate the contexts in which words are used. The

limits of a linguistic context can be defined arbitrarily, but we prefer to

define it in terms of sentences. That is to say, two words co-occur in the

same context if they occur in the same sentence. A semantic concordance is
a textual corpus and a lexicon combined so that every substantive word in

the text is linked to its appropriate sense in the lexicon.


1.2.4 WORD ASSOCIATION NORMS, MUTUAL INFORMATION, AND LEXICOGRAPHY

1.2.4.1 MEANING AND ASSOCIATION

It is common practice in linguistics to classify words not only on the basis
of their meanings but also on the basis of their co-occurrence with other
words. Running through the whole Firthian tradition, for example, is the
theme that "You shall know a word by the company it keeps". On the one hand,
bank co-occurs with words and expressions such as money, notes, loan,
account, investment, clerk, official, manager, robbery, vaults, working in
a, its actions, First National, of England, and so forth. On the other hand,
we find bank co-occurring with river, swim, boat, east (and of course West
and South, which have acquired special meanings of their own), on top of
the, and of the Rhine.

The search for increasingly delicate word classes is not new. In

lexicography, for example, it goes back at least to the "verb patterns"

described in Hornby's Advanced Learner's Dictionary (first edition 1948).

What is new is that facilities for the computational storage and analysis of

large bodies of natural language have developed significantly in recent

years, so that it is now becoming possible to test and apply informal

assertions of this kind in a more rigorous way, and to see what company our

words do keep.

1.2.4.2 WORD ASSOCIATION AND PSYCHOLINGUISTICS

Word association norms are well known to be an important factor in

psycholinguistic research, especially in the area of lexical retrieval.

Generally speaking, subjects respond quicker than normal to the word nurse

if it follows a highly associated word such as doctor. Some results and


implications are summarized from reaction-time experiments in which

subjects either (a) classified successive strings of letters as words and

nonwords, or (b) pronounced the strings.

1.2.4.3 PREPROCESSING WITH A PARSER

Hindle has found it helpful to preprocess the input with the Fidditch parser
to identify associations between verbs and arguments, and to postulate
semantic classes for nouns on this basis. Hindle's method is able to find
some very interesting associations. After running his parser over the 1988
AP corpus (44 million words), Hindle found N = 4,112,943 subject/verb/object
(SVO) triples. The mutual information between a verb and its object was
computed from these 4 million triples by counting how often the verb and its
object were found in the same triple and dividing by chance. Thus, for
example, disconnect/V and telephone/O have a joint probability of $7/N$. In
this case, chance is $84/N \times 481/N$, because there are 84 SVO triples
with the verb disconnect and 481 SVO triples with the object telephone. The
mutual information is $\log_2 \frac{7N}{84 \times 481} = 9.48$. Similarly,
the mutual information for drink/V beer/O is
$9.9 = \log_2 \frac{29N}{660 \times 195}$ (drink/V and beer/O are found in
660 and 195 SVO triples, respectively; they are found together in 29 of
these triples). This application of Hindle's parser illustrates a second
example of preprocessing the input to highlight certain constraints of
interest. For measuring syntactic constraints, it may be useful to include
some part-of-speech information and to exclude much of the internal
structure of noun phrases. For other purposes, it may be helpful to tag
items and/or phrases with semantic labels such as *person*, *place*, *time*,
*body part*, *bad*, and so on.
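The two mutual information values quoted above can be checked with a few
lines of code. This is a minimal sketch: the function name and argument
names are chosen here for illustration, while the counts and N come from the
text.

```python
import math

N = 4_112_943  # number of SVO triples reported for the 1988 AP corpus


def mutual_information(joint, verb_count, object_count, n=N):
    """log2 of the observed joint probability divided by the chance probability."""
    p_joint = joint / n                                # e.g. 7/N
    p_chance = (verb_count / n) * (object_count / n)   # e.g. 84/N x 481/N
    return math.log2(p_joint / p_chance)


mi_disconnect_telephone = mutual_information(7, 84, 481)
mi_drink_beer = mutual_information(29, 660, 195)
print(round(mi_disconnect_telephone, 2))  # 9.48
print(round(mi_drink_beer, 1))            # 9.9
```

The ratio simplifies to joint x N / (verb_count x object_count), which is
the form used in the text.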


1.2.5 AUTOMATIC SEMANTIC ANNOTATION FOR VIDEO BLOGS

Vlog annotation is essentially a multi-labeling process, as a vlog can
usually be annotated with multiple words. There exist many effective
approaches for multi-label image/video annotation, and it has become a trend
that the annotation should be extracted not only from the target image/video
itself, but also from other images/videos which are relevant to it.

1.2.5.1 AUTOMATIC VLOG ANNOTATION

In our vlog annotation model, the annotation of a vlog consists of two parts:

the intrinsic annotation extracted from the text of the target vlog and the

expanded annotation from relevant external resources.

1.2.5.2 INTRINSIC ANNOTATION EXTRACTION

Since a vlog often has supporting text of its own, we can extract
informative keywords as its intrinsic annotation. The textual content of a
vlog mainly comprises the title, description, and comments. The title and
description are closely related to the semantics of the vlog video, while
the comments are often filled with irrelevant words and thus too noisy to be
used. As a result, only the title and description are kept for annotation
extraction. As the title indicates the main topic of the whole vlog, it is
of the greatest importance for understanding the semantics of the vlog.
Therefore we first extract annotation words from the title. After stop word
removal, the important words are retained in the word set W_title. For the
textual description, we also remove the stop words beforehand. Then, using a
standard text processing technique such as tf-idf, we acquire the important
words and create another word set, W_description. Since not all the words in
the description are relevant to the semantics of the central video,
W_description can be rather noisy. Considering that the keywords of an
article are usually used to reveal its main subject, or the title, we assume
that a good annotation word should be highly correlated with at least one
word in W_title. Therefore, we delete from W_description the words which
have low correlation with all the words in W_title.
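The extraction steps above might be sketched as follows. Several details are
assumptions made for illustration and are not specified in the text: the
tiny stop word list, the background document collection used for idf, the
top_k cut-off, the 0.5 threshold, and the caller-supplied correlation
function (the word correlation measure itself is not defined here).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "my", "is",
              "with", "at"}


def tokens(text):
    """Lower-case words of the text with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def tfidf_keywords(description, background_docs, top_k=5):
    """Rank description words by tf-idf against a background collection."""
    tf = Counter(tokens(description))
    doc_sets = [set(tokens(d)) for d in background_docs]
    n_docs = len(doc_sets) + 1  # background documents plus the description

    def idf(word):
        df = 1 + sum(word in d for d in doc_sets)
        return math.log(n_docs / df)

    ranked = sorted(tf, key=lambda w: tf[w] * idf(w), reverse=True)
    return ranked[:top_k]


def intrinsic_annotation(title, description, background_docs,
                         correlation, threshold=0.5):
    """Return (W_title, filtered W_description)."""
    w_title = set(tokens(title))
    w_description = set(tfidf_keywords(description, background_docs))
    # Keep a description word only if it correlates with some title word.
    kept = {w for w in w_description
            if any(correlation(w, t) >= threshold for t in w_title)}
    return w_title, kept
```

Passing the correlation measure in as a function keeps the sketch neutral
about how word correlation is actually computed.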

1.2.5.3 CONTEXT-BASED ANNOTATION EXPANSION

External annotation candidate extraction

Inspired by search-based annotation methods, we conduct annotation expansion
for the target vlog in a search-based mode, for which a labeled database is
indispensable. As we know, YouTube is one of the most popular video sharing
websites and has by far the biggest collection of videos. Each video on
YouTube is labeled with one or more tags. Therefore, we use YouTube as our
labeled database. Given a keyword query, the text-based video search engine
(powered by Google) in YouTube returns rather good results, hence we can use
YouTube search to find semantically related videos. For the target vlog, we
submit each word w in W_intrinsic as a query to the YouTube search engine
and get the corresponding search results R_w (for simplicity, only the 20
top-ranked results are included). For each result r in R_w, we extract the
video's representative frame f_r (which is usually the first frame of the
video) and the corresponding tags. Then, among the semantically related
videos, visually related ones are selected through content-based similarity
between the vlog video and the result videos found on YouTube. We define the
visual similarity between a result video r and the vlog video v as the
maximum image similarity between the representative frame f_r of r and the
keyframes f_v of v:

\[ \mathrm{Sim}_{\mathrm{visual}}(r, v) = \max_{f_v \in v} \mathrm{Sim}_{\mathrm{image}}(f_r, f_v) \]

After the above two search stages, we have obtained a batch of videos which
are relevant to the vlog both semantically and visually with regard to the
intrinsic annotation word w. We then gather the tags of all the retained
videos into a tag set T(w), which is adopted as the external annotation
candidates for the vlog. This process is applied for each intrinsic
annotation word w in W_intrinsic. Finally, we obtain the word set W_external
of external annotation candidates:

\[ W_{\mathrm{external}} = \bigcup_{w \in W_{\mathrm{intrinsic}}} T(w) \]
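The visual filtering and tag gathering steps can be sketched as below. This
is an illustration under assumptions: the image similarity measure is
supplied by the caller, the 0.6 selection threshold is invented (the text
does not state how visually related videos are selected), and the search
results are passed in as plain data rather than fetched from YouTube.

```python
def visual_similarity(rep_frame, keyframes, image_similarity):
    """Maximum image similarity between a result's representative frame
    and the keyframes of the vlog video."""
    return max(image_similarity(rep_frame, kf) for kf in keyframes)


def external_candidates(results_by_word, keyframes, image_similarity,
                        threshold=0.6):
    """Union of the tags of all results that are visually close to the vlog.

    results_by_word maps each intrinsic word w to its search results R_w,
    each result being a (representative_frame, tags) pair.
    """
    w_external = set()
    for results in results_by_word.values():
        for rep_frame, tags in results:
            score = visual_similarity(rep_frame, keyframes, image_similarity)
            if score >= threshold:  # assumed selection rule
                w_external.update(tags)
    return w_external
```

Frames can be any representation the chosen image_similarity function
understands, for example normalised colour histograms.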

Context-based annotation refinement

Although the videos used for annotation expansion are all semantically and

visually relevant to the target vlog, it dose not follow that all the tags of the

videos are also relevant to the vlog. In the process of annotation expansion,

we have to deal with the serious problem of semantic drift. Therefore, we

should refine the expanded annotation candidates and delete the irrelevant

words. We calculate the relevance between an annotation candidate c and

the vlog by comparing c with the words in Wintrinsic. As we know, when

comparing two words, we should consider not only the semantics in them

but also the specific contexts they are in. In this paper, we propose a novel

context histogram to depict the semantics of a word in a specific context. For

a word w, its context is substantially a set of words which confines its

specific semantics. We first calculate the one-to-one correlation between w

and each of the words in its context Wcontext. Then, we organize all the correlation values into a histogram and get the context histogram for w with respect to Wcontext. The problem of context comparison is now reduced to histogram comparison. Here we simply use histogram intersection as a metric of the context histogram similarity.

We perform the context-based external annotation refinement as follows. For an intrinsic annotation word w of Wintrinsic, we create its context histogram with respect to Wcontext = Wintrinsic \ {w}; for an annotation candidate c in Wexternal, we build its context histogram with respect to the same Wcontext = Wintrinsic \ {w}. In order to compare c with w, we calculate both their one-to-one word correlation Simword and their contextual similarity Simcontext. The total correlation between c and w is defined as:

$$\mathrm{Sim}_{total}(c, w) = \alpha \, \mathrm{Sim}_{word}(c, w) + \beta \, \mathrm{Sim}_{context}(c, w),$$

where α and β are adjustable parameters. Only those annotation candidates

with high relevance to Wintrinsic are kept in Wexternal. After the

refinement, we merge Wintrinsic and Wexternal to get the final annotation

for the target vlog.
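The refinement step can be sketched in Java as follows. This is only a minimal illustration under stated assumptions: the class ContextHistogram and its method names are hypothetical, the word-to-word correlation Simword is passed in as a function because the text does not say how it is computed, and the histogram intersection is left unnormalised.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Sketch of context-histogram refinement; class and method names are illustrative. */
final class ContextHistogram {

    /** One bin per context word, holding the correlation between `word` and that context word. */
    static double[] build(String word, List<String> context,
                          ToDoubleBiFunction<String, String> simWord) {
        double[] bins = new double[context.size()];
        for (int i = 0; i < bins.length; i++) {
            bins[i] = simWord.applyAsDouble(word, context.get(i));
        }
        return bins;
    }

    /** Histogram intersection: the sum of bin-wise minima (left unnormalised here). */
    static double intersection(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.min(a[i], b[i]);
        }
        return sum;
    }

    /** Total correlation: alpha * Simword(c, w) + beta * Simcontext(c, w). */
    static double total(String c, String w, List<String> intrinsic,
                        ToDoubleBiFunction<String, String> simWord,
                        double alpha, double beta) {
        List<String> context = new ArrayList<>(intrinsic);
        context.remove(w); // Wcontext = Wintrinsic without w
        double simContext = intersection(build(c, context, simWord),
                                         build(w, context, simWord));
        return alpha * simWord.applyAsDouble(c, w) + beta * simContext;
    }
}
```

A candidate c would then be kept in Wexternal only if its total correlation with some intrinsic word exceeds a chosen threshold; the threshold itself is not given in the text.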

1.2.6 ANNOSEARCH: IMAGE AUTO-ANNOTATION BY SEARCH

A novel solution to the image auto-annotation problem: rather than training a concept model using supervised learning techniques, as most previous works do, we propose a data-driven approach leveraging a Web-scale image dataset and search technology to learn relevant annotations. In an ideal case, if a well-annotated image database of unlimited scale is available, then for

any query image, we can find its duplicates in this database and simply

propagate its annotation to the query image. In a more realistic case that the

image database is of limited scale, we can still find a group of very similar

images in terms of either global features or local features, extract salient

phrases from their descriptions, and select the most salient ones to annotate

the query image. Thus to by-pass the semantic gap, we can divide and

conquer the annotation problem in two steps:

1) find one accurate keyword for a query image;


2) given one keyword, find complementary annotations to describe the

details of this image.

The requirement in the first step is not as restrictive as it may first seem. For example, in a desktop photo search, users usually provide a location or event name in the folder name. Or, in a Web image search, we can choose one of a Web image's surrounding keywords as the query keyword.

1.2.6.1 THE ANNOSEARCH SYSTEM

It contains three stages: the text-based search stage, the content-based search

stage and the annotation learning stage.

1.2.6.2 TEXT-BASED SEARCH

Jeon et al. recommend using high quality training data to learn prediction

models as it affects greatly the annotation performance. Hence in our

approach, we collected about 2.4 million high-quality Web images

associated with meaningful descriptions from online photo forums. These descriptions capture the corresponding images' contents to a certain degree.

1.2.6.3 CONTENT-BASED SEARCH

Because visual features are generally high-dimensional, similarity-oriented search based on visual features is always an efficiency bottleneck for large-scale image database retrieval. To overcome this problem, we adopt a hash encoding algorithm to speed up this procedure.
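The text does not name the hash encoding algorithm, so the following Java sketch uses random-hyperplane (sign) hashing purely as an illustrative stand-in. It shows the general idea: each high-dimensional feature vector is reduced to a compact 32-bit code, and candidate images are compared by Hamming distance instead of by a full vector distance. The class FeatureHasher and its parameters are assumptions made for this example.

```java
import java.util.Random;

/** Illustrative stand-in: random-hyperplane hashing of feature vectors into 32-bit codes. */
final class FeatureHasher {
    private final double[][] planes; // one random hyperplane per output bit

    FeatureHasher(int dimensions, long seed) {
        Random rng = new Random(seed);
        planes = new double[32][dimensions];
        for (double[] plane : planes) {
            for (int d = 0; d < dimensions; d++) {
                plane[d] = rng.nextGaussian();
            }
        }
    }

    /** Bit i is set when the feature vector lies on the positive side of hyperplane i. */
    int encode(double[] features) {
        int code = 0;
        for (int bit = 0; bit < planes.length; bit++) {
            double dot = 0.0;
            for (int d = 0; d < features.length; d++) {
                dot += planes[bit][d] * features[d];
            }
            if (dot > 0) {
                code |= 1 << bit;
            }
        }
        return code;
    }

    /** Number of differing bits; a cheap proxy for the distance between the original vectors. */
    static int hammingDistance(int a, int b) {
        return Integer.bitCount(a ^ b);
    }
}
```

Codes with a small Hamming distance would be shortlisted first, and the exact visual similarity would only be computed for that shortlist.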


CHAPTER 2

SYSTEM ANALYSIS

2.1 EXISTING SYSTEM

Traditional annotation models focus exclusively on the semantic aspect, while the sentiment aspect is neglected entirely. Most existing vlog search methods employ traditional text-based retrieval techniques, which rely mainly on the textual content of the vlogs.

2.2 PROPOSED SYSTEM

In the proposed vlog management model, when a user uploads a vlog to

the database, semantic annotation will run automatically using vlog text and

relevant external resources, and sentiment evaluation is obtained from vlog

comments. After that, the vlog will be stored in the database with the

corresponding annotation and evaluation. When a user submits a query to

the search engine, the vlog search module will access the vlog database to

obtain relevant vlogs by saliency-based matching; then, using user specified

ranking strategy and clustering, the results will be returned to the user in a

well-organized manner.


CHAPTER 3

REQUIREMENT SPECIFICATIONS

3.1 HARDWARE REQUIREMENTS

Hard Disk : 10GB and above

RAM : 256MB and above

Processor : Pentium IV

3.2 SOFTWARE REQUIREMENTS

Windows Operating System

JDK 1.6

Web Browser Internet Explorer

Glassfish Application server

Apache Tomcat Server

Oracle 10G

JMF

3.3 SOFTWARE DESCRIPTION

3.3.1 JAVA

Java is a platform-independent, object-oriented programming

language developed initially by James Gosling and colleagues at Sun

Microsystems. The language, initially called Oak (named after the oak trees

outside Gosling's office), was intended to replace C++, although the feature

set better resembles that of Objective C.


3.3.1.1 INTRODUCTION TO JAVA

Java has been around since 1991, developed by a small team of Sun Microsystems developers in a project originally called the Green project.

The intent of the project was to develop a platform-independent software

technology that would be used in the consumer electronics industry. The

language that the team created was originally called Oak.

The first implementation of Oak was in a PDA-type device called Star Seven

(*7) that consisted of the Oak language, an operating system called GreenOS, a user interface, and hardware. The name *7 was derived from the

telephone sequence that was used in the team's office and that was dialed in

order to answer any ringing telephone from any other phone in the office.

Around the time the First Person project was floundering in consumer

electronics, a new craze was gaining momentum in America; the craze was

called "Web surfing." The World Wide Web, a name applied to the Internet's millions of linked HTML documents, was suddenly becoming popular for use by the masses. The reason for this was the introduction of a graphical Web browser called Mosaic, developed by NCSA. The browser simplified

Web browsing by combining text and graphics into a single interface to

eliminate the need for users to learn many confusing UNIX and DOS

commands. Navigating around the Web was much easier using Mosaic.

It has only been since 1994 that Oak technology has been applied to the Web. In 1994, two Sun developers created the first version of HotJava, then called WebRunner, a graphical browser for the Web. The browser was coded entirely in the Oak language, by this time called Java. Soon after, the Java compiler was rewritten in the Java language from its original C code, thus proving that Java could be used effectively as an application language. Sun introduced Java in May 1995 at the SunWorld '95 convention.

Web surfing has become an enormously popular practice among millions of

computer users. Until Java, however, the content of information on the

Internet has been a bland series of HTML documents. Web users are hungry

for applications that are interactive, that users can execute no matter what

hardware or software platform they are using, and that travel across

heterogeneous networks and do not spread viruses to their computers. Java can create such applications.

3.3.1.2 WORKING OF JAVA

For those who are new to object-oriented programming, the concept of a class will be new. Put simply, a class is the definition of a segment of code that can contain both data and functions.

When the interpreter executes a class, it looks for a particular method named main, which will sound familiar to C programmers. The main method is passed an array of strings as a parameter (similar to argv[] in C) and is declared as a static method.

To output text from the program, we call the println method of System.out, which is Java's standard output stream. UNIX users will appreciate the theory behind such a stream, as it is actually standard output. For those who are instead used to the Wintel platform, it writes the string passed to it to the console.
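The points above (a class, a static main method that receives a string array, and output through System.out.println) can be seen together in a very small program. The class name HelloVlog and the helper method greeting are made up for this illustration.

```java
/** Minimal class with a static main method; the names are illustrative only. */
final class HelloVlog {

    /** Builds the text to print, using the first command-line argument if one is given. */
    static String greeting(String[] args) {
        return args.length > 0 ? "Hello, " + args[0] + "!" : "Hello, world!";
    }

    /** Entry point looked up by the Java launcher; args plays the role of argv[] in C. */
    public static void main(String[] args) {
        System.out.println(greeting(args)); // written to standard output
    }
}
```

After compiling with javac, the program is started with the java launcher followed by the class name and any arguments.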


3.3.2. JAVA MEDIA FRAMEWORK

The Java Media Framework (JMF) is an application programming interface (API) for incorporating time-based media into Java applications and applets. This guide is intended for Java programmers who want to incorporate time-based media into their applications and for technology providers who are interested in extending JMF and providing JMF plug-ins to support additional media types and perform custom processing and rendering.

The JMF 1.0 API enables programmers to develop Java programs that present time-based media. The JMF 2.0 API extends the framework to provide support for capturing and storing media data and for controlling the type of processing that is performed during playback. In addition, JMF 2.0 defines a plug-in API that enables advanced developers and technology providers to more easily customize and extend JMF functionality.

JMF provides a unified architecture and messaging protocol for managing the acquisition, processing and delivery of time-based media data. JMF is designed to support most standard media content types, such as AIFF, AU, AVI, GSM, MIDI, MPEG, QuickTime, RMF, and WAV.

By exploiting the advantages of the Java platform, JMF delivers the promise of "Write Once, Run Anywhere" to developers who want to use media such as audio and video in their Java programs. JMF provides a common cross-platform Java API for accessing underlying media frameworks. JMF implementations can leverage the capabilities of the underlying operating system, while developers can easily create portable Java programs that feature time-based media by writing to the JMF API.

With JMF, you can create applets and applications that present, capture, manipulate and store time-based media. The framework enables advanced developers and technology providers to perform custom processing of raw media data and to seamlessly extend JMF to support additional content types and formats, optimize the handling of supported formats, and create new presentation mechanisms.

HIGH-LEVEL ARCHITECTURE

Devices such as tape decks and VCRs provide a familiar model for recording, processing and presenting time-based media. When you play a movie using a VCR, you provide the media stream to the VCR by inserting a video tape. The VCR reads and interprets the data on the tape and sends appropriate signals to the television and speakers.

JMF uses this same basic model. A data source encapsulates the media stream much like a video tape, and a player provides processing and control mechanisms similar to a VCR. Playing and capturing audio and video with JMF requires appropriate input and output devices such as microphones, cameras, speakers and monitors.

FIG: 3.1 JMF BASIC MODEL


Data sources and players are integral parts of JMF's high-level API for managing the capture, presentation and processing of time-based media. JMF also provides a lower-level API that supports the seamless integration of custom processing components and extensions. This layering provides Java developers with an easy-to-use API for incorporating time-based media into Java programs while maintaining the flexibility and extensibility required to support advanced media applications and future media technology.

To present time-based media such as audio or video with JMF, a Player is used. Playback can be controlled programmatically, or a control-panel component can be displayed that enables the user to control playback interactively. If several media streams are to be played, a separate Player is used for each one; to play them in sync, one Player object can be used to control the operation of the others.

PLAYER

A Player processes an input stream of media data and renders it at a precise time. A DataSource is used to deliver the input media stream to the Player. The rendering destination depends on the type of media being presented.

FIG: 3.2 JMF PLAYER


A Player does not provide any control over the processing that it performs or how it renders the media data.

PROCESSORS

Processors can also be used to present media data. A processor is just a

specialized type of player that provides control over what processing is

performed on the input media stream. A processor supports all of the same

presentation controls as a player.

In addition to rendering media data to presentation devices, a Processor can output media data through a DataSource so that it can be presented by another Player or Processor, further manipulated by another Processor, or delivered to some other destination such as a file.

FIG: 3.3 JMF PROCESSORS

EXTENSIBILITY:

JMF can be extended by implementing custom plug-ins, media handlers and data sources. By implementing one of the JMF plug-in interfaces, you can directly access and manipulate the media data associated with a Processor:

Implementing the Demultiplexer interface enables you to control how individual tracks are extracted from a multiplexed media stream.

Implementing the Codec interface enables you to perform the processing required to decode compressed media data, convert media data from one format to another, and encode raw media data into a compressed format.

3.3.3 APACHE TOMCAT SERVER

Apache Tomcat version 6.0 implements the Servlet 2.5 and JavaServer

Pages 2.1 specifications from the Java Community Process, and includes

many additional features that make it a useful platform for developing and

deploying web applications and web services.

3.3.3.1 TOMCAT ARCHITECTURE

Tomcat is a container made up of pluggable components that fit together in a nested manner. Tomcat is configurable: you can use specialized filters and change port numbers, IP address bindings, security settings, and so on. You should always change the default settings when running in a production environment, especially the security-related ones.

3.3.3.2 TOMCAT DIRECTORY OVERVIEW

Each entry below gives the directory, the files it contains, and a description.

Directory: bin
Files: bootstrap.jar, commons-daemon.jar, tomcat-juli.jar, startup.bat, catalina.sh
Description: Holds some of the JAR files that are required when starting Tomcat, as well as the startup files themselves. startup.bat is used to start Tomcat as a daemon process; catalina.sh can be used on a command line and to add additional parameters that change Tomcat when starting.

Directory: conf
File: catalina.policy
Description: Contains security policy statements that are implemented by the Java SecurityManager. It replaces the java.policy file that comes with the JVM and prevents rogue code in JSPs from executing damaging code that can affect the container. It is only read once, when Tomcat is launched, so you need to restart Tomcat if you change this file.

Directory: conf
File: catalina.properties
Description: Contains a list of Java packages that cannot be overridden by executable Java code in servlets or JSPs, which could be a security risk.

Directory: conf
File: context.xml
Description: This file is used by all Web applications; it explains where the web.xml should be accessed.

Directory: conf
File: logging.properties
Description: This file details the logging within Tomcat. Two default configurations are set up, a ConsoleHandler and a FileHandler. You can change the logging level using this file.

Directory: conf
File: server.xml
Description: This is the main configuration file in Tomcat. It is used by the "digester" to build the container on startup.

Directory: conf
File: tomcat-users.xml
Description: Used for security, to allow access to the Administration applications section. It is used with the default UserDatabase Realm as referenced in server.xml.

Directory: conf
File: web.xml
Description: The default web.xml file that is used by all Web applications. It sets up the JspServlet to allow your applications to handle JSPs, and a default servlet to handle static resources and HTML files. It also sets up default session timeouts, welcome files and MIME types.

Directory: lib
Files: a number of JAR files
Description: All the JAR files that the container uses are located here, including the Tomcat JARs and the servlet and JSP application programming interfaces (APIs). Place your own JAR files here if they will be used across all your Web applications.

Directory: logs
Files: a number of log files
Description: Contains a number of log files produced by JULI logging, which will be discussed in a later topic. The logs are rotated each day, so you may need to clear them down from time to time.

Directory: temp
Files: (unspecified)
Description: Used for scratch files and temporary use.

Directory: webapps
Files: Web application files
Description: This is where the Web application files reside, including your own Web applications. This is where you place your Web Application aRchive (WAR) file; Tomcat will then deploy the file. We will get into deploying Web applications in another topic. Several default Web applications come with Tomcat:

ROOT - The welcome screen that you saw when you first installed Tomcat. This is a special directory called "/", which gets removed when you move into production. From this Web application you can access all the Web applications below.

docs - Contains the Tomcat documentation.

examples - Contains some JSP and servlet examples.

host-manager - Allows you to manage the hosts that run in your application. Use the /host-manager/html URL to access it.

manager - Allows you to manage your applications in Tomcat: you can start, stop, reload, deploy and undeploy your applications. Use the /manager/html/ URL to access it.

Directory: work
Description: Used for temporary working files. It is used heavily during JSP compilation, where the JSPs are converted to Java servlets and accessed through this directory.

3.3.3.3 TOMCAT ARCHITECTURE OVERVIEW

Tomcat 6 consists of a nested hierarchy of components. Containers are components that can contain a collection of other components.

FIG 3.4 ARCHITECTURE OF TOMCAT SERVER


Server

The server is Tomcat itself, an instance of the Web application server. It owns a port that is used to shut down the server (port 8005). You can set up multiple servers on one node provided they use different ports. The server is represented by the Server interface, whose standard implementation is the StandardServer object.

Service

A service groups a container (usually an engine) with a set of Connectors. The service is responsible for accepting requests, routing them to the specified Web application and specific resources, and then returning the result of processing the request. Services are the middle man between the client's Web browser and the container.

Connectors

Connectors connect the applications to clients. By default, they receive incoming requests from the clients over HTTP (port 8080) or AJP (port 8009). The default connector is Coyote, which implements HTTP 1.1.

Engine

The engine is the top-level container. It cannot be contained by another container, so it is the parent container for all the containers beneath it. The engine is a request-processing component that represents the Catalina Server Engine. It examines the HTTP headers to determine the virtual host or context to which requests should be passed. An engine may contain Hosts, each representing a virtual host with a group of Web applications, and Contexts, each representing a single Web application.

Realm

The realm for an engine manages user authentication and authorization. Resources use roles to allow access, and the realm enforces the security policies. A realm applies across the whole engine; however, this can be overridden by using a realm at the Host level or the Context level, so it is an object that can be superseded by its child objects.

Valves are used to intercept a request and preprocess it. They are

similar to filter mechanism of the servlet specifications but are


3.3.3.4 CONNECTOR ARCHITECTURE

All connectors work on the same principle: they have an Apache module end (mod_jk or mod_proxy) that loads just like any other Apache module. On the Tomcat end, each Web application instance has a connector module component written in Java. In Tomcat 6 this is the org.apache.catalina.connector.Connector class. The constructor takes one of two connector types, HTTP/1.1 or AJP/1.3. You call the constructor indirectly via the server.xml file using the Connector tag and its protocol attribute. Depending on what setup you have, different classes will be used.

Apache Portable Runtime (APR) is supported:
HTTP/1.1: org.apache.coyote.http11.Http11AprProtocol
AJP/1.3: org.apache.coyote.ajp.AjpAprProtocol

APR is not supported:
HTTP/1.1: org.apache.coyote.http11.Http11Protocol
AJP/1.3: org.apache.jk.server.JkCoyoteHandler

The Web server handles all the static content, but when it comes across content intended for a servlet container, it passes it to the module in question (mod_jk or mod_proxy). The Web server knows what content to pass to the connector module because the directives in the Web server's configuration specify this.


FIG 3.5 INTERACTION BETWEEN TOMCAT SERVER AND WEB

SERVER

The Apache JServ Protocol (AJP) uses a binary format for transmitting data between the Web server and Tomcat; a network socket is used for all communication. An AJP packet consists of a packet header and a payload. A binary packet sent by the Web server starts with the sequence 0x1234, followed by the packet size (2 bytes) and then the actual payload. On the return path, the packets are prefixed by AB (the ASCII codes for A and B), followed by the size of the packet and then the payload.
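The packet framing described above can be sketched in a few lines of Java. This is an illustration of the header layout only, not Tomcat's connector code: the class AjpFrame and its methods are hypothetical, and the 2-byte size is written in network (big-endian) byte order.

```java
import java.nio.ByteBuffer;

/** Illustrative framing of AJP packets; class and method names are hypothetical. */
final class AjpFrame {

    /** Web server to Tomcat: 0x12 0x34, 2-byte payload length, then the payload. */
    static byte[] toContainer(byte[] payload) {
        return frame((byte) 0x12, (byte) 0x34, payload);
    }

    /** Tomcat to Web server: 'A' 'B', 2-byte payload length, then the payload. */
    static byte[] toWebServer(byte[] payload) {
        return frame((byte) 'A', (byte) 'B', payload);
    }

    private static byte[] frame(byte first, byte second, byte[] payload) {
        if (payload.length > 0xFFFF) {
            throw new IllegalArgumentException("payload does not fit a 2-byte length field");
        }
        ByteBuffer buffer = ByteBuffer.allocate(4 + payload.length); // big-endian by default
        buffer.put(first).put(second).putShort((short) payload.length).put(payload);
        return buffer.array();
    }

    /** Reads the 2-byte length field back out of a framed packet. */
    static int payloadLength(byte[] packet) {
        return ((packet[2] & 0xFF) << 8) | (packet[3] & 0xFF);
    }
}
```

A receiver would first read the four header bytes, check the two-byte prefix, and then read exactly payloadLength bytes from the socket.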

The HTTP connector is exactly what the name implies: it uses the HTTP protocol to exchange messages. You can use HTTPS, but this requires an SSL certificate and a few changes to Tomcat's configuration.

3.3.3.5 LIFECYCLE

Tomcat starts its components in a fixed order and stops them in the reverse order: when starting, the parent is started first and then its children; stopping happens in the reverse order. This is done through the Lifecycle interface together with LifecycleEvent and LifecycleListener.

The Lifecycle interface has two key methods, start() and stop(). All major components usually contain a LifecycleSupport object that manages all of the LifecycleListener objects for that component; it is this object that propagates and fires general events. The top-level component calls the start() methods of all of its children, and the reverse is true when stopping. This approach allows you to stop and start individual Host components without affecting any other Hosts.
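The parent-first start and reverse-order stop can be illustrated with a toy model in Java. The class LifecycleNode below is hypothetical and is not one of Tomcat's Lifecycle classes; it only records the order in which nested components would be started and stopped.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of nested start/stop ordering; not Tomcat's actual Lifecycle classes. */
final class LifecycleNode {
    private final String name;
    private final List<LifecycleNode> children = new ArrayList<>();
    private final List<String> log;

    LifecycleNode(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    /** Registers a child component and returns this node so calls can be chained. */
    LifecycleNode add(LifecycleNode child) {
        children.add(child);
        return this;
    }

    /** The parent starts first, then its children in registration order. */
    void start() {
        log.add("start " + name);
        for (LifecycleNode child : children) {
            child.start();
        }
    }

    /** Children stop in reverse registration order, then the parent stops. */
    void stop() {
        for (int i = children.size() - 1; i >= 0; i--) {
            children.get(i).stop();
        }
        log.add("stop " + name);
    }
}
```

Because each node only touches its own children, calling stop() and start() on one host node leaves its sibling hosts untouched, which mirrors the behaviour described for Host components.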

A LifecycleListener can be added at any level in the Tomcat container hierarchy to execute specific code when a particular event is fired. By default there are three listeners configured at the server level; listeners are configured in the server.xml or context.xml file at the appropriate level.

Configuration

The most important file in Tomcat is the server.xml file. When Tomcat starts, it uses a version of the Apache Commons Digester to read the file; the digester is a utility that reads XML files and creates Java objects from a set of rules. From what is described above, you can see that the elements in the file follow the Tomcat architecture exactly.

3.3.3.6 WORKING WITH TOMCAT SERVER

Apache Tomcat is a well-known servlet container developed at the Apache Software Foundation. The software is released under the Apache Software License, and anyone can use it for developing as well as deploying applications. Tomcat is the official reference implementation of Java Servlets and JavaServer Pages. Tomcat is easy to install and configure, so anyone can learn it quickly and start using it to develop and deploy Web applications. These days many Web hosting companies provide Tomcat support on their servers, so an application developed with Java technology can be deployed on the internet through any such host. Earlier, it was difficult to find a good host for Java applications.

3.3.3.7 DEPLOYING SERVLETS ON TOMCAT SERVER

To deploy servlets on Tomcat Server, take the following steps, illustrated here with an example.

1. Create web application

To develop an application using servlets or JSP, the following directory structure must be maintained.

Step 1: Create a web application folder (servlets-examples) under the Tomcat webapps directory. The path will be C:\apache tomcat\webapps\servlets-examples.

Step 2: Create a WEB-INF folder under servlets-examples.

Step 3: Create the web.xml file and a classes folder under the WEB-INF folder.

2. Compile the servlet program - Create a servlet program and compile it at the command prompt. The procedure is no different from that for any other Java program. The set of classes required for writing servlets is available in servlet-api.jar, which must be on the CLASSPATH.

3. Copy the servlet class (Hello) into the classes folder, which is under the WEB-INF folder.

4. Edit web.xml to include the servlet's name and URL pattern.


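    <servlet>
        <servlet-name>Hello</servlet-name>
        <servlet-class>Hello</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>Hello</servlet-name>
        <url-pattern>/Hello</url-pattern>
    </servlet-mapping>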

5. Run Tomcat Server and execute your servlet - To run the server, go to C:\apache tomcat\bin\startup.bat and double-click it; the server will start up. After confirming that the server is running successfully, you can run your servlet. To execute your servlet, open your web browser and type the URL that corresponds to the URL pattern in your web.xml. The URL will look like this:

http://localhost:8080/servlets-examples/Hello

3.3.4. GLASSFISH

3.3.4.1 ABOUT GLASSFISH

The GlassFish open-source application server is based on the Java Platform,

Enterprise Edition (Java EE) reference implementation and is built for

mission-critical enterprise deployments.


Sun GlassFish Enterprise Server enables customers to leverage the benefits

of open source with a subscription that provides support, training credits,

limited indemnification and more.

3.3.4.2 BENEFITS OF GLASSFISH

The Sun GlassFish Enterprise Server provides the foundation to develop

and deploy Java EE artifacts, including Web services. It provides value-

added services for management, monitoring, diagnostics, clustering,

transaction management, and high availability of mission-critical

services.


CHAPTER 4

SYSTEM DESIGN

4.1 ARCHITECTURE

FIG 4.1 ARCHITECTURE DIAGRAM FOR THE SYSTEM

When a user uploads a vlog to the database, semantic annotation runs automatically using the vlog text and relevant external resources, and a sentiment evaluation is obtained from the vlog comments. After that, the vlog is stored in the database with the corresponding annotation and evaluation.

When a user submits a query to the search engine, the vlog search module

will access the vlog database to obtain relevant vlogs by saliency-based

matching; then, using user specified ranking strategy and clustering, the

results will be returned to the user in a well-organized manner.


4.2 SEQUENCE DIAGRAM

FIG 4.2 SEQUENCE DIAGRAM FOR THE SYSTEM

A sequence diagram shows the interaction between the system and its environment, arranged in time sequence. It shows the objects participating in the interaction by their lifelines and the messages they exchange, as shown in the figure. The sequence diagram is simple and has immediate visual appeal. It is an alternative way to understand the overall flow of control.


4.3 USE CASE DIAGRAM

FIG 4.3 USECASE DIAGRAM FOR THE SYSTEM

The use case diagram shows the relationship between the actors and the use cases within the system. The clients are the actors who upload videos to the server and search for videos on it. The results are returned to the user from the database.


4.4 ACTIVITY DIAGRAM

FIG 4.4 ACTIVITY DIAGRAM FOR THE SYSTEM

An activity diagram provides a view of the flows and of what is going on inside a use case or among several classes. An activity diagram can also be used to represent a class's method implementation, as shown in the figure.


CHAPTER 5

MODULES

5.1 MODULES

Semantic annotation

Content Analysis

Sentiment evaluation

Saliency-based matching and Ranking

5.2 MODULE EXPLANATION:

Semantic annotation

Annotation is the process of extracting informative keywords from the text of the vlog. This is necessary because the words used in vlog texts are arbitrary and non-standard. When a user uploads a video, automatic semantic vlog annotation is run. The textual content of a vlog mainly comprises the title, description, and comments, among which the title and description are closely related to the semantics of the vlog video, so we use the title and description for the annotation process. The title indicates the main topic of the whole vlog and is of the greatest importance for understanding its semantics. Therefore, we first extract annotation words from the title and then from the body of the textual content, i.e., the vlog's description. The extracted annotation is then stored in the database.
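A minimal Java sketch of this title-first extraction is given below. It is an assumption-laden illustration rather than the project's implementation: the class IntrinsicAnnotator, the tokenisation rule and the tiny stop-word list are all invented for the example, and a real system would use a much larger stop-word list and some keyword weighting.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Sketch of title-first keyword extraction; names and stop words are illustrative. */
final class IntrinsicAnnotator {
    // Tiny illustrative stop-word list, not taken from the source.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "in", "on", "to", "my", "is", "for", "with"));

    /** Returns keywords in extraction order: title words first, then description words. */
    static Set<String> annotate(String title, String description) {
        Set<String> keywords = new LinkedHashSet<>(); // keeps insertion order, drops repeats
        collect(title, keywords);
        collect(description, keywords);
        return keywords;
    }

    /** Lower-cases the text, splits on non-alphanumerics and keeps the non-stop words. */
    private static void collect(String text, Set<String> keywords) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                keywords.add(token);
            }
        }
    }
}
```

The resulting ordered set plays the role of the intrinsic annotation that is stored with the vlog.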


Content Analysis

In content analysis, the video is split into a number of frames, which are stored in the database in BLOB form. Each frame contains objects. The different objects in each frame are detected and stored in the database. Thus no two frames will have the same object, which makes the searching process efficient. Frame splitting is done with the help of the Framesplitter class.

Sentiment evaluation

The main purpose of the sentiment evaluation process is to extract annotation words from the users' comments. Users can decide whether a vlog is worth viewing based on the existing comments. Traditional annotation models focus solely on the semantic aspect, while the sentiment aspect is totally neglected. For opinionated texts such as vlog comments, extra information can be obtained through sentiment analysis. The users' comments are predominantly text based. Sentiment evaluation is used to obtain an overall evaluation of the vlog.
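The text does not specify how the sentiment of a comment is computed, so the Java sketch below assumes the simplest possible approach: a small hand-made lexicon of positive and negative words, with the overall evaluation taken as the mean polarity of the comments. The class CommentSentiment and both word lists are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy lexicon-based comment scoring; the word lists are illustrative, not from the source. */
final class CommentSentiment {
    private static final Set<String> POSITIVE =
            new HashSet<>(Arrays.asList("good", "great", "love", "nice", "awesome"));
    private static final Set<String> NEGATIVE =
            new HashSet<>(Arrays.asList("bad", "boring", "hate", "awful", "poor"));

    /** Positive word count minus negative word count for one comment. */
    static int score(String comment) {
        int score = 0;
        for (String token : comment.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) {
                score++;
            }
            if (NEGATIVE.contains(token)) {
                score--;
            }
        }
        return score;
    }

    /** Mean comment polarity in [-1, 1]; 0 when there are no comments. */
    static double overall(List<String> comments) {
        if (comments.isEmpty()) {
            return 0.0;
        }
        int sum = 0;
        for (String comment : comments) {
            sum += Integer.signum(score(comment)); // each comment votes -1, 0 or +1
        }
        return (double) sum / comments.size();
    }
}
```

Using the sign of each comment's score keeps one very long comment from dominating the overall evaluation.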

Saliency-based matching and Ranking

We propose a novel saliency-based similarity matching approach for vlog search, implemented with the help of the Canny edge detection algorithm. Here, the saliencies are taken to be the edges of an image. Even if two images look similar, their saliencies will not be identical. Thus, when an image is given as the input query, the algorithm helps retrieve the exact matching video. After the relevant vlogs are obtained using saliency-based matching, different ranking strategies are adopted. Finally, the ranked vlogs are clustered according to their category information to further facilitate users' browsing.
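The idea of comparing images by their edges can be sketched in Java as follows. Note the substitution: a full Canny detector (smoothing, non-maximum suppression, hysteresis thresholding) is too long for a short example, so this sketch uses plain Sobel gradient thresholding as a much simpler stand-in, and compares two edge maps by their Jaccard overlap. The class EdgeMatcher, the threshold and the similarity measure are assumptions made for illustration.

```java
/** Simplified edge-based matching; Sobel thresholding stands in for Canny here. */
final class EdgeMatcher {

    /** Marks pixels whose approximate Sobel gradient magnitude exceeds the threshold. */
    static boolean[][] edges(int[][] gray, int threshold) {
        int h = gray.length;
        int w = gray[0].length;
        boolean[][] out = new boolean[h][w]; // border pixels stay unmarked
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1])
                       - (gray[y - 1][x - 1] + 2 * gray[y][x - 1] + gray[y + 1][x - 1]);
                int gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1])
                       - (gray[y - 1][x - 1] + 2 * gray[y - 1][x] + gray[y - 1][x + 1]);
                // |gx| + |gy| is a common cheap approximation of the gradient magnitude.
                out[y][x] = Math.abs(gx) + Math.abs(gy) > threshold;
            }
        }
        return out;
    }

    /** Jaccard overlap of two equally sized edge maps: shared edge pixels / all edge pixels. */
    static double similarity(boolean[][] a, boolean[][] b) {
        int both = 0;
        int either = 0;
        for (int y = 0; y < a.length; y++) {
            for (int x = 0; x < a[y].length; x++) {
                if (a[y][x] && b[y][x]) {
                    both++;
                }
                if (a[y][x] || b[y][x]) {
                    either++;
                }
            }
        }
        return either == 0 ? 1.0 : (double) both / either;
    }
}
```

In a search setting, the edge map of the query image would be compared against the edge maps of stored frames, and the vlogs whose frames score highest would be passed on to ranking.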


5.3 DATA FLOW DIAGRAMS:

Level 0:

FIG 5.3.1 LEVEL 0 DFD FOR THE SYSTEM

This is the Level 0 DFD for the system. When the user visits the login page, the server checks the credentials against the database. A registered user can access the home page; otherwise, the user has to register to access the videos.

[Level 0 DFD labels: User, Login, Registration, Server, Database, Home Page]

Level 1:


This is the Level 1 DFD for the system. When a user uploads a video, automatic semantic vlog annotation is run. Content analysis is then performed, in which the video is split into a number of frames that are stored in the database in BLOB form.

[Level 1 DFD labels: User, Upload, Server, Annotation process, Content Analysis, Stored in, Database]

Level 2:


This is the Level 2 DFD for the system. When the user sends a request to
search for a video, the server checks the database. Finally, the results are
returned to the user in ranked order.
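Returning results in ranked order and grouping them by category can be sketched as follows. The class RankAndClusterSketch, the nested Vlog type and its score field are hypothetical; the score stands for whatever relevance value the matching step produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the project's source code.
final class RankAndClusterSketch {

    // A matched vlog with an assumed relevance score (higher is better).
    static final class Vlog {
        final String title;
        final String category;
        final double score;

        Vlog(String title, String category, double score) {
            this.title = title;
            this.category = category;
            this.score = score;
        }
    }

    // Sorts matches by descending score, then groups them by category.
    // Categories appear in the order of their best-ranked vlog.
    static Map<String, List<Vlog>> rankAndCluster(List<Vlog> matches) {
        List<Vlog> ranked = new ArrayList<>(matches);
        ranked.sort(Comparator.comparingDouble((Vlog v) -> v.score).reversed());
        Map<String, List<Vlog>> clusters = new LinkedHashMap<>();
        for (Vlog v : ranked) {
            clusters.computeIfAbsent(v.category, k -> new ArrayList<>()).add(v);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<Vlog> matches = Arrays.asList(
                new Vlog("a", "sports", 0.4),
                new Vlog("b", "music", 0.9),
                new Vlog("c", "sports", 0.7));
        for (Map.Entry<String, List<Vlog>> e : rankAndCluster(matches).entrySet()) {
            StringBuilder line = new StringBuilder(e.getKey() + ":");
            for (Vlog v : e.getValue()) line.append(' ').append(v.title);
            System.out.println(line);
        }
    }
}
```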

[Figure: Level 2 DFD. Labels: User, Search, Sends Request to, Server, Database, Results]


CHAPTER 6

TABLES

6.1 USERLOGIN

FIELD NAME      DATA TYPE
Username        Varchar2
Password        Varchar2

6.2 VIDEOUPLOAD

FIELD NAME      DATA TYPE
Videoname       Varchar2
Title           Varchar2
Description     Varchar2
Comments        Varchar2
Count           Number
Path            Varchar2
Video           Blob


CHAPTER 7

TESTING

7.1 SOURCE CODE TESTING

This examines the logic of the system. If the system produces the output
required by the user, the logic can be considered correct.

7.2 MODULE LEVEL TESTING

In module level testing, errors are located in each individual module. This
encourages the programmer to find and rectify an error without affecting
other modules.

7.3 UNIT TESTING

Unit testing is conducted to verify the functional performance of each
modular component of the software. It focuses on the smallest unit of the
software design, i.e., the module.

7.4 INTEGRATION TESTING

Integration testing is a systematic technique for constructing the program
structure while at the same time conducting tests to uncover errors
associated with interfacing. Individual modules, which are highly prone to
interface errors, should not be assumed to work instantly when we put them
together. The problem, of course, is putting them together, that is,
interfacing. Data may be lost across an interface; one module may have an
adverse effect on another; sub-functions, when combined, may not produce the
desired major function; individually acceptable imprecision may be magnified
to unacceptable levels; and global data structures can present problems.


7.5 FUNCTIONAL TEST

Functional test cases involve exercising the code with nominal input values
for which the expected results are known, as well as boundary values and
special values, such as logically related inputs, files of identical
elements, and empty files.

Functional testing comprises three types of tests:

Performance Test

Stress Test

Structure Test

7.5.1 PERFORMANCE TEST

A performance test determines the amount of execution time spent in various
parts of the unit, as well as the program throughput, response time and
device utilization of the program unit.

7.5.2 STRESS TEST

Stress tests are tests designed to intentionally break the unit. A great
deal can be learned about the strengths and limitations of a program by
examining the manner in which a program unit breaks.

7.5.3 STRUCTURE TEST

Structure tests are concerned with exercising the internal logic of a
program and traversing particular execution paths. A white-box test strategy
was employed to ensure that the test cases guarantee that all independent
paths within a module have been exercised at least once.

Exercise all logical decisions on their true and false sides.

Execute all loops at their boundaries and within their operational bounds.

Exercise internal data structures to assure their validity.

Check attributes for their correctness.

Handle end-of-file conditions, I/O errors, buffer problems and textual errors in output information.
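The first two points above can be illustrated with a small example. This is a hedged sketch, not one of the project's test cases: the unit under test, countLong, and the helper check are hypothetical. The checks drive the loop zero times, once and several times, and take the decision on both its true and false sides, including the boundary value.

```java
// Hypothetical example for illustration; not one of the project's test cases.
final class StructureTestSketch {

    // Unit under test: counts comments that are at least minLength characters long.
    static int countLong(String[] comments, int minLength) {
        int count = 0;
        for (String comment : comments) {
            if (comment.length() >= minLength) {
                count++;
            }
        }
        return count;
    }

    // Fails loudly when a check does not hold.
    static void check(boolean condition, String name) {
        if (!condition) {
            throw new AssertionError("failed: " + name);
        }
    }

    public static void main(String[] args) {
        check(countLong(new String[0], 3) == 0,
                "loop runs zero times");
        check(countLong(new String[] {"abc"}, 3) == 1,
                "loop runs once, decision true at the boundary");
        check(countLong(new String[] {"ab"}, 3) == 0,
                "decision false just below the boundary");
        check(countLong(new String[] {"a", "abcd", "abc"}, 3) == 2,
                "loop runs several times, both branches taken");
        System.out.println("all structure checks passed");
    }
}
```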

7.6 WHITE BOX TESTING

This testing is also called glass box testing. In this testing, by knowing
the internal operation of a product, tests can be conducted to ensure that
the internal operation performs according to specification and that all
internal components have been adequately exercised. It is a test case design
method that uses the control structure of the procedural design to derive
test cases. Basis path testing is a white box testing technique.

7.7 BLACK BOX TESTING

In this testing, by knowing the specified functions that a product has been
designed to perform, tests can be conducted that demonstrate each function
is fully operational while at the same time searching for errors in each
function. It fundamentally focuses on the functional requirements of the
software.


7.8 USER ACCEPTANCE TESTING

User acceptance of the system is a key factor in the success of any system.
The system under consideration is tested for user acceptance by constantly
keeping in touch with prospective system users during development and
making changes whenever required. This is done with regard to the following
points.

Input screen design.

Output screen design.


CHAPTER 8

CONCLUSION AND FUTURE ENHANCEMENT

8.1 CONCLUSION:

In vlog annotation, we extract informative keywords not only from the
textual content of the target vlog itself but also from external resources
which are semantically and visually relevant to it; besides semantic
annotation, we obtain a sentiment evaluation from comments as guidance for
vlog browsing. In the user-oriented vlog search, we adopt saliency-based
matching to make the search results more agreeable to users, and we adopt
different ranking strategies according to the user's specific interest.

8.2 FUTURE ENHANCEMENTS

The proposed system returns a video when given a frame of that video as
input. It can be further enhanced to accept any object in a frame as input
and return the exact video.


APPENDIX 1

SCREEN SHOTS

This is the home page of the video blog, where the user must sign up to
access the videos.


This is the members' login page.


This is the video upload page, where the user uploads a video by giving a
title, description and comment. While the video is uploaded, it is split
into frames, which are stored in the database after automatic semantic
annotation.


This is the page for searching for a video. The user browses for an image
and searches for the video.


This is the page for changing the password. A user who wishes to change the
password can do so by providing a new one.


APPENDIX 2

SAMPLE CODING

MONTY TAG:

import java.util.*;
import montytagger.JMontyTagger;

public class montytag {

    JMontyTagger mon = new JMontyTagger();

    public montytag() {}

    public static void main(String[] args) {
        // new montytag("this video is super");
        String str = "sachin&dravid";
        // new montytag(str);
    }

    // Tags the input text with MontyTagger and collects words from the tagged tokens.
    public Vector method(String strr) {
        Vector vv = new Vector();
        try {
            String sr = mon.Tag(strr);
            Vector v = new Vector();
            StringTokenizer st = new StringTokenizer(sr);
            while (st.hasMoreElements()) {
                v.add(st.nextElement());
            }
            System.out.println(v);
            for (int i = 0; i
            // [the rest of the loop header and the start of the loop body are missing from the source]
                    String t2 = stt.nextToken();
                    vv.add(t1);
                }
            }
            System.out.println("Final vecu" + vv);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return vv;
    }
}

VIDEO UPLOAD:

import java.io.*;
import java.sql.*;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.util.*;

/**
 * servlet implementation class upload
 */
public class upload extends HttpServlet {
    private static final long serialVersionUID = 1L;
    public Connection conn;
    public montytag mm = new montytag();
    Video_FrameSplitter vfs;

    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        try {
            vfs = new Video_FrameSplitter();
            Properties p = new Properties();
            // System.out.println(request.getRealPath("/"));
            // Load the database connection settings.
            FileInputStream fis = new FileInputStream("C:/Program Files/Apache Software Foundation/Tomcat 6.0/webapps/VideoBlogfull/src/Database.properties");
            p.load(fis);
            String system = p.getProperty("system");
            String username = p.getProperty("username");
            String password = p.getProperty("password");
            String video = request.getParameter("Browse");
            System.out.println("video path " + video);
            // Split the video into frames.
            vfs.Video_SplitterMethod("d:/f.avi");
            /*
            String title = request.getParameter("Title");
            Vector vt = mm.method(title);
            String stit = vt.get(0).toString();
            String desc = request.getParameter("Description");
            Vector vd = mm.method(title);
            String sdes = vd.get(0).toString();
            String comm = request.getParameter("Comment");
            Vector vc = mm.method(title);
            String scomm = vc.get(0).toString();
            DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
            conn = DriverManager.getConnection("jdbc:oracle:thin:@" + system + "", username, password);
            // System.out.println("in database");
            */
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

DBSTORE

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.ObjectOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.Vector;

public class DBstore {

    public DBstore(String video, Vector vec, String tit, String des, String comm) {
        try {
            DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
            Connection conn = DriverManager.getConnection("jdbc:oracle:thin:@ramarathinam", "system", "redhat");
            System.out.println(video + vec + tit + des + comm);
            // Note: this call ends the JVM, so the statements after it do not run as listed.
            System.exit(0);
            Statement stmt = conn.createStatement();

            // Serialize the frame vector into a byte array.
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream objOstream = new ObjectOutputStream(baos);
            objOstream.writeObject(vec);
            objOstream.flush();
            objOstream.close();
            byte[] bArray = baos.toByteArray();
            System.out.println("*** bArray = " + bArray);

            PreparedStatement objStatement = conn.prepareStatement("insert into samp(video,frameobj) values (?,?)");
            File newfile = new File("d:/f.avi");
            String filename = "d:/f.avi";
            String finame = newfile.getName();
            System.out.println(" Video File NAme & Path ::::::::::: " + filename);
            InputStream fis = new FileInputStream(filename);
            System.out.println(" Video File Length : " + newfile.length());
            System.out.println(" File InputStream " + fis.available());

            // Store the video as a binary stream and the serialized frames as bytes.
            objStatement.setBinaryStream(1, fis, (int) newfile.length());
            objStatement.setBytes(2, bArray);
            objStatement.execute();
            System.out.println("stored");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

SEARCH IMAGE:

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.sql.*;
import java.util.*;
import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;
import javax.servlet.http.HttpSession;

public class Searchimg extends HttpServlet {
    public Connection conn;
    HttpSession hs;
    TreeMap tm1;
    TreeMap tm2;
    TreeMap tm3;
    TreeMap tm4;
    TreeMap tm5;
    TreeMap tm6;
    Vector v2;
    Vector v3;
    Vector fp;
    Vector trueval;
    Vector resv;
    ConPixel cp = new ConPixel();

    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        try {

tm1=new TreeMap();tm2=new TreeMa