


Expert Systems with Applications 36 (2009) 5673–5683


Retrieving video features for language acquisition

Yu-Lin Jeng, Kun-Te Wang, Yueh-Min Huang *

Department of Engineering Science, National Cheng Kung University, Taiwan, No. 1, Ta-Hsueh Road, Tainan 701, Taiwan, ROC

Article info

Keywords:
Learning English
Video concordance
English-as-a-Foreign-Language
Information retrieval
Ranking-based analysis
Word collocations

0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2008.06.117

* Corresponding author. Tel.: +886 6 2757575x63336; fax: +886 6 2766549.
E-mail addresses: [email protected] (Y.-L. Jeng), [email protected] (K.-T. Wang), [email protected] (Y.-M. Huang).

Abstract

Using clips from contemporary films and videos is an alternative approach for students of English-as-a-Foreign-Language that can support their acquisition of the language in a real world context. Compact, attractive and easy to use, our dynamic video retrieval system (DVRS) gives students quick access to resources that can facilitate their learning. In this study, we integrate an innovative learning-assistance system, named the dynamic video retrieval system, which allows students of English-as-a-Foreign-Language to use information retrieval techniques that examine video scripts for specific word collocations, and subsequently utilize a ranking-based approach to analyze the collocations discovered. Such computer-mediated interaction enables students in traditionally structured English classes to find engaging, real life examples of grammar and vocabulary in use, giving them opportunities to strengthen their language skills in a culturally relevant way.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Learning is a cognitive process that involves conscious and active behavior. Students look for similarities and differences between new information and prior knowledge, and in this way are able to effectively assimilate new learning into existing cognitive structures (Piaget, 1980). Many English-as-a-Foreign-Language (EFL) students, however, do not have a framework of experience with English into which new vocabulary or grammar can be easily organized and later retrieved. Moreover, print-based EFL materials, such as books, magazines and news articles, are simplified in order to make the text understandable, but in the process of simplification are stripped of the experiential and contextual components necessary for learning beyond the literal level. The importance of being able to see beyond the literal level can be illustrated with a cursory look at some common English idioms. Consider for example sentences like "John let the cat out of the bag", which describes a situation involving neither cat nor bag, or "Jill hit the roof when she heard about it", which Jill could have done even if she were miles away from any type of structure or shelter. Even most native English speakers would be hard pressed to explain why the phrase "to let the cat out of the bag" means to give away a secret, yet all would instantly recognize what the phrase means, having heard it used in context on many occasions. Clearly, meaning is often as dependent upon connotation and context as it is upon denotation and definition.

Many EFL students in Taiwan feel a measure of frustration with their courses, since the traditional approach to teaching EFL is to begin each lesson with a vocabulary list, followed by a list of phrases, and finally by a piece of text to be read, translated or studied. Vocabulary is taught first by sight recognition, followed by sound recognition, then definition and finally usage. Print-based materials are very limited in their ability to fully describe the ways in which words are used, simply because examples and explanations of contextual and cultural usage take up a great deal of expensive space on a page. What students need is a convenient, user-friendly way to see and hear how new words and phrases are used in real life situations. To meet this need, we propose a dynamic video retrieval system (DVRS) to support students via online learning. Guided by students' queries, video clips are retrieved to match the specific language samples the student wishes to learn or study. The DVRS first constructs a term–term collocation map and then expands the semantics of the query by an ontological structure to assist in matching relevant video clips. The queried terms can be analyzed further to find relationships to other words and phrases in order to provide the fullest possible demonstration of their use in the cinematographic context.

The rest of this paper is organized as follows: Section 2 describes the development of hypermedia systems and content-based analysis. Section 3 proposes our approach. Section 4 specifies the framework of the DVRS and the interface implemented in an e-learning setting. In Section 5, we present the experiments and show the validity and efficiency of the system. Finally, conclusions are discussed in Section 6.

Table 1
An example of concordance using single words

1 ...ect to disciplined procedures that check and recheck against error. In
2 ...it difficult, but he held him in check. And Anthony was busy most of the
3 ...ation was simple: "I made out the check and carried it around a few days
4 ...wife. Or the frequent need to check and discipline himself to the wise
5 ...ng the matter as he called for his check. As he went out he told Freddie

Table 2
An example of concordance finding phrases and idioms

1 ...in a large-enough suitcase. (Check on the Payne luggage.) She might n
2 ...can in your radiators. Now, check for leaks in your hoses and hose
3 ...In all, the Senate signed a check for $46.7 billion, which not only
4 ...is revealed in Adams' failure to check the accuracy and authenticity of
5 ...FACILITIES IN THE AREA. Check on the schools in the area,
6 ...If I could call in, they could check the story while we were on our way

2. Literature review

This paper is based on the capability of video retrieval to support language learning by EFL students. The following section first introduces video content search from a technological perspective, and then describes the design of a term concordancer for use as a learning-assistance tool to teach or tutor within an EFL classroom.

2.1. Content-based video analysis

Due to the rapid development of media technology, large numbers of videos are now stored in digital format. Predictably, given the sheer volume of digital media available via the Internet, videos have increasingly been used to serve the needs of EFL learners (Chen, Huang, & Chu, 2005; Huang, Chen, Huang, Jeng, & Kuo, 2008; Jeng, Huang, Kuo, Chen, & Chu, 2005). However, while videos are obviously rich sources of linguistic examples, finding the most useful clips with specific vocabulary or grammatical structures can be difficult. Consider, for example, the term "heading for", which is a common enough expression in English, occurring in many films. It could be beneficial for a student to hear and see this phrase used in context; however, at this point in time there is no systematic way to find examples in movies. An indexing and retrieval system for video contents could therefore play an important supportive role in the use of multimedia technology (Burke & Kass, 1995; Valderrama, Ocana, & Sheremetov, 2005).

Researchers have proposed several systems to support users' requirements. First, Lee, Smeaton, O'Connor, and Smyth (2006) utilized metadata describing the characteristics of a set of key-frames, such as video title, actors, running time, video format, reviews by users, user ratings, and so on, allowing users to search video files according to those characteristics. This method uses query terms to match metadata terms and locate the sought-after segments of video, but is not specifically tailored to a user's linguistic needs (Srihari, Zhang, & Rao, 2000). While the video may meet the user's needs, there remains the possibility of a mismatch between the segments located and the semantic features of the user's query. The second method uses text as a basis for video searching (Smeaton, 2006). This approach employs automatic speech recognition (ASR), which can both recognize the dialogue in each scene and store its interpreted oral text in a file; the approach searches the video and its oral text when a user requests certain video clips. It can identify desired movie segments, but the ASR method is time consuming and often imprecise. The third approach analyzes video contents by key-frame matching (Kim, Kim, & Cho, 2008). The image pattern recognition approach requires a user to locate a set of images or key-frames that contain what they are searching for. This can be successful if the student is looking for specific visual content, but is less useful linguistically, and can also be quite time consuming. The fourth method uses semantic features (Cheng, Chen, Chang, & Chou, 2007; Jin, Bae, Ro, Kim, & Kim, 2006; Naphade et al., 2006) to support video retrieval. Some of these research efforts employ automatic identification to find semantic features of video shots. However, there are problems with accuracy and time costs associated with this method.

Though all of the above approaches are successful to some extent, they all have shortcomings when it comes to analyzing the contents of videos. We therefore propose an efficient method to enhance the script-based analysis of video contents, detailed in Section 3.

2.2. Concordancer

From a computer-assisted language learning (CALL) perspective, with the explosive growth of information computing, language teaching has changed with improvements in the presentation, reinforcement and assessment of material to be learned, and the inclusion of substantial interactive elements. As an application, a concordancer is a learning-assistance tool used to support the teaching and learning of language. In essence, a concordancer is a text-analysis software package incorporating a corpus of examples of approved usage. A concordancer is therefore a resource for anyone who needs to study texts closely or analyze language in depth, and can lead to enhanced outcomes in assignments or language development. The Brown Corpus is a popular corpus established at Brown University in 1963–1964. It contains 500 samples, each approximately 2000 words long, for a total of approximately 1,014,000 running words of text, published with the intent of providing a broadly representative sample of American English writing. The application's main work includes indexing the words and generating concordances from a large-scale corpus (a minimal code sketch follows the examples below), such as:

(a) Analyzing word frequencies, for example:

• breed: 26 occurrences
• breeding: 8 occurrences
• breeder: 2 occurrences

(b) Comparing different usages of the same word, for example(see Table 1).

(c) Finding phrases and idioms, for example (see Table 2).
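To make the concordancer's core operation concrete, the following Python sketch (our illustration, not code from the original system; the window width and simple substring matching are assumptions) produces a keyword-in-context (KWIC) listing of the kind shown in Tables 1 and 2.

```python
import re

def kwic(text, keyword, width=40):
    """Return a keyword-in-context line for each occurrence of `keyword`."""
    lines = []
    for match in re.finditer(r"\b" + re.escape(keyword), text, re.IGNORECASE):
        start, end = match.start(), match.end()
        left = text[max(0, start - width):start]    # left context window
        right = text[end:end + width]               # right context window
        lines.append(f"...{left}{match.group()}{right}...")
    return lines

corpus = ("He held him in check. And Anthony was busy most of the day. "
          "I made out the check and carried it around a few days.")
for line in kwic(corpus, "check"):
    print(line)
```

Real concordancers add corpus indexing, lemmatization and sorting of the context lines, but the windowing above is the essential operation.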

Nowadays, concordancers on the web provide a convenient way to support English learning, with resources such as

• Web concordance (http://www.edict.com.hk/concordance).
• Shoebox (http://www.sil.org/computing/shoebox).

They have typically been used to investigate large-scale corpora, such as text searches of the Bible or popular literary works like Shakespeare, to help learners build linguistic patterns and study interlinear texts.

Although they are generally successful for studies in the field of linguistics, concordancers are limited in their ability to support EFL students in learning English. According to Dual-Coding Theory (Paivio, 1986), it is when the verbal and image systems are connected and related that students can most easily acquire and retain new language structures. Therefore, by employing video clips, it may be possible to enhance the internal dialogue needed to help students make connections between academic language events and their actual communication needs, overcoming such problems as poor comprehension, limited vocabulary, slow reading, bad grammar, and nonexistent conversational skills. The videos use language in ways that closely approximate language use in real life, thereby demonstrating practical applications of EFL course content (Davis, 1998).

Secules, Herron, and Tomasello (1992) studied the use of films in class. Students were organized into two groups, with the experimental group learning with videos and the control group learning via direct instruction, with no videos provided. The results showed that the listening skills of the experimental group improved more than those of the control group; their vocabulary and grammar skills were also superior. Weyers (1999) likewise showed the superiority of using authentic content to support students' listening. The indexing capability of a concordancer supports a student-centered approach; John (1991) observed that it is closely connected with the development of corpus-based learning and provided evidence that concordancers facilitate the use of authentic language. The concordancer has thus become more accepted by researchers and language teachers. By virtue of its capability and development, several researchers (Lee & Liou, 2003; Stevens, 1995; Sun & Wang, 2003) have provided evidence that concordancers facilitate the use of authentic language, giving students more opportunities for efficient listening to and watching of authentic materials.

3. Dynamic video retrieval system (DVRS)

In the EFL environment, if exploration of video clips could be aligned with course goals, students' language acquisition could become more efficient and ultimately more successful. The script-retrieval method allows teachers and students to tailor the process of mining video contents to specific needs, enhancing their value in the learning process. We utilize script files as the source of the referenced semantic parts of selected videos, since the DVRS can reveal all relevant video clips guided by a user's query. Our information retrieval (IR) model counts the frequency of each word in the script to build a weighting scheme. Then, to achieve better results, expanded query terms are considered to reduce any semantic gaps in the original user query. Furthermore, the query terms are compared to determine their degree of similarity in order to ensure the selection of relevant scripts. Finally, we use a semi-automatic annotation method to accommodate any shortcomings in the automated video retrieval model. Thus appropriate video clips can be accessed with a high degree of precision, resulting in the exposure of better examples to learners.

Fig. 1. The dynamic video retrieval system, including a semi-automatic annotation process and the Web concordance.

The dynamic video retrieval system (DVRS) is depicted in Fig. 1. We implement two services, the video indexing service and the web concordance service, for teachers and students on the web-based learning system. The two processes provide the list of videos and their scripts, and retrieve the segments of relevant videos students need based on the user's query terms. The semi-automatic annotation process also provides teachers with convenient video annotation operations. Detailed illustrations are provided below.

3.1. A semi-automatic annotation process

In this process, teachers first upload the scripts and their corresponding video clips. A function flags the relationships between the uploaded videos and the time duration of each script; it also parses all of the words within each script and creates a term–term matrix. Next, the teachers efficiently display the selected videos and then tag the semantic information onto the targeted segments of each script as metadata. Using the metadata, the process quickly accesses the appropriate clips containing the target linguistic samples. Besides increasing the degree of precision in the selection of materials, the semi-automatic annotation interface provides four kinds of information for teachers to input: script, tag, event and topic, which are described in Section 4.2. The metadata of each segment is stored in the video corpus, a content management system built into the web concordance service.

3.2. Video retrieval process

Based on the four types of information, the video retrieval process accesses the vocabulary of each script file and finds relevant terms from each script using the script–term matrix. Furthermore, a collocation map built with a Bayesian network (BN) approach is graphed as a way of exploring the relationships among the various terms. With the collocation map, we add expansion terms to the original user term, enhancing the semantic precision of the user's query. Additionally, new word collocations can be learned as target words are combined into common phrases, benefiting EFL students. We also propose a mutual information (MI) method to integrate into the analysis of the video retrieval process. This method functions as a concordancer for students who need to study words and phrases in depth. With concordance analysis, students can make indexes and word lists, determine word frequencies and find idioms from the search results.

4. The methodology of a four-phase design

A four-phase design was employed to identify and select relevant video clips, comprising (a) an information retrieval model, (b) semantic annotation, (c) expansion terms, and (d) mutual information. One hundred and twenty-one contemporary films, containing 140,586 scripts, were used as resource materials to support vocabulary learning, collocational word analysis, and listening comprehension. To address the issue of the volume of resources, we present a framework for the DVRS as shown in Fig. 2. Overall, the methodology indicates that web-based exposure to English had a positive impact on EFL students' learning.

4.1. Information retrieval model

In the following, we give the definitions of the data for searching and selecting relevant video scripts. We use a popular technique, the information retrieval model (Ricardo & Berthier, 1999), to implement this phase. First of all, a statistical model, the script–term matrix, is built with the set of scripts in each video. The definitions are described below:

Definition 1 (Script set). $D = \{d_1, d_2, \ldots, d_n\}$, where $n > 0$ and $\forall d_i \in D$; $D$ represents the set of scripts, and $d_i$ is the $i$th script in a film.

Definition 2 (Original query). $Q = \{q_1, q_2, \ldots, q_i\}$, where $i > 0$ and $\forall q_i \in Q$; $Q$ represents the set of the user's desired query terms.

Definition 3 (Vocabulary set). $K = \{k_1, k_2, \ldots, k_i\}$, where $i > 0$ and $\forall k_i \in K$; $K$ represents a finite set of the important vocabulary in the collected scripts.

Definition 4 (Weight of vocabulary). Let $freq$ denote the frequency of a given vocabulary item in the scripts; $freq_{ij}$ is the raw frequency of vocabulary $k_i$ in script $d_j$. Let $\max\{freq_{lj}\}$ be the maximum frequency computed over all vocabulary items mentioned in script $d_j$. The normalized vocabulary frequency is then $tf_{ij} = \frac{freq_{ij}}{\max\{freq_{lj}\}}$ (Ricardo & Berthier, 1999). Besides, according to the IR model, a vocabulary item that appears in many different scripts is less indicative of the overall topic. Therefore, the inverse document frequency of term $i$ is $idf_i = \log \frac{N}{n_i}$, where $N$ is the total number of scripts and $n_i$ is the number of scripts containing term $i$. The vocabulary weight is computed by the following equation:

$w_{ij} = tf_{ij} \times idf_i \quad (1)$

where $w_{ij}$ is the $i$th dimensional measure of $d_j$ in a $t$-dimensional space.

Fig. 2. The methodology of the dynamic video retrieval system (DVRS).

Definition 5 (Script vector). The weights of each user query $q_i$ for a script $d_j$ form a $t$-dimensional vector space, expressed as $\vec{d_j} = (w_{1j}, w_{2j}, \ldots, w_{tj})$. The real-valued weight $w_{ij}$ is given by Eq. (1) and represents the degree of importance between vocabulary $k_i$ in the script set $D$ and a particular script document $d_j$.

Definition 6 (Candidate script set). Through Definition 2, we can find the corresponding scripts in $D$ for each query vocabulary, each with a weight $\omega_i$ expressing the related degree of the script:

$q_1 = \{\omega_1 d_1, \omega_2 d_2, \ldots, \omega_m d_m\}, \quad \ldots, \quad q_n = \{\omega_1 d_1, \ldots, \omega_i d_i\}$

Therefore, the candidate scripts are redefined as a set of candidate scripts $D' = \{d_1, d_2, \ldots, d_i, \ldots, d_m\}$, $\forall d_k \in D$.

Definition 7 (Candidate vocabulary set). We take all vocabulary $k_i$ in $D'$ to form a thesaurus. The candidate vocabulary set is defined as $K' = \{k_1, k_2, \ldots, k_n\}$, where $K' \subseteq K$. For reasons of efficiency, the candidate vocabulary set is smaller than the initial vocabulary set $K$, but we assert that the candidate script set is related to the set $K'$, which still contains the information the user needs. Therefore, the proposed thesaurus adds to the resources provided by the vocabulary expansion model to generate a related vocabulary map.
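As an illustration of Definitions 1–4, the following Python sketch (our simplified rendering; whitespace tokenization and the absence of stop-word filtering are assumptions) computes the script–term weights of Eq. (1).

```python
import math
from collections import Counter

def tfidf_weights(scripts):
    """Script-term weights w_ij = tf_ij * idf_i, per Eq. (1)."""
    N = len(scripts)                                  # number of scripts
    tokenized = [s.lower().split() for s in scripts]
    doc_freq = Counter()                              # n_i: scripts containing k_i
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        freq = Counter(tokens)                        # raw frequencies freq_ij
        max_freq = max(freq.values())                 # max over script d_j
        weights.append({term: (count / max_freq) * math.log(N / doc_freq[term])
                        for term, count in freq.items()})
    return weights

scripts = ["we are heading for the airport",
           "check the story on our way to the airport"]
print(round(tfidf_weights(scripts)[0]["heading"], 3))  # tf = 1.0, idf = log 2
```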

4.2. Semantic annotation

In order to expand the user's original query for semantic meaning in a video clip, we have developed an annotation approach that allows teachers to tag each video clip from an educational perspective, as shown in Fig. 3.

Fig. 3. Definition of the four-type information structure in a video (video → segment → script → annotation terms: tag, event, topic).

Fig. 4. A simple collocation map of vocabulary.

The annotation interface is called a semi-automatic annotation module. Such a module allows teachers to identify many segments in a video to meet a variety of student needs, identifying these according to the four types of information: script, tag, event and topic. The four types of information are described in detail below, with a minimal metadata sketch following the list:

• Script: A video clip includes a series of scripts. Teachers can determine the start and end points of each dialogue clip. The scripts in a clip are annotated as teachable language-learning content and stored in the video corpus.

• Tag: Tags are a set of related keywords applied to a segmented video; they generally annotate the clip according to topic. For example, if the segment is about major league baseball, it can be tagged with the terms "sport", "match" and "baseball".

• Event: An event is a depiction of what happens in a video clip. If the clip is about a baseball game, the segment can be annotated with the term "New York vs Red Sox". The event annotation helps the system to quickly find and access the relevant video clips selected by the teacher.

• Topic: A topic is a set of semantic terms describing the dialogue of a video clip. For example, consider a major league baseball clip of a commentator talking about a major competitor. This topic could be quite complex, and it could be difficult to accurately describe all its implications with one label, so it could be tagged with terms such as "exciting game" and "powerful competitor". Teachers specify the terms as an entity so that semantic term matching can be naturally incorporated into the retrieval method, based on a semantic similarity function of terms computed on the script set.
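For illustration only, one plausible way to store this four-type metadata per segment is sketched below; the field names and types are our assumptions, since the paper does not specify the corpus schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SegmentAnnotation:
    """Four-type metadata a teacher attaches to one video segment."""
    video_id: str
    start: float                 # segment start time in seconds
    end: float                   # segment end time in seconds
    script: str                  # the dialogue text of the clip
    tags: List[str] = field(default_factory=list)     # topic keywords
    event: str = ""              # what happens in the clip
    topics: List[str] = field(default_factory=list)   # semantic terms

clip = SegmentAnnotation(
    video_id="mlb_0001", start=120.0, end=152.5,
    script="What an exciting game against a powerful competitor!",
    tags=["sport", "match", "baseball"],
    event="New York vs Red Sox",
    topics=["exciting game", "powerful competitor"],
)
```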

Through such an extension, the searching process and its results can be more efficient and more precise than a common-purpose keyword query. With teachers' annotations, the returned clips contain more meaningful examples demonstrating the user's query. Thus, with the combination of semantic annotation, the DVRS can provide several alternative video clips and improve the precision of the user's query.

4.3. Vocabulary expansion

In this work, vocabulary expansion aims to close the gap between the original terms selected by a user and the desired information. Generally, users employ just a few words to define their search, and their queries are often neither precise nor adequately informed, and may not contain enough semantically relevant words to select an appropriate video script. Vocabulary expansion therefore plays an important role in helping users access relevant video clips. In this regard, many studies have proposed approaches such as Latent Semantic Indexing (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990), Latent Dirichlet Allocation (Blei, Ng, & Jordan, 2003), Similarity Thesauri (Qiu & Frei, 1993), and PhraseFinder (Jing & Croft, 1994) for improving the selection of expansion terms. We adopt an efficient method of this kind, the Bayesian inference network (Han & Choi, 1993), to address the short-query and word-mismatch problems.

First of all, a correlation map is introduced as an application of the inference network model to encode the statistical behavior of terms in a given script collection, and to generate a graph representing a related vocabulary map. The weights of the arcs (see Fig. 4) are determined by Bayesian conditional probability (Han & Choi, 1993), as in the following equation:

$p_{ij} = P(S_j \mid S_i) = \frac{frequency(i, j)}{frequency(i)} \quad (2)$

In the correlation map, a node represents a vocabulary item weighted by the tf-idf model based on the script set $D'$. Furthermore, we form a smaller set of vocabulary by employing the vocabulary-similarity measure function $vocabulary\_prob(a, b)$, which provides a weighting degree between vocabulary $a$ and vocabulary $b$, as shown in the following equation:

$vocabulary\_prob(a, b) \triangleq P(a \mid b) + P(b \mid a) \geq \epsilon \quad (3)$

where $\epsilon$ is a minimum threshold on the conditional probability between vocabularies $a$ and $b$. For example, using Eq. (3), suppose the vocabulary "kill" has a probability of 0.7 of appearing together with the vocabulary "police"; "police" will then be added to the expanded vocabulary. Selecting these additional vocabularies with a Bayesian inference network is efficient, since we need not accumulate the weights of the whole vocabulary set nor spend time updating a full vocabulary correlation matrix. Query expansion thus reveals related vocabularies in less computing time, improving performance, particularly in the area of recall.
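To make Eqs. (2) and (3) concrete, the following sketch (our reading, treating each script as the co-occurrence window; the threshold value is illustrative) estimates the arc probabilities of the collocation map and selects expansion candidates.

```python
from collections import Counter
from itertools import combinations

def build_collocation_map(tokenized_scripts):
    """Estimate P(S_j | S_i) = frequency(i, j) / frequency(i), per Eq. (2)."""
    uni, pair = Counter(), Counter()
    for tokens in tokenized_scripts:
        vocab = set(tokens)                   # co-occurrence window = one script
        uni.update(vocab)
        pair.update(frozenset(p) for p in combinations(sorted(vocab), 2))
    def p(j, i):                              # conditional probability P(S_j | S_i)
        return pair[frozenset((i, j))] / uni[i] if uni[i] else 0.0
    return p

def expansion_terms(query, vocabulary, p, epsilon=0.5):
    """Keep b whenever P(a|b) + P(b|a) >= epsilon, per Eq. (3)."""
    return [b for b in sorted(vocabulary)
            if b != query and p(query, b) + p(b, query) >= epsilon]

scripts = [["kill", "police", "chase"], ["kill", "police"], ["police", "station"]]
p = build_collocation_map(scripts)
print(expansion_terms("kill", {"police", "chase", "station"}, p))  # ['chase', 'police']
```

With this toy data, "police" and "chase" pass the threshold for the query "kill" while "station" does not, mirroring the "kill"/"police" example above.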

Accordingly, we can use Eq. (3) to drive the vocabulary expansion process. An expansion query set $Q_E$ is represented as $Q_E = \{k_{e1}, k_{e2}, \ldots, k_{em}\}$, where $m$ is the number of expansion vocabularies drawn from the candidate vocabulary set $K'$. We combine the expansion set $Q_E$ with the original query $Q$ to improve the recall rate. Thus we can denote the relevant query set $Q'$ as shown in the following equation:

$Q' = \text{initial query } Q + \text{expansion query } Q_E = \{\omega'_1 k_1 + \omega'_2 k_2 + \cdots + \omega'_l k_l + \mu'_{e1} k_{e1} + \mu'_{e2} k_{e2} + \cdots + \mu'_{em} k_{em}\} \quad (4)$


Fig. 5. An inference network with query, term, query expansion and script file set.


According to Eq. (4), the original query $Q$ and the expanded query $Q_E$ carry different weights reflecting the degree of importance of each query vocabulary. The weight $\omega'$ of the initial query $Q$ depends on the user's information need; that is, the user assigns the $\omega'$ value according to the importance of each query vocabulary, with a default of one. The $\mu'$ value depends on the weighting score of each expanded vocabulary given by Eq. (3).

By employing the scoring function of Eq. (5), we can compute the sum of the weights between the candidate script set $D'$ and the relevant query set $Q'$, and then rank the retrieved scripts in order of presumed relevance, as shown in Fig. 5.

The scoring function: $S = \sum_{i=1, j=1}^{n} \omega'_i \times \mu'_j \quad (5)$
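A minimal sketch of the ranking step of Eqs. (4) and (5) follows; the data layout is our assumption, with user-assigned weights ω′ defaulting to one and expansion weights μ′ taken from the collocation map of Eq. (3).

```python
def score_scripts(candidate_scripts, original_terms, expansion_terms):
    """Rank scripts by the summed weights of matched query terms, per Eq. (5).

    original_terms:  {term: omega_weight}, user-assigned, defaulting to 1.0
    expansion_terms: {term: mu_weight}, derived from Eq. (3)
    """
    relevant_query = {**original_terms, **expansion_terms}   # Q' of Eq. (4)
    scored = []
    for script_id, tokens in candidate_scripts.items():
        s = sum(w for term, w in relevant_query.items() if term in tokens)
        scored.append((script_id, s))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

scripts = {"d1": {"kill", "police", "run"}, "d2": {"police", "station"}}
print(score_scripts(scripts, {"kill": 1.0}, {"police": 0.7}))
# [('d1', 1.7), ('d2', 0.7)]
```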

4.4. Analyzing word collocations

In this section, the aim is to enable second-language teachers and students to find word patterns and phrases in video scripts. A chief advantage of analyzing word patterns is that the student can glean information about actual language in use, in true-to-life video contexts, rather than relying on grammar rules. Moreover, finding examples of a single word within the scripts is rarely sufficient, because of the difference between the lexicographic denotation of a word and its collocational significance. Therefore, we use a mutual information (MI) equation to look for word collocations relevant to the vocabularies searched by a particular user, an approach first proposed in the linguistic context by Church and Hanks (1989). The definition of mutual information (Hamming, 1986; Sproat & Shih, 1990) is given as Eq. (6). MI provides a measure of the degree of collocation of a given vocabulary with others (as with "take off"). Assume there are two independent vocabularies a and b. The probabilities of these two vocabularies are p(a) and p(b), respectively, and their co-occurrence probability is denoted p(a, b); the mutual information (MI) of p(a, b) is then defined by the following equation:

$MI = \log_2\left(\frac{P(a, b)}{P(a)P(b)}\right) \quad (6)$

Fig. 6. An example of analyzing a word pattern.

Applying MI to the analysis of word collocations, we can obtain the collocative words that usually co-occur with the user query. The following explanation presents the detailed operation process. First, a script set S = (s1, s2, ..., st) is given by the results of the former phase. Each s comprises its vocabularies as s = (k1, k2, ..., kn), where ki is the ith keyword. To search for collocative words, we construct a relationship between the words and the distance of word positions within the scripts, with examples shown in Fig. 6.

The table in Fig. 6 shows an example of word pattern analysis. As previously mentioned, the query results from the DVRS are collections of video scripts, which are listed as rows in the table of Fig. 6. The query term in a script is placed in the column named "L0"; thus, L0 denotes the location of the query term in a script, while L−1 and L+1 denote positions at distance 1 from the query term, and so on. The first column below the deep-blue row contains the separate words of the query result scripts, and the cell where a separate word meets the column "L−1" gives the frequency with which that word occurs at distance 1. In the MI equation, a word that is distant from the query word ("check") is treated as less relevant to it. Thus, the weighting distance is calculated by $\omega = 1 - \frac{|d| + b}{\max\{l\}}$, where $l$ is the total number of words in a video script, $d$ is the distance between the vocabulary and the search word, and $b$ is a fixed amount that keeps $\omega$ from going to zero. After the weighting distance is calculated, the probability can be computed by $p(x) = \frac{f_x}{N}$, where $x$ is the vocabulary and $N$ is the length of the longest video script. Finally, the mutual information value is calculated by $MI_y = \sum_{i=1}^{n} \omega_{i,y} \log\left(\frac{p(x, y)}{p(x)p(y)}\right)$, where $y$ is the vocabulary compared with the search word. The value of mutual information thus shows the degree of relevance to the search word, and word patterns can therefore be found through this operational process.
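A simplified sketch of this operational process follows; it reflects our reading of the notation (per-script windows, the constant b, and log base 2 are assumptions), not necessarily the authors' exact implementation.

```python
import math
from collections import Counter

def mi_collocates(scripts, query, b=0.5, top=20):
    """Rank words by distance-weighted mutual information with `query`, per Eq. (6)."""
    word_freq, pair_weight = Counter(), Counter()
    total = 0
    for tokens in scripts:
        total += len(tokens)
        word_freq.update(tokens)
        length = len(tokens)
        for i, word in enumerate(tokens):
            if word != query:
                continue
            for j, other in enumerate(tokens):
                if j == i:
                    continue
                # words far from the query contribute less (weighting distance)
                omega = 1 - (abs(j - i) + b) / length
                pair_weight[other] += max(omega, 0.0)
    p_query = word_freq[query] / total
    results = []
    for other, weight in pair_weight.items():
        p_other = word_freq[other] / total
        p_xy = weight / total                 # weighted co-occurrence estimate
        if p_xy > 0:
            results.append((other, math.log2(p_xy / (p_query * p_other))))
    return sorted(results, key=lambda r: r[1], reverse=True)[:top]

scripts = [["we", "are", "heading", "for", "the", "airport"],
           ["heading", "toward", "the", "station"]]
print(mi_collocates(scripts, "heading")[:5])
```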

5. Application and evaluation

In the first part of this section, we present the interfaces of the DVRS and demonstrate its operations in detail. In the second part, we focus on efficiency and present the questionnaire results which evaluate its capability to support language acquisition for EFL students.


5.1. System presentation interfaces

A web interface allows users to access the video clips related to their topic of interest, as shown in Fig. 7. The left table, a words-list table, presents the frequency of vocabularies in the DVRS. EFL students can see how many times a target word is used in the video corpus and refer to examples of common usage from an e-dictionary (http://www.answer.com). Furthermore, students key their query into the input box, and the results are then shown in the interface, including links to the video clips and their scripts containing the queried vocabularies. Fig. 8 presents the search results of the DVRS: for example, students input the word "heading" as a keyword, wanting to search for the segments of video relevant to the queried vocabulary. After an operation of the DVRS, the result is presented in such a way that the words list first shows the frequency of the word, and then shows the list of scripts and accompanying video clips which provide examples of the students' target vocabulary. Students can also click the Script File Name link (see Fig. 8) to access a segment of video, whereupon a new page streams the video content. Furthermore, teachers or instructors can edit these targeted video contents as annotations for their language course (see Fig. 9). If students need to find a word pattern in order to understand how it associates with a word, they can click the Concordance link in the interface of Fig. 8. The result of word pattern analysis will show those words which co-occur with the query term (see Fig. 10).

Fig. 7. The searching interface of DVRS.

Fig. 8. Searching result interface.


Fig. 9. The learning interface related to course topic.


These word collocations can be revealed since we use the technique of mutual information (MI) to evaluate the relationships between words. As shown in Fig. 10, we select the top 20 terms with the highest MI values, that is, those words which most often accompany the query term in the video contents. In our example, we take the query term "heading" to be analyzed, and we can then learn the way this word is commonly used in English. Accordingly, English phrases such as "heading toward" and "heading for" are found in our word collocation analysis, and this exposure can help students learn more English collocative words. Fig. 11 shows the display of a segment found by the DVRS; the video name, segment time and script content are shown in this presentation interface. In Fig. 12, teachers can provide the semantic annotation for certain segments of video.

Fig. 10. System presentation of word pattern analysis.

In Fig. 12, the teacher or editor can describe the corresponding segment of video with commentary in the description field. As previously explained, the comments cover the four video features: script, tag, event and topic. With these four features, the DVRS can efficiently retrieve the video segments with greater relevance or matching definition terms.

5.2. Learning attitude evaluation

We examined the effectiveness of the DVRS on students' learning by using questionnaires. The response items of each question in the questionnaire were designed using a 5-point Likert scale. Typically, the test item in a Likert scale is a statement to which users respond, indicating the extent to which they agree or disagree with it. In our design, 5 stands for agree strongly and 1 stands for disagree strongly. The responses also stand for the score of each question, allowing us to calculate the mean value of each item. The questions listed in the online questionnaire are described as follows:


Fig. 11. Browsing the result segment of video.

Fig. 12. The semi-automated annotation interface.


(a) I like to use the DVRS to discover information which helps me to learn a second language.

(b) I like to watch video segments using the DVRS.

(c) I think the retrieved segments of video in the DVRS are useful.


Table 3
Questionnaire results

Item  Mean  Std. error  Std. dev.  Skewness  Score ≥ 4
a     4.36  0.215       1.075      −1.899    84%
b     3.92  0.264       1.320      −0.786    64%
c     3.80  0.252       1.258      −0.955    68%
d     4.32  0.160       0.802      −1.197    88%
e     4.12  0.247       1.236      −1.254    76%
f     4.24  0.166       0.831      −0.969    84%

(d) It is easy for me to participate in learning activities using the DVRS.

(e) Using the DVRS to watch video segments can increase my interest and motivation in learning a second language.

(f) I have made significant progress since I began to use the DVRS.

We selected 25 graduate students at National Cheng Kung University in Taiwan to complete the questionnaire online. These 25 students were all equipped with personal computers, so they could make use of the learning activities on the DVRS. The statistical results are presented in Table 3. The sixth column gives the percentage of responses for each item with a score greater than or equal to 4. Responses to the first item indicate that most students like to use the DVRS to discover information which helps them to learn a second language (mean = 4.36, 84%). The answers to the second item indicate that most students like to view video segments on the DVRS (mean = 3.92, 64%). The third item indicates that the information from the DVRS is useful (mean = 3.80, 68%). The fourth item indicates that the DVRS provides a convenient learning environment (mean = 4.32, 88%). The fifth item indicates that the DVRS can increase students' interest in learning a second language (mean = 4.12, 76%). The sixth item indicates that using the DVRS can improve students' learning performance (mean = 4.24, 84%).

Though all the answers tend to present positive opinions of the DVRS, the mean values of items b, c and e are relatively smaller than those of the other items. Oral interviews with the students suggest the following explanation. The broadcasting of video segments is based on a streaming server, whose quality may be affected by network bandwidth and the capability of the students' computers. Low broadcasting quality may decrease the playback speed of video segments, which affects the smoothness of the students' operation of the system. Accordingly, a good web-based video learning system should be equipped with adequate network bandwidth and powerful computers, which can improve the operating experience. Nevertheless, the questionnaire results still show the strengths of the DVRS for students.

6. Conclusion

The aim of the work described in this paper is to facilitate learning for EFL students through the use of video segments. In terms of providing a meaningful, intrinsically motivational learning experience, the use of video segments caused students to readily engage with the course material. Furthermore, lecturers were not required to spend valuable time looking for relevant examples to illustrate the usage of target language structures for their students. Instead, they were able to add auxiliary video materials to their resource lists and to direct their students to them efficiently and quickly. To enhance the precision of retrieved video segments, we used an information retrieval model and a query vocabulary expansion approach. Also, to meet semantic search requirements, we employed a semi-automated annotation method and proposed an ontological structure of the video to describe video segments. With this ontological structure, the proposed system can retrieve video segments not only by a word-based query but also by a semantic-based query. Furthermore, we analyzed word collocations in video scripts to help students learn phrases and gain experience with the usage of language. Through a prototype system, we presented the results of a video retrieval interface, the scripts, and word pattern analysis, with a view to assisting EFL students to construct a meaningful and durable foundation of knowledge of the course material. It thus becomes possible for learners to propel their language acquisition forward through experience with real-world contexts. According to the questionnaire results, the proposed system shows the ability to increase student achievement and enable students to explore more fully the opportunities made possible by their improving communicative skills.

Acknowledgement

This work was supported in part by the National Science Council (NSC), Taiwan, ROC, under Grant NSC 95-2221-E-006-307-MY3.

References

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

Burke, R., & Kass, A. (1995). Supporting learning through active retrieval of video stories. Expert Systems with Applications, 9(3), 361–378.

Chen, J. N., Huang, Y. M., & Chu, W. C. (2005). Applying dynamic fuzzy Petri net to web learning system. Interactive Learning Environments, 13(3), 159–178.

Cheng, S. C., Chen, M. Y., Chang, H. Y., & Chou, T. C. (2007). Semantic-based facial expression recognition using analytical hierarchy process. Expert Systems with Applications, 33(1), 86–95.

Church, K. W., & Hanks, P. (1989). Word association norms, mutual information and lexicography. In Proceedings of the 27th annual meeting of the ACL (pp. 76–83). Vancouver.

Davis, R. S. (1998). Captioned video: Making it work for you. The Internet TESL Journal, 4(3).

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391–407.

Hamming, R. W. (1986). Coding and information theory. Englewood Cliffs, NJ: Prentice-Hall.

Han, Y. S., & Choi, K. S. (1993). Lexical concept acquisition from collocation map. In Proceedings of the SIGLEX workshop on acquisition of lexical knowledge from text, 31st annual meeting of the ACL (pp. 22–31). Columbus.

Huang, Y. M., Chen, J. N., Huang, T. C., Jeng, Y. L., & Kuo, Y. H. (2008). Standardized course generation process using dynamic fuzzy Petri nets. Expert Systems with Applications, 34(1), 72–86.

Jeng, Y. L., Huang, Y. M., Kuo, Y. H., Chen, J. N., & Chu, W. C. (2005). ANTS: Agent-based navigational training system. In Proceedings of the international conference on web-based learning (pp. 320–325).

Jin, S. H., Bae, T. M., Ro, Y. M., Kim, H. R., & Kim, M. (2006). Intelligent broadcasting system and services for personalized semantic contents consumption. Expert Systems with Applications, 31(1), 164–173.

Jing, Y., & Croft, W. B. (1994). An association thesaurus for information retrieval. In Proceedings of intelligent multimedia information retrieval systems (pp. 146–160). New York.

John, T. (1991). Should you be persuaded: Two samples of data-driven learning materials. ELR Journal, 4, 1–16.

Kim, J. K., Kim, H. K., & Cho, Y. H. (2008). A user-oriented contents recommendation system in peer-to-peer architecture. Expert Systems with Applications, 34(1), 300–312.

Lee, C. Y., & Liou, H. C. (2003). A study of using web concordancing for English vocabulary learning in a Taiwanese high school context. English Teaching & Learning, 27(3), 35–56.

Lee, H., Smeaton, A. F., O'Connor, N. E., & Smyth, B. (2006). User evaluation of Fischlar-News: An automatic broadcast news delivery system. ACM Transactions on Information Systems, 24(2), 145–189.

Naphade, M., Smith, J. R., Tesic, J., Chang, S. F., Hsu, W., Kennedy, L., et al. (2006). Large-scale concept ontology for multimedia. IEEE Multimedia, 13, 86–91.

Paivio, A. (1986). Mental representations: A dual coding approach. London: Oxford University Press.

Piaget, J. (1980). The psychogenesis of knowledge and its epistemological significance. In M. Piattelli-Palmarini (Ed.), Language and learning: The debate between Jean Piaget and Noam Chomsky (pp. 23–34). Cambridge, MA: Harvard University Press.

Qiu, Y., & Frei, H. P. (1993). Concept-based query expansion. In Proceedings of SIGIR-93, 16th ACM international conference on research and development in information retrieval (pp. 160–169). Pittsburgh.

Ricardo, B. Y., & Berthier, R. N. (1999). Modern information retrieval. New York: ACM Press; Addison-Wesley. pp. 23–38.


Secules, T., Herron, C., & Tomasello, M. (1992). The effect of video context on foreign language learning. The Modern Language Journal, 76(4), 480–490.

Smeaton, A. F. (2006). Techniques used and open challenges to the analysis, indexing and retrieval of digital video. Information Systems, 32, 545–559.

Sproat, R., & Shih, C. (1990). A statistical method for finding word boundaries in Chinese text. Computer Processing of Chinese and Oriental Languages, 4, 336–351.

Srihari, R. K., Zhang, Z., & Rao, A. (2000). Intelligent indexing and semantic retrieval of multimodal documents. Information Retrieval, 2(3), 245–275.

Stevens, V. (1995). Concordancing with language learners: Why? When? What? CAELL Journal, 6(2), 2–10.

Sun, Y. C., & Wang, L. Y. (2003). Concordancers in the EFL classroom: Cognitive approaches and collocation difficulty. Computer Assisted Language Learning, 16(1), 83–95.

Valderrama, R. P., Ocana, L. B., & Sheremetov, L. B. (2005). Development of intelligent reusable learning objects for web-based education systems. Expert Systems with Applications, 28(2), 273–283.

Weyers, J. R. (1999). The effect of authentic video on communicative competence. The Modern Language Journal, 83(3), 339–349.