Digital-Video Management for Heterogeneous and Distributed Resources

Francesco Mele and Giovanni Minei
Microgravity Advanced Research Support Center

Multimedia at Work. Editors: Tiziana Catarci and Thomas D.C. Little
IEEE MultiMedia, July–September 2001. 1070-986X/01/$10.00 © 2001 IEEE


A digital-video management project dealing with segmentation, annotation, retrieval, and other related topics has been active for several years at the Microgravity Advanced Research Support (MARS) Center in Naples, Italy. Our research is part of a field that studies methods of representing and integrating heterogeneous information in distributed Web environments. In this context, each genre of digital video (such as news, television shows, documentaries, and cultural-heritage videos) presents its knowledge in different forms and structures. Furthermore, for each genre, various methodologies and schools of thought exist that structure the knowledge in digital videos differently. For example, in cinematographic theory, different methods of film segmentation exist, and in the cultural-heritage field, how objects in videos are classified depends on the classification criteria of each country.

In our previous work, we developed some applications¹⁻³ for different application domains, including cultural heritage, cinema, and space experimentation. Because of the heterogeneity of knowledge in videos, we’ve aimed to create a methodology that allows for structural flexibility (archiving) of the video content and to define and implement an efficient retrieval mechanism for archived videos. From an architectural point of view, our strategy has been

❚ defining the distributed video resources as agents that possess beliefs (or metadata) about video segmentation and that can carry out inferences on the relative metadata; and

❚ implementing the agents, in the form of an effective Web resource, within the Resource Description Framework (RDF).

We wish to emphasize the idea of an agent as a resource in RDF. We believe that it constitutes a solution that goes beyond the digital-video domain and that researchers can apply it to most types of Web resources.

Indexing video segments

A fundamental operation of digital-video management is indexing. By this we mean associating with each video segment two indices (a beginning and an end) and a set of descriptors (metadata) that describe its contents (see Figure 1).
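The indexing operation just described can be pictured with a minimal Python sketch. It is only an illustration, not the authors' implementation: the class name `IndexedSegment`, the `runtime` property, and the descriptor keys are assumptions, and the unit of the indices (frames or seconds) is not stated in the text.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedSegment:
    """A video segment: two indices plus free-form descriptors (metadata)."""
    start_index: int
    end_index: int
    descriptors: dict = field(default_factory=dict)

    def __post_init__(self):
        # A segment cannot end before it begins.
        if self.end_index < self.start_index:
            raise ValueError("end_index must not precede start_index")

    @property
    def runtime(self) -> int:
        """Length of the segment, in the same unit as the indices."""
        return self.end_index - self.start_index


# Hypothetical descriptor values, used only to exercise the class.
segment = IndexedSegment(0, 190, {"description": "shower scene", "set": "shower"})
```

Keeping the descriptors as an open dictionary mirrors the idea that different ontologies may attach different metadata to the same pair of indices.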

The methodology and the related architectures that we’ve implemented make use of ontologies. The particular ontological structures our research uses are characterized by the fact that a class is like a relation, defined as a predicate. In particular, the predicate name is the class name, and the predicate arguments are the class attributes. The prototyping mechanism is activated by defining each subclass with the n attributes of the class it belongs to plus the subclass’s own m attributes. Furthermore, we implemented an inheritance mechanism from the class to the subclass. We use ontologies in two ways: to classify the indexed video segments into specific categories and to manage other digital-video activities, such as querying, retrieval and browsing, and composition.

In an earlier phase of the project,¹,² we used an ontology similar to the one Levy defines as the world view (see Figure 2).⁴ A hierarchy of classes, where each class is represented by a relation having n attributes, makes up the world-view ontology’s structure. For such an approach, we create a simple prototypical inheritance mechanism between a class A and one of its subclasses B by

❚ declaring that B is a subclass of A;


❚ representing A with n attributes and B only with its specific attributes; and

❚ transmitting the attributes of A to B via inheritance.
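The three steps above can be sketched in Python as follows. This is a hedged illustration rather than the authors' mechanism: the dictionary layout, the function name `all_attributes`, and the attribute names `a1`, `a2`, and `b1` are hypothetical.

```python
# Each class stores only its own attributes; `parent` records the subclass link.
CLASSES = {
    "A": {"parent": None, "attrs": ["a1", "a2"]},  # n = 2 attributes
    "B": {"parent": "A", "attrs": ["b1"]},         # m = 1 specific attribute
}


def all_attributes(name, classes=CLASSES):
    """Return the inherited attributes first, followed by the class's own."""
    spec = classes[name]
    inherited = all_attributes(spec["parent"], classes) if spec["parent"] else []
    return inherited + spec["attrs"]


attributes_of_b = all_attributes("B")
```

Here B ends up with n + m attributes even though only its m specific attributes were declared on it.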

Sophisticated formalisms such as Frame Logic⁵ and relatively widespread tools such as PostgreSQL (see http://postgresql.rmplc.co.uk/) make this schematic process possible. Because our goal is to integrate distributed sources, we’ve often used a simple model for basic representation, preferring ease of use and interoperability to the expressive richness of other formalisms in this field. Our research isn’t oriented toward studying new types of ontological apparatus to define and realize new approaches for distributed systems that use ontologies as an integration tool.

Annotating film with Frame Logic

During a second phase of the project, we needed procedures that were simple to define and at the same time helped us evaluate queries efficiently. This led us to insert, within the architectures for video management, logical languages based on Horn clauses associated with inferential engines. Horn clauses are a highly expressive subset of first-order logic (FOL) and have the following syntax: A₁ ∧ A₂ ∧ … ∧ Aₙ ⇒ B, where the Aᵢ and B are first-order predicates. In particular, we used a formalism called Frame Logic. This language let us construct ontologies and inheritance mechanisms between classes via logical inferential rules (in the form of Horn clauses).
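To make the Horn-clause form concrete, here is a small Python sketch of forward chaining. It is a deliberate simplification and an assumption on our part: the atoms are plain strings without variables (propositional rather than first-order), and it is not how a Frame Logic engine evaluates rules.

```python
def forward_chain(facts, rules):
    """Apply Horn rules (body, head) repeatedly until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Fire the rule A1 and ... and An => B once every Ai is known.
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return known


# Hypothetical rules: a and b => c, then c => d.
rules = [(("a", "b"), "c"), (("c",), "d")]
derived = forward_chain({"a", "b"}, rules)
```

Each rule has several premises and exactly one conclusion, which is what keeps the evaluation loop this simple.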

Frame Logic lets us define procedures that activate two computational levels. In the first level, the queries use the ontologies as an auxiliary apparatus in search of the data (the values). In the second level, the relations that define the ontology and the names of the attributes (the metadata) can be processed (for example, by a Frame Logic inferential engine) as a query argument, thus activating a second computational level (metaprogramming).

Figure 1. A simple example of indexed video segment.

Figure 2. An example of world view for cultural heritage. Class labels recoverable from the diagram include video, video_head, video_subj, head_titles, end_titles, architecture, civil_building, holy_building, funeral_building, sculpture, painting, and minor_art.

Therefore, in the approach we adopted, the indexing process of an example film takes place relative to a definite ontology. The following ontology, in F-Logic formalism, is a model of classification (and of segmentation of films) that we can implement in many film genres in the cinematographic environment:

%% Schema
film_head::film.
film_segment::film.
episode::film_segment.
ord_sequence::film_segment.
sce_sequence::film_segment.
pla_sequence::film_segment.
event::film_segment.

%% Classes definition
film
    [title => string;
     file_name => string;
     contxt => film].
. . . . .
film_segment
    [description => string;
     start_index => integer;
     end_index => integer;
     runtime => integer].
episode
    [ep_subtitle => string;
     ep_mev =>> string;
     ep_set => string;
     ep_temd => tempde;
     ep_cast =>> string].
sce_sequence
    [ssq_mev =>> string;
     ssq_set => string;
     ssq_temd => tempde;
     ssq_cast =>> string].
. . . . .

With this approach, the data in the film make up the instances of the ontology classes. In Alfred Hitchcock’s film Psycho, for example, we can archive the scene where Marion Crane gets stabbed with the following instances:

c1:film
    [title -> <<Psycho>>;
     file_name -> <<psycho.mpg>>;
     contxt -> c1].

c3:sce_sequence
    [title -> <<Psycho>>;
     file_name -> <<psycho.mpg>>;
     description -> <<Marion Crane is stabbed by murderer in the shower>>;
     start_index -> 0;
     end_index -> 190;
     runtime -> 190;
     contxt -> c1;
     ssq_mev ->> {<<murder>>};
     ssq_set -> <<shower>>;
     ssq_temd -> from_to(t1, t2);
     ssq_cast ->> {<<murderer>>, <<Marion Crane>>}].
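The two computational levels mentioned earlier (values versus metadata) can be illustrated against this instance with a Python sketch. It is an analogue under stated assumptions, not F-Logic: the dictionaries `PARENT` and `ATTRS` transcribe only the parts of the schema shown above, and the helper names `declared_attributes` and `undeclared` are hypothetical.

```python
# Subclass links and per-class attribute declarations from the visible schema.
PARENT = {"film_segment": "film", "sce_sequence": "film_segment"}
ATTRS = {
    "film": {"title", "file_name", "contxt"},
    "film_segment": {"description", "start_index", "end_index", "runtime"},
    "sce_sequence": {"ssq_mev", "ssq_set", "ssq_temd", "ssq_cast"},
}


def declared_attributes(cls):
    """Metadata-level query: every attribute name visible on `cls`."""
    names = set()
    while cls is not None:
        names |= ATTRS[cls]
        cls = PARENT.get(cls)  # climb to the superclass, if any
    return names


# Data-level view of the instance c3.
c3 = {
    "class": "sce_sequence",
    "title": "Psycho",
    "file_name": "psycho.mpg",
    "description": "Marion Crane is stabbed by murderer in the shower",
    "start_index": 0,
    "end_index": 190,
    "runtime": 190,
    "contxt": "c1",
    "ssq_mev": ["murder"],
    "ssq_set": "shower",
    "ssq_temd": ("from_to", "t1", "t2"),
    "ssq_cast": ["murderer", "Marion Crane"],
}


def undeclared(instance):
    """Attributes the instance uses that its class hierarchy never declares."""
    used = set(instance) - {"class"}
    return used - declared_attributes(instance["class"])
```

A check such as `undeclared(c3)` treats the schema itself as data, which is the kind of query the second (metaprogramming) level makes possible.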

Logical rules for retrieving and managing film segments

Figure 3. Indexed browser.

Representing a film’s metadata in Frame Logic lets us define simple queries for retrieving the data, using the same representation formalism. For example, to find the name of the director of Psycho, we can define a procedure with a logical query of this type:

director_of_film(Film_title, Director_name) :-
    C:film_head[title -> Film_title;
                directed_by ->> Director_name].

Activating the query director_of_film(<<Psycho>>, Director_name) gives the answer Hitchcock.
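For readers less familiar with logic programming, the same lookup can be approximated in Python. This is a sketch under assumptions: the `film_head` instance below is hypothetical (the source elides that part of the archive), and a list comprehension stands in for the unification an F-Logic engine would perform.

```python
def director_of_film(film_title, instances):
    """Collect the directors of every film_head instance with a matching title."""
    return [
        director
        for inst in instances
        if inst.get("class") == "film_head" and inst.get("title") == film_title
        for director in inst.get("directed_by", [])  # multivalued attribute
    ]


# Hypothetical archive content used only to exercise the query.
instances = [
    {"class": "film_head", "title": "Psycho", "directed_by": ["Hitchcock"]},
]
answer = director_of_film("Psycho", instances)
```

The list result reflects that directed_by is declared as a multivalued attribute (`->>`) in the query.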

Using Frame Logic, we can also easily define many procedures useful for managing a film (querying, browsing, and composition). A good example is the query definitions that let us generate an indexed browser (see Figure 3):

browser_indice(Title) :-
    C:film[title -> Title],
    display_title('Film index browser', Title),
    gen_browser(C,5,L).

gen_browser(C,L,L1) :-
    not C1:film[contxt -> C].

gen_browser(C,L,L1) :-
    C1:film[contxt -> C],
    display_index(C1,L,L1),
    gen_browser(C1,L1,L2),
    other_index.

gen_browser(C,G,F).

From the same type of query, we can also generate a graphical browser (see Figure 4) together with appropriate visualization routines.
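The recursive descent through contxt links that the browser clauses perform can be approximated in Python. This is a rough analogue rather than a translation: the function name `index_lines`, the text layout of each entry, and the nested instance `c4` are all hypothetical, while `c1` and `c3` follow the instances shown earlier.

```python
def index_lines(root_id, instances, depth=0):
    """Depth-first listing of every instance whose contxt points at root_id."""
    lines = []
    for inst_id, inst in instances.items():
        # Skip the root itself: c1 names itself as its own context.
        if inst_id != root_id and inst.get("contxt") == root_id:
            label = f"{inst['description']} [{inst['start_index']}-{inst['end_index']}]"
            lines.append("  " * depth + label)
            lines.extend(index_lines(inst_id, instances, depth + 1))
    return lines


instances = {
    "c1": {"title": "Psycho", "contxt": "c1"},
    "c3": {"description": "Marion Crane is stabbed by murderer in the shower",
           "start_index": 0, "end_index": 190, "contxt": "c1"},
    # Hypothetical nested segment, added only to show the recursion.
    "c4": {"description": "hypothetical nested event",
           "start_index": 40, "end_index": 60, "contxt": "c3"},
}
browser = index_lines("c1", instances)
```

Indentation depth plays the role of the level arguments passed between the gen_browser clauses.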

Agents as Web video resources

In the Web’s distributed environment, authors and groups of authors are continually establishing private archives (not only of digital videos) and proposing their own metadata languages. Some of these proposals (for example, Dublin Core, well known in the bibliographic-data domain) have defined a set of ontological structuring standards to describe a specific domain. Unfortunately, the development and adoption of these standards is slow compared with the speed at which new objects, and therefore new ontologies, are produced on the Web. Furthermore, retrieving and integrating existing data is often a problem because of a descriptive deficiency in the metadata (too few and minimally expressive).

The digital-video domain is therefore, in general, one in which it’s difficult to reach a descriptive standard. For example, in each country, one or more metadata standardization proposals exist in the cultural-heritage video domain. (The standardizations we’re referring to are those put forward as references for the construction of outlines and ontologies.) Another example is the cinema domain, where the search for a standard is even more difficult, because it’s impossible to univocally segment and annotate a film at a theoretical level.

Figure 4. Graphical browser.

The difficulty in standardizing metadata led us to propose an integration approach for heterogeneous videos. Our approach doesn’t restructure the source and therefore doesn’t change the associated metadata. We constructed an integration approach between video sources by adding some capabilities for exchanging information about the video metadata and the basic ontologies used for the indexing.

To achieve our objectives, we defined and tested an architecture, based on software agents, for integration management (see Figure 5). A first significant component of our proposed architecture is that we represent an agent’s mental attitudes (beliefs, goals, and so on) as Web resources. We chose RDF as the language for representing resources. We believe RDF will soon establish itself as the standard formalism for representing metadata and subsequently as the language for exchanging information across the Web. In fact, the World Wide Web Consortium developed RDF with the intent of providing a paradigm for a homogeneous representation of Web resources. RDF introduces a novel resource concept, in which any object that can be associated with a URL can be considered a resource.

One further extension of RDF, the RDF Schema, adopts the same formalism the RDF language uses to describe resources and lets us define ontologies the same way we do with Frame Logic. We’ve used the RDF Schema to represent the mental attitudes as an ontology’s classes. In this way, we define a mental attitude, such as a belief, as a class (see Figure 6).

Figure 6 illustrates the RDF Schema representation of a belief as an instance of the class bel (belief) having agt and content attributes. The attribute agt takes as its value the Uniform Resource Identifier (URI) at which we locate the agent’s representation, a0.rdf, expressed in RDF as a Web resource. The value of the attribute content is also represented by a URI and therefore as a Web resource. Finally, the object of the belief held by the agent http://epp70-02.marscenter.it:8080/masnamespace/a0.rdf is the video segment identified by c3, to which we associate this address: http://epp70-01.marscenter.it:8081/videospace/psycho.rdf#c3.
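The belief just described can be written down as subject-predicate-object triples, which is the shape of any RDF statement. The Python sketch below is illustrative only: the `appl:` prefix and the identifier `#KB_INSTANCE_00017` are taken from the labels in Figure 6, the function name `belief_triples` is hypothetical, the URIs are copied from the text above, and no RDF library is involved.

```python
APPL = "appl:"  # prefix standing in for the application namespace of Figure 6


def belief_triples(belief_id, agent_uri, content_uri):
    """Describe one belief as (subject, predicate, object) triples."""
    return [
        (belief_id, "rdf:type", APPL + "bel"),          # the belief is an instance of bel
        (belief_id, APPL + "agt", agent_uri),           # who holds the belief
        (belief_id, APPL + "content", content_uri),     # what the belief is about
    ]


triples = belief_triples(
    "#KB_INSTANCE_00017",
    "http://epp70-02.marscenter.it:8080/masnamespace/a0.rdf",
    "http://epp70-01.marscenter.it:8081/videospace/psycho.rdf#c3",
)
```

Because both the agent and the content are addressed by URIs, each side of the belief is itself a Web resource.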

Our second architecture component adopts a representation in which ontologies describing the domain constitute the object of the agent’s beliefs. For example, bel(X,sub_classe(A,B)) states that the agent X believes A is a subclass of B. Various languages and tools exist that can easily retrieve the data and metadata of an agent’s beliefs in this form. The representation we chose for an agent’s mental attitudes, besides facilitating the retrieval of domain information, allows for the definition of further and more complex deduction mechanisms. In fact, we associate inferential rules with the agents that allow for the deduction of facts from other facts.
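As one possible example of deducing facts from other facts, the Python sketch below closes an agent's subclass beliefs under transitivity. The transitivity rule is our assumption (the text does not list the agents' actual rules), and the tuple encoding `(agent, A, B)` for bel(X,sub_classe(A,B)) and the agent name `a0` are illustrative choices.

```python
def close_subclass_beliefs(beliefs):
    """Add (X, A, C) whenever agent X believes both (A, B) and (B, C)."""
    closed = set(beliefs)
    changed = True
    while changed:
        changed = False
        for agent, a, b in list(closed):
            for agent2, b2, c in list(closed):
                # Chain two beliefs of the same agent through the shared class.
                if agent == agent2 and b == b2 and (agent, a, c) not in closed:
                    closed.add((agent, a, c))
                    changed = True
    return closed


beliefs = {
    ("a0", "sce_sequence", "film_segment"),
    ("a0", "film_segment", "film"),
}
closed = close_subclass_beliefs(beliefs)
```

Beliefs held by different agents are never combined, so each agent's view of the ontology stays separate.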

Our architecture’s third component is responsible for the agents’ inferential activities. We chose Frame Logic to implement such activities, because RDF isn’t suitable for capturing inferential activities.

The fourth and final module in Figure 5 implements the interface functions between the agent and the Web. For each component of the architecture, we’ve selected existing tools and methods in the RDF approach. As Figure 5 shows, we used the logical-deductive tool Simple Logic-based RDF Interpreter (SILRI).⁶

Figure 5. Agents as Web resources: components and architecture. The agent comprises four components: mental attitudes (beliefs, goals, …) in the RDF Schema; video data and metadata as agent beliefs; Frame Logic for defining deductive rules; and SILRI for querying on RDF Web video resources.

Conclusion

Future development of the architecture will include a Web interface that lets generic users define their own community of agents via an RDF template. In this manner, each agent will constitute a Web resource available at a particular Web address.

References

1. C. Di Napoli et al., “A Methodology to Annotate Cultural Heritage Digital Video,” Lecture Notes in Computer Science 1513, Springer Verlag, New York, 1998, pp. 649-650.
2. F. Mele et al., “Film Digital Segmentazione e Archiviazione,” AI*IA Notizie Anno (AI*IA News), Year XII, no. 4, Dec. 1999, pp. 39-42.
3. E. Ceglia, F. Mele, and G. Minei, “An Architecture for Distributed Resources in Space Information Management,” MSSU: Microgravity and Space Station Utilization, vol. 1, no. 1, Jan. 2001.
4. A.Y. Levy et al., “Answering Queries Using Views,” Proc. 14th ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems, ACM Press, New York, 1995, pp. 95-104.
5. M. Kifer, G. Lausen, and J. Wu, “Logical Foundations of Object-Oriented and Frame-Based Languages,” J. ACM, vol. 42, no. 4, July 1995, pp. 741-843.
6. S. Decker et al., “A Query and Inference Service for RDF,” Proc. Query Languages Workshop (QL 98), http://www.w3.org/TandS/QL/QL98/pp/queryservice.html.

Readers may contact Mele at the Microgravity Advanced Research Support Center, Via Gianturco 31, 80144 Naples, Italy, email [email protected].

Readers may contact Multimedia at Work editors Catarci at the Dept. Information Systems, Univ. of Rome “La Sapienza,” Via Salara 113, 00198 Rome, Italy, email catarci@dis.uniroma1.it, and Little at the Multimedia Communications Lab, Dept. of Electrical Eng., Boston Univ., 8 Saint Mary’s St., Boston, MA 02215, email [email protected].

Figure 6. An example of mental attitude (beliefs) representation in RDF. Labels in the diagram include rdfs:Resource, rdfs:Class, rdfs:Property, appl:web_resource, appl:mentatt, appl:bel, appl:agt, appl:content, rdfutil:domain, rdfutil:range, subClassOf, instanceOf, and a description with ID #KB_INSTANCE_00017, together with the addresses http://epp700-02.marscenter.it:8080/masnamespace/a0.rdf and http://epp700-01.marscenter.it:8081/videospace/psycho.rdf#c3.