A digital-video management project dealing with segmentation, annotation, retrieval, and other related topics has been active for several years at the Microgravity Advanced Research Support (MARS) Center, in Naples, Italy. Our research is part of a field that studies methods of representing and integrating heterogeneous information in distributed Web environments. In this context, each genre of digital video (such as news, television shows, documentaries, and cultural-heritage videos) presents its knowledge in different forms and structures. Furthermore, for each genre, various methodologies and schools of thought exist that structure the knowledge in digital videos differently. For example, in the field of cinematographic theory, different methods of film segmentation exist, and in the cultural-heritage field, how objects in different videos are classified depends on the classification criteria in each country.
In our previous work, we developed some applications [1-3] for different application domains including cultural heritage, cinema, and space experimentation. Because of the heterogeneity of knowledge in videos, we’ve aimed to create a methodology that allows for structural flexibility (archiving) of the video content and to define and implement an efficient retrieval mechanism for archived videos. From an architectural point of view, our strategy has been
❚ defining the distributed video resources as agents that possess beliefs (or metadata) about video segmentation and that can carry out inferences on the related metadata; and
❚ implementing the agents, in the form of an effective Web resource, within the Resource Description Framework (RDF).
We wish to emphasize the idea of an agent as a resource in RDF. We believe that it constitutes a solution that goes beyond the digital-video domain and that researchers can apply it to most types of Web resources.
Indexing video segments
A fundamental operation of digital-video management is indexing. By this we mean the operation of associating two indices (a beginning and an end) and a set of descriptors (metadata), which describe the contents, with each video segment (see Figure 1).
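As a minimal sketch of this notion of indexing, a segment can be modeled as two indices plus a dictionary of descriptors. The class and function names here are hypothetical, the descriptor keys are illustrative, and the unit of the indices (frames or seconds) is an assumption the article leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class VideoSegment:
    """One indexed segment: two indices plus a set of descriptors (metadata)."""
    start_index: int
    end_index: int
    descriptors: dict = field(default_factory=dict)

    def __post_init__(self):
        # An index pair only makes sense if the end does not precede the start.
        if self.end_index < self.start_index:
            raise ValueError("end_index must not precede start_index")

    @property
    def runtime(self) -> int:
        # Length of the segment in the same (assumed) unit as the indices.
        return self.end_index - self.start_index


def segments_at(segments, position):
    """Return every segment whose [start, end] interval covers the position."""
    return [s for s in segments if s.start_index <= position <= s.end_index]


# Index values taken from the Psycho example used later in the article.
shower = VideoSegment(0, 190, {"set": "shower", "cast": ["Marion Crane", "murderer"]})
```

Keeping the descriptors in a free-form dictionary mirrors the article's point that different genres need different metadata; an ontology, as described next, is what constrains which descriptors a given kind of segment may carry.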
The methodology and related architectures that we’ve implemented make use of ontologies. The particular ontological structures our research uses are characterized by the fact that a class is like a relation, defined as a predicate. In particular, the predicate name is the class name, and the predicate arguments are the class attributes. The prototyping mechanism is activated by defining each subclass with the n attributes of the class it belongs to plus the subclass’s own m attributes. Furthermore, we implemented an inheritance mechanism from the class to the subclass. We use ontologies in two ways: to classify the indexed video segments into specific categories and to manage other activities involving the digital videos, such as querying, retrieval, browsing, and composition.
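The class-as-predicate idea can be illustrated with a small Python sketch. It assumes a dictionary encoding of the hierarchy (a hypothetical choice, not the authors' implementation) and borrows class and attribute names from the film schema that appears later in this article.

```python
# Hypothetical mini-ontology: each class is a predicate name plus its own attributes.
ONTOLOGY = {
    "film": {"parent": None, "attrs": ["title", "file_name", "contxt"]},
    "film_segment": {
        "parent": "film",
        "attrs": ["description", "start_index", "end_index", "runtime"],
    },
    "sce_sequence": {
        "parent": "film_segment",
        "attrs": ["ssq_mev", "ssq_set", "ssq_temd", "ssq_cast"],
    },
}


def all_attributes(cls, ontology=ONTOLOGY):
    """Inherited attributes first (the n of the ancestors), then the class's own m."""
    node = ontology[cls]
    inherited = all_attributes(node["parent"], ontology) if node["parent"] else []
    return inherited + node["attrs"]


def as_predicate(cls, ontology=ONTOLOGY):
    """Render a class as a predicate: class name plus all attribute names."""
    return f"{cls}({', '.join(all_attributes(cls, ontology))})"
```

Under this encoding, `as_predicate("film")` yields `film(title, file_name, contxt)`, and a subclass such as `sce_sequence` ends up with the attributes of both ancestors plus its own four.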
In an earlier phase of the project [1,2], we used an ontology similar to the one Levy defines as the world view (see Figure 2) [4]. A hierarchy of classes, where each class is represented by a relationship having n attributes, makes up the world-view ontology’s structure. For such an approach, we create a simple prototypical inheritance mechanism between a class A and one of its subclasses B by
❚ declaring that B is a subclass of A;
❚ representing A with n attributes and B only with its specific attributes; and
❚ transmitting, for example, to B, via inheritance, the attributes of A.

Multimedia at Work
Editors: Tiziana Catarci and Thomas D.C. Little
Digital-Video Management for Heterogeneous and Distributed Resources
Francesco Mele and Giovanni Minei, Microgravity Advanced Research Support Center
1070-986X/01/$10.00 © 2001 IEEE
Using sophisticated apparatus such as Frame Logic [5] and relatively widely distributed tools like PostgreSQL (see http://postgresql.rmplc.co.uk/) makes this schematic process possible. Because our goal is to integrate distributed sources, we’ve often used a simple model for basic representation, preferring ease of use and interoperability to the expressive richness of other formalisms in this field. Our research isn’t oriented toward studying new types of ontological apparatus; rather, it aims to define and realize new approaches for distributed systems that use ontologies as an integration tool.
Annotating film with Frame Logic
During a second phase of the project, we needed procedures that were simple to define and at the same time helped us efficiently evaluate queries. This led us to insert, within the architectures for video management, logical languages based on Horn clauses associated with inferential engines. Horn clauses represent a highly expressive subset of first-order logic (FOL) and have the following syntax: A_1 ∧ A_2 ∧ … ∧ A_n ⇒ B, where the A_i and B are first-order predicates. In particular, we used a formalism called Frame Logic. This language let us construct ontologies and inheritance mechanisms between classes via logical inferential rules (in the form of Horn clauses).
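To make the Horn-clause form concrete, here is a minimal Python sketch of naive forward chaining. It is a deliberate simplification: the clauses are ground (variable-free) strings, whereas a Frame Logic engine works with variables and unification. The function name, the atom spelling, and the two rules are assumptions made for illustration.

```python
def forward_chain(facts, rules):
    """Naive forward chaining over ground (variable-free) Horn clauses.

    Each rule is (body, head): if every atom in body is known, head is derived.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return known


# Illustrative facts and rules; the class names come from the film schema.
facts = {"sub(sce_sequence, film_segment)", "sub(film_segment, film)"}
rules = [
    # A1 and A2 imply B: subclassing is transitive for this particular chain.
    (["sub(sce_sequence, film_segment)", "sub(film_segment, film)"],
     "sub(sce_sequence, film)"),
    # An instance of a subclass is also an instance of the superclass.
    (["sub(sce_sequence, film)", "isa(c3, sce_sequence)"], "isa(c3, film)"),
]
```

Calling `forward_chain(facts | {"isa(c3, sce_sequence)"}, rules)` derives `isa(c3, film)`, which is the kind of inheritance conclusion the article obtains through logical rules.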
Frame Logic lets us define procedures that activate two computational levels. In the first level, the queries use the ontologies as an auxiliary apparatus in search of the data (the values). In the second level, the relations that define the ontology and the names of the attributes (the metadata) can themselves be processed as query arguments (for example, by a Frame Logic inferential engine), thus activating a second computational level (metaprogramming).
Figure 1. A simple example of an indexed video segment.

Figure 2. An example of a world view for cultural heritage: a hierarchy of classes that includes video, video_head, video_subj, architecture, civil_building, holy_building, sculpture, painting, and minor_art.

Therefore, in the approach we adopted, the indexing process of an example film takes place relative to a definite ontology. The following ontology, in F-Logic formalism, is a model of classification (and of segmentation of films) that we can apply to many film genres in the cinematographic environment:
%% Schema
film_head::film.
film_segment::film.
episode::film_segment.
ord_sequence::film_segment.
sce_sequence::film_segment.
pla_sequence::film_segment.
event::film_segment.
%% Classes definition
film [title => string;
file_name => string;
contxt => film].
. . . . .
film_segment
[description => string;
start_index => integer;
end_index => integer;
runtime => integer].
episode
[ep_subtitle => string;
ep_mev =>> string;
ep_set => string;
ep_temd => tempde;
ep_cast =>> string].
sce_sequence
[ssq_mev =>> string;
ssq_set => string;
ssq_temd => tempde;
ssq_cast =>> string ].
. . . . .
With this approach, the data in the film make up the instances of the ontology classes. In Alfred Hitchcock’s film Psycho, for example, we can archive the scene where Marion Crane gets stabbed with the following instances:
c1:film
[title -> <<Psycho>>;
file_name -> <<psycho.mpg>>;
contxt -> c1].
c3:sce_sequence
[title -> <<Psycho>>;
file_name -> <<psycho.mpg>>;
description -> <<Marion Crane is
stabbed by murderer in the
shower>>;
start_index -> 0;
end_index -> 190;
runtime -> 190;
contxt -> c1;
ssq_mev ->> {<<murder>>};
ssq_set -> <<shower>>;
ssq_temd -> from_to(t1, t2);
ssq_cast ->> {<<murderer>>,
<<Marion Crane>>}].
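The schema distinguishes single-valued attributes (=>) from set-valued ones (=>>), and an instance should respect that distinction. The following Python sketch checks an instance against the sce_sequence signature. The dictionary encoding, the labels "scalar" and "set", and the function name are assumptions for illustration; a Frame Logic engine performs its own checks.

```python
# Signatures transcribed from the schema: "scalar" stands for =>, "set" for =>>.
SCE_SEQUENCE_SIGNATURE = {
    "ssq_mev": "set",
    "ssq_set": "scalar",
    "ssq_temd": "scalar",
    "ssq_cast": "set",
}


def check_instance(instance, signature):
    """Return the attribute names whose presence or value shape contradicts the signature."""
    problems = []
    for attr, value in instance.items():
        kind = signature.get(attr)
        if kind is None:
            problems.append(attr)  # attribute not declared for this class
        elif kind == "set" and not isinstance(value, (set, frozenset)):
            problems.append(attr)  # set-valued attribute given a single value
        elif kind == "scalar" and isinstance(value, (set, frozenset, list)):
            problems.append(attr)  # single-valued attribute given a collection
    return problems


# The scene-specific attributes of instance c3, encoded as Python values.
c3 = {
    "ssq_mev": {"murder"},
    "ssq_set": "shower",
    "ssq_temd": ("from_to", "t1", "t2"),
    "ssq_cast": {"murderer", "Marion Crane"},
}
```

A check of this kind reports a misspelled attribute name, for example `sq_set` in place of `ssq_set`, as undeclared, which is a common slip when instances are typed by hand.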
Logical rules for retrieving and managing film segments
IEEE MultiMedia

Figure 3. Indexed browser.

Representing a film’s metadata in Frame Logic lets us define simple queries for retrieving the data, using the same representation formalism. For example, in Psycho, if we want to define a procedure to find the name of the film’s director, we can do this with a logical query of this type:
director_of_film(Film_title,
Director_name):-
C:film_head[title->Film_title;
directed_by->>Director_name].
Activating the query director_of_film(<<Psycho>>,Director_name) gives the answer Hitchcock.
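For readers less familiar with logic programming, the same lookup can be sketched in Python. The fact base below is hypothetical, since the article does not list the film_head instance itself; only the attribute names title and directed_by and the answer Hitchcock come from the text.

```python
# Hypothetical fact base standing in for the film_head objects.
FILM_HEADS = [
    {"title": "Psycho", "directed_by": ["Hitchcock"]},
]


def director_of_film(film_title, film_heads=FILM_HEADS):
    """Collect every directed_by value of the film_head objects with this title."""
    return [
        director
        for head in film_heads
        if head["title"] == film_title
        for director in head["directed_by"]
    ]
```

The function returns a list because directed_by is set-valued in the query (->>), so a film with several directors would yield several answers, much as the logical query would produce several bindings for Director_name.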
Using Frame Logic, we can also easily define many procedures useful for managing a film (querying, browsing, and composition). A good example is the set of query definitions that let us generate an indexed browser (see Figure 3):
browser_indice(Title) :-
C:film[title ->Title],
display_title('Film index browser',
Title),
gen_browser(C,5,L).
gen_browser(C,L,L1) :-
not C1:film[contxt -> C].
gen_browser(C,L,L1) :-
C1:film[contxt -> C],
display_index(C1,L,L1),
gen_browser(C1,L1,L2),
other_index.
gen_browser(C,G,F).
From the same type of query, together with appropriate visualization routines, we can also generate a graphical browser (see Figure 4).
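The recursive descent that gen_browser performs over the contxt attribute can be sketched in Python as follows. This is an illustration under stated assumptions, not a transcription of the rules: objects c2 and c4 are invented, the output is returned as indented text lines rather than displayed, and the root object is assumed to be its own context, as c1 is in the instance shown earlier.

```python
# Hypothetical objects: each id maps to (description, id of its context).
OBJECTS = {
    "c1": ("Psycho", "c1"),  # the film is its own context, as in the article
    "c2": ("Episode: the motel", "c1"),  # invented for illustration
    "c3": ("Marion Crane is stabbed by murderer in the shower", "c1"),
    "c4": ("Event: the curtain is pulled back", "c3"),  # invented for illustration
}


def gen_browser(context, objects=OBJECTS, level=0):
    """Return indented index lines for everything nested under context."""
    lines = []
    for obj_id, (description, ctx) in sorted(objects.items()):
        if ctx == context and obj_id != context:  # skip the root's self-reference
            lines.append("  " * level + f"{obj_id}: {description}")
            lines.extend(gen_browser(obj_id, objects, level + 1))
    return lines


def browser_index(title, objects=OBJECTS):
    """Find the root object with this title and list its nested segments."""
    root = next(i for i, (d, ctx) in objects.items() if d == title and ctx == i)
    return [f"Film index browser: {title}"] + gen_browser(root, objects, 1)
```

The sketch guards only against the root's self-reference; context chains that contain longer cycles would need an explicit visited set.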
Agents as Web video resources
In the Web’s distributed environment, authors or groups of authors are continually establishing private archives (not only of digital videos) and proposing their own metadata languages. Some initiatives (for example, Dublin Core, well known in the bibliographic-data domain) have defined a set of ontological structuring standards to describe a specific domain. Unfortunately, the development and adoption of these standards is slow compared with the speed at which new objects, and therefore new ontologies, are produced on the Web. Furthermore, retrieving and integrating existing data is often a problem because of a descriptive deficiency in the metadata (too few descriptors, and minimally expressive ones).
The digital video domain is then, in general, one in which it’s difficult to reach a descriptive standard. For example, in each country, one or more metadata standardization proposals exist in the cultural-heritage video domain. (The standardizations we’re referring to are those put forward as references for the construction of outlines and ontologies.) Another example is the cinema domain, where the search for a standard is even more difficult, because it’s impossible to univocally segment and annotate a film at a theoretical level.
July–September 2001

Figure 4. Graphical browser.
The difficulty in standardizing metadata led us to propose an integration approach for heterogeneous videos. Our approach doesn’t restructure the source and therefore doesn’t change the related metadata. We constructed an integration approach between video sources by adding capabilities for exchanging information about the video metadata and the basic ontologies used for the indexing.
To achieve our objectives, we defined and tested an architecture, based on software agents, for integration management (see Figure 5). A first significant component of our proposed architecture is the representation of an agent’s mental attitudes (beliefs, goals, and so on) as Web resources. We’ve focused on RDF as the language for representing resources. We believe RDF will soon establish itself as the standard formalism for representing metadata and subsequently as the language for exchanging information across the Web. In fact, the World Wide Web Consortium developed RDF with the intent of providing a paradigm for a homogeneous representation of Web resources. RDF introduces a novel resource concept, where each object with which it’s possible to associate a URL address can be considered a resource.
One further extension of RDF, the RDF Schema, adopts the same formalism the RDF language uses to describe resources and lets us define ontologies the same way we do with Frame Logic. We’ve used the RDF Schema to represent the mental attitudes as an ontology’s classes. In this way, we define a mental attitude, such as a belief, as a class (see Figure 6).
Figure 6 illustrates the RDF Schema representation of a belief as an instance of the class bel (belief) having agt and content attributes. The attribute agt takes on the value of the Uniform Resource Identifier (URI) address where we locate the agent’s RDF representation, a0.rdf, in the form of a Web resource. The value of the attribute content is also represented by a URI address and therefore as a Web resource. Finally, the object of the belief held by the agent http://epp70-02.marscenter.it:8080/masnamespace/a0.rdf is the video segment identified by c3, with which we associate this address: http://epp70-01.marscenter.it:8081/videospace/psycho.rdf#c3.
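One way to picture this belief is as three RDF-style statements about a single belief resource. The Python sketch below uses plain tuples rather than an RDF library. The namespace URI behind the appl: prefix is an assumption (the article gives only the prefix), the function names are hypothetical, and the belief identifier is the one shown in Figure 6.

```python
APPL = "http://example.org/appl#"  # hypothetical namespace for the appl: prefix
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# URIs as given in the text above.
AGENT = "http://epp70-02.marscenter.it:8080/masnamespace/a0.rdf"
SEGMENT = "http://epp70-01.marscenter.it:8081/videospace/psycho.rdf#c3"


def belief_triples(belief_id, agent_uri, content_uri):
    """Describe one belief as three (subject, predicate, object) statements."""
    return [
        (belief_id, RDF_TYPE, APPL + "bel"),
        (belief_id, APPL + "agt", agent_uri),
        (belief_id, APPL + "content", content_uri),
    ]


def beliefs_of(agent_uri, triples):
    """Return the content URIs of every belief held by the given agent."""
    held = {s for s, p, o in triples if p == APPL + "agt" and o == agent_uri}
    return [o for s, p, o in triples if p == APPL + "content" and s in held]


triples = belief_triples("#KB_INSTANCE_00017", AGENT, SEGMENT)
```

Because both the agent and the content are identified by URIs, each side of the belief can live on a different host, which is what lets distributed video sources be integrated without restructuring them.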
Our second architecture component adopts a representation in which the ontologies describing the domain constitute the object of the agent’s beliefs. For example, bel(X,sub_classe(A,B)) states that the agent X believes A is a subclass of B. Various languages and tools exist that can easily retrieve the data and metadata of an agent’s beliefs in this form. The representation we chose for an agent’s mental attitudes, besides facilitating the retrieval of domain information, allows for the definition of further, more complex deduction mechanisms. In fact, we associate inferential rules with the agents that allow for the deduction of facts from other facts.
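As an illustration of deducing facts from other facts, the sketch below closes one agent's sub_classe beliefs under transitivity. The tuple encoding and the function name are assumptions. The class names come from Figure 2, but the specific subclass pairs, the agent names a0 and a1, and the choice of transitivity as the rule are invented for the example.

```python
def believed_subclasses(agent, beliefs):
    """Close one agent's sub_classe beliefs under transitivity.

    beliefs is a set of (agent, sub, sup) tuples, each standing for
    bel(agent, sub_classe(sub, sup)).
    """
    pairs = {(a, b) for x, a, b in beliefs if x == agent}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


# Two agents holding different (hypothetical) beliefs about the same class.
beliefs = {
    ("a0", "cathedral", "church_building"),
    ("a0", "church_building", "holy_building"),
    ("a1", "cathedral", "architecture"),
}
```

The deduction is relative to the agent: a0 comes to believe that cathedral is a subclass of holy_building, while a1, holding a different classification, does not. This reflects the article's premise that sources may classify the same objects differently.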
Our architecture’s third component is responsible for the agents’ inferential activities. We chose Frame Logic to implement such activities, because RDF isn’t suitable for capturing inferential activities.
The fourth and final module in Figure 5 implements the interface functions between the agent and the Web. For each type of architecture, we’ve selected existing tools and methods in the RDF approach. Figure 5 shows that we have used the logical-deductive tool Simple Logic-based RDF Interpreter (SILRI) [6].
Figure 5. Agents as Web resources: components and architecture. The agent comprises mental attitudes (beliefs, goals, …) in the RDF Schema; video data and metadata as agent beliefs; Frame Logic for defining deductive rules; and SILRI for querying on RDF Web video resources.
Conclusion
Future developments of the defined architecture foresee the implementation of a Web interface that lets generic users define their own community of agents via an RDF template. In this manner, each agent will constitute a Web resource, which will be available at a particular Web address.
References
1. C. Di Napoli et al., “A Methodology to Annotate Cultural Heritage Digital Video,” Lecture Notes in Computer Science 1513, Springer Verlag, New York, 1998, pp. 649-650.
2. F. Mele et al., “Film Digital Segmentazione e Archiviazione,” AI*IA Notizie Anno (AI*IA News), Year XII, no. 4, Dec. 1999, pp. 39-42.
3. E. Ceglia, F. Mele, and G. Minei, “An Architecture for Distributed Resources in Space Information Management,” MSSU: Microgravity and Space Station Utilization, vol. 1, no. 1, Jan. 2001.
4. A.Y. Levy et al., “Answering Queries Using Views,” Proc. 14th ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems, ACM Press, New York, 1995, pp. 95-104.
5. M. Kifer, G. Lausen, and J. Wu, “Logical Foundations of Object-Oriented and Frame-Based Languages,” J. ACM, vol. 42, no. 4, July 1995, pp. 741-843.
6. S. Decker et al., “A Query and Inference Service for RDF,” Proc. Query Languages Workshop (QL 98), http://www.w3.org/TandS/QL/QL98/pp/queryservice.html.
Readers may contact Mele at the Microgravity Advanced Research Support Center, Via Gianturco 31, 80144 Naples, Italy, email [email protected].
Readers may contact Multimedia at Work editors Catarci at the Dept. Information Systems, Univ. of Rome “La Sapienza,” Via Salara 113, 00198 Rome, Italy, email catarci@dis.uniroma1.it, and Little at the Multimedia Communications Lab, Dept. of Electrical Eng., Boston Univ., 8 Saint Mary’s St., Boston, MA 02215, email [email protected].
Figure 6. An example of mental attitude (beliefs) representation in RDF. The diagram relates rdfs:Resource, rdfs:Class, and rdfs:Property to the application terms appl:web_resource, appl:mentatt, appl:bel, appl:agt, and appl:content through subClassOf, instanceOf, rdfutil:domain, and rdfutil:range links. The belief instance (ID #KB_INSTANCE_00017) connects http://epp700-02.marscenter.it:8080/masnamespace/a0.rdf and http://epp700-01.marscenter.it:8081/videospace/psycho.rdf#c3.