
[IEEE Comput. Soc International Conference on Protocols for Multimedia Systems - Multimedia Networking - Santiago, Chile (24-27 Nov. 1997)] Proceedings of International Conference


Querying Multimedia Presentations

Chao-Hui Wu, Renée J. Miller and Ming T. Liu

Department of Computer and Information Science The Ohio State University

Columbus, Ohio 43210-1277 E-mail: {wuc, rjmiller, liu}@cis.ohio-state.edu

Abstract

We examine the querying requirements of large libraries of multimedia presentations. Given these requirements, we examine work on querying temporal and sequence data, and identify the similarities and dissimilarities between presentations and these other forms of data. From this analysis, we propose an integrated composition and query capability to permit the reuse of multimedia objects, presentations and presentation segments. The query facility permits content and attribute based queries along with queries over temporal synchronization characteristics. The main contributions of our work are: it addresses both determinant and indeterminant intervals; it permits querying over presentation libraries with heterogeneous structure; it builds on work from temporal and sequence databases to address the unique semantics of presentations.

1 Introduction

1.1 Motivation

The advent of multimedia technologies has led to an explosion in the use of multimedia to communicate ideas. One important and useful application which has generated much interest in the research community is multimedia presentation and authoring systems. A multimedia presentation and authoring system provides users with a model or a tool to create their own presentations and to specify the Quality of Service (QoS) and synchronization constraints on presentations. Examples of multimedia presentations include news broadcasts or university lectures where video clips might be accompanied by images, audio, closed captioning, etc. The components of a presentation must be synchronized both spatially and temporally. Presentation models are used to represent spatial and temporal relationships together with quality of service characteristics of both the presentation and its components [1, 2].

A primary role of presentation models is to enable the specification of these synchronization constraints and to facilitate the retrieval and playback of presentations [3, 1]. An equally important role is to enable the browsing and searching of presentation libraries to support the reuse of presentations and presentation components. When users compose a new presentation, they might want to browse through the existing presentation database and extract some useful presentation components. By providing efficient composition and query operators, the objects in the system can be fully utilized, the relations between objects can be well documented and maintained, and users can save time in creating new presentations. For example, in a library of university lectures, an instructor may wish to find video clips demonstrating a specific volume rendering technique. To get ideas on how to introduce or explain the demonstration, the instructor may also wish to retrieve all objects synchronized, within a presentation, to such a video. Alternatively, a user may wish to retrieve presentations based on the synchronization or temporal relationships illustrated by the presentation. A presentation author may wish to retrieve portions of presentations that illustrate how multiple video clips may be simultaneously played with at least one audio segment. It is our thesis that a multimedia presentation and authoring system should provide an integrated, flexible composition and query capability. It must be possible to issue complex queries and use query results directly to create new presentations.

To understand the querying requirements of presentations, it is instructive to consider the basic characteristics of presentations. Heterogeneous objects (images, video, sound, text, etc.) may be combined to form a single presentation. These objects may have different attributes and accessing methods. The presentation specification must include the temporal relationships between the presentation components. Some media objects in presentations will need to be synchronized with outside events such as a user interaction. Consider a slide show presentation as an example. Each image is displayed for an interval of time that is determined by the speaker. As the presentation progresses, the speaker determines when to terminate the display of each slide. The actual temporal relations between objects in a presentation may therefore be unpredictable until run time. Such temporal relations are called indeterminate temporal relations [1]. Queries must be permitted over determinant (fixed) and indeterminate relations alike.

0-8186-7916-6/97 $10.00 © 1997 IEEE

To support the retrieval and reuse of presentations and presentation components, a multimedia presentation and authoring system must support ad hoc querying, including the following types of queries.

• Attribute based queries. Most presentation models permit the association of attributes, including text and numerical attributes, with presentation components. Queries over these attributes, which may represent features extracted from the multimedia objects, are common [4]. Example queries include “Select all the presentations related to Virtual Reality” or “Show the components related to Virtual Reality in the presentation library”. These queries involve attributes of the presentation or of objects within the presentations.

• Multimedia content based queries. To query the content of multimedia objects within presentations, special operators such as the QBIC operators [5], which permit queries over color composition and other image or media characteristics, may be used. For example, the query “Select all the images which contain round shapes (abnormal tumors) as shown in image X” will involve special query operators, such as pattern recognition operators.

• Temporal queries. Presentation queries may involve the synchronization constraints within a presentation. The query “Select all the video clips played while (overlapping) Ravel’s Bolero is playing” may help in analyzing how this piece of music is used. The query “Find all the images with the same subject played right before and right after a particular image” might be helpful for users wishing to find related images within the database. These queries involve reasoning about temporal relationships between objects.

1.2 Current Approaches

Despite the extensive work done on specifying multimedia presentations in the context of a database system, including several existing or developing standards [6, 7, 1, 2, 8, 9, 10, 4, 11, 12, 13, 14], the work on querying presentations is much more sparse. Most models are either developed without consideration for querying issues (and may therefore not be amenable to efficient set-based querying) or are coupled with limited querying facilities. The former class of models includes language-based models, event-based models, and such models as timed or extended Petri nets [1].

Other models have been developed in concert with querying facilities to be used in the context of a multimedia information system. Among these approaches, the use of abstract data types is common. New data types for the various components of a presentation (video, audio, etc.) are introduced [2]. The type definitions include structure and type-specific operations for retrieving presentation components. The type-specific operators are typically called from within a declarative query language. However, this approach limits the optimization that can be done and may lead to inefficient query executions [15]. The alternative we present here is the use of a declarative query language which permits efficient evaluation and optimization strategies. The approach is motivated by experience with temporal [16] and sequence [15] databases. Indeed, presentations may be modeled using temporal interval-based models where each component of the presentation is mapped to a temporal interval [14]. The approach has the advantage of permitting the vast wealth of work on temporal querying to be applied to multimedia presentations.

Given this starting point, it is important to clarify the differences between multimedia presentations and other forms of temporal data. Temporal models are typically designed to store and manipulate homogeneous collections of data over an absolute time-line. For example, the time sequence model of [16] models sequences of values for a single entity over the time domain. Multimedia presentations, however, model heterogeneous collections of data which are synchronized in time. The intervals of a presentation represent relative temporal ordering, not absolute times. The temporal constraints are relative, not absolute. Similar representations, specifically values mapped to intervals (a start and an end time stamp), may be used for both. In the case of temporal data, the values may be those of an entity valid for the given interval or time period. In the case of multimedia presentations, the values are multimedia objects to be displayed over the given interval. However, the two types of data differ semantically and this difference gives rise to different types and frequencies of queries.* Consider a few examples. Many of the queries common on temporal databases do not make sense for presentations. These include temporal selections such as “Retrieve the stock trading index on April 13, 1996” or temporal joins such as the super star example “Retrieve any faculty who are promoted from the assistant professor to full professor while there are some faculty remaining at the associate professor level for over the same time” [16]. Much of the work on temporal databases has focused on the specification and study of such operators (temporal projection, temporal join, etc.) [17]. Temporal constraints on presentations are relative, not absolute, constraints. Performing temporal joins between presentations is not meaningful.† Similarly, presentations give rise to new usage patterns. Presentations containing user interactions will have objects mapped to intervals of unknown duration. The query “Find all objects that could overlap the music of ‘Round Midnight’ in any presentation” requires reasoning about indeterminate intervals and leads to access patterns uncommon in temporal or sequence data.

1.3 Summary and Organization

We have outlined the requirements for querying large multimedia presentation libraries. Given these requirements, a data model together with a set of query operators for multimedia presentations is proposed.

Our focus is on the composing and querying issues for temporal constraints. We do not consider spatial or QoS constraints. However, such constraints can be stored as attributes associated with the presentation or the component media objects.

The details of our model are presented in the following sections. We start by describing a set of requirements for a multimedia presentation model in Section 2. In Section 3, a data model for multimedia presentation data is defined. A set of operators for the proposed model is presented in Section 4. Some query examples are described in Section 5. A discussion of issues raised by this work is presented in Section 6.

*As pointed out in [16], it is a similar distinction that has motivated much of the work on temporal databases. While temporal data can be modeled using integer attributes in a traditional data model, the semantics of time stamps leads to different query patterns that benefit from new evaluation and optimization techniques.

†Self-joins, that is, a join of a presentation with itself based on temporal properties, are meaningful.

2 Model Requirements

In this section, we summarize the requirements of a presentation model and the language for multimedia presentation. We justify why the interval-based model is chosen over the others.

A presentation model should support the following features [2].

• The model should enable the specification and querying of the 13 temporal constraints [18] and indeterminant constraints [1].

• The model should permit the representation of synchronization relationships at multiple levels of abstraction. It should be possible to synchronize entire presentations or components of presentations (that is, individual objects).

• The model should be easily mapped to existing database management systems.

• The query language must support the retrieval of presentations by object properties and by temporal relationships.

• The model should be conceptually simple, so that it is easy to use.

A comparison study of different models can be found in [2, 1]. We adopt the interval-based model in our framework due to its strengths, which include the following.

• The interval concept is easier for users to visualize and verify. In contrast, language-based models are more procedural than declarative, making them harder to understand, use and debug.

• A time interval concept can be represented easily in a DBMS. In contrast, in language-based or Petri-Net models, the temporal constraints are implicitly specified in the net structure or in the language structure. Extracting the objects or constraints from hierarchical structures and mapping those constraints into a set-oriented database is much more difficult.

• The interval model can be extended to specify the indeterminant temporal synchronization constraints. The Petri-Net model cannot easily model the indeterminant relations [1].

• An interval-based model can easily specify all the temporal requirements supported in two multimedia standards, HyTime [6, 7] and MHEG [6], except conditional synchronization constraints (i.e., if-then-else type constraints). Both HyTime and MHEG are language-based models in which queries over the presentations and temporal constraints are difficult to express. Our model will be extended to cope with the conditional synchronization constraints in [19].

• It is difficult to query language-based models.

Given the above motivation, we use a model based on intervals. We enhance the model to address both determinant and indeterminant constraints.

In the following sections, we define a generic model for multimedia presentations and a set of operators for this type of data model.

3 Model of Presentations

We begin by defining an abstract model of presentations that is independent of the types used to define individual presentation objects (such as a video or audio segment). We do this to clearly delineate the important characteristics of presentations, specifically the temporal synchronization constraints.

Let $\Gamma$ be a set of types supported by a data model. The set $\Gamma$ may be a set of relation schemas in the relational model or a set of object types in an object-oriented model. These types are used to represent all characteristics of presentations that are not related to their temporal synchronization (for example, quality of service attributes and the media itself). In the following, we consider $O$ to be a set of presentation objects whose types are in $\Gamma$. For the purposes of our model, a presentation object is any object that can have a duration.

We consider determinant and indeterminant events in our model. Determinant events are represented by elements of the non-negative integers $Z^+$. Indeterminant events are represented by symbols from a set $U$. Let $P = (S, \le_P)$ be a partially ordered finite set (domain) whose elements lie in $Z^+$ or $U$, $S \subset Z^+ \cup U$. We restrict our consideration to orders that respect the ordering on integers. Specifically, if $i_1, i_2 \in S$ are integers then $i_1 \le_P i_2$ if and only if $i_1 \le i_2$. We also require that $P$ contain a unique minimal element $start(P)$ that is from $Z^+$. For all $s \in S$, $start(P) \le_P s$.

A presentation is composed of media objects each of which maps to an interval that models the duration of the object.

Definition 1 An interval is a pair of elements $(s, e)$ from a partially ordered domain $P$ such that $s <_P e$, together with a duration $D(s, e) \in Z^+$. The interval $(s, e)$ is a determinant interval if $s, e \in Z^+$; otherwise it is an indeterminant interval.

Definition 2 An interval domain $I_P$ defined on $P$ is the set of all intervals $(s, e)$ such that $s, e \in P$.
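As an illustration of Definitions 1 and 2, the following Python sketch shows one possible encoding of events and intervals. The names `Event`, `Interval` and `is_determinant` are ours and not part of the model. We assume that determinant events are non-negative integers and that indeterminant events from the set U are encoded as strings such as "u1"; the partial order on indeterminant events is not modeled here.

```python
from dataclasses import dataclass
from typing import Union

# Assumption: determinant events are non-negative ints (Z+),
# indeterminant events are symbolic names such as "u1" (the set U).
Event = Union[int, str]


@dataclass(frozen=True)
class Interval:
    """A pair (s, e) of events; a hypothetical encoding of Definition 1."""
    start: Event
    end: Event

    def __post_init__(self):
        for ev in (self.start, self.end):
            if isinstance(ev, int) and ev < 0:
                raise ValueError("determinant events must be non-negative")
        # For two integer events the order must respect the integer order.
        if isinstance(self.start, int) and isinstance(self.end, int):
            if not self.start < self.end:
                raise ValueError("an interval requires s < e")

    def is_determinant(self) -> bool:
        """True iff both endpoints are determinant (integer) events."""
        return isinstance(self.start, int) and isinstance(self.end, int)
```

With this encoding, `Interval(0, 5)` is a determinant interval, while `Interval(5, "u1")` is an indeterminant one.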

Given a set of presentation objects $O$, each object can be mapped to an interval or intervals to form a presentation. We model this using interval orderings. Note that an object may map to many intervals and many objects may map to the same interval. The only restriction is that every object in the presentation must map to at least one interval.

Definition 3 Let $O$ be a set of objects and let $I_P$ be an interval domain. An interval ordering is a relation $\iota : O \to I_P$ defined on every element in $O$.

We use the notation $\iota(o)$ to denote the set of intervals to which the object $o$ is mapped, $\iota(o) = \{i : (o, i) \in \iota\}$.

Similarly, $\iota(i)$ denotes the set of objects mapped to the interval $i$, $\iota(i) = \{o : (o, i) \in \iota\}$.

A presentation is an interval ordering that maps a set of objects to intervals. A presentation is empty if and only if its set of objects $O$ is empty. To define the start of a presentation, we require that all non-empty presentations contain at least one object that is synchronized to the minimal element $start(P)$.

Definition 4 A presentation is a triple $(O, I_P, \iota)$ where $O$ is a set of objects, $I_P$ is an interval domain, $\iota : O \to I_P$ is an interval ordering of $O$ by $I_P$, and if $O \neq \emptyset$ there exists at least one object $o \in O$ such that $\iota(o) = (start(P), p)$ for some $(start(P), p) \in I_P$.

Definition 5 A presentation component of a presentation $(O, I_P, \iota)$ is a pair $(o, i) \in \iota$, where $o \in O$ and $i \in I_P$.
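The definitions of an interval ordering, a presentation and a presentation component can be sketched in Python as follows. This is only one possible encoding under our own assumptions: the class name `Presentation` and its methods are hypothetical, objects are plain strings, intervals are `(s, e)` tuples, and the relation is stored as a set of `(object, interval)` pairs, so each pair is a presentation component.

```python
class Presentation:
    """Hypothetical container for a presentation (O, I_P, iota)."""

    def __init__(self, components, start=0):
        # components: iterable of (object, (s, e)) pairs, i.e. the relation iota
        self.components = set(components)
        self.start = start  # the minimal element start(P)
        # Definition 4: a non-empty presentation needs at least one object
        # mapped to an interval that begins at start(P).
        if self.components and not any(
            s == start for _obj, (s, _e) in self.components
        ):
            raise ValueError("no object is synchronized to start(P)")

    def objects(self):
        """The set O of objects appearing in the presentation."""
        return {o for o, _i in self.components}

    def intervals_of(self, obj):
        """iota(o): the set of intervals to which obj is mapped."""
        return {i for o, i in self.components if o == obj}

    def objects_at(self, interval):
        """iota(i): the set of objects mapped to the given interval."""
        return {o for o, i in self.components if i == interval}
```

For example, a presentation built from the pairs `("movie", (0, 5))`, `("background", (0, 5))` and `("S1", (5, "u1"))` maps two objects to the interval `(0, 5)`; the object names are made up for illustration.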

Presentations can be represented as rooted trees with a node for each event in $P$ and an edge $s \to e$ for each interval $(s, e)$ to which an object is mapped. An edge is labeled with the objects that map to the corresponding interval.

Definition 6 Given a presentation $(O, I_P, \iota)$, a temporal relation tree $TRT = (V, E)$ is defined as $V = P$ and $E = \{(v_1, v_2) \mid (v_1, v_2) \in I_P \text{ and } \iota((v_1, v_2)) \neq \emptyset\}$. An edge $(v_1, v_2)$ is labeled by $\iota((v_1, v_2))$.
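A minimal sketch of how the temporal relation tree of Definition 6 could be derived from a set of presentation components is given below. The function name `temporal_relation_tree` and the tuple encoding of components are our assumptions; the code only collects vertices, edges and edge labels and does not check that the result is actually a rooted tree.

```python
def temporal_relation_tree(events, components):
    """Build TRT = (V, E) plus edge labels from (object, (s, e)) pairs.

    events: the elements of the domain P (integers or symbolic names).
    components: iterable of (object, (s, e)) presentation components.
    """
    vertices = set(events)  # V = P
    labels = {}
    for obj, (s, e) in components:
        if s not in vertices or e not in vertices:
            raise ValueError(f"interval ({s}, {e}) uses an event outside P")
        # Each edge is labeled with the objects mapped to its interval.
        labels.setdefault((s, e), set()).add(obj)
    # E holds only intervals to which at least one object is mapped.
    edges = set(labels)
    return vertices, edges, labels
```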

If $P$ does not contain any indeterminant events, then the set of presentations that can be defined includes the temporal relations definable in the Object Composition Petri-Net model (OCPN) (that is, Allen's 13 relations) [3, 14]. For example, as illustrated in Figure 1, a slide show presentation in the OCPN model can be easily specified in our presentation model. The total duration of the slide show presentation example is 35, which is the highest domain element used in an interval to which an object is mapped.

Figure 1: The slide show example specified in (a) OCPN and (b) our model, and (c) the temporal relation tree for domain $P$.

In a presentation, every interval must have a non-zero duration. However, given the presence of indeterminant events, the duration may be indeterminant. Consider the slide show example of Figure 2 that begins with an opening movie to introduce a topic followed by a set of slides. Each slide is replaced by the next in response to a user interaction. The slides are displayed over a background image. At the end of the slides, a closing movie is displayed. The duration of the introductory movie is fixed and the movie is mapped to a determinant interval of this duration (in the figure, the duration is 5). The duration of the slides is indeterminant and each is mapped to an indeterminant interval (the interval $(5, u_1)$ for Slide S1 and the interval $(u_2, u_3)$ for Slide S3). The duration of the final movie is also fixed, but its start time ($u_3$) is not. Hence, it is mapped to an indeterminant interval with a fixed duration. The event $u_4$ is indeterminant in that its value cannot be determined until runtime. However, it always occurs a fixed amount of time after the indeterminant event $u_3$. That is, $u_4$ synchronizes with $u_3$ some fixed time units $n$ after $u_3$ takes place. The event $u_4$ is called a dependent indeterminant event since it depends on another indeterminant event.

Figure 2: A slide show example with indeterminant intervals

The duration of an interval is defined according to the following rules.

Definition 7 The duration of an interval $i = (s, e)$, denoted $D(i)$, must satisfy the following conditions. Here, $u_0$ is a special symbol denoting an indeterminant duration.

• If $s, e \in Z^+$ then $D(i) = e - s$.

• If $s \in Z^+$ and $e \in U$ then $D(i) = u_0$.

• If $s \in U$ and $e \in Z^+$ then $D(i) = u_0$.

• If $s, e \in U$ then $D(i) \in Z^+$ with $D(i) > 0$ if $e$ is dependent on $s$, and $D(i) = u_0$ otherwise.
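The four cases of Definition 7 can be written down directly. The sketch below is illustrative only: the function name `duration`, the string `"u0"` standing for the symbol $u_0$, and the `dependent_offsets` dictionary (which records, for a dependent indeterminant event, the fixed offset from the event it depends on) are our assumptions, as is the encoding of indeterminant events as strings.

```python
INDETERMINATE = "u0"  # stands for the special symbol u_0


def duration(s, e, dependent_offsets=None):
    """Duration D(i) of the interval i = (s, e), following Definition 7.

    dependent_offsets: hypothetical dict mapping a pair (s, e) of
    indeterminant events to the fixed positive offset by which e follows s.
    """
    dependent_offsets = dependent_offsets or {}
    s_det, e_det = isinstance(s, int), isinstance(e, int)
    if s_det and e_det:
        return e - s                      # both endpoints determinant
    if s_det != e_det:
        return INDETERMINATE              # exactly one endpoint indeterminant
    # Both endpoints indeterminant: fixed only if e is dependent on s.
    return dependent_offsets.get((s, e), INDETERMINATE)
```

For the slide show of Figure 2, `duration(0, 5)` is 5 and `duration(5, "u1")` is indeterminant, while the closing movie on `("u3", "u4")` has a fixed duration once an offset is supplied, for example `duration("u3", "u4", {("u3", "u4"): 7})`; the value 7 is made up, since the text leaves the offset as $n$.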

A presentation $(O, I_P, \iota)$ is an indeterminant presentation if there is at least one indeterminant interval $i \in I_P$. Otherwise, the presentation is said to be determinant. In a determinant presentation, there is a unique maximal element that participates in some interval to which object(s) are mapped. This element defines the duration of the presentation. However, the interval domain $I_P$ of an indeterminant presentation is a partially ordered domain. If a unique maximal element $m \in Z^+$ exists and it participates in an interval to which object(s) are mapped, then the duration of the presentation is $m$. If there is no such maximal event, or the unique maximal event is an indeterminant event, then the duration of the presentation is indeterminant.
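The rule for the duration of a presentation can be sketched as follows. Since the partial order on indeterminant events is not given explicitly in the text, this sketch assumes that the order is the one generated by the mapped intervals themselves (the start of an interval precedes its end) together with the usual order on integers. The function name and the `"u0"` marker are hypothetical.

```python
def presentation_duration(intervals):
    """Duration of a presentation given the intervals (s, e) in use.

    Assumption: an event is maximal if it is not the start of any used
    interval and, when it is an integer, no larger integer event exists.
    """
    events = {ev for interval in intervals for ev in interval}
    integers = [ev for ev in events if isinstance(ev, int)]
    starts = {s for s, _e in intervals}
    maximal = [
        ev for ev in events
        if ev not in starts
        and not (isinstance(ev, int) and ev < max(integers))
    ]
    # A unique maximal determinant event fixes the duration.
    if len(maximal) == 1 and isinstance(maximal[0], int):
        return maximal[0]
    return "u0"  # indeterminant duration
```

With the determinant intervals `(0, 30)`, `(30, 35)` and `(0, 35)` the result is 35, in line with the slide show example above; as soon as the last event is indeterminant, the result is the indeterminant marker.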

4 Operators

We now turn to the definition of a query language over our presentation model. We begin by specifying a template for retrieving specified parts of a presentation. Unlike traditional data models, we do not assume that we have a fixed schema over which queries can be specified. In particular, the objects of a presentation may be heterogeneous, containing objects with different attributes and structure. In addition, in a library of presentations, each presentation may have a structure that is different from other presentations. That is, we cannot, in general, define a fixed schema describing the structure of all presentations. Rather, each presentation is an arbitrary graph. Our query language is therefore based on a compound operator that expresses “pattern matching” style queries over presentations with heterogeneous structure. The language is influenced by work on querying semi-structured data [20], such as collections of hypertext documents.

Our goal is to define a query language for browsing, retrieving and reusing presentation components. We will therefore restrict our attention to operators that temporally restructure presentation components to form new components. We will not consider operators that modify presentation objects such as projection or join operators on objects. We also assume that the temporal synchronization represented by an object mapping $\iota$ is a relative one, providing a relative ordering of objects within a presentation.

The operators we define are unary or binary operators that take as input one or two presentations and return a single transformed presentation. The operators transform the objects and the interval ordering. We begin by defining a set of unary selection operators. We use the notation $P_{in} = (O_{in}, I_{P_{in}}, \iota_{in})$ to denote an input presentation to an operator $Op$ and $Op(P_{in}) = P_{out} = (O_{out}, I_{P_{out}}, \iota_{out})$ to denote the output.

4.1 Query Operators

To motivate our query language, we return to some of the examples from the introduction. Consider the following query.

Query Q1: Find all video clips played at the same time as any music with a harpsichord.

The audio objects are modeled by presentation objects mapped to a set of intervals in the presentation. It must be possible, in the query language, to select these objects based on their properties. We may select objects based on the result of a content-based operator (such as the audio operator of Q1), their QoS characteristics, or any of their attributes. This functionality is captured by the selection operator.

4.1.1 Selection

Definition 8 The object selection $\sigma_{\phi_o}$ takes a selection predicate $\phi_o$ defined on $O$ and applies it to the input presentation objects at each interval $i \in I_{P_{in}}$. For each interval $i$ and for each $o \in \iota_{in}(i)$, $\phi_o(o)$ if and only if $o \in \iota_{out}(i)$.

The predicate $\phi_o$ is any predicate valid on the objects in $O$. Notice that with this operator, the selection is done solely based on the non-temporal synchronization characteristics of the model. Since the objects of a presentation will likely not share identical schemas or types, there is some advantage to using a semi-structured query language for specifying the predicate $\phi_o$ [20]. However, this is not required.

In addition, it must be possible to select presentation components based on the properties of an interval. Examples include the queries “Find all components with duration greater than 10” or “Find all components that are synchronized to start at the start of the presentation ($start(P)$)”.

Definition 9 The interval selection $\sigma_{\phi_i}$ takes a selection predicate $\phi_i$ defined on $I_P$ and applies it to the input presentation intervals. For each interval $i$, if $\phi_i(i)$ then $\iota_{out}(i) = \iota_{in}(i)$, else $\iota_{out}(i) = \emptyset$.
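The two selection operators of Definitions 8 and 9 amount to filtering presentation components by a predicate on the object or on the interval. The following Python sketch illustrates this under our own assumptions: components are `(object, (s, e))` pairs, `MediaObject` is a made-up record type with a name and a media type, and the function names are hypothetical.

```python
from typing import NamedTuple


class MediaObject(NamedTuple):
    """Hypothetical presentation object with two attributes."""
    name: str
    media_type: str


def object_selection(components, phi_o):
    """sigma_{phi_o}: keep (o, i) exactly when phi_o(o) holds."""
    return {(o, i) for o, i in components if phi_o(o)}


def interval_selection(components, phi_i):
    """sigma_{phi_i}: keep all objects of i when phi_i(i) holds, else none."""
    return {(o, i) for o, i in components if phi_i(i)}
```

The query “Find all components with duration greater than 10” could then be written, for determinant intervals, as `interval_selection(components, lambda i: i[1] - i[0] > 10)`.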

4.1.2 Match

Query Q1 requires the ability to perform additional operations on all objects mapped to intervals that are temporally related to the given interval. In Query Q1, the relation is the temporal overlap relation. Once all specified audio objects are found in the presentation, for each corresponding interval, all overlapping intervals must be found. The temporal relations used in a query may include Allen’s 13 temporal relations [18] or any of the relations over indeterminant intervals, such as “may overlap”, enumerated in other proposals [1].

For brevity, we present here only Allen’s temporal re- lations defined by temporal predicates on two intervals Intwl = (sl,el) and Intwz = ( . s2 , e2 ) .

before: Intvl before I n t v ~ if and only if e l 5 s2

69

Page 7: [IEEE Comput. Soc International Conference on Protocols for Multimedia Systems - Multimedia Networking - Santiago, Chile (24-27 Nov. 1997)] Proceedings of International Conference

• during: $\mathit{Intv}_1$ during $\mathit{Intv}_2$ if and only if $e_1 \le e_2$ and $s_1 \ge s_2$

• overlap: $\mathit{Intv}_1$ overlap $\mathit{Intv}_2$ if and only if (1) $s_1 < s_2 < e_1$, (2) $s_1 < e_2 < e_1$, (3) $\mathit{Intv}_2$ during $\mathit{Intv}_1$, or (4) $\mathit{Intv}_1$ during $\mathit{Intv}_2$

• meet: $\mathit{Intv}_1$ meet $\mathit{Intv}_2$ if and only if $e_1 = s_2$ or $e_2 = s_1$

• parallel-start: $\mathit{Intv}_1$ parallel-start $\mathit{Intv}_2$ if and only if $s_1 = s_2$

• parallel-end: $\mathit{Intv}_1$ parallel-end $\mathit{Intv}_2$ if and only if $e_1 = e_2$

• equal: $\mathit{Intv}_1$ equal $\mathit{Intv}_2$ if and only if $s_1 = s_2$ and $e_1 = e_2$
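The predicates listed above can be transcribed almost literally into code. The sketch below assumes determinate intervals encoded as integer `(start, end)` pairs; the function names are illustrative only. It follows the definitions as stated, so `before` uses $\le$ and can hold together with `meet` when $e_1 = s_2$.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), determinate endpoints assumed


def before(a: Interval, b: Interval) -> bool:
    return a[1] <= b[0]


def during(a: Interval, b: Interval) -> bool:
    return a[1] <= b[1] and a[0] >= b[0]


def overlap(a: Interval, b: Interval) -> bool:
    (s1, e1), (s2, e2) = a, b
    # Conditions (1) to (4) of the overlap definition, in order.
    return s1 < s2 < e1 or s1 < e2 < e1 or during(b, a) or during(a, b)


def meet(a: Interval, b: Interval) -> bool:
    return a[1] == b[0] or b[1] == a[0]


def parallel_start(a: Interval, b: Interval) -> bool:
    return a[0] == b[0]


def parallel_end(a: Interval, b: Interval) -> bool:
    return a[1] == b[1]


def equal(a: Interval, b: Interval) -> bool:
    return parallel_start(a, b) and parallel_end(a, b)
```

Relations over indeterminant intervals, such as "may overlap", are outside the scope of this sketch.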

We define a general operator that can be used to express Query Q1. We use a selection condition to select a subset of the presentation components that are of interest. For each interval to which a selected object maps, we use a function on intervals to define the related intervals of interest. To the corresponding presentation components, we apply a final selection condition (for example, “select all video clips” as in Query Q1).

The match operator $\mu$ is parameterized by three functions, $\sigma_1$, $\theta$ and $\sigma_2$. For each presentation component $(o, i)$ of $\sigma_1(P_{in})$, the $\mu$ operator finds all intervals $i'$ that satisfy $\theta$ (that is, $\theta(i, i')$ is true). The operator then returns all presentation components $(o', i')$ that are in $\sigma_2(P_{in})$. The interval select function $\theta$ is a relative temporal operator, such as meet, overlap, before, after, parallel-start, and parallel-end.‡

Definition 10 The match operator $\mu(\sigma_1, \theta, \sigma_2)$ applies the conditional predicate $\theta$ to the intervals of $\sigma_1(P_{in})$ and $\sigma_2(P_{in})$, returning the corresponding components of $\sigma_2(P_{in})$. For each interval $i \in I_{P_{in}}$, if $(o, i)$ is a presentation component of $\sigma_1(P_{in})$ and $(o', i')$ is a presentation component of $\sigma_2(P_{in})$ and $\theta(i, i')$, then $(o', i')$ is a presentation component of $P_{out}$.

We can express Query Q1 by using the operator $\mu$ where $\sigma_1 = \sigma_{\text{sounds-like('harpsichord')}}$, $\theta$ = overlap and $\sigma_2 = \sigma_{\text{MediaType="video"}}$.
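A minimal sketch of the match operator of Definition 10, applied to a Q1-style query, is given below. It assumes determinate intervals and a dictionary encoding of the interval-to-objects mapping. The tuple encoding of objects, the sample library, and the substring test that stands in for the sounds-like predicate are all hypothetical; a real system would call a content-based audio matcher instead.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Interval = Tuple[int, int]        # (start, end), determinate endpoints only
MediaObject = Tuple[str, str]     # (name, media type); a hypothetical encoding
Presentation = Dict[Interval, FrozenSet[MediaObject]]


def during(a: Interval, b: Interval) -> bool:
    return a[1] <= b[1] and a[0] >= b[0]


def overlap(a: Interval, b: Interval) -> bool:
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 or s1 < e2 < e1 or during(b, a) or during(a, b)


def select_objects(p: Presentation,
                   pred: Callable[[MediaObject], bool]) -> Presentation:
    return {i: frozenset(o for o in objs if pred(o)) for i, objs in p.items()}


def match(p_in: Presentation,
          sigma1: Callable[[Presentation], Presentation],
          theta: Callable[[Interval, Interval], bool],
          sigma2: Callable[[Presentation], Presentation]) -> Presentation:
    """Return the components (o', i') of sigma2(p_in) whose interval i' is
    theta-related to the interval i of some component (o, i) of sigma1(p_in)."""
    left, right = sigma1(p_in), sigma2(p_in)
    out: Presentation = {}
    for i, selected in left.items():
        if not selected:
            continue  # no component of sigma1(p_in) lives at interval i
        for j, candidates in right.items():
            if candidates and theta(i, j):
                out[j] = out.get(j, frozenset()) | candidates
    return out


# Made-up library contents.
library: Presentation = {
    (0, 10): frozenset({("harpsichord_solo", "audio")}),
    (5, 12): frozenset({("concert_clip", "video")}),
    (20, 30): frozenset({("interview", "video")}),
}

# Stand-in for sounds-like('harpsichord'): a plain substring test on the name.
sigma1 = lambda p: select_objects(p, lambda o: o[1] == "audio" and "harpsichord" in o[0])
sigma2 = lambda p: select_objects(p, lambda o: o[1] == "video")

q1 = match(library, sigma1, overlap, sigma2)
```

In this sample only the video at `(5, 12)` is returned, because it is the only video interval that overlaps the harpsichord audio at `(0, 10)`.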

4.2 Composition Operators

After selecting interesting presentation components, users will need a set of composition operators to edit or integrate the components to form new presentations. Consider the following example:

‡Note that match is a restricted form of temporal self-semijoin.

Composition Query C1: Compose a new presentation from video presentations P1, P2, P3 and an audio presentation A1 such that P1, P2, P3 are played sequentially, and A1 starts 5 minutes after P1 starts.

We need a set of compose operators to synchronize presentation components with certain temporal constraints. To define these operators, we will make use of a shift operator that temporally moves a presentation by a specified amount. The shift operator makes it possible to synchronize two presentations. In order to synchronize a presentation after another that has indeterminant duration, we may need to shift by an indeterminant amount.

4.2.1 Shift

The shift operator $\delta_l$ takes a shift distance $l$ as a parameter and shifts the input presentation by $l$ units. We first define the shift operator for events.

Definition 11 $\mathit{Shift}_l(s)$ (shifts point $s$ by the amount $l$) transforms event $s$ as follows.

• If $s, l \in \mathbb{Z}^+$, $\mathit{Shift}_l(s) = s + l$; else, $\mathit{Shift}_l(s) = U$,

where $U$ is a new indeterminant event which is different from other transformed events.

Then, we define the shift operator on a presentation as follows.

Definition 12 The shift operator $\delta_l$ takes a shift distance $l$ as a parameter and applies the function $\mathit{Shift}_l$ on all points in $P_{in}$ for a given input presentation $P_{in} = (O_{in}, I_{P_{in}}, \mathcal{L}_{in})$, and then returns an output presentation $P_{out} = (O_{out}, I_{P_{out}}, \mathcal{L}_{out})$ such that $O_{out} = O_{in}$, and for each $\mathit{intv}_{in} = (s, e) \in I_{P_{in}}$, there exists $\mathit{intv}_{out} = (\mathit{Shift}_l(s), \mathit{Shift}_l(e)) \in I_{P_{out}}$ and $D(\mathit{intv}_{out}) = D(\mathit{intv}_{in})$.
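The two shift definitions can be sketched as follows. Two assumptions are made here that the definitions leave open: $\mathbb{Z}^+$ is read as the non-negative integers so that a presentation starting at time 0 can be shifted, and an indeterminant event is modeled by a fresh object of a hypothetical `Indeterminate` class that compares equal only to itself. All names are illustrative.

```python
import itertools
from typing import Dict, FrozenSet, Tuple, Union


class Indeterminate:
    """Hypothetical marker for an indeterminant event; every instance is distinct."""
    _ids = itertools.count()

    def __init__(self) -> None:
        self.id = next(Indeterminate._ids)

    def __repr__(self) -> str:
        return f"U{self.id}"


Event = Union[int, Indeterminate]
Interval = Tuple[Event, Event]
Presentation = Dict[Interval, FrozenSet[str]]


def is_determinate(x: object) -> bool:
    # Assumption: determinate events and distances are non-negative integers.
    return isinstance(x, int) and not isinstance(x, bool) and x >= 0


def shift_event(s: Event, l: Event) -> Event:
    """Shift_l(s): s + l when both are determinate, else a new indeterminate event."""
    if is_determinate(s) and is_determinate(l):
        return s + l
    return Indeterminate()


def shift(p_in: Presentation, l: Event) -> Presentation:
    """Apply Shift_l to every point of p_in; the object sets are unchanged."""
    moved: Dict[Event, Event] = {}

    def move(point: Event) -> Event:
        # Each distinct input point is shifted once, so intervals that share
        # an endpoint still share it after the shift.
        if point not in moved:
            moved[point] = shift_event(point, l)
        return moved[point]

    return {(move(s), move(e)): objs for (s, e), objs in p_in.items()}


p = {(0, 10): frozenset({"v1"}), (10, 25): frozenset({"v2"})}
shifted = shift(p, 5)
```

Shifting by an `Indeterminate()` distance turns every endpoint into a new indeterminate event while keeping the number of intervals and their object sets unchanged.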

The shift operator is used with composition operators to synchronize two presentations at a certain point. A set of composition operators is defined below.

A composition operator is a binary operator which takes input presentations $P_1 = (O_1, I_{P_1}, \mathcal{L}_1)$ and $P_2 = (O_2, I_{P_2}, \mathcal{L}_2)$ and returns an output presentation $P_{out} = (O_{out}, I_{P_{out}}, \mathcal{L}_{out})$.

4.2.2 Basic Compose Operators

The operators Union, Intersection, and Difference compose two presentations together by applying set union, intersection, and difference operations, respectively, on the two object sets at each interval without altering the interval ordering in the input presentations $P_1$ and $P_2$. In the basic compose operators, $I_{P_{out}} = I_{P_1} \cup I_{P_2}$.

• Union ($\cup$): For each interval $i \in I_{P_{out}}$, $\mathcal{L}_{out}(i) = \mathcal{L}_1(i) \cup \mathcal{L}_2(i)$.

• Intersection ($\cap$): For each interval $i \in I_{P_{out}}$, $\mathcal{L}_{out}(i) = \mathcal{L}_1(i) \cap \mathcal{L}_2(i)$.

• Difference ($-$): For each interval $i \in I_{P_{out}}$, $\mathcal{L}_{out}(i) = \mathcal{L}_1(i) - \mathcal{L}_2(i)$.

The above operators can express queries such as "Return the identical parts of $P_1$ and $P_2$" or "Return different components of $P_1$ and $P_2$". The interval ordering in the interval domain does not change. However, to answer Query C1, $P_2$ should first be shifted to synchronize its start point with $P_1$'s end point, and then $P_1$ is unioned with the shifted $P_2$ to form a new presentation. Most composition cases involving temporal constraints will require both shift and basic compose operators. Hence, we define a set of frequently used compose operators in the following subsection.

4.2.3 Compose Operators

The operators defined here allow users to specify Allen's 13 relations between two presentations. We do not directly support operators for the after, during, and equal relations because after can be modeled by before, during can be modeled by overlap, and equal can be covered by parallel-begin and parallel-end.

Given presentations $P_1$ and $P_2$, the following operators synchronize $P_2$ with $P_1$.

• Compose $P_1$ before $P_2$ by $l$ = $P_1 \cup \delta_{D(P_1)+l}(P_2)$.

• Compose $P_1$ meet $P_2$ at point $a$ = $P_1 \cup \delta_a(P_2)$.

• Compose $P_1$ parallel-begin $P_2$ = $P_1 \cup P_2$.

• Compose $P_1$ parallel-end $P_2$ = $P_1 \cup \delta_{D(P_1)-D(P_2)}(P_2)$, if $D(P_1)$ and $D(P_2)$ are determinant and $D(P_1) < D(P_2)$.

• Compose $P_1$ overlap $P_2$ by $l$ = $P_1 \cup \delta_{D(P_1)-l}(P_2)$, if $D(P_1)$ is determinant.

Query C1 can then be answered by the following series of operations:

P1' = Compose(P1 meet A1 at point 5 mins)

P2' = Compose(P2 before P3 by 0)

Pout = Compose(P1' before P2' by 0)
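The series of operations for Query C1 can be traced with a short sketch restricted to determinate intervals. Several things here are assumptions made for illustration: every presentation starts at time 0, $D(P)$ is taken to be the largest end point of $P$, the clip durations (10 minutes for each video, 3 minutes for A1) are invented, and A1 is chosen to end before P1 does so that the duration of P1' equals that of P1.

```python
from typing import Dict, FrozenSet, Tuple

Interval = Tuple[int, int]                 # minutes, determinate endpoints only
Presentation = Dict[Interval, FrozenSet[str]]


def duration(p: Presentation) -> int:
    # Assumption: presentations start at 0, so D(P) is the largest end point.
    return max(end for (_, end) in p)


def shift(p: Presentation, l: int) -> Presentation:
    return {(s + l, e + l): objs for (s, e), objs in p.items()}


def union(p1: Presentation, p2: Presentation) -> Presentation:
    out = dict(p1)
    for interval, objs in p2.items():
        out[interval] = out.get(interval, frozenset()) | objs
    return out


def compose_before(p1: Presentation, p2: Presentation, l: int) -> Presentation:
    """Compose P1 before P2 by l: P1 union P2 shifted by D(P1) + l."""
    return union(p1, shift(p2, duration(p1) + l))


def compose_meet_at(p1: Presentation, p2: Presentation, a: int) -> Presentation:
    """Compose P1 meet P2 at point a: P1 union P2 shifted by a."""
    return union(p1, shift(p2, a))


def clip(name: str, minutes: int) -> Presentation:
    """Hypothetical helper: a one-object presentation playing from 0 to `minutes`."""
    return {(0, minutes): frozenset({name})}


P1, P2, P3 = clip("P1", 10), clip("P2", 10), clip("P3", 10)
A1 = clip("A1", 3)

P1_prime = compose_meet_at(P1, A1, 5)        # A1 starts 5 minutes after P1 starts
P2_prime = compose_before(P2, P3, 0)         # P3 right after P2
P_out = compose_before(P1_prime, P2_prime, 0)
```

With these durations the videos play back to back over `(0, 10)`, `(10, 20)` and `(20, 30)`, and A1 occupies `(5, 8)`. If A1 were to outlast P1, the duration of P1' under this reading of $D$ would grow and P2 would start later than the end of P1.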

Figure 3: Query processing graph for temporal and multimedia content-based query example. [Graph nodes include the Presentation Database.]

5 Query Examples

Some query examples are given in this section to demonstrate how queries are answered using the proposed operators.

• To identify an abnormal tumor pattern, a medical researcher might want to browse through the database to see how many similar cases exist. The following query can be made: "Find all the presentations which contain X-ray images similar to A, followed by images similar to B, followed by images similar to C." The query can be answered by the query graph in Figure 3. The "method" in the query processing graph can invoke an access method such as a "pattern/shape recognition method" during processing.

• To prepare a lecture on "Web page design introduction", a lecturer may wish to compose a video/audio presentation including the topics "Object-Oriented", "HTML language", and "Java" from the existing presentation library together with his/her own introduction clip $P_1$. The following query can help the lecturer compose the presentation from components whose duration is less than 30 minutes: "Find all the objects related to 'Object-Oriented', 'HTML language' or 'Java', with duration less than 30 minutes; then compose them as $P_1$ meets 'Object-Oriented concept' meets 'HTML language' meets 'Java'." See the query processing graph of Figure 4.

Figure 4: Query processing graph for query and composition type of query example

6 Conclusions

In this work, we considered the querying of large multimedia presentation libraries. We have identified the importance of an integrated composition and query capability in such libraries. The proposed data model is based on a representation of temporal synchronization using intervals, an approach which is used in some temporal database models and authoring models. We enhanced the model to address both determinant and indeterminant constraints. The proposed model is declarative rather than procedural, making it easier for users to understand, use and debug. The model is a logical one that can be mapped to any physical data model. This permits the storage of presentations in any database management system (DBMS), including object-oriented or relational DBMS. The proposed operators build on the query operators from the traditional database models, sequence data models [15], and temporal data models [16]. Some query processing and optimization techniques developed in those models can be applied to our model. We are currently investigating the use of these techniques for presentations. Notice that results on stream processing for temporal and sequence models [21, 16, 22] rely on two facts that are not true of presentations. First, the operators are specified over totally ordered domains (rather than the partially ordered determinant and indeterminant events of our model). Second, certain operators require only a fixed, bounded portion of the input data at once to compute a response. These two differences make efficient query processing over sets of presentations an interesting, challenging area for future investigation.

The queries we have described are tailored to the task of providing more flexible access to a presentation library, mainly for individual users. In addition to this browsing style of querying, a presentation library must also support more advanced query features, including decision support queries and data mining. Due to the cost and complexity of multimedia libraries, it will be imperative that administrators are able to justify the expense, in time and money, of supporting them. Decision support type queries facilitate the determination of aggregate usage statistics and other characteristics of the entire data set. An example would be a query to determine which types of media objects are reused most often for diverse tasks. Additional complex queries for discovering data patterns will be required over presentation databases. Examples include queries to find sets of objects that are commonly (or with a specific probability) used together or to find sets of objects appearing commonly in the same temporal sequence [23, 24]. We are currently extending our query language to handle aggregate, decision support style queries.

References

[1] M. J. Pérez-Luque and T. D. C. Little, “A temporal reference framework for multimedia synchronization,” IEEE Journal on Selected Areas in Communications, January 1996.

[2] G. Blakowski and R. Steinmetz, “A media synchronization survey: Reference model, specification, and case studies,” IEEE Journal on Selected Areas in Communications, January 1996.

[3] T. D. C. Little and A. Ghafoor, “Multimedia object models for database and synchronization,” in Proc. of the Int'l Conf. on Data Engineering, Feb. 1990.


[4] R. Weiss, A. Duda, and D. K. Gifford, “Composition and Search with a Video Algebra,” in IEEE Multimedia, pp. 12-25, Spring 1995.

[5] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin, “The QBIC Project: Querying Images By Content Using Color, Texture, and Shape,” in SPIE, vol. 1908, pp. 173-187, 1993.

[6] B. D. Markey, “Emerging hypermedia standards - hypermedia marketplace prepares for HyTime and MHEG,” in Proceedings of USENIX, pp. 59-74, 1991.

[7] S. R. Newcomb, N. A. Kipp, and V. T. Newcomb, “HyTime - The Hypermedia/Time-based Document Structuring Language,” Communications of the ACM, vol. 34, no. 11, pp. 67-83, 1991.

[8] R. Hamakawa and J. Rekimoto, “Object composition and playback models for handling multimedia data,” in Proceedings of ACM Conf. on Multimedia, (Anaheim, CA), pp. 273-281, Aug. 1993.

[9] S. B. Eun, E. S. No, H. C. Kim, H. Yoon, and S. R. Maeng, “Specification of Multimedia Composition and a Visual Programming Environment,” in Proceedings of ACM Conf. on Multimedia, (Anaheim, CA), pp. 167-173, Aug. 1993.

[10] R. Weiss, A. Duda, and D. K. Gifford, “Content-Based Access to Algebraic Video,” in Proc. of the International Conf. on Multimedia Computing and Systems, (Boston, MA), pp. 140-151, IEEE Computer Society Press, May 1994.

[11] M. Iino, Y. F. Day, and A. Ghafoor, “An Object-Oriented Model for Spatio-Temporal Synchronization of Multimedia Information,” in Proc. of the International Conf. on Multimedia Computing and Systems, (Boston, MA), pp. 110-119, IEEE Computer Society Press, May 1994.

[12] G. A. Schloss and M. J. Wynblatt, “Building temporal structures in a layered multimedia data model,” in Proceedings of ACM Conf. on Multimedia, (San Francisco, CA), pp. 271-278, Oct. 1994.

[13] K. Aberer and W. Klass, “Supporting Temporal Multimedia Operations in Object-Oriented Database Systems,” in Proc. of the International Conf. on Multimedia Computing and Systems, (Boston, MA), pp. 352-361, May 1994.

[14] T. D. C. Little and A. Ghafoor, “Interval-based conceptual models for time-dependent multimedia data,” IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 551-563, Aug. 1993.

[15] P. Seshadri, M. Livny, and R. Ramakrishnan, “SEQ: Design and Implementation of a Sequence Database System,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, (Montreal, Canada), June 1996.

[16] C. T. Y. Leung and R. R. Muntz, Temporal Databases, Theory, Design, and Implementation. Benjamin/Cummings, 1993.

[17] R. Snodgrass, “TSQL2 language specification,” in Report of an Invited ARPA/NSF Workshop, March 1 1994.

[18] J. F. Allen, “Maintaining knowledge about tem- poral intervals,” Communications of the ACM, vol. 26, pp. 832-843, November 1983.

[19] C.-H. Wu, R. J. Miller, and M. T. Liu, “Multi- media Presentations System,” in Working paper, (Ohio State University), 1997.

[20] P. Buneman, S. Davidson, G. Hillebrand, and D. Suciu, “A Query Language and Optimization Techniques for Unstructured Data,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, (Montréal, Canada), May 1996.

[21] T. Y. Leung and R. Muntz, “Query Processing for Temporal Databases,” in Proceedings of the 6th In- ternational Conference on Data Engineering, Feb. 1990.

[22] P. Seshadri, M. Livny, and R. Ramakrishnan, “Se- quence Query Processing,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, May 1994.

[23] R. Agrawal, T. Imielinski, and A. Swami, “Database Mining: A Performance Perspective,” IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 914-925, Dec. 1993. Special issue on Learning and Discovery in Knowledge-Based Databases.

[24] R. J. Miller and Y. Yang, “Association Rules over Interval Data,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, (Tucson, AZ), May 1997.
