[IEEE Comput. Soc International Conference on Protocols for Multimedia Systems - Multimedia Networking - Santiago, Chile (24-27 Nov. 1997)] Proceedings of International Conference

Querying Multimedia Presentations

Chao-Hui Wu, Renke J. Miller and Ming T. Liu

Department of Computer and Information Science The Ohio State University

Columbus, Ohio 43210-1277 E-mail: wuc, rjmiller, [email protected]

Abstract

We examine the querying requirements of large libraries of multimedia presentations. Given these requirements, we examine work on querying temporal and sequence data, and identify the similarities and dissimilarities between presentations and these other forms of data. From this analysis, we propose an integrated composition and query capability to permit the reuse of multimedia objects, presentations and presentation segments. The query facility permits content and attribute based queries along with queries over temporal synchronization characteristics. The main contributions of our work are: it addresses both determinant and indeterminant intervals; it permits querying over presentation libraries with heterogeneous structure; it builds on work from temporal and sequence databases to address the unique semantics of presentations.

1 Introduction

1.1 Motivation

The advent of multimedia technologies has led to an explosion in the use of multimedia to communicate ideas. One important and useful application which has generated much interest in the research community is multimedia presentation and authoring systems. A multimedia presentation and authoring system provides users with a model or a tool to create their own presentations and to specify the Quality of Service (QoS) and synchronization constraints on presentations. Examples of multimedia presentations include news broadcasts or university lectures where video clips might be accompanied by images, audio, closed captioning, etc. The components of a presentation must be synchronized both spatially and temporally. Presentation models are used to represent spatial and temporal relationships together with quality of service characteristics of both the presentation and its components [1, 2].

A primary role of presentation models is to enable the specification of these synchronization constraints and to facilitate the retrieval and playback of presentations [3, 1]. An equally important role is to enable the browsing and searching of presentation libraries to support the reuse of presentations and presentation components. When users compose a new presentation, they might want to browse through the existing presentation database and extract some useful presentation components. By providing efficient composition and query operators, the objects in the system can be fully utilized, the relations between objects can be well documented and maintained, and users can save time in creating new presentations. For example, in a library of university lectures, an instructor may wish to find video clips demonstrating a specific volume rendering technique. To get ideas on how to introduce or explain the demonstration, the instructor may also wish to retrieve all objects synchronized, within a presentation, to such a video. Alternatively, a user may wish to retrieve presentations based on the synchronization or temporal relationships illustrated by the presentation. A presentation author may wish to retrieve portions of presentations that illustrate how multiple video clips may be simultaneously played with at least one audio segment. It is our thesis that a multimedia presentation and authoring system should provide an integrated, flexible composition and query capability. It must be possible to issue complex queries and use query results directly to create new presentations.

To understand the querying requirements of presentations, it is instructive to consider the basic characteristics of presentations. Heterogeneous objects (images, video, sound, text, etc.) may be combined to form a single presentation. These objects may have different attributes and access methods. The presentation specification must include the temporal relationships between the presentation components. Some media objects in presentations will need to be synchronized with outside events such as a user interaction. Consider a slide show presentation as an example. Each image is displayed for an interval of time that is determined by the speaker. As the presentation progresses, the speaker determines when to terminate the display of each slide. The actual temporal relations between objects in a presentation may therefore be unpredictable until run time. Such temporal relations are called indeterminate temporal relations [1]. Queries must be permitted over determinant (fixed) and indeterminate relations alike.

0-8186-7916-6/97 $10.00 © 1997 IEEE

To support the retrieval and reuse of presentations and presentation components, a multimedia presentation and authoring system must support ad hoc querying including the following types of queries.

Attribute based queries. Most presentation models permit the association of attributes, including text and numerical attributes, with presentation components. Queries over these attributes, which may represent features extracted from the multimedia objects, are common [4]. Example queries include, “Select all the presentations related to Virtual Reality” or “Show the components related to Virtual Reality in the presentation library”. These queries involve attributes of the presentation or objects within the presentations.

Multimedia content based queries. To query the content of multimedia objects within presentations, special operators such as the QBIC operators [5], which permit queries over color composition and other image or media characteristics, may be used. For example, the query “Select all the images which contain round shapes (abnormal tumors) as shown in image X” will involve special query operators, such as pattern recognition operators.

Temporal queries. Presentation queries may involve the synchronization constraints within a presentation. The query “Select all the video clips played while (overlapping) Ravel’s Bolero is playing” may help in analyzing how this piece of music is used. The query “Find all the images with the same subject played right before and right after a particular image” might be helpful for users wishing to find any related images within the database. These queries involve reasoning about temporal relationships between objects.

1.2 Current Approaches

Despite the extensive work done on specifying multimedia presentations in the context of a database system, including several existing or developing standards [6, 7, 1, 2, 8, 9, 10, 4, 11, 12, 13, 14], the work on querying presentations is much more sparse. Most models are either developed without consideration for querying issues (and may therefore not be amenable to efficient set-based querying) or are coupled with limited querying facilities. The former class of models includes language-based models, event-based models, and such models as timed or extended Petri nets [1].

Other models have been developed in concert with querying facilities to be used in the context of a multimedia information system. Among these approaches, the use of abstract data types is common. New data types for the various components of a presentation (video, audio, etc.) are introduced [2]. The type definitions include structure and type-specific operations for retrieving presentation components. The type-specific operators are typically called from within a declarative query language. However, this approach limits the optimization that can be done and may lead to inefficient query executions [15]. The alternative we present here is the use of a declarative query language which permits efficient evaluation and optimization strategies. The approach is motivated by experience with temporal [16] and sequence [15] databases. Indeed, presentations may be modeled using temporal interval-based models where each component of the presentation is mapped to a temporal interval [14]. The approach has the advantage of permitting the vast wealth of work on temporal querying to be applied to multimedia presentations.

Given this starting point, it is important to clarify the differences between multimedia presentations and other forms of temporal data. Temporal models are typically designed to store and manipulate homogeneous collections of data over an absolute time-line. For example, the time sequence model of [16] models sequences of values for a single entity over the time domain. Multimedia presentations, however, model heterogeneous collections of data which are synchronized in time. The intervals of a presentation represent relative temporal ordering, not absolute times. The temporal constraints are relative, not absolute. Similar representations, specifically values mapped to intervals (a start and an end time stamp), may be used for both. In the case of temporal data, the values may be those of an entity valid for the given interval or time period. In the case of multimedia presentations, the values are multimedia objects to be displayed over the given interval. However, the two types of data differ semantically and this difference gives rise to different types and frequencies of queries.* Consider a few examples. Many of the queries common on temporal databases do not make sense for presentations. These include temporal selections such as “Retrieve the stock trading index on April 13, 1996” or temporal joins such as the super star example “Retrieve any faculty who are promoted from the assistant professor to full professor while there are some faculty remaining at the associate professor level for over the same time” [16]. Much of the work on temporal databases has focused on the specification and study of such operators (temporal projection, temporal join, etc.) [17]. Temporal constraints on presentations are relative, not absolute, constraints. Temporal joins between presentations are not meaningful.† Similarly, presentations give rise to new usage patterns. Presentations containing user interactions will have objects mapped to intervals of unknown duration. The query “Find all objects that could overlap the music of ‘Round Midnight’ in any presentation” requires reasoning about indeterminate intervals and leads to access patterns uncommon in temporal or sequence data.

1.3 Summary and Organization

We have outlined the requirements for querying of large multimedia presentation libraries. Given these requirements, a data model together with a set of query operators for multimedia presentations is proposed.

Our focus is on the composing and querying issues for temporal constraints. We do not consider spatial or QoS constraints. However, such constraints can be stored as attributes associated with the presentation or the component media objects.

The details of our model are presented in the following sections. We start by describing a set of requirements for a multimedia presentation model in Section 2. In Section 3, a data model for multimedia presentation data is defined. A set of operators for the proposed model is presented in Section 4. Some query examples are described in Section 5. A discussion of issues raised by this work is presented in Section 6.

*As pointed out in [16], it is a similar distinction that has motivated much of the work on temporal databases. While temporal data can be modeled using integer attributes in a traditional data model, the semantics of time stamps leads to different query patterns that benefit from new evaluation and optimization techniques.

†Self-joins, that is, joins of a presentation with itself based on temporal properties, are meaningful.

2 Model Requirements

In this section, we summarize the requirements of a presentation model and the language for multimedia presentation. We justify why the interval-based model is chosen over the others.

A presentation model should support the following features [2].

The model should enable the specification and querying of the 13 temporal constraints [18] and indeterminant constraints [1].

The model should permit the representation of synchronization relationships at multiple levels of abstraction. It should be possible to synchronize entire presentations or components of presentations (that is, individual objects).

The model should be easily mapped to existing database management systems.

The query language must support the retrieval of presentations by object properties and by temporal relationships.

The model should be conceptually simple, so that it is easy to use.

A comparison study of different models can be found in [2, 1]. We adopt the interval-based model in our framework due to its strengths which include the following.

The interval concept is easier for users to visualize and verify. In contrast, language-based models are more procedural than declarative, making them harder to understand, use and debug.

A time interval concept can be represented easily in a DBMS. In contrast, in language-based or Petri-Net models, the temporal constraints are implicitly specified in the net structure or in the language structure. Extracting the objects or constraints from hierarchical structures and mapping those constraints into a set-oriented database is much more difficult.

The interval model can be extended to specify the indeterminant temporal synchronization constraints. The Petri-Net model cannot easily model the indeterminant relations [1].

An interval-based model can easily specify all the temporal requirements supported in two multimedia standards, Hytime [6, 7] and MHEG [6], except conditional synchronization constraints (i.e., if-then-else type of constraints). Both Hytime and MHEG are language-based models in which queries over the presentations and temporal constraints are difficult to express. Our model will be extended to cope with the conditional synchronization constraints in [19].

It is difficult to query language-based models.

Given the above motivation, we use a model based on intervals. We enhance the model to address both determinant and indeterminant constraints.

In the following sections, we define a generic model for multimedia presentations and a set of operators for this type of data model.

3 Model of Presentations

We begin by defining an abstract model of presentations that is independent of the types used to define individual presentation objects (such as a video or audio segment). We do this to clearly delineate the important characteristics of presentations, specifically the temporal synchronization constraints.

Let $\Gamma$ be a set of types supported by a data model. The set $\Gamma$ may be a set of relation schemas in the relational model or a set of object types in an object-oriented model. These types are used to represent all characteristics of presentations that are not related to their temporal synchronization (for example, quality of service attributes and the media itself). In the following, we consider $O$ to be a set of presentation objects whose types are in $\Gamma$. For the purposes of our model, a presentation object is any object that can have a duration.

We consider determinant and indeterminant events in our model. Determinant events are represented by elements of the non-negative integers $\mathbb{Z}^+$. Indeterminant events are represented by symbols from a set $U$. Let $P = (S, \le_P)$ be a partially ordered finite set (domain) whose elements lie in $\mathbb{Z}^+$ or $U$, that is, $S \subseteq \mathbb{Z}^+ \cup U$. We restrict our consideration to orders that respect the ordering on integers. Specifically, if $i_1, i_2 \in S$ are integers then $i_1 \le_P i_2$ if and only if $i_1 \le i_2$. We also require that $P$ contain a unique minimal element $start(P)$ that is from $\mathbb{Z}^+$. For all $s \in S$, $start(P) \le_P s$.

A presentation is composed of media objects each of which maps to an interval that models the duration of the object.

Definition 1 An interval is a pair of elements $(s, e)$ from a partially ordered domain $P$ such that $s <_P e$ together with a duration $D(s, e) \in \mathbb{Z}^+$. The interval $(s, e)$ is a determinant interval if $s, e \in \mathbb{Z}^+$, otherwise it is an indeterminant interval.

Definition 2 An interval domain $I_P$ defined on $P$ is the set of all intervals $(s, e)$ such that $s, e \in P$.

Given a set of presentation objects $O$, each object can be mapped to an interval or intervals to form a presentation. We model this using interval orderings. Note that an object may map to many intervals and many objects may map to the same interval. The only restriction is that every object in the presentation must map to at least one interval.

Definition 3 Let $O$ be a set of objects and let $I_P$ be an interval domain. An interval ordering is a relation $\iota : O \to I_P$ defined on every element in $O$.

We use the notation $\iota(o)$ to denote the set of intervals to which the object $o$ is mapped, $\iota(o) = \{ i : (o, i) \in \iota \}$.

Similarly, $\iota(i)$ denotes the set of objects mapped to the interval $i$, $\iota(i) = \{ o : (o, i) \in \iota \}$.

A presentation is an interval ordering that maps a set of objects to intervals. A presentation is empty if and only if its set of objects $O$ is empty. To define the start of a presentation, we require that all non-empty presentations contain at least one object that is synchronized to the minimal element $start(P)$.

Definition 4 A presentation is a triple $(O, I_P, \iota)$ where $O$ is a set of objects, $I_P$ is an interval domain, $\iota : O \to I_P$ is an interval ordering of $O$ by $I_P$ and if $O \neq \emptyset$ there exists at least one object $o \in O$ such that $\iota(o) = (start(P), p)$ for some $(start(P), p) \in I_P$.

Definition 5 A presentation component of a presentation $(O, I_P, \iota)$ is a pair $(o, i) \in \iota$, where $o \in O$ and $i \in I_P$.

Presentations can be represented as rooted trees with a node for each event in $P$ and an edge $s \to e$ for each interval $(s, e)$ to which an object is mapped. An edge is labeled with the objects that map to the corresponding interval.

Definition 6 Given a presentation $(O, I_P, \iota)$, a temporal relation tree $TRT = (V, E)$ is defined as $V = P$ and $E = \{ (v_1, v_2) \mid (v_1, v_2) \in I_P \text{ and } \iota((v_1, v_2)) \neq \emptyset \}$. An edge $(v_1, v_2)$ is labeled by $\iota((v_1, v_2))$.
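As a rough illustration of Definition 6, the edges and labels of a temporal relation tree can be derived directly from the mapped intervals. The sketch below assumes the presentation is given as a set of events plus a set of `(object, interval)` pairs; the function name and the sample objects (`img1`, `img2`, `audio`) are hypothetical.

```python
def temporal_relation_tree(events, ordering):
    """Return (V, E, labels) for a presentation.

    V is the event set, E contains one edge per interval that has at least
    one object mapped to it, and labels[edge] is the set of those objects.
    """
    labels = {}
    for obj, interval in ordering:
        labels.setdefault(interval, set()).add(obj)
    vertices = set(events)
    edges = set(labels)  # intervals with an empty object set never appear
    return vertices, edges, labels

# Hypothetical determinant presentation over the events 0, 30 and 35.
events = {0, 30, 35}
ordering = {("img1", (0, 30)), ("img2", (30, 35)), ("audio", (0, 35))}
V, E, labels = temporal_relation_tree(events, ordering)
```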

If $P$ does not contain any indeterminant events, then the set of presentations that can be defined includes the temporal relations definable in the Object Composition Petri-Net model (OCPN) (that is, Allen's 13 relations) [3, 14]. For example, as illustrated in Figure 1, a slide show presentation in the OCPN model can be easily specified in our presentation model. The total duration of the slide show presentation example is 35, which is the highest domain element used in an interval to which an object is mapped.

Figure 1: The slide show example specified in (a) OCPN and (b) our model, and (c) the temporal relation tree for domain $P$

In a presentation, every interval must have a non-zero duration. However, given the presence of indeterminant events, the duration may be indeterminant. Consider the slide show example of Figure 2 that begins with an opening movie to introduce a topic followed by a set of slides. Each slide is replaced by the next in response to a user interaction. The slides are displayed over a background image. At the end of the slides, a closing movie is displayed. The duration of the introductory movie is fixed and the movie is mapped to a determinant interval of this duration (in the figure, the duration is 5). The duration of the slides is indeterminant and each is mapped to an indeterminant interval (the interval $(5, u_1)$ for Slide S1 and the interval $(u_2, u_3)$ for Slide S3). The duration of the final movie is also fixed, but its start time ($u_3$) is not. Hence, it is mapped to an indeterminant interval with a fixed duration. The event $u_4$ is indeterminant in that its value cannot be determined until runtime. However, it always occurs a fixed amount of time after the indeterminant event $u_3$. That is, $u_4$ synchronizes with $u_3$ some fixed time units $n$ after $u_3$ takes place. The event $u_4$ is called a dependent indeterminant event since it depends on another indeterminant event.

Figure 2: A slide show example with indeterminant intervals (time line with events 0, 5, $u_1$, $u_2$, $u_3$, $u_4$ and the corresponding temporal relation tree)

The duration of an interval is defined according to the following rules.

Definition 7 The duration of an interval $i = (s, e)$, denoted $D(i)$, must satisfy the following conditions. Here, $u_0$ is a special symbol denoting an indeterminant duration.

• If $s, e \in \mathbb{Z}^+$ then $D(i) = e - s$.

• If $s \in \mathbb{Z}^+$ and $e \in U$ then $D(i) = u_0$.

• If $s \in U$ and $e \in \mathbb{Z}^+$ then $D(i) = u_0$.

• If $s, e \in U$ then $D(i) \in \mathbb{Z}^+$, $D(i) > 0$, if $e$ is dependent on $s$, and $D(i) = u_0$ otherwise.
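The four cases of the duration rule can be written as a small function. This is a sketch under assumptions: determinant events are `int`, indeterminant events are `str`, the constant `U0` stands for the special indeterminant-duration symbol, and the `dependent` dictionary (mapping a pair of indeterminant events to a fixed positive offset) is a hypothetical way of recording dependent indeterminant events.

```python
U0 = "u0"  # stands for the special symbol denoting an indeterminant duration

def duration(interval, dependent=None):
    """Duration of an interval (s, e) following the four cases of the rule.

    `dependent` maps (s, e) pairs of indeterminant events to the fixed
    positive number of time units by which e follows s.
    """
    s, e = interval
    dependent = dependent or {}
    if isinstance(s, int) and isinstance(e, int):
        return e - s                      # both end points determinant
    if isinstance(s, str) and isinstance(e, str) and (s, e) in dependent:
        return dependent[(s, e)]          # e is dependent on s
    return U0                             # every other case is indeterminant

# The closing movie starts at an unknown time u3 but runs for 7 units
# (the value 7 is made up for the example).
closing_movie = duration(("u3", "u4"), dependent={("u3", "u4"): 7})
```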

A presentation $(O, I_P, \iota)$ is an indeterminant presentation if there is at least one indeterminant interval $i \in I_P$. Otherwise, the presentation is said to be determinant. In a determinant presentation, there is a unique maximal element that participates in some interval to which object(s) are mapped. This element defines the duration of the presentation. However, the interval domain $I_P$ of an indeterminant presentation is a partially ordered domain. If a unique maximal element $m \in \mathbb{Z}^+$ exists and it participates in an interval to which object(s) are mapped, then the duration of the presentation is $m$. If there is no such maximal event or the unique maximal event is an indeterminant event then the duration of the presentation is indeterminant.
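One way to compute the duration of a presentation from this rule is to look for the maximal events of the partial order. The sketch below assumes the order is supplied as a set `before` of pairs `(a, b)` meaning that `a` precedes `b` (needed only for pairs that involve an indeterminant event, since integers are compared directly). All names are hypothetical.

```python
U0 = "u0"  # stands for an indeterminant duration

def presentation_duration(events, before, ordering):
    """Return the unique maximal integer event if it bounds a mapped interval,
    otherwise the indeterminant-duration symbol U0."""
    def has_successor(m):
        return any(
            (isinstance(m, int) and isinstance(x, int) and m < x) or (m, x) in before
            for x in events
        )

    maximal = [m for m in events if not has_successor(m)]
    mapped = {ev for _, (s, e) in ordering for ev in (s, e)}
    if len(maximal) == 1 and isinstance(maximal[0], int) and maximal[0] in mapped:
        return maximal[0]
    return U0

fixed = presentation_duration(
    {0, 30, 35}, set(),
    {("img1", (0, 30)), ("img2", (30, 35)), ("audio", (0, 35))},
)
open_ended = presentation_duration(
    {0, 5, "u1"}, {(5, "u1")},
    {("movie", (0, 5)), ("slide1", (5, "u1"))},
)
```

Checking only for a direct successor is enough here because, in a finite order, an element with anything above it must also have an immediate successor among the supplied pairs.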

4 Operators

We now turn to the definition of a query language over our presentation model. We begin by specifying a template for retrieving specified parts of a presentation. Unlike traditional data models, we do not assume that we have a fixed schema over which queries can be specified. In particular, the objects of a presentation may be heterogeneous, containing objects with different attributes and structure. In addition, in a library of presentations, each presentation may have a structure that is different from other presentations. That is, we cannot, in general define a fixed schema describing the structure of all presentations. Rather each presentation is an arbitrary graph. Our query language is therefore based on a compound operator that expresses “pattern matching” style queries over presentations with heterogeneous structure. The language is influenced by work on querying semi-structured data [20], such as collections of hypertext documents.

Our goal is to define a query language for browsing, retrieving and reusing presentation components. We will therefore restrict our attention to operators that temporally restructure presentation components to form new components. We will not consider operators that modify presentation objects such as projection or join operators on objects. We also assume that the temporal synchronization represented by an object mapping $\iota$ is a relative one, providing a relative ordering of objects within a presentation.

The operators we define are unary or binary operators that take as input one or two presentations and return a single transformed presentation. The operators transform the objects and the interval ordering. We begin by defining a set of unary selection operators. We use the notation $P_{in} = (O_{in}, I_{P_{in}}, \iota_{in})$ to denote an input presentation to an operator $Op$ and $Op(P_{in}) = P_{out} = (O_{out}, I_{P_{out}}, \iota_{out})$ to denote the output.

4.1 Query Operators

To motivate our query language, we return to some of the examples from the introduction. Consider the following query.

Query Q1: Find all video clips played at the same time as any music with a harpsichord.

The audio objects are modeled by presentation objects mapped to a set of intervals in the presentation. It must be possible, in the query language, to select these objects based on their properties. We may select objects based on the result of a content-based operator (such as the audio operator of Q1), their QoS characteristics, or any of their attributes. This functionality is captured by the selection operator.

4.1.1 Selection

Definition 8 The object selection $\sigma_{\phi_o}$ takes a selection predicate $\phi_o$ defined on $O$ and applies it to the input presentation objects at each interval $i \in I_{P_{in}}$. For each interval $i$ and for each $o \in \iota_{in}(i)$, $\phi_o(o)$ if and only if $o \in \iota_{out}(i)$.

The predicate $\phi_o$ is any predicate valid on the objects in $O$. Notice that with this operator, the selection is done solely based on the non-temporal synchronization characteristics of the model. Since the objects of a presentation will likely not share identical schemas or types, there is some advantage to using a semi-structured query language for specifying the predicate $\phi_o$ [20]. However, this is not required.

In addition, it must be possible to select presentation components based on the properties of an interval. Examples include the queries “Find all components with duration greater than 10” or “Find all components that are synchronized to start at the start of the presentation ($start(P)$)”.

Definition 9 The interval selection $\sigma_{\phi_i}$ takes a selection predicate $\phi_i$ defined on $I_P$ and applies it to the input presentation intervals. For each interval $i$, if $\phi_i(i)$ then $\iota_{out}(i) = \iota_{in}(i)$, else $\iota_{out}(i) = \emptyset$.
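Both selection operators reduce to filters over presentation components. The following sketch assumes a presentation's interval ordering is held as a set of `(object, interval)` pairs, that predicates are ordinary Python callables, and that intervals in the example are determinant; `MediaObject` and the sample data are hypothetical.

```python
from collections import namedtuple

# Hypothetical object record; real objects would carry arbitrary attributes.
MediaObject = namedtuple("MediaObject", ["name", "media_type"])

def object_selection(ordering, phi_o):
    """Keep the components (o, i) whose object satisfies the predicate phi_o."""
    return {(o, i) for (o, i) in ordering if phi_o(o)}

def interval_selection(ordering, phi_i):
    """Keep the components (o, i) whose interval satisfies the predicate phi_i."""
    return {(o, i) for (o, i) in ordering if phi_i(i)}

clip = MediaObject("clip", "video")
song = MediaObject("song", "audio")
ordering = {(clip, (0, 30)), (song, (0, 5))}

videos = object_selection(ordering, lambda o: o.media_type == "video")
# "Find all components with duration greater than 10" (determinant case only).
long_components = interval_selection(ordering, lambda i: i[1] - i[0] > 10)
at_start = interval_selection(ordering, lambda i: i[0] == 0)
```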

4.1.2 Match

The Query Q1 requires the ability to perform additional operations on all objects mapped to intervals that are temporally related to the given interval. In the Query Q1, the relation is the temporal overlap relation. Once all specified audio objects are found in the presentation, for each corresponding interval, all overlapping intervals must be found. The temporal relations used in a query may include Allen’s 13 temporal relations [18] or any of the relations over indeterminant intervals, such as “may overlap”, enumerated in other proposals [1].

For brevity, we present here only Allen’s temporal relations defined by temporal predicates on two intervals $Intv_1 = (s_1, e_1)$ and $Intv_2 = (s_2, e_2)$.

• before: $Intv_1$ before $Intv_2$ if and only if $e_1 \le s_2$

• during: $Intv_1$ during $Intv_2$ if and only if $e_1 \le e_2$ and $s_1 \ge s_2$

• overlap: $Intv_1$ overlap $Intv_2$ if and only if (1) $s_1 < s_2 < e_1$, (2) $s_1 < e_2 < e_1$, (3) $Intv_2$ during $Intv_1$ or (4) $Intv_1$ during $Intv_2$

• meet: $Intv_1$ meet $Intv_2$ if and only if $e_1 = s_2$ or $e_2 = s_1$

• parallel-start: $Intv_1$ parallel-start $Intv_2$ if and only if $s_1 = s_2$

• parallel-end: $Intv_1$ parallel-end $Intv_2$ if and only if $e_1 = e_2$

• equal: $Intv_1$ equal $Intv_2$ if and only if $s_1 = s_2$ and $e_1 = e_2$
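For determinant intervals, these predicates translate almost literally into code. The sketch below assumes intervals are `(start, end)` tuples of integers; it does not attempt the relations over indeterminant intervals such as "may overlap".

```python
def before(a, b):
    return a[1] <= b[0]

def during(a, b):
    return a[1] <= b[1] and a[0] >= b[0]

def overlap(a, b):
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 or s1 < e2 < e1 or during(b, a) or during(a, b)

def meet(a, b):
    return a[1] == b[0] or b[1] == a[0]

def parallel_start(a, b):
    return a[0] == b[0]

def parallel_end(a, b):
    return a[1] == b[1]

def equal(a, b):
    return a[0] == b[0] and a[1] == b[1]
```

Note that, as defined, `before` also holds when the first interval ends exactly where the second starts, so it is not disjoint from `meet`.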

We define a general operator that can be used to express Query Q1. We use a selection condition to select a subset of the presentation components that are of interest. For each interval to which a selected object maps, we use a function on intervals to define the related intervals of interest. To the corresponding presentation components, we apply a final selection condition (for example, “select all video clips” as in Query Q1).

The match operator $\mu$ is parameterized by three functions, $\sigma_1$, $\theta$ and $\sigma_2$. For each presentation component $(o, i)$ of $\sigma_1(P_{in})$, the $\mu$ operator finds all intervals $i'$ that satisfy $\theta$ (that is, $\theta(i, i')$ is true). The operator then returns all presentation components $(o', i')$ that are in $\sigma_2(P_{in})$. The interval select function $\theta$ is a relative temporal operator, such as meet, overlap, before, after, parallel-start, and parallel-end.‡

Definition 10 The match operator $\mu(\sigma_1, \theta, \sigma_2)$ applies the conditional predicate $\theta$ to the intervals of $\sigma_1(P_{in})$ and $\sigma_2(P_{in})$, returning the corresponding components of $\sigma_2(P_{in})$. For each interval $i \in I_{P_{in}}$, if $(o, i)$ is a presentation component of $\sigma_1(P_{in})$ and $(o', i')$ is a presentation component of $\sigma_2(P_{in})$ and $\theta(i, i')$, then $(o', i')$ is a presentation component of $P_{out}$.

We can express Query Q1 by using the operator $\mu$ where $\sigma_1 = \sigma_{sounds\text{-}like('harpsichord')}$, $\theta$ = overlap and $\sigma_2 = \sigma_{MediaType = "video"}$.
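A minimal sketch of the match operator applied to Query Q1 follows. It assumes determinant intervals, a presentation stored as a set of `(object, interval)` pairs, and selections passed as functions from such a set to a subset of it. The `tags` field is a hypothetical stand-in for a real content-based "sounds like" operator, and all object names are made up.

```python
from collections import namedtuple

MediaObject = namedtuple("MediaObject", ["name", "media_type", "tags"])

def during(a, b):
    return a[1] <= b[1] and a[0] >= b[0]

def overlap(a, b):
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 or s1 < e2 < e1 or during(b, a) or during(a, b)

def match(ordering, sigma1, theta, sigma2):
    """Components (o2, i2) of sigma2(ordering) whose interval is related by
    theta to the interval of some component of sigma1(ordering)."""
    left, right = sigma1(ordering), sigma2(ordering)
    return {(o2, i2) for (_, i1) in left for (o2, i2) in right if theta(i1, i2)}

def sounds_like_harpsichord(ordering):
    return {(o, i) for (o, i) in ordering
            if o.media_type == "audio" and "harpsichord" in o.tags}

def videos(ordering):
    return {(o, i) for (o, i) in ordering if o.media_type == "video"}

bach = MediaObject("bach", "audio", ("harpsichord",))
pop = MediaObject("pop", "audio", ("guitar",))
clip_a = MediaObject("clip_a", "video", ())
clip_b = MediaObject("clip_b", "video", ())
ordering = {(bach, (0, 20)), (clip_a, (10, 30)), (clip_b, (25, 40)), (pop, (25, 40))}

q1 = match(ordering, sounds_like_harpsichord, overlap, videos)
```

Here only `clip_a` is returned, because `clip_b` starts after the harpsichord piece has ended.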

4.2 Composition Operators

After selecting interesting presentation components, users will need a set of composition operators to edit or integrate the components to form new presentations. Consider the following example:

‡Note that match is a restricted form of temporal self semijoin.

Composition Query C1: Compose a new presentation from video presentations P1, P2, P3 and an audio presentation A1 such that P1, P2, P3 are played sequentially, and A1 starts at 5 minutes after P1 starts.

We need a set of compose operators to synchronize presentation components with certain temporal constraints. To define these operators, we will make use of a shift operator that temporally moves a presentation a specified amount. The shift operator makes it possible to synchronize two presentations. In order to synchronize a presentation after another that has indeterminant duration, we may need to shift by an indeterminant amount.

4.2.1 Shift

The shift operator $\delta_l$ takes a parameter shifting distance $l$ and then shifts the input presentation $l$ units. We first define the shift operator for events.

Definition 11 $Shift_l(s)$ (shifts point $s$ by amount $l$) transforms event $s$ as follows.

• If $s, l \in \mathbb{Z}^+$, $Shift_l(s) = s + l$; else, $Shift_l(s) = u$,

where $u$ is a new indeterminant event which is different from other transformed events.

Then, we define the shift operator on a presentation as follows.

Definition 12 The shift operator takes a shift distance $l$ as a parameter and applies the function $Shift_l$ on all points in $P_{in}$ for a given input presentation $P_{in} = (O_{in}, I_{P_{in}}, \iota_{in})$, and then returns an output presentation $P_{out} = (O_{out}, I_{P_{out}}, \iota_{out})$ such that $O_{out} = O_{in}$, and for each $intv_{in} = (s, e) \in I_{P_{in}}$, there exists $intv_{out} = (Shift_l(s), Shift_l(e)) \in I_{P_{out}}$ and $D(intv_{out}) = D(intv_{in})$.
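The shift of a whole presentation can be sketched as follows, assuming determinant events are `int`, indeterminant events (and indeterminant shift distances) are `str`, and the presentation is a set of `(object, interval)` pairs. Two choices here are assumptions of this sketch rather than part of the definitions: each original event is mapped to the same shifted event wherever it occurs, and fresh indeterminant events are named `v1`, `v2`, and so on. Interval durations are not stored in this representation, so their preservation is not shown.

```python
import itertools

def shift(ordering, l):
    """Shift every event of a presentation by l.

    Integer events shifted by an integer distance move by l; in every other
    case the event is replaced by a fresh indeterminant event.
    """
    fresh = (f"v{n}" for n in itertools.count(1))
    moved = {}

    def shift_event(s):
        if s not in moved:
            if isinstance(s, int) and isinstance(l, int):
                moved[s] = s + l
            else:
                moved[s] = next(fresh)
        return moved[s]

    return {(o, (shift_event(s), shift_event(e))) for (o, (s, e)) in ordering}

shifted = shift({("a", (0, 5)), ("b", (5, 9))}, 3)
mixed = dict(shift({("a", (0, "u1")), ("b", ("u1", "u2"))}, 3))
```

Memoising the shifted events keeps shared end points shared: in `mixed`, the end of `a` and the start of `b` remain the same (new) event.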

The shift operator is used with composition operators to synchronize two presentations at a certain point. A set of composition operators are defined in the following.

A composition operator is a binary operator which takes input presentations $P_1 = (O_1, I_{P_1}, \iota_1)$ and $P_2 = (O_2, I_{P_2}, \iota_2)$ and returns an output presentation $P_{out} = (O_{out}, I_{P_{out}}, \iota_{out})$.

4.2.2 Basic Compose Operators

The operators Union, Intersection, and Difference compose two presentations together by applying set union, intersection, and difference operations respectively on the two object sets at each interval without altering the interval ordering in input presentations $P_1$ and $P_2$. In the basic compose operators, $I_{P_{out}} = I_{P_1} \cup I_{P_2}$.

• Union ($\cup$): For each interval $i \in I_{P_{out}}$, $\iota_{out}(i) = \iota_1(i) \cup \iota_2(i)$.

• Intersection ($\cap$): For each interval $i \in I_{P_{out}}$, $\iota_{out}(i) = \iota_1(i) \cap \iota_2(i)$.

• Difference ($-$): For each interval $i \in I_{P_{out}}$, $\iota_{out}(i) = \iota_1(i) - \iota_2(i)$.
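The three basic compose operators share one shape: apply a set operation to the object sets found at each interval of the combined interval domain. The sketch below assumes each presentation is stored as a dictionary from interval to the set of objects mapped to it; the helper name `compose` and the sample data are hypothetical.

```python
def compose(p1, p2, op):
    """Apply the set operation `op` to the object sets at every interval of
    the combined interval domain (the union of both sets of intervals)."""
    intervals = set(p1) | set(p2)
    return {i: op(p1.get(i, set()), p2.get(i, set())) for i in intervals}

p1 = {(0, 5): {"a", "b"}, (5, 9): {"c"}}
p2 = {(0, 5): {"b"}, (9, 12): {"d"}}

union = compose(p1, p2, set.union)
intersection = compose(p1, p2, set.intersection)
difference = compose(p1, p2, set.difference)
```

Intervals whose resulting object set is empty are kept in the output, mirroring the fact that the interval domain itself is not altered by these operators.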

The above operators can express queries such as "Return the identical parts of $P_1$ and $P_2$" or "Return different components of $P_1$ and $P_2$". The interval ordering in the interval domain does not change. However, to answer Query C1, $P_2$ should first be shifted to synchronize its start point with the end point of $P_1$, and then $P_1$ is unioned with the shifted $P_2$ to form a new presentation. Most of the composition cases involving temporal constraints will require both shift and basic compose operators. Hence, we define a set of frequently used compose operators in the following subsection.

4.2.3 Compose Operators

The operators defined here allow users to specify Allen's 13 relations between two presentations. We do not directly support operators for after, during, and equal relations because after can be modeled by before; during can be modeled by overlap; equal can be covered by parallel-begin and parallel-end.

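Given presentations $P_1$ and $P_2$, the following operators synchronize $P_2$ with $P_1$. Here $\mathit{Shift}_l(P)$ denotes the shift operator of Definition 12 applied to $P$ with shift distance $l$.

• Compose $P_1$ before $P_2$ by $l$ = $P_1 \cup \mathit{Shift}_{D(P_1)+l}(P_2)$.

• Compose $P_1$ meet $P_2$ at point $a$ = $P_1 \cup \mathit{Shift}_{a}(P_2)$.

• Compose $P_1$ parallel-begin $P_2$ = $P_1 \cup P_2$.

• Compose $P_1$ parallel-end $P_2$ = $P_1 \cup \mathit{Shift}_{D(P_1)-D(P_2)}(P_2)$, if $D(P_1)$ and $D(P_2)$ are determinant and $D(P_1) < D(P_2)$.

• Compose $P_1$ overlap $P_2$ by $l$ = $P_1 \cup \mathit{Shift}_{D(P_1)-l}(P_2)$, if $D(P_1)$ is determinant.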
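The compose operators listed above are each a shift followed by a union, which the following sketch makes explicit. It rests on several assumptions that go beyond the text: all durations are determinant and numeric, $P_1$ starts at time 0, D(P) is taken as the span from the earliest start point to the latest end point, and shifting adds the distance to every end point. The side conditions stated for parallel-end and overlap are not checked. All function names are illustrative.

```python
from typing import Dict, FrozenSet, Tuple

Interval = Tuple[float, float]
Layout = Dict[Interval, FrozenSet[str]]  # interval -> objects presented in it


def duration(p: Layout) -> float:
    """D(P), assumed here to be latest end point minus earliest start point."""
    return max(e for _, e in p) - min(s for s, _ in p)


def shift(p: Layout, l: float) -> Layout:
    """Move every interval of p by distance l."""
    return {(s + l, e + l): objs for (s, e), objs in p.items()}


def union(p1: Layout, p2: Layout) -> Layout:
    """Basic Union: merge object sets interval by interval."""
    empty: FrozenSet[str] = frozenset()
    return {i: p1.get(i, empty) | p2.get(i, empty) for i in set(p1) | set(p2)}


def before(p1: Layout, p2: Layout, l: float = 0) -> Layout:
    return union(p1, shift(p2, duration(p1) + l))


def meet(p1: Layout, p2: Layout, a: float) -> Layout:
    return union(p1, shift(p2, a))


def parallel_begin(p1: Layout, p2: Layout) -> Layout:
    return union(p1, p2)


def parallel_end(p1: Layout, p2: Layout) -> Layout:
    return union(p1, shift(p2, duration(p1) - duration(p2)))


def overlap(p1: Layout, p2: Layout, l: float) -> Layout:
    return union(p1, shift(p2, duration(p1) - l))
```

Handling indeterminant durations would require a different representation of intervals and is outside the scope of this sketch.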

Query C1 can then be answered by the following series of operations:

P1' = Compose(P1 meet A1 at point 5 mins)
P2' = Compose(P2 before P3 by 0)
Pout = Compose(P1' before P2' by 0)
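Unfolding these three steps with the operator definitions of Section 4.2.3 gives an equivalent shift-and-union form. This is only a restatement of the steps above, writing $\mathit{Shift}_l(P)$ for the shift operator applied to $P$ with distance $l$ and assuming that the durations $D(P_2)$ and $D(P_1')$ are known.

```latex
\begin{align*}
P_1'    &= P_1  \cup \mathit{Shift}_{5\,\mathrm{min}}(A_1) \\
P_2'    &= P_2  \cup \mathit{Shift}_{D(P_2) + 0}(P_3) \\
P_{out} &= P_1' \cup \mathit{Shift}_{D(P_1') + 0}(P_2')
\end{align*}
```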

Figure 3: Query processing graph for temporal and multimedia content-based query example

5 Query Examples

Some query examples are given in this section to demonstrate how queries are answered using the proposed operators.

• To identify an abnormal tumor pattern, a medical researcher might want to browse the database to see how many similar cases exist. The following query can be posed: "Find all presentations that contain X-ray images similar to A, followed by images similar to B, followed by images similar to C." The query can be answered by the query graph in Figure 3. The "method" in the query processing graph can invoke an access method, such as a pattern/shape recognition method, during processing.

• To prepare a lecture on "Web page design introduction", a lecturer may wish to compose a video/audio presentation covering the topics "Object-Oriented", "HTML language", and "Java" from the existing presentation library, together with his/her own introduction clip $P_1$. The following query helps the lecturer compose the presentation from components whose duration is less than 30 minutes: "Find all the objects related to 'Object-Oriented', 'HTML language', or 'Java', with duration less than 30 minutes; then compose them as $P_1$ meets 'Object-Oriented concept' meets 'HTML language' meets 'Java'." See the query processing graph in Figure 4.

Figure 4: Query processing graph for query and composition type of query example
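The lecture example combines a selection with a chain of meet compositions. The sketch below illustrates that combination under simplifying assumptions: each library object carries a single topic label and a determinant duration in minutes, and "meets" is read as "starts exactly where the previous component ends". The class `MediaObject`, its fields, and the functions `select` and `chain_meets` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Tuple

Interval = Tuple[float, float]


@dataclass(frozen=True)
class MediaObject:
    name: str
    topic: str      # hypothetical attribute used for the content-based selection
    minutes: float  # hypothetical determinant duration


def select(library: Sequence[MediaObject], topics: List[str],
           max_minutes: float) -> List[MediaObject]:
    """Keep objects on the requested topics that are shorter than max_minutes,
    ordered by the position of their topic in `topics`."""
    hits = [o for o in library if o.topic in topics and o.minutes < max_minutes]
    return sorted(hits, key=lambda o: topics.index(o.topic))


def chain_meets(intro: MediaObject,
                parts: Sequence[MediaObject]) -> Dict[Interval, FrozenSet[str]]:
    """Lay out intro and parts back to back, each starting where the last ended."""
    layout: Dict[Interval, FrozenSet[str]] = {}
    t = 0.0
    for obj in [intro, *parts]:
        layout[(t, t + obj.minutes)] = frozenset({obj.name})
        t += obj.minutes
    return layout
```

With several matching objects per topic, the sketch simply places them one after another in library order; the query in the text does not say how such ties should be resolved.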

6 Conclusions

In this work, we considered the querying of large multimedia presentation libraries. We have identified the importance of an integrated composition and query capability in such libraries. The proposed data model is based on a representation of temporal synchronization using intervals, an approach that is used in some temporal database models and authoring models. We enhanced the model to address both determinant and indeterminant constraints. The proposed model is declarative rather than procedural, making it easier for users to understand, use, and debug. The model is a logical one that can be mapped to any physical data model. This permits the storage of presentations in any database management system (DBMS), including object-oriented or relational DBMSs. The proposed operators build on the query operators of traditional database models, sequence data models [15], and temporal data models [16]. Some query processing and optimization techniques developed for those models can be applied to our model. We are currently investigating the use of these techniques for presentations. Note that results on stream processing for temporal and sequence models [21, 16, 22] rely on two facts that are not true of presentations. First, the operators are specified over totally ordered domains (rather than the partially ordered determinant and indeterminant events of our model). Second, certain operators require only a fixed, bounded portion of the input data at once to compute a response. These two differences make efficient query processing over sets of presentations an interesting and challenging area for future investigation.

The queries we have described are tailored to the task of providing more flexible access to a presentation library, mainly for individual users. In addition to this browsing style of querying, a presentation library must also support more advanced query features, including decision support queries and data mining. Because multimedia libraries are costly and complex to maintain, administrators must be able to justify the expense, in time and money, of supporting them. Decision support queries facilitate the determination of aggregate usage statistics and other characteristics of the entire data set. An example would be a query to determine which types of media objects are reused most often for diverse tasks. Additional complex queries for discovering data patterns will be required over presentation databases. Examples include queries to find sets of objects that are commonly (or with a specific probability) used together, or to find sets of objects that commonly appear in the same temporal sequence [23, 24]. We are currently extending our query language to handle aggregate, decision support style queries.
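To make the decision support example concrete, the following sketch counts reuse per media type across a library. It is only an illustration of the kind of aggregate query meant here, not part of the proposed query language: each presentation is assumed to be a set of (object identifier, media type) pairs, and an object counts as reused when it appears in more than one presentation. The function name `reuse_by_type` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

MediaRef = Tuple[str, str]  # (object identifier, media type); assumed encoding


def reuse_by_type(presentations: Iterable[Set[MediaRef]]) -> Counter:
    """Total number of uses, per media type, of objects that occur in
    more than one presentation."""
    uses = Counter(obj for p in presentations for obj in p)
    by_type: Counter = Counter()
    for (_obj_id, media_type), n in uses.items():
        if n > 1:  # the object is shared by at least two presentations
            by_type[media_type] += n
    return by_type
```

Calling `most_common(1)` on the returned counter gives the media type whose shared objects are used most often under this definition of reuse.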

References

[1] M. J. Pérez-Luque and T. D. C. Little, “A temporal reference framework for multimedia synchronization,” IEEE Journal on Selected Areas in Communications, January 1996.

[2] G. Blakowski and R. Steinmetz, “A media synchronization survey: Reference model, specification, and case studies,” IEEE Journal on Selected Areas in Communications, January 1996.

[3] T. D. C. Little and A. Ghafoor, “Multimedia object models for database and synchronization,” in Proc. of the Int’l Conf. on Data Engineering, Feb. 1990.

[4] R. Weiss, A. Duda, and D. K. Gifford, “Composition and Search with a Video Algebra,” in IEEE Multimedia, pp. 12-25, Spring 1995.

[5] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin, “The QBIC Project: Querying Images By Content Using Color, Texture, and Shape,” in SPIE, vol. 1908, pp. 173-187, 1993.

[6] B. D. Markey, “Emerging hypermedia standards - hypermedia marketplace prepares for HyTime and MHEG,” in Proceedings of USENIX, pp. 59-74, 1991.

[7] S. R. Newcomb, N. A. Kipp, and V. T. Newcomb, “HyTime - The Hypermedia/Time-based Document Structuring Language,” Communications of the ACM, vol. 34, no. 11, pp. 67-83, 1991.

[8] R. Hamakawa and J. Rekimoto, “Object composition and playback models for handling multimedia data,” in Proceedings of ACM Conf. on Multimedia, (Anaheim, CA), pp. 273-281, Aug. 1993.

[9] S. B. Eun, E. S. No, H. C. Kim, H. Yoon, and S. R. Maeng, “Specification of Multimedia Composition and a Visual Programming Environment,” in Proceedings of ACM Conf. on Multimedia, (Anaheim, CA), pp. 167-173, Aug. 1993.

[10] R. Weiss, A. Duda, and D. K. Gifford, “Content-Based Access to Algebraic Video,” in Proc. of the International Conf. on Multimedia Computing and Systems, (Boston, MA), pp. 140-151, IEEE Computer Society Press, May 1994.

[11] M. Iino, Y. F. Day, and A. Ghafoor, “An Object-Oriented Model for Spatio-Temporal Synchronization of Multimedia Information,” in Proc. of the International Conf. on Multimedia Computing and Systems, (Boston, MA), pp. 110-119, IEEE Computer Society Press, May 1994.

[12] G. A. Schloss and M. J. Wynblatt, “Building temporal structures in a layered multimedia data model,” in Proceedings of ACM Conf. on Multimedia, (San Francisco, CA), pp. 271-278, Oct. 1994.

[13] K. Aberer and W. Klass, “Supporting Temporal Multimedia Operations in Object-Oriented Database Systems,” in Proc. of the International Conf. on Multimedia Computing and Systems, (Boston, MA), pp. 352-361, May 1994.

[14] T. D. C. Little and A. Ghafoor, “Interval-based conceptual models for time-dependent multimedia data,” IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 551-563, Aug. 1993.

[15] P. Seshadri, M. Livny, and R. Ramakrishnan, “SEQ: Design and Implementation of a Sequence Database System,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, (Montreal, Canada), June 1996.

[16] C. T. Y. Leung and R. R. Muntz, Temporal Databases, Theory, Design, and Implementation. Benjamin/Cummings, 1993.

[17] R. Snodgrass, “TSQL2 language specification,” in Report of an Invited ARPA/NSF Workshop, March 1 1994.

[18] J. F. Allen, “Maintaining knowledge about temporal intervals,” Communications of the ACM, vol. 26, pp. 832-843, November 1983.

[19] C.-H. Wu, R. J. Miller, and M. T. Liu, “Multimedia Presentations System,” in Working paper, (Ohio State University), 1997.

[20] P. Buneman, S. Davidson, G. Hillebrand, and D. Suciu, “A Query Language and Optimization Techniques for Unstructured Data,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, (Montréal, Canada), May 1996.

[21] T. Y. Leung and R. Muntz, “Query Processing for Temporal Databases,” in Proceedings of the 6th International Conference on Data Engineering, Feb. 1990.

[22] P. Seshadri, M. Livny, and R. Ramakrishnan, “Sequence Query Processing,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, May 1994.

[23] R. Agrawal, T. Imielinski, and A. Swami, “Database Mining: A Performance Perspective,” IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 914-925, Dec. 1993. Special issue on Learning and Discovery in Knowledge-Based Databases.

[24] R. J. Miller and Y. Yang, “Association Rules over Interval Data,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, (Tucson, AZ), May 1997.
