NetMedia: Streaming Multimedia Presentations in Distributed Environments

Aidong Zhang, Yuqing Song, and Markus Mielke
State University of New York at Buffalo

We present the design and implementation of NetMedia, a middleware that supports client–server distributed multimedia applications. In NetMedia, individual clients may access multiple servers to retrieve customized multimedia presentations, and each server simultaneously supports multiple clients. NetMedia provides transmission support strategies and robust software systems at both the server and client ends.

Researchers have become increasingly interested in flexibly constructing and manipulating heterogeneous presentations from multimedia data resources1 to support sophisticated applications.2 In these applications, information from multimedia data resources at one location must be made available at remote locations for collaborative engineering, educational learning and tutoring, interactive computer-based training, electronic technical manuals, and distributed publishing. Such applications require that we store basic multimedia objects in multimedia databases or files. The multimedia objects, such as audio samples, video clips, images, and animation, are then selectively retrieved, transmitted, and composed into customized presentations. For example, in asynchronous distance learning, we may assemble basic parts of the multimedia presentations stored in the multimedia databases into different presentations for individual learning and training; in medical diagnosis for training and electronic technical manuals, we may assemble a sophisticated presentation containing images, video clips, and audio samples from the data stored in the multimedia databases. Such customized presentations require highly flexible composition among the various multimedia streams.

To support such advanced applications, we need novel approaches to network-based computing that position adaptive quality-of-service (QoS) management as their central architectural principle. One of the key problems is designing the end-system software so that it couples to the network interface. Also, in a best-effort network environment such as the Internet, the transmission of media data may experience large variations in available bandwidth, latency, and latency variance. Consequently, network delays may occur in delivering media data. To meet the demands of advanced applications over the real-world Internet, end-system software must be able to efficiently store, manage, and retrieve multimedia data. We must develop fundamental principles and advanced techniques for designing end-system software that is integrated with the network to support sophisticated and customized multimedia applications.

Our contribution
This article focuses on the design strategies of NetMedia, a middleware for client–server distributed multimedia applications (see the "Related Work" sidebar for related research by others). NetMedia provides services to support synchronized presentations of multimedia data to higher level applications. In the NetMedia environment, an individual client may access multiple servers to retrieve customized multimedia presentations, and each server simultaneously supports multiple clients. NetMedia can flexibly support synchronized streaming of continuous media data across best-effort networks.

To achieve this, we designed a multilevel buffering scheme to control the collaboration between the client and server for flexible data delivery in distributed environments; an end-to-end network delay adaptation protocol (based on the distance between sending and display of media units in a stream, termed DSD) that adjusts the sending rate for each stream according to the optimum network delay;3 and the probe-loss utilization streaming (PLUS) protocol, a flow and congestion control protocol that uses network status probing to avoid congestion rather than react to it. The DSD and PLUS schemes integrate transmission support strategies and robust software systems at the server and client ends to dynamically handle adaptive QoS management for customized multimedia presentations. We use these schemes to enforce interactive control (such as fast forward or backward playback, pause, seek, and slow motion) with immediate response for all multimedia streams.

To the best of our knowledge, this is the first work that provides an integrated solution to buffer management, congestion control, end-to-end delay variations, interactivity support, and stream synchronization.


Related Work

Various research has been conducted to develop techniques for distributed multimedia systems. Collaboration between clients and servers has been addressed to support a globally integrated multimedia system.1 In particular, researchers have studied buffering and feedback control for multimedia data transmission in distributed environments. Ramanathan and Rangan2 proposed server-centric feedback strategies to maintain synchronized transmission and presentation of media streams. Nahrstedt and Smith3 used feedback to design robust resource management over the network. Hui et al.1 designed a backward feedback control approach to maintain loosely coupled synchronization between a server and a client. Feng et al.4 developed an approach that uses the buffer, a priori information, and the current network bandwidth availability to decide whether one should increase the quality of the video stream when more network bandwidth becomes available. Mielke and Zhang5 discuss an approach to integrating buffering and feedback control.

Substantial research has been conducted to reduce the burstiness of variable-bit-rate streams by prefetching data.6 Researchers have proposed many bandwidth smoothing algorithms specifically for this task. These algorithms smooth the bandwidth requirements for video stream retrieval from the video server and transmission across the network. Given a fixed client buffer, it's possible to minimize the peak bandwidth requirement for continuous video stream delivery while minimizing the number of bandwidth rate increases.7 Another scheme aims at minimizing the total number of bandwidth changes.8 It's also possible to minimize bandwidth variability9 and buffer utilization6 or adhere to a buffer residency constraint.6

Researchers have also investigated flow and congestion control for multimedia data transmission in best-effort networks.10 These approaches assume no direct support for resource reservation in the network and attempt to adapt the media streams to current network conditions. Best-effort transmission schemes adaptively scale (reduce or increase) the bandwidth requirements of audio and video streams to approximate a connection that's currently sustainable in the network. Best-effort schemes split into TCP-friendly and non-TCP-friendly mechanisms. TCP-friendly algorithms avoid starvation of TCP connections by reducing their bandwidth requirements when packet loss indicates congestion.11 Non-TCP-friendly schemes reduce rates only to guarantee a higher QoS to their own streams, thereby forcing TCP connections to back off when competing for insufficient network bandwidth.12

The importance of enforcing synchronization in playout scheduling has also been recognized.13 Traditionally, servers focus more on keeping resource costs low at the client side. We observe that the enforcement of novel dynamic synchronization algorithms at the client side has become feasible.13 Others have also proposed control schemes for processing user interactions14 in either centralized or distributed environments. The difficulty of interactive support for video streams is the forward motion prediction used by most advanced compression techniques.15 Several papers propose methods for implementing fast forward but don't deal adequately with fast backward playback. Normally, streams are sent out in encoding order rather than display order to simplify decoding for forward playback. This complicates decoding video frames in the backward direction and synchronizing the participating streams after an interactive request.

Recent research also addresses the need for a middleware framework for transmitting multimedia data. The middleware is located between the system level (operating system and network protocol) and the applications.16 Neufeld et al.17 describe a good conceptual model of such middleware. They propose a real-time middleware for asynchronous transfer mode (ATM) systems. It provides virtual connection setup, bandwidth reservation, and session synchronization. Li and Nahrstedt18 describe a middleware control structure that requires applications to implement observation methods for the respective system resources and proper adaptation policies within the application.

However, there’s a lack of a middleware structure that caneffectively put all component techniques together as multime-dia middleware services and provide a mapping from the high-level requests of the application to the low-level implementationof the system to support customized multimedia applications.Ideally, the application should be able to treat a multimediastream like a file and access it without worrying about synchro-nization, buffer management, and transmission over the net-work. In contrast to the traditional networked file server,multimedia computing introduces two new problems. First, thesize of multimedia objects requires large aggregate I/O band-width. The bandwidth also must be large enough to deal withthe burstiness introduced by compression techniques. The mul-timedia middleware must be able to handle bursty and volumi-nous data at high speeds. Second, the service must beguaranteed timely to ensure smooth playback at the client dis-play. Main memory must be efficiently managed for staging databetween the disks and the network. To accommodate differentvariations in bit rates for compressed streams as well as some-


NetMedia architecture
We illustrate the overall system architecture in Figure 1. The system has three main components: the client for presentation scheduling, the server for resource scheduling, and the database (file) systems for data management and storage. Each client supports a graphical user interface (GUI), synchronizes images and audio and video packets, and delivers them to an output device such as a PC or a workstation. Meanwhile, each server is superimposed on a database system and supports the multiuser aspect of media data caching and scheduling. It maintains timely media data retrieval from the media database and transfers the data to the client sites through a network. Finally, each database system manages the insertion, deletion, and update of the media data stored in the local database. Because certain media streams may be represented and stored in different formats, the underlying database systems can be heterogeneous. The aim of the proposed middleware is to give users as much flexibility and individual control as possible. The middleware must support each media type individually to allow independent and interactive functionalities as well as QoS requirements.

Server design
The server design permits individual stream access with support for sharing resources to enhance scalability. We can summarize the aims of the server functionality in the following points:

❚ resource sharing (use of common server disk reading for multiple clients),

❚ scalability (support for multiple front-end modules),

❚ individual QoS and congestion control (managing individual streams), and

❚ interactivity.

We divide the server and client design into the front end and the back end. The server back end includes the disk service module that fills in the server buffer according to the request. The server buffer contains a sub-buffer for each stream. All streams and front-end modules share the back-end module.

The server front end includes communication and packetization modules that support reading data from the server buffer and delivering it to the network according to the QoS requirements for each stream. It also handles the clients' admission control.

Figure 2 shows the server component's implementation design for processing audio and video streams. The disk service module is realized through DiskReadThread. The packetization module is realized by SendThread, which reads media units from the server buffer and sends packets to the network. The communication module is realized through ServerTimeThread, AdmissionThread, and AdmittedClientSet. ServerTimeThread reports the server's current time and estimates the network delay and the time difference between server and client. AdmittedClientSet keeps track of all admitted clients. The system uses Server_Probe_Thread to get feedback messages and control messages from the probe thread at the client site and to initiate control of probing the network.
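To make the division of labor concrete, here is a minimal sketch (not NetMedia's actual code) of how a per-stream send loop in the spirit of SendThread might pace packets out of a flow-control buffer; the packet type, the pacing logic, and the stubbed network send are our own simplifications.

```cpp
// Hypothetical sketch of a per-stream send loop in the spirit of Figure 2.
// The real SendThread reads media units from the server buffer and sends
// UDP packets; here the network send is a stub and the pacing is simplified.
#include <chrono>
#include <cstdio>
#include <deque>
#include <thread>
#include <vector>

struct Packet { int seqno; std::vector<char> payload; };

// Stand-in for the per-stream flow-control buffer filled by DiskReadThread.
using FlowControlBuffer = std::deque<Packet>;

void send_to_network(const Packet& p) {             // stub for the UDP send
    std::printf("sent packet %d (%zu bytes)\n", p.seqno, p.payload.size());
}

// Pace packets out at the stream's nominal rate (packets per second).
void send_thread(FlowControlBuffer& buf, double packets_per_second) {
    using clock = std::chrono::steady_clock;
    const auto interval = std::chrono::duration<double>(1.0 / packets_per_second);
    auto next = clock::now();
    while (!buf.empty()) {
        std::this_thread::sleep_until(next);         // wait for this packet's slot
        send_to_network(buf.front());
        buf.pop_front();
        next += std::chrono::duration_cast<clock::duration>(interval);
    }
}

int main() {
    FlowControlBuffer buf;
    for (int i = 0; i < 5; ++i) buf.push_back({i, std::vector<char>(1024)});
    send_thread(buf, 30.0);                          // e.g., 30 packets per second
}
```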

Client design
A client's main functionality is to display multiple media streams to users in the specified format. The client synchronizes the audio and video packets and delivers them to the output devices. In addition, the client sends feedback to the server, which the server uses for admission control and scheduling. Figure 3 shows the client's design. The client back end receives the stream packets from the network and fills the client caches. The client back end includes two components:

❚ Communication module. The communication module provides the interface to the network layer. It primarily consists of two subsystems to handle the user datagram (UDP) and transmission control (TCP) protocols. This lets the client use faster protocols like UDP for media streams that can tolerate some data loss and reliable protocols like TCP for feedback messages.

Figure 1. NetMedia system architecture.

Figure 2. Server design for audio and video streams.

❚ Buffer manager. The buffer manager maintains the client cache in a consistent state.

The client front end reads out the data from the client cache and ensures synchronized stream presentation. The client front end includes two components:

❚ Synchronization manager. The synchronization manager schedules the multimedia data delivery to the output devices. This module controls the delivery of media streams based on QoS parameters defined by the users and the presentation's state.

❚ Presentation manager. The presentation manager interacts with the GUI and translates user interactions such as START and STOP into meaningful commands that the other subsystems can understand.

Note that we can retrieve a stream consisting of transparencies (or images) using conventional transmission methods such as remote procedure calls (RPC) and synchronize it with the audio and video streams at the interface level. The advantage of RPC is that standard function calls can implement well-known file functions (such as opening and reading from a certain position) that the server machine then transparently executes. An RPC request may contain commands to find a stream's file length, provide directory content, or start the slide session's delivery. The RPC server retrieves all requested data and sends it back to the client side. Since the scheduling of such data isn't as time critical as that of audio or video data, its display can tolerate some delay. In our implementation, we synchronize the slides first according to the audio information. If no audio is available, we schedule the slides with the video; otherwise the stream behaves like a regular PostScript document.

To make access to media streams transparent to the application's higher levels, we provide the MultiMedia Real-Time Interactive Stream (MRIS) interface for requesting media stream delivery (audio, video, or any other media) in a synchronized form from the server. MRIS separates the application from the software drivers needed to access multimedia data from remote sites and to control specific audio and video output devices. The MRIS interface has the advantage that the transmission support is independent of the application: any application that uses synchronized multimedia data can rely on this module. We use the MRIS class to implement our virtual classroom application. Users can synchronize the slides with the video and audio data at the MRIS interface level.

Figure 3. Multiple streams synchronized at the client.

Figure 4 illustrates NetMedia’s class hierarchy.The application only contains the transparency-reader Ghostview (GV), control Loop (gvMain-Loop), and player units (MpegPlayThread andAudioPlayThread). MRIS handles everything else.

Time control
A crucial design element of a multimedia system is control over the timing requirements of the participating streams so that we can present the streams in a satisfactory fashion. In NetMedia, a robust timing control module (named SleepClock) handles all timing requirements in the system. The novelty of SleepClock is that it can flexibly control and adjust the invocation of periodic events. We use SleepClock to enforce

❚ end-to-end synchronization between server and client,

❚ smoothed transmission of streams,

❚ periodic checks in modules, and

❚ timing requirements of media streams.

Figure 5 shows how SleepClock works. Initially, we set the start time of a periodic event sequence and the time interval between events. We can set SleepClock to sleep until the appropriate time of a given event number. Sleeping renders the current process inactive until the event's time is reached. For example, suppose the system started at time s = 12:00:00 p.m. and the interval is set to 1 minute. If the current process needs to sleep until event number 5 and the current time is 12:02:30, then a sleep routine will be called with its value set to 2 minutes and 30 seconds. This lets the process sleep until the correct time. If the current time has already passed starttime + event_number × interval, then SleepClock takes no action and the process can try to catch up with its timing requirements.

The advantage of SleepClock is that it provides two ways to adjust the progress of events. First, the invocation time of each event can be modified. SleepClock does this by assuming a new start time for the event sequence, which results in an acceleration or delay of the remaining sequence of events; it then checks the current time against new_starttime + interval × current_event. For example, suppose the system has streamed out 100 frames when it receives a speedup command. By moving the start time into the past (relative to the old start time), event 101 (and all following events) will occur sooner than in the original timing schedule. In Figure 5, Adjust Progress shows an example of the modified start time. The second form of adjustment is achieved by modifying the interval between events. Figure 5 gives an example of changing intervals.
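A minimal sketch of the SleepClock idea follows, assuming a steady clock and millisecond intervals; the member names and the fractional event number are our own choices, not necessarily those of the real class.

```cpp
// Minimal sketch of the SleepClock idea: periodic events at start + n*interval,
// with two adjustment knobs (shift the start time, change the interval).
// Names and details are illustrative, not NetMedia's actual implementation.
#include <chrono>
#include <thread>

class SleepClock {
public:
    using clock = std::chrono::steady_clock;

    SleepClock(clock::time_point start, std::chrono::milliseconds interval)
        : start_(start), interval_(interval) {}

    // Sleep until the nominal time of event `n`; if that time has already
    // passed, return immediately so the caller can try to catch up.
    void sleepUntilEvent(double n) {
        auto due = start_ + std::chrono::duration_cast<clock::duration>(n * interval_);
        if (due > clock::now()) std::this_thread::sleep_until(due);
    }

    // Adjust progress: moving the start time into the past accelerates all
    // remaining events; moving it into the future delays them.
    void shiftStart(std::chrono::milliseconds delta) { start_ += delta; }

    // Second form of adjustment: change the spacing of future events.
    void setInterval(std::chrono::milliseconds interval) { interval_ = interval; }

private:
    clock::time_point start_;
    std::chrono::duration<double, std::milli> interval_;
};

int main() {
    SleepClock clk(SleepClock::clock::now(), std::chrono::milliseconds(33));
    for (int event = 0; event < 5; ++event) {
        clk.sleepUntilEvent(event);   // roughly 30 events per second
        // ... send or display the media unit for this event ...
    }
}
```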

Figure 4. NetMedia class hierarchy using the MRIS.

Buffering strategies
Buffer management is a crucial part of the NetMedia system; it has two main components: the server and client buffers.

Server flow-control buffer
We designed a straightforward server buffer for each stream to perform fast delivery of the stream from the disk to the network. Figure 6 shows the server buffer structure for a stream. A server stream buffer is a circular list of items. Each item may contain one packet. Each server stream buffer has the following characteristics:

❚ One producer and one consumer. The producer writes packets to the buffer and the consumer reads from the buffer.

❚ First-in, first-out. Packets written first will be read out first.

Figure 5. SleepClock design.

Specifically, each server stream buffer has two pointers: a read pointer (readp) and a write pointer (writep). Items from readp to writep are readable; spaces from writep to readp are writeable.

Because a server buffer is designed for data flow control from the disk to the network, the buffer's size can be relatively small. In NetMedia's implementation, the video buffer's size is 100 video packets, where each video packet is 1,024 bytes, and the audio buffer's size is 20 audio packets, where each audio packet is 631 bytes.
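A sketch of such a single-producer, single-consumer circular buffer is shown below; the sizes follow the text (for example, 100 video packets of 1,024 bytes), while the missing thread synchronization and the template layout are our own simplifications.

```cpp
// Sketch of the per-stream server flow-control buffer: a fixed-size circular
// list with one producer (DiskReadThread) and one consumer (SendThread).
// Illustrative only; the real buffer adds locking/eventing between threads.
#include <array>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Packet { std::vector<char> bytes; };

template <std::size_t N>
class FlowControlBuffer {
public:
    // Producer side: write one packet; fails if the buffer is full.
    bool write(Packet p) {
        if (full()) return false;
        items_[writep_] = std::move(p);
        writep_ = (writep_ + 1) % N;
        return true;
    }
    // Consumer side: read the oldest packet (first in, first out).
    bool read(Packet& out) {
        if (empty()) return false;
        out = std::move(items_[readp_]);
        readp_ = (readp_ + 1) % N;
        return true;
    }
    bool empty() const { return readp_ == writep_; }
    bool full()  const { return (writep_ + 1) % N == readp_; }

private:
    std::array<Packet, N> items_;   // items from readp_ to writep_ are readable
    std::size_t readp_ = 0;         // read pointer
    std::size_t writep_ = 0;        // write pointer
};

int main() {
    FlowControlBuffer<100> videoBuffer;                   // 100 video packets
    videoBuffer.write(Packet{std::vector<char>(1024)});   // 1,024-byte packet
    Packet p;
    if (videoBuffer.read(p)) std::printf("read %zu bytes\n", p.bytes.size());
}
```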

Client packet buffer
Client buffer management is a central part of the NetMedia framework. In contrast to the server buffer, which must support fast delivery of the stream from the disk to the network, the purpose of a client buffer is to absorb network delays and out-of-order packet arrivals and to efficiently support user interactions.

The client buffer management’s novelty is itsability to support interactivity requests, like back-ward playback or seek. The difficulty experiencedwith interactive support in video streams is thevideo encoding used by most advanced compres-sion techniques.4 Several works propose methodsfor implementing forward playback but don’t ade-quately deal with backward playback. To solve thedecoding problem, we propose an approach5 touse a separate file for each video stream, pre-processed for backward playback. The advantageis that it doesn’t need special client buffer man-agement. The disadvantage is that the systemmust create a separate file for all video streams,thus increasing the storage used in the system andits cost.

Our approach is to let the client, rather than the server, support interactive requests. As the client processes video frames, rather than discarding the displayed frames, it saves the most recent frames to a temporary buffer. We do this to reduce the network's bandwidth requirements in exchange for memory at the client site, because we expect network load to increase over the next few years while memory prices fall.

A circular buffer serves as the client buffer design for interactivity functions. The model proposed here lets all interactive functions occur at any time with an immediate response time. If the buffer is large enough, an interactive request for backward playback can be fed directly out of the buffer instead of requesting a retransmission from the server.

The interactive buffer for a stream has two levels: the packet and index buffers. The first level is the packet buffer, which has a set of fixed-size packets. Each packet has one pointer that can point to another packet, so we can easily link any set of packets by pointer manipulation (for example, all packets for a particular frame). The buffer maintains a list of free packet slots.

The second level is the index buffer, a circular list in which each entry records a packet's playback order. Each packet in an audio stream has a unique number, PacketNo, which indicates the packet's order in the whole stream. For the video buffer, each index in the index buffer points to one frame (a list of ordered packets within the frame). We define the index with the minimal packet/frame number as MinIndex and the index where we read data for playback as CurrentIndex. Both indices traverse the circumference in a clockwise manner, as Figure 7 shows. (Note that in this figure, since the buffer is for a video stream, we named the two pointers MinFrameIndex and CurrentFrameIndex.)

The distance between MinIndex and CurrentIndex is defined as the history area. A packet isn't removed immediately after being displayed; it's removed only when some new packet needs space. So the valid indices from MinIndex (inclusive) to CurrentIndex (not inclusive) point to packets or frames that have been displayed and haven't been released yet. (Later, the system can use this data to support backward playback.)

Figure 6. Server flow-control buffer design.

To keep track of the forward playback capabilities, we introduce the packet/frame with the maximum number as MaxIndex. Note that MaxIndex is always between CurrentIndex and MinIndex. We define the distance between CurrentIndex and MaxIndex as the future area. Using the history and future areas, users have full interaction capabilities in the two different areas.

When a new packet arrives at the client site, the system checks its packet or frame number (denoted as seqno) and decides whether the packet can be stored. We define the valid range of packet or frame numbers as follows:

MinIndex ≤ seqno ≤ CurrentIndex + BuffSize − 1

If the packet has a packet or frame number outside this range, then the system considers it a packet loss and discards it. Otherwise, the packet is stored in a packet slot either taken from the free packet slot list or obtained by releasing some history data. Note that once the system releases some history data, it also updates MinIndex.
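The admission test can be sketched as follows, assuming simplified index bookkeeping; BuffSize and the history-release policy here only approximate the two-level buffer described above.

```cpp
// Sketch of the client-side admission test for an arriving packet/frame,
// following the valid range MinIndex <= seqno <= CurrentIndex + BuffSize - 1.
// Simplified: the real buffer also links packets into frames and reuses slots.
#include <cstdio>

struct ClientIndexBuffer {
    long minIndex = 0;      // oldest packet/frame still kept (start of history area)
    long currentIndex = 0;  // next packet/frame to be played back
    long buffSize = 300;    // number of index entries in the circular buffer

    // Returns true if an arriving unit with this sequence number may be stored.
    bool inValidRange(long seqno) const {
        return seqno >= minIndex && seqno <= currentIndex + buffSize - 1;
    }

    void store(long seqno) {
        if (!inValidRange(seqno)) {               // outside the range: treat as loss
            std::printf("discard %ld (out of range)\n", seqno);
            return;
        }
        // If no free slot is available, release the oldest history data first,
        // which advances minIndex before the new unit is stored.
        if (seqno - minIndex >= buffSize) minIndex = seqno - buffSize + 1;
        std::printf("store %ld (history %ld..%ld)\n", seqno, minIndex, currentIndex);
    }
};

int main() {
    ClientIndexBuffer buf;
    buf.store(10);    // accepted
    buf.store(400);   // rejected: beyond CurrentIndex + BuffSize - 1
}
```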

To increase the decoding speed of compressed, motion-predicted frames, we introduce a decode buffer. The MPEG compression scheme4 introduces interframe dependencies among the I, P, and B frame types. To decode a certain frame, previous or future frames might be necessary. For example, given the following frame sequence:

frame type: I B B P B B P B B I
frame #:    1 2 3 4 5 6 7 8 9 10

frames 1, 4, and 7 must be decoded before frame 9 can be decoded. To speed up the decoding process for backward playback, we cache the decoded frames within an I-to-I frame sequence. In case of dependencies, we might be able to feed the necessary frames out of the decode buffer instead of calling the decoding routine multiple times.
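For the ten-frame example above, the dependency chain and the decode cache can be sketched like this; it is a simplification that ignores, for instance, B frames that also reference the following I or P frame.

```cpp
// Sketch of resolving decode dependencies for backward playback in the
// ten-frame example above (I B B P B B P B B I). Decoded reference frames in
// the current I-to-I range are cached so the chain is decoded only once.
#include <cstdio>
#include <map>
#include <vector>

enum class FrameType { I, P, B };

// Frames that must be decoded (in order) before `target` can be decoded:
// the preceding I frame plus every P frame between it and the target.
std::vector<int> dependencies(const std::vector<FrameType>& types, int target) {
    std::vector<int> chain;
    for (int i = target - 1; i >= 0; --i) {
        if (types[i] == FrameType::P) chain.insert(chain.begin(), i);
        if (types[i] == FrameType::I) { chain.insert(chain.begin(), i); break; }
    }
    return chain;
}

int main() {
    // frame #:            1  2  3  4  5  6  7  8  9  10   (0-based below)
    std::vector<FrameType> t = {FrameType::I, FrameType::B, FrameType::B,
                                FrameType::P, FrameType::B, FrameType::B,
                                FrameType::P, FrameType::B, FrameType::B,
                                FrameType::I};
    std::map<int, bool> decodeCache;               // stand-in for the decode buffer
    for (int dep : dependencies(t, 8)) {           // frame 9 in the text's numbering
        if (!decodeCache[dep]) decodeCache[dep] = true;   // decode once, then reuse
        std::printf("need frame %d\n", dep + 1);   // prints frames 1, 4, and 7
    }
}
```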

To implement the buffer design and let interactive requests be fed from the local buffer, we need to set a reasonable buffer size for the index and packet buffers. We introduce two consideration factors: Max_Interactive_Support and Reasonable_Delay. Max_Interactive_Support defines the longest playback time we expect to be buffered in the client for interactive support, thus defining the index buffer's size. Reasonable_Delay defines the delay within which most packets will pass over the network, thus defining the packet buffer's size. In NetMedia's implementation, we set Reasonable_Delay to 2 seconds. Max_Interactive_Support's size depends on the amount of forward and backward play support. (This can vary from 10 seconds to 5 minutes or more, depending on the memory at the client.)

At the server site, video frames are normally represented in encoding order to make it easier for the decoder to decode during forward playback. This complicates the decoding of frames for backward playback. We transmit our streams in display order to achieve the minimal distance among the dependent frames for both forward and backward playback decoding. The advantage is that our scheme at the client site has a uniform design and the same performance for both forward and backward playback. By using the decoding buffer and our network-control algorithms, we can control the amount of data in the future or backward areas of our buffer.

Figure 7. Packet and index buffers for a video stream at the client.

This lets users specify the amount of interactivity support that they want for backward or forward playback without an immediate request to the server.

With our buffer design, we can use the history and future areas to feed interactive requests directly from the local buffer. If the request doesn't exceed the cache, the server won't be notified. This greatly reduces the scalability problems faced by the server in case of multiple interactivity requests. The server must handle access outside the history or future area by redirecting streaming, but the probability of this case can be drastically reduced by the amount of buffer space given to the client.

End-to-end flow control
Various delays and congestion in the network may cause the client buffer to underflow or overflow, which may result in discontinuities in stream playback. We propose the DSD and PLUS schemes to dynamically monitor the network situation and adapt our system to the current state of the network. The DSD scheme provides full flow control on stream transmission, and the PLUS scheme uses network status probing and an effective adjustment mechanism for data loss to prevent network congestion. Both schemes work together to find the optimum adjustment considering network delay and data loss rate, and to ensure no underflow or overflow in the client buffer.

DSD scheme
The basic dynamic end-to-end control scheme used in NetMedia is the DSD algorithm presented in Song et al.3 For DSD, we define the distance between sending and display at time t as the difference between the nominal display time of the media unit that is being sent and that of the media unit that is being displayed. Thus, at a given time, let the media unit being displayed have nominal display time T and the media unit being sent have nominal display time t. Then DSD = t − T. Figure 8 illustrates the meaning of DSD in a stream.

The connection between DSD and the network delay is as follows. Compared with the network delay, if DSD is too large, then the overflow data loss at the client buffer could be high; if DSD is too small, then the underflow data loss at the client buffer could also be high. Thus, DSD should be large enough to cover the network delay experienced by most media units; it should also be small enough to ensure the client buffer won't overflow.

By adjusting the current DSD in the system in response to the current network delays, we can monitor the best DSD dynamically. We can then adjust the sending rate to avoid underflow or overflow in the client buffers and alleviate data loss.

Initially, we select a DSD based on the up-to-date network load and fix the sending and display rates at the nominal display rate. In the following, we show the current DSD calculation, the best DSD calculation, and the DSD adjustment.

Current DSD calculation. At time t, let t_sent be the nominal display time of the media unit being sent at the server site and t_displayed be the nominal display time of the media unit being displayed at the client site. By definition, the DSD is

DSD_t = t_sent − t_displayed   (1)

Suppose that at time t, the media unit being displayed has unit number u^t_displayed, the media unit being sent has unit number u^t_sent, and the nominal duration of one unit is UnitDuration. We can then calculate the DSD as follows:

DSD_t = (u^t_sent − u^t_displayed) × UnitDuration   (2)

The client always knows u^t_displayed directly, but it can only estimate u^t_sent from the information carried by currently arriving media units. Suppose the server sends the currently arriving media unit u at time SendingTime_u and its unit number is n_u. Assume that the server is sending at the nominal display rate. At time t, we have

u^t_sent = n_u + (t − SendingTime_u) / UnitDuration   (3)

We then obtain

DSD_t = (n_u + (t − SendingTime_u) / UnitDuration − u^t_displayed) × UnitDuration   (4)

Best DSD calculation. We define an allowable range for DSD:

MINDSD ≤ DSD ≤ MAXDSD,

where MINDSD = 0 and MAXDSD = Max_Delay.
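Equations 3 and 4 translate directly into the client-side estimate; the following sketch (with times in seconds and illustrative numbers) shows the calculation.

```cpp
// Sketch of the client-side DSD estimate from Equations 3 and 4. Times are in
// seconds; the inputs come from the currently arriving media unit (n_u and
// SendingTime_u) and from local playback state (the unit being displayed).
#include <cstdio>

struct DsdEstimator {
    double unitDuration;   // nominal duration of one media unit (seconds)

    // u_sent estimated at time t from the arriving unit (Equation 3).
    double estimateUnitSent(double nU, double sendingTimeU, double t) const {
        return nU + (t - sendingTimeU) / unitDuration;
    }

    // Current DSD at time t (Equation 4), given the unit number being displayed.
    double currentDsd(double nU, double sendingTimeU, double t,
                      double uDisplayed) const {
        return (estimateUnitSent(nU, sendingTimeU, t) - uDisplayed) * unitDuration;
    }
};

int main() {
    DsdEstimator est{1.0 / 30.0};                 // 30 units per second
    // Unit 900 was sent at t = 29.5 s; at t = 30.0 s the client displays unit 885.
    double dsd = est.currentDsd(900, 29.5, 30.0, 885);
    std::printf("current DSD = %.3f s\n", dsd);   // (900 + 15 - 885) * (1/30) = 1.0 s
}
```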

Figure 8. Definition of DSD in a stream.

We then evenly divide the interval [MINDSD, MAXDSD] into k − 1 subintervals with interval points d1 = MINDSD, d2, d3, …, dk = MAXDSD. Here k can be 10, 20, or even 100. The tradeoff is that if k is too small, the best DSD found might not be good enough; if k is too big, the overhead of calculating the best DSD is too large. For each di, we keep track of the data loss with respect to it.

Here's the definition of relative data loss with respect to di. Let d be the current DSD. Suppose we have a virtual client, which is the same as the actual client, with the same buffer size and the same display time for any media unit. The difference is the assumed DSD: a real media unit sent at time t should be displayed by the real client at time t + d, whereas the virtual client assumes this unit is sent at time t and displayed at time t + di. So the virtual DSD for this unit is di. Under this assumption, we define the virtual client's loss rate as the relative loss rate with respect to di. When a media unit arrives, the virtual clients first check whether the unit is late. Suppose the current time is currentT, the virtual client's DSD is d, and the sending time of this media unit is sendT. If sendT + d < currentT, then the unit is late. If the unit is late, it's counted as a relative loss for this virtual client.

The best DSD finder keeps track of these virtual clients. It launches the best DSD calculation once per fixed time period (200 milliseconds in NetMedia's implementation). When it needs a new (best) DSD, the best DSD finder browses the virtual clients and reports the DSD of the virtual client with the minimum loss rate as the best DSD. Note that DSD only captures relative packet loss, which means that the packet arrived at the client site but was too late for display. The PLUS protocol captures any data loss due to network congestion.
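The virtual-client bookkeeping can be sketched as follows; the candidate spacing, the loss counting, and the omission of overflow losses are simplifications of the scheme described above.

```cpp
// Sketch of the best-DSD finder: one "virtual client" per candidate value d_i
// counts a unit as a relative loss when sendT + d_i < arrival time. Every
// evaluation period the candidate with the lowest loss rate is reported as
// the best DSD. Simplified from the text; not NetMedia's actual code.
#include <cstdio>
#include <vector>

class BestDsdFinder {
public:
    BestDsdFinder(double maxDelay, int k) {
        for (int i = 0; i < k; ++i)                     // d_1 = 0 ... d_k = maxDelay
            candidates_.push_back({maxDelay * i / (k - 1), 0, 0});
    }

    // Called for every arriving media unit.
    void onUnitArrival(double sendT, double currentT) {
        for (auto& c : candidates_) {
            ++c.total;
            if (sendT + c.dsd < currentT) ++c.late;     // late for this virtual client
        }
    }

    // Called once per evaluation period (200 ms in NetMedia's implementation).
    double bestDsd() const {
        double best = candidates_.front().dsd, bestRate = 1.0;
        for (const auto& c : candidates_) {
            double rate = c.total ? static_cast<double>(c.late) / c.total : 0.0;
            if (rate < bestRate) { bestRate = rate; best = c.dsd; }
        }
        return best;
    }

private:
    struct VirtualClient { double dsd; long late; long total; };
    std::vector<VirtualClient> candidates_;
};

int main() {
    BestDsdFinder finder(2.0 /* Max_Delay in seconds */, 11);
    finder.onUnitArrival(10.00, 10.45);   // unit delayed by 0.45 s
    finder.onUnitArrival(10.10, 10.30);   // unit delayed by 0.20 s
    std::printf("best DSD = %.1f s\n", finder.bestDsd());
}
```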

Adjustment to network delay. The system chooses a DSD as an application begins, based on the current network load. If the best DSD differs from the current DSD, we can adjust it in two ways: one is to change the sending rate and the other is to change the display rate. We can adjust the display rate through the timing routine at the client site. We only adjust video streams.

At any calculation point, if the client calls for an adjustment, it sends a feedback message to the server side to adjust the sending rate so that the system DSD changes to the best DSD. The feedback messages carry ∆d, the difference between the best DSD and the current DSD. A negative ∆d means the sending rate should slow down, and a positive ∆d means the sending rate should speed up. The feedback data also contains information about the loss rate to initiate frame dropping at the server site if necessary.

To slow down, the server stops sending for a time |∆d|. To speed up, the server always assumes that the current display rate is the nominal display rate r. Let the maximum available sending rate over the network be R. The server speeds up using the maximum data rate for a time period; after this reaction time finishes, the server again uses the nominal display rate r as the sending rate.

Figure 9. Time control in implementing the DSD scheme.

Figure 9 shows SleepClock's use in the DSD scheme. The scheme adjusts the start time to change the stream's sending times for the following events, which either increases or decreases the sending bandwidth temporarily. This can adjust the stream to varying end-to-end delays in the network.

PLUS scheme
The DSD scheme doesn't address the congestion problem in the network. It also doesn't consider the effect of data compression, which may introduce burstiness into the data streams. The burstiness results in different bandwidth requirements during the stream's transmission. This makes it difficult to come up with a resource and adaptation scheme because the bandwidth requirements always change. To address these issues, we developed PLUS as a new flow and congestion control scheme for distributed multimedia presentation systems. This scheme uses network status probing and an effective adjustment mechanism for data loss to prevent network congestion. We also designed the proposed scheme to scale with increasing numbers of PLUS-based streams and to live in harmony with TCP-based traffic. With the PLUS protocol, we address how to avoid congestion, rather than react to it, with the probing scheme's help.

The PLUS protocol eases bandwidth fluctuations by grouping together some media units (frames for video) in a window interval ∆W (Figure 10). We define easing bandwidth in a given window as smoothing.6 One way of smoothing is to send out the data at the average bandwidth requirement for the window (Equation 5):

a = number_of_packets_in_current_interval / ∆W   (5)

By using this method, the client can guarantee that the bandwidth needed is minimal and constant through the interval. The disadvantage of this method is that the feedback received only applies to the current or past bandwidth requirements. It doesn't consider that there might be a higher bandwidth request in the future, which the network may not be able to process.

For each interval ∆W, we identify a critical interval. The critical interval is an interval in the future playback time that contains the maximum number of packets. The aim of the PLUS probing scheme is to test whether the network can support the critical interval. We set ∆W to be a 5-second smoothing window in NetMedia's implementation. This is a reasonable time over which to reduce the bandwidth through smoothing. It's also granular enough to detect sequences with fast movement (which result in higher numbers of packets per frame and therefore determine the bottleneck bandwidth of a stream).

Once we determine the critical interval for each sequence, we apply our smoothing and probing scheme.

Figure 10. PLUS probing scheme.

The critical bandwidth in the future, at a given interval, is provided by its critical interval. To find the minimal bandwidth requirement for the critical interval, we apply the averaging algorithm, which spreads the constant-sized packets evenly. This leads to a sending difference between consecutive packets, which we define by

∆r = ∆W / pkts_in_critical_interval   (6)

According to Keshav,7 the packet-pair approach at the receiver site can estimate a connection's bottleneck bandwidth. The essential idea is that the interpacket spacing of two packets will be proportional to the time required for the bottleneck router to process the second packet. We calculate the bottleneck bandwidth as

b_r = packet_size / ∆   (7)

where ∆ is the interpacket spacing measured at the receiver.
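Equations 5 through 7 correspond to the following small computations; the window contents, packet size, and interpacket gap used here are illustrative values only.

```cpp
// Sketch of the three PLUS quantities from Equations 5-7: the smoothed
// (average) send rate over a window, the packet spacing needed to cover the
// critical interval, and the packet-pair bottleneck-bandwidth estimate.
#include <algorithm>
#include <cstdio>
#include <vector>

// Equation 5: average rate (packets/s) over the smoothing window deltaW.
double averageRate(int packetsInWindow, double deltaW) {
    return packetsInWindow / deltaW;
}

// Equation 6: spacing between consecutive packets that spreads the critical
// interval's packets evenly over a window of length deltaW.
double probeSpacing(int packetsInCriticalInterval, double deltaW) {
    return deltaW / packetsInCriticalInterval;
}

// Equation 7: bottleneck bandwidth from the receiver-side packet-pair gap.
double bottleneckBandwidth(double packetSizeBytes, double interPacketGapSec) {
    return packetSizeBytes / interPacketGapSec;   // bytes per second
}

int main() {
    const double deltaW = 5.0;                                   // 5-second window
    std::vector<int> packetsPerInterval = {120, 90, 260, 150};   // future intervals
    int critical = *std::max_element(packetsPerInterval.begin(),
                                     packetsPerInterval.end());

    std::printf("smoothed rate: %.1f packets/s\n", averageRate(120, deltaW));
    std::printf("probe spacing for critical interval: %.4f s\n",
                probeSpacing(critical, deltaW));
    std::printf("bottleneck estimate: %.0f bytes/s\n",
                bottleneckBandwidth(1024, 0.0008));
}
```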

In MPEG-encoded streams, the loss of I or P frames results in the loss of further dependent frames. Only the loss of B frames doesn't result in further loss; a B-frame loss results only in a short degradation of playback quality. We use B frames to probe the network to see whether it can support the critical interval, and thereby protect critical frames such as I or P frames from loss (as the lower part of Figure 10 shows).

Instead of spreading each packet evenly within our 5-second smoothing window, we use the bottleneck bandwidth for sending out all B frames. Critical packets (belonging to I or P frames) are still sent out at the average bandwidth calculated in Equation 5. Our scheme thereby punctually probes the network, while acknowledgments give direct feedback ahead of time on whether the network can support the critical bandwidth. In case of congestion, we initiate a multiplicative backoff of the sending rate. B frames have less chance of survival in our scheme in case of congestion. This isn't a disadvantage, because we can proactively provide a bandwidth reduction at the server site by dropping noncritical B frames in time to increase the survival rate of critical packets (and thereby the number of subsequent B frames that survive).

Concurrent probing by different PLUS streams reports a conservative estimate of the current network situation. We base the estimate on the bandwidth needs of the streams' critical intervals and not necessarily on the current average interval. This behavior lets a PLUS stream be aware of its surroundings and responsive enough to protect critical packets when reaching the connection's maximum capacity. If the concurrent probing causes packet loss, PLUS streams back off, letting them live in harmony with TCP and other PLUS streams. Sequential probing by different PLUS streams could report an estimate that leads to data loss if multiple PLUS streams send their critical intervals at exactly the same time, each assuming the network can support it. The probability of such a situation is low, and we can reduce it further with the number of B frames in a stream.

We can also use SleepClock to sleep for the next m events, where m can be a fraction of an event. We use this to implement smoothed stream transmission in the PLUS protocol. A video stream frame consists of a sequence of one or more packets, and each frame has to be sent out according to its nominal display rate. However, I frames normally contain a larger number of packets than the P or B frames that must be sent per interval. To reduce the bandwidth required by I frames, we can smooth their packets over a number n of frames (which equals n events). To send the packets uniformly, we can assign a fraction of an event's interval time to each packet.

Integration of the DSD and PLUS schemes
In addition to adjusting to the best DSD, the DSD scheme also responds to the adjustments needed by the PLUS scheme. It's straightforward to respond to slow-down requests. In case of a speedup, the time period t of the response at the server side must be carefully calculated. Assume that the nominal display rate is r, which is smoothed based on the PLUS scheme, and that the current maximum transmission rate available is R (R > r). We then must maintain

R × t − r × t ≤ Bavl

where Bavl is the extra buffer space available beyond the data buffered for the nominal playback. Thus,

t ≤ Bavl / (R − r)

During the congestion period, instead of increasing the DSD with a large amount of data to transmit, the system drops noncritical B frames to maintain a reasonable DSD.
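As a worked example of this bound, the following sketch computes how long the server may send at the maximum rate R before the spare client buffer space Bavl is exhausted; the numbers are illustrative.

```cpp
// Worked example of the speed-up bound t <= Bavl / (R - r): how long the
// server may send at the maximum rate R before the extra client buffer space
// Bavl would be exceeded. The numbers below are illustrative.
#include <cstdio>

int main() {
    double r    = 1.5e6;   // nominal (smoothed) display rate, bytes/s
    double R    = 2.0e6;   // maximum available sending rate, bytes/s
    double Bavl = 1.0e6;   // spare client buffer beyond nominal playback, bytes
    double t    = Bavl / (R - r);
    std::printf("maximum speed-up period: %.1f s\n", t);   // 1.0e6 / 0.5e6 = 2.0 s
}
```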

Playout synchronization
Consider the issue of enforcing smooth multimedia stream presentation at client sites. This is critical for building an adaptive presentation system that can handle high stream presentation rate variance and network delay variance. In Johnson and Zhang,8 we observed that dynamic playout scheduling can greatly enhance the smoothness of media-stream presentations. We thus formulated a framework for various presentation scheduling algorithms to support QoS requirements specified in the presentations.8

We defined various QoS parameters to speci-fy requirements on multiple-stream presenta-tions.9 At the client, users submit their QoSparameters for processing their multimedia data.For example, Little and Ghafoor9 have proposedQoS parameters including average delay, speedratio, utilization, jitter, and skew to measure theQoS for multimedia data presentation. In ourplayout scheduling framework, we consider indi-vidual stream (intrastream constraints) QoSparameters:

❚ Maximum rate change denotes the range withinwhich the rendition rate of stream s can vary.This is a fraction of the nominal rate for s.

❚ Maximum instantaneous drift denotes the maximum time difference at any given time between the presentation progress of the stream and the presentation progress at the nominal presentation rate.

We also consider a QoS parameter for multiple streams (interstream constraint):

❚ Synchronization jitter denotes the maximum allowable drift between the constituent presentation streams. This drift is the difference between the presentation progress of the fastest and slowest streams. (A data-structure sketch of these three parameters follows this list.)
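The following sketch gathers the three parameters into a small structure and states the checks they imply. The field and function names are assumptions for illustration; the checks correspond to the constraints formalized later as Equations 8 through 10:

#include <cmath>

// Sketch (assumed names) of the per-stream and interstream QoS parameters.
struct StreamQoS {
    double nominal_rate;       // R^n_si, granules per second
    double max_rate_change;    // c_si, fraction of the nominal rate
    double max_instant_drift;  // d_si, seconds
};

// Intrastream: the rendition rate must stay within the allowed fraction of nominal.
bool rate_ok(const StreamQoS& q, double rendition_rate) {
    return rendition_rate >= q.nominal_rate * (1.0 - q.max_rate_change) &&
           rendition_rate <= q.nominal_rate * (1.0 + q.max_rate_change);
}

// Intrastream: progress (in seconds) may drift from elapsed time by at most d_si.
bool drift_ok(const StreamQoS& q, double progress_sec, double elapsed_sec) {
    return std::fabs(progress_sec - elapsed_sec) <= q.max_instant_drift;
}

// Interstream: progress of any two streams may differ by at most the jitter j.
bool jitter_ok(double progress_i_sec, double progress_k_sec, double sync_jitter) {
    return std::fabs(progress_i_sec - progress_k_sec) <= sync_jitter;
}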

We designed a scheduler within the NetMedia client to support smooth multimedia stream presentations at the client site in the distributed environment. Our primary goal in the design of playout scheduling algorithms is to create smooth and relatively hiccup-free presentations in delay-prone environments.

Our algorithms can dynamically adjust for any rate variations that aren't maintained in the data transmission. We define the rendition rate as the instantaneous rate of a media stream's presentation. Each stream must periodically report its own progress to the scheduler (which implements the synchronization algorithm). The scheduler in turn reports to each stream the rendition rate required to maintain the desired presentation.

The individual streams must try to follow this rendition rate. The client has several threads running concurrently to achieve the presentation.
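A minimal sketch of this reporting loop follows. The interface is an assumption for illustration (the article doesn't show NetMedia's actual scheduler interface); it only captures the idea that streams report progress and receive a rendition rate in return:

#include <map>
#include <string>

// Sketch (assumed interface) of the progress-report / rendition-rate loop.
class PlayoutScheduler {
public:
    // Called periodically by each stream's thread with its progress in seconds.
    void report_progress(const std::string& stream_id, double progress_sec) {
        progress_[stream_id] = progress_sec;
    }

    // Rate the stream should render at until its next report. This stub simply
    // returns the nominal rate; the real algorithms solve for rates that satisfy
    // the intrastream and interstream constraints.
    double rendition_rate(const std::string& /*stream_id*/, double nominal_rate) const {
        return nominal_rate;
    }

private:
    std::map<std::string, double> progress_;  // latest reported progress per stream
};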

Let the nominal rendition rate for a stream s_i be denoted by R^n_si. We express this rate as the number of stream granules per unit time. For example, the nominal rendition rate for video could be 30 frames per second, with one frame comprising a granule. To meet this scheduling requirement, the time period between consecutive frames must be 1/30 seconds. Thus, we can achieve intrastream synchronization by displaying the frames at this rate. However, because of various factors affecting system load, it may not always be possible to achieve this rate. We must therefore specify a reasonable range within which display operations will fall. We integrate two intrastream QoS parameters (maximum rate change c_si and maximum instantaneous drift d_si) and one interstream QoS parameter, synchronization jitter j, into the stream presentation.

We maintain intrastream constraints by enforcing the constraints imposed by c_si and d_si. The maximum rate change c_si represents the range within which the rendition rate of stream s_i can vary. It's a fraction of R^n_si:

R^n_si * (1 − c_si) ≤ R_si(t) ≤ R^n_si * (1 + c_si)    (8)

where R_si(t) is the rendition rate of s_i at a time instant t. We assume that when we display a stream granule, the rendition rate R_si(t) of s_i remains unchanged.

We now define the progress p_si(t) of a stream s_i at time t as the presentation time of granules displayed to date, assuming the nominal presentation rate. In general, we measure progress in terms of the granules displayed to date. Let the presentation of stream s_i start at time t_0. We say that stream s_i is progressing at the nominal rate if, at time t, the granules displayed to date are R^n_si * (t − t_0). In our discussion, we specify progress in terms of time rather than in granules. Stream s_i is progressing at the nominal rate if, at time t, the progress measured in time is also t − t_0. We arrive at this by dividing the granules displayed to date (R^n_si * (t − t_0) for the nominal rate) by R^n_si. We can express the progress in terms of time by dividing the granules displayed by R^n_si. If the progress is either less or greater than t − t_0, then the presentation is either below or above the nominal rate. The presentation is at the nominal rate if the progress equals t − t_0.
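As an illustrative example with assumed numbers: a video stream with R^n_si = 30 granules per second starts at t_0 = 0. If it has displayed 270 granules by t = 10 seconds, its progress is 270 / 30 = 9 seconds, one second behind the nominal progress of t − t_0 = 10 seconds, so the presentation is running below the nominal rate.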

The maximum instantaneous drift d_si at any given time denotes the maximum time difference between the progress of the stream and the progress assumed under the nominal presentation rate.


If the stream started at time t_0, then, given the nominal presentation rate, the progress at time t is t − t_0. Thus, we should have

|p_si(t) − (t − t_0)| ≤ d_si    (9)

We assume that users can specify an interstream jitter j to enforce synchronization requirements. The interstream synchronization requirements can be met if, for each synchronization point at time t and all streams s_i and s_k in the presentation,

|p_si(t) − p_sk(t)| ≤ j    (10)

The scheduling algorithms presented in Johnson and Zhang8 dynamically formulate rendition rates for media streams such that the three conditions given in Equations 8, 9, and 10 are satisfied. The central idea in these algorithms is that at any synchronization point, the rates of all streams from this point to the next synchronization point are calculated to meet the constraints. The intrastream constraints give us the allowed range for each stream rate. The interstream constraint gives the relationship among these rates. The algorithms give two options: minimizing change in rate or minimizing jitter. You can find the details of these algorithms in Johnson and Zhang.8
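A simplified sketch of the per-interval rate calculation follows. It is not the published algorithm;8 it only illustrates the idea of steering every stream toward a common target progress and then clamping each candidate rate to its intrastream range from Equation 8 (all names are assumptions):

#include <algorithm>
#include <vector>

struct StreamState {
    double nominal_rate;     // R^n_si, granules per second
    double max_rate_change;  // c_si
    double progress_sec;     // presentation progress measured in time
};

// Choose rendition rates for the interval of length dt_sec ending at the next
// synchronization point, moving every stream toward the same target progress.
std::vector<double> rates_for_next_interval(const std::vector<StreamState>& streams,
                                            double target_progress_sec, double dt_sec) {
    std::vector<double> rates;
    rates.reserve(streams.size());
    for (const StreamState& s : streams) {
        // Speed (content seconds per wall second) needed to reach the target...
        double speed = (target_progress_sec - s.progress_sec) / dt_sec;
        // ...converted to a rendition rate in granules per second.
        double candidate = speed * s.nominal_rate;
        double lo = s.nominal_rate * (1.0 - s.max_rate_change);
        double hi = s.nominal_rate * (1.0 + s.max_rate_change);
        rates.push_back(std::clamp(candidate, lo, hi));  // enforce Equation 8
    }
    return rates;
}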

For synchronization between video and audio streams, we use the playout scheduling algorithms proposed in Johnson and Zhang.8 When asynchrony occurs between audio and video streams, the scheduler changes the video assembling rate (hence, the video display rate) while maintaining the audio assembling rate (hence, the audio display rate) at the nominal audio display rate. We change the video assembling rate by changing SleepClock's interval.
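A small sketch of that adjustment follows, under assumed names and an assumed bound on the per-step change (the article doesn't give SleepClock's real interface or the exact adjustment policy):

#include <algorithm>

struct VideoClock {            // stand-in for the video stream's SleepClock
    double interval_sec;       // time between video events (frames)
};

// Audio is the master: scale the video interval to chase the audio progress
// while the audio interval stays at its nominal value.
void resync_video(VideoClock& clock, double video_progress_sec,
                  double audio_progress_sec, double nominal_interval_sec) {
    double drift = video_progress_sec - audio_progress_sec;   // positive: video is ahead
    // Lengthen the interval when video leads, shorten it when video lags,
    // bounding the change so the display rate varies smoothly.
    double scale = std::clamp(1.0 + drift, 0.9, 1.1);
    clock.interval_sec = nominal_interval_sec * scale;
}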

ExperimentsWe’ve developed a multimedia presentation

application prototype called NetMedia VirtualClassroom (termed NetMedia-VCR) based on theNetMedia design. We aim to provide a distributedlearning environment for students. We imple-mented middleware architecture and prototype inC++ for Sun machines, which the campus widelyuses. As part of the preprocessing work, we record-ed and provided lectures in a database at thedepartmental server. The idea is that students canremotely connect to the server and retrieve lecturematerial for review and quiz preparation.

Figure 11 shows the graphical user interface of NetMedia-VCR. Students can customize the presentation to fit their individual learning requirements. As Figure 11 shows, students can construct a presentation out of audio, video, and transparencies. Students can freely skip, seek, rewind, and fast forward within learning sessions. They can also mark transparencies and print them for later use.


Figure 11. Screen shot of the NetMedia-VCR user interface.

Table 1. CSE530 test stream and its burstiness.

Course Title   Total Size (Mbytes)   Encoding Type     Resolution   Average Size (Bytes)   Largest Frame   Smallest Frame
CSE530.v       17.9                  MPEG-1            320 × 240    3647                   9846            1001
CSE530.a       2.4                   Raw audio         –            556                    –*              –*
CSE530.s       0.2                   Postscript file   –            –                      –               –

* The audio size is constant per time unit.
– The information doesn't apply.


Users can start the server on any SunOS-compatible machine. We tested the system at four different sites. We ran the server in Buffalo, New York; at Purdue University; at Middle East Technical University, Turkey; and at the University of Technology, Darmstadt, Germany. The server requires about 0.15 percent of system resources on a Sun Ultra 10 with 128 Mbytes of memory while delivering audio, video, and transparencies. The server uses 2.2 Mbytes of disk space. For transparency data delivery, we start an additional remote procedure call (RPC) server. During the tests, the client resided in our Buffalo lab. We used a Sun Ultra 5 with 64 Mbytes of memory as the client.

Researchers have defined various QoS parameters to specify QoS requirements on multiple-stream presentations.9-11 In our experiments, we measured two particularly important QoS parameters: latency and jitter. During each experiment we recorded the jitter, network delay, and packet loss. We used multiple recordings from lectures provided at the State University of New York at Buffalo as test streams. Table 1 presents the statistics about a stream from our CSE530 Network Communication class lectures.

We conducted test runs every hour to all four sites over a three-month period. We determined the network performance at these sites by the average packet loss experienced during the experiments (Buffalo's average packet loss was 0.01 percent, Purdue's was 1.3 percent, and Germany's was 8.34 percent). Network connections at the Buffalo campus were nearly 100 percent loss-free. All connections ran at 100baseT supported by a Fiber Distributed Data Interface (FDDI) backbone passing at most one router. We classified the network load during the experiments as ideal. Experiments to Purdue and Germany experienced light packet loss most of the time. Packets normally traveled through four (Purdue) to nine (Germany) routers.

Figure 12 shows the jitter between audio and video during the test runs. Measured using the synchronization algorithm given in Johnson and Zhang,8 the jitter obtained on the Buffalo campus remains on average in the range of 0 to 60 milliseconds (ms) throughout the presentation, which is much less than the upper bound (80 ms) given in Steinmetz and Nahrstedt.10 Jitter obtained from Purdue and Germany experienced spikes in case of packet loss. On average, the jitter was still below the requested 80 ms.

Figure 13 shows loss rates for audio and video streams with respect to different DSDs calculated by the BestDSD Finder. From these two figures, we can observe that, at any time, the DSDs with the lowest data loss rate (0 in this case) form a range. During the entire presentation, all the ranges form a valley. Then the finder chooses the best DSDs to form a line within this valley.

Figure 12. Jitter between audio and video: servers at (a) Buffalo, (b) Purdue, and (c) Germany.

Figure 14a compares the average loss recorded daily from the PLUS protocol against a 5-second smoothing implementation with the server at Middle East Technical University, Turkey. The PLUS protocol could reduce the loss rate mainly because of the bandwidth reduction in case of congestion. To test the effect of proactively testing the available bandwidth in the PLUS protocol, we compared the number of received and successfully decoded frames (Figure 14b) between the smoothed transmission and the PLUS scheme.

On average, the PLUS scheme could deliver 47 percent more frames to the client than the smoothed transmission. On the one hand, we can achieve this by backing off the sending rate when congestion is detected. To study the effect of proactively testing the network, we compared the PLUS protocol to a smoothed transmission that backs off if the packet loss exceeds 10 percent within a 5-second test interval. As Figure 14b shows, the PLUS protocol doesn't rely on just the backoff mechanism to protect its content. The increased number of saved frames (even though B frames are sent more aggressively) could be achieved because the scheme saves a larger number of critical frames. The probing nature of the PLUS protocol causes this. Transmitting B frames at a higher bandwidth makes the router more likely to drop this kind of frame (and thus initiate proactive frame dropping). Probing also pauses before a critical packet stream (thereby giving the router a chance to switch to a different stream and process the critical packets later), which positively impacts the survival rate of critical packets.

The experiments clearly demonstrate our framework's performance and usability. Interactive requests could be delivered immediately during any test run. As long as the network isn't completely congested, we can guarantee the availability and timeliness of data as demanded.

Figure 13. Loss rates at different DSDs with the server in Germany: (a) video stream and (b) audio stream.

Figure 14. (a) Comparison of the PLUS protocol against 5-second smoothing with the server in Turkey. (b) Comparison of received and decoded frames with the server in Turkey.

Conclusion
The advantage of the NetMedia middleware is that it gives application developers a set of services that hides the complexity of distributed multimedia data retrieval, transmission, and synchronization. From the point of view of the application developer, access to multimedia data becomes as easy as reading a file from the local disk. To achieve this, the system has to integrate connection management, buffer management, payload delivery, QoS management, and synchronization into one framework and provide an easily accessible application programming interface (API) to the developer. The NetMedia framework is a unicast middleware. One future aim is the integration of multicasting streams into our proposed NetMedia middleware.

MM

References
1. T. Lee et al., "Querying Multimedia Presentations Based on Content," IEEE Trans. Knowledge and Data Eng., vol. 11, no. 3, May/June 1999, pp. 361-385.
2. E. Fox and L. Kieffer, "Multimedia Curricula, Courses, and Knowledge Modules," ACM Computing Surveys, vol. 27, no. 4, Dec. 1995, pp. 549-551.
3. Y. Song, M. Mielke, and A. Zhang, "NetMedia: Synchronized Streaming of Multimedia Presentations in Distributed Environments," Proc. IEEE Int'l Conf. Multimedia Computing and Systems, IEEE CS Press, Los Alamitos, Calif., 1999, pp. 585-590.
4. B. Haskell, A. Puri, and A. Netravali, Digital Video: An Introduction to MPEG-2, Chapman and Hall, New York, 1996.
5. M. Vernick, The Design, Implementation, and Analysis of the Stony Brook Video Server, PhD thesis, Dept. of Computer Science, SUNY Stony Brook, New York, 1996.
6. W. Feng, F. Jahanian, and S. Sechrest, "Providing VCR Functionality in a Constant Quality Video On-Demand Transportation Service," Proc. IEEE Multimedia, IEEE CS Press, Los Alamitos, Calif., 1996, pp. 127-135.
7. S. Keshav, "A Control-Theoretic Approach to Flow Control," Proc. Conf. Comm. Architecture and Protocols, ACM Press, New York, 1991, pp. 3-15.
8. T.V. Johnson and A. Zhang, "Dynamic Playout Scheduling Algorithms for Continuous Multimedia Streams," ACM Multimedia Systems, vol. 7, no. 4, July 1999, pp. 312-325.
9. T. Little and A. Ghafoor, "Network Considerations for Distributed Multimedia Object Composition and Communication," IEEE Network, vol. 4, no. 6, Nov. 1990, pp. 32-49.
10. R. Steinmetz and K. Nahrstedt, Multimedia: Computing, Communications and Applications, Prentice Hall, New York, 1995.
11. D. Wijesekera and J. Srivastava, "Quality of Service (QoS) Metrics for Continuous Media," Int'l J. Multimedia Tools and Applications, vol. 3, no. 2, Sept. 1996, pp. 127-166.

Aidong Zhang is an associate professor in the Department of Computer Science and Engineering at the State University of New York at Buffalo. She received her PhD in computer science from Purdue University in 1994. Her research interests include content-based image retrieval, digital libraries, and data mining. She is co-chair of the technical program committee for ACM Multimedia 2001 and an NSF Career award recipient.

Yuqing Song received his BS in mathematics from Nanjing University in China in 1990, and his MS in computer science in 1993 from the Institute of Mathematics, Chinese Academy of Sciences. He expects to complete his PhD in computer science from the State University of New York at Buffalo in May 2002. His research interests are multimedia, image processing, and computer vision.

Markus Mielke is a program manager with the Internet Explorer team at Microsoft Corporation. His research interests include multimedia systems, streaming technologies, and peer-to-peer and Web services. He earned his MS degree in computer engineering from the University of Technology, Darmstadt, Germany. He received his PhD in computer science and engineering from the State University of New York at Buffalo in 2000.

Readers may contact Aidong Zhang at the Dept. of Computer Science and Engineering, State Univ. of New York, Buffalo, NY 14260, email [email protected].

For further information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.
