Developing with VoiceXML Building a Video Conference Application

2. Developing with VoiceXML: Building a Video Conference Application

3. Agenda

  • VoiceXML
  • Video using VoiceXML
  • Components of a Video Conference Server
  • System Architecture
  • SIP & RTP flows
  • JSLEE & Mobicents
  • Software Architecture
  • Controlling Participants
  • Putting it all together

4. VoiceXML: What is it?

  • VoiceXML is an IVR scripting language
  • Used to develop complex IVR applications, such as
    • Phone-based self-help services (i.e., labyrinths)
    • Multi-level auto-attendants
    • Calling card services
  • Standardized by W3C
  • Also used as a control protocol between VoIP application servers and media servers
  • Supported by most media server vendors

5. Developing an Application with VoiceXML

  • This presentation shows how to develop a video conferencing application using VoiceXML and off-the-shelf components
  • We will use the Voxpilot/HP video extensions to VoiceXML
    • Provide playback and recording of video prompts
    • Support multiple video codecs
    • Proposed by Burke (Voxpilot) & McGlashan (HP)
    • Extensions may be integrated into VoiceXML 3.0
  • We will cover the system architecture, components, protocols, and support for multiple audio and video codecs

6. Video Conferencing Application

  • Components for building a video conferencing solution are now much cheaper
    • Good web cam
    • Good headset
    • Video softclient
    • Open source telecom framework
    • Video-enabled media server
  • Small and medium enterprises can now afford a video conferencing solution
  • Can even be deployed in home offices

7. Video Conference Application - Goals

  • Low Cost
    • Must use standard components & protocols
  • Easy to Use & Minimal Learning Curve
    • Same interface as existing meet-me conference bridges
    • Advanced interface accessible through web
  • Provide Common Conferencing Features
    • PIN validation
    • Mute one or more participants
    • Prime Speaker
    • Manual or Automated Video Source Control
  • Good Video Quality
    • At least CIF (352x288) @ 15 frames/second

8. Video Conference System Architecture

  • [Diagram: Video Conference Application and Video-Enabled Media Server; link labels: SIP (three links), "SIP, NETANN, VoiceXML" (one link), RTP (three links)]

9. Major Component Responsibilities

  • Video Conferencing Application
    • Back-to-back SIP user agent (B2BUA)
    • Controls conference participants when conference is up
      • e.g., muting a participant, giving priority to a participant
    • Delegates to VoiceXML script the task of PIN validation
  • VoiceXML script
    • Validates PIN (uses CGI script to access a database)
    • Transfers the call to conference bridge
  • Media Server
    • Executes VoiceXML script
    • Performs audio mixing
    • Performs video processing
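
The delegation described above is usually expressed in the Request-URI of the INVITE that the application sends to the media server. The following is a minimal sketch, assuming the RFC 4240 (NETANN) convention in which the user part `dialog` plus a `voicexml=` URI parameter asks the media server to run a script. The host names and both helper function names are hypothetical; the `conf-1` form of the conference URI follows the call flow in this presentation.

```python
from urllib.parse import quote


def voicexml_dialog_uri(media_server: str, script_url: str) -> str:
    """Build a NETANN-style SIP URI that starts a VoiceXML script.

    Hypothetical helper: characters that would end the URI parameter
    (';', '?', '&', '=') are percent-encoded; ':' and '/' stay readable.
    """
    encoded = quote(script_url, safe=":/")
    return f"sip:dialog@{media_server};voicexml={encoded}"


def conference_uri(media_server: str, conference_id: str) -> str:
    """Build the conference bridge URI in the 'conf-<id>' form used in the slides."""
    return f"sip:conf-{conference_id}@{media_server}"


if __name__ == "__main__":
    # Host names are placeholders, not values from the presentation.
    print(voicexml_dialog_uri("ms.example.com", "http://as.example.com/askpin.vxml"))
    print(conference_uri("ms.example.com", "1"))
```

Keeping URI construction in one place means the rest of the B2BUA logic does not need to know which media server naming convention is in use.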

10. Basic Call Flow: SIP & VoiceXML

  • [Diagram: numbered message sequence involving the Video Conference Application]
    • 1. SIP INVITE
    • 2. SIP INVITE voicexml=http://as/askpin.vxml
    • 3. HTTP GET
    • 4. SIP 200 OK
    • 5. SIP 200 OK
    • 6. HTTP POST
    • 7. SIP REFER (Refer-To: sip:conf-1@MS)
    • 8. SIP INVITE sip:conf-1@MS
    • 9. SIP 200 OK
    • 10. SIP reINVITE
    • 11. SIP 200 OK
  • Additional diagram label: validate.cgi?phone=5551212&pin=1234

11. Video Conference Application Software

  • Video application is built on top of JSLEE, a Java real-time framework
  • Database contains a list of active conferences, phone numbers, and PINs
  • Apache provides
    • Web pages
    • Access to VoiceXML scripts
    • Access to media files
    • Execution of CGI scripts
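
The PIN check performed by the CGI script can be sketched as a single database lookup. This is only an illustration under stated assumptions: the table name `conferences`, its columns, and the function names are hypothetical, and SQLite stands in for whatever database the deployment uses. The `phone` and `pin` inputs mirror the parameters of the `validate.cgi` request in the call flow.

```python
import sqlite3
from typing import Optional


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one active conference (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE conferences ("
        " conference_id TEXT PRIMARY KEY,"
        " phone TEXT NOT NULL,"
        " pin TEXT NOT NULL,"
        " active INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO conferences VALUES ('conf-1', '5551212', '1234', 1)")
    conn.commit()
    return conn


def validate_pin(conn: sqlite3.Connection, phone: str, pin: str) -> Optional[str]:
    """Return the conference id for a valid phone/PIN pair, else None.

    A parameterized query keeps caller-entered digits out of the SQL text.
    """
    row = conn.execute(
        "SELECT conference_id FROM conferences"
        " WHERE phone = ? AND pin = ? AND active = 1",
        (phone, pin),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = open_demo_db()
    print(validate_pin(db, "5551212", "1234"))
    print(validate_pin(db, "5551212", "0000"))
```

The returned conference id is what a VoiceXML script would need in order to transfer the caller to the matching bridge.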

12. Video-enabled Media Server

  • Video-enabled Media Servers are available from many vendors
  • Select a media server that supports Video IVR and Video Conferencing
    • Video Codec: H.263 and H.264 @ CIF resolution (352x288)
  • Video Conferencing mode should have at least:
    • Manual Control
    • Automated control (e.g., follow-me)
  • Audio mixing should provide:
    • Audio Codec: G.711 ulaw/A-Law, G.729, AMR
    • Audio mixing without introducing echo
    • Noise reduction
    • Packet Loss Concealment algorithm
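
Supporting several codecs largely comes down to matching the payload types in a participant's SDP against what the media server accepts. The sketch below assumes a simplified SDP with at most one section per media type; the function names and the `SUPPORTED` set are hypothetical. The static payload numbers (0 for PCMU, 8 for PCMA, 18 for G729, 34 for H263) come from RFC 3551, while AMR and H.264 use dynamic payload types and therefore need an `a=rtpmap` line.

```python
import re
from typing import Dict, List, Optional

# Static RTP payload types defined in RFC 3551.
STATIC_PAYLOADS = {0: "PCMU", 8: "PCMA", 18: "G729", 34: "H263"}
# Codecs listed on the media server slide (hypothetical configuration).
SUPPORTED = {"PCMU", "PCMA", "G729", "AMR", "H263", "H264"}
RTPMAP = re.compile(r"a=rtpmap:(\d+)\s+([^/\s]+)/\d+")


def offered_codecs(sdp: str) -> Dict[str, List[str]]:
    """Map each media type to its codec names, in the order offered."""
    sections = []  # (media, payload types, payload-to-name map)
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            payloads = [int(p) for p in fields[3:] if p.isdigit()]
            sections.append((fields[0], payloads, dict(STATIC_PAYLOADS)))
        elif sections:
            match = RTPMAP.match(line)
            if match:
                sections[-1][2][int(match.group(1))] = match.group(2).upper()
    return {
        media: [names[pt] for pt in payloads if pt in names]
        for media, payloads, names in sections
    }


def pick_codec(sdp: str, media: str) -> Optional[str]:
    """Return the first offered codec that the media server supports."""
    for name in offered_codecs(sdp).get(media, []):
        if name in SUPPORTED:
            return name
    return None


if __name__ == "__main__":
    offer = "\r\n".join([
        "m=audio 5004 RTP/AVP 18 0",
        "a=rtpmap:18 G729/8000",
        "m=video 5006 RTP/AVP 98 34",
        "a=rtpmap:98 H264/90000",
    ])
    print(offered_codecs(offer))
    print(pick_codec(offer, "video"))
```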

13. Mobicents: a Telecom Framework

  • Mobicents is an open source JSLEE container
    • JSLEE is a Java-based framework for real-time apps
    • JSLEE is to telecom what J2EE is to business apps
    • Mobicents is written by some JBoss developers
  • Mobicents provides:
    • Soft real-time event routing
    • SIP stack
    • Traces, logs, alarms
  • See

14. Mobicents - Internals

15. Video Conference Application Software Components

16. IVR Service

  • IVR service provides a high-level API for playback and digit-collection functions
    • Hides details of SIP and media server protocols
    • Isolates applications from Media Server protocol
      • e.g., if MS protocol changes from VoiceXML to MSCML, only IVR service must change
  • IVR service implemented as a JSLEE Service Building Block (SBB)

17. IVR Service - API

  • Simple API hides IVR complexity
  • Instantiate and send an event to IvrSBB
  • Events supported:
    • CreateConnection
    • Play
    • PlayCollect
    • Release
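
The event-driven shape of this API can be illustrated outside JSLEE. The real service is an SBB written in Java; the sketch below is a stand-in in Python, and every class name other than the four event names, along with all field names and the request strings, is hypothetical. It only shows how an application would drive the service by sending events and never touch SIP or the media server protocol directly.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class CreateConnection:
    call_id: str


@dataclass(frozen=True)
class Play:
    call_id: str
    prompt_url: str


@dataclass(frozen=True)
class PlayCollect:
    call_id: str
    prompt_url: str
    max_digits: int


@dataclass(frozen=True)
class Release:
    call_id: str


class IvrService:
    """Stand-in for the IVR SBB: turns events into media server requests."""

    def __init__(self) -> None:
        self.connections: Set[str] = set()
        self.requests: List[str] = []  # what would be sent to the media server

    def on_event(self, event) -> None:
        if isinstance(event, CreateConnection):
            self.connections.add(event.call_id)
            self.requests.append(f"connect {event.call_id}")
            return
        if not isinstance(event, (Play, PlayCollect, Release)):
            raise TypeError(f"unsupported event: {event!r}")
        if event.call_id not in self.connections:
            raise ValueError(f"no connection for call {event.call_id}")
        if isinstance(event, PlayCollect):
            self.requests.append(
                f"play {event.prompt_url} collect {event.max_digits} on {event.call_id}"
            )
        elif isinstance(event, Play):
            self.requests.append(f"play {event.prompt_url} on {event.call_id}")
        else:  # Release
            self.connections.discard(event.call_id)
            self.requests.append(f"release {event.call_id}")


if __name__ == "__main__":
    ivr = IvrService()
    for ev in (CreateConnection("c1"), PlayCollect("c1", "askpin", 4), Release("c1")):
        ivr.on_event(ev)
    print(ivr.requests)
```

Because callers only see the four events, swapping the media server protocol would change `on_event` alone, which is the isolation the IVR service slide describes.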

18. VoiceXML: Using video extensions

  • Playing video clips using VoiceXML 2.0
    • Two video clips are provided: one for H.263 video clients, the other for H.264 video clients
    • The example uses the VoiceXML audio fallback feature to support both codecs
    • The VoiceXML interpreter will try to play each video clip in the list until it finds one that is compatible with the video codec of the remote device
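
The fallback behaviour described above can be sketched by generating the prompt document. This assumes that video clips are referenced through the standard `<audio>` element and that alternatives are expressed as nested `<audio>` elements, which is how VoiceXML 2.0 defines fallback content; the clip file names and the function name are hypothetical, and the exact markup accepted by a given video-enabled interpreter may differ.

```python
import xml.etree.ElementTree as ET
from typing import List

VXML_NS = "http://www.w3.org/2001/vxml"


def build_prompt_document(clip_urls: List[str]) -> str:
    """Return a VoiceXML 2.0 document that tries each clip in order.

    Each clip is nested inside the previous one, so the interpreter
    falls back to the inner <audio> when the outer one cannot be played.
    """
    ET.register_namespace("", VXML_NS)  # serialize with a default namespace
    vxml = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.0"})
    form = ET.SubElement(vxml, f"{{{VXML_NS}}}form")
    block = ET.SubElement(form, f"{{{VXML_NS}}}block")
    parent = ET.SubElement(block, f"{{{VXML_NS}}}prompt")
    for url in clip_urls:
        parent = ET.SubElement(parent, f"{{{VXML_NS}}}audio", {"src": url})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical clip names: one encoding per video codec.
    print(build_prompt_document(["welcome_h263.3gp", "welcome_h264.3gp"]))
```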

19. Muting a participant using RFC 3264

  • The conference leader can mute a participant
  • This is achieved by the Video Conf App sending a SIP reINVITE with SDP containing a=sendonly to the media server:
    • v=0
    • o=Caller 10 20 IP4
    • s=Participant
    • c=IN IP4
    • t=0 0
    • m=audio 5004 RTP/AVP 0
    • a=rtpmap:0 PCMU/8000
    • a=sendonly
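
A reINVITE body like the one above can be produced by a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and `192.0.2.10` is a documentation address standing in for the address that is not shown on the slide. RFC 3264 requires the version field of the `o=` line to increase when the session description changes, so the version is passed in explicitly. RFC 3264 also defines the direction attribute from the point of view of whoever sends the SDP, so the value that mutes a participant depends on the leg on which the reINVITE is sent.

```python
DIRECTIONS = {"sendrecv", "sendonly", "recvonly", "inactive"}


def audio_sdp(address: str, port: int, direction: str, session_version: int) -> str:
    """Build an audio-only SDP body with the given direction attribute."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    lines = [
        "v=0",
        # Session id 10 as on the slide; the version changes per reINVITE.
        f"o=Caller 10 {session_version} IN IP4 {address}",
        "s=Participant",
        f"c=IN IP4 {address}",
        "t=0 0",
        f"m=audio {port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
        f"a={direction}",
    ]
    return "\r\n".join(lines) + "\r\n"  # SDP lines end with CRLF


if __name__ == "__main__":
    # Placeholder address and an incremented session version (20 -> 21).
    print(audio_sdp("192.0.2.10", 5004, "sendonly", 21))
```

Sending the same body with `a=sendrecv` and a higher version would restore two-way audio.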

20. Manual Control of the Video Feed

  • The conference leader can manually control the video feed displayed to all participants
    • This is achieved by turning off the video source of all participants except one
    • Send a reINVITE with SDP containing a=sendonly applied to video:
      • v=0
      • o=Caller 10 20 IP4
      • s=Participant
      • c=IN IP4
      • t=0 0
      • m=audio 5004 RTP/AVP 0
      • a=rtpmap:0 PCMU/8000
      • a=sendrecv
      • m=video 5006 RTP/AVP 98
      • a=rtpmap:98 H264/90000
      • a=sendonly
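
The selection logic behind manual video control can be sketched as a plan that assigns a video direction to every participant. The function names, participant names, and the `192.0.2.10` address are hypothetical; following the slide, every participant other than the selected one receives `a=sendonly` on the video stream while audio stays `a=sendrecv`. The restricted value is a parameter because the appropriate attribute depends on which side of the B2BUA sends the reINVITE.

```python
from typing import Dict, List


def video_directions(
    participants: List[str], selected: str, off_direction: str = "sendonly"
) -> Dict[str, str]:
    """Give the selected participant full video and restrict everyone else."""
    if selected not in participants:
        raise ValueError(f"{selected} is not in the conference")
    return {
        name: ("sendrecv" if name == selected else off_direction)
        for name in participants
    }


def participant_sdp(address: str, video_direction: str, session_version: int) -> str:
    """Build the audio+video SDP for one participant's reINVITE."""
    lines = [
        "v=0",
        f"o=Caller 10 {session_version} IN IP4 {address}",
        "s=Participant",
        f"c=IN IP4 {address}",
        "t=0 0",
        "m=audio 5004 RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
        "a=sendrecv",  # audio is left untouched
        "m=video 5006 RTP/AVP 98",
        "a=rtpmap:98 H264/90000",
        f"a={video_direction}",  # applies to the video section only
    ]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    plan = video_directions(["alice", "bob", "carol"], selected="bob")
    for name, direction in plan.items():
        print(name, direction)
    print(participant_sdp("192.0.2.10", plan["alice"], 21))
```

Automated control (for example, follow-me) could reuse the same plan and simply change `selected` whenever the active speaker changes.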

21. Putting it all Together

  • VoiceXML provides the user interface to the video conference
  • Mobicents provides an easy-to-use real-time framework for telecom applications
    • Mobicents hides SIP complexity
  • Building the business logic for a video conferencing application is no longer difficult
  • Low-cost video phones and softclients make this solution possible
  • Entire solution can be deployed in small and med