THE WEB LECTURE ARCHIVE PROJECT - University of Michiganatlascollab.umich.edu/docs/CHEP2006_WLAP-Paper.pdfa larger, high-resolution image of the slides, screen capture, or other material


THE WEB LECTURE ARCHIVE PROJECT

S. Goldfarb, University of Michigan, Ann Arbor, MI 48109, USA J. Herr, University of Michigan, Ann Arbor, MI 48109, USA

H. A. Neal, University of Michigan, Ann Arbor, MI 48109, USA K. M. Storr, CERN, 1211 Geneva, Switzerland

Abstract

The Web Lecture Archive Project (WLAP) is a joint project between the University of Michigan and CERN [1] that has recorded, archived and published web-based lectures and tutorials for the ATLAS Collaboration, the LHC, CERN, and the University since 1999. This paper presents an overview of the project, its history, achievements and current projects.

In this context, we provide a brief technical description of web lectures, how they are constructed and published, and the content they typically provide to the viewer. We focus on issues specific to lectures recorded for HEP (High-Energy Physics), the university classroom, and large-scale conferences. Finally, we describe current projects designed to address these issues and discuss possible future directions for the field.

INTRODUCTION

Project History

WLAP [2] was launched in 1999 as a pilot project [3] coordinated by the University of Michigan ATLAS Collaboratory Project [4] and sponsored by the U. S. National Science Foundation to investigate the usefulness and feasibility of recording and archiving web-based electronic lectures for a HEP (High-Energy Physics) audience. The initial project focused on recording lectures under a variety of circumstances, including educational lectures in an auditorium setting, plenary meetings, and tutorials. Electronic archives, in which the slides of a presentation are synchronized to the video of the lecturer, were constructed using the Sync-O-Matic application [5] of Charles Severance, one of the founders of the project.

The prestigious CERN Summer Student Programme [6] lectures were chosen as an initial test bed, along with plenary meetings and software tutorials of the ATLAS Collaboration [7]. No formal program was specified, as the goal was to evaluate the recording and archival methods, as well as the quality and value of the product. Additional opportunities, such as CERN seminars, colloquia, and tutorials on LHC software projects, were therefore commonly included in the tests.

The success of the pilot project [8] was immediate and evident, based on the positive feedback concerning the posted lectures and an abundance of requests for more ATLAS and CERN recordings. The University of Michigan, ATLAS, and CERN Academic and Technical Training launched follow-up programs to record and publish web lectures, and our partnership was born, with technical support from CERN IT.

In the next few years, web-based repositories were brought on-line at CERN [9] and at the University [2] to host the growing supply of lectures. The project team has gained tremendous experience since that time, adapting to the ever-increasing demand and providing production services, complementary to the research. We have applied this experience to advances in the technology and methods used for capturing, building and publishing lectures, which we describe briefly here.

Technological Achievements

The large-scale production of web lectures (and limited resources) drove the project team to seek automation at nearly every step of the process. This led to the natural development of new recording and publishing techniques, which were written up, along with a description of the project, in 2001 [10].

Several of the more important technological advances that have been achieved or that are currently in development include:

• hardware and software solutions to automate the encoding, compression and synchronization of audio, video and slides;

• a proposed XML standard, called Lecture Object, to facilitate the archiving and sharing of multimedia presentations in an open fashion;

• a robotic camera tracking system that removes the need for a camera operator to track speakers;

• software to harvest text from captured slides, associating the resulting metadata with relevant sections within a lecture and radically improving search capabilities.

The Lecture Object standard was first proposed in 2001 [11]. A complete description of the robotic camera project is presented as a separate contribution to the proceedings of this workshop. Both projects are discussed in more detail below.
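The text-harvesting item in the list above can be illustrated with a minimal sketch. It assumes the slide text has already been extracted (the extraction step is not shown) and builds a simple inverted index from words to slide start times, so that a search term maps to positions within a lecture. The function names, the data layout, and the sample slide text are hypothetical and are not taken from the WLAP software.

```python
import re
from collections import defaultdict


def build_index(slides):
    """Map each lowercase word to the sorted start times of slides containing it.

    `slides` is a list of (start_seconds, text) pairs; this layout is an
    assumption made for illustration only.
    """
    index = defaultdict(set)
    for start, text in slides:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(start)
    return {word: sorted(times) for word, times in index.items()}


def search(index, term):
    """Return the start times of slides whose text contains `term`."""
    return index.get(term.lower(), [])


# Hypothetical harvested slide text, keyed by slide start time in seconds.
slides = [
    (0, "Welcome to WLAP"),
    (12, "ATLAS detector overview"),
    (95, "ATLAS software tutorial"),
]
index = build_index(slides)
print(search(index, "atlas"))  # [12, 95]
```

Each returned time is a point to which a viewer could jump directly, which is the sense in which slide text acts as searchable metadata for sections of a lecture.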

Current Activities and Research Focus

Our group regularly records ATLAS, LHC, CERN and University of Michigan events, hosting a significant archive of hundreds of lectures for these communities, while simultaneously benefiting from each recording as a test bed for newly developed technologies. Over the years, our program has included the recording and publishing of web lectures for:

• ATLAS Collaboration plenary sessions, workshops, physics and computing tutorials;

• general CERN and LHC computing seminars and tutorials;

• major events at the University or at CERN (seminars from recent Nobel prize recipients, etc.);

• CERN academic and technical training seminars and tutorials;

• Fermilab software tutorials;

• University of Michigan Saturday Morning Physics talks, from 2001 until the present;

• 2005 International Conference on Systems Biology at Harvard;

• University of Michigan Medical School Grand Rounds talks;

• American Physical Society conferences.

Each of these programs has presented us with new challenges, defining and motivating our research.

While WLAP was one of the first collaborations to create large-scale web lecture archives, there now exists a large and growing number of dedicated web archives, as well as a variety of software applications available for publishing lectures on the web. Lecture archiving is rapidly becoming a common – if not yet standard – procedure for conferences, tutorials, and even the classroom.

WLAP has focused its efforts on improving the technology to handle the specific requirements of large-scale global collaborations, such as ATLAS and the LHC, as well as the academic environment of the university classroom. In the text that follows, we describe the current status of the technology and present several of our ongoing projects designed to address these particular challenges. We start with a brief description of the lectures themselves.

ADVANCING THE TECHNOLOGY

Description of a Web Lecture

An electronic web-based lecture, in general, provides the audience with the audio and video of a speaker making a presentation, together with a view of supporting documents or other material, such as slides, notes, or even screen captures. The supporting material can be presented as images or video streams, often synchronized to the speaker or incorporated into the video of the speaker.

WLAP lectures present the viewer with a video of the speaker making the presentation, high-quality audio, and a larger, high-resolution image of the slides, screen capture, or other material synchronized to the audio/video. Slides change automatically, for example, as a speaker reaches the corresponding part of the presentation. In addition, supporting material is indexed and tagged with metadata to facilitate searches and random access to any location in the lecture. A screen capture of a WLAP lecture is presented in Figure 1.
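The synchronization and random access described above amount to a lookup between playback time and slide index. The following is a minimal sketch of that lookup, assuming the slide start times are available as a sorted list of seconds; the function names and sample times are hypothetical, and the sketch does not describe how the WLAP player itself is implemented.

```python
from bisect import bisect_right


def slide_at(slide_starts, t):
    """Return the index of the slide displayed at playback time `t` (seconds).

    `slide_starts` must be sorted in ascending order. Times before the
    first slide map to the first slide.
    """
    i = bisect_right(slide_starts, t) - 1
    return max(i, 0)


def seek_time(slide_starts, slide_index):
    """Return the playback time at which a given slide begins (random access)."""
    return slide_starts[slide_index]


starts = [0, 12, 95, 240]      # hypothetical slide start times in seconds
print(slide_at(starts, 11.9))  # 0
print(slide_at(starts, 12))    # 1
print(seek_time(starts, 2))    # 95
```

A binary search keeps the lookup fast even for long lectures with many slides, and the same table serves both directions: following the video to change slides, and jumping the video to a selected slide.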

Example WLAP lectures can be viewed or downloaded from the archive [2]. Viewing of the lectures requires only a web browser and the free Real Player video plug-in, and works on any modern platform.

Lecture Recording for HEP

The size and globally distributed nature of modern HEP collaborations, such as those of the LHC experiments, impose unique requirements and constraints on lecture archiving, as well as on collaborative tools in general. A complete summary of these requirements is documented in the LCG (LHC Computing Grid) RTAG (Requirements and Technical Assessment Group) report on Collaborative Tools [12]. Concerning material, our experience is that there is significant demand for the recording of collaboration plenary sessions, training seminars, and tutorial sessions. Recording lectures in each of these environments presents a variety of challenges.

Figure 1: Screen capture of a typical WLAP web lecture.

Archiving of collaboration plenary sessions provides a permanent record of key stages of the planning and decision-making process of a major collaboration. If such sessions can be recorded and made available quickly, however, the archive can also serve as a communication tool, complementary to the remote participation provided by web casting, phone or audio conferencing. Such asynchronous communication can be advantageous both for addressing major time-zone differences and for allowing viewers to select only the material of interest.

The need for quick turnaround time from the recording of a lecture to its publication has been an important motivating factor in our development of automated techniques to record, encode and compress audio, video, and support material, as well as our methods to construct lectures in real time.

Training seminars and tutorial sessions are, in general, the most viewed of the lectures in the archive. The existence of these archives is invaluable for large collaborations that need to almost continuously train a new work force in the ever-evolving tools of the trade. Experts on particular topics, who would normally have spent a significant amount of time and resources travelling to train colleagues, can dramatically decrease this effort to a few recording sessions at their home institute, a conference, or at CERN.

The archival of training seminars and tutorials typically does not require the fast turnaround time of a plenary session. Some time must be spent on the development of a high quality record of the event, often with the inclusion of additional support material. However, a certain amount of automation is desirable, primarily to reduce the manpower necessary to construct the lectures.

Large-Scale Lecture Recording

Collaboration events, such as plenary weeks or seminars, typically require the effort of 1-2 FTE to handle the recording and archiving of lectures. Large-scale workshops, such as CHEP (Computing in High Energy and Nuclear Physics), or national physical society meetings, such as the APS (American Physical Society) March Meeting, host a large number of parallel sessions, and would require a substantial scaling of resources, unless a significant degree of additional automation could be achieved.

Another potentially revolutionary use case for the large-scale recording of lectures is that of the classroom. Several educational institutes, including the University of Michigan, are now beginning to implement the systematic recording of classroom lectures. The University of Michigan School of Dentistry has conducted a successful program of podcasting audio recordings of its first and second year courses, and the Physics Department, in cooperation with our group, is launching the web lecture recording of its General Physics I course, attended by hundreds of students. Our efforts to investigate the feasibility of such large-scale programs for the University and for the APS have led to some dramatic developments toward the eventual complete automation of audio and video capture. We describe one of these developments here.

The Web Lecture Capture Device

The Web Lecture Capture Device (WLCD) is a portable kit designed to provide the tools necessary to automatically capture, construct, and upload WLAP web lectures in real time. The kit includes an audio and video capture device, associated software, and a robotic camera. Development of the kit is coordinated by the University of Michigan ATLAS Collaboratory Project, with the goal of automatically recording classroom lectures at the University for publication on its online course management system, CTools [13].

Perhaps the most challenging and intriguing component of the WLCD project is the development of the robotic camera. If successful, it would be possible to install such a system in a classroom or in the parallel session lecture rooms of major conferences, to record and publish lectures with essentially no manpower, other than that needed for installation and take down. The design and construction of such a system, however, is far from obvious, and has thus far never been successfully achieved.

One could imagine, as an alternative, using the video obtained by placing a camera with a wide-angle view of the lecture area on a tripod, so as to capture an image of the speaker, regardless of where she or he roams during the presentation. It is our experience, however, that video obtained in such a manner does not provide images of the quality obtained by a camera operator, which enrich the content of a lecture. Without a clear view of facial expressions, hand gestures, and other visual cues, the video becomes more of a distraction than an aid and is better off omitted from the final lecture.

We are thus left with the difficult task of attempting to mimic the intelligence and abilities of an experienced camera operator. To achieve this, we have developed a system that is based on the detection of signals coming from an infrared emitting LED necklace, worn by the lecturer. Two cameras are required to make the system work: one camera that tracks movements of the necklace, and a second camera for capturing the video, which is directed by software that inputs and analyzes the signals coming from the first camera. Figure 2 presents the various components of the tracking camera system.

Figure 2: The IR Camera Tracking System, comprising the camera pair (left) and the IR emitting necklace.

Although development of the robotic camera is still in its early stages, preliminary results are promising and work is now focused on refining the algorithms for differentiating the necklace signal from background due to other infrared sources (notably bright incandescent lights and the sun) and on tuning the sensitivity of the camera so that it reacts only to major movements of the lecturer. The ability to mimic the brain of an experienced camera operator is a distant goal, but providing video from a completely automated lecture capture device with sufficient quality for web lectures appears within reach.
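One common way to make a tracking camera react only to major movements is a deadband: the aim point is updated only when the tracked position moves farther than a threshold from the current aim. The sketch below illustrates that general technique; it is not the algorithm used in the WLCD, and the function name, pixel coordinates, and threshold value are assumptions chosen for illustration.

```python
def update_aim(current_aim, target, deadband):
    """Return the new camera aim (x, y) in tracking-camera pixels.

    The aim changes only when `target` lies farther than `deadband`
    pixels from `current_aim`, so small movements are ignored.
    """
    dx = target[0] - current_aim[0]
    dy = target[1] - current_aim[1]
    if (dx * dx + dy * dy) ** 0.5 <= deadband:
        return current_aim
    return target


# Hypothetical necklace positions reported by the tracking camera.
aim = (320, 240)
positions = [(325, 242), (330, 238), (420, 250), (424, 251)]
for pos in positions:
    aim = update_aim(aim, pos, deadband=40)
print(aim)  # (420, 250)
```

In this example the first two positions and the last one fall inside the deadband and are ignored; only the large step to (420, 250) moves the camera. A larger threshold gives steadier video at the cost of a less tightly framed speaker.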

Lecture Viewing

The diverse nature of modern HEP collaborations and their increasingly long lifetimes (e.g., 20-30 years for the LHC collaborations) impose restrictions on the nature of the lecture archives, their storage, and the means by which they can be viewed. Proprietary solutions, which would be otherwise suitable and easily implemented in a commercial setting, pose problems in our environment for a variety of reasons.

First of all, large fees often accompany solutions that rely on one particular software technology for archiving and/or viewing. The technology might be free at the outset, but once one has built a significant library of material, one does not want to be held hostage to newly imposed and unforeseen licensing costs. Secondly, the companies providing the solutions and/or the product being exploited might not last as long as is desired for the archive. Some LHC material, such as tutorials or records of major discoveries (hopefully a recurrent theme), will have value for the entire duration of the experiments, if not longer. Very few commercial software applications have such a long lifetime.

Finally – and perhaps most pragmatically – our HEP colleagues are not easily convinced to buy into specific technologies. In some cases, it is for the reasons cited above, in other cases due to licensing agreements already made within the home institutes, and in other cases for matters of taste. Regardless, it would not be wise to store lectures in a format that required one specific brand of software for viewing, unless that format was easily transformable.

The Lecture Object

It is for the reasons cited above that the Lecture Object was proposed as a standard format for lecture archive storage. Details of the Lecture Object can be found in the proposal [11]. It is essentially the collection of the raw material that has been captured from a presentation, such as audio, video, slides, screen captures, etc., and an XML representation of the information needed to construct a viewable lecture, such as the timing of the slides, metadata, and the names and locations of the media files.

Initial tests of the lecture object included the development of transformations to allow the construction of lectures in SMIL (Synchronized Multimedia Integration Language) [14], a W3C (World-Wide Web Consortium) standard format, or the usual Sync-O-Matic HTML lectures. It should be noted that SMIL was not chosen as the archive format, as it lacks the complete functionality needed to construct lectures for a variety of different viewing applications. For example, Lecture Objects can now be easily transformed into lectures viewable on PDAs or even Apple iPods, as well as for the usual web browser interface. Figure 3 presents a small extraction from the lecture object XML description of a typical WLAP lecture.

<?xml version="1.0" encoding="UTF-8"?>
<lecture>
  <par>
    <video title="Welcome to WLAP" region="speaker-face"
           src="rtsp://webcast.cern.ch:5540/giosue.rm" />
    <seq title="Sequence of slides" region="slide">
      <slide title="First" type="image/gif" begin="00:00:00" region="slide"
             src="http://webcast.cern.ch/img001.gif" />
      <slide title="Second" type="image/jpeg" begin="00:00:12" region="slide"
             src="http://webcast.cern.ch/img002.jpg" />
    </seq>
  </par>
</lecture>

Figure 3: Extraction from a lecture object XML description of a typical WLAP lecture.
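To illustrate how a transformation might read such a description, the following sketch parses the fragment of Figure 3 (written with plain straight quotes) using Python's standard xml.etree.ElementTree module and extracts the video source and the slide timeline. The helper names are hypothetical, and the sketch covers only the elements and attributes visible in the figure, not the full Lecture Object schema described in the proposal [11].

```python
import xml.etree.ElementTree as ET

# The fragment shown in Figure 3, with straight quotes.
LECTURE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<lecture>
  <par>
    <video title="Welcome to WLAP" region="speaker-face"
           src="rtsp://webcast.cern.ch:5540/giosue.rm" />
    <seq title="Sequence of slides" region="slide">
      <slide title="First" type="image/gif" begin="00:00:00" region="slide"
             src="http://webcast.cern.ch/img001.gif" />
      <slide title="Second" type="image/jpeg" begin="00:00:12" region="slide"
             src="http://webcast.cern.ch/img002.jpg" />
    </seq>
  </par>
</lecture>
"""


def to_seconds(hhmmss):
    """Convert an 'HH:MM:SS' timestamp to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def video_source(xml_text):
    """Return the src attribute of the lecture's video element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return root.find("./par/video").get("src")


def slide_timeline(xml_text):
    """Return (start_seconds, title, src) for each slide, in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (to_seconds(slide.get("begin")), slide.get("title"), slide.get("src"))
        for slide in root.iter("slide")
    ]


print(video_source(LECTURE_XML))
for start, title, src in slide_timeline(LECTURE_XML):
    print(start, title, src)
```

A transformation to SMIL, HTML, or a device-specific format would start from the same parsed timeline and differ only in how it writes the result out.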

All recent WLAP lectures are stored in the archive database as lecture objects. Media files, such as audio and video – a part of the lecture object – can be stored in a physically separate database, provided the transformation software has access. This can be advantageous, depending on the size and type of media stored. Naturally, WLAP media are stored in standard formats, such as MPEG-4 or JPEG. Metadata is stored using RDF (Resource Description Framework) [15], another W3C standard XML language.

FUTURE DIRECTIONS

We present here a few ideas for advancements to the technology, which in our opinion merit immediate effort.

Standardization

As with most computing-based efforts, efficiency in lecture archival would greatly benefit from the adoption of a standard language for lecture description. Universal acceptance of the Lecture Object or an equivalent standard would simplify the development of large-scale web-lecture databases and allow easy sharing of tools and applications.

Our focus in this direction is to complete the development of transformations from LO to a variety of viewing platforms and applications, including Real Player™, QuickTime™, and Media Player™. Much work will also be needed in the political arena, convincing the major players to find agreement, both on the recording and publishing sides.

Integration

Merging of web archiving with web casting and video conferencing is a must and ought to be straightforward. Existing video encoding standards, such as the ITU (International Telecommunication Union) standard H.239 [16], already provide the ability to broadcast simultaneous video signals. Capture and archival of these signals in real time is only a matter of adapting existing technology. Demand for such functionality is evident in HEP as well as academia.

Adaptability

The ability to port lectures to new technologies, such as PDAs or Apple iPods™ [17], has already been demonstrated. This flexibility would be given a serious boost by adoption of a standard, such as the lecture object.

Our focus is on optimizing transformations to adapt to the peculiarities of these devices (small video, limited formats) to create quality lectures for their viewers. This work targets a typically younger audience and could make a lasting impact not only on education but also on HEP. After all, shouldn’t CHEP presentations, such as this, be available for anyone, anywhere?

ACKNOWLEDGEMENTS

We would like to thank all past and current members of WLAP for the significant contributions that have helped to make lecture archival a growing success. We thank the CERN Summer School, Academic and Technical Training Programmes, CERN IT, the ATLAS Collaboration, the University of Michigan Department of Physics, the U. M. Media Union, and the ATLAS Collaboratory Project, for their collaboration and support. We also acknowledge and thank the U. S. National Science Foundation and the U. S. Department of Energy for funding the research. Finally, we thank the organizers of CHEP 2006 and the Tata Institute of Fundamental Research for inviting us to present our work and vision.

REFERENCES

[1] CERN participants over the years have included IT, Academic and Technical Training and the Summer Student Programme.

[2] Web Lecture Archive Project: http://www.wlap.org.

[3] S. Goldfarb, “Proposal for a Web-Based Lecture Archive System for CERN,” National Science Foundation Project Proposal (1999): http://webcast.cern.ch/Projects/WebLectureArchive/Project/Proposal99.pdf.

[4] The ATLAS Collaboratory Project: http://vesuvio.physics.lsa.umich.edu/acp/.

[5] Sync-O-Matic Software: http://www.syncomat.com.

[6] CERN Summer Student Programme: http://cern.ch/HumanResources/external/recruitment/Students/summ/summ.asp.

[7] The ATLAS Collaboration: http://www.atlas.ch/.

[8] S. Goldfarb, E. Falaise, “Project Summary: A Web-Based Lecture Archive System for CERN,” National Science Foundation Project Report (1999): http://webcast.cern.ch/Projects/WebLectureArchive/Project/Summary99.pdf.

[9] CERN WLAP Repository: http://webcast.cern.ch/Projects/WebLectureArchive/.

[10] N. Bousdira, et al., “WLAP: The Web Lecture Archive Project,” CERN-OPEN-2001-066 (2001).

[11] G. Vitaglione, et al., “Lecture Object: An Architecture For Archiving Lectures On The Web,” CERN-OPEN-2001-070 (2001).

[12] S. Goldfarb, et al., “Report of the LHC Computing Grid Project RTAG 12: Collaborative Tools,” CERN-LCG-PEB-2005-07 (2005).

[13] University of Michigan CTools: https://ctools.umich.edu/.

[14] W3C SMIL: http://www.w3.org/AudioVideo/.

[15] W3C RDF: http://www.w3.org/RDF/.

[16] International Telecommunication Union: http://www.itu.int/ITU-T/.

[17] Apple iPod™: http://www.apple.com/itunes/.