
D5.3.1: First report on integration - CORDIScordis.europa.eu/docs/projects/cnect/5/287755/080/deliverables/001... · This document contains details about WP5 task 5.3 Integration


D5.3.1: First report on integration

JSI-K4A, UPVLC, RWTH, EML and XEROX

Distribution: Public

transLectures: Transcription and Translation of Video Lectures

ICT Project 287755 Deliverable D5.3.1

Project funded by the European Community under the Seventh Framework Programme for Research and Technological Development.


Project ref no.: ICT-287755
Project acronym: transLectures
Project full title: Transcription and Translation of Video Lectures
Instrument: STREP
Thematic Priority: ICT-2011.4.2 Language Technologies
Start date / duration: 01 November 2011 / 36 Months

Distribution: Public
Contractual date of delivery: April 30, 2013
Actual date of delivery: April 30, 2013
Date of last update: May 17, 2013
Deliverable number: D5.3.1
Deliverable title: First report on integration
Type: Report
Status & version: v1.0
Number of pages: 16
Contributing WPs: all
WP / Task responsible: JSI-K4A
Other contributors: UPVLC, RWTH, EML and XEROX
Internal reviewer: Jorge Civera, Alfons Juan
Author(s): JSI-K4A, UPVLC, RWTH, EML and XEROX
EC project officer: Susan Fraser

The partners in transLectures are:

• Universitat Politècnica de València (UPVLC)

• XEROX Research Center Europe (XEROX)

• Jozef Stefan Institute (JSI) and its third party Knowledge for All Foundation (K4A)

• RWTH Aachen University (RWTH)

• European Media Laboratory GmbH (EML)

• Deluxe Digital Studios Limited (DDS)

For copies of reports, updates on project activities and other transLectures related information, contact:

The transLectures Project Coordinator
Alfons Juan, Universitat Politècnica de València
Camí de Vera s/n, 46018 València, [email protected]
+34 699-307-095, Fax +34 963-877-359

Copies of reports and other material can also be accessed via the project’s homepage: http://www.translectures.eu

© 2012, The Individual Authors

No part of this document may be reproduced or transmitted in any form, or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from the copyright owner.


Table of Contents

1 Executive Summary

2 Introduction

3 Integration of the tools into VideoLectures.NET and poliMedia

3.1 VideoLectures.NET

3.1.1 Updates on previous development

3.1.2 Report on the installation of the software

3.1.3 Integration of transLectures into the VideoLectures.NET website

3.2 poliMedia

3.2.1 Report on the installation of the software

3.2.2 Integration structure and use cases

3.2.3 Integration of transLectures into the poliMedia website

4 Relationship with other work packages

4.1 transLectures web service updates

4.2 transLectures player development

5 Conclusions

6 References

A Acronyms


1. Executive Summary

This document contains details about WP5 task 5.3 Integration into the case studies as regards the work done in the first 6 months of the task. Task 5.3 itself will continue until the end of the project.


2. Introduction

The objective of task 5.3 is to incrementally integrate the models and tools developed in WP3 (Massive Adaptation) and WP4 (Intelligent interaction with users) into VideoLectures.NET and poliMedia. This will be achieved through the following means:

• VideoLectures.NET and poliMedia workflows and players will be modified to start using the results from these WPs and WP6 (Evaluation).

• Real-time serving of transcriptions/translations via the transLectures web service will be implemented.

• Some optimisations will be done to maximise scalability and response effectiveness.

This document describes the work done in the first 6 months of this task and is divided into 2 sections:

• Description of the integration of the tools into VideoLectures.NET and poliMedia

• Relationship with other work packages


3. Integration of the tools into VideoLectures.NET and poliMedia

3.1 VideoLectures.NET

3.1.1 Updates on previous development

Architectural changes to admin interface

The new admin interface for VideoLectures.NET is in place and is being used in daily production. With the new interface, VideoLectures.NET editors/admins are able to better control settings, users, content metadata and processes on the site. From the new main admin screen, admins can easily add new actions for a lecture, like sending the video to the transLectures web service for transcription and/or translation.

Figure 3.1: New admin interface for VideoLectures.NET

This change to a new admin interface was a complex task, performed on a production machine after several months of testing. Because of this, the changes to include the transLectures workflow are still work in progress, but they will now be easier to implement in production because the new admin framework is already in place.


3.1.2 Report on the installation of the software

New player

The new VideoLectures.NET site with a new player is up and running (see Figure 3.2).

Figure 3.2: New player for VideoLectures.NET

The new player supports the transLectures infrastructure, thus enabling the display of transcriptions and translations for the current video to users of VideoLectures.NET. This is done by querying the transLectures web service (running locally on the helium server) on whether any transcriptions/translations are available. If they are, the player downloads them (in all available languages) and displays them when the user requests it by pressing the CC button in the lower right corner. The user can select which language to show as a subtitle from the dynamically created pop-up menu, overlaid on top of the video.
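The lookup logic described above can be illustrated with a minimal Python sketch. It is not the player's actual code (the production player is a Flash application): the in-memory `FAKE_SERVICE` dictionary stands in for the transLectures web service, and every function name, lecture identifier and subtitle string below is a hypothetical placeholder.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the transLectures web service: maps a lecture id
# to the subtitles available for it, keyed by language code.
FAKE_SERVICE: Dict[str, Dict[str, str]] = {
    "lecture-1": {"en": "Hello and welcome.", "sl": "Placeholder Slovenian subtitle."},
}


def available_languages(lecture_id: str) -> List[str]:
    """Ask the (simulated) service which subtitle languages exist for a lecture."""
    return sorted(FAKE_SERVICE.get(lecture_id, {}))


def download_all(lecture_id: str) -> Dict[str, str]:
    """Download subtitles in every available language, as the player does."""
    return {lang: FAKE_SERVICE[lecture_id][lang]
            for lang in available_languages(lecture_id)}


def subtitle_to_show(subtitles: Dict[str, str],
                     cc_pressed: bool,
                     selected: Optional[str]) -> Optional[str]:
    """Return the subtitle to overlay, or None if the user has not asked for one."""
    if not cc_pressed or selected is None:
        return None
    return subtitles.get(selected)


if __name__ == "__main__":
    subs = download_all("lecture-1")
    print(available_languages("lecture-1"))
    print(subtitle_to_show(subs, cc_pressed=True, selected="en"))
```

Downloading all languages up front, as the text describes, means that switching language from the pop-up menu needs no further request to the service.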

Currently only the Flash version of the player is in the production stage. The development of the HTML5 version of the new player is still work in progress.

transLectures web service for serving transcriptions/translations

The transLectures web service has been installed on the helium server. The original code for the service from UPV has been modified to address local issues, mostly to enable direct access to the files in the local Git repository, where transcriptions and translations are stored by the consortium partners.

Some additional fixes to make the code more robust were needed because of the presence of non-XML conformant characters in input data in translations.
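The report does not say how these characters were handled. One common approach, shown here purely as an illustration and not as the fix applied in the web service, is to drop every character that falls outside the `Char` production of XML 1.0 before serialising. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Everything NOT allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that may not appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)


def to_xml_paragraph(text: str) -> str:
    """Wrap sanitised text in a <p> element; ElementTree escapes & and <."""
    element = ET.Element("p")
    element.text = strip_invalid_xml_chars(text)
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    # A vertical tab and a NUL byte are typical offenders in raw input data.
    xml_fragment = to_xml_paragraph("speech\x0b & text\x00")
    print(xml_fragment)
    print(ET.fromstring(xml_fragment).text)
```

Sanitising and escaping are separate concerns: the regular expression removes characters that can never appear in XML, while the serialiser escapes characters such as `&` that are legal but have special meaning.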


3.1.3 Integration of transLectures into the VideoLectures.NET website

transLectures player for WP6 evaluation tasks

The original code for the transLectures player from UPV has been modified for its use in the VideoLectures.NET evaluation environment, mostly to enable direct access to the video files for playing. A local web page was created with selected videos for the WP6 evaluation task (see Figure 3.3).

Figure 3.3: VideoLectures.NET WP6 evaluation page

After the evaluator selects the video to edit, he/she is presented with the transLectures player playing the video in parallel with the transcription, which he/she can edit, saving the changes afterwards (see Figure 3.4).

Figure 3.4: Example of the transLectures player for WP6 on VideoLectures.NET


3.2 poliMedia

3.2.1 Report on the installation of the software

As stated in deliverable D5.2, we made a new server machine available (hereinafter fuster) for the implementation of the transLectures services in the poliMedia environment. Implementing these services involved several tasks, all of which have been completed for the successful integration of the transLectures tools into poliMedia.

As a first step, the current set of transcriptions and translations of poliMedia videos was stored locally on the server machine. This set is being continuously improved thanks to the progress being made mainly in WP3, and it is managed in WP2. In order to provide an interface for managing these transcriptions and translations, the transLectures API has been developed and installed, and the web service is currently being used by poliMedia to make the transcriptions available.

Finally, a local copy of the transLectures player has been installed on the fuster machine to enable corrections and modifications by the user of the poliMedia transcriptions. This player performs all the transcription-related operations through the previously mentioned web service.

We are now able to show and perform corrections of the current video transcriptions in poliMedia, covering the integration part for the poliMedia case study. Therefore we can declare this process as a first successful integration of the transLectures tools into the poliMedia website.

3.2.2 Integration structure and use cases

In this section we present a set of figures describing the structure of the transLectures integration into poliMedia. We also provide use case diagrams to describe user interaction with the system.

Figure 3.5: poliMedia structure before transLectures integration

Figure 3.5 shows the structure of poliMedia before the transLectures integration process began. The first step was to ingest poliMedia repository videos into the transLectures system to generate the automatic transcriptions and translations. This process is described in Figure 3.6, which shows the transLectures production engine system and its main components. After being recorded, video lectures are ingested from the production studio to the system through the transLectures web service. Thereafter the Automatic Speech Recognition and the Machine Translation systems automatically generate the respective transcriptions and translations of the video and store them in the transLectures database.
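The ingest flow just described (ingest, then speech recognition, then machine translation, then storage) can be sketched as follows. This is only an illustration of the order of operations: the ASR and MT systems are passed in as plain callables, the database is a dictionary, and all names are hypothetical rather than taken from the transLectures code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class LectureRecord:
    """What the (simulated) transLectures database keeps for one lecture."""
    transcription: str
    translations: Dict[str, str] = field(default_factory=dict)


def ingest(lecture_id: str,
           media: bytes,
           asr: Callable[[bytes], str],
           mt: Callable[[str, str], str],
           target_languages: Iterable[str],
           database: Dict[str, LectureRecord]) -> LectureRecord:
    """Transcribe a lecture, translate the transcription, and store both."""
    transcription = asr(media)                 # Automatic Speech Recognition
    record = LectureRecord(transcription)
    for language in target_languages:          # Machine Translation per language
        record.translations[language] = mt(transcription, language)
    database[lecture_id] = record              # store in the database
    return record


if __name__ == "__main__":
    # Toy stand-ins for the real ASR and MT systems.
    def toy_asr(media: bytes) -> str:
        return media.decode("utf-8")

    def toy_mt(text: str, language: str) -> str:
        return f"[{language}] {text}"

    db: Dict[str, LectureRecord] = {}
    ingest("lecture-1", b"welcome to the lecture", toy_asr, toy_mt, ["es", "en"], db)
    print(db["lecture-1"])
```

Passing the recognition and translation systems as parameters keeps the sketch independent of any particular engine, which matches the description of them as separate components of the production system.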


Figure 3.6: transLectures ingest system

Once the automatic transcriptions are ready, users can choose to display them on the poliMedia player. To this end, the poliMedia player has been updated in order to make use of the transLectures web service operations. Figure 3.7 shows the elements and communications involved in this process.

Figure 3.7: Displaying a lecture with transcriptions/translations on poliMedia

Automatic transcriptions and translations are not error free, and therefore users can be dissatisfied with the quality of the transcription. If they are willing to collaborate on the improvement of transcription accuracy, the poliMedia player interface provides an “edit transcription” button that redirects to the transLectures editor, which has been especially designed for this purpose. This process is described in Figure 3.8.


Figure 3.8: Editing automatic transcriptions/translations

3.2.3 Integration of transLectures into the poliMedia website

We are following the same integration approach for both the VideoLectures.NET and poliMedia cases. The poliMedia website player has been modified in order to make the video transcriptions available by accessing the transLectures web service installed in fuster. In this way, users are able to see the currently available transcriptions in the same poliMedia player, avoiding changes in the player’s interface.

Figure 3.9: poliMedia player showing transLectures transcriptions

To enable the editing and correction of transcriptions, an Edit transcription button has been included on the poliMedia player that links to the official transLectures player and editor. This player, also running on fuster, takes the video data and enables users to modify the current video’s transcription through its interactive, specially designed interface, which is intended to be a powerful and intuitive tool for editing transcriptions (see Figure 3.10).


Figure 3.10: transLectures player editing interface (horizontal layout)


4. Relationship with other work packages

Work on WP5 is closely connected with WP2, WP4 and WP6. The continuing development of the player, the transLectures web service and the transcription editing interface is taking place considering the needs and suggestions from both WP4 and WP6. An open communication channel exists between these work packages regarding the changes needed for the web service, editing interface and the features that the player will finally bring to the users.

4.1 transLectures web service updates

The transLectures web service has been successfully deployed and is currently running on both the poliMedia and VideoLectures.NET sites, providing an interface for the management of transcriptions and translations. The same web service API is currently being used for both case studies’ players and their respective transLectures players for the purpose of retrieving and/or modifying transcriptions. The API itself is described in a separate file (tLwebservice_API.txt) in the project’s common SVN repository (accessible by all partners).

The transLectures web service itself has been improved during the WP6 evaluation process to make the code more robust. These changes were needed because of the presence of non-XML conformant characters in input data in translations.

4.2 transLectures player development

Over the last few months, there has been a special focus on the development of the transLectures player. Three different editing layouts have been implemented to let users decide which one of them is better suited to their needs. In addition, many new features have been developed, such as a new search and replace function, a set of useful keyboard shortcuts to speed up the editing process, and a help layout that is displayed over the player, indicating each element’s function for new users. We have worked carefully on improving the editing interface in order to give the final user the best editing conditions.

Some of the previously mentioned changes have been carried out in response to the feedback received from the WP6 internal evaluation task. A small subset of the evaluators (teachers) involved in the internal evaluation task have already been working with the transLectures player and have offered us their first impressions, which turned out to be very valuable for the improvement of the editing interface.

We are also collecting detailed usage statistics of the player for the purposes of WP4, regarding user modelling. Some small changes and fixes on the editing interface have been requested by WP4 as well.

We will continue gathering feedback from both WP4 and WP6 over the next months, which will lead to an improvement of the player interface and, even more importantly, to the final satisfaction of the users.


5. Conclusions

In this deliverable we have presented the first report on integration into VideoLectures.NET and poliMedia of the models and tools developed in WP3 and WP4. The report covers the first 6 months of transLectures integration into the case studies. Task 5.3 itself will continue until the end of the project (month 36).

The main achievement of this initial integration process has been the successful integration of the first round of transcriptions and translations into VideoLectures.NET and poliMedia. From a technical point of view, users of both repositories have the existing set of transcriptions and translations already available, although with some restrictions due to the fact that current transcriptions and translations are not yet in their final state.

A significant amount of work has been done in relation to WP6, in particular to prepare the transLectures editor for the evaluation tasks.

Detailed discussions and exchanges have taken place between WP5 and WP2 regarding the web services and the DFXP format proposal. As stated in the DoW, we are performing many optimisations to maximise scalability and response effectiveness, and we expect to continue doing so in the future.

Finally, we are beginning the implementation of the transLectures player for WP4, supporting the intelligent interaction features. As we did with the previous work, we will prepare a set of different options for the use of WP4 services, and we expect to receive feedback from both WP4 and WP6 before we decide on a final design for the player.


6. References

1. “Annex I - Description of Work” of the transLectures project Grant agreement, version 2011-09-19

2. Deliverable D5.1: Case study scenario definition

3. Deliverable D5.2: Prepared platform and case studies

4. transLectures project website: http://translectures.eu/

5. VideoLectures.NET website: http://videolectures.net/

6. Internal poliMedia website at UPV: http://polimedia.blogs.upv.es/

7. Latest version of the Timed Text Markup Language (TTML/DFXP) standard: http://www.w3.org/TR/ttaf1-dfxp/

8. Latest version of the Web Video Text Tracks (WebVTT) standard: http://dev.w3.org/html5/webvtt/

9. UPVLC, XEROX, JSI-K4A, RWTH, EML and DDS. Transcription and Translation of Video Lectures. In Proc. of EAMT, 2012.

[All web pages accessed on 15 May 2013]


A. Acronyms

UPVLC: Universitat Politècnica de València
XRCE: XEROX Research Center Europe
JSI: Jozef Stefan Institute
K4A: Knowledge for All Foundation
RWTH: RWTH Aachen University
EML: European Media Laboratory GmbH
WP: Work Package
DoW: Description of Work
TTML: Timed Text Markup Language format/standard
DFXP: Distribution Format EXchange Profile
WebVTT: Web Video Text Tracks format/standard
CC: Closed Captioning
API: Application Programming Interface
XML: EXtensible Markup Language
