
MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUL 15 2014

LIBRARIES

PENTIMENTO: RETROACTIVE EDITING FOR LECTURES

by

Kenny H. Lam

B.S. Physics, B.S. Computer Science & Engineering

Massachusetts Institute of Technology, 2013

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of

Master of Engineering in Electrical Engineering and Computer Science

at the Massachusetts Institute of Technology

June 2014

© 2014 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Signature redacted
Department of Electrical Engineering and Computer Science
May 20, 2014

Certified by: Signature redacted
Fredo Durand
Professor of Electrical Engineering and Computer Science
Thesis Supervisor

Accepted by: Signature redacted
Albert R. Meyer
Professor of Electrical Engineering and Computer Science
Chairman, Masters of Engineering Thesis Committee


PENTIMENTO: RETROACTIVE EDITING FOR LECTURES

by

Kenny H. Lam

B.S. Physics, B.S. Computer Science & Engineering

Massachusetts Institute of Technology, 2013

Submitted to the Department of Electrical Engineering and Computer Science on May 20, 2014

in Partial Fulfillment of the Requirements for the Degree of

Master of Engineering in Electrical Engineering and Computer Science

at the Massachusetts Institute of Technology

ABSTRACT

The boom in online education has created the potential for a personalized lecture experience for every student. Recorded lectures provide major benefits to both students and authors, but they currently have several drawbacks as well. These limitations stem from the way lectures are created: with video recorders. Video recordings inherently limit an author's editing capabilities and constrain interaction from students, which makes video a poor choice of medium. An alternative encoding of a lecture could offer a much fuller feature set to users on both sides of a lecture.

The Pentimento system was designed to speed up the creation of hand-drawn lecture notes for online education platforms such as edX or Coursera. By decoupling the visual and audio channels of a lecture, content creators can freely fix mistakes or change small portions without re-recording the portions that were already correct. Short recordings are pieced together into the final lecture; the system handles the correct synchronization of edits across the lecture, so that the lecture appears to have been recorded seamlessly in one session. Full control of the data also opens up the potential for increased interactivity from students.

Thesis Supervisor: Fredo Durand

Title: Associate Professor of Electrical Engineering and Computer Science


Acknowledgements

First of all, I must thank Fredo Durand for his guidance in the creation of the overall system. He was the original creator of the Pentimento system, and without his vision, none of this would have been possible. His experience played a key role in solving many of the technical issues that came up in the course of this work. As I encountered the quickly growing complexity of the system, I could not help but be amazed that one person could have built so much alone and without guidance.

I must also thank my collaborators, who diligently put work into various parts of the system; without their efforts, the implementation would have taken dramatically longer. Steve Komarov and others at edX and Harvard helped provide an early-stage recorder, along with knowledge of future challenges to overcome. Halla Moore and Richard Lu also created important building blocks for the system: the undo manager and the renderer, respectively. Their efforts and enthusiasm were invaluable in completing the project in a timely manner.

I would also like to thank the incredible people whom I have met here, especially the friends who have helped me get through the entirety of MIT with some semblance of my sanity remaining. A special thank you goes to the members of the student organization Camp Kesem. Though they often encouraged me to embrace the craziness within, the counselors, the kids, and the families are truly the greatest inspiration I have ever encountered. Without their constant laughter, love, and support, I am sure that I would not be where I am today.

Finally, I must thank both my parents, Mai and Joe Lam, and my brother, Andy Lam.

In so many ways, my brother has been the major driving force that allowed me to get to MIT

in the first place. Without him and his subtle praises or guidance, I doubt I could be here

today. My parents, being immigrants, opened up doors for me through their sheer diligence.

It is through them that I learned how to work hard, and I cannot thank them enough for all

their support throughout the years.


Contents

1 Introduction
1.1 Motivation
1.2 Overview
2 Related Work
2.1 Recording
2.2 Editing
2.3 Playback
3 Features
3.1 Recording Mode
3.2 Edit Mode
3.3 Undo & Redo
3.4 Constraints
4 Design
4.1 Models
4.1.1 State Model
4.1.2 Lecture Model
4.2 Controllers
4.2.1 Tools Controller
4.2.2 Time Controller
4.2.3 Lecture Controller
4.2.4 Visuals Controller
4.2.5 Recording Controller
4.2.6 Retiming Controller
4.2.7 Undo Manager
4.3 View
5 Implementation
5.1 Method
5.2 Recording
5.3 Editing
5.4 Undo & Redo
6 Conclusion
6.1 Future Work

List of Figures

1.1 edX Finger Exercises
2.1 YouTube seek mechanism
2.2 YouTube seek for long videos
3.1 Features available in Pentimento
3.2 Recording tools
3.3 Editing tools
4.1 MVC Layout
4.2 Monotonic ordering of constraints
4.3 Lecture model hierarchy
4.4 Controller logic flow
4.5 Overview of the view
5.1 Snap-to-insert
5.2 Deletion cascade

Chapter 1

Introduction

The current explosion of massive open online courses (MOOCs) means that lectures are no longer limited to the classroom, but now have a worldwide reach. With new distribution platforms such as edX and Coursera, direct access to top-tier instruction on a variety of subjects is freely and openly available. The growth of technology in education has also meant that lectures can be dramatically changed: they no longer need to be delivered in real time, but can be pre-recorded and replayed later. These pre-recorded and durably stored lectures provide several benefits, such as:

* any-time accessibility, letting students watch on demand, including the freedom to stop and resume a lecture at a later, more convenient time

* speed control, which gives students the ability to vary the pace of a lecture depending on the difficulty of the material, the speed of the lecturer, and their own natural learning speed

* selective replay of sections that were confusing or misheard; whereas live lectures move monotonically forward in time and may leave a student behind, students can now individually control their own position in the lecture

* the ability for students to pause and find external resources on the topic or its underlying principles if they are unfamiliar with them

The benefits of pre-recorded lectures are not limited to students; they also provide value to the creator. Authors can ensure that derivations and examples are fully correct ahead of time. Later clarifications can be inserted directly into the original lecture, and mistakes or fixes can be applied to the original lecture as well. Recorded lectures also allow for incremental updates as the class is taught over several sessions, as the relevant material changes, and as better techniques in the field arise.

This thesis presents Pentimento, a system that allows flexible authoring of hand-written lectures. Pentimento does not rely on the rigid video format to deliver a lecture, yet the viewing experience remains as seamless as watching a video. The separation of a lecture into visual and audio components is what primarily differentiates the system from other recording applications. Section 1.1 further discusses the motivations and applications for this thesis, while Section 1.2 outlines the remainder of the thesis.

1.1 Motivation

Much of the motivation for this thesis comes from the ubiquity of MOOCs and online education today. We believe the recent entry of the edX and Coursera platforms into this space, along with their numerous partnerships with acclaimed universities, can bring a shift in the way higher learning is provided; these platforms have already changed how higher education is made available. The existing options of Khan Academy and MIT's OpenCourseWare add to the variety of resources available to students seeking to further their education, each already serving millions of requests a month [1], [2].

As the way lectures are delivered changes, so must the tools that take full advantage of the new media. Currently, video provides a familiar but antiquated method of delivering lectures. The first drawback of video delivery is that it severely limits the level of interaction available to students. With videos, students can pause and resume, change playback speed, or skip to another time, but these actions hardly constitute meaningful interaction with the content presented.

As a result of the limited interaction available in videos, instructors are unable to make accurate assessments until an evaluation is issued, returned, and graded, typically in the form of an exam or problem set. Exams operate at a coarse granularity with respect to topics, so not every topic can be thoroughly addressed within the limits of an exam. This can produce a non-representative distribution of scores that reflects student understanding not in general, but only on the topics that appeared and under the time constraints given. Problem sets address understanding at a finer granularity, but still typically batch topics into one- or two-week groupings. The trade-off problem sets make for this finer granularity is their rapid pace, which presents its own problem: if an assignment falls by the wayside, students often need to focus on the current material and neglect to review what has passed until an exam approaches.

Both of these methodologies fundamentally suffer from the fact that they do not guide a student's understanding during a lecture, but instead ask students to review lectures after they have already passed. More in-the-moment assessments could lead to a more solid understanding the first time material is introduced, as opposed to a look-back-and-review approach. edX currently provides a version of this by interspersing "finger exercises" with short video segments, as shown in Figure 1.1. In this way, lectures no longer need to be so rigid as to force one track upon all students: students who struggle with such mini-assessments could be put onto a slower track than students who excel with the material. A slower track could include additional examples or sample problems, rather than only giving students the option to re-watch what they have already covered. Such a system could help mitigate learning inequality through its more thorough coverage of material on slower tracks. Additionally, students who struggle with different concepts can receive more personalized content on different sections, which one strictly linear lecture cannot provide.

[Figure 1.1 image: an edX course page, with callouts marking the "video clips" and the "finger exercises"]

Figure 1.1: An example of the finger exercises from a typical edX course. The exercises are interspersed with shorter sections of video [3].

Videos also present a variety of other problems for students and authors. They are a good medium for distribution only when viewed sequentially; the option to skip forwards or backwards raises the problem of how to discretize or sample the video so that it is useful to someone seeking particular content. Video is also much more costly to edit because of the large amount of processing that must be done on the data. Finally, video is a relatively expensive way to store data, since not every element in a video frame is relevant to the content being served.

Additionally, since the aforementioned benefits of recorded lectures are predicated on the idea that lectures can easily be manipulated after their initial recording, video is an inappropriate choice of medium. Content that cannot be changed after the fact is rigid, and in fact limits creators more than it benefits them. A lecture that cannot be edited, or that takes great pains to edit, forces authors to have a fully fleshed-out script before any final recording. Such recordings are a major hindrance because they impose an all-or-nothing atomicity: lecturers must not only have a full script or notes beforehand, but also produce a recording that is acceptable in its entirety. Any mistake or stumble can force a complete re-recording from the beginning, since the mistake is captured along with everything else. Long pauses also disrupt the flow and continuity of a recorded lecture, whereas in a classroom lecturers can naturally pause to take or pose questions.

Moreover, video takes away a lecturer's ability to reuse a recording if any information changes or needs to be updated. While some lecturers may choose to record at a smaller granularity so that more local changes can be made, this increases the overhead of stitching together the piecewise components. Creators must also impose some naming structure if they believe they may need to update the sub-lectures later. This presents yet another barrier to authors using the traditional video format that is so common, for example on Khan Academy, Coursera, edX, and MIT's OCW [4], [5], [6], [7].

1.2 Overview

This thesis presents Pentimento, based in principle on the original version proposed by Professor Durand, but migrated to the web browser with a new overall design of system components. The key element of the framework is that while other systems record a contiguous, coupled stream of audio and visuals as they occur, Pentimento fully decouples the audio and visual channels, allowing either to be recorded on its own. The major benefit is that, since the channels are recorded separately, they can be edited separately without affecting the counterpart channel. Once each recording is correct, the two can be synchronized however the user wishes, with variable pacing for different sections. The synchronization, however, is non-binding and can still be edited afterwards. At no point is the lecture in a final or draft state; instead, the production and editing versions are one and the same. This is by design, and reflects the mentality that a document is never finished, but can always be edited or updated with new information.
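One way to picture the non-binding synchronization described above is as a list of sync points, each pairing a time in the visual recording with a time in the audio recording, with linear interpolation in between so that each section gets its own pacing. The following is a minimal Python sketch under that assumption; the `SyncMap` class and its methods are hypothetical illustrations, not Pentimento's actual data structure.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class SyncMap:
    """Hypothetical mapping from visual-channel time to audio-channel time.

    Each sync point pairs a moment in the visual recording with the moment in
    the audio recording it should line up with. Between sync points the
    mapping is assumed to be linear, so each section gets its own pacing.
    """

    points: list[tuple[float, float]] = field(default_factory=list)

    def add_point(self, visual_t: float, audio_t: float) -> None:
        # Both coordinates must stay strictly increasing so the mapping is
        # monotonic; the point list is only replaced if the check passes.
        candidate = sorted(self.points + [(visual_t, audio_t)])
        for (v0, a0), (v1, a1) in zip(candidate, candidate[1:]):
            if not (v0 < v1 and a0 < a1):
                raise ValueError("sync points must increase in both channels")
        self.points = candidate

    def remove_point(self, visual_t: float) -> None:
        # Synchronization is non-binding: any sync point can be dropped later.
        self.points = [p for p in self.points if p[0] != visual_t]

    def to_audio(self, visual_t: float) -> float:
        if len(self.points) < 2:
            raise ValueError("need at least two sync points")
        visual_times = [v for v, _ in self.points]
        # Pick the segment containing visual_t; outside the covered range,
        # the first or last segment is extended.
        i = min(max(bisect_right(visual_times, visual_t), 1), len(self.points) - 1)
        (v0, a0), (v1, a1) = self.points[i - 1], self.points[i]
        return a0 + (visual_t - v0) * (a1 - a0) / (v1 - v0)


# Usage: the first 10 s of drawing is stretched over 20 s of speech,
# and the next 5 s of drawing plays back at its recorded pace.
sync = SyncMap()
sync.add_point(0.0, 0.0)
sync.add_point(10.0, 20.0)
sync.add_point(15.0, 25.0)
print(sync.to_audio(5.0))   # 10.0
print(sync.to_audio(12.5))  # 22.5
```

Because sync points can be added or removed at any time, the mapping stays editable; requiring both coordinates to increase keeps playback moving forward in both channels.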

Chapter 2 discusses related work relevant to Pentimento. A variety of tools exist for recording or editing, though they do not provide the same editing flexibility as our application. Chapter 3 describes the features and tools available to users of the system. Chapter 4 discusses the design of the data structures and components that make up Pentimento, along with the interactions between them. Chapter 5 discusses specific implementation choices made in the current version of Pentimento and the specific semantics available in the application. Chapter 6 summarizes the system and its benefits, and discusses extensions that could be built on top of what exists at the time of writing.


Chapter 2

Related Work

Applications exist for several of the individual features of the Pentimento system, though none offers all of the functionality leveraged in this work, or the extensibility Pentimento provides for media other than hand-drawn strokes. Work relevant to us falls into three general categories of functionality: recording, editing, and playback.

2.1 Recording

Options for recording a lecture range from true video recording, to screen capture, to capture at the level of individual strokes. The most basic approach is a full video capture of the lecturer as well as the blackboard. This can require a substantial amount of hardware, including a camera or camcorder, a tripod, and a microphone system. Estimates put the cost of such a system between $1000 and $3000 [8], [9], not including the time or cost of post-processing and editing. Document cameras are another feasible approach, though a microphone must be purchased in addition to the video capture device [10]. The price of some document cameras in use, however, is comparable to that of a camcorder alone, even with education pricing [8], [11]. Instead of capturing physically written strokes, screen capture of digitally hand-written strokes is another approach, popularized by Khan Academy. Such solutions can use any of a variety of screen capture tools and input tablets; Khan Academy specifically uses Camtasia Recorder and a Wacom Bamboo tablet together with a recording microphone [12], [13].

A final option is to capture the strokes themselves instead of everything that is visible at every point in time. Options such as Penultimate, Note Taker HD, Write, and a host of others referenced in [14] or [15] capture writing at the stroke level instead of maintaining a full video screen capture at all times, which provides a much more efficient and relevant encoding for hand-written lectures. More advanced options such as LiveScribe, NoteLedge, SoundNote, and AudioNote also allow audio to be recorded in sync with the hand-written strokes [14], [15]. Some options also allow audio to be recorded after the visuals have been recorded.
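The difference between screen capture and stroke-level capture can be made concrete with a small data model: instead of storing pixels for every frame, a stroke recorder stores only timestamped pen samples, and any moment of the lecture can be redrawn from them. The sketch below is a hypothetical Python illustration; it is not the file format of Penultimate, Note Taker HD, Write, or any other application named above, and the class and method names are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StrokePoint:
    x: float
    y: float
    t: float  # seconds since the start of the recording


@dataclass
class Stroke:
    points: list[StrokePoint] = field(default_factory=list)

    def drawn_by(self, t: float) -> list[StrokePoint]:
        # The part of the stroke that had been written by time t.
        return [p for p in self.points if p.t <= t]


class StrokeRecorder:
    """Hypothetical recorder fed by pen-down, pen-move, and pen-up events."""

    def __init__(self) -> None:
        self.strokes: list[Stroke] = []
        self._current: Optional[Stroke] = None

    def pen_down(self, x: float, y: float, t: float) -> None:
        self._current = Stroke([StrokePoint(x, y, t)])

    def pen_move(self, x: float, y: float, t: float) -> None:
        # Samples arriving while the pen is up are ignored.
        if self._current is not None:
            self._current.points.append(StrokePoint(x, y, t))

    def pen_up(self) -> None:
        if self._current is not None:
            self.strokes.append(self._current)
            self._current = None

    def snapshot(self, t: float) -> list[list[tuple[float, float]]]:
        # Reconstruct what is on screen at time t from strokes, not pixels.
        frame = []
        for stroke in self.strokes:
            drawn = stroke.drawn_by(t)
            if drawn:
                frame.append([(p.x, p.y) for p in drawn])
        return frame


rec = StrokeRecorder()
rec.pen_down(0, 0, 0.0)
rec.pen_move(1, 1, 0.1)
rec.pen_move(2, 2, 0.2)
rec.pen_up()
print(rec.snapshot(0.15))  # [[(0, 0), (1, 1)]]
```

In this model, storage grows with the number of pen samples rather than with screen resolution and frame rate, which is the efficiency argument made above for stroke-level encodings.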

2.2 Editing

Editing recorded lectures primarily falls to video editors such as iMovie, Final Cut Pro, Adobe Premiere, or Avid. These video editors are highly restrictive in that they do not leverage the underlying stroke data of the lecture; they are instead more fully featured, general-purpose tools that can operate on any video [16]. This means that, in addition to the learning curve of new software, users will also have difficulty manipulating the strokes within a video. The other option is to outsource video editing to professionals, though this can be prohibitively expensive. For example, MIT's AMPS service provides both recording and editing of lectures, at a cost of around $295/hr and a total semester cost of around $10,000 [17], [9].

Some of the stroke-recording options mentioned above also offer editing. However, to the best of our knowledge, their edits are applied strictly to the state of the document: no record of when an edit occurred is kept, and edits only fix the notes rather than forming part of the lecture's process. While this is an incredibly useful feature, some edits should remain visible in a lecture to preserve a sense of why or when a change occurred. Additionally, we do not believe these applications allow the speed of the different channels to be edited independently.

2.3 Playback

At a minimum, two very different types of authors exist in practice: those who record many smaller lectures and provide the breakdown to students, and those who record one longer lecture in a contiguous time block. The latter is very common for lecturers who have agreed to have their lectures recorded as they are delivered in a classroom. Either scenario, however, raises the question of how playback should be presented to the user, particularly so that it is useful to users seeking specific content.

YouTube's now-commonplace seeking mechanism was rolled out in 2012. As shown in Figure 2.1, a series of thumbnails is presented when a user hovers over the seek bar of a video [18]. This gives users a way to quickly glance at different frames within the video, so that they can more quickly identify relevant frames and move towards them. For very long videos, the mechanism adds a further layer of nesting, shown in Figure 2.2, in which a pop-out of a smaller section of the seek bar is displayed before thumbnails are generated.
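The hover-to-preview interaction can be sketched generically: map the pointer position on the seek bar to a playback time, pick evenly spaced frame times for thumbnails, and, for long videos, first narrow to a window of the timeline. This is a hypothetical Python illustration of the idea only; it is not YouTube's implementation, and the function names and the even-spacing rule are assumptions.

```python
def hover_to_time(x: float, bar_width: float, duration: float) -> float:
    """Map a hover position on the seek bar to a playback time (clamped)."""
    frac = min(max(x / bar_width, 0.0), 1.0)
    return frac * duration


def thumbnail_times(start: float, end: float, count: int) -> list[float]:
    """Evenly spaced frame times, one centred in each thumbnail slot."""
    if count <= 0:
        return []
    step = (end - start) / count
    return [start + (i + 0.5) * step for i in range(count)]


def zoom_window(center: float, duration: float, span: float) -> tuple[float, float]:
    """For long videos: a sub-range of the timeline around the hover point,
    kept inside [0, duration], which can then be sampled more finely."""
    lo = min(max(center - span / 2, 0.0), max(duration - span, 0.0))
    return lo, min(lo + span, duration)


# Hovering at the middle of a 600 px bar on a one-hour video, then sampling
# four thumbnails from a ten-minute window around that point.
t = hover_to_time(300, 600, 3600)
print(t)                                                        # 1800.0
print(thumbnail_times(*zoom_window(t, 3600, 600), count=4))     # [1575.0, 1725.0, 1875.0, 2025.0]
```

Sampling only by time is exactly what makes the thumbnails uninformative for blackboard-style lectures: nothing in the sampling rule knows which frames differ in content.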

However, Figure 2.1 also illustrates several potential problems with the thumbnail display. In videos whose frames are very distinct, thumbnails are almost immediately distinguishable, but in lectures where most elements are on a blackboard, frames are not necessarily distinguishable at a glance. An additional problem with thumbnails is that there is no causality between them: it is unclear how one frame relates to another, and this continuity may be very important in finding a particular section of a lecture.

Figure 2.1: The YouTube filmstrip allows for seeking within a video by displaying a thumbnail preview of different frames within the video [18].

Another approach is to analyze the video for relevant items before presenting it to students. NoteVideo takes this approach for visuals: it performs a video analysis of the strokes and provides an interface in which users can click on any stroke to replay the lecture from that stroke. NoteVideo+ extends this by closely coupling the transcript with the video, so that users see the related text when hovering over a stroke, and by giving users the option to search the transcript and jump to the matching points in the video [19].
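When strokes are stored with their timestamps, as in stroke-level capture, the click-to-replay and transcript-search interactions described above need no video analysis at all. The following hypothetical Python sketch shows both lookups; it is not NoteVideo's implementation, and it simplifies hit-testing to the nearest sampled point within a fixed radius rather than the distance to each stroke segment.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimedStroke:
    start: float                       # time the stroke began, in seconds
    points: list[tuple[float, float]]  # sampled (x, y) canvas positions


@dataclass
class TranscriptWord:
    word: str
    start: float  # time the word is spoken, in seconds


def seek_to_stroke(strokes: list[TimedStroke], x: float, y: float,
                   max_dist: float = 10.0) -> Optional[float]:
    """Return the start time of the stroke closest to a click, if any is near."""
    best_time, best_dist = None, max_dist
    for stroke in strokes:
        for px, py in stroke.points:
            d = math.hypot(px - x, py - y)
            if d <= best_dist:
                best_time, best_dist = stroke.start, d
    return best_time


def search_transcript(words: list[TranscriptWord], query: str) -> list[float]:
    """Return every time at which the query word is spoken (case-insensitive)."""
    q = query.lower()
    return [w.start for w in words if w.word.lower() == q]


strokes = [TimedStroke(2.0, [(0, 0), (10, 0)]), TimedStroke(8.0, [(50, 50), (60, 55)])]
print(seek_to_stroke(strokes, 9, 1))  # 2.0
```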

Figure 2.2: The YouTube interface allows for zooming into a smaller section of the seek bar, which then expands into a thumbnail [18].

Chapter 3

Features

The primary feature of this thesis is the ability to perform edits in the same system that records hand-written lectures. It is therefore important for users to distinguish between two modes of operation: the recording mode, which allows new material to be inserted at any time within the lecture, and the editing mode, in which users can change the properties of already existing material. Each mode offers its own suite of tools appropriate to that mode. Figure 3.1 presents some common features provided by the application that are useful in creating a hand-written lecture.

3.1 Recording Mode

The recording mode encompasses all of the tools that make live changes, along with their possible configurations. The tools active in this mode are all additive: they add new material that did not previously exist to the lecture, and they never remove content from the lecture itself. Tool modifiers, such as changing the stroke width, also exist for the recording-mode tools. The tools available for recording are shown in Figure 3.2 and described below:

(a) The pen tool allows strokes to be captured directly by the application.

(b) Strokes that have already been written can be edited in various ways, such as a selection followed by a deletion.

(c) Flexible recording allows users to move back in time and record fixes at the time of the mistake, instead of recording them later. This makes it seem as if the mistake never occurred.

Figure 3.1: A preview of some of the features that Pentimento provides for flexible recording and editing.

[Figure 3.2 image: the "In-Lecture Tools" panel, with labels for "Selection for editing (eraser side of stylus)" and "Pen"]

Figure 3.2: Tools available during recording

* selection tool, which takes visuals on screen and puts them into the current selection. This can be used as a longer-lasting highlight or to buffer visuals for another tool's functionality.

* pen tool, the primary tool for adding content. It allows a user to create new visuals and draw strokes onto the canvas, all of which are recorded by the application.

* highlighter tool, which emphasizes a section of the canvas, typically already-drawn visuals. The highlights are recorded by the system, though the emphasis is temporary with respect to time within the lecture.

* width tool, which allows users to change the stroke width of their pen or highlighter strokes.

* delete tool, which removes visuals at a given time in the lecture. Note that a deletion during recording simply prevents the visual from being visible at any later time in the lecture. The visual still exists in the underlying data structure, but has a flag set for its time of deletion. This contrasts with an edit-mode deletion, described in the next section.

* additional slide tool, which provides the user with a new, blank canvas that has no notion of any previous or future slide.
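The recording-mode deletion described above can be modeled as a soft delete: the visual keeps its place in the data structure and only gains a deletion time, which playback uses to hide it from that point on. This is a minimal Python sketch assuming each visual stores the time it appears (`t_min`) and an optional deletion time (`t_deletion`); both field names, and the helper functions, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Visual:
    name: str
    t_min: float                        # lecture time at which it appears
    t_deletion: Optional[float] = None  # set by a recording-mode delete

    def visible_at(self, t: float) -> bool:
        if t < self.t_min:
            return False
        return self.t_deletion is None or t < self.t_deletion


@dataclass
class Slide:
    # Each slide is an independent canvas with its own visuals.
    visuals: list[Visual] = field(default_factory=list)


def record_delete(visual: Visual, t_now: float) -> None:
    # The visual is not removed from the slide; it is only hidden from t_now
    # onward, so playback before t_now is unchanged.
    visual.t_deletion = t_now


def visible(slide: Slide, t: float) -> list[str]:
    return [v.name for v in slide.visuals if v.visible_at(t)]


slide = Slide([Visual("axes", 0.0), Visual("typo", 2.0)])
record_delete(slide.visuals[1], 5.0)
print(visible(slide, 3.0))  # ['axes', 'typo']
print(visible(slide, 6.0))  # ['axes']
```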

3.2 Edit Mode

The edit-mode tools allow retroactive changes to be applied to the lecture. If a mistake is made and the user wishes to erase it as if it had never existed in the lecture, the edit tools allow changes to be applied directly to visual elements. Tools in this mode cannot add any new material to the lecture; instead, they modify visuals or data already in the lecture. The edit tools, shown in Figure 3.3, are:

* play/pause tool, which allows the user to preview the lecture as it will be exported in its current form.

* select tool, which is analogous to the select tool in the recording mode. Selecting in the edit mode has no side effects on the recording itself.

* delete tool, which fully removes a visual from the history of a lecture. In contrast with the recording-mode deletion, this deletion removes the visual from the data structure entirely.

* redraw tool, which allows the user to replace visuals with a new set of visuals. This action is a shortcut for the compound action of deleting a selection and starting a recording. Because users can replace one visual with any number of new visuals, they must specify when the recording ends.

* stroke width change, which edits one or more strokes to appear as if they had originally been drawn with the specified width.

* deletion of the currently active slide, which removes the entire slide and its encapsulated visuals from the lecture.

Figure 3.3: Tools available during editing

3.3 Undo & Redo

Pentimento also integrates both an undo stack and a redo stack, which allow actions to be inverted or replayed. Undo and redo are supported at varying granularity: users may want to undo only the last action, or to undo an entire set of successive actions. The undo and redo actions are accessible in both the recording and edit modes, though whether either can be performed depends on which actions were most recently performed. They appear in both groups of tools simply for spatial association in the view.

While the recording and editing modes enforce strict additivity and non-additivity, respectively, undo and redo exist as meta-tools which diverge from the mode-enforced semantics. These tools are exempt from the mode-based restrictions in order to give the undo and redo actions a correct meaning for the user. For example, a stroke can be drawn in the recording mode, and undoing it will erase the stroke as if it had never existed in the lecture, a very edit-like action during the recording mode. Likewise, a stroke can be deleted in the edit mode, and undoing that deletion will restore the stroke in time and space, an action which is additive in nature. In both cases, if the user wishes they had never performed the most recent action, the undo must break the mode semantics. The exact meaning of undo and redo in different contexts is described in further detail in Chapter 5, which discusses the implementation and specific guarantees of Pentimento.


3.4 Constraints

The decoupling of audio and visuals is what gives Pentimento the freedom to edit one channel without directly affecting its counterpart. However, this freedom comes at the price of losing the tightly coupled meaning of any specific time: where previously a time corresponded to a visual and its associated audio, it may now correspond only to a visual while the associated audio has been shifted to another time, or vice versa. This disjointness introduces two separate notions of time, which can be skewed or shifted by variable amounts in different sections of the lecture.

This is not a problem, though, since the natural speeds of writing and speaking are not the same, and each changes when one tries to do both at once [16]. Additionally, the pacing of each channel may need to be adjusted separately in different sections to compensate for this difference. To let users specify the relative speed of each channel, users can place constraints within a lecture, where a constraint consists of a (visual-time, audio-time) pair. This allows a loose synchronization to be specified between the two channels, and these constraint points can be changed later in the edit mode. The interpretation of constraints is discussed further in Chapter 5.


Chapter 4

Design

As with any system, the design of Pentimento is strongly influenced by the features and semantics which we guarantee to the user. The promises of the application motivate the layout of the data structures, which are described in detail in this chapter. Two additional factors also strongly influenced the design of this new Pentimento system: the need for a clean separation of logic between components, and the desire to extend the platform to media beyond handwriting. The overall design of the system was therefore geared to follow the MVC ideology as closely as possible to satisfy all of the preceding factors. Strong adherence to the design principles of that ideology has enforced a clean design structure, which naturally provides strong modularity within each of the components. This modularity in turn provides extensibility, since any specific component or combination of components can be replaced.

Figure 4.1 shows a high-level architecture, including several controllers and several data layers. Note that the controllers and models do not necessarily exist in a one-to-one relation: the controllers represent a logical grouping of functions, while the models represent data important to the application. In fact, for Pentimento, data can be manipulated in several meaningful ways depending on the mode of operation, so controllers are numerous compared to models. This specific MVC design is also highly controller-centric: all updates between the models and the view must be controller-driven, and no direct communication occurs between the model and the view.

Figure 4.1: MVC layout of each component (diagram: user, view, controllers, and models, with arrows labeled interacts, fires events, apply updates, pushes updates, services, and reads). This follows the MVP pattern, a derivative of MVC in which presenters bridge the model and view, as opposed to the canonical MVC [20], [21]. We continue with the label of controllers for greater familiarity to readers.

4.1 Models

The models serve the sole purpose of being containers with the proper structure to encapsulate all the application data in a consistent manner. Two very different models exist in the application: the lecture model and a global state model. The lecture encapsulates all the data relevant to the presentation, that is, what the author will present to students, while the state holds information relevant to the author's current session that should not be saved across sessions or presented to students.

4.1.1 State Model

Overall, the state model is an umbrella term for different pieces of sub-state, each managed by the controller responsible for it. Any data which might be considered transient for a session should be included in this model, not in the lecture model. The state primarily serves as a channel for inter-controller communication. Generally, there are three major sub-categories of state: input state, navigation state, and tool state.

The navigation state consists of the timing within the lecture and the currently active slide being viewed. The importance of slide-locality is discussed further in subsection 4.1.2, but being aware of only the current slide means that, in most cases, only the intra-slide elements need to be considered. This state is important so that the rendered elements match the time cursor available to the user.

Input state comprises the combined state of all the I/O devices on the author's machine. The inputs may not always be relevant, but changes in their state are always monitored. Keeping track of these state variables allows the system to check the stored state instead of querying the device on every event. These inputs can modify the effect of the active tools: selecting visuals with the Ctrl/Cmd key or Shift key held down adds to the current selection, the selection box should only be expanded while the left mouse button is held down, and so on. The correct combination of inputs can also fire a tool, if enabled, such as the commonly used Ctrl/Cmd-Z shortcut for the undo action.

The final state element, the tool state, consists of the properties which, when changed, may modify the behavior of any tool. The most recently active color and the most recently active stroke width both affect any drawing that occurs. Recording is also considered a tool, so the parameters relevant to recording are encapsulated within this category. Additionally, flags can be set for some elements of the hardware I/O, such as whether to consider pressure sensitivity when assigning a stroke's color or width, or whether to honor keyboard shortcuts for firing tools. Though the input state may continue to reflect ignored variables, tools refer to the tool state to decide whether to consider those inputs.
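
The three sub-states can be sketched as plain TypeScript interfaces. This is a minimal illustration under assumptions, not Pentimento's actual code: every field and function name below (`ctrlOrCmd`, `keyboardShortcutsEnabled`, `firesUndoShortcut`, and so on) is hypothetical. It shows tools reading the cached input state instead of querying the device, and consulting the tool state before honoring an input.

```typescript
// Hypothetical shapes for the three sub-states described above.
interface SessionInputState {
  ctrlOrCmd: boolean;     // Ctrl (or Cmd on macOS) is held down
  shift: boolean;         // Shift is held down
  leftMouseDown: boolean; // left mouse button is held down
}

interface SessionNavigationState {
  visualTime: number;   // position of the time cursor
  currentSlide: number; // index of the active slide
}

interface SessionToolState {
  color: string;                     // most recently active color
  width: number;                     // most recently active stroke width
  pressureSensitivity: boolean;      // use pen pressure for color/width
  keyboardShortcutsEnabled: boolean; // honor shortcuts such as Ctrl/Cmd-Z
}

interface SessionState {
  input: SessionInputState;
  navigation: SessionNavigationState;
  tool: SessionToolState;
}

// Holding Ctrl/Cmd or Shift while selecting adds to the current selection.
function addsToSelection(state: SessionState): boolean {
  return state.input.ctrlOrCmd || state.input.shift;
}

// The selection box only grows while the left mouse button is held.
function expandsSelectionBox(state: SessionState): boolean {
  return state.input.leftMouseDown;
}

// Ctrl/Cmd-Z fires undo only if the tool state allows shortcuts; the
// input state keeps tracking the keys either way.
function firesUndoShortcut(state: SessionState, key: string): boolean {
  return state.tool.keyboardShortcutsEnabled &&
    state.input.ctrlOrCmd &&
    key.toLowerCase() === "z";
}
```

Keeping the shortcut flag in the tool state rather than the input state mirrors the split described above: the input state records what the devices are doing, and the tool state decides whether that matters.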

4.1.2 Lecture Model

The lecture model is the entity which contains all the information needed to replay a lecture, and it exists as a hierarchy of elements. At the highest level, this model consists of only a few elements: an array of slides and an array of constraints, each explicitly ordered by monotonically increasing time. Though no unified notion of time exists, each channel should be monotonically increasing, and constraints are not allowed to crisscross in their (visual-time, audio-time) mappings, as shown in Figure 4.2. The hierarchy of components in the lecture model is shown in Figure 4.3.

Figure 4.2: Pentimento enforces strictly non-intersecting constraints, so that constraints are always monotonic in visuals and in audio. If two constraints conflict, such as in (a), one constraint will be removed. (a) This set of constraints is disallowed, since a pair of constraints conflict. (b) This set of constraints is allowed, since all constraints are monotonic and do not intersect. Both panels plot visual time against audio time.
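
A minimal sketch of the non-crossing rule follows. It assumes a constraint is a plain `{ visualTime, audioTime }` record and, since the text does not say which of two conflicting constraints is removed, it assumes a newly placed constraint wins over any existing constraint it crosses. Two constraints are compatible only if they are ordered the same way in both channels.

```typescript
// A constraint pins a visual time to an audio time.
interface Constraint {
  visualTime: number;
  audioTime: number;
}

// Compatible only if strictly ordered the same way in both channels;
// opposite orderings (a crisscross) or a tie in one channel conflict.
function conflicts(a: Constraint, b: Constraint): boolean {
  const dv = a.visualTime - b.visualTime;
  const da = a.audioTime - b.audioTime;
  return !((dv < 0 && da < 0) || (dv > 0 && da > 0));
}

// Insert a constraint and keep the array sorted by visual time.
// Assumption: the new constraint wins, and any existing constraint that
// conflicts with it is removed. Because the survivors are pairwise
// compatible, sorting by visual time also sorts them by audio time.
function insertConstraint(constraints: Constraint[], c: Constraint): Constraint[] {
  const kept = constraints.filter((existing) => !conflicts(existing, c));
  kept.push(c);
  kept.sort((x, y) => x.visualTime - y.visualTime);
  return kept;
}
```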

The choice of subdividing a lecture into slides is inspired by the idea that most lectures can naturally be discretized into sections such as blackboards or PowerPoint slides. A slide is a fully contained, standalone unit that defines sharp boundaries for all data encapsulated within it. Slides, like lectures, contain an array of visuals and a duration of existence, but each slide also maintains an array of its transformations over time. Transformations at this level primarily consist of camera changes, such as zooming in or out, or panning to another area. The duration of the entire lecture is the sum of the slide durations, and at any time t within a lecture, the active slide i is defined such that:

$$\sum_{j=0}^{i-1} \mathrm{slide}_j.\mathrm{duration} < t \le \sum_{j=0}^{i} \mathrm{slide}_j.\mathrm{duration} \qquad (4.1)$$

Time 0 is special in that it belongs to the first slide; it is the only exception to this rule. Slides also lack an explicit slide number; slide numbers are implicit and can instead be inferred from the position of a slide in the array.
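
Equation 4.1 amounts to scanning a running sum of slide durations. The sketch below is illustrative: the function names and the plain array of durations are assumptions, not Pentimento's API. Time 0 is handled as the stated exception, and a time exactly on a slide boundary is assigned to the earlier slide, which is consistent with time 0 being the only exception.

```typescript
// Index of the active slide at lecture time t, given slide durations.
function activeSlideIndex(durations: number[], t: number): number {
  if (durations.length === 0) throw new Error("lecture has no slides");
  if (t < 0) throw new RangeError("time must be non-negative");
  if (t === 0) return 0; // the single exception: time 0 belongs to slide 0
  let end = 0; // running sum = end time of slide i
  for (let i = 0; i < durations.length; i++) {
    end += durations[i];
    // The start of slide i (the previous running sum) is already < t,
    // otherwise an earlier iteration would have returned.
    if (t <= end) return i;
  }
  throw new RangeError(`time ${t} is past the end of the lecture`);
}

// The lecture stores no duration of its own; it is the sum over slides.
function lectureDuration(durations: number[]): number {
  return durations.reduce((sum, d) => sum + d, 0);
}
```

For slide durations `[5, 3, 4]`, time 5 falls on the boundary and belongs to slide 0, while time 5.5 falls in slide 1.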

Figure 4.3: Lecture model hierarchy (a lecture object contains slide objects, which contain visual objects)

The visuals within a slide may vary quite a bit depending on their types, but they share some common and fundamental fields. Visuals that are often helpful during a lecture include the strokes an author writes, relevant images, and relevant videos. All fields of a visual are defined relative to the slide, as opposed to the more global lecture, to allow for more modularity between slides. Among the common fields these visuals may have are:

• type of the visual, which essentially identifies its class. For the example visuals above, the types would be stroke, img, and video, respectively.
• tMin defines when the visual came into existence, either through direct recording or through later editing. This is also when the viewer of the lecture should first become aware of the visual.
• properties is a map of string keys to string values, whose entries may vary based on the type of the visual. For example, the properties of a stroke may include its color and width, while images may have height and width attributes or a scaling attribute.
• transforms is an array of time-ordered transformations which may have been applied to the visual. If a visual has been modified in any way after creation, such as being scaled, sheared, or moved, those transformations are stored in its transforms.
• tDeletion is when the visual should no longer be visible or in existence during the slide, after its tMin. This field is useful when a visual should only be visible for a short duration during a slide, rather than for the entire remainder of the slide after its tMin.
• hyperlink is a link to a foreign resource which is bound to the visual. For img and video visuals, the link may simply reference the source of the visual, but a series of strokes defining an equation may link to an alternate derivation, such as one on Wikipedia.

The final layer fundamental to the lecture model is the vertex. Every visual needs some notion of an (x, y) coordinate: images and videos may define attributes such as height and width, but still need a top-left corner or some other reference point defined, while strokes are simply a series of connected (x, y) vertices. For Pentimento's needs in particular, given its focus on hand-drawn strokes, vertices are defined as (x, y, t, p) coordinates, where t is the time at which the vertex was laid down and p is the pen pressure with which it was laid down. The latter field may not be useful for all visual types.
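
The visual fields listed above, together with the vertex layer, can be summarized as TypeScript types. This is a sketch under assumptions: only the field names `type`, `tMin`, `properties`, `transforms`, `tDeletion`, and `hyperlink` and the (x, y, t, p) vertex come from the text; the `Transform` shape, the `vertices` field, the `Slide`/`Lecture` layout, and the `isVisibleAt` helper are illustrative.

```typescript
// A vertex: position, time it was laid down (slide-relative), pen pressure.
interface Vertex {
  x: number;
  y: number;
  t: number;
  p: number; // may be unused for some visual types
}

// Illustrative transform record (scale, shear, move, ...).
interface Transform {
  time: number;
  kind: string;
}

interface Visual {
  type: "stroke" | "img" | "video";
  tMin: number;                       // slide-relative time of appearance
  tDeletion?: number;                 // slide-relative time it disappears, if ever
  properties: Record<string, string>; // e.g. color and width for a stroke
  transforms: Transform[];            // time-ordered
  hyperlink?: string;                 // optional link to a foreign resource
  vertices: Vertex[];                 // a stroke's points, or an image's anchor
}

interface Slide {
  duration: number;
  visuals: Visual[];
  transforms: Transform[]; // camera changes such as zoom and pan
}

interface Lecture {
  slides: Slide[];                                          // ordered by time
  constraints: { visualTime: number; audioTime: number }[]; // ordered by time
}

// A visual is visible at slide-relative time t from its tMin until its
// tDeletion (exclusive), if it has one.
function isVisibleAt(v: Visual, t: number): boolean {
  return v.tMin <= t && (v.tDeletion === undefined || t < v.tDeletion);
}
```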

4.2 Controllers

Controllers serve as the primary hub for all the application logic, bridging the gap between the data within the model and the view presented to the user. This means that controllers are in charge of properly interpreting user input, manipulating the models in the proper manner, and keeping the state consistent. Figure 4.4 shows the layout of the controllers and their interactions with one another.

4.2.1 Tools Controller

The tools controller is the entry point for all user input related to tools, and it dispatches to the other controllers so that the correct logic is performed when a tool is used. This means that this controller is in charge of attaching and detaching handlers when tools change, while also tracking changes to the tool state, such as the current stroke color or stroke width. Mode changes, from editing mode to recording mode or vice versa, are also loosely classified as tool changes, but such changes render some tools usable and others unusable. Therefore, it is also the responsibility of the tools controller to update the view in response to user input which changes the mode of operation.

Figure 4.4: The layout of controllers in relation to the lecture model. The green flow represents the recording-mode action flow, while the blue flow represents the editing-mode action flow. (Diagram components: UI events, recording controller, visuals controller, lecture controller, and lecture model; labeled flows include queries to update the model, and visual, slide, and constraint changes applied to the model.)

4.2.2 Time Controller

The time controller is the sole controller responsible for updating the state to reflect the correct current time of the session. Note, though, that there are two separate notions of time: one for the visuals and another for the audio. Though many other components may read the current visual time or audio time from the state, such as the renderer, which must know at what time step to render, the time controller maintains the exclusive right to update the time. Any controller that wishes to update the time must ask the time controller to do so on its behalf.
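
The exclusive-writer rule can be sketched as a class whose time fields are private, so other components can only read the two times or ask for an update. The class and method names (`TimeController`, `updateVisualTime`, `onChange`) are hypothetical; the listener list is one plausible way for the renderer or the lecture controller to learn that the time has moved.

```typescript
type TimeListener = (visualTime: number, audioTime: number) => void;

class TimeController {
  // Private: no other component can write the two notions of time.
  private visualTime = 0;
  private audioTime = 0;
  private listeners: TimeListener[] = [];

  getVisualTime(): number { return this.visualTime; }
  getAudioTime(): number { return this.audioTime; }

  // Readers such as a renderer subscribe to time changes.
  onChange(listener: TimeListener): void { this.listeners.push(listener); }

  // The only entry points that mutate time; other controllers call these.
  updateVisualTime(t: number): void { this.visualTime = t; this.notify(); }
  updateAudioTime(t: number): void { this.audioTime = t; this.notify(); }

  private notify(): void {
    for (const l of this.listeners) l(this.visualTime, this.audioTime);
  }
}
```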

4.2.3 Lecture Controller

The lecture model itself is hidden behind this controller, so that the lecture controller is the only component capable of editing the lecture's fields. The lecture controller only considers high-level changes, such as the addition or deletion of slides. It is also the controller that maintains the global notion of the lecture, so it performs any computation that depends on the entirety of the lecture's components, such as determining which slide a given time corresponds to. The lecture also maintains no explicit duration; instead, its duration is determined by the sum of the durations of its slides, which is yet another computation this controller is responsible for. When the time changes, the time controller updates the time appropriately and then requests that the lecture controller update the current slide within the state to reflect the correct slide.

4.2.4 Visuals Controller

Manipulation of the properties of a visual falls under the responsibilities of this controller, regardless of which slide is relevant. Any component that needs to change a visual must ask the visuals controller to perform the update on its behalf. The addition of a new visual, the deletion of a visual, and the modification of a visual are all encapsulated by the visuals controller. This controller may modify a visual either by changing its properties directly or by appending a transform object to its array of transformations. The visuals controller also has an audio counterpart, which is discussed in Chapter 6.

4.2.5 Recording Controller

Changes to the lecture model can occur in one of two fundamental forms, edits and additions, and the recording controller is charged with the latter. The entry point for any change still lies with the tools controller, which then delegates responsibility for any recording to this controller. While edits may alter the properties of existing visuals, edits may not create or insert new visuals. The recording controller handles additions, which occur at a point in time or span some time interval and introduce content that did not previously exist into the lecture model.

Any change to the lecture happens directly on the underlying data structure, so the controller acts as a hub for all logic which manipulates the model. It handles temporary buffers for in-flight visuals which have not yet been completed, and asks the lecture controller and visuals controller to perform actions on its behalf, such as appending a new slide or inserting a new visual. As a side effect of direct manipulation, the currently active time and currently active slide may change during a recording, so this controller must also work in conjunction with the time controller to request correct updates to both.

4.2.6 Retiming Controller

The retiming controller handles the constraints which are placed in the application. At the beginning and end of a recording, the recording controller asks the retiming controller to place constraints so that the elements within the recording remain as coherent as when they were initially recorded. Later, during editing, users can move these automatically placed constraints, delete them, or insert their own. This controller also enforces the requirement that no two constraints may conflict. Other controllers may also query the retiming controller to interpolate a visual time to an audio time, or vice versa.
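
The text does not specify how times between constraints are interpolated, so the following is only a minimal sketch under two stated assumptions: piecewise-linear interpolation between neighboring constraints, and a constant offset outside the constrained range. Because constraints never cross, one sorted array works in either direction.

```typescript
interface Constraint {
  visualTime: number;
  audioTime: number;
}

type Channel = "visualTime" | "audioTime";

// Map time x from one channel to the other, ASSUMING linear interpolation
// between neighbouring constraints. `constraints` must be sorted and
// non-crossing, which means it is sorted in both channels at once.
function mapTime(constraints: Constraint[], x: number, from: Channel, to: Channel): number {
  if (constraints.length === 0) return x; // no constraints: identity (assumption)
  const first = constraints[0];
  const last = constraints[constraints.length - 1];
  // Outside the constrained range, keep the nearest constraint's offset (assumption).
  if (x <= first[from]) return x - first[from] + first[to];
  if (x >= last[from]) return x - last[from] + last[to];
  for (let i = 1; i < constraints.length; i++) {
    const a = constraints[i - 1];
    const b = constraints[i];
    if (x <= b[from]) {
      const frac = (x - a[from]) / (b[from] - a[from]);
      return a[to] + frac * (b[to] - a[to]);
    }
  }
  return x; // not reached for sorted input
}

const visualToAudio = (cs: Constraint[], v: number): number => mapTime(cs, v, "visualTime", "audioTime");
const audioToVisual = (cs: Constraint[], a: number): number => mapTime(cs, a, "audioTime", "visualTime");
```

With constraints (0, 0) and (10, 20), visual time 5 maps to audio time 10, and audio time 10 maps back to visual time 5.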

4.2.7 Undo Manager

The undo manager is not in fact a controller, but a logging mechanism for all actions that have been performed. For each action, it logs the inverse action to be applied on an undo() call. The undo manager keeps track not only of an undo stack of actions that can be undone, but also of a redo stack of actions which were undone. Any action other than an undo or redo that pushes directly onto the undo stack clears the redo stack.
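
The two-stack discipline can be sketched as follows. This is a hypothetical TypeScript outline rather than Pentimento's code: each controller registers an entry holding the inverse of the action it just performed, undo and redo move entries between the two stacks, and registering a fresh action clears the redo stack. The optional count argument illustrates undoing at coarser granularity.

```typescript
// Each entry knows how to invert and re-apply one action (hypothetical API).
interface UndoEntry {
  undo: () => void;
  redo: () => void;
}

class UndoManager {
  private undoStack: UndoEntry[] = [];
  private redoStack: UndoEntry[] = [];

  // Called by a controller after it performs a new action. A fresh
  // action invalidates the redo history.
  register(entry: UndoEntry): void {
    this.undoStack.push(entry);
    this.redoStack = [];
  }

  canUndo(): boolean { return this.undoStack.length > 0; }
  canRedo(): boolean { return this.redoStack.length > 0; }

  // Undo the last n actions (n > 1 gives coarser granularity).
  undo(n = 1): void {
    for (let i = 0; i < n && this.canUndo(); i++) {
      const entry = this.undoStack.pop()!;
      entry.undo();
      this.redoStack.push(entry);
    }
  }

  // Redo the last n undone actions.
  redo(n = 1): void {
    for (let i = 0; i < n && this.canRedo(); i++) {
      const entry = this.redoStack.pop()!;
      entry.redo();
      this.undoStack.push(entry);
    }
  }
}
```

Note that `undo` and `redo` never call `register`, so replaying history does not clear the redo stack; only a genuinely new action does.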

Note that the undo manager helps to guarantee flexible semantics for the application. In fact, the undo manager helps to guarantee an asymmetric undo in some cases, where the lecture model is modified into a state it was never in before the undone action. Much as the undo manager breaks the semantics of each mode, such scenarios are allowed where we believe they are more meaningful to the user than an exact undo. This is discussed further in Chapter 5, which describes specific implementation choices.

The undo manager's logic also reinforces the controller breakdown presented here. Since each controller performs specific and clearly defined actions, the same controller must be responsible for the inverse action. For example, because the visuals controller is responsible for all changes to visuals, it is responsible both for the deletion of a visual and for the addition of a visual, which is the inverse of the deletion. In this way, the types of events placed onto the undo stack are well localized to each controller.


4.3 View

The view has several separate components which are managed individually: the editing tools, the recording tools, the navigation, the drawing canvas, and the record start/stop control. The aforementioned tools controller handles the actions of the correct tools in each mode, which still leaves the problem of correctly rendering the state of the lecture at any time. That responsibility is delegated to a separate component called the player. The player correctly handles the constraints and the changes to visuals which may have been made over the course of the lecture, and it generates output based on the time at which it is asked to render.

The tools laid out across the top consist of the non-recording tools, which are used when the application is not recording a section of the lecture. Along the side are the recording tools, which are available during a recording. The navigation state is reflected in the view as well. The layout, shown in Figure 4.5, also includes the option to begin or stop a recording and to choose which channel should be recorded.

The video time cursor and ticker box reflect the current state of a session. The primary way in which authors navigate the visuals is the slider, but the ticker always reflects the time of the lecture, even during a recording. The slider is disabled during recording.

The recording tools and editing tools are those described in more detail in Section 3.1 and Section 3.2, respectively. Playing a lecture and moving across slides may also be helpful while editing; these actions do not modify the lecture itself, though they do change the state. These tools are grouped with the editing tools since they navigate a lecture only when not recording.

Figure 4.5: An overview of the view which is presented to a user. The aforementioned tools are logically grouped by color in this figure.

Chapter 5

Implementation

5.1 Method

The underlying requirement that this embodiment of Pentimento be compatible across different platforms and operating systems meant that we primarily considered options which are widespread and mature for development. While we could have had users download and install a binary, many of the features we wish to support lend themselves naturally to the web, specifically hyperlinking and the embedding of foreign visuals such as images or videos. Additionally, because the majority of MOOCs are accessed through the internet, it seemed a natural extension to provide recording and viewing within a browser environment.

While the browser environment is primarily beneficial, there are some drawbacks which should be addressed. In order to record audio, the origin policies of current browsers require that the application be hosted, meaning that authors must either run a local server to host the code or connect to a remote server which does so. Saving the contents of a lecture, however, may be done either locally or remotely, though a local save requires re-uploading the lecture before editing. The primary problem, however, is the fragmentation of browsers and the features each browser supports. Most notably, Internet Explorer deviates from the standards to which Chrome and Firefox adhere, and vice versa. Additionally, we aim to support HTML5-compliant browsers, so last-generation browsers may not receive a full set of features. Lastly, different browsers support different sets of events and event rates, which can affect the smoothness of strokes, for example. Some of these issues can be mitigated, for instance by importing the third-party JavaScript library excanvas.js to compensate for older browsers that do not support the canvas element [22].

5.2 Recording

Recording brings about a series of problems which are masked behind the logic of the recording controller. The recording controller acts as the primary hub for every action that occurs during a recording. It primarily handles the setting of various state aspects related to recording, so that other controllers and UI handlers can reference the state to determine whether or not to proceed. Control flow during a recording is shown in Figure 4.4.

The recording controller first marks the state to indicate that a recording is in progress, and keeps a local variable holding when the recording began according to its local clock. It then asks the time controller to begin a regular interval at which it updates the video cursor (and/or the audio cursor, depending on which type of recording is being performed). We found that these regular updates were not quite regular, so instead each update performs a local read of the clock and advances the cursor according to the difference between that read and the previous clock read. Handlers for the writing of strokes then notify the recording controller of their specific events, leaving the recording controller to act on them as desired. For example, beginning a stroke on the canvas fires a handler which passes a new stroke object to the recording controller; the recording controller converts the stroke's time to be relative to the beginning of the slide, and then asks the visuals controller to add the visual to the slide.
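
The drift-free update described above can be sketched by advancing the cursor by the measured time between clock reads rather than by the nominal timer interval. The `RecordingClock` class and its injectable `now` function are illustrative assumptions; injecting the clock simply makes the behavior testable.

```typescript
// Advances a time cursor by the measured elapsed time between clock reads,
// since timer callbacks do not fire at exactly regular intervals.
class RecordingClock {
  private readonly now: () => number;
  private lastRead: number;
  private cursor: number;

  constructor(startCursor: number, now: () => number = () => Date.now()) {
    this.now = now;
    this.cursor = startCursor;
    this.lastRead = now();
  }

  // Called on every (irregular) timer tick; returns the new cursor value.
  tick(): number {
    const read = this.now();
    this.cursor += read - this.lastRead; // elapsed time since the previous read
    this.lastRead = read;
    return this.cursor;
  }
}
```

In a browser, a caller might drive this with something like `setInterval(() => timeController.updateVisualTime(clock.tick()), 16)`, where `timeController` is hypothetical; the interval length then affects only how smoothly the cursor moves, not how accurately it tracks elapsed time.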

Additionally, the recording controller masks visuals which do not apply to the recording. Consider a visual which exists at time $t_1$, and a recording which begins at time $t_0$ such that $t_0 < t_1$. The visual would come into existence once the time reaches $t_1$, partway through the new recording, which produces awkward semantics for the author during recording. Instead, the original visual should be shifted by the duration of the recording within the slide, $\Delta$, and should come into existence at $t_1 + \Delta$ during the slide. However, the value of $\Delta$ cannot be known until the recording is finished.

A variety of options are available for this problem, and we considered three possible solutions. The first is to use a temporary lecture which is merged with the original at the end of the recording. However, this introduced a large amount of complexity in both the integration with the undo manager and rendering. The second is to perform a "rolling shift", where the visuals are shifted on an interval basis so that they are never displayed. Our chosen solution is that any visuals in the current slide after $t_0$ are "disabled" by the recording controller, by setting their tMin to Number.POSITIVE_INFINITY in JavaScript, essentially moving those visuals to the end of the slide. Once a recording is finished, those visuals are reset to their original tMin values and then shifted by $\Delta$.
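
The disable-then-restore approach can be sketched in a few lines. This assumes each visual exposes a mutable, slide-relative `tMin` and that `delta` is the length of the recording within the slide; the helper names are hypothetical.

```typescript
interface TimedVisual {
  tMin: number; // slide-relative appearance time
}

interface SavedTime {
  visual: TimedVisual;
  tMin: number;
}

// Recording starts at t0: "disable" every later visual by pushing its tMin
// to +Infinity (i.e. to the end of the slide), remembering the original.
function disableLaterVisuals(visuals: TimedVisual[], t0: number): SavedTime[] {
  const saved: SavedTime[] = [];
  for (const v of visuals) {
    if (v.tMin > t0) {
      saved.push({ visual: v, tMin: v.tMin });
      v.tMin = Number.POSITIVE_INFINITY;
    }
  }
  return saved;
}

// Recording ends after `delta` time in this slide: restore each original
// tMin and shift it by delta, so t1 becomes t1 + delta.
function restoreAndShift(saved: SavedTime[], delta: number): void {
  for (const s of saved) {
    s.visual.tMin = s.tMin + delta;
  }
}
```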

Recording also makes three additional guarantees worthy of note. First, it provides snap-left insertion: if a user seeks to a time to begin a recording, the time snaps left to just after the most recent preceding visual. However, if a visual is appearing at the insertion time, no snap occurs. This helps to eliminate the silences that come about when a user begins recording at an arbitrary time within the lecture. We believe that users very rarely want silences, but instead want to place constraints that are more meaningful. Second, recording provides strict additivity in all aspects, most notably for transformations. For example, selecting visuals during a recording places a transform within each of the selected visuals which says to display the selection color and selection width until another transformation says otherwise. Likewise, deleting a visual during a recording does not delete the visual, but simply sets its tDeletion appropriately. Third, the recording controller asks the retiming controller to maintain the timing within the recording by placing constraints at the beginning and end of the recording.
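
Snap-left insertion can be sketched as below, assuming (hypothetically) that each visual is summarized by the span of visual time during which it is drawn. The text does not say what happens when no earlier visual exists; this sketch leaves the insertion time unchanged in that case.

```typescript
// Hypothetical summary of a visual: the visual-time span in which it is drawn.
interface Span {
  start: number;
  end: number;
}

// Snap the requested insertion time left to just after the most recent
// visual that ends at or before it, unless the time falls inside a visual.
function snapLeft(spans: Span[], tInsert: number): number {
  let latestEnd = -Infinity;
  for (const s of spans) {
    if (s.start < tInsert && tInsert < s.end) {
      return tInsert; // overlaps a visual: no snap
    }
    if (s.end <= tInsert && s.end > latestEnd) {
      latestEnd = s.end; // candidate "most recent preceding visual"
    }
  }
  // No earlier visual: keep the requested time (an assumption).
  return latestEnd === -Infinity ? tInsert : latestEnd;
}
```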

Figure 5.1: The current time will snap backwards to whenever the user last had a visual, eliminating silences between the recording and the previous visual, but only when it does not conflict with another visual. (a) The user wishes to begin a recording at some point in time. (b) The system will snap the insertion time left to just after the most recent preceding visual. (c) When the recording time overlaps with a visual, no snap will occur. Each panel shows visuals along the visual-time axis with the insertion time marked.

5.3 Editing

Where recordings apply transforms to achieve the desired effect, editing is the process of retroactively changing the properties of a visual directly. Changing the color or width of a stroke directly changes the stroke's property, making the stroke appear as if it had originally been drawn with the new color or width. Editing also clears all conflicting transforms, so if a stroke is edited to have a new width, it will have that width for its entire lifetime unless the user changes the stroke's properties during another recording.
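
The contrast between an additive, recording-time change and a retroactive edit can be sketched as follows. The `EditableStroke` and `PropertyTransform` shapes are illustrative assumptions, not the actual data layout: a recording appends a transform, while an edit sets the base property and discards transforms that would override it.

```typescript
// Hypothetical transform record: from `time` onward, `property` shows `value`.
interface PropertyTransform {
  time: number;
  property: string;
  value: string;
}

interface EditableStroke {
  properties: Record<string, string>;
  transforms: PropertyTransform[];
}

// During a recording, changes are additive: append a transform and leave
// the base property untouched.
function recordPropertyChange(stroke: EditableStroke, time: number, property: string, value: string): void {
  stroke.transforms.push({ time, property, value });
}

// During editing, changes are retroactive: set the base property and clear
// transforms that would override it, so the stroke looks as if it had been
// drawn that way from the start.
function editProperty(stroke: EditableStroke, property: string, value: string): void {
  stroke.properties[property] = value;
  stroke.transforms = stroke.transforms.filter((t) => t.property !== property);
}
```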

Also important are our specifications for deletion and its relation to time. Simply deleting a visual leaves a span of empty time, which may be highly undesirable. Instead, we allow for a time collapse upon deletion, where the empty space left by a visual is filled in by shifting all subsequent visuals left by the duration of the deleted visual. Again, we believe the user rarely, if ever, means to leave empty space, but usually means to place constraints more appropriately. However, shifting visuals alone would affect the constraints, so subsequent constraints are shifted as well. Deletion of a slide affects the constraints in a similar fashion.

Figure 5.2: Deletions cascade from the back forwards. The items marked for deletion are in red. (1) is performed first, followed by (2), leading to a collapse of time based on the items which were marked for deletion.
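
Time collapse, including the back-to-front cascade of Figure 5.2, can be sketched as follows. Visuals are reduced to hypothetical `{ start, end }` spans in visual time. The text does not say how constraints lying inside a deleted span are treated; this sketch drops them, an assumption made so the remaining constraints cannot cross. Processing the latest deletion first means each collapse leaves the times of the still-pending, earlier deletions untouched.

```typescript
// Hypothetical visual-time span of a visual within a slide.
interface TimedItem {
  start: number;
  end: number;
}

interface Constraint {
  visualTime: number;
  audioTime: number;
}

// Delete one item and collapse the gap it leaves: later items and later
// constraints move left by its duration. Constraints in [start, end) of
// the deleted span are dropped (assumption, to keep them non-crossing).
function deleteWithCollapse(items: TimedItem[], constraints: Constraint[], target: TimedItem): Constraint[] {
  const gap = target.end - target.start;
  const idx = items.indexOf(target);
  if (idx >= 0) items.splice(idx, 1);
  for (const it of items) {
    if (it.start >= target.end) {
      it.start -= gap;
      it.end -= gap;
    }
  }
  return constraints
    .filter((c) => c.visualTime < target.start || c.visualTime >= target.end)
    .map((c) => (c.visualTime >= target.end ? { visualTime: c.visualTime - gap, audioTime: c.audioTime } : c));
}

// Several deletions cascade from the back forwards: the latest item is
// collapsed first, so earlier pending items keep their original times.
function deleteManyWithCollapse(items: TimedItem[], constraints: Constraint[], targets: TimedItem[]): Constraint[] {
  const ordered = targets.slice().sort((x, y) => y.start - x.start);
  let result = constraints;
  for (const t of ordered) {
    result = deleteWithCollapse(items, result, t);
  }
  return result;
}
```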

5.4 Undo & Redo

The undo manager provides either a symmetric undo or an asymmetric undo, depending on whether the original action was performed in the recording mode or in the editing mode. The latter is the easier case to address, so it is discussed first. Actions in the editing mode have a symmetric undo, though calling undo breaks the non-additive nature of the editing mode. Strokes which are deleted in the editing mode can be added back into the lecture, which is an additive form of action. Likewise, edits to a stroke's properties can be exactly undone to restore the original properties.

The undo manager provides an asymmetric undo semantic in almost all cases involving a recording. In particular, we believe that most users will intend to undo their visual actions or their audio actions, but not necessarily both. During a recording, we only allow the undoing of visuals; because time is strictly monotonically increasing for both the visual and audio channels, this allows the lecture to be placed into a state which did not previously exist. Likewise, redoing an action during a recording places the lecture into a new state which did not previously exist, due to the increase in time.

For audio, once the recording ends, we place an event at the beginning of the current

recording which indicates the insertion of audio. This means that users who wish to modify

the recording can edit it directly, but would have to undo all visuals from the recording

to reach the audio on the undo stack. Additionally, the shift of visuals by Δ is placed

at the beginning of the recording, after the audio insertion event. Placing the Δ shift at the end

would cause the very next undo call to undo the shift immediately,

meaning that visuals would immediately overlap.
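The resulting order of the undo stack can be sketched as follows. The sketch assumes that the undo stack is a Python list whose end is the top, that the position where the recording's actions begin is known, and that events are simple tuples; `finalize_recording` and the event names are hypothetical.

```python
def finalize_recording(undo_stack, recording_start, delta):
    """Insert the post-recording events into an undo stack (sketch).

    undo_stack: list used as a stack (end of list = top).
    recording_start: index where this recording's actions begin.
    delta: the duration by which later visuals were shifted (the Δ in the text).
    """
    # The audio insertion goes at the bottom of this recording's segment,
    # followed by the Δ shift, so both are reached only after every visual
    # action from the recording has been undone.
    undo_stack[recording_start:recording_start] = [
        ("insert_audio",),
        ("shift_visuals", delta),
    ]
    return undo_stack


stack = [("older_edit",), ("draw", "v1"), ("draw", "v2")]
finalize_recording(stack, 1, 4.5)
undo_order = [stack.pop()[0] for _ in range(4)]
print(undo_order)  # ['draw', 'draw', 'shift_visuals', 'insert_audio']
```

If the shift were appended at the top instead, the first `pop()` would return `shift_visuals`, undoing the shift before any visual and causing the overlap described above.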

Importantly, a recording maintains external consistency with the outside

world. That is, if a recording begins at a time $t_{\text{begin}}$ in the world and ends at a time $t_{\text{end}}$,

then the total time that has elapsed is $t_{\text{end}} - t_{\text{begin}}$. This difference is placed into the lecture

regardless of whether an undo or redo of a slide occurs in between. If a user undoes

a slide at time $t_{\text{undo}}$ such that $t_{\text{begin}} < t_{\text{undo}} < t_{\text{end}}$, then the entirety of $t_{\text{end}} - t_{\text{begin}}$ is

placed into the preceding slide. If the user then performs a redo at time $t_{\text{redo}}$ such

that $t_{\text{begin}} < t_{\text{undo}} < t_{\text{redo}} < t_{\text{end}}$, then $t_{\text{undo}} - t_{\text{begin}}$ is placed into the preceding

slide and $t_{\text{end}} - t_{\text{undo}}$ is placed into the newly added slide, so that a total

of $t_{\text{end}} - t_{\text{begin}}$ is placed into the lecture. The same logic applies recursively to any

number of slides beyond one prior slide and one newly added slide, so that $t_{\text{end}} - t_{\text{begin}}$

is always preserved within the lecture.
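The two cases above can be encoded directly to check that the total is conserved. The function below is a hypothetical helper that covers only the single-slide undo and undo-then-redo cases stated in the text, not the general recursive rule; times are plain numbers in any consistent unit.

```python
def allocate_recording_time(t_begin, t_end, t_undo, t_redo=None):
    """Split a recording's duration between the preceding slide and the
    newly added slide, for the two cases described in the text.

    Returns (time placed in the preceding slide, time placed in the new slide).
    """
    if not t_begin < t_undo < t_end:
        raise ValueError("expected t_begin < t_undo < t_end")
    if t_redo is None:
        # Slide undone and never redone: the preceding slide gets everything.
        return t_end - t_begin, 0
    if not t_undo < t_redo < t_end:
        raise ValueError("expected t_undo < t_redo < t_end")
    # Undo followed by redo: the split happens at t_undo, as stated in the text.
    return t_undo - t_begin, t_end - t_undo


prev, new = allocate_recording_time(10, 70, 30, 50)
print(prev, new, prev + new)  # 20 40 60
```

In both branches the two parts sum to `t_end - t_begin`, which is the external-consistency guarantee.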

This naturally leads to the semantics that, for undo and redo, the duration of a recording

belongs to the recording itself. After a recording is finished, users can still

undo and redo actions that occurred during the recording, but because the in-recording

undo guarantees a duration of $t_{\text{end}} - t_{\text{begin}}$, the out-of-recording undo must provide the same

guarantee. Hence, while each slide has a duration, that duration can change if a user

performs an undo or redo on the addition of a new slide. To remove the extra

$t_{\text{end}} - t_{\text{begin}}$, a user must delete a visual that was added, delete a slide that was

added, or undo the entire recording.
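One way to honor this guarantee after a recording is to fold an undone slide's duration into the preceding slide and split it back out on redo. The sketch assumes the lecture can be reduced to a list of slide durations; `SlideDurations` and its methods are hypothetical.

```python
class SlideDurations:
    """Out-of-recording undo/redo of a slide addition (illustrative sketch).

    Assumption: the lecture is reduced to a list of slide durations, and
    undoing the addition of slide i folds its duration into slide i - 1,
    so the total recorded time never changes.
    """

    def __init__(self, durations):
        self.durations = list(durations)
        self.redo_stack = []

    def total(self):
        return sum(self.durations)

    def undo_add_slide(self, i):
        if i == 0:
            raise ValueError("the first slide has no preceding slide")
        removed = self.durations.pop(i)
        self.durations[i - 1] += removed  # the preceding slide absorbs the time
        self.redo_stack.append((i, removed))

    def redo_add_slide(self):
        i, removed = self.redo_stack.pop()
        self.durations[i - 1] -= removed  # give the time back to the restored slide
        self.durations.insert(i, removed)
```

Here the duration of an individual slide changes on undo and redo, while `total()` stays fixed, matching the semantics described above.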


Chapter 6

Conclusion

Pentimento provides a framework for the flexible authoring of hand-written lectures that

can be modified retroactively. The decoupling of visuals and audio during recording

allows for more free-form edits that affect each channel independently, and users can later

specify how to re-synchronize the channels.

Our system follows the MVC design philosophy, where the view is separated from the

models of the application by a variety of controllers, which form the hub of the system logic.

The tools controller acts as the interface point for the view, delegating responsibilities to

other controllers and updating the view as appropriate. The recording controller and the time

controller work in conjunction to provide meaningful semantics during a recording session,

with an audio recording controller soon to be integrated. After a recording, the user enters

the editing mode, where they can ask the lecture controller and visuals controller to act upon

the underlying model. During either recording or editing, users can request

an undo or redo of their last action; this provides the semantics that we guarantee, though

not always an exact undo or redo. At any point in the editing mode, users can also add

constraints to or remove constraints from the lecture. These constraints provide a loose

synchronization between the two channels and are handled by the retiming controller.


The models within the application consist of either durable data, which should be persisted

within the lecture, or temporary data, which should only exist during a user's session with

the application. The state model consists of user-specified configurations or changes to tool

state. The lecture model is designed around the philosophy that lectures can be broken into

discrete chunks, such as slides or blackboards, where one chunk can function independently of

another. Each slide independently contains an array of transformations and an array of

visuals. Each visual in turn independently contains properties such as tMin, the time at which

it comes into existence during the lecture; tDeletion, the time at which it is removed from the

lecture; and transforms, the alterations applied over its existence. Currently, each visual also stores

its own set of vertices in the form of (x, y, t, p) tuples. Chapter 5 details

the exact semantics of all the supported operations.
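The lecture model described above can be sketched as a small set of data classes. The field names `tMin`, `tDeletion`, `transforms`, and the (x, y, t, p) vertex layout come from the text; everything else is an assumption for illustration: the structure of `Transform` is hypothetical, `p` is assumed to be pen pressure, and slide `duration` is modeled as a plain number.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# One sampled point of a stroke: (x, y, t, p). We assume p is pen pressure.
Vertex = Tuple[float, float, float, float]


@dataclass
class Transform:
    # Hypothetical: the time a transform takes effect and a 2-D affine matrix (a, b, c, d, e, f).
    time: float
    matrix: Tuple[float, float, float, float, float, float] = (1, 0, 0, 1, 0, 0)


@dataclass
class Visual:
    tMin: float                        # time at which the visual comes into existence
    tDeletion: Optional[float] = None  # time at which it is removed (None = never)
    transforms: List[Transform] = field(default_factory=list)
    vertices: List[Vertex] = field(default_factory=list)

    def is_visible(self, t: float) -> bool:
        return self.tMin <= t and (self.tDeletion is None or t < self.tDeletion)


@dataclass
class Slide:
    duration: float
    transforms: List[Transform] = field(default_factory=list)
    visuals: List[Visual] = field(default_factory=list)


@dataclass
class Lecture:
    slides: List[Slide] = field(default_factory=list)

    def duration(self) -> float:
        # The lecture's length is the sum of its independent slide durations.
        return sum(s.duration for s in self.slides)
```

Keeping each slide self-contained means an operation on one slide, such as a deletion with time collapse, never needs to inspect another slide's visuals.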

6.1 Future Work

The existing implementation of Pentimento provides for the recording of written strokes.

The further incorporation of an audio controller is a primary goal under

development, and it can generally follow the structure of the existing framework,

where the audio controller is the parallel of the visuals controller. We believe audio represents

a more realistic estimate of time than visuals, so the audio channel would be the basis

for playback. We also hope to support processing of the audio to detect silences

and highlight them for the user. Audio presents a different set of needs than the

visuals, though, since authors may have multiple audio tracks. For example, if authors were

to show a YouTube video of an experiment, they might narrate over the video's audio. We

also do not believe the slide semantics of discretization will necessarily apply to audio.
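Silence detection is not specified further here; one simple and common approach is an energy threshold over short frames. The sketch below flags stretches where the frame RMS level stays below a threshold for a minimum length. The function name, the frame size, the threshold, and the minimum silence length are all hypothetical choices, and audio is assumed to be mono floats in [-1, 1].

```python
import math


def detect_silences(samples, sample_rate, frame_ms=20, threshold=0.01, min_silence_ms=200):
    """Return (start_s, end_s) spans whose frame RMS stays below `threshold`.

    samples: mono audio as floats in [-1, 1]. All parameters are illustrative.
    """
    frame = max(1, int(sample_rate * frame_ms / 1000))
    min_frames = max(1, math.ceil(min_silence_ms / frame_ms))
    n_frames = math.ceil(len(samples) / frame)
    spans, run_start = [], None
    for i in range(n_frames + 1):
        if i < n_frames:
            chunk = samples[i * frame:(i + 1) * frame]
            rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
            quiet = rms < threshold
        else:
            quiet = False  # sentinel frame closes any silence still open at the end
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            if i - run_start >= min_frames:
                spans.append((run_start * frame / sample_rate,
                              min(i * frame, len(samples)) / sample_rate))
            run_start = None
    return spans
```

With a 1 kHz signal consisting of 0.3 s of tone, 0.4 s of silence, and 0.3 s of tone, the function reports a single silence from 0.3 s to 0.7 s; shorter pauses below `min_silence_ms` are ignored so that brief gaps between words are not highlighted.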

Another goal is to refine the interface and its layout for users. Once an

audio controller has been integrated, more visual elements can be presented to the user for

editing, such as the placement of constraints or a visualization of the audio. We also currently

present both recording tools and editing tools simultaneously, though completely replacing

the tools on a mode change could help eliminate mode errors a user may experience.

Several add-ons to Pentimento would also provide a much richer interaction for users

than what is currently available. We have implemented a best guess at the primary feature set

and logic that users will need for coherent recording of hand-written lectures, but some

tools, such as color changes, are yet to be completed. Space and logic have also been allocated for

tools that apply spatial transformations, though none exist at the moment. Additionally,

users typically prefer recording and editing visuals first, followed by syncing audio to those

visuals [16]. This sync function currently depends on the audio controller and would require

correct constraint placement.

Pentimento also currently takes full charge of its visual representation, from each

individual vertex up to the entire stroke structure. This allows the player to remain

backwards-compatible with the OS X implementation [16]. However, some third-party

libraries, such as Fabric.js, support visual operations based on the canvas element.

Moving to such a library would change how visuals are represented, but may expedite

development of tools that the library already supports, such as drag-and-drop.

Lastly, we currently focus purely on recording and replaying hand-written lecture notes,

but this could be expanded to any domain where this recording logic

would apply. In particular, text-based lectures could be a great area for expansion, combining

the interactivity of environments like Codecademy with the lecture-style presentations

of RailsCasts. As online engines that execute submitted code also exist, text would be a very

interesting area for us to move towards.


Bibliography

[1] MIT OpenCourseWare. Monthly Reports. 2014. http://ocw.mit.edu/about/site-statistics/monthly-reports/

[2] Michael Horn. Special K: Don't Sleep On Khan Academy, Knewton. March 21, 2013. http://www.forbes.com/sites/michaelhorn/2013/03/21/special-k-dont-sleep-on-khan-academy-knewton/

[3] MITx. 6.00r Courseware. 2014. https://lms.mitx.mit.edu/courses/MITx/6.00r/2014_Spring/courseware/

[4] Khan Academy. What is Khan Academy? http://khanacademy.desk.com/customer/portal/articles/337790-what-is-khan-academy-

[5] Coursera Help. What format are the courses in? http://help.coursera.org/customer/portal/articles/1164362-what-format-are-the-courses-in-

[6] edX. Student FAQ. https://www.edx.org/student-faq

[7] MIT OpenCourseWare. Audio/Video Lectures. http://ocw.mit.edu/courses/audio-video-courses/

[8] Academia Stack Exchange. How much effort does it take to record video courses? October 9, 2013. http://academia.stackexchange.com/questions/9221/how-much-effort-does-it-take-to-record-video-courses

[9] Erik Demaine and Martin Demaine. Recording Video Lectures - a guide by Erik Demaine and Martin Demaine. October 17, 2012. http://erikdemaine.org/classes/recording/

[10] Google Groups. Video Capture of Handwritten Notes. 2012. https://groups.google.com/a/ascue.org/forum/#!topic/members/a8xl-9amCq4

[11] Epson. Education Only US Product Pricing - April 2014. April 2014. http://www.epson.com/_alfresco/projectors/brighterfutures/pdf/Pricing_Sheets/pricing-bf-projectors.pdf


[12] Khan Academy. What software program / equipment is used to make Khan Academy videos? http://khanacademy.desk.com/customer/portal/articles/329318-what-software-program-equipment-is-used-to-make-khan-academy-videos-

[13] Sanjay Gupta. Khan Academy: The future of education? CBS News, 2012. https://www.youtube.com/watch?v=zxJgPHM5NYI

[14] Android 4 Schools. 5 Note-taking Android Apps for Students. http://www.android4schools.com/2013/08/21/5-note-taking-android-apps-for-students/

[15] AppAdvice. iPad Note Taking Apps. http://appadvice.com/applists/show/notepad

[16] Fredo Durand. Non-Sequential Authoring of Handwritten Video Lectures with Pentimento. 2014.

[17] AMPS. Lecture Capture. http://mit-amps.mit.edu/services/lecture.html

[18] Google Product Forums. New YouTube Player Features: Previewing with Thumbnails & More. 2012. https://productforums.google.com/forum/#!topic/youtube/oOzNeuJkrQc

[19] Toni-Jan Keith Monserrat, Shengdong Zhao, Kevin McGee, Anshul Vikram Pandey. NoteVideo: Facilitating Navigation of Blackboard-style Lecture Videos. CHI 2013.

[20] Chris Ramsdale. GWT Project. 2010. http://www.gwtproject.org/articles/mvp-architecture.html

[21] John T. Emmatty. Differences between MVC and MVP for Beginners. 2011. http://www.codeproject.com/Articles/288928/Differences-between-MVC-and-MVP-for-Beginners

[22] explorercanvas: HTML5 Canvas for Internet Explorer. https://code.google.com/p/explorercanvas/wiki/Instructions
