Source: web.media.mit.edu/~raskar/11Sig/raskarResStmntDec15th.doc

Professional Statement: Ramesh Raskar

Can we photograph objects that are not in the direct line of sight? Can we build portable machines that can see inside our body? Can we provide diagnostic care in remote parts of the world by converting mobile phones into scientific instruments? My research goal is to create an entirely new class of imaging platforms that have an understanding of the world far exceeding human ability, yet produce meaningful abstractions that are well within human comprehension. To achieve this super-human vision, my contributions are new theories and instrumentation for solving challenging inverse problems in computational light transport. To tackle these inverse problems, I create a carefully orchestrated movement of photons, measure the resulting optical response and then computationally invert the process to learn about the scene, so that the new imaging platforms can achieve seemingly impossible goals.

Imaging has dramatically transformed so many areas of our lives. My work has made a visible impact in the academic and commercial world via research and novel products. I find it compelling to create imaging designs and algorithms that have a broad applicability rather than a specific application. I choose to demonstrate the underlying theories and elegant optical plus mathematical insights in devices such as cameras, displays and medical tools but those underlying theories will make a major impact in years to come in yet unforeseen devices.

1. Towards Computational Light Transport

I have been fascinated with the idea of super-human abilities to visually interact with the world via cameras that can see the unseen and displays that can alter the sense of reality. In the realm of cameras and displays, I have moved logically from computational illumination during my PhD studies to computational photography at MERL to computational light transport at MIT. My work at MERL generated key ideas in several fields, produced over 40 patents and resulted in novel products. Although MERL was great for inventing in a field that everyone else understood, I chose to come to MIT, an ideal place to start a new field.

I am now embarking on a more ambitious effort in imaging devices for analyzing light transport in computational imaging. Since joining MIT, I have invented two novel forms of imaging that show immense potential for research and practical applications: (a) time resolved transient imaging that exploits multi-path analysis and (b) angle resolved imaging for displays, medical devices and phase analysis.

1.1 Computational Illumination with Projectors

In my thesis and immediately afterwards, my team and I were either the first or among the earliest researchers to inventively use projectors for computational illumination in three areas: augmentation, communication and mobility. My key idea was to treat the projector as the dual of a camera, that is, a projective device that maps 2D pixels to a 4D ray-space. Our Siggraph 1998 paper on an "Office of the Future" is one of the earliest 3D videoconferencing solutions. In the same year, I introduced programmable projector-based augmentation of 3D real-world objects. Our 1998 paper on spatial augmented reality (SAR) and Shader Lamps spawned a new branch of augmented reality that is being practiced worldwide. I presented the idea of using ordinary projectors for high-speed optical communication (Siggraph 2004, tracking RFIG, i.e., photosensing RFID tags). I conceptualized 'pick and play' portable projectors in 2000. At CVPR 2001 and Siggraph 2003, we showed iLamps, a complete solution for mobile pocket projectors. My work presaged the introduction of commercial pocket projector units in 2005 by five years and led to Technology Review's TR100 award in 2004 for young innovators under 35. I developed a strong track record for technology transfer by converting over a dozen of my patents in multi-projector displays into multiple Mitsubishi Electric products. They went on to win four invention awards. My work on quadric image transfer (Siggraph 2003) became a new curved-screen display product at Mitsubishi Electric. The algorithm dramatically reduced the complexity and cost of existing half-million-dollar installations by an order of magnitude. The first installation was used for rehabilitating patients via simulated wheelchair training.

Figure 1: My work explores creative new ways to play with light by co-designing optical and digital processing. (Diagram labels: Bits, Photons, Computer Vision, Optics, Sensors, Signal Processing, Displays, Machine Learning, Computational Photography, Computational Light Transport.)

1.2 Computational Photography

Computational photography is an emerging, multi-disciplinary field at the intersection of optics, signal processing, computer graphics and vision, electronic hardware, visual arts, and online sharing in social networks. At MERL and in the last few years, my team and I created new trends as well as original, rigorous theories by inventing unusual optics, programmable illumination, modern sensors and image-analysis algorithms (Figure 2). With my collaborators, I made a general and unanticipated observation that, by blocking light over time, space, angle, wavelength or sensors, we can reversibly encode scene information in a photo for efficient post-capture recovery. I published an important paper on the flutter shutter camera (Siggraph 2006), which used a binary sequence to code the exposure to deal with motion blur. This paper (along with Fergus et al. [2006]) opened a new trend at Siggraph in papers that deal with information loss due to blur, optical techniques and deblurring. Our further work generalized this concept for powerful algorithmic decomposition of a photo into light fields (Siggraph 2007), deblurred images, global/direct illumination components (Siggraph 2006), or geometric versus material discontinuities (Siggraph 2004). Along the way, we also created a new range of intelligent self-ID technologies: RFIG (Siggraph 2004), Prakash (Siggraph 2007) and Bokode (Siggraph 2009).
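The intuition behind coded exposure can be sketched numerically: motion blur is convolution with the shutter's on/off pattern, and a conventional exposure (a box filter) has near-zeros in its spectrum that make deconvolution ill-posed, while a broadband binary code does not. A minimal sketch; the pseudorandom code here is an assumption for illustration, not the optimized sequence from the paper:

```python
import numpy as np

# Motion blur is convolution with the shutter's on/off pattern. A
# conventional exposure is a box filter whose spectrum has near-zeros,
# so deconvolution amplifies noise there; a broadband binary code keeps
# the spectrum bounded away from zero, making inversion well-posed.
# The pseudorandom code below is illustrative, not the paper's sequence.

n = 52
box = np.ones(n)                         # conventional open shutter
rng = np.random.default_rng(0)
code = rng.integers(0, 2, size=n).astype(float)
code[0] = 1.0                            # ensure the shutter opens

def min_spectrum(h, n_fft=512):
    """Smallest DFT magnitude of a blur kernel (conditioning proxy)."""
    return np.abs(np.fft.rfft(h, n_fft)).min()

print("box   min |H(f)| =", min_spectrum(box))    # ~0: ill-posed
print("coded min |H(f)| =", min_spectrum(code))   # bounded away from 0
```

The same one-line diagnostic applies to any candidate code, which is why code search can optimize the minimum spectral magnitude directly.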

My work has resulted in invention awards, invitations for several keynote presentations at major events, a popular course at Siggraph for three years (with Jack Tumblin) and an in-progress book. The work also led to an Alfred P. Sloan Research Fellowship. Many patents are in product pipelines. They include the flutter shutter camera in machine vision and multi-flash-camera-based depth-edge recovery, which is now a multi-million dollar effort at Mitsubishi Electric for 'robot cell' manufacturing.

Figure 2: Previous Work, Computational Camera and Photography Research Summary. (Diagram: Coding in Time: Coded Exposure for Motion Deblurring; Coding in Space (Optical Path): Coded Aperture for Extended Depth of Field, Mask-based Optical Heterodyning for Light Field Capture; Coded Illumination: Multi-flash Imaging for Depth Edge Detection; Coded Wavelength: Agile Spectrum Imaging; Coded Sensing: Gradient Encoding Sensor for HDR.)

1.3 Computational Light Transport

Since joining MIT, I have undertaken ambitious efforts in creating super-human visual abilities – often combining previously disparate fields to do things that people thought were unattainable. New directions include a camera that can look around corners, portable machines that can see inside the body and low cost sensors that can transform healthcare around the world. I describe these in the next section.

2. Current Work: Time-Resolved and Angle-Resolved Light Transport Analysis

My record and my team’s ability put me in the ideal position to challenge the limited scope of time and angle aware imaging today, to develop spectacular new innovations and to lead the field in revolutionizing optical mechanisms for capturing and sharing visual information.

2.1 Time-Resolved Transient Imaging: "Looking Around a Corner"

Transient imaging allows the seemingly impossible task of photographing objects beyond the line of sight. Due to multi-path reflections, light from occluded objects can indirectly reach the camera. How can we record and analyze these indirect reflections? Pioneering work in computer vision has analyzed inter-reflections, but transient imaging exploits the fine-scale time dimension (Figure 3) by using ultra-fast illumination and ultra-fast sensing to record 5D light transport. We analyze the light scattered after multiple bounces using strong scene priors and model the dynamics of transient light transport as a linear state-space system. We developed a system-identification algorithm for inferring the scene structure as well as the reflectance. However, scenes with sufficient complexity in geometry (volumetric scattering, large distances) or reflectance (dark surfaces, mirrors) can pose problems. Transient imaging has tremendous future potential in many areas: avoiding car collisions at blind spots, robot path planning with extended observable structure, detecting survivors for fire and rescue personnel, and performing endoscopy and scatter-free reconstruction in medical imaging.
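The geometric core of multi-path inversion can be sketched in a toy 2D setting (my own simplified illustration with assumed geometry, not the actual reconstruction algorithm): a pulse leaving a wall point, bouncing off a hidden point and returning to another wall point constrains the hidden point to an ellipse whose foci are the two wall points, and a few such path-length measurements intersect at the hidden geometry.

```python
import numpy as np
from itertools import product

# Toy 2D sketch of multi-path inversion (assumed geometry and method,
# not the paper's algorithm). A pulse leaves a wall point w, bounces
# off a hidden point p, and returns to a wall point s; the measured
# path length |w - p| + |p - s| constrains p to an ellipse with foci
# w and s. Several (w, s) pairs intersect at the hidden point, which
# we recover here by brute-force grid search.

hidden = np.array([0.7, 1.3])                    # ground truth (unknown)
walls = [np.array([x, 0.0]) for x in (-1.0, 0.0, 1.0)]

pairs = [(w, s) for w in walls for s in walls]   # illumination/sensing pairs
meas = [np.linalg.norm(w - hidden) + np.linalg.norm(hidden - s)
        for w, s in pairs]

def residual(p):
    """Sum of squared path-length mismatches for a candidate point p."""
    p = np.asarray(p, dtype=float)
    return sum((np.linalg.norm(w - p) + np.linalg.norm(p - s) - m) ** 2
               for (w, s), m in zip(pairs, meas))

# The hidden scene lies on one side of the wall, so search y > 0 only.
gx = np.linspace(-2.0, 2.0, 81)
gy = np.linspace(0.0, 2.0, 41)
est = min(product(gx, gy), key=residual)         # recovers the hidden point
print("recovered hidden point:", est)
```

In the real system each streak-camera pixel contributes one such path-length constraint, and scene priors replace the brute-force search.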

The transient imaging concepts, demonstrated with simple experiments, received one of the top prizes in computer vision (Marr Prize Honorable Mention, best paper #2 among 1400 submissions). I also received a DARPA Young Faculty Award in 2010. As far as we know, the theory to mathematically invert the effect and recover information from multipath time-delayed scattering is entirely new. Direct-view time-of-flight imaging, optical coherence tomography (OCT) (currently limited to millimeter-sized bio-chemical samples) and seismic imaging will also benefit from new inverse scattering techniques.

Figure 3: Can we see around a corner? (Left) Transient imaging camera exploits multi-path analysis. (Right) Ultra-fast illumination (femtosecond lasers) and ultra-fast sensing (picosecond-accurate streak cameras) capture 5D light transport. For illustration, the occluder is shown as a transparent overlay. We analyze a sequence of the raw streak camera photos to reconstruct hidden shape and BRDF. (Diagram labels: Transient Imaging Camera, 1st/2nd/3rd Bounce, Setup, Streak Camera Photos, Occluded Scene Reconstruction.)

2.2 Angle-Resolved Imaging and Analysis

My work in angle-resolved analysis is based on four important theoretical contributions: (i) spatial heterodyning, (ii) rank constraints on light propagation after occluders, (iii) augmented light fields to represent wave phenomena, and (iv) angle-sensitive computational probes.

Spatial heterodyning. Capturing the space-angle-dependent 4D light field on a 2D sensor previously had two solutions: a pinhole array or a lenslet array, both invented about 100 years ago. My team has shown a third solution that is remarkably simple, easily scalable and wavelength-independent: using a patterned mask. The key contribution is a formal frequency-domain analysis of the relationship between occlusion of light (shielding via a mask) and preservation of scene information. This has enabled a flurry of research in light-field cameras, HCI (BiDi Screen via a light-sensing LCD, Siggraph Asia 2009), 3D scanning (Shield Fields, Siggraph Asia 2008) and a CAT-scan machine with no moving parts or synchronization (ECCV 2010).
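The frequency-domain heart of this analysis is the modulation theorem: multiplying incoming light by a printed cosine mask makes copies of the signal spectrum at the mask's carrier frequency, heterodyning otherwise-lost frequencies into the sensor's passband. A 1D toy sketch with assumed signal and mask frequencies (not the full 4D analysis):

```python
import numpy as np

# 1D toy sketch (assumed signal and mask frequencies, not the full 4D
# analysis). The modulation theorem: multiplying a signal by a cosine
# of frequency f0 copies its spectrum to +/- f0. A printed cosine mask
# near the sensor uses this to heterodyne hidden frequencies into the
# sensor's spatial passband.

n = 256
x = np.arange(n)
scene = np.sin(2 * np.pi * 5 * x / n)    # low-frequency scene component
mask = np.cos(2 * np.pi * 40 * x / n)    # cosine mask at carrier f0 = 40
spectrum = np.abs(np.fft.fft(scene * mask))

# The product's spectrum has copies at 40 +/- 5 (bins 35 and 45) plus
# their conjugate-symmetric partners (bins 211 and 221).
peaks = set(np.argsort(spectrum)[-4:])
print(sorted(int(p) for p in peaks))
```

Demodulation is the same operation run in reverse: knowing the carrier, the shifted copies are moved back to their original band, which is how the 4D light field is untangled from the 2D sensor spectrum.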

Algebraic rank constraint on 3D displays. I recently discovered the surprisingly overlooked fact that any light field emitted by a lenticular or parallax-barrier display is rank-1, because it is an outer product of two patterns. Based on this observation, we have built a novel content-adaptive dual-stacked LCD display that is more light-efficient, by expressing image generation as a matrix approximation problem (HR3D, Siggraph Asia 2010).
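The rank-1 observation is easy to verify numerically. In a simplified two-layer model (a toy setup of my own, with assumed data, not the HR3D optimization), the emitted light field is the outer product of the two layer patterns, so content adaptation reduces to rank-1 matrix approximation, sketched here with the SVD:

```python
import numpy as np

# Toy verification of the rank-1 constraint (assumed data; not the
# HR3D optimization). In a simplified two-layer model, the emitted
# light field L1 is the outer product f g^T of the front and back
# layer patterns, so content adaptation is a rank-1 approximation of
# the target light field L, sketched here with the SVD.

rng = np.random.default_rng(1)
L = rng.random((8, 8))                  # target light field (views x pixels)

U, s, Vt = np.linalg.svd(L)
# For a nonnegative L the leading singular vectors have a consistent
# sign, so taking absolute values keeps the layer patterns nonnegative.
f = np.abs(U[:, 0]) * np.sqrt(s[0])     # front-layer pattern
g = np.abs(Vt[0]) * np.sqrt(s[0])       # back-layer pattern
L1 = np.outer(f, g)                     # light field a two-layer display emits

err = np.linalg.norm(L - L1) / np.linalg.norm(L)
print("rank of emitted light field:", np.linalg.matrix_rank(L1))
print(f"rank-1 relative error: {err:.3f}")
```

The residual error is what a content-adaptive display minimizes; time-multiplexing several such rank-1 frames raises the effective rank.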

Augmented light field for wave phenomena using rays. We proposed a ray-based augmented light field (ALF) representation that can model wave effects such as diffraction and interference. ALF provides an effortless link from computer vision to concepts in Fourier optics. We can solve inverse problems in estimating diffractive elements (Optics Express 2010) and render wave phenomena (Eurographics 2010).

Computational probes. Bokodes are long-distance barcodes that code information in angle and reveal the pattern in the circle of confusion of an out-of-focus camera (Siggraph 2009). NETRA is an optometry solution for estimating a refractive map, which previously required coherent-laser-based wavefront sensing (Siggraph 2010).

3. Future of What Humans Can See and Envision: My Approach to Research

I am interested in the future of what people are capable of seeing: with modern imaging technology and with sophisticated visualization. Why should the physical constraints of human vision limit what the human mind can think, conceive and envision? To create super-human visual abilities, to make the invisible visible, I must explore varied directions, bring disparate ideas together and see what bears out. With a deep theoretical understanding and mental modeling, my work often starts with a goal that, to others, appears impossible, then becomes merely improbable and finally, inevitable. I feel this distinctive approach fuels my research: 'to create advanced technology that is indistinguishable from magic' [Arthur C. Clarke]. My approach achieves a fusion of the dissimilar: computational techniques that have rarely been combined with physical devices. The empirical nature of this research is important. Rather than focusing on one narrow field, I deliberately cast feelers in many directions, as it is usually unclear which direction will bear fruit. Frequently, my inventions are a result of dabbling in new things that are slightly beyond my expertise. Among seemingly scattered efforts, I am actually focused on my deep-seated passion for creating tools for super-human vision.

New directions and goals. I am highly motivated to pursue a research agenda that will spawn new research themes, entirely new application domains and new commercial opportunities. For this, I must create entirely new fields with new questions (e.g., transient imaging), redesign and make current approaches obsolete with new insight (e.g., CAT scans) and find new purposes for disruptive mass-use technologies to create broad social impact (e.g., NETRA). I plan to be at the forefront of the future of what humans are capable of seeing, to improve health, productivity, entertainment and education.

Transient imaging is an ambitious long-term project that will transform the recording of visual information and will require a new set of algorithms for scene understanding and visualization. The hardware may appear challenging, but emerging solid-state lasers, new sensors and non-linear optics will provide practical and portable imaging devices. Initial applications will be in controlled settings like endoscopy, scientific imaging and industrial vision. But the mathematics is also valid for shorter wavelengths (e.g., x-rays) or for ultrasound and sonar frequencies. Scenes: Beyond non-line-of-sight imaging, we have initial results in single-view bidirectional reflectance distribution function (BRDF) estimation that eliminates encircling instrumentation. Our next approach will address volumetric scattering (e.g., for tissue) or refracting elements (e.g., for fluids). Theory: We are developing multi-path inference and inversion techniques that exploit scene priors, sparsity, rank and meaningful transforms, and achieve bounded approximations. Analysis will include coded sampling using compressive techniques and noise models for SNR and effective bandwidth. My work has attracted collaboration and discussions with several PIs across the campus, including MIT's Department of Chemistry, the Research Laboratory of Electronics (RLE) and Lincoln Laboratory, as well as the Woods Hole Oceanographic Institution (WHOI).

The basic design of a rotating CAT scan machine has not changed for decades. How can we build one that fits in a portable, always-on head or chest band? Motion-free CAT scan (based on spatial heterodyning) is a very challenging endeavor that will require methodical joint exploration of computational and physical designs. Scattering is a big problem for wavelengths less harmful than X-rays. Our initial work in inverse scattering (CVPR’10, ECCV’10) shows promise. We plan to model forward and backward diffractive scatter with an augmented light field (ALF) to support inversion in volumes.

We are creating unusual scientific and medical instruments out of mass-use devices that are now crammed with microscopic-resolution cameras, displays and other sensors. These widespread devices will provide a wonderful platform for broad social impact in remote parts of the world. But providing any new functionality requires serious research in the underlying optics, mathematics, hardware and processing. Eye health is a mirror of general health, and ocular manifestations of systemic disease are very common. Preventable visual impairment is a major cause of poverty worldwide. Project NETRA has thus inspired me to rethink the design of devices that diagnose the health of human eyes. Our newer prototypes show that we can analyze cataract, cone-color and retinal diseases. We will strive to research new purposes for low-cost devices for health and education, and with our NGO partners, work towards worldwide deployment.

4. Teaching, Service and Outreach

Teaching: I have strived hard to make imaging a central part of the curriculum at MAS by designing and teaching three new courses. (1) 'Computational Camera': students learn about and build novel imaging apparatus. The course projects have led to multiple high-quality conference and journal papers as well as grant proposals. (2) 'Future of Imaging': students envision 'what if' scenarios. I brought in a series of high-profile speakers. (3) 'Imaging Ventures': students learn to convert imaging ideas into business ventures and NGOs. This class has spawned multiple companies and business plans (mistersmartyplants.com, hypr3d.com, quantum dots for image sensors: 'Qamera', Seeing Machine: a device to overcome partial blindness). Three business plans were finalists in tracks of the MIT 100K competition. It is very satisfying to teach classes that contribute to the entrepreneurial ethos and help students take their ideas into the real world.

Service: I have served on two departmental committees (faculty search and graduate student academics). I have a strong record of engaging Media Lab sponsors, organizing sessions at major events and collaborating with specific sponsors (e.g., Samsung, Canon and Toshiba in the past, and ESPN and ITRI in the coming year, have employees hosted in my group). I have served on the Siggraph or Siggraph Asia papers committee every year since 2007, and I serve as an Associate Editor for ACM Transactions on Graphics. I have also served on NSF panels and NSF workshops for developing research agendas.

Outreach: I have deliberately chosen research projects that are not just academic curiosities but also have the potential for large-scale impact in the real world. I am motivated to do that in part because I am a world citizen. I come from India and understand the tremendous role the absence or presence of technology can play. Two recent projects in this space are NETRA and VisionOnTap. Advanced imaging can revolutionize health, especially in poor areas where existing solutions are far too expensive or impractical. It dawned on me that the current pixel pitch of mobile phone displays (26 micrometers) is approaching the limits of scientific instruments. NETRA, the mobile-phone-based eye refraction test device, is already being spun out in a non-profit effort in several developing countries via our NGO collaborators. NETRA is being considered as a tool on the International Space Station by NASA. In India, L V Prasad Eye Institute has already run IRB-approved trials on patients with good validation of NETRA's accuracy. Eyeglasses cost as little as $3 to manufacture, but there are no easy diagnostic tests. As a result, half a billion people worldwide have uncorrected refractive error, leading to illiteracy (and poverty). Education is the key to living a decent and humane life. The fact that modern solutions may provide corrective vision and give students a fighting chance at a better education is a new dimension and is incredibly rewarding for me.

The project VisionOnTap is a real-time computer vision service, for the masses and produced by the masses. Inspired by the Scratch platform at the Media Lab, we have created a new form of visual social-computing ecosystem to empower amateurs and the underprivileged to perform programmable and automated tasks on video streams.

My group and I have been involved in several outreach activities in the Boston area via visits, talks and half-day seminars at high schools and by hosting high school interns via MIT’s RSI program. I have worked hard to prepare my students for a graduate career and my documents on that topic are popular online, e.g., ‘How to invent’ via an ‘idea hexagon’ presentation that gets young people excited about research. I hope to inspire students and pass to them the torch of creativity and generativity.

5. Conclusion

The devices, algorithms and visualizations for creating super-human vision will exhibit strikingly different forms and abilities in the future. With my unique background, my multidisciplinary team and collaborators, I plan to be at the forefront of converting elegant optical and mathematical insights into revolutionary cameras, displays, medical tools and future devices. I believe I have a plan and vision that will attract students and inspire them to create new research fields and enterprises. The main area I intend to pursue in the coming years is a rigorous exploration of new algorithms, development of hardware prototypes and applications with broad research, commercial, educational and social impact.
