LECTURE 10: RESEARCH DIRECTIONS IN MOBILE AR
Mark Billinghurst [email protected]
Zi Siang See [email protected]
November 29th-30th 2015
Mobile-Based Augmented Reality Development
Looking to the Future
The Future is with us
It takes at least 20 years for new technologies to go from the lab to the lounge.
“The technologies that will significantly affect our lives over the next 10 years have been around for a decade. The future is with us. The trick is learning how to spot it. The commercialization of research, in other words, is far more about prospecting than alchemy.”
Bill Buxton, Oct 11th 2004
Research Directions
• Tracking
  • Markerless tracking, hybrid tracking
• Interactions
  • Displays, input devices, gesture, social
• Applications
  • Collaboration
• Scaling up
  • User evaluation, novel AR/MR experiences
TRACKING
Sensor Tracking
• Used by many “AR browsers”
• GPS, compass, accelerometer, (gyroscope)
• Not sufficient alone (drift, interference)
Combining Sensors and Vision
• Sensors
  • Produce noisy output (= jittering augmentations)
  • Are not sufficiently accurate (= wrongly placed augmentations)
  • Give us initial information on where we are in the world and what we are looking at
• Vision
  • Is more accurate (= stable and correct augmentations)
  • Requires choosing the correct keypoint database to track from
  • Requires registering our local coordinate frame (online-generated model) to the global one (world) (fusion sketched below)
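A minimal sketch of the sensor/vision combination above, reduced to a single yaw angle: a complementary filter trusts the gyro short-term (smooth but drifting) and pulls toward the vision estimate whenever one is available (accurate but intermittent). The weighting and all inputs are illustrative, not any particular system's method.

    def fuse_orientation(yaw, gyro_rate, dt, vision_yaw=None, alpha=0.98):
        """Return a corrected yaw estimate in radians."""
        yaw += gyro_rate * dt               # dead-reckon with the gyro (drifts)
        if vision_yaw is not None:          # vision fix available this frame?
            yaw = alpha * yaw + (1 - alpha) * vision_yaw   # pull toward vision
        return yaw

    # Usage: call once per frame; pass vision_yaw only on frames where
    # the vision tracker produced a pose.
    yaw = 0.0
    yaw = fuse_orientation(yaw, gyro_rate=0.01, dt=1 / 30, vision_yaw=0.05)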
Outdoor AR Tracking System
You, Neumann, Azuma outdoor AR system (1999)
Wide Area Tracking
• Process
  • Combine panoramas into a point cloud model (offline)
  • Initialize camera tracking from the point cloud
  • Update pose by aligning the camera image to the point cloud (sketched below)
  • Accurate to 25 cm, 0.5 degrees over a wide area
Ventura, J., & Höllerer, T. (2012). Wide-area scene mapping for mobile visual tracking. In Mixed and Augmented Reality (ISMAR), 2012 IEEE International Symposium on (pp. 3-12). IEEE.
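One plausible reading of the pose-update step, sketched with OpenCV's RANSAC PnP solver. It assumes 2D-3D correspondences between image keypoints and the offline point-cloud model have already been found (e.g. by descriptor matching); the variable names and intrinsics are illustrative.

    import numpy as np
    import cv2

    def localize(pts3d, pts2d, K):
        """Camera pose from matched 3D model points (Nx3) and 2D keypoints (Nx2)."""
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            pts3d.astype(np.float32), pts2d.astype(np.float32),
            K, distCoeffs=None)             # RANSAC rejects bad matches
        return (rvec, tvec) if ok else None

    # K: assumed pinhole intrinsics for the phone camera
    K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=float)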
Project Tango
• Smartphone + depth sensing
• Sensors
  • Gyroscope/accelerometer/compass
  • 180° field-of-view fisheye camera
  • Infrared projector
  • 4 MP RGB/IR camera
How it Works
• Sensors
  • 4 MP RGB/IR camera: captures full color images and detects IR reflections
  • IR depth sensor: measures depth using IR pulses
  • Tracking camera: tracks objects
• Three basic operations
  • Map the depth of the environment in real time
  • Measure depth accurately using IR pulses
  • Create a 3D model of the environment in real time (sketched below)
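A minimal sketch of the "create a 3D model" step: back-projecting each depth pixel through pinhole intrinsics into a point cloud, the raw material for a real-time model. The intrinsics (fx, fy, cx, cy) are placeholder values, not Tango's calibration.

    import numpy as np

    def depth_to_points(depth, fx, fy, cx, cy):
        """depth: HxW array in metres -> Nx3 points in camera coordinates."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - cx) * z / fx               # pinhole back-projection
        y = (v - cy) * z / fy
        pts = np.dstack((x, y, z)).reshape(-1, 3)
        return pts[pts[:, 2] > 0]           # drop pixels with no depth return

    points = depth_to_points(np.full((480, 640), 2.0), 525, 525, 320, 240)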
Applications
• Indoor tracking, games, disability, etc
Qualcomm Smart Terrain
• Reconstructs the environment
  • Builds a mesh
• Recognizes props
  • Separates objects in the environment
Applications
• Gaming, advertising, training
DISPLAYS
Occlusion with See-through HMD
• The problem
  • Occluding real objects with virtual ones
  • Occluding virtual objects with real ones
[Figure: real scene vs. current see-through HMD]
ELMO (Kiyokawa 2001)
• Occlusive see-through HMD
• Masking LCD
• Real-time range finding
ELMO Demo
ELMO Design
• Use an LCD mask to block the real world
• Depth sensing for occluding virtual images (masking logic sketched below)
[Diagram: virtual images from the LCD and the real world meet at an optical combiner; an LCD mask driven by depth sensing blocks the real world where virtual content should occlude it]
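A software sketch of the masking logic, assuming registered per-pixel depth maps for the rendered virtual scene and the range finder: the LCD mask goes opaque only where the virtual surface is nearer than the real one. This illustrates the idea, not Kiyokawa's implementation.

    import numpy as np

    def lcd_mask(virtual_depth, real_depth):
        """1 = opaque mask pixel (show virtual), 0 = transparent (show real)."""
        has_virtual = np.isfinite(virtual_depth)    # pixels with virtual content
        return (has_virtual & (virtual_depth < real_depth)).astype(np.uint8)

    mask = lcd_mask(np.full((480, 640), 1.5), np.full((480, 640), 2.0))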
ELMO Results
Contact Lens Display
• Babak Parviz, University of Washington
• MEMS components
  • Transparent elements
  • Micro-sensors
• Challenges
  • Miniaturization and assembly
  • Eye safety
  • Providing power and data
Contact Lens Prototype
Wide FOV Displays
• Wide-FOV see-through display for AR
• LCD panel + edge-lit point light sources
• 110 degree FOV
Maimone, A., Lanman, D., Rathinavel, K., Keller, K., Luebke, D., & Fuchs, H. (2014). Pinlight displays: wide field of view augmented reality eyeglasses using defocused point light sources. In ACM SIGGRAPH 2014 Emerging Technologies (p. 20). ACM.
INTERACTION
The Vision of AR
To Make the Vision Real..
• Hardware/software requirements
  • Contact lens displays
  • Free-space hand/body tracking
  • Environment recognition
  • Speech/gesture recognition
  • Etc.
Natural Interaction
• Automatically detecting the real environment
  • Environmental awareness
  • Physically-based interaction
• Gesture input
  • Free-hand interaction
• Multimodal input
  • Speech and gesture interaction
  • Implicit rather than explicit interaction
AR MicroMachines
• AR experience with environment awareness and physically-based interaction
  • Based on the MS Kinect RGB-D sensor
• Augmented environment supports
  • Occlusion, shadows
  • Physically-based interaction between real and virtual objects
Operating Environment
Architecture
• Our framework uses five libraries:
  • OpenNI
  • OpenCV
  • OPIRA
  • Bullet Physics
  • OpenSceneGraph
System Flow
• The system flow consists of three sections (skeleton sketched below):
  • Image processing and marker tracking
  • Physics simulation
  • Rendering
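An illustrative skeleton of that three-stage flow. The classes are stand-ins for the real libraries (OPIRA for marker tracking, Bullet for physics, OpenSceneGraph for rendering), not the framework's actual API.

    class Tracker:
        def update(self, rgb):              # 1. marker tracking -> camera pose
            return "pose"

    class Physics:
        def step(self, depth, dt):          # 2. collide virtual bodies with scene
            pass

    class Renderer:
        def draw(self, pose, depth):        # 3. render with occlusion + shadows
            pass

    def run_frame(rgb, depth, tracker, physics, renderer):
        pose = tracker.update(rgb)          # image processing / marker tracking
        physics.step(depth, 1 / 60)         # physics simulation
        renderer.draw(pose, depth)          # rendering

    run_frame(rgb=None, depth=None, tracker=Tracker(),
              physics=Physics(), renderer=Renderer())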
Physics Simulation
• Create a virtual mesh over the real world (construction sketched below)
• Updated at 10 fps – real objects can move
• Used by the physics engine for collision detection (virtual/real)
• Used by OpenSceneGraph for occlusion and shadows
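A rough sketch of the mesh-construction step: sample the depth image on a coarse grid and emit two triangles per cell, giving a mesh a physics engine such as Bullet could use as a collision shape. The grid step and data layout are assumptions.

    import numpy as np

    def depth_to_mesh(depth, step=8):
        """Return (vertices Nx3, triangle index list) from a depth image."""
        h, w = depth.shape
        cols = len(range(0, w, step))
        rows = len(range(0, h, step))
        verts = [(u, v, depth[v, u])            # grid vertex with sampled depth
                 for v in range(0, h, step) for u in range(0, w, step)]
        faces = []
        for r in range(rows - 1):
            for c in range(cols - 1):
                i = r * cols + c                # two triangles per grid cell
                faces.append((i, i + 1, i + cols))
                faces.append((i + 1, i + cols + 1, i + cols))
        return np.array(verts, dtype=float), faces

    verts, tris = depth_to_mesh(np.full((480, 640), 1.0))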
Rendering
[Figures: occlusion and shadows]
Motivation
• AR MicroMachines and PhobiAR
  • Treated the environment as static – no tracking
  • Tracked objects in 2D
• More realistic interaction requires 3D gesture tracking
Architecture
5. Gesture
  • Static gestures • Dynamic gestures • Context-based gestures
4. Modeling
  • Hand recognition/modeling • Rigid-body modeling
3. Classification/Tracking
2. Segmentation
1. Hardware Interface
Architecture – 1. Hardware Interface
o Supports PCL, OpenNI, OpenCV, and Kinect SDK
o Provides access to depth, RGB, and XYZRGB data
o Usage: capturing color images, depth images and concatenated point clouds from a single camera or multiple cameras
o For example: Kinect for Xbox 360, Kinect for Windows, Asus Xtion Pro Live
Architecture – 2. Segmentation
o Segments images and point clouds based on color, depth and space
o Usage: segmenting images or point clouds using color models, depth, or spatial properties such as location, shape and size
o For example: skin color segmentation, depth thresholding (sketched below)
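A small OpenCV sketch of those two example segmenters, combining a YCrCb skin-color rule with a depth threshold; the threshold values are illustrative and scene-dependent.

    import numpy as np
    import cv2

    def segment_hand(rgb, depth, max_depth=1.0):
        ycrcb = cv2.cvtColor(rgb, cv2.COLOR_BGR2YCrCb)
        skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))   # skin mask
        near = ((depth < max_depth) * 255).astype(np.uint8)        # depth mask
        return cv2.bitwise_and(skin, near)    # hand = skin AND near the camera

    mask = segment_hand(np.zeros((480, 640, 3), np.uint8),
                        np.full((480, 640), 2.0))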
Architecture – 3. Classification/Tracking
o Identifies and tracks objects between frames based on XYZRGB
o Usage: identifying the current position/orientation of the tracked object in space
o For example: a training set of hand poses, where colors represent unique regions of the hand; raw (uncleaned) classifier output on real hand input (a depth image)
Architecture – 4. Modeling
o Hand recognition/modeling
  ! Skeleton-based (for low-resolution approximation)
  ! Model-based (for more accurate representation)
o Object modeling (identification and tracking of rigid-body objects)
o Physical modeling (physical interaction)
  ! Sphere proxy
  ! Model-based
  ! Mesh-based
o Usage: general spatial interaction in AR/VR environments
Results
Architecture – 5. Gesture
o Static (hand pose recognition)
o Dynamic (meaningful movement recognition)
o Context-based gesture recognition (gestures with context, e.g. pointing)
o Usage: issuing commands, anticipating user intention, and high-level interaction (toy sketch below)
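A toy sketch of the three gesture classes; the fingertip counts, motion threshold and event format are all illustrative, not the framework's actual recognizers.

    def static_gesture(n_fingertips):
        # static: classify the hand pose from the fingertip count
        return {0: "fist", 1: "point", 5: "open"}.get(n_fingertips, "unknown")

    def dynamic_gesture(palm_track):
        # dynamic: detect a horizontal swipe from palm motion over a window
        dx = palm_track[-1][0] - palm_track[0][0]
        return "swipe" if abs(dx) > 0.2 else None     # metres, assumed threshold

    def context_gesture(pose, target_hit):
        # context: "point" becomes a command only when the finger ray
        # actually hits an object in the scene
        return ("select", target_hit) if pose == "point" and target_hit else None

    print(context_gesture(static_gesture(1), target_hit="virtual_cube"))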
Gesture Based Interaction
• Use free-hand gestures to interact
  • Depth camera, scene capture
• Multimodal input
  • Combining speech and gesture
HIT Lab NZ
Microsoft HoloLens
Meta SpaceGlasses
Natural Gesture Interaction on Mobile
• Use the mobile camera for hand tracking
• Fingertip detection
Capturing Behaviours
▪ 3Gear Systems
▪ Kinect/PrimeSense sensor
▪ Two-hand tracking
▪ http://www.threegear.com
Performance
▪ Full 3D hand model input
▪ 10-15 fps tracking, 1 cm fingertip resolution
Multimodal Interaction
• Combined speech input
• Gesture and speech are complementary
  • Speech: modal commands, quantities
  • Gesture: selection, motion, qualities
• Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
Free Hand Multimodal Input
• Use the free hand to interact with AR content
• Recognize simple gestures: point, move, pick/drop
• No marker tracking
Multimodal Architecture
Multimodal Fusion
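A minimal sketch of time-window fusion in the spirit of these slides: a speech command supplies the operation and the nearest-in-time pointing gesture supplies the target object. The window size and event tuples are assumptions, not the system's actual fusion rules.

    def fuse(speech_events, gesture_events, window=1.5):
        """speech_events: (time, verb) tuples; gesture_events: (time, object)."""
        commands = []
        for t_s, verb in speech_events:
            # pair with the nearest gesture inside the time window
            near = [(abs(t_s - t_g), obj) for t_g, obj in gesture_events
                    if abs(t_s - t_g) <= window]
            if near:
                commands.append((verb, min(near)[1]))
        return commands

    print(fuse([(2.0, "colour red")], [(1.6, "cube"), (5.0, "sphere")]))
    # -> [('colour red', 'cube')]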
Hand Occlusion
User Evaluation
• Change object shape, colour and position
• Conditions
  • Speech only, gesture only, multimodal
• Measures
  • Performance time, errors, subjective survey
Experimental Setup
Change object shape and colour
Results
• Average performance time (multimodal and speech fastest)
  • Gesture: 15.44 s
  • Speech: 12.38 s
  • Multimodal: 11.78 s
• No difference in user errors
• User subjective survey
  • Q1: How natural was it to manipulate the object? MMI and speech rated significantly better
  • 70% preferred MMI, 25% speech only, 5% gesture only
COLLABORATION
Resolution Tube
• http://www.resolutiontube.com/
• Shared video calls with annotations
Vipaar Lime - https://www.vipaar.com/
• Remote collaboration on a handheld
• The remote user's hands appear in the live camera view
SOCIAL IMPLICATIONS
Consider the Whole User
How is the User Perceived?
TAT Augmented ID
Social Acceptance
• People don't want to look silly
  • Only 12% of 4,600 adults surveyed would be willing to wear AR glasses
  • 20% of mobile AR browser users experience social issues
• Acceptance is due more to social than technical issues
• Needs further study (ethnographic, field tests, longitudinal)
CROSSING BOUNDARIES
Crossing Boundaries
Jun Rekimoto, Sony CSL
Invisible Interfaces
Jun Rekimoto, Sony CSL
Milgram's Reality-Virtuality Continuum
[Diagram: the Reality-Virtuality (RV) continuum, from Real Environment through Augmented Reality (AR) and Augmented Virtuality (AV) to Virtual Environment, with Mixed Reality spanning the middle]
The MagicBook
[Diagram: the MagicBook moves along the continuum from Reality through Augmented Reality (AR) and Augmented Virtuality (AV) to Virtuality]
Example: Visualizing Sensor Networks
• Rauhala et al. 2007 (Linköping)
• Network of humidity sensors
  • ZigBee wireless communication
• Use mobile AR to visualize humidity
Ubiquitous AR (GIST, Korea)
• How does your AR device work with other devices? • How is content delivered?
CAMAR - GIST
(CAMAR: Context-Aware Mobile Augmented Reality)
Requirements for Ubiquitous AR
• Hardware is available (mobile phones)
• Required: software standards
  • APIs for a common framework, independent of hardware
  • ARML as a descriptor language for the AR environment, scenarios, etc.
• Further required:
  • Authoring tools for creating AR applications
  • AR-enabled infrastructure (buildings etc.)
[Diagram: a 2D space spanned by Milgram's reality-virtuality axis (Reality ↔ Virtual Reality) and Weiser's terminal-ubiquitous axis, locating Desktop, AR, VR, UbiComp, Mobile AR, Ubi AR and Ubi VR]
SCALING UP
[Diagram: the same Milgram/Weiser space extended along a third axis from single user to massive multi-user]
Massive Multiuser
• Handheld AR for the first time allows extremely high numbers of AR users
• Requires
  • New types of applications/games
  • New infrastructure (server/client/peer-to-peer)
  • Content distribution…
Social Network Systems
• 2D applications
  • MSN – 29 million
  • Skype – 10 million
  • Facebook – up to 70 million
• Desktop VR
  • Second Life – >50K
  • Stereo projection – <500
• Immersive VR
  • HMD/CAVE-based – <100
• Augmented reality
  • Shared Space (1999) – 4
  • Invisible Train (2004) – 8
PERSONAL VIEW
Augmented Reality 2.0 Infrastructure
Leveraging Web 2.0
• Content retrieval using HTTP
• XML-encoded meta information
  • KML placemarks + extensions
• Queries
  • Based on location (from GPS, image recognition)
  • Based on situation (barcode markers)
• Syndication
  • Community servers for end-user content
  • Tagging
• AR client subscribes to data feeds (query sketched below)
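A sketch of what such a client query could look like: fetch KML placemarks near the current GPS fix from a community server and pull out their names and coordinates. The server URL and query parameters are hypothetical.

    import urllib.request
    import xml.etree.ElementTree as ET

    KML_NS = "{http://www.opengis.net/kml/2.2}"

    def fetch_placemarks(lat, lon, radius_m=500):
        # hypothetical community-server feed, filtered by location
        url = (f"https://ar-content.example.org/feed.kml"
               f"?lat={lat}&lon={lon}&radius={radius_m}")
        kml = urllib.request.urlopen(url).read()
        root = ET.fromstring(kml)
        out = []
        for pm in root.iter(f"{KML_NS}Placemark"):
            name = pm.findtext(f"{KML_NS}name")
            coords = pm.findtext(f".//{KML_NS}coordinates")
            out.append((name, coords))
        return out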
Scaling Up
• AR on a city scale
  • Using the mobile phone as a ubiquitous sensor
• MIT Senseable City Lab
  • http://senseable.mit.edu/
WikiCity Rome (Senseable City Lab MIT)