
Introduction to Computer Vision - University of Minnesota


  • Introduction to Computer Vision

    RODNEY DOCKTER, PH.D.

    MARCH 2018

    1

  • Rodney Dockter (me)

    • Ph.D. in Mechanical Engineering from the University of Minnesota

    • Worked in Dr. Tim Kowalewski’s lab

    • Medical robotics and machine learning

    • Now working at Danfoss Power Solutions developing autonomous systems in agriculture and heavy machinery

    2

  • Contact Information

    • Office: ME 2110

    • Email: [email protected]

    • Office Hours: W, F 1:15 pm – 2:00 pm

    3


  • Book References (optional)

    • David A. Forsyth and Jean Ponce: Computer Vision: A Modern Approach. Prentice Hall, 2002.

    • Peter Corke: Robotics, Vision and Control: Fundamental Algorithms in MATLAB. Springer, 2017.

    • R.C. Gonzalez and R.E. Woods: Digital Image Processing. Prentice Hall, 2002.

    • Richard Hartley and Andrew Zisserman: Multiple View Geometry in Computer Vision. Cambridge University Press, 2004.

    • Christopher M. Bishop: Pattern Recognition and Machine Learning. Springer, 2011.

    4

  • Class Organization

    • All computer vision from here on out!

    • ~10 lectures, 3 assignments, 30% of total grade

    • “Prerequisites”:

    • Linear algebra

    • Statistics

    • Vector Calculus

    • Coding!

    • Course does not presume prior computer vision experience

    • Emphasis on coding!

    • Matlab will be required for all homework assignments

    5

  • Class Organization Cont.

    • Collaboration Policy:

    - You are encouraged to discuss assignments with your peers. However, all written work and coding must be done individually. Individuals found submitting duplicate or substantially similar materials due to inappropriate collaboration may get an “F” in this class and may receive other serious consequences.

    - In the real world no one will write your code for you!

    • Libraries and Code Found Online:

    - None of the Matlab computer vision libraries may be used in assignments. You will be writing your own computer vision code from the ground up.

    - Students found plagiarizing code from the internet will fail the course (we know how to use Google too)

    6

  • Course Syllabus (Vision Portion)

    March 30 F 1 - Introduction to computer vision and image processing

    April 04 W 2 - Image Formation and Camera Fundamentals

    April 06 F 3 - Digital Image Representation (Quiz #2 review)

    April 11 W 4 - Spatial Domain + Computer Vision in Matlab

    April 13 F QUIZ #2 – No Lecture

    April 18 W 5 - Image Histograms

    April 20 F 6 - Edge Detection

    April 25 W 7 - Edge Detection Cont. – Canny and Sobel

    April 27 F 8 - Interest Point Detection

    May 02 W 9 - Line Detection. Hough Transform.

    May 04 F 10 - The future. Deep Learning.

    7

  • Outline

    • What is computer vision? What is image processing?

    • Basics of Computer Vision Processing

    • Interesting examples of computer vision in the wild

    • Brief introduction to Matlab for computer vision use.

    8

  • Part 1 - What is computer vision?

    Allowing robots and machines to see the world around them.

    9

  • What is computer vision?

    Once robots can see the world, they can interact with it.

    10

  • What is computer vision?

    Machines can learn to perform tasks just like humans.

    11

  • What is computer vision?

    • Deals with the formation, analysis, and interpretation of images

    • Integral to robotics and Artificial Intelligence (AI)

    • Interdisciplinary subject area:

    • Robotics

    • Autonomous Vehicles

    • Medical Applications

    • Security

    • Practical and Useful

    • Challenging and continuously evolving

    12

  • Difficulties in Computer Vision

    • Images are ambiguous: dependent on perspective

    • Images are affected by many factors:

    • Sensor model

    • Illumination (lighting)

    • Shape of object(s)

    • Color of object(s)

    • Texture of object(s)

    • No “universal solution” to vision and sensing

    • Many theories and potential algorithms

    • Gold standard: Human Vision!

    13

  • Human Vision:

    14

  • Difficulties in Computer Vision

    • It is a many-to-one mapping:

    • Different objects with different material and geometric properties, possibly under different lighting conditions, can lead to identical images.

    • The same object, viewed from different perspectives or under different lighting, can result in vastly different images (an under-constrained mapping)

    • Information is lost in the transformation from the 3D world to a 2D image.

    15
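The loss of 3D information can be sketched with a minimal pinhole-projection example (the point coordinates and focal length below are made-up illustration values, not from the slides):

```python
# Pinhole projection: a 3D point (X, Y, Z) maps to the 2D image point
# (f*X/Z, f*Y/Z).  Depth is divided out, so it cannot be recovered from
# a single image -- the mapping is many-to-one.
def project(point_3d, f=1.0):
    X, Y, Z = point_3d
    return (f * X / Z, f * Y / Z)

# Two different 3D points on the same viewing ray...
p_near = (1.0, 2.0, 4.0)
p_far = (2.0, 4.0, 8.0)  # twice as far away and twice as large

# ...land on exactly the same pixel.
print(project(p_near))  # (0.25, 0.5)
print(project(p_far))   # (0.25, 0.5)
```

This is exactly why a small nearby object and a large distant one can produce identical images.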

  • Difficulties in Computer Vision

    • It is computationally intensive.

    • Usually requires high end processors or graphics cards to achieve real time performance.

    • We still do not fully understand the recognition problem.

    16

  • Nomenclature

    • Many names for mostly similar things:

    • Computer Vision – the general, all-encompassing term

    • Computational Vision – modeling of biological vision

    • Image Processing – generally refers to operations on static images; a building block for computer vision

    • Machine Vision – industrial or factory applications for inspection and measurement

    17

  • Related Fields

    • Pattern Recognition

    • Machine Learning

    • Robotics (obviously)

    • Medical Imaging

    • Computer Graphics

    • Human–Robot Interaction

    18

  • Why study computer vision?

    • Images and videos are everywhere

    • Mobile phones, cheap cameras, real time streaming

    • New applications all the time

    • Face recognition

    • 3D representations from pictures

    • Automatic surveillance

    • Driverless vehicles

    • Shipping and warehouse management

    • Various deep scientific mysteries

    • The human vision system, etc.

    19

  • Brief History of Computer Vision

    • B.C. (Before Computers)

    • Philosophy

    • Optics

    • Physics

    • Neurophysiology

    • Early Cameras (Late 1800s)

    • Earliest way to represent the real world in a consistent manner

    • Precursor to video and digital representations of the world around us

    20

  • Brief History of Computer Vision

    • Early Computer Vision Systems (1960s)

    • Minsky at MIT

    • Attempted to solve vision in a summer (lol)

    • Empirical approaches

    • Ad hoc, “bag of tricks”

    • Image processing, plus more

    • Solutions tailored to each problem.

    • Simplified worlds

    • “Blocks worlds,” then generalize

    • Known perspectives, known illumination

    21

  • Brief History of Computer Vision

    • 1980s – improved computational power plus robotics innovations

    • 1990s – personal computing brings computer vision possibilities to researchers everywhere

    • 2000s – OpenCV, ROS, and Matlab provide generalized computer vision capabilities to researchers

    • 2010s – computer vision comes to the masses:

    • Microsoft Kinect

    • Leap Motion

    • Google Tango

    • Augmented Reality

    • iPhone Face ID

    22

  • Progress in Computer Vision

    • First Generation: Military / Early Research

    • Few systems, custom built. Cost: $1 million+

    • Users have PhDs

    • Slow (1 hour per frame)

    • Second Generation: Industrial / Medical

    • Numerous systems (1000+). Cost: $10,000+

    • Users have bachelor’s degrees

    • Specialized hardware

    • Third Generation: Consumer

    • 100,000+ systems worldwide. Cost: $100

    • Users have little or no training

    • Raspberry Pi + webcam = $50 = limitless potential

    23

  • The Future

    • Software:

    • One solution for all vision problems

    • Provide inexperienced users with an easy way to detect/track objects.

    • Does it exist?

    • Convolutional Neural Networks?

    • Tensorflow?

    • Hardware:

    • Cheap cameras (cell phone)

    • Depth cameras (Kinect, stereo cameras)

    • Cheap, powerful, mobile processing (getting there)

    24

  • Part 2 – Computer Vision Processing

    • Processing Levels:

    • Low Level

    • Mid Level

    • High Level

    • Image Formation

    25

  • Three Processing Levels – Low Level

    • Low Level Processing:

    • Standard procedures to improve image quality and format

    • No “intelligence”

    26

  • Three Processing Levels - Mid Level

    • Intermediate Processing:

    • Extract features and characterize components

    • Some “intelligence”

    27

  • Three Processing Levels - Mid Level

    • Representing small patches in an image

    • We want to establish correspondence between points or regions in different images, so we need to mathematically describe the neighborhood around a point

    • Sharp changes are important features! (Known as “edges”)

    • Representing texture by giving some statistics of the different types of patches present in a region

    • e.g., tigers have lots of stripes, few spots

    • Leopards have lots of spots, no stripes

    • How can we mathematically describe this?

    28

  • Three Processing Levels - Mid Level

    • Filter Outputs

    • Filters form a dot product between a pattern and an image while shifting the pattern across the image

    • Strong Response -> image region matches a pattern

    • e.g. derivatives measured by filtering with a kernel that looks like a big derivative (bright line next to a dark line would yield a large derivative)

    29
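The shifted dot-product idea can be sketched in a few lines of Python with NumPy (the toy 1-D signal and the two-tap derivative kernel are illustrative assumptions, not anything from the slides):

```python
import numpy as np

def filter_1d(signal, kernel):
    """Slide the kernel across the signal, taking a dot product at each shift."""
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

# A step edge: dark pixels next to bright pixels.
signal = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# A two-tap derivative kernel: responds where intensity changes.
deriv = np.array([-1.0, 1.0])

response = filter_1d(signal, deriv)
print(response)  # [0. 0. 1. 0. 0.] -- strongest response right at the edge
```

The strong response at the dark-to-bright transition is the "bright line next to a dark line yields a large derivative" effect described above.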

  • Three Processing Levels - Mid Level

    • Many objects are distinguished by their texture

    • Tigers, cheetahs, grass, trees…

    • We represent texture with statistics from filter outputs

    • For tigers, a stripes filter at a coarse scale responds strongly

    • For cheetahs, a spot filter with a coarse scale responds strongly

    • For grass, long narrow lines

    • For the leaves of trees, extended spots

    • Objects with different textures can be segmented

    • The variation in texture is a cue to shape

    30
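One minimal way to turn filter outputs into a texture statistic, using the same toy 1-D setting (the "stripe" signal, flat signal, and kernel are invented for illustration):

```python
import numpy as np

def filter_1d(signal, kernel):
    """Dot product of the kernel against every shifted window of the signal."""
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

stripes = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # "tiger" region
flat = np.zeros(8)                                            # featureless region

deriv = np.array([-1.0, 1.0])  # a crude stripe detector

# Texture descriptor: a statistic (mean absolute response) of the filter output.
stripe_energy = np.abs(filter_1d(stripes, deriv)).mean()  # high: many stripes
flat_energy = np.abs(filter_1d(flat, deriv)).mean()       # zero: no texture
print(stripe_energy, flat_energy)
```

Real texture descriptors use banks of 2-D filters at several scales and orientations, but the principle is the same: regions with different textures yield different filter-response statistics, so they can be segmented apart.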

  • Three Processing Levels - Mid Level

    • Geometry of multiple views (images)

    • Where might an object appear in camera #2 (or #3, …) given its position in camera #1?

    • Stereopsis

    • What we know about the world from having 2 eyes (cameras)

    • Depth and 3D information

    • Structure from motion

    • What we know about the world from having many eyes (a single eye, moving quickly)

    31
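The stereo idea can be made concrete with the standard disparity-to-depth relation for a rectified camera pair; the focal length, baseline, and disparities below are hypothetical numbers for illustration:

```python
# For a rectified stereo pair, depth follows from disparity:
#   Z = f * B / d
# where f is the focal length (pixels), B the baseline between the two
# cameras (meters), and d the pixel shift of the point between the left
# and right images.  All values below are made up for the example.
def depth_from_disparity(f_pixels, baseline_m, disparity_pixels):
    return f_pixels * baseline_m / disparity_pixels

f = 700.0  # focal length in pixels
B = 0.1    # 10 cm between the cameras
print(depth_from_disparity(f, B, 35.0))  # 2.0 m: large disparity = near
print(depth_from_disparity(f, B, 7.0))   # 10.0 m: small disparity = far
```

The hard part in practice is finding the correspondences that give the disparity; once matched, recovering depth is just this formula.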

  • Three Processing Levels - Mid Level

    • Finding coherent structures in an image to break it into smaller units

    • Segmentation:

    • Breaking images and videos into useful pieces

    • e.g. finding image components that are coherent in appearance

    • e.g. separating image foreground from background

    • Tracking:

    • Keeping track of a moving object through a sequence of views

    32
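A minimal sketch of foreground/background separation, using simple intensity thresholding on a made-up toy image (real segmentation methods are far more sophisticated, but this is the basic idea):

```python
import numpy as np

# Toy grayscale "image": a bright foreground object on a dark background.
image = np.array([[10,  12,  11, 10],
                  [10, 200, 210, 11],
                  [12, 205, 220, 10],
                  [11,  10,  12, 13]])

# Simplest possible segmentation: pixels brighter than a threshold
# are labeled foreground, the rest background.
threshold = 100
mask = image > threshold

print(int(mask.sum()))  # 4 foreground pixels found
```

The boolean mask is the "useful piece": it can be handed to later stages for recognition or used to initialize a tracker in the next frame.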

  • Three Processing Levels – High Level

    • High-Level Processing:

    • Recognizing patterns, comparing to known models

    • High “intelligence” (i.e., machine learning)

    33

  • Three Processing Levels - High Level

    • Relationship between object geometry and image geometry

    • Model Based Vision

    • Find the position and orientation of known objects (e.g. squares, circles)

    • Smooth surfaces and outlines

    • How the outline of a curved object is formed and what it looks like (contours)

    • Aspect graphs:

    • How the outline of a curved object changes as you view it from different directions

    • Range data

    34

  • Three Processing Levels - High Level

    • Using classifiers and probability to match and recognize objects

    • Templates and classifiers

    • How to find objects that look the same from view to view with a classifier

    • Relations

    • Break up objects into big, simple parts, find the parts with a classifier, then identify relationships between parts to find the object

    • Geometric templates from spatial relations

    • Extend this trick so that templates are formed from relations between smaller parts

    • Bordering on machine learning…

    35
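A bare-bones template matcher, sketched as a sum-of-squared-differences search over a toy image (the image and template are invented for illustration; real systems use normalized correlation or learned classifiers):

```python
import numpy as np

def best_match(image, template):
    """Slide the template over the image and return the (row, col)
    offset with the smallest sum of squared differences."""
    h, w = template.shape
    H, W = image.shape
    best_ssd, best_pos = float("inf"), None
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            ssd = float(np.sum((image[r:r + h, c:c + w] - template) ** 2))
            if ssd < best_ssd:
                best_ssd, best_pos = ssd, (r, c)
    return best_pos

image = np.zeros((5, 5))
image[2:4, 1:3] = 1.0        # the "object" to be found
template = np.ones((2, 2))   # a template with the object's appearance

print(best_match(image, template))  # (2, 1)
```

Templates like this only find objects that look the same from view to view; the parts-and-relations approach above exists precisely because whole-object templates break down under viewpoint change.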

  • Image Formation

    36

  • Image Formation

    37

  • Simplistic View

    38

  • The Physics of Imaging

    • How images are formed:

    • Cameras

    • How a camera creates an image

    • How to tell where the camera was

    • Light

    • How to measure light

    • What light does at surfaces

    • How the brightness values we see in cameras are determined

    • Color

    • The underlying mechanisms of color

    • How to describe and measure color

    39

  • The Physics of Imaging

    40

  • The Physics of Imaging

    More on this later …

    41

  • Part 4 – Computer Vision in the wild

    There are countless products that have an element of computer vision, and the number increases daily. Consumer, defense, and research.

    42

  • Computer Vision in the wild

    NASA Rover; PAR Systems: Vision Guided Robots

    43

  • Computer Vision in the wild

    Face Recognition for Security, User Interface

    Apple Face ID

    44

  • Computer Vision in the wild

    Challenges in Face Recognition:

    - Illumination

    - Pose Variations

    - Facial Expressions

    - Facial Similarity

    45

  • Computer Vision in the wild

    3D Sensors (Microsoft, Intel, etc.): Time-of-Flight Depth

    Gesture Recognition: Leap Motion, RealSense

    46

  • Computer Vision in the wild

    Optical Character Recognition (Handwritten Digits)

    MNIST Data Set

    47

  • Computer Vision in the wild

    Traffic Monitoring; Advanced Driver-Assist Systems

    48

  • Computer Vision in the wild

    Autonomous Vehicles

    49

  • Computer Vision in the wild

    Robotics Navigation

    50

  • Computer Vision in the wild

    Brain Imaging (MRI): Vital Images Inc.

    51

  • Computer Vision in the wild

    • Types of images:

    • Infra-red

    • Ultra-violet

    • Radio-waves

    • Visible-light

    • Radar

    • Tomography

    • Sonar

    • Microscopy

    • Magnetic Resonance

    52

  • Computer Vision in the wild

    Thermal Imaging: Applications in defense, agriculture

    Local company: Fluke Thermography

    53

  • Computer Vision in the wild

    Human Activity Recognition

    54

  • Computer Vision in the wild

    Video Surveillance and Tracking

    55

  • Can computers match human perception?

    • Not yet…

    • Computer vision is still no match for human perception

    • But computers are catching up in certain areas

    • Classifying the ImageNet Database:

    • 1.2 million images, 1000 categories

    • Human: ~5.1% top-5 error rate

    • Inception-v3 Convolutional Neural Network: 3.46% top-5 error rate

    https://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

    56

  • Conclusion

    • Computer Vision is a challenging and exciting field

    • Applied to many real world situations

    • Tremendous progress in the last two decades

    • There is still work to be done…

    57