10/15/2018
Machine Learning in Test‐ A Journey To AI
ITC 2018 Tutorial ‐ML in Test ‐Wang 1
Li-C. Wang, University of California, Santa Barbara
Let’s Start With Four Questions
What kind of “AI?”
Why “A Journey To AI?”
How is “Machine Learning” applied in Test?
What does “Machine Learning” mean anyway?
What AI?
What Kind of “AI”?
1950: “Computing Machinery and Intelligence” – The Turing Test
Other AI: Thinking humanly, Thinking rationally, Acting rationally
“Artificial Intelligence - A Modern Approach”, 3rd Edition, Prentice Hall Series
(Acting humanly – The Turing Test Approach)
Natural Language Processing
Knowledge Representation
Automated Reasoning
Computer Vision
Robotics
Machine Learning
AI
What Kind of “AI”? (Outline Of The Book)
Problem Solving
– Classical search, adversarial search, CSP
Knowledge Representation, Reasoning, Planning
– Prop. Logic, 1st-order Logic, Planning and Acting
Uncertain Knowledge and Reasoning
– Probabilistic reasoning, reasoning over time, Decision making
Learning
– Learning algorithms, Explanation-based learning, reinforcement learning
Communication, Perceiving, Acting
– NLP, Perception, Robotics
What Does “AI” Remind Us Of?
Self‐Driving Car
Intelligent Vehicle Development
1959 – GM Firebird III
– A vision of autonomous driving
1990-95 – CMU Navlab landmark vehicle
1994 – Daimler VITA II (Vision Technology Application)
– Automatic collision avoidance
– Autonomous highway driving and lane change
1997 – Netherlands PATH program
– Demonstrated platoon driving
1998-01 – NIST Demo III Experimental Unmanned Vehicle
DARPA Challenges on Autonomous Driving
– Mar 13, 2004 – no one completed 5% of the 142-mile off-road course
– Oct 8, 2005 – 5 completed, with Stanford’s “Stanley” winning the race
– Nov 2007 – Urban Challenge (CA) won by CMU’s “Boss”
DARPA Urban Challenge
GPS+DMI provide real-time positioning
LMS + Riegl provide information on 3D road structure and road surface (e.g. lane)
Velodyne + LDLRS + Radar provide moving vehicle information
[Figure: CMU’s Boss and Stanford’s Junior, with sensors labeled: BOSCH Radar, IBEO Laser, Riegl Laser, Velodyne Laser, SICK LMS Laser, SICK LDLRS Laser, (Distance) DMI]
Lessons from the Urban Challenge
The Urban Challenge presented significantly more difficulty than previous challenges
Advancement in sensor technology (off-the-shelf sensor availability) was a key reason for the success
However, sensor technology was not yet sufficient, in terms of noise and cost, for practical production of autonomous vehicles
Since then, the technologies have improved considerably
– Better sensors
– Advances in pattern recognition
– More powerful hardware
Today’s Example ‐ Google’s Waymo
By 2012: more than 300K miles of test driving
In 2015: launched the fully self-driving car Firefly
In 2017: introduced fully self-driving Chrysler hybrid minivans
Future Projection
Source: Victoria Transport Policy Institute, May 1st 2017
[Figure: projected milestones: large-scale testing; taxi and car-share services; A-V lane; restricting human driving; infrastructure re-design]
The Three Components
Sensing: SR/LR Radars, LiDAR, Vision, Stereovision, U-Sonic, etc.
Perception: Lane detection, object recognition
Reasoning & Control: Free space calculation, path planning, speed/brake/rotation controls
[Figure: Self-Driving Car = Sensing, Perception, Reasoning & Control]
What The 3 Components Do
Sensing: Collect all relevant data
Perception: Recognize what the data mean
Reasoning & Control: Decide what to do next
[Figure: Sensing (SR-Radar, LR-Radar, LiDAR, Vision, Stereovision, U-Sonic, …), Perception (Perception Component), Reasoning & Control (Flowchart, Optimization engine, Decision-Tree Rules, …)]
Operate In A “WORLD VIEW”
The system maintains a “WORLD VIEW” of the environment and reacts to it
Source: Google self‐driving car pictures from the public domain
World View – Google Self‐Driving Car
(Google More Examples Online)
Why “Journey To AI”?
Applying ML in Design/Test (2003‐2013)
Data sources
– Pre-silicon (Design Automation), e.g. simulation data
– Post-silicon (Manufacturing Automation), e.g. test data
– In-field (System/in-field test), e.g. customer returns
Learning categories: Supervised learning, Unsupervised learning
Learning tasks: Classification, Regression, Transformation, Clustering, Outlier, Rule Learning
Applications: Test cost reduction, Functional verification, Layout verification, Design-silicon timing verification, Po-Si Validation, Yield, Zero DPPM, Fmax, Speed test
IEEE Trans. On CAD Paper (Oct 2016):
“Experience of Data Analytics in EDA and Test – Principles, Promises, and Challenges”
Journey To AI
Applying ML in a D&T application
In order to deploy a solution, it is not just about the ML tool – very often, we need a system to apply ML.
This is especially the case when the ML solution is deployed in design/test processes.
AI (Autonomous System)
Where we are now
The project (Intelligence Engineering Assistant)
The IEA research lab: https://iea.ece.ucsb.edu/
IEA, pronounced “Ai-Ya”
(Artificial Intelligence – Your Assistant)
What’s IEA like?
Tasks Performed By A Product Engineer
[Figure: production flow (Manufacturing Process, Class Probe, Wafer Probe, Packaging, Burn-in, Final Test, Shipped to customers, Customer Returns) with Production Test; through an Interface, a yield issue leads to an Analytic Workflow, Findings, and a PPT Presentation]
IEA for Yield Optimization
IEA System to execute the tasks
<IEA Demo>
What’s IEA?
Concept‐Based Workflow Programming
To implement this workflow, we need several data-driven concept recognizers
– Yield excursion, low-yield, grid pattern, edge failing, correlation trend

IF (observe a yield excursion) {
  % enter the particular yield excursion context
  FOR (selected low-yield lots) {
    % enter the low-yield context
    SELECT a test bin with the largest yield loss;
    PERFORM e-test correlation analysis;
    IF (observe a correlation trend) {
      % enter the correlation context
      REPORT the trend plot;
      GENERATE stack-wafer plots;
      IF (observe a grid pattern) -> REPORT plots;
      IF (observe an edge failing pattern) -> REPORT plots;
    }
  }
}
…
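The workflow above can be sketched in Python as follows. This is only an illustration of the idea, not the IEA API: the recognizer functions, the field names (`yield`, `bin_loss`, `etest_corr`, `wafer_pattern`) and the thresholds are all hypothetical stand-ins for real data-driven concept recognizers.

```python
# Hypothetical concept recognizers: each answers one yes/no question about the data.
# Thresholds and field names are illustrative assumptions, not part of IEA.
LOW_YIELD = 0.80

def observe_yield_excursion(lots):
    return any(lot["yield"] < LOW_YIELD for lot in lots.values())

def observe_correlation_trend(lot):
    return abs(lot["etest_corr"]) > 0.7

def observe_grid_pattern(lot):
    return lot["wafer_pattern"] == "grid"

def observe_edge_failing_pattern(lot):
    return lot["wafer_pattern"] == "edge"

def run_workflow(lots):
    """Walk the nested contexts of the workflow and collect report items."""
    report = []
    if observe_yield_excursion(lots):                  # yield excursion context
        for name, lot in lots.items():
            if lot["yield"] >= LOW_YIELD:
                continue                               # keep only low-yield lots
            # SELECT the test bin with the largest yield loss
            worst_bin = max(lot["bin_loss"], key=lot["bin_loss"].get)
            report.append(f"{name}: largest yield loss in {worst_bin}")
            if observe_correlation_trend(lot):         # correlation context
                report.append(f"{name}: trend plot")
                if observe_grid_pattern(lot):
                    report.append(f"{name}: grid pattern plots")
                if observe_edge_failing_pattern(lot):
                    report.append(f"{name}: edge failing pattern plots")
    return report

# Made-up data for two lots.
lots = {
    "lot1": {"yield": 0.95, "bin_loss": {"bin26": 0.03, "bin31": 0.02},
             "etest_corr": 0.10, "wafer_pattern": "none"},
    "lot2": {"yield": 0.70, "bin_loss": {"bin26": 0.20, "bin31": 0.10},
             "etest_corr": -0.75, "wafer_pattern": "edge"},
}
report = run_workflow(lots)
```

The point of the sketch is that the control flow stays readable in terms of concepts, while each `observe_…` function hides how the concept is recognized from data.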
A New Programming Paradigm
IEA is a concept‐based workflow programming environment
An IEA system comprises three components
– API for workflow construction
– Library of concept recognizers
– Data processing and analysis tools
[Figure: Executable Workflow, Concept Recognizers, Data processing and analysis tools]
The Analogy
Data Tools: Collect all relevant “data”
Concept Recognition: Recognize what the “data” mean
Workflow: Decide what to do next
[Figure: self-driving car (Sensing, Perception, Reasoning & Control) alongside the IEA System (Data processing and analysis tools, Concept Recognizers, Executable Workflow)]
Machine Learning in Test‐ A Closer Look At The Journey
Applying ML in Design/Test (2003‐2013)
This journey went through multiple stages …
At first, “Algorithmic Focus” – What is the “right” ML algorithm (tool) to apply?
Learning from “Data”
A machine learns from training data to build a model
The model is used to “predict” future “unseen” data
[Figure: “Data” → Machine Learning → Model]
ML Python Lib: http://scikit‐learn.org/
Dataset Format
A learning tool usually takes a dataset in matrix form
– Samples: examples to be reasoned on
– Features: aspects that describe a sample
– Vectors: the resulting vector representing a sample
– Labels: the behavior of interest to be learned (optional)
[Figure: dataset matrix with one row (vector) per sample, one column per feature, and a column of labels]
Supervised Learning
Classification
– Labels represent classes (e.g. +1, -1: binary classes)
Regression
– Labels are numerical values (e.g. frequencies)
[Figure: feature matrix (features) with a column of Labels]
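A small NumPy sketch may help make the two label types concrete. The numbers are made up, and the two models used here (a least-squares fit for regression, a nearest-centroid rule for classification) are chosen only because they are short, not because the tutorial uses them.

```python
import numpy as np

# Rows are samples (vectors), columns are features. Values are made up.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y_class = np.array([-1, -1, +1, +1])     # classification: labels are classes
y_reg = np.array([3.0, 3.0, 7.0, 7.0])   # regression: labels are numbers (here x1 + x2)

x_new = np.array([5.0, 5.0])             # an "unseen" sample

# Regression: least-squares fit of y = w1*x1 + w2*x2 + b.
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y_reg, rcond=None)
y_pred = float(np.append(x_new, 1.0) @ w)

# Classification: assign the class whose centroid is closest to the new sample.
centroids = {c: X[y_class == c].mean(axis=0) for c in (-1, +1)}
label_pred = min(centroids, key=lambda c: np.linalg.norm(x_new - centroids[c]))
```

In both cases the same feature matrix `X` is used; only the kind of label, and therefore the kind of model, changes.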
Unsupervised Learning
Work on features
– Dimension reduction
– Transformation
Work on samples
– Clustering
– Novelty detection
– Density estimation
[Figure: feature matrix (features) with no labels (no y’s)]
Take Classification
Basic Approaches for Classification
Nearest Neighbors
Linear Discriminant Analysis (LDA)
– Quadratic Discriminant Analysis (QDA)
Naïve Bayes
Decision Tree
– Random Forest
Support Vector Machine
– Linear
– Radial Basis Function (RBF)
Neural Networks
e.g. Nearest Neighbors
$\hat{y}(\vec{x})$ = average of the $k$ nearest neighbors to $\vec{x}$
– Uniform average, or weighted by inverse of distance
– User chooses $k$
– A given distance function
– In a given space
Source: http://scikit‐learn.org/stable/auto_examples/neighbors/plot_classification.html#example‐neighbors‐plot‐classification‐py
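The nearest-neighbor rule can be sketched in a few lines of NumPy. This is an illustrative implementation under assumed choices (Euclidean distance, labels of +1/-1, made-up data); in scikit-learn the same two weighting options correspond to the `weights="uniform"` and `weights="distance"` settings of `KNeighborsClassifier`.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, weights="uniform"):
    """Average the labels of the k nearest neighbors of x."""
    d = np.linalg.norm(X_train - x, axis=1)      # Euclidean distance (an assumed choice)
    idx = np.argsort(d)[:k]                      # indices of the k nearest samples
    if weights == "uniform":
        w = np.ones(len(idx))
    else:                                        # "distance": weight by inverse distance
        w = 1.0 / np.maximum(d[idx], 1e-12)
    return float(np.sum(w * y_train[idx]) / np.sum(w))

# Made-up 1-D example: two class +1 samples far away, one class -1 sample nearby.
X_train = np.array([[0.0], [1.0], [3.0]])
y_train = np.array([+1.0, +1.0, -1.0])
x = np.array([2.5])

uniform = knn_predict(X_train, y_train, x, k=3, weights="uniform")
weighted = knn_predict(X_train, y_train, x, k=3, weights="distance")
```

With uniform weights the two distant +1 samples outvote the nearby -1 sample (positive average); with inverse-distance weights the nearby sample dominates (negative average). The sign of the average gives the predicted class.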
Linear Discriminant Analysis (LDA)
For each class, the mean and covariance are estimated based on the data
In LDA, the two covariances are assumed to be the same
– Otherwise, it is called Quadratic Discriminant Analysis (QDA)
In many cases, the difference between LDA and QDA is small
Class 1: model it as a Gaussian distribution $N(\mu_1, \Sigma_1)$
Class 2: model it as another Gaussian distribution $N(\mu_2, \Sigma_2)$
Decision function: $D(\vec{x}) = \dfrac{P(\vec{x} \text{ in class 1} \mid \text{given } \vec{x})}{P(\vec{x} \text{ in class 2} \mid \text{given } \vec{x})}$
Bayesian Inference – Naïve Bayes Classifier
$$p(class \mid x_1, \dots, x_n) = \frac{p(class)\, p(x_1, \dots, x_n \mid class)}{p(x_1, \dots, x_n)}$$
(posterior = prior × likelihood / evidence)
$$p(class \mid x_1, \dots, x_n) \propto p(class)\, p(x_1, \dots, x_n \mid class)$$
Independence assumption
The naïve Bayes classifier uses the assumption that features are mutually independent
– This is usually not true, as we have seen in the test data
Also, if each $x_i$ is a continuous variable, we either need to estimate the probability density, or we need to discretize the value into ranges
$$p(class \mid x_1, \dots, x_n) \propto p(class)\, p(x_1 \mid class) \cdots p(x_n \mid class)$$
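As a sketch of the factorized form above, the following NumPy code estimates a one-dimensional Gaussian density per feature and per class (one way to handle continuous $x_i$; discretizing into ranges is the other option mentioned). The data and function names are made up for illustration.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Per class: prior p(class), and per-feature mean and variance."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return model

def log_posterior(model, x):
    """log p(class) + sum_i log p(x_i | class), up to the shared evidence term."""
    scores = {}
    for c, (prior, mean, var) in model.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
        scores[c] = np.log(prior) + log_lik
    return scores

def predict(model, x):
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)

# Made-up training data: two features, two classes.
X = np.array([[1.0, 1.2], [0.8, 1.0], [1.2, 0.9],
              [3.0, 3.1], [3.2, 2.9], [2.9, 3.3]])
y = np.array([0, 0, 0, 1, 1, 1])
model = fit_gaussian_nb(X, y)
```

Working with log-probabilities turns the product over features into a sum, which avoids numerical underflow when there are many features.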
Decision Tree Classifier
An easy and popular learning algorithm: CART (Breiman et al., 1984)
Find the best feature f and the decision rule f > c to split the dataset into 2 subsets with more purity
Recursively find the best split on each subset
Of course, the key question is how to measure “purity”
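One common purity measure is the Gini impurity; the sketch below assumes it (the slides do not say which measure is used) and searches every feature f and threshold c for the split with the lowest weighted impurity. Data and names are illustrative.

```python
def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Return (weighted impurity, feature index f, threshold c) of the best rule f > c."""
    best = None
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            c = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= c]
            right = [y[i] for i in range(n) if X[i][f] > c]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, c)
    return best

# Made-up data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [1, 1, 2, 2]
split = best_split(X, y)
```

A full tree would apply `best_split` recursively to the two subsets until a stop criterion is met.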
CART Approach
Randomly select $m^{1/2}$ variables to be tried at each split node
Find the variable that splits the data the best (purity measure)
Stop criteria
1. The split has fully separated the subset
2. None of the variables can separate the subset any further
[Figure: example tree with split nodes x1>c1, x2>c2, x3<c3 and leaves labeled Class 1 or Class 2]
Support Vector Machine (SVM)
SVM tries to find an optimal weight vector (the “alpha” vector) for the samples
Ideally, many alpha values are zero
The samples with non-zero alpha are the “support vectors”
[Figure: feature matrix (features) with a column of Labels]
What Is An SVM Model Like?
Suppose we have a similarity function that measures the similarity between two sample vectors
$k(\vec{x}, \vec{x}_i)$ measures the similarity between two vectors
A simple classifier compares the average similarity to class +1 samples with the average similarity to class -1 samples
$$f(\vec{x}) = \frac{1}{N_+} \sum_{i:\, y_i = +1} k(\vec{x}, \vec{x}_i) - \frac{1}{N_-} \sum_{i:\, y_i = -1} k(\vec{x}, \vec{x}_i)$$
If we weight each sample differently
$$f(\vec{x}) = b + \sum_i \alpha_i\, k(\vec{x}, \vec{x}_i)$$
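The relation between the two classifiers can be checked numerically. The sketch below assumes an RBF similarity function and made-up data; it shows that the average-similarity classifier is the weighted form with one particular, hand-picked alpha vector. An actual SVM instead finds the alphas (many of them zero) by solving an optimization problem, which is not shown here.

```python
import numpy as np

def rbf_kernel(x, xi, gamma=1.0):
    """Similarity between two vectors: 1 when identical, near 0 when far apart."""
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def average_similarity_classifier(X, y, x, gamma=1.0):
    k = np.array([rbf_kernel(x, xi, gamma) for xi in X])
    return k[y == +1].mean() - k[y == -1].mean()

def weighted_classifier(X, alpha, b, x, gamma=1.0):
    k = np.array([rbf_kernel(x, xi, gamma) for xi in X])
    return b + np.sum(alpha * k)

# Made-up training samples.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([+1, +1, -1, -1])
x = np.array([0.5, 0.5])

f_avg = average_similarity_classifier(X, y, x)

# The same classifier in weighted form: alpha_i = y_i / (size of the class of sample i), b = 0.
alpha = np.where(y == +1, 1.0 / np.sum(y == +1), -1.0 / np.sum(y == -1))
f_weighted = weighted_classifier(X, alpha, 0.0, x)
```

A positive value of `f` assigns the sample to class +1, a negative value to class -1.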
Neural Networks
A neural network learning algorithm determines the best weights – most of them might be zero
Neuron: output is +1 if $w_1 x_1 + \dots + w_n x_n > \theta$; otherwise -1
[Figure: a network of neurons $f(\cdot)$ connecting Inputs to Outputs]
A Comparison of Classifiers
Algorithms are comparable on the 1st and 3rd examples
Performance on the 2nd example varies
In practical application, a more complex algorithm is not necessarily better
Results also largely depend on the “space” the data is projected onto
Source: http://scikit‐learn.org/stable/
A Common Application
The learning model tries to replace the expensive flow with the cheaper one
[Figure: (N+M) sample parts go through a complex and expensive test flow, which assigns N parts to Class 1 and M parts to Class 2. The same parts go through a much cheaper test flow involving K tests, giving a dataset of K-test vectors labeled +1 or -1. A learning model is learned from this dataset and then applied to parts in production to predict Class 1 or Class 2.]
Alternative Analog Test – A Classification Problem
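Dataset: m sample chips, each with n measurements $M_1, M_2, \dots, M_n$ and a Pass/Fail label
$$X = \begin{bmatrix} \vec{x}_1 \\ \vec{x}_2 \\ \vdots \\ \vec{x}_m \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
Chip under test: $\vec{x} = [x_1\; x_2\; \dots\; x_n]$. Pass/Fail?
These measurements are low-cost alternative tests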
Unsupervised
Basic Clustering Algorithms
Clustering largely depends on
– The space the samples are projected onto
– The definition of the concept “similarity”
Source: http://scikit‐learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
Clustering: K-Means
K-Means
– User gives the number of clusters k
– The algorithm iteratively tries to find the “best” k centroids
Mini Batch K-Means (for speed)
– Search is based on a subset of samples
– “Best” is evaluated based on all samples
Source: http://scikit-learn.org/stable/modules/clustering.html
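The iterative search for k centroids can be sketched with the standard assign-then-update loop. This is a simplified illustration with made-up data: the initial centroids are simply the first k samples (library implementations use smarter initialization), and the Mini Batch variant is not shown.

```python
import numpy as np

def k_means(X, k, n_iter=20):
    """Plain k-means: alternate between assigning samples and updating centroids."""
    centroids = X[:k].copy()        # simplistic initialization: the first k samples
    for _ in range(n_iter):
        # Distance of every sample to every centroid, shape (n_samples, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)   # assign each sample to its nearest centroid
        # Move each centroid to the mean of its samples (keep it if it has none).
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, labels

# Two made-up, well-separated groups of samples.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = k_means(X, k=2)
```

Here “best” means the centroids that minimize the total squared distance of samples to their assigned centroid; the loop only guarantees a local optimum, which is why the result can depend on the initialization.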
Transformation – Principal Component Analysis
Principal Component Analysis (PCA) – find directions where the data spread out with large variance
– 1st PC (PC1) – data spread out with the most variance
– 2nd PC (PC2) – data spread out with the 2nd most variance
– …
PCA is used for outlier analysis in test
[Figure: M samples s1 … sM, each a vector of N feature values (f1 … fN), e.g. s1 = [d11, d12, …, d1N] and sM = [dM1, dM2, …, dMN], are re-projected by PCA into a PCA space with axes PC1 and PC2, giving r1 … rM]
PCA for Outlier Analysis in Test
Each test is used to screen with a test limit
– Two tests essentially define a bounding box
Multivariate outliers are not screened by applying tests individually
[Figure: scatter plot of two tests; these outliers are not screened by the two tests individually]
Multivariate Outlier Analysis
Use PCA to re-project the data into a PCA space, then define the test limits in the PCA space
– Each PC becomes just another individual test
Also see Nik Sumikawa et al. (ITC 2012) – “Screening Customer Returns With Multivariate Test Analysis”
[Figure: the outliers fall outside the limits in the PCA space. This is what we desire; PCA helps achieve that]
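A synthetic example illustrates the idea. The data, the ±3 sigma limit and the variable names are assumptions made for this sketch; it is not the screening flow of the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
t1 = rng.normal(0.0, 1.0, 500)
t2 = t1 + rng.normal(0.0, 0.1, 500)        # two strongly correlated tests (synthetic)
X = np.column_stack([t1, t2])
outlier = np.array([1.5, -1.5])            # inside both individual test limits

limit = 3.0                                # assumed limit: +/- 3 standard deviations
mean = X.mean(axis=0)

# Screening with the two tests individually (the bounding box).
univariate_pass = bool(np.all(np.abs((outlier - mean) / X.std(axis=0)) < limit))

# PCA via eigen-decomposition of the covariance matrix.
cov = np.cov(X - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # reorder so PC1 has the largest variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Re-project the part into the PCA space and scale each PC by its standard deviation.
z_outlier = (outlier - mean) @ eigvecs / np.sqrt(eigvals)
pca_flag = bool(np.any(np.abs(z_outlier) > limit))
```

The part passes both tests individually, yet it violates the strong correlation between them; along PC2, where the population has very little spread, it sits many standard deviations away and is flagged.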
Main Points to Keep In Mind
There are categories of machine learning problems
Many algorithms for each category
An algorithm usually requires user-input parameters
– Their values are often determined with cross-validation
For a practitioner
– 1. Formulate the problem as a machine learning problem
– 2. Pick an algorithm
– 3. Experimentally select the parameters to build the model
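Step 3 can be sketched as a k-fold cross-validation loop. The example below selects the number of neighbors for a small nearest-neighbor classifier on synthetic data; the data, the candidate values and the function names are assumptions for illustration only.

```python
import numpy as np

def knn_classify(X_train, y_train, x, k):
    """Majority vote (labels are +1/-1) among the k nearest training samples."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    return 1 if nearest.sum() >= 0 else -1

def cv_accuracy(X, y, k, n_folds=5):
    """Hold out each fold in turn, train on the rest, and count correct predictions."""
    folds = np.array_split(np.arange(len(X)), n_folds)
    correct = 0
    for test_idx in folds:
        train = np.ones(len(X), dtype=bool)
        train[test_idx] = False
        for i in test_idx:
            correct += int(knn_classify(X[train], y[train], X[i], k) == y[i])
    return correct / len(X)

# Synthetic two-class data, shuffled so every fold contains both classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(+1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

candidates = [1, 3, 5, 7, 9]               # odd values avoid ties in the vote
scores = {k: cv_accuracy(X, y, k) for k in candidates}
best_k = max(scores, key=scores.get)
```

The parameter value with the best held-out accuracy is then used to build the final model on all of the training data.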
But, it is not just about the ML tool …
An Application Example – Fmax Prediction
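Dataset: m sample chips, each with n low-cost delay measurements $M_1, M_2, \dots, M_n$ and its Fmax
$$X = \begin{bmatrix} \vec{x}_1 \\ \vec{x}_2 \\ \vdots \\ \vec{x}_m \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
A new chip c: $\vec{x} = [x_1\; x_2\; \dots\; x_n]$. What is the Fmax of c?
Delay measurements can be
– FF based, pattern based, path based, or RO based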
Example Algorithms For Regression
See Janine Chen et al. (ITC 2009) – “Data Learning Techniques and Methodology for Fmax Prediction”
Methods compared
– LSF method (linear model, over-fitting the training dataset)
– RG method (linear model, provides a way to avoid the over-fitting)
– K-NN method (distance-based, over-fitting the training dataset)
– SVR method (distance-based, uses kernel k( ) to calculate the distance, provides a way to avoid the over-fitting)
– GP method (Bayesian version of the SVR method with the ability to estimate the prediction confidence)
How the methods relate
– LSF → RG and K-NN → SVR: improve on the over-fitting issue
– RG → SVR: replace the linear model with a model in the form of a linear combination of kernel basis functions
– SVR → GP: combined with Bayesian inference
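The step from a plain least-squares fit to a regularized one can be illustrated on synthetic data. This sketch assumes that LSF denotes an ordinary least-squares fit and that RG denotes a ridge-style regularized fit; the data, the penalty value `lam` and the function names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 10                               # few samples relative to features
X = rng.normal(size=(m, n))
w_true = np.zeros(n)
w_true[:2] = [2.0, -1.0]                    # only two features really matter
y = X @ w_true + rng.normal(scale=0.5, size=m)

def least_squares_fit(X, y):
    """Minimize ||Xw - y||^2 with no constraint on w."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def ridge_fit(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 (closed-form solution)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_lsf = least_squares_fit(X, y)
w_rg = ridge_fit(X, y, lam=5.0)

train_err_lsf = float(np.sum((X @ w_lsf - y) ** 2))
train_err_rg = float(np.sum((X @ w_rg - y) ** 2))
```

The least-squares fit always has the smaller training error, but it gets there by fitting the noise as well; the penalty term shrinks the weights and trades a little training error for a less over-fitted model. The penalty strength is one of the parameters typically chosen by cross-validation.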
GP Was The Best! (Conformal Check)
See Janine Chen et al. (ITC 2009) – “Data Learning Techniques and Methodology for Fmax Prediction”
[Figure: predicted Fmax vs. actual Fmax (frequency) for 45 conformal samples, sorted by predicted Fmax, with the 99.7% confidence band and the 95.4% confidence band (dotted line)]
System Fmax Prediction
Structural Fmax on 1443 FFs were measured
5/6 of the samples were used to build a predictive model for system Fmax
The model was applied on the remaining 1/6 of the samples
See Janine Chen et al. (ASP-DAC 2010) – “Correlating System Test Fmax with Structural Test Fmax and Process Monitoring Measurements”
[Figure: Sys Fmax vs. predicted Sys Fmax, correlation = 0.98; histograms (count) of Fmax frequency for lot1 (79 cores) and lot2 (74 cores)]
Correlation        Lot1   Lot2
Predictive model   0.98   0.87
Best single FF     0.83   0.55
A Barrier for Deployment
Can’t deploy a model without having a consistent set of features across all lots
[Figure: accuracy (0 to 1) vs. # of features (1443, 722, 316, 181, 91, 46, 23, 12) for Lot1 and Lot2, annotated “Selecting 23 features gives the best result = 0.98” and “Selecting 46 features gives the best result = 0.80”; the selected FF numbers are shown for Lot1 and Lot2]
In the 2nd stage, it was “Methodology Centric” – What should be the methodology to enable deployment of an ML-based solution/model?
Design‐Silicon Timing Correlation
Question to answer: Why do Static Timing Analyzer (STA) and silicon path delays disagree?
Design‐Feature‐Based Analytics Framework
Answer: Those features cause the mismatch
[Figure: framework flow. The design database (Verilog netlist, timing report, cell models, LEF/DEF, switching activity, SI model, temperature map, power analysis) feeds path selection; the selected paths are encoded with design features into path vectors (dataset). ATPG produces tests and test data, which together with test pattern simulation give the path data. Learning on the path vectors and path data outputs features.]
Preparation – Feature Generation
Feature types: Cell-based, Transistor-based, Interconnect-based
Features are potential sources of uncertainty on a path
Feature Generation
Dynamic effects
– Coupling (on a victim net switching between 0 and 1)
– Multiple Input Switching
Temperature (Eli Chiprout, Intel)
Power noise (Eli Chiprout, Intel)
Location-based (X, Y)
Binary Classification – Tree Learning
Design features extracted from STA timing reports and GDSII
Tree model: There are > 14 single vias between layers 4/5 and > 70 double vias between layers 5/6
One can validate a tree model by visualizing the colored scatter plot
[Figure: validation of the tree model. Scatter plot of normalized measured slack vs. normalized expected slack, colored by single via count on layers 4/5 of the clock path (scale 0 to 20)]
Another Example (Joint work with AMD)
Fifteen 4-core processor parts
480 critical paths from AC delay tests (shown on top 4 freq. steps)
Many are STA non-critical (Slack > x)
See Janine Chen et al. (ITC 2010) – “Mining AC Delay Measurements for Understanding Speed-limiting Paths”
[Figure: path count (0 to 350) at frequency Step 1 to Step 4]
Very Unbalanced Dataset (M >> k)
There are 12,248 STA-critical paths activated by patterns
– with slacks <= x
– They do not show up as critical paths in the top 4 frequencies
For 480 failing FFs from the top 4 frequency steps
– 158 silicon critical but STA non-critical paths
Question: What is unique about the 158 paths?
– Use the 12,248 silicon non-critical paths as the basis for comparison
[Figure: 12,248 silicon non-critical paths vs. 158 silicon critical paths]
Feature Creation
92 path features
– Basic information: for example, whether the path is a half-cycle path, transition directions on N/PMOS, etc.
– Timing statistics: delay information from the STA timing report.
– Usage of Vt devices: counts of various Vt devices used on the path, i.e. high, low, regular, etc.
– Cell type: features of the usage of various important cell types.
– RC related: features capturing the load information on the path.
– MUX related: special features focusing on the usage of some MUX cells.
– Location: describing the location of the path.
16 features correlate to others
Potential Features as Causes
Multiple rules may be reflecting the same cause
– Rules #1, 2, 4, 5 all involve the CMAC feature
– They actually reflect the same cause (later confirmed)
Finding: The methodology is practical – it found something missed by the designer
– If one is willing to pay the price for developing the features
The Need for a Domain Expert
A domain expert won’t accept a solution if he/she can’t see the value, or doesn’t understand it
– Interpretable and actionable model
– Added value to their existing solutions already in place
Let the methodology start with an expert, by
– Asking for a set of “reasonable” features
– Collecting sufficient data for learning feature importance
But …
– If the engineer knows what features are relevant, why even apply so-called “Machine Learning”?
– If they don’t know, how much data is needed?
– If collecting the data is hard, will it ever get done?
– If it is too costly, what’s the added value?
Modern ML is about learning the features
In the 3rd stage, “Application Driven” – Which application has a better chance to succeed with an ML solution?
One Application‐ Selective Burn‐In
Case Study 1
Product
– Analog and sensor product
Fail characteristics
– Fails were verified – we knew the root causes
Objective
– Identify pre-burn-in outlier models to capture them
Test flow: Wafer Test, Final Test (H/C), Burn-In, Final Test (H/A)
7 verified fails with FA reports
Case Study 1 – 2‐test models
All 7 models were verified with test and design knowledge
– The tests used in the model were testing the failing site
The study suggests that pre-burn-in test “signatures” exist for burn-in fails
[Figure panels: 2-test models for Fail #1, Fail #2, Fail #4, Fail #7]
Case Study 1 ‐ Summary
To capture all 7 fails, only < 5% of the population requires burn-in (Is this good news?)
Fail #   Fail Site    Outlier Model   Kill %
1        Gate Oxide   2-test model    1%
2        Polysilicon  2-test model    0.44%
3        Gate Oxide   Univariate      1.28%
4        Polysilicon  2-test model    0.01%
5        Gate Oxide   Univariate      0.23%
6        Gate Oxide   Univariate      0.48%
7        Gate Oxide   2-test model    1.24%
Accumulated Kill %                    4.54%
Case Study 2
Product
– Automotive microcontroller
Fail characteristics
– Some fails had FA reports, but most did not
Objective
– Identify outlier models to capture them before burn-in
Test flow: Wafer Test, Burn-In, Final Test (H/A)
48 fails collected
Category   Type   # of fails
A          1      8
A          2      3
A          3      9
A          4      2
B          1      15
C          1      11
Total             48
Case Study 2
Models are not selected based on only outlying properties
– The tests in use are testing the block containing the fails
– The model is shared by most of the fails
Difficulties
– Fails given the same type are not necessarily the same case
– Not sure if all 48 fails were true burn-in fails
– If we couldn’t find a good screen, what did we tell the product team? We need supporting evidence to show that the part was not screenable with the tests
Category  Type  # of fails  Model           # of fails screened  Kill %  Kill % to include the next fail
A         1     8           2-test model X  6                    7.3%    24.9%
A         2     3           2-test model X  2                    6.7%    16.8%
A         3     9           2-test model X  5                    7.8%    11.4%
A         4     2           2-test model X  2                    6.9%    -
B         1     15          Univariate      8                    4.1%    9%
C         1     11          Univariate      7                    7.5%    27.9%
Total           48          3 models        30 fails
Case Study 2
New tests became available later
– They allow better models to be built
Case Study 2 – Additional Models
As new tests became available, more models could be found
– So this is not really an outlier modeling problem, is it?
The same model from the previous slide projects many other fails as outliers
Another model projects multiple fails as outliers
Burn‐In Fail Prediction – Is It Possible?
Consider a target test T with many post-burn-in fails
– Observe a significant change in the measured value of T between pre- and post-burn-in
Only 2 fails were marginal in both the pre- and post-burn-in stages
Without the stress, how can one screen those fails?
– We need alternative stress
[Figure: post-burn-in vs. pre-burn-in measured value of test T]
Four Key Considerations
In this picture, I did not mention a “learning algorithm” because it was not as decisive a factor as these four for realizing a practical methodology for an application
Low-Hanging Fruits
– Domain Knowledge Availability
– Data Availability
– Learning Result Utilization
– Added Value to Existing Flow
Applying ML in Design/Test (Since 2013)
In the 4th stage, it was all about the question of “Added Value?”
Resolving A Production Yield Issue (2013)
An automotive SoC product
Yield fluctuated over time
The issue could not be resolved for months after several design and test revisions, and several process tuning recipes
ITC 2014 paper: “Yield Optimization Using Advanced Statistical Correlation Methods”
[Figure (for illustration): the problem, yield fluctuating across lots in time]
Objective: Discover Process Adjustments
Task: Discover strong correlations between a test fallout and a process parameter
Desire: Adjustment of the process parameter leads to improved yield and reduced yield fluctuation
[Figure: yield density distribution, estimated based on 2000+ wafers, and the desired outcome with yield optimization (what we want)]
Basic Question
The correlation can exist in many different ways
– Statistical correlation or just an association
– Univariate or multivariate relationship
With many variables, this essentially is a search problem
[Figure: average # of fails, wafer to wafer, and some variable (factor), wafer to wafer. Do they correlate?]
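In its simplest univariate form, the search can be sketched as ranking candidate variables by the strength of their Pearson correlation with the fail count. The data below are synthetic and the parameter names are placeholders; a real search would also cover the multivariate and conditional cases discussed in this part of the tutorial.

```python
import numpy as np

rng = np.random.default_rng(1)
n_wafers = 200

# Synthetic per-wafer process parameters (placeholder names).
params = {f"PP{i}": rng.normal(size=n_wafers) for i in range(1, 6)}
# Synthetic per-wafer fail count, constructed to depend on PP3 only.
fails = 10.0 - 3.0 * params["PP3"] + rng.normal(scale=1.0, size=n_wafers)

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Rank every candidate variable by |correlation| with the fail count.
ranking = sorted(
    ((abs(pearson(values, fails)), name) for name, values in params.items()),
    reverse=True,
)
best_corr, best_name = ranking[0]
```

With many candidate variables, some will correlate by chance, so a high-ranking variable is a starting point for engineering review rather than a conclusion.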
Initial Unsuccessful Search
The best results from searching a correlation between the number of fails and a process parameter are not enough to support a silicon design‐of‐experiment
[Figure: # of fails vs. process parameter. Bin 26: Corr = 0.463; Test A: Corr = 0.456]
Need To Consider More Variables
The correlation may exist only in a certain region
Also, process parameters are measured at different locations on a wafer, and their values can differ
[Figure: wafer maps for Test A and Test D; site locations (Site 1 to Site 5)]
Consider More Variables
The left shows a correlation found only by considering fails with a specific test value
The right shows a correlation found only with the variance of a test distribution
[Figure: X4 category fails vs. parameter PP1 (correlation = -0.766); variance vs. parameter PP1 (correlation = -0.75)]
Consider Temporal Effect
Temporal variations can mask a correlation
By unmasking the temporal effect, a high correlation is found
[Figure: # of X1–X3 types of fails vs. PP5 parameter. Correlation without separation = 0.63; after separation, Corr = 0.75 and Corr = 0.86]
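The masking effect can be reproduced with synthetic data. In this sketch, the same relationship between a parameter and the fail count holds in two time periods, but the baseline fail level shifts between them; the numbers are made up and unrelated to the production data in the figure.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_period(offset, n=100):
    """One time period: fails depend on the parameter plus a period-specific baseline."""
    pp = rng.normal(size=n)
    fails = offset + 2.0 * pp + rng.normal(scale=1.0, size=n)
    return pp, fails

pp_a, fails_a = make_period(offset=0.0)
pp_b, fails_b = make_period(offset=6.0)    # baseline shifted in the later period

# Correlation within each period (after separation).
corr_a = float(np.corrcoef(pp_a, fails_a)[0, 1])
corr_b = float(np.corrcoef(pp_b, fails_b)[0, 1])

# Correlation on the pooled data (without separation).
corr_all = float(np.corrcoef(np.concatenate([pp_a, pp_b]),
                             np.concatenate([fails_a, fails_b]))[0, 1])
```

The baseline shift adds variance to the fail count that the parameter cannot explain, so the pooled correlation is noticeably lower than the correlation within either period.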
Risk Evaluation – “No Correlation”
Before silicon experiment, we had to ensure a recommended process change does not affect other test fallouts– We don’t want to solve one issue and cause anotherOur method: ensuring statistical independence between a process change and each of the other test fallout– Implementation is complicated – see paper for detail
[Figure: PP1 risk evaluation. Risk (0 to 1) for Bin 31; mean test value and test limits versus PP1 avg. value]
Risk inspection and containment
Silicon Result
Found several recommended process changes
– 3 process changes based on 5 process parameters
– 1 change applied to all experimental lots
Result: Significant yield improvement and reduction of the yield fluctuation
[Figure: yield density before the changes; yield for Before, ADJ #1, ADJ #2, and Both]
Finally, we observed “added value!”
But not so fast …
In the yield example, we and the product team had access to the same set of ML tools
So why did we succeed when they did not?
Because we had the knowledge that enabled us to conduct a more effective analytic process when applying the ML tools
It was that piece of knowledge that made the difference, not the tools in use
The True Value
To deploy a solution, I can’t just package the ML tools and give them to the product team
I needed to package my “knowledge” – How am I going to do that?
Between 2014 and 2016, we worked on several more production lines in the context of yield optimization
(Unpublished, all internal)
We Need A System View To Apply ML (Since 2016)
So in short, why the system view?
Because we need domain knowledge
[Diagram: Domain Expert overseeing Input Preparation, Tool Invocation, and Result Evaluation]
Need Automation of all three components
We Need A System View To Apply ML
Why do we need domain knowledge?
Mainly, because we have limited data
We Need A System View To Apply ML
What is so special about applying ML in view of “limited data?”
“Learning from Limited Data in VLSI CAD” – an upcoming book chapter ‐ preview at our lab web site: https://iea.ece.ucsb.edu/
Because ML rests on theoretical assumptions, and with limited data those assumptions are hard to meet in practice → we need domain knowledge to compensate
What Theoretical Assumptions for Machine Learning?
Classification, Machine Learning, Pattern Recognition
In machine learning, the Perceptron is widely considered one of the earliest examples showing that a machine can actually “learn”
SVM is based on statistical learning theory, which provides the necessary and sufficient conditions under which a machine is guaranteed to “learn”
Perceptron (1958 Rosenblatt – 2‐level neural network)
Back propagation (1975 Werbos – NN with hidden layer)
Kernel trick (1964 Aizerman et al.)
Support Vector Machine (1995 Vapnik et al.)
Gaussian Process for Regression (1996 Williams&Rasmussen)
Gaussian Process for Classification (1998 Williams&Barber)
SVM one‐class (1999 Scholkopf et al.)
Decision tree learning (1986 ID3)
Rule learning (1989 CN2)
Rule learning (1993 C4.5)
Random Forests (2001 Breiman)
Rule learning (2002 CN2‐Subgroup Discovery)
Decision tree learning (1984 CART)
Deep Learning (2012)
A Popular Dataset For Machine Learning Research
One of the most popular datasets used in ML research was the USPS dataset for hand‐written postal code recognition
– e.g. When SVM was introduced, it substantially outperformed others based on this dataset
Question: What is the difference between this problem and yours?
Source: Hastie, et al. “The Elements of Statistical Learning” 2nd edition 2008 (a very good introductory book)
Binary Classification
There are subspaces that are easy to classify (all algorithms agree)
One algorithm differs from another in how each partitions the subspace in the “grey area”
– What’s the “best way” to define the “orange‐blue” boundary?
Source: Hastie, et al. “The Elements of Statistical Learning” 2nd edition 2008 (a very good introductory book)
[Figure: orange space, blue space, and the grey area between them]
Model Complexity
You can always find a model that perfectly classifies the two classes of training samples (middle picture – based on nearest neighbor strategy)
– The model is usually complex
However, this may not be what you want
– Because your model is fit too closely to the training data (over‐fitting)
Source: Hastie, et al. “The Elements of Statistical Learning” 2nd edition 2008 (very good introduction book)
[Figure: three decision boundaries: complex with rough edge; complex and fragmented (nearest neighbor model); smooth]
Model Complexity Vs. Prediction Error
In learning, an algorithm tries to explore this tradeoff to avoid over‐fitting
There are two fundamental approaches
– Fixing a model complexity
• Find the best‐fit model to the training data
• e.g. Neural Network, equation‐based models
– Fixing a training error (say almost 0)
• Find the lowest‐complexity model (given ALL possible functional choices in a space)
• e.g. SVM
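The first approach (fix a complexity, then find the best fit) can be sketched with polynomial regression, where the degree plays the role of model complexity. This is a stand-in illustration chosen for brevity, not the models used in the tutorial; the target function, noise level, and sample sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a smooth target function (hypothetical)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + 0.3 * rng.normal(size=n)
    return x, y

x_train, y_train = make_data(30)
x_valid, y_valid = make_data(300)

def mse(coeffs, x, y):
    """Mean squared error of a polynomial model on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

degrees = [1, 3, 9]
train_err, valid_err = [], []
for d in degrees:
    # Fix the complexity (degree d), then find the best least-squares fit
    coeffs = np.polyfit(x_train, y_train, d)
    train_err.append(mse(coeffs, x_train, y_train))
    valid_err.append(mse(coeffs, x_valid, y_valid))
    print(f"degree {d}: train MSE = {train_err[-1]:.3f}, "
          f"validation MSE = {valid_err[-1]:.3f}")
```

Training error cannot increase as the degree grows because the model classes are nested; whether the validation error turns upward depends on the data, which is what the tradeoff is about.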
[Figure: prediction error versus model complexity (low to high), showing error on the training samples, error on the validation samples, and the over‐fitting region]
Neural Network (Fixed Complexity)
Source: Hastie, et al. “The Elements of Statistical Learning” 2nd edition
K classes:
$$Y_1 = b_{11} Z_1 + b_{12} Z_2 + \cdots + b_{1M} Z_M + b_{10}$$
M hidden variables:
$$Z_1 = \sigma(a_{11} X_1 + a_{12} X_2 + \cdots + a_{1P} X_P + a_{10})$$
$$\sigma(x) = \frac{1}{1 + \exp(-x)}$$
A neural network’s model complexity is fixed by fixing the number of Z variables
Learning is by finding the best‐fit values for the parameters
– (M+1)K parameters
– (P+1)M parameters
e.g. Use the back propagation algorithm (1975 Werbos)
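The two parameter counts can be checked with a small forward pass. The sketch below assumes P inputs, M hidden variables, and K outputs as on the slide; the helper names `init_params` and `forward` and the random weights are hypothetical, and no training is performed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(P, M, K, rng):
    """Random weights; the last column of each matrix holds the bias terms."""
    A = rng.normal(size=(M, P + 1))  # hidden layer: (P+1)M parameters
    B = rng.normal(size=(K, M + 1))  # output layer: (M+1)K parameters
    return A, B

def forward(X, A, B):
    """One hidden layer: Z = sigma(A X + a0), Y = B Z + b0."""
    Z = sigmoid(A[:, :-1] @ X + A[:, -1])  # M hidden variables
    return B[:, :-1] @ Z + B[:, -1]        # K outputs

P, M, K = 4, 3, 2
rng = np.random.default_rng(0)
A, B = init_params(P, M, K, rng)
Y = forward(rng.normal(size=P), A, B)

n_params = A.size + B.size
print(n_params, (P + 1) * M + (M + 1) * K)
```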
What Causes Over‐Fitting Anyway?
Five Assumptions for Supervised Learning
A restriction on H (otherwise, NFL)
An assumption on D (i.e. not time‐varied, e.g. silicon data)
Assuming size m is in order O(poly(n)), n: # of features
Making sure a practical algorithm L exists
Assuming a way to measure error, e.g. Err(f(x), h(x))
[Diagram: Sample Generator G draws samples according to D; Function y = f(x) labels them, giving m samples (x, y); Learning Algorithm L searches Hypothesis Space H and outputs Hypothesis h, which is compared against f. Labels (1) to (5) mark the five assumptions]
In Practice, Issue #1
Because we don’t know how complex H should be, we assume the most complex H we can afford in training
In Practice, Issue #2
For a complex H we need a large amount of data, but we usually don’t know if we have enough in advance
Occam’s Razor Assumption
Hypothesis space: e.g. all possible assignments of weight values in a neural network (can be infinite)
Occam’s Razor: Find the “simplest” hypothesis that fits the data
– Hence, many machine learning algorithms solve a constrained minimization problem
Conceptually, Occam’s Razor means making a “smooth” assumption in the hypothesis space
[Figure: space of all hypotheses]
Data is used to filter out some hypotheses
For the remaining, find the “simplest” hypothesis as the answer
“Smooth” Assumption
ML would say the color of the area is blue
NFL would say “No, it can be any color”
[Figure: an uncolored area marked “?”]
Another Example of “Smooth” Assumption
Occam’s razor would give you a simpler model and conclude “It is red”
NFL would say “can be either green or red”
– See IEEE TCAD 36(6) 2017 “Experience of Data Analytics in EDA and Test”
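The contrast between the two views can be made concrete by brute force on a three-variable Boolean cube. The vertex colors below are hypothetical (the actual coloring in the figure is not recoverable here): seven vertices are labeled by the rule "red when x = 1", and the eighth is left unknown. Enumerating every possible coloring shows what the data alone can and cannot decide.

```python
from itertools import product

vertices = list(product((0, 1), repeat=3))  # the 8 corners (x, y, z)
unknown = (1, 1, 1)                         # hypothetical position of "?"

# Hypothetical training labels: red when x == 1, green otherwise
known = {v: ("red" if v[0] == 1 else "green")
         for v in vertices if v != unknown}

# Enumerate every possible coloring of the cube (2**8 = 256 hypotheses)
consistent = []
for colors in product(("green", "red"), repeat=len(vertices)):
    h = dict(zip(vertices, colors))
    if all(h[v] == c for v, c in known.items()):
        consistent.append(h)

# Colors that the surviving hypotheses assign to the unknown vertex
answers = sorted({h[unknown] for h in consistent})
print(len(consistent), answers)

# The one-variable rule "red iff x == 1" is one of the survivors
simple_rule = {v: ("red" if v[0] == 1 else "green") for v in vertices}
print(simple_rule in consistent, simple_rule[unknown])
```

Both colors remain possible for the unknown vertex once the data has filtered the hypotheses, which is the NFL view; preferring the one-variable rule over the alternative is the extra "simplest hypothesis" assumption.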
[Figure: Boolean cube over x, y, z with vertices 000 to 111, each colored green or red; one vertex is marked “?”]
In Practice, Issue #3
Because non‐convex optimization is hard, some heuristic is used, and the solution is often a local minimum
In Practice, Issue #3
NP‐Hard
– The problem to find the desired model is NP‐Hard
– This is proved by fixing the representation
Crypto‐Hard
– The problem is as hard as breaking a crypto function
– This is proved by not fixing the representation
Learning Complexity Hierarchy
[Diagram: nested regions. All Possible Function Classes (No Free Lunch); Poly VC‐D (Learnable); Efficiently Learnable. NP‐Hard examples: k‐term DNF formula, poly‐size neural net. Crypto‐Hard examples: poly‐size circuit, depth‐2 circuit]
But, ML Is So Successful, Isn’t It?
Speech recognition
Language translation
Computer vision
Autonomous vehicle
…
If learning is so difficult, how could they be so successful?
The Common Approach To Succeed in ML
[Diagram: the learning complexity hierarchy (No Free Lunch, All Possible Functions, Poly VC‐D, NP‐Hard, Crypto‐Hard, Learnable), annotated as follows]
Assume the largest deep neural network affordable
Demand huge data for training!
Demand special HW for speed!
Assume your target is here
Well‐Known Issue With A NN Model
Adversarial Examples – A slightly perturbed input that causes the model to misclassify
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus, “Intriguing properties of neural networks,” arXiv:1312.6199v4, 2013
– Google scholar: >1K citations already
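Why a slight perturbation can flip a decision is easiest to see on a linear classifier. The sketch below is a swapped-in stand-in, not the method of the cited paper and not a neural network: it moves every input feature by the same small step against the sign of the weights, so many tiny changes add up in the score. The dimension, weights, and input are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 784                            # e.g. a flattened 28x28 image (hypothetical)
w = rng.normal(size=d)             # weights of a stand-in linear classifier
b = 0.0
x = rng.uniform(0.0, 1.0, size=d)  # a stand-in input

def score(v):
    return float(w @ v + b)

def predict(v):
    return 1 if score(v) > 0 else 0

# For a linear model the gradient of the score w.r.t. the input is w,
# so stepping each feature by eps against sign(w) shifts the score
# by eps * sum(|w|). Choose eps just large enough to cross the boundary.
s = score(x)
eps = 1.1 * abs(s) / np.sum(np.abs(w))
x_adv = x - np.sign(s) * eps * np.sign(w)

print(f"per-feature change: {eps:.4f}")
print(f"prediction before: {predict(x)}, after: {predict(x_adv)}")
```

The per-feature change shrinks as the input dimension grows, which is one intuition for why such inputs are easy to find in high-dimensional models.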
Adversarial Examples Are Easy To Find
MNIST Dataset, Source: Xiaowei Huang, Marta Kwiatkowska, Sen Wang, Min Wu “Safety Verification of Deep Neural Networks,” arXiv:1610.06940v3 , 5 May 2017 latest version
Current Findings with Adversarial Examples
Finding adversarial examples is easy; but defending against adversarial examples is hard
Adversarial training can improve performance; and ensemble learning helps
Adversarial examples might not be caused by “overfitting”; they are more likely caused by out‐of‐domain inputs
But when can we say we “verify” a CNN model?
In Summary …
In Summary, Four Barriers To Consider …
A result after considering those 4 barriers
– Data barrier
– Theoretical barrier
– Computational barrier
– Deployment barrier (over an existing solution)
The system is largely domain‐knowledge‐driven
The Yield Context
[Diagram: Expert with Input Preparation, Tool Invocation, and Result Evaluation, annotated as follows]
1. What ML tools are useful and required in yield engineering (ITC 2014)
2. Learning the process by which a yield expert applies those ML tools (VTS 2017)
3. GAN‐based result recognizer (ITC 2018)
The AI System
The core of this AI system view is the autonomous execution of the workflow
[Diagram: The AI System (Input Preparation, Tool Invocation, Result Evaluation) alongside the Speech Recognition and NLP Component]
Integration ⇒ Autonomous Execution
We Need A System View To Apply ML
[Diagram: Workflow spanning Input Preparation, Tool Invocation, and Result Evaluation]
Need Automation of all three components
IEA Autonomous System
– Executable Workflow
– Concept Recognizers
– Data processing and analysis tools
Recall: The Analogy
Data Tools: Collect all relevant “data”
Concept Recognition: What the “data” mean
Workflow: What to do next
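The division of labor among the three parts can be sketched in a few lines. This is a toy illustration under stated assumptions, not the IEA implementation: the function names, the correlation threshold, and the synthetic data are all hypothetical.

```python
import numpy as np

def data_tool(fails, parameter):
    """Data tool: collect a relevant number (here, a correlation)."""
    return float(np.corrcoef(fails, parameter)[0, 1])

def concept_recognizer(corr, threshold=0.7):
    """Concept recognizer: decide what the number means (threshold assumed)."""
    return "strong_correlation" if abs(corr) >= threshold else "no_correlation"

def workflow(fails, candidates):
    """Workflow: decide what to do next after each recognized result."""
    log = []
    for name, values in candidates.items():
        concept = concept_recognizer(data_tool(fails, values))
        log.append((name, concept))
        if concept == "strong_correlation":
            return name, log   # stop searching and report the finding
    return None, log           # nothing found: hand back to the expert

# Synthetic example (hypothetical): only PP3 is related to the fails
rng = np.random.default_rng(0)
candidates = {f"PP{i}": rng.normal(size=200) for i in range(1, 4)}
fails = 3.0 * candidates["PP3"] + rng.normal(size=200)

found, log = workflow(fails, candidates)
print(found, log)
```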
[Diagram: the self‐driving analogy. Sensing, Perception, and Reasoning & Control alongside the IEA System layers: Data processing and analysis tools, Concept Recognizers, and Executable Workflow]
“Intelligence” In IEA
The language interface is mostly used for queries of results after the IEA autonomous execution is completed
[Diagram labels: Natural Language Interface; Autonomous System; Person; Manifestation; Level 0, Level 1, Level ∞; IEA]
Final Remarks ‐ Recent Trends in ML
Noticeable ML Applications In Recent Years
[Images: Self‐Driving Car; Mobile Google Translation; Smart Robot; AlphaGo (Google)]
*These images are found in public domain
Deep Learning for Image Recognition
ImageNet: Large Scale Visual Recognition Challenge (http://www.image‐net.org/challenges/LSVRC/) – 1000 Object Classes, 1.4M Images
Top‐5 error rate:
Year    Model                 Top‐5 error
2010                          28.20%
2011                          25.80%
2012    8‐Layer AlexNet       16.40%
2013    8‐Layer ZFNet         11.70%
2014    19‐Layer VGG          7.30%
2014    22‐Layer GoogleNet    6.70%
2015    152‐Layer ResNet      3.57%
Human                         5.10%
2016 CUImage: 269 Layers
1st Enabler: The availability of a large dataset to enable the study of deeper neural networks
2nd Enabler: The availability of efficient hardware to enable training with such a large neural network
NN for Unsupervised Learning
GANs were one of the hottest topics in 2017
Ian J. Goodfellow, Jean Pouget‐Abadie, Mehdi Mirza, Bing Xu, David Warde‐Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, “Generative Adversarial Networks,” arXiv:1406.2661v1, 2014
Introductory site: https://deeplearning4j.org/generative‐adversarial‐network
Take a look at the rich applications of GANs: https://github.com/hindupuravinash/the‐gan‐zoo
Tips and tricks to implement GANs: https://github.com/soumith/ganhacks
A Recent DNN Study
Size and power efficiency are concerns
Source: Alfredo Canziani, Adam Paszke, Eugenio Culurciello, “An Analysis Of Deep Neural Network Models For Practical Applications,” arXiv:1605.07678v4 [cs.CV] 14 Apr 2017
Learning Hardware ‐ A Very Good Tutorial
Dr. Song Han, MIT (previously at Stanford University)
– https://stanford.edu/~songhan/talks.html
https://www.youtube.com/watch?v=Q0b‐CkejWcc&feature=youtu.be
Improvement on DL Models
Deep Learning models continue to “train more and get better!”
Task                  Model          Compute       Error
Image Recognition     ‘12 AlexNet    1.4 GFLOP     ∼16%
Image Recognition     ‘15 ResNet     22.6 GFLOP    ∼3.5%
Speech Recognition    ‘14 DeepSp1    80 GFLOP      ∼8%
Speech Recognition    ‘15 DeepSp2    465 GFLOP     ∼5%
Image Recognition: 16X improvement; Speech Recognition: 10X improvement
Source: S. Han’s tutorial at FPGA 17
Three Issues With Increasing DNN Model Size
Training speed
– ResNet152 (Microsoft) – 1.5 weeks
– This turn‐around time limits model development
Energy efficiency
– AlphaGo (Google) – 1920 CPUs + 280 GPUs
– Huge electric bill for each run
Mobile application
– Model downloaded to mobile device (size limitation)
– Running the model needs speed and low power
In the future, it is expected that a brain‐scale system might integrate $10^{11}$ neurons and $10^{15}$ weight values
Hardware for Training ‐ Examples
CPU
– Intel Knights Landing (2016)
• 7 TFLOPS FP32, 16 GB MCDRAM (400 GB/s), 14nm
GPU
– Nvidia PASCAL GP100 (2016)
• 10 TFLOPS FP32, 16 GB HBM (750 GB/s), 16nm
– Nvidia Volta GV100 (2017)
• 15 TFLOPS FP32, 120 Tensor TFLOPS, 16 GB HBM2 (900 GB/s), 12nm, 21B transistors
• Matrix performance improves by >9 times
TPU
– Google Cloud TPU
• 180 TFLOPs in training
• “TPU pod” comprising 64 TPUs for 11.5 PetaFLOPs performance
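The pod figure follows from the per‐TPU figure quoted above, assuming performance simply adds across the 64 units:

```latex
64 \times 180\ \text{TFLOPs} = 11{,}520\ \text{TFLOPs} \approx 11.5\ \text{PetaFLOPs}
```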
Hardware for Inference ‐ Examples
Heavily researched area
– A‐eye (FPGA 2011), Diannao (2014), MIT Eyeriss (2016), Stanford EIE (2016), Google TPU, many start‐ups…
A common goal is to minimize memory access
– To improve performance and energy efficiency
Most of the practical works are based on novel microarchitecture design
– They do not necessarily pose new test problems at the component level
Example: Google TPU Architecture
Source: “In‐Datacenter Performance Analysis of a Tensor Processing Unit™,” 2017 International Symposium on Computer Architecture
This does not necessarily pose fundamentally new test problems, except that there are more memory elements distributed among the computation units
New Devices Under R&D
Source: C. D. Schuman, et al. “A Survey of Neuromorphic Computing and Neural Networks in Hardware” May 19, 2017
Why Memory Resistor (Memristor)?
It is believed that the von Neumann architecture is reaching its performance limits
A neuromorphic architecture merges memory into the computation in a distributed way
– A memristor is an ideal device to implement synapses
[Diagram: von Neumann Architecture (PU and Memory, with a bottleneck between them) versus Neuromorphic Computing]
Neurocomputing Hardware
Dmitri B. Strukov: Mixed‐Signal Nanoelectronic Neurocomputing
– Analog/mixed‐signal design can offer far better performance for INFERENCE
Neurocomputing Hardware
Dmitri B. Strukov: High‐Performance Mixed‐Signal Neurocomputing with Nanoscale Floating‐Gate Memory Cell Arrays, IEEE Trans. Neural Networks & Learning Systems, 2018 (early access)
– The core component is a resistive crossbar with tunable conductance G, essentially an analog NVM, for vector‐by‐matrix multiplication
Test: What’s the requirement for testing?
Validation: What’s the impact of imperfect hardware on model inference?
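The validation question can be explored numerically with an idealized crossbar model. The sketch below assumes the output current on each column is the sum of input voltage times cell conductance, and models imperfect hardware as a 5% multiplicative variation per cell; both the variation model and all sizes and values are assumptions, not taken from the cited paper.

```python
import numpy as np

def crossbar_vmm(V, G):
    """Output currents of an ideal crossbar: I_j = sum_i V_i * G_ij."""
    return V @ G

rng = np.random.default_rng(0)
n_inputs, n_outputs = 8, 4
G = rng.uniform(0.1, 1.0, size=(n_inputs, n_outputs))  # programmed conductances
V = rng.uniform(0.0, 1.0, size=n_inputs)               # input voltages

I_ideal = crossbar_vmm(V, G)

# Imperfect hardware: 5% multiplicative variation per cell (assumed model)
G_actual = G * (1.0 + 0.05 * rng.normal(size=G.shape))
I_actual = crossbar_vmm(V, G_actual)

rel_err = np.abs(I_actual - I_ideal) / np.abs(I_ideal)
print("relative output error per column:", np.round(rel_err, 4))
```

Feeding such perturbed outputs through a trained model and measuring the change in accuracy would be one way to quantify the impact on inference.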
Summary
Machine Learning for Test
– Many works exist (but how many are practical?)
– For applications with limited data, need an AI system to apply machine learning
Test for Machine Learning
– Driven by self‐driving car and deep‐learning hardware
– Functional safety will remain a major concern in the foreseeable future
THANK YOU!
Questions?