SPC: A Practitioner’s Perspective - Minnesota Section ASQ · 2016-03-08 · Traditional attribute charts (p, np, c and u) work well for non-conformity or defect monitoring, when

SPC: A Practitioner’s

Perspective

Tim Conway

8 March, 2016

Agenda

Introduction (High Volume Semiconductor Manufacturing)

SPC Charts

SPC Control Limits, Simplified

Control Chart Risks

SPC Performance Metrics

Sample Size

Key Take-Aways


Production Line (Old Style)

Lucy and Ethel wrap chocolate

© CBS


Production Line

From this.

To this.


Production Line

To this.

▪ Numerous Steps

▪ Definition of “unit” can change

▪ Batching


Production Line

Semiconductor Production Line

▪ Batching, Multiple Tools, Multiple Substeps in a Step, Nested Variance Structure


Production Line (Metal Deposition)

Multiple Processing Chambers

▪ Example: aluminum deposition

• Collimated Titanium (CoTi)

• Aluminum (HAL)

• Titanium-Tungsten (TiW)

• Degas, Etch, Cool

• Multiple times for each wafer

SPC Charts


Control Chart

SPC was founded by Walter Shewhart on three main concepts

▪ Use of simple, graphical, time-series presentations of the data

▪ The notion of chance-cause versus assignable-cause variation

▪ Process output averages are approximately normally distributed even when the individual measurements are not

Walter Andrew Shewhart (1891-1967) was an American physicist, engineer, and statistician. Known as the “father of statistical quality control,” he described the first control chart in May 1924, which launched statistical process control and quality improvement.


Types Of Variation

SPC is a Signal-to-Noise Analysis System

Common Cause Variation

▪ “Noise”

▪ Variation that is consistent, natural, predictable and random

▪ We hope this is how our process normally behaves

Assignable Cause Variation, or Special Cause Variation

▪ “Signal”

▪ Variation that is unusual or not typical of the process

▪ Usually we can assign a root cause to the variation

▪ Typically easier to remove than common cause variation


Rational Subgroups

Classical SPC uses “Rational Subgroups” to improve the signal-to-noise performance

▪ Select samples such that the variation within the sample is small (i.e., homogeneous sampling) while the variation between samples is large when an assignable cause is present

▪ Then use the within-sample variation to estimate the between-sample variation, assuming only common-cause variation is present

Potential Issue

▪ What if the points in the sample are highly correlated?


Control Chart Usage

General

▪ Monitor stability of one or multiple components of variation (between-sample, within-sample, etc.)

▪ Estimate capability and predict future performance of the system

▪ Quickly detect and fix problems

▪ Identify opportunities for improvement

Variable Data (measured on a continuous scale)

▪ Monitor process and product output parameters (dimensions, thickness)

▪ Monitor input and state variables (temperatures, pressures, flow rates)

▪ Assess matching (tool-to-tool, lot-to-lot, wafer-to-wafer)

Attribute Data (counts of defects, defective units or events)

▪ Monitor tool or process state variables (particles, events, etc.)

▪ Monitor product performance (yields, failure counts, etc.)


Control Charts for Variables Data

Variables Control Charts

▪ There are many variations of control charts. Typical variables charts include:

• Xbar/R, Xbar/S, Xbar/%S. The Xbar chart plots the sample average and monitors the between-sample variation. The R, S or %S chart plots the dispersion of the sample and thus monitors the within-sample variation.

• X/MR (Individual / Moving Range). Used when the sample size is one.

• Delta-to-Target, Z, Zbar/E. Also known as “short-run” charts, these plot standardized or normalized data that allow multiple process streams to be placed on the same chart even if the streams have different targets.

• EWMA, CUSUM. Specialized charts that are sensitive to small drifts or shifts.

• Multivariate (Hotelling T²). Combines multiple variables on one chart.


Control Charts for Attribute Data

Attribute Control Charts

▪ Typical charts for attribute data include:

• Proportion Defective (p). Monitors the proportion or percent of nonconforming units in a group of units. An example is a yield chart for die per wafer or die per lot.

• Number Defective (np). Monitors the count of nonconforming units in a group.

• Number of Defects (c). Monitors defect or particle counts in a group of units.

• Defects per Unit (u). Monitors defect or particle counts per unit.

▪ “p” and “np” charts assume a Binomial distribution

▪ “c” and “u” charts assume a Poisson distribution

▪ Normal distribution can approximate Binomial if np and n(1-p) > 5

▪ Normal distribution can approximate Poisson if λ ≥ 15
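As a rough illustration of the Binomial and Poisson assumptions above, the usual textbook 3-sigma limits for p and c charts can be sketched as follows. The function names and the example numbers are hypothetical and are not taken from the talk.

```python
import math

def p_chart_limits(p_bar, n, z=3.0):
    """Binomial-based limits for a p chart, clipped to the valid range [0, 1]."""
    se = math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - z * se), min(1.0, p_bar + z * se)

def c_chart_limits(c_bar, z=3.0):
    """Poisson-based limits for a c chart; the lower limit is clipped at zero."""
    se = math.sqrt(c_bar)
    return max(0.0, c_bar - z * se), c_bar + z * se

def normal_ok_for_binomial(n, p):
    """Rule of thumb from the slide: np and n(1-p) both above 5."""
    return n * p > 5 and n * (1.0 - p) > 5

def normal_ok_for_poisson(lam):
    """Rule of thumb from the slide: lambda of at least 15."""
    return lam >= 15

# Hypothetical example: 10% average fallout on groups of 100 units.
print(p_chart_limits(0.10, 100))
print(c_chart_limits(16.0))
```

When the rule-of-thumb checks fail, symmetric 3-sigma limits like these can be misleading, which is one reason the later slides suggest transformations or time-between-failure charts.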


Control Chart Decision Tree


SPC Charts

Classical Model for Variables Control Charts

$x_i = \mu + \varepsilon_i$

Where:

▪ Errors ε are normally distributed with mean = 0 and variance = σ²

▪ Process data is assumed to be stationary and uncorrelated

Implications

▪ Two components of variation (between-sample, within-sample)

▪ Rational subgrouping to

• Minimize chance of differences within subgroups

• Maximize chance of differences between subgroups, if assignable causes are present


SPC Charts

Nested Model for Variables Control Charts (Semiconductor)

$x_{ijk} = \mu + L_i + W_{j(i)} + \varepsilon_{k(ij)}$

Where:

▪ $L_i$ is the lot effect, $W_{j(i)}$ is the wafer-within-lot effect and $\varepsilon_{k(ij)}$ is the site-within-wafer error

▪ Errors ε are normally distributed with mean = 0 and variance = σ²

▪ But the errors are centered on the mean of the next higher level of nesting

▪ Other components exist (device, tool, chamber, etc.)

Implications

▪ Multiple components of variation (lot-to-lot, wafer-to-wafer, site-to-site, etc.)

▪ Still want to minimize chance of differences within samples while maximizing chance of differences between samples, if assignable causes are present


Recommendations

Recommendations (Variables Data)

▪ Characterize / understand the major components of variation

▪ Monitor the major components (use > 2 charts if needed)

• Don’t be limited by traditional two-chart software systems

• Innovate on visual display of variation (e.g., box plots on trend chart)

▪ Or have a quick method to visually break out the components

• Variability (multi-vari) charts

• Probability plots


Components of Variance

Multi-vari Example


Components of Variance

Probability Plot Example


SPC Control Limits, Simplified


SPC Control Limits

General Form of the Control Limit Equations (Traditional)

mean ± $Z_{\alpha/2}$ × StdErr of the mean

Where:

▪ "mean" is the average of the plotted points

▪ "StdErr of the mean" is the estimate of the common-cause sigma of the plotted points

▪ Typically, $Z_{\alpha/2}$ is set to 3 to provide a false alarm rate of 1/370 for normally distributed data. (Why 3? Because it works well.)

In layman’s terms

▪ When the plotted points exhibit random, common-cause variation:

Avg of Plotted Points ± 3 × Sigma of Plotted Points
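The 1/370 figure can be checked directly from the normal distribution. The short sketch below assumes normally distributed, in-control data and uses only the Python standard library; the function name is illustrative.

```python
from statistics import NormalDist

def false_alarm_rate(z=3.0):
    """Two-sided probability that an in-control point falls outside +/- z sigma."""
    return 2.0 * (1.0 - NormalDist().cdf(z))

alpha = false_alarm_rate(3.0)
print(f"alpha = {alpha:.5f}, about 1 in {1 / alpha:.0f} points")
```

Tightening the limits to 2 sigma under the same assumption raises the false alarm rate to roughly 1 in 22 points, which is why 3 is the usual compromise.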


SPC Control Limits

Traditional Charts for Variable Data

▪ Note the general form
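The table from this slide did not survive extraction. As a stand-in, here is a minimal sketch of the standard Xbar/R limits, using the commonly tabulated A2, D3 and D4 factors for subgroup sizes 2 to 5. The function name and example data are hypothetical. Note that A2 equals 3/(d2·√n), so the Xbar limits are just "average ± 3 sigma of the plotted points" in another form.

```python
from statistics import mean

# Commonly tabulated factors, keyed by subgroup size n: (A2, D3, D4).
FACTORS = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
    5: (0.577, 0.0, 2.114),
}

def xbar_r_limits(subgroups):
    """Return (LCL, center, UCL) for the Xbar chart and for the R chart."""
    n = len(subgroups[0])
    a2, d3, d4 = FACTORS[n]
    xbars = [mean(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    grand_mean, r_bar = mean(xbars), mean(ranges)
    return {
        "xbar": (grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar),
        "range": (d3 * r_bar, r_bar, d4 * r_bar),
    }

# Hypothetical subgroups of size 3.
print(xbar_r_limits([[9, 10, 11], [10, 11, 12], [8, 10, 12]]))
```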


SPC Control Limits

Traditional Charts for Attribute Data


SPC Control Limit Factors


SPC Control Limits

Issue: Within-Sample Correlated Data

▪ For Xbar charts, using the $\bar{R}/d_2$ or $\bar{S}/c_4$ methods results in overly tight control limits if the within-sample data is correlated. Alternatively, plot summarized data (e.g., sample averages) on an individuals "X" chart and calculate the sigma estimate $\hat{\sigma}_{\bar{X}}$ using:

1. Median MR (Clifford's robust sigma estimate, recommended).

2. Average MR. Risk that flyers may inflate the control limits.

3. IQR (InterQuartile Range). Sets control limits to the outlier box plot whisker ends.

4. Percentile Method. Set the UCL to the P99 or P99.865, for example, if the data is highly skewed.

5. The standard deviation of the plotted sample averages (Levey-Jennings). Risk that assignable-cause variation may inflate the control limits.
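To see why the classical within-sample estimate produces overly tight limits, the simulation below generates hypothetical nested data in which every site in a lot shares the same lot-level offset. All parameter values (lot sigma 5, site sigma 1, 5 sites, 200 lots) are assumptions chosen for illustration only.

```python
import random
from statistics import mean, stdev

D2_N5 = 2.326  # tabulated d2 factor for subgroup size 5

def simulate_lots(n_lots=200, n_sites=5, sigma_lot=5.0, sigma_site=1.0, seed=1):
    """Nested data: each lot has its own mean, and sites vary around that lot mean."""
    rng = random.Random(seed)
    lots = []
    for _ in range(n_lots):
        lot_mean = rng.gauss(100.0, sigma_lot)
        lots.append([rng.gauss(lot_mean, sigma_site) for _ in range(n_sites)])
    return lots

lots = simulate_lots()
xbars = [mean(lot) for lot in lots]
r_bar = mean([max(lot) - min(lot) for lot in lots])

sigma_xbar_within = r_bar / D2_N5 / 5 ** 0.5   # classical Rbar/d2 estimate of sigma(Xbar)
sigma_xbar_actual = stdev(xbars)               # what the plotted averages really do

center = mean(xbars)
outside = sum(abs(x - center) > 3 * sigma_xbar_within for x in xbars)
print(f"within-based sigma: {sigma_xbar_within:.2f}, actual sigma: {sigma_xbar_actual:.2f}")
print(f"{outside} of {len(xbars)} in-control averages fall outside the classical limits")
```

Nothing in this simulated process has changed over time, yet most points violate the classical limits, because the within-lot range contains no information about lot-to-lot variation.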


SPC Control Limits (Non-Traditional)

Median Moving Range (recommended)

▪ Also known as Clifford’s Robust Sigma Estimate

▪ Robust to outliers and to mean shifts

▪ Treats the summary statistic (e.g., sample average) as individuals data; uses the X/MR chart and estimates the process sigma using the median moving range

$\hat{\sigma}_{\text{Clifford}} = \dfrac{1}{d_4} \cdot \widetilde{MR}$

$\hat{\sigma}_{\text{Clifford}} = 1.05 \cdot \text{Median}\,|X_i - X_{i-1}|$

Where $\widetilde{MR}$ = median moving absolute range



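Average Moving Range

▪ Sensitive to non-normality

$\hat{\sigma}_{MR} = \dfrac{1}{d_2} \cdot \overline{MR}$

$\hat{\sigma}_{MR} = 0.887 \cdot \text{Average}\,|X_i - X_{i-1}|$, for MR of span 2 ($d_2 = 1.128$)

Where $\overline{MR}$ = average moving absolute range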
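The two moving-range estimators can be compared on a small series containing one flyer. This is a sketch under the usual span-2 constants (d4 = 0.954 for the median, d2 = 1.128 for the average); the data values and function names are hypothetical.

```python
from statistics import mean, median, stdev

def moving_ranges(x):
    """Absolute differences between consecutive points (span 2)."""
    return [abs(b - a) for a, b in zip(x, x[1:])]

def sigma_median_mr(x):
    """Clifford-style robust estimate: median MR / d4, about 1.05 * median MR."""
    return median(moving_ranges(x)) / 0.954

def sigma_average_mr(x):
    """Classical estimate: average MR / d2."""
    return mean(moving_ranges(x)) / 1.128

# Hypothetical plotted averages with a single flyer at 14.0.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 14.0, 10.0, 9.9, 10.2]

print(f"median MR sigma : {sigma_median_mr(data):.3f}")
print(f"average MR sigma: {sigma_average_mr(data):.3f}")
print(f"std deviation   : {stdev(data):.3f}")
```

The single flyer barely moves the median-based estimate but inflates both the average-MR estimate and the plain standard deviation, which is the robustness argument made on these slides.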



InterQuartile Range (IQR)

▪ Control limits correspond to whiskers on outlier box plot

$UCL_{IQR} = Q_{75} + 1.5 \cdot IQR$

$LCL_{IQR} = Q_{25} - 1.5 \cdot IQR$

Where $IQR = Q_{75} - Q_{25}$

Percentile Limits

▪ P99.865 and P0.135 correspond to 3.0 sigma limits for a normal distribution
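A small sketch of the IQR limits, plus a check of how both methods relate to sigma for a normal distribution. The helper name is illustrative. Quartile conventions differ slightly between tools, so the limits from `statistics.quantiles` may not exactly match a given SPC package.

```python
from statistics import NormalDist, quantiles

def iqr_limits(x):
    """Box-plot whisker limits: Q25 - 1.5*IQR and Q75 + 1.5*IQR."""
    q25, _, q75 = quantiles(x, n=4)  # default 'exclusive' quartile method
    iqr = q75 - q25
    return q25 - 1.5 * iqr, q75 + 1.5 * iqr

# For a standard normal distribution, express both limit types in sigma units.
std_normal = NormalDist()
q75 = std_normal.inv_cdf(0.75)
whisker_in_sigma = q75 + 1.5 * (2 * q75)        # upper whisker end
p99865_in_sigma = std_normal.inv_cdf(0.99865)   # upper percentile limit

print(f"IQR whisker is at about {whisker_in_sigma:.2f} sigma")
print(f"P99.865 is at about {p99865_in_sigma:.2f} sigma")
```

For normal data the whisker ends sit near 2.7 sigma, so IQR-based limits are slightly tighter than 3-sigma limits, while the P99.865 percentile lands at 3 sigma.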



Levey-Jennings

▪ Sigma estimate is the standard deviation of the plotted points

▪ Caution: assignable-cause variation inflates the control limits

▪ If used, check for and remove assignable-cause variation

Eyeball method

▪ Count out two “sigmas” from the center to the edge of the data, then count out one more “sigma” and put the control limit there

▪ Assumes normally distributed, common-cause data (should check assumptions)

▪ Based on the idea that roughly 95% of the data (19 of 20 points) is contained within +/- two sigmas of the average


SPC Control Limits (Variables Data)


SPC Control Limits

Recommendations (Variables Data)

▪ Plot sample statistics that are sensitive to assignable causes

• $\bar{X}$, R, S

• $\bar{X}$ takes advantage of the central limit theorem → improved normality

▪ Base control limits on outlier-resistant methods

• Median Moving Range (recommended)

▪ Individuals (X) charts have poor ability to detect small shifts

• Example: Average Run Length (ARL) = 44 to detect a 1-sigma shift

• For improved shift detection, use smoothed data (EWMA: Exponentially Weighted Moving Average)
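For reference, the EWMA recursion and its steady-state limits can be sketched as below. The smoothing weight λ = 0.2 and the limit width of 3 are assumptions for illustration (the talk does not give values), and the function names are hypothetical.

```python
import math

def ewma(x, start, lam=0.2):
    """EWMA recursion z_t = lam * x_t + (1 - lam) * z_(t-1), seeded at `start`."""
    z = start
    smoothed = []
    for value in x:
        z = lam * value + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed

def ewma_limits(center, sigma, lam=0.2, width=3.0):
    """Steady-state EWMA limits: center +/- width * sigma * sqrt(lam / (2 - lam))."""
    half_width = width * sigma * math.sqrt(lam / (2.0 - lam))
    return center - half_width, center + half_width

# Hypothetical individuals data with a small upward shift in the second half.
data = [0.1, -0.3, 0.2, 0.0, -0.1, 1.1, 0.8, 1.3, 0.9, 1.2]
print(ewma(data, start=0.0))
print(ewma_limits(center=0.0, sigma=1.0))
```

Because the EWMA limits are narrower than the ±3 sigma limits on the raw points, a sustained small shift accumulates in the smoothed value and crosses a limit sooner than any single raw point would.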


SPC Control Limits

Recommendations (Attribute Data)

▪ Use variables data if possible (much smaller sample sizes)

▪ If the data is approximately normal, variables charts can be used

▪ Try transforming skewed data (SQRT, LOG) to approximate normality

▪ Traditional attribute charts (p, np, c and u) work well for non-conformity or defect monitoring when the chart is not dominated by zero-valued data

▪ But for rare non-conformities or defects, consider “time between failure.” Transform the time between failures using the $y = x^{1/3.6}$ transformation to make the data approximately normal.
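The effect of the power transformation can be illustrated on simulated data. The sketch assumes exponentially distributed times between failures with a hypothetical mean of 50 hours; the names and the seed are illustrative.

```python
import random

def tbf_transform(t):
    """Power transform y = t**(1/3.6) for a non-negative time between failures."""
    if t < 0:
        raise ValueError("time between failures must be non-negative")
    return t ** (1.0 / 3.6)

def skewness(x):
    """Moment-based sample skewness (near 0 for symmetric data)."""
    m = sum(x) / len(x)
    m2 = sum((v - m) ** 2 for v in x) / len(x)
    m3 = sum((v - m) ** 3 for v in x) / len(x)
    return m3 / m2 ** 1.5

rng = random.Random(0)                                    # fixed seed for repeatability
times = [rng.expovariate(1 / 50.0) for _ in range(2000)]  # assumed exponential failures
transformed = [tbf_transform(t) for t in times]

print(f"skewness before: {skewness(times):.2f}, after: {skewness(transformed):.2f}")
```

The raw times are strongly right-skewed, while the transformed values are close to symmetric, so an individuals chart on the transformed scale behaves much more like the normal-theory case.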


Control Chart Risks


Control Charts Have Risks

Caution

▪ Control charts have decision risks

• Calling a process out-of-control when the process has not changed

• Calling a process in control when the process has shifted

▪ Control charts have sampling risk

• A shift may not be detected for a number of subsequent samples

▪ Mixing of data from different sources can hide signals

• If multiple process streams are placed on the same chart, then stream-specific signals may be hidden.


Decision Risk

Example: SPC Chart with mean shift

▪ Null hypothesis (H0): process is stable, µ = µ0

▪ α = false alarm rate; the probability of an out-of-control point when the process is stable

▪ β is the risk of not detecting a given shift in the mean

▪ 1 − β is the power of detecting the given shift in the mean


Sampling Risk: Average Run Length (ARL)

“Power” is the probability of detecting a given shift on the next sample

▪ Power = 1 − β

Average Run Length (ARL) can be used two ways

▪ ARL = 1/α is the average number of samples between false alarms

▪ ARL = 1/(1 − β) is the number of samples, on average, that it will take to detect a given shift



False Alarm Average Run Length (ARL), Normal Dist



Shift Detection Average Run Length (ARL), Normal Dist

The table is for n = 1; larger samples improve the ARL

▪ Example: n = 4, 3-sigma limits, 1-sigma shift: ARL 44 → 6
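The ARL tables on these slides did not survive extraction, but the values follow from the 1/α and 1/(1 − β) formulas. The sketch below assumes normally distributed data, a single-point rule at ±3 sigma, and a shift expressed in units of the individual-measurement sigma; the function name is illustrative.

```python
from statistics import NormalDist

def average_run_length(shift_sigma, n=1, limit=3.0):
    """ARL for an Xbar chart with +/- `limit` sigma limits and subgroup size n.

    A shift of `shift_sigma` individual sigmas is shift_sigma * sqrt(n)
    standard errors of the plotted average. With shift_sigma = 0 this
    returns the false-alarm ARL, 1/alpha.
    """
    phi = NormalDist().cdf
    d = shift_sigma * n ** 0.5
    beta = phi(limit - d) - phi(-limit - d)   # probability the point stays inside the limits
    return 1.0 / (1.0 - beta)

for shift in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"shift {shift:.1f} sigma: ARL = {average_run_length(shift):.1f}")
```

Calling `average_run_length(1.0)` and `average_run_length(1.0, n=4)` shows how a larger subgroup shortens the run length under these assumptions.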


SPC Performance Metrics


Why Do SPC?

Detect potential problems

▪ Want to monitor effectively (measure the things that are critical)

▪ Want to monitor efficiently (don’t over-sample)

▪ Want to detect issues quickly (react to real signals)

▪ Do not want to intervene unless the process tells us to (don’t react to noise)

Reduce variation

▪ Ensure that our processes are stable (predictable)

▪ Ensure that our processes are capable (meet the spec)

▪ Ensure that our processes are targeted (loss is minimized)

The Golden Rule of SPC: Intervene in a timely and efficient manner

[Figure: loss curve with Target at the center and a Spec Limit on each side]


Performance Metrics: RV1

▪ RV1 monitors stability

• Number of WECO (Western Electric) Rule 1 violations, i.e., points beyond the 3-sigma control limits, per number of SPC points or per timespan

• This is the headache metric (is the process or equipment creating excessive headaches?)
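A minimal sketch of an RV1-style calculation, assuming the metric is the fraction of plotted points that violate WECO Rule 1 (a point more than 3 sigma from the center line). The talk does not give an exact formula, so the function names and the normalization are assumptions.

```python
def rule1_violations(points, center, sigma):
    """Indices of points lying beyond the 3-sigma control limits."""
    return [i for i, x in enumerate(points) if abs(x - center) > 3.0 * sigma]

def rv1(points, center, sigma):
    """Rule 1 violations as a fraction of the plotted points."""
    return len(rule1_violations(points, center, sigma)) / len(points)

# Hypothetical chart: center line 0, sigma 1.
points = [0.0, 1.0, -1.0, 3.5, 0.2, -4.0]
print(rule1_violations(points, 0.0, 1.0))
print(f"RV1 = {rv1(points, 0.0, 1.0):.1%}")
```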


Performance Metrics: Cpk, Ppk, Cp, Pp

▪ Cpk monitors short-term capability, where only common-cause variation is present. Cpk is the minimum of CPU (upper spec limit applies) and CPL (lower spec limit applies).

▪ Ppk monitors long-term capability and thus includes both common-cause and assignable-cause variation. Ppk is more representative of the quality level of the process.
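The difference between Cpk and Ppk comes down to which sigma is used. In the sketch below the short-term sigma is taken from the average moving range and the long-term sigma from the overall standard deviation. These are common choices but are assumptions here, as are the function name, data and spec limits.

```python
from statistics import mean, stdev

def capability(x, lsl, usl):
    """Return Cpk (short-term sigma) and Ppk (long-term sigma) for individuals data."""
    xbar = mean(x)
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    sigma_short = mean(moving_ranges) / 1.128   # common-cause estimate (d2 for span 2)
    sigma_long = stdev(x)                       # includes any assignable-cause variation

    def index(sigma):
        cpu = (usl - xbar) / (3.0 * sigma)      # upper spec limit applies
        cpl = (xbar - lsl) / (3.0 * sigma)      # lower spec limit applies
        return min(cpu, cpl)

    return {"Cpk": index(sigma_short), "Ppk": index(sigma_long)}

# Hypothetical data with a mean shift halfway through (an assignable cause).
data = [10.0, 10.2, 9.9, 10.1, 10.0, 12.0, 12.1, 11.9, 12.2, 12.0]
print(capability(data, lsl=7.0, usl=15.0))
```

The mean shift inflates the overall standard deviation but hardly touches the moving ranges, so Ppk comes out well below Cpk, which is why Ppk is the more honest picture of delivered quality.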


Performance Metrics: Z, Z’

▪ Z and Z’ monitor targeting

• Z and Z’ are the delta from target, in sigma units

• Typical requirement is |Z| < 1.0

• Potential issue: metric is penalized by goodness (small sigma)

$Z = \dfrac{\bar{X} - \text{Target}}{\sigma}$, $Z' = \dfrac{\bar{X} - \text{Target}}{\hat{\sigma}}$ (the subscripts on the two sigma terms are not legible in the source)

▪ Alternate metric: the “k” in Cpk

• k is the deviation of the average from the center, as a proportion of the half-spec window

$k = \dfrac{\bar{X} - (USL + LSL)/2}{(USL - LSL)/2}$ (CPU case)

$k = \dfrac{(USL + LSL)/2 - \bar{X}}{(USL - LSL)/2}$ (CPL case)
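The "penalized by goodness" issue is easy to see numerically. In this sketch the two k cases are folded into one absolute-value form, and the function names and numbers are hypothetical.

```python
def z_metric(xbar, target, sigma):
    """Delta from target in sigma units."""
    return (xbar - target) / sigma

def k_metric(xbar, lsl, usl):
    """Offset of the average from spec center, as a fraction of the half-spec window."""
    center = (usl + lsl) / 2.0
    half_window = (usl - lsl) / 2.0
    return abs(xbar - center) / half_window

# Same 0.5-unit offset from a target of 10, with two different process sigmas.
print(z_metric(10.5, 10.0, 1.00))   # noisier process
print(z_metric(10.5, 10.0, 0.25))   # tighter process, larger |Z|
print(k_metric(10.5, 8.0, 12.0))    # k does not depend on sigma
```

The tighter process fails a |Z| < 1.0 requirement even though its offset is identical, while k stays the same for both.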


Sample Size


Sample Size: Shrinkage Factor

Using the Central Limit Theorem

▪ Variance of the means equals variance of individuals divided by sample size

$\sigma_{\bar{X}}^2 = \dfrac{\sigma_X^2}{n}$

Solve the equation for “n”

▪ “n” is the shrinkage factor

$n = \dfrac{\sigma_X^2}{\sigma_{\bar{X}}^2}$

Use of the shrinkage factor

▪ Quick assessment of correlated within-sample data

▪ If the shrinkage factor is less than the sample size, then some data values are correlated and the sample size can possibly be reduced


Sample Size: Shrinkage Factor

Measurement by site, with sample averages

Sample   Site 1   Site 2   Site 3   Site 4   Site 5   Sample Average
   1       106      158       82       22      122        98.0
   2        70      102       72       34      106        76.8
   3        20      134       66       78       68        73.2
   4        68        8       20      156       64        63.2
   5        54       72       74       50       98        69.6
   6        24       50       36       48       52        42.0
   7        10       76       62       30       32        42.0
   8        76       16       26       44       60        44.4
   9        68      100       90       16       88        72.4
  10        54       76       66       54       62        62.4
  11        20      134       80        4       40        55.6
  12        56       16        6       90       60        45.6
  13        80      148      100       34      112        94.8
  14        48       30       86       76       56        59.2
  15        12       98       42       42       26        44.0
  16        70      106       42      210      120       109.6
  17        70      134       74       28      102        81.6
  18        44       10       60       66       44        44.8
  19        12       68       22       50       16        33.6
  20        88      120       86      172      146       122.4

Variance (individuals): 1668

Variance (of sample averages): 610

Shrinkage factor: 2.7

$\text{Shrinkage Factor} = \dfrac{\sigma_X^2}{\sigma_{\bar{X}}^2}$

Conclusion: Use n = 3 as the sample size

What other considerations should be taken into account?
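The shrinkage-factor arithmetic for the 20-sample, 5-site table can be reproduced with the standard library. The sketch assumes sample (n − 1) variances, which matches the 1668 and 610 figures quoted on the slide; the variable names are illustrative.

```python
from statistics import mean, variance

# Site measurements from the slide: one row per sample, five sites per row.
SAMPLES = [
    [106, 158, 82, 22, 122], [70, 102, 72, 34, 106], [20, 134, 66, 78, 68],
    [68, 8, 20, 156, 64],    [54, 72, 74, 50, 98],   [24, 50, 36, 48, 52],
    [10, 76, 62, 30, 32],    [76, 16, 26, 44, 60],   [68, 100, 90, 16, 88],
    [54, 76, 66, 54, 62],    [20, 134, 80, 4, 40],   [56, 16, 6, 90, 60],
    [80, 148, 100, 34, 112], [48, 30, 86, 76, 56],   [12, 98, 42, 42, 26],
    [70, 106, 42, 210, 120], [70, 134, 74, 28, 102], [44, 10, 60, 66, 44],
    [12, 68, 22, 50, 16],    [88, 120, 86, 172, 146],
]

individuals = [value for row in SAMPLES for value in row]
sample_averages = [mean(row) for row in SAMPLES]

var_individuals = variance(individuals)      # variance of all 100 readings
var_averages = variance(sample_averages)     # variance of the 20 sample averages
shrinkage = var_individuals / var_averages   # the effective "n"

print(f"Variance (individuals):        {var_individuals:.0f}")
print(f"Variance (of sample averages): {var_averages:.0f}")
print(f"Shrinkage factor:              {shrinkage:.1f}")
```

With five truly independent sites the factor would be close to 5. A value near 2.7 suggests the five sites carry roughly the information of three independent readings, which is the basis for the n = 3 conclusion.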


Sample Size: Clustering Analysis

What is Clustering Analysis?

▪ Assesses the within-sample data to determine if there is significant correlation among the data points.

• One application is to determine if the sites on the wafer can be clustered.

▪ Correlation implies that the points are not independent of each other.

• The correlated points can be grouped into clusters.

• Each cluster contains values that are “not much different” from each other.

• Picking one value from each cluster thus efficiently represents the cluster.


Key Take-Aways


Key Take-Aways

▪ Quality decreases as variability increases. Set up the SPC system to efficiently and quickly detect and attack variation.

▪ SPC is a signal-to-noise system. Assignable-cause variation is the signal and common-cause variation is the noise.

▪ Don’t respond to the noise (with control actions).

▪ Proper control limits balance the risks of false alarms vs risks of missing an assignable cause. Use sample averages to induce normality into your data and reduce the risks.

▪ Using within-sample variation to estimate between-sample variation (classical approach) is problematic with highly correlated within-sample data, such as is seen in semiconductor processing.

▪ Use of robust estimators of the process sigma helps mitigate the above problem.

▪ But if your control limits look reasonable when the data looks random, then that is likely OK from an engineering standpoint.


Key Take-Aways (cont.)

▪ There are many types of charts. Avoid the confusion by keeping in mind the “my control limits look reasonable” concept.

▪ Don’t just SPC the outputs; monitor, control and improve the inputs.

▪ Determine and monitor the major components of variation.

▪ Graphical displays of the components of variation are hugely beneficial.

▪ If you put spec limits on your chart, make sure they use the same basis as the data (raw spec for individual data, average spec for average data).

▪ Keep aware of the Average Run Length (ARL) concept. Avoid too many rules as that will increase the false-alarm rate.

▪ Out-of-Control Action Plans (OCAPs) are huge! You need methods to troubleshoot and resolve OOC points.


Key Take-Aways (cont.)

▪ Stable, capable and targeted is always a good thing. Monitor the performance of the SPC system.

▪ SPC performance metrics should consider capability (Cpk, Ppk), targeting (Z) and headaches caused by OOC points (RV1).

▪ Make sure to check your measurement system. A noisy measurement makes process improvement much, much more difficult.


References

▪ Montgomery, D. C. (2009), Introduction to Statistical Quality Control, 6th ed., Wiley, New York

▪ Clifford, P. C. (1959), “Control Charts Without Calculations,” Industrial Quality Control, Vol. 15(11), pp. 40-44
