6 Sigma Definitions

8/7/2019 6 Sigma Definitions


-A-

Alpha Level: In significance testing, the alpha level (sometimes called the

significance level) is a probability that specifies how extraordinary the observed

result must be for us to reject the null hypothesis. Typical alpha levels are 0.10,

0.05, and 0.01, but any value may be used provided it is specified before the data

are examined. In practice, if we observe a statistic whose p-value computed based on the null hypothesis is less than the pre-determined alpha level, then we reject the

null hypothesis. Because the alpha level defines a "rare event," and rare events do

happen sometimes, the alpha level determines the probability of a Type I error.
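The decision rule above can be sketched as follows. This is a minimal illustration, assuming a two-sided one-sample z-test with a known population standard deviation; the sample values, `mu0`, and `sigma` are hypothetical and not taken from this glossary.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical data and parameters, chosen only for illustration.
sample = [10.2, 9.9, 10.4, 10.1, 10.3, 10.5, 10.0, 10.2]
mu0 = 10.0      # mean stated by the null hypothesis Ho
sigma = 0.2     # assumed known population standard deviation
alpha = 0.05    # alpha level, fixed before the data are examined

# Standardized distance of the sample mean from the hypothesized mean.
z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

# Two-sided p-value: probability, under Ho, of a statistic at least this extreme.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Reject Ho only when the p-value falls below the pre-determined alpha level.
reject_null = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject Ho: {reject_null}")
```

With a smaller alpha level the same data would need a more extreme statistic before Ho is rejected, which lowers the chance of a Type I error.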

Alpha Risk: The chance or probability of making a Type I Error. This probability is

always greater than zero. The researcher establishes a maximum acceptable risk,

usually 5%, of deciding to reject Ho when it is actually true.

Alternative Hypothesis (Ha): Statement of change, difference, or a relationship.

This statement is considered true if Ho is rejected.

ANOVA: Anova stands for ANalysis Of Variance. It compares the means of a

quantitative variable for different levels of one or more categorical variables. The null

hypothesis tested by an ANOVA is that the means are equal for the different levels.

The technique partitions the total variance into components:

- Between treatment levels (factors or X)

- Within Treatment (Error)
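As a rough sketch of this partition for a single factor, the code below splits the total sum of squares into a between-treatment part and a within-treatment part and forms the F-statistic from them. The group labels and response values are hypothetical.

```python
from statistics import mean

# Hypothetical responses at three levels of one factor.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_values)

# Between treatment levels: group means measured around the grand mean.
ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
# Within treatment (error): values measured around their own group mean.
ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
# Total: values measured around the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F-statistic: ratio of the between and within mean squares.
f_statistic = (ss_between / df_between) / (ss_within / df_within)
print(ss_between, ss_within, ss_total, f_statistic)
```

The between and within sums of squares add up to the total, which is the sense in which the technique partitions the variation.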

Attribute Data: Quality data that typically reflects the number of conforming or

non-conforming units on a go/no-go, pass/fail basis.

Attribute Gage R&R: A methodology to assess attribute measurement systems.

The gage R&R will quantify measurement error due to repeatability, reproducibility,

and calibration.

Average: The average of a collection of values is the sum of all the values divided

by the number of values, otherwise called the mean. The term "average" is

sometimes taken to mean any summary of the center of a distribution, but this

usage is not common in statistics.

Axis: In graphing on the Cartesian plane (as in common x-y plots), the vertical and

horizontal lines that give the coordinates are called axes. By convention, the y-axis

runs up and down, and the x-axis runs side-to-side. Displays such as dotplots and

boxplots have a single axis, usually the vertical y-axis, while displays like histograms

use only the horizontal x-axis.



-B-

Balanced Design: A design in which the same number of runs are performed at

each treatment combination.

Bar Chart: A bar chart displays a batch of numbers with side-by-side bars, usually spaced evenly along a horizontal base. Each bar corresponds to a category of data.

The height of each bar is proportional to the number of cases in its category. Bar

charts differ from histograms because they graph categorical variable values. One

easy way to spot the difference is that the bars in a bar chart should have some

space between them, while the bars in a histogram should touch each other.

Beta Risk: The chance or probability of making a Type II Error. This probability is

always greater than zero. The researcher establishes a maximum acceptable risk,

usually 10%, of overlooking a source of variation.

Bias: A bias in sampling is when observations favor some individuals in the

population over others. Sampling bias is difficult to identify, and probability theory

cannot be used to offset these sampling errors.

Bimodal: A distribution shape is bimodal if it has two modes. Sometimes it is not

clear whether a bump on a histogram is a second mode or just a local bump. One

way to decide is to rescale the histogram. If the bump is present at many different

scales, then it can reasonably be called a second mode. If not, then it may be just an

accident of the plot scale.

Binomial: Literally "two names". A binomial setting is one in which observations take on one of only two possible values. By convention, these are often called

"success" and "failure" although succeeding or failing need not be involved. Also by

convention, the probability of a "success" is usually denoted p. The probability of a

"failure" must then be 1 - p.

Blocking: An experimental technique used to account for the variability due to

nuisance or noise variables. An example would include using blocking to address

multiple batches of raw materials in which parts from a single batch are likely to be

more uniform than parts from different batches. Material batch would be the blocking

variable.

Box Plot: A boxplot displays the 5-number summary of a variable. Boxplots graph

against a single, usually vertical, axis. Boxplots show a central box running from the

first to the third quartiles, marked by a line at the median. They then extend

whiskers either to the maximum and minimum data values, or to the most extreme

values within 1.5 IQRs of each quartile. If any data values are outside these limits,

they are displayed individually because they may be outliers.
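The quantities a boxplot draws can be computed directly, as in the sketch below. The data are hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; other software may use slightly different quartile conventions.

```python
from statistics import quantiles

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# First quartile, median, and third quartile.
q1, median, q3 = quantiles(data, n=4, method="inclusive")
five_number_summary = (min(data), q1, median, q3, max(data))

# Limits placed 1.5 IQRs beyond each quartile.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme values that are still inside the limits.
inside = [v for v in data if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Values outside the limits are plotted individually as possible outliers.
outliers = [v for v in data if v < lower_fence or v > upper_fence]
print(five_number_summary, (whisker_low, whisker_high), outliers)
```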



-C-

Capability Analysis: A statistical measure of inherent variation for a given event in

a stable process.

Categorical: A categorical variable records into which of several categories or levels an individual falls. Categorical variables need not be numeric; they can simply name

the category. Contrast with quantitative variables.

Causation: Variable A can be said to cause variable B if:

1. Whenever A changes, there is a corresponding change in B

2. B could not cause A, and

3. There is no other variable, C, that could cause both A and B.

Scientists usually require that in addition, there be a plausible mechanism connecting

A to B. Causation is best demonstrated with a designed experiment. In particular,

the existence of a correlation or association between two variables does not

demonstrate a causal relationship between them.

C Chart: Number of defects. Must have equal subgroup size.

Center: The center of a distribution is a general concept referring to the middle of

the values in the distribution. Common measures of the center of a distribution are

the mean, the median, and the mid-quartile. Some texts present the mode as a

measure of center, but it in fact measures something else entirely.

Central Limit Theorem: The central limit theorem (CLT) states that the sampling

distribution of the sample mean approaches the normal distribution as the sample size, n, increases regardless of the distribution of the population. It is common to

include with the CLT the related facts that this sampling distribution has a mean

equal to the population mean and a standard deviation equal to the population

standard deviation divided by the square root of n.
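A small simulation can make these facts concrete. The sketch below assumes a strongly skewed exponential population with mean 1 and standard deviation 1; the seed, sample size, and number of replicates are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # size of each sample
replicates = 5000        # number of samples drawn

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

# The sampling distribution should center on the population mean ...
observed_mean = mean(sample_means)
# ... with spread close to the population standard deviation over sqrt(n).
observed_sd = stdev(sample_means)
predicted_sd = 1.0 / sqrt(n)
print(f"mean of means = {observed_mean:.3f}")
print(f"sd of means = {observed_sd:.3f}, predicted = {predicted_sd:.3f}")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is far from normal.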

Central Tendency: A term used in some introductory texts where center is

intended. There is no "tendency" involved in the center of a distribution. It is true,

however, that averages tend toward the population mean as the sample size grows,

as stated by the law of large numbers. This may be the original source of confusion.

Chi-square: The chi-square statistic for testing independence in a two-way table is

found by summing the squares of the standardized residuals. It follows a chi-square

distribution on (#Rows-1)*(#Cols-1) degrees of freedom. The chi-square distribution

is skewed to the right, and the test for independence is a one-sided test in which

larger values of the chi-square statistic correspond to smaller p-values.
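The calculation described above can be sketched for a small two-way table. The observed counts are hypothetical; the expected counts assume independence of rows and columns.

```python
from math import sqrt

# Hypothetical two-way table of observed counts (2 rows, 3 columns).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Count expected in this cell if rows and columns were independent.
        expected = row_totals[i] * col_totals[j] / total
        # Standardized residual for the cell.
        residual = (count - expected) / sqrt(expected)
        chi_square += residual ** 2

# Degrees of freedom come from the table dimensions, not the sample size.
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi_square:.4f} on {df} degrees of freedom")
```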



Confidence: Confidence is a declaration of probability for some claim. Most often,

"confidence" refers to the probability that a confidence interval includes the intended

population parameter value. Confidence probabilities are usually found by referring

to the sampling distribution of the statistic being used to estimate the population

parameter.

Confidence Bounds: The confidence bounds are the lower and upper limits of a

confidence interval.

Confidence Interval: A level C confidence interval for a population parameter is an

interval of values usually of the form statistic ± margin of error found from data in

such a way that it has probability C of including the parameter value in the interval.

The data must be representative of the population either because of appropriate

sample design or by resulting from a well-designed randomized experiment.
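The "statistic ± margin of error" form can be illustrated with an interval for a mean. This is a sketch only: it assumes the population standard deviation `sigma` is known so that a normal critical value applies, and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical sample and assumed known population standard deviation.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
sigma = 0.25
confidence = 0.95   # the level C

# Critical value leaving (1 - C) / 2 in each tail of the normal curve.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

estimate = mean(sample)                        # the statistic
margin = z_star * sigma / sqrt(len(sample))    # the margin of error
lower, upper = estimate - margin, estimate + margin
print(f"{estimate:.3f} ± {margin:.3f} -> ({lower:.3f}, {upper:.3f})")
```

When the standard deviation must be estimated from a small sample, a t critical value would normally replace `z_star`.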

Confidence Level: In constructing a confidence interval the confidence level is the

probability that an interval constructed in this manner (that is, based on a sample of this size drawn from this population, for example) will cover the true population

parameter.

Confounding: One or more effects that cannot unambiguously be attributed to a

single factor or factor interaction.

Common Cause Variation: Variation that occurs naturally and is inherent and

expected in a stable process. Such variation can be attributed to “chance” or random

causes.

Continuous: Continuous values are numbers that can fall anywhere in a range of

values. Contrast with discrete values.

Continuous Random Variable: A continuous random variable is a random variable

that can take on any numeric value within a range of values. The range may be

infinite or bounded at either or at both ends. The relationship between the intervals

within the range and probabilities is given by a density curve. Contrast with discrete

random variables.

Control Chart: Control charts graph statistics measured on a process to track trends and detect changes. Extraordinary values or patterns in a control chart often indicate

a problem with the process. Control charts usually include control lines to help

identify processes that are out of control.



Control Limits: Statistically based limits to detect a shift in the process. Control

limits exist for the critical statistic of interest (e.g. mean, variance, etc.)

Correlation: A statistical tool used to quantify the strength of the linear relationship

between a continuous input and a continuous output. Correlation is a number

between -1 and +1 that measures the degree of linear association between two variables. In its most useful form, its square (commonly called r-squared) gives the

fraction of the variability of one variable accounted for by a least squares regression

on the other variable.
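The link between the correlation and a least squares fit can be checked numerically. The sketch below uses hypothetical x and y values, computes r from its definition, and compares r-squared with the fraction of variability in y accounted for by the fitted line.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Correlation: always between -1 and +1.
r = sxy / sqrt(sxx * syy)
r_squared = r ** 2

# Least squares line of y on x, and the variability it leaves unexplained.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
explained_fraction = 1 - sse / syy

print(f"r = {r:.4f}, r-squared = {r_squared:.4f}, explained = {explained_fraction:.4f}")
```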

-D-

Data: Data are values or names that record information about individuals in an

orderly fashion, together with a context. Most data record the same information

about many individuals.

Defective: Units deemed to be defective contain one or more defects.

Defects: Failures associated with a single unit. A defective unit can contain several

defects.

Degrees of freedom: Degrees of freedom indicate the amount of information

provided by the data in a way that often parameterizes a sampling distribution. In

this course, we can usually refer to the "Rule of Thumb" that degrees of freedom are

equal to the sample size, n, minus the number of parameters that have been

estimated. However, this "rule" is violated by the two-sample t procedure, for which

a special approximation provides better results, and the chi-square statistic for two-way

tables, which finds degrees of freedom from the table dimensions rather than the sample size.

Density Curve: A density curve is a curve that is never negative (that is, it is

always above the horizontal axis) and that has an area of exactly 1.0 under the

curve. Density curves relate ranges of data values to the relative frequency of the

values in those ranges. Density curves provide a mathematical model to describe

the shapes shown by histograms.

Dependent variable: The dependent variable is also referred to as the response

variable.

Design of Experiment (DOE): Adds structure to the experiment to ensure all

experimental objectives are achieved and the experimental assumptions are

satisfied. Design of Experiment is often referred to as DOE’s or factorial experiments.



Discrete: Discrete values are named individually or fall into named "bins". Contrast

them with continuous values.

Distribution: The distribution of a variable gives the values the variable can take on

and tells how frequently it takes on each of these values.

Dotplot: A display of a single variable in which points are arranged along a (usually)

vertical axis. Dotplots show the spread of the data, any clustering or local modes,

and any outliers. Some dotplots spread out overlapping points; others do not.

Dotplots are related to boxplots.

-E-

Experiment: A test or series of tests in which X’s are purposefully varied and the impact

on Y is quantified.

Experimental run / Treatment Combinations: A single combination of factor

levels that yields one observation of the output variable. If temperature is being

evaluated at 10 and 20 degrees and pressure at 50 and 100 p.s.i., an example of an

experimental run would be 20 degrees and 50 p.s.i.

-F-

F distribution: The F distribution is one of the common sampling distributions used

in statistics. Formally, a random variable with an F distribution can be found as the

ratio of two independent random variables with chi-square distributions.

F-statistic: A test statistic that follows the F distribution. Typically F-statistics arise

as the ratio of two values, each following the chi-square distribution. Common tests

based on F-statistics test whether the numerator and denominator of the ratio are

equal.

Factors: Experimental factors are the X’s or inputs being studied.

Factor Level: The Factor levels are the settings of the X’s being evaluated. For

example, if the factor temperature is being studied at 10 and 20 degrees, 10 and 20 are factor levels. If a qualitative factor is being evaluated such as shift, shift one, two

and three are the factor levels.

FMEA: A structured approach to

- Predict failures and prevent their occurrence in manufacturing and other

functional areas which generate defects



- Identify the way in which a process can fail to meet critical customer

requirements (Y).

- Estimate the severity, detection and occurrence of defects.

- Evaluate the current control plan for preventing these failures from

occurring

- Prioritize the actions that should be taken to improve the process.

First Time Yield (FTY): Number of non-defective units divided by the total number of

units sampled.

-G-

Gage R&R Study: A study to ensure that our measurement systems are

statistically sound.

Gage accuracy or Bias: Accuracy or bias is the difference between the observed average of the measurement and the reference value (Metrology lab reference).

Gage Stability: Gage stability refers to the difference in the average of at least two

sets of measurements obtained with the same gage on the same parts taken at different

times.

Gage Linearity: Gage linearity is the difference in the accuracy values throughout

the measurement range in which the gage is intended to be used.

Gage Repeatability: Gage repeatability is the variation in measurements obtained

with one measurement instrument used several times by one appraiser while

measuring the identical characteristic on the same part.

Gage Reproducibility: Gage Reproducibility is the variation in the average of the

measurements made by different appraisers using the same measuring instrument

when measuring the identical characteristic on the same part. Reproducibility can

also include the evaluation of several pieces of equipment using the same operator

on the same part.

-H-

Histogram:

A histogram displays a batch of numbers with adjacent, side-by-side bars. Each bar

corresponds to a range of data values. The size of each bar is proportional to the

number of data values falling in its range. Histograms differ from bar charts because

they graph quantitative variable values.



Hypothesis: A theory about the nature of relationships between input and output

variables.

-I-

Independent variable: Independent variables are also referred to as predictor variables and explanatory variables. In ANOVA, they are commonly called factors.

These may be better terms because they are less likely to be confused with the

probability-based concept of independence.

Inference: To infer is to reach a conclusion by reasoning from something known or

assumed. In statistics, inference usually refers to formal procedures for combining

assumptions about unknown population characteristics with observed facts about

data to reach a conclusion.

Interaction: An interaction exists when the main effect of a factor depends on the

level of another factor. For example, if a 10 unit change occurs when

pressure is varied from 50 to 100 psi at a temperature of 10 degrees, but a 1 unit

change occurs at a temperature of 20 degrees when the pressure is varied, an

interaction has occurred between temperature and pressure.
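The temperature and pressure example can be put into numbers. The response values below are hypothetical, chosen only so that the pressure effect is 10 units at 10 degrees and 1 unit at 20 degrees; halving the difference between the two effects is one common convention for a two-level design and is an assumption here.

```python
# Hypothetical responses keyed by (temperature in degrees, pressure in psi).
response = {
    (10, 50): 20.0,
    (10, 100): 30.0,
    (20, 50): 25.0,
    (20, 100): 26.0,
}

# Effect of raising pressure from 50 to 100 psi at each temperature.
effect_at_10 = response[(10, 100)] - response[(10, 50)]
effect_at_20 = response[(20, 100)] - response[(20, 50)]

# Main effect of pressure: its average effect over both temperatures.
main_effect_pressure = (effect_at_10 + effect_at_20) / 2

# Interaction: how much the pressure effect changes with temperature.
interaction = (effect_at_20 - effect_at_10) / 2

print(effect_at_10, effect_at_20, main_effect_pressure, interaction)
```

An interaction of zero would mean the pressure effect is the same at both temperatures.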

Interquartile Range: Difference between the 75th percentile and the 25th percentile.

-K-

Kurtosis: The kurtosis measures the "peakedness" of a histogram or density curve.

It is commonly adjusted so that the kurtosis of the normal density is zero. Flatter densities such as the uniform density have a kurtosis less than zero. More peaked

densities, such as those of the t distribution with 1 or 2 degrees of freedom have a

kurtosis greater than zero. You can think of the kurtosis as measuring how much the

spread changes as we move from the middle of the density to the tails. Densities

with large kurtosis give the impression of ever greater spread as we move to the

tails.

-L-

Linear regression: Linear regression is the most common form of regression. It

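predicts or describes a response variable in terms of a linear equation with one or more predictor variables. It therefore has an equation of the form y = b0 + b1*x1 + ... + bk*xk, where y is the response, x1 through xk are the predictors, and b0 through bk are coefficients.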
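For a single predictor, a least squares line can be fitted with a few lines of arithmetic. This is a minimal sketch with hypothetical data; the names `slope` and `intercept` are illustrative labels for the two coefficients of the line.

```python
from statistics import mean

# Hypothetical predictor (x) and response (y) values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = mean(x), mean(y)

# Least squares estimates for a line with one predictor.
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
intercept = y_bar - slope * x_bar

# Fitted values and residuals (observed minus fitted).
fitted = [intercept + slope * a for a in x]
residuals = [b - f for b, f in zip(y, fitted)]

print(f"y = {intercept:.2f} + {slope:.2f} * x")
print("residuals:", [round(e, 2) for e in residuals])
```

The residuals of a least squares line with an intercept always sum to zero, which is a quick check on the arithmetic.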



-M-

Main Effect: The change in average response (Y) observed during a change in the

factor (X) from one level to another.

Mean: Most common statistic used to describe the center of the data. The mean is also commonly referred to as the average. The mean is a parameter describing a

density curve. Intuitively, it describes the middle or center of the distribution.

Median: The median is the middle value of a collection of values. An equal number

of values in the collection are greater than and less than the median.

Mode: Observation that occurs most often in a data set. Mostly used for skewed

distributions.

Mistake Proofing: Process improvement efforts often falter during implementation

of new operating methods learned in the improvement phase. Without an overall

strategy and workable control tactics to guarantee permanency, sustainable

improvements cannot be achieved. Mistake Proofing seeks to gain permanency by

eliminating or rigidly controlling human intervention in a process. Mistake Proofing

uses wisdom and ingenuity to create devices that eliminate defects.

Multi-Vari: A graphical tool which, through logical sub-grouping, analyzes the

effects of discrete X’s on Y’s. The X’s must be discrete or made to be discrete in

order to use the multi-vari technique.

Multiple Linear Regression: Regression model in which two or more predictor

variables predict or describe a single response variable.

-N-

NP Chart: Number of defective units. Must have equal subgroup size.

Normal Curve: The normal curve is a bell-shaped, unimodal, symmetric shape

depicting the density curve of the normal distribution. Normal distributions are

characterized by their mean and standard deviation.

Normal Distribution: The normal distribution is the most common distribution

shape in statistics. It serves as a standard of comparison for other distribution

shapes. Each combination of mean and standard deviation generates a unique

normal curve.



Normality: The property of having a normal distribution. Normality is an assumption

required for many statistical tests. It is often best checked with a normal probability

plot.

Null Hypothesis (Ho): Statement of no change, no difference, or no relationship.

Refers to the true parameters (e.g.l,r,p) of a population. This statement is assumed true until sufficient evidence (sample data) is presented to reject it.

-O-

Observation: An individual measurement included in a sample.

One-sided: A hypothesis test in which only deviations from the null hypothesis in

one specified direction (typically thought of as to the right or to the left on the

density curve of the test statistic's sampling distribution) are considered. One-sided tests

(sometimes called one-tailed tests) typically have a smaller P-value for a given statistic value than the corresponding two-sided test, but the choice of a one-sided

test must be defended on substantive grounds prior to computing the statistic.

One-tailed: A hypothesis test in which only deviations from the null hypothesis in

one specified direction (typically thought of as to the right or to the left on the

density curve of the test statistic's sampling distribution) are considered. One-tailed tests

(sometimes called one-sided tests) typically have a smaller P-value for a given

statistic value than the corresponding two-tailed test, but the choice of a one-sided

test must be defended on substantive grounds prior to computing the statistic.

Orthogonal: A property that ensures all experimental factors are independent of each other. No correlation exists between X’s.

Outcome: The outcome of an event is the value measured, observed, or reported

for an individual instance of that event.

Outlier: An outlier is a value that deviates from the others in a dataset. Outliers

may be extraordinarily large or small, or differ by having some exceptional

combination of values on two or more variables. Ordinarily, it is a good idea to set

outlier values aside, analyze the data without them, and then consider why the

outlier value was so extraordinary.



-P-

P-value: In a significance test, a P-value is the probability, computed assuming the

null hypothesis is true, of observing a value for a test statistic at least as far from the

hypothesized value as the statistic value actually observed. A small P-value indicates

either that the observation is improbable or that the probability calculation was based on incorrect assumptions. The assumed truth of the null hypothesis is the

assumption under suspicion.

Pareto Chart: A representation of the relative importance of the process causes or

defects.

P Chart: Proportion of defective units. Can have unequal subgroup size.

Pie chart: A pie chart displays the distribution of a categorical variable by showing

the fraction of the cases falling within each of the categories. Since all the categories

account for 100% of the cases, a pie chart divides up a circle into wedges (pie slices)

that add to 100% of the circle. Many statisticians dislike pie charts because they

think it difficult to judge the size of a wedge. However, pie charts are widely used.

Population: All the statistics that exist today about a process plus what will exist in

the future. Usually we cannot afford to observe or measure the entire population, so we

select a representative sample.

Power: The sensitivity of a statistical test to detect a difference when there really is

one, or the probability of being correct in rejecting Ho. Power = 1- Beta.
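The relation Power = 1 - Beta can be worked through for a simple case. The sketch assumes a one-sided z-test with known standard deviation; `delta` (the true shift to be detected), `sigma`, `n`, and `alpha` are hypothetical planning values.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical planning values.
alpha = 0.05   # Type I error risk
sigma = 1.0    # assumed known population standard deviation
n = 25         # sample size
delta = 0.5    # true difference from the Ho value that we want to detect

std_normal = NormalDist()

# Critical value: Ho is rejected when the z statistic exceeds this.
z_crit = std_normal.inv_cdf(1 - alpha)

# Beta: probability of failing to reject Ho when the true shift is delta.
beta = std_normal.cdf(z_crit - delta * sqrt(n) / sigma)

# Power: probability of correctly rejecting Ho.
power = 1 - beta
print(f"beta = {beta:.4f}, power = {power:.4f}")
```

Increasing `n` or `delta`, or accepting a larger `alpha`, raises the power of the test.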

Practical Significance: used when the magnitude of change, difference, or

relationship is important to the researcher.

Precision: A confidence interval is precise if it is narrow. Precision refers to pinning

down an estimated value, but not necessarily to the accuracy of that value.

Probability: The probability of an outcome is a number between 0 and 1 that

reports the relative frequency of the outcome's occurrence in a long series of trials.

Process: A process generates data over time. Typically, we think of a process as

having no beginning or end, but rather generating a potentially infinite stream of

data. A process is different from a sample because process data do not represent an

unseen population. Chief concern in process analysis is whether the process is in

control.



Process Map: A graphical representation of how the process is actually performed.

All rework operations and non-value-added movements must be included.

-Q-

Qualitative: A word for categorical. Used most often in contrast to quantitative in describing a variable whose values name categories.

Quantitative: A quantitative variable takes on a continuous range of values. Its

units should reflect how it has been measured. Contrast with categorical variable.

Quartile: Quartiles are roughly the 25% and 75% points of a variable. That is,

about a quarter of the data values fall below the first (25%) quartile, and about a

quarter of the data values fall above the third (75%) quartile. The second quartile,

corresponding to the 50% point, is the median.

-R-

R-chart: An R-Chart is a control chart that displays the ranges of samples from a

process along with suitable control lines.

Random: A phenomenon is random if the values of individual outcomes of that

phenomenon are uncertain, but nonetheless a regular distribution of outcome values

can be seen over a large number of repetitions.

Random Sample: In a random sample individuals are selected from a population by chance in an attempt to reduce bias and make the sample representative of the

population.

Random Sampling: Random sampling occurs when a sample is drawn from a

population and every individual has an equal chance of being selected.

Random Variable: A random variable is a variable whose value is the outcome of a

random phenomenon. When a random phenomenon is described by a random

variable, the sample space is just the possible values of the random variable. A

discrete random variable takes on a finite number of possible values. A continuous

random variable can take on any value in a range of values.

Randomization: Randomization involves running the experimental runs in random

order. Randomization is a key component in ensuring that the experimental

assumptions are valid.



Range: Range is the difference between the largest observation and the smallest

observation in the data set.

Regression: A prediction equation, which allows the output to be predicted using

the input [Y= F(x)]. A technique for investigating and modeling the relationship

between input and output variables.

Residual: Difference between the observed or measured value and the fitted value

predicted by the equation.

Replicate: When performing an experiment, it is important to observe the

consequences of any treatment many times. A single observation may be unusual.

Replicating and then combining the many resulting observations reduces the

variance of the observed outcomes.

Replication: Repeating the entire experiment. All treatment combinations are repeated. Two benefits to replication:

- Provides an estimate of experimental error.

- Provides a better estimate of the variable or interaction effect.

Response: In experiments, we apply a treatment to individual subjects and observe

or measure their response. The response variable is thus the attribute of the subjects

that we anticipate will be changed by the treatment.

Response variable: The response variable is the variable whose values are

predicted or described in terms of the explanatory variables or factors. In an experiment, the response variable should be identified before the experiment is

performed to prevent invalid hunts for possible responses that may show a

significant difference. Surveys may yield several possible response variables. The

convention in statistics is to denote the response variable "y" and display it on the

vertical axis of a plot.

Rolled throughput yield (RTY): The probability of a unit traveling through all process

steps defect free.
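One common way to estimate this probability is to multiply the yields of the individual process steps. The sketch below assumes the steps are independent, and the step yields are hypothetical.

```python
from math import prod

# Hypothetical fraction of units passing each process step defect free.
step_yields = [0.98, 0.95, 0.99]

# Assuming independent steps, a unit must pass every one of them,
# so the step yields multiply.
rolled_throughput_yield = prod(step_yields)
print(f"RTY = {rolled_throughput_yield:.5f}")
```

Because the yields multiply, a long process can have a low RTY even when every individual step looks good.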



-S-

S-Chart: An S-Chart is a control chart that displays the standard deviations of

samples from a process along with suitable control lines.

Sample: A subset of the population taken over time or at a single sampling.

Sample size: The number of cases in the sample. Larger sample sizes permit more

reliable and more accurate estimation of population parameters, so the sample size

is an important consideration in designing studies. The sample size is denoted n by

common convention.

Sampling: Sampling is the act of drawing a sample or portion of a larger population

to represent that population so that observations or measurements on the sample

can provide information about the population. The simplest common method of

sampling is a simple random sample.

Scatter Plot: Scatter plots display the relationship between two variables. By

convention, the (vertical) y-axis displays the values of the variable that we hope to

predict or explain, and the (horizontal) x-axis displays the values of the variable on

which we will base our prediction or explanation. If neither variable is to be predicted

or explained by the other, then either can appear on either axis.

Sigma: A statistic used to quantify the variability in a process. It quantifies the

spread about the mean.

Sigma Level: A statistic used to describe the performance of a process to the

specification limits. The sigma level is the number of standard deviations from the

specification limits to the mean of the process.
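A minimal sketch of this calculation follows, taking the sigma level as the distance from the process mean to the nearest specification limit, measured in standard deviations. The process values and limits are hypothetical, and no long-term shift adjustment is applied.

```python
# Hypothetical process summary and specification limits.
process_mean = 10.0
process_sd = 0.5
lower_spec_limit = 8.0
upper_spec_limit = 11.5

# Distance from the mean to each limit, in standard deviations.
sigmas_to_upper = (upper_spec_limit - process_mean) / process_sd
sigmas_to_lower = (process_mean - lower_spec_limit) / process_sd

# The nearer limit governs the performance of the process.
sigma_level = min(sigmas_to_upper, sigmas_to_lower)
print(sigmas_to_lower, sigmas_to_upper, sigma_level)
```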

Shape: The shape of a density curve or histogram is usually described in terms of

whether it is symmetric or skewed, how many modes it has, and whether it has long

or short tails.

Significance Level: In significance testing, the significance level, commonly

denoted α and called the alpha level, is a probability that specifies how extraordinary

the observed result must be for us to reject the null hypothesis. Typical significance

levels are .10, .05, and .01, but any value may be used provided it is specified

before the data are examined. In practice, if we observe a statistic whose P-value

computed based on the null hypothesis is less than the pre-determined significance

level then we reject the null hypothesis. Because the significance level defines a

"rare event," and because rare events do happen sometimes, the significance level

determines the probability of a Type I error.



Significance Test: A formal inference method in which we assert a null hypothesis,

which specifies a value for a population parameter. We then assess the likelihood (if

the null hypothesis were true) of observing the statistic value we have indeed found.

A low probability leads us to doubt the null hypothesis.

(Also called a hypothesis test)

Simple Random Sample: A simple random sample (SRS) of size n is one in which

each possible sample of n individuals has an equal chance of selection. An alternative

(and equivalent) definition is that each individual in the population has an equal and

independent chance of being selected. The randomness applied deliberately in simple

random sampling is what allows the derivation of a sampling distribution for statistics

and thus what allows statistical inference.

Simple Random Sampling: A simple random sample (SRS) of size n is one in

which each possible sample of n individuals has an equal chance of selection. An

alternative (and equivalent) definition is that each individual in the population has an

equal and independent chance of being selected.

Standard Deviation: The square root of the variance. The standard deviation has

the same units as the original data.

SPC: Statistical Process Control. SPC is a statistically based graphing technique that

compares current process data to a set of stable control limits established from

normal process variation.

Skewed: A histogram or density curve is said to be skewed if one tail stretches

substantially farther from the center than the other. The skewness is said to be "to the right" or "to the left" according to which tail (right or left, respectively) is the

longer one. Shapes that are not skewed are said to be symmetric.

Skewness: A histogram or density curve is said to be skewed if one tail stretches

substantially farther from the center than the other. The skewness is said to be "to

the right" or "to the left" according to which tail (right or left, respectively) is the

longer one. Shapes that are not skewed are said to be symmetric.
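
The definition above is visual, but skewness is often summarized numerically. The sketch below uses the moment coefficient of skewness, which is one common measure and an assumption here, not something the glossary specifies. Positive values suggest a longer right tail, negative values a longer left tail.

```python
def skewness(data):
    """Moment coefficient of skewness: third central moment over (second central moment)**1.5."""
    n = len(data)
    center = sum(data) / n
    m2 = sum((x - center) ** 2 for x in data) / n
    m3 = sum((x - center) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))     # zero for perfectly symmetric data
print(skewness(right_skewed))  # positive: the right tail is longer
```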

Standardize: A variable is said to be standardized when it has been adjusted to

have a mean of zero and a standard deviation of one. Usually this is accomplished

by subtracting the sample mean and dividing by the sample standard deviation.

Standardized values are sometimes called z-scores.
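
A small sketch of standardizing a variable, using the sample mean and sample standard deviation as described. The data values and function name are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: subtract the sample mean, divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical measurements.
values = [12.0, 15.0, 11.0, 14.0, 18.0]
z_scores = standardize(values)
print([round(z, 3) for z in z_scores])
```

After standardizing, the z-scores have a mean of zero and a sample standard deviation of one, up to rounding error.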

Stratified sampling: A method of sampling in which the population is divided into

two or more strata and samples selected from each. Members of a stratum should

share characteristics. If we know the true population proportions of these

characteristics, we can arrange that the sample represent the population by having

the correct proportion of each stratum.
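
One way to arrange the correct proportions is proportional allocation, sketched below. The strata names, sizes, and function name are assumptions for illustration. Note that rounding can make the total differ slightly from the requested sample size when the proportions do not divide evenly.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding may shift the total by a unit or two.
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 100 units split 60/30/10 across three strata.
strata = {
    "A": [f"A{i}" for i in range(60)],
    "B": [f"B{i}" for i in range(30)],
    "C": [f"C{i}" for i in range(10)],
}
picked = stratified_sample(strata, total_n=20, seed=7)
print({name: len(units) for name, units in picked.items()})
```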



Statistical Significance: A term used when Ho is rejected, i.e., when the sample data are too

different from Ho to be reasonably attributed to chance (sampling error).

-T-

t distribution: The t distributions are a family of unimodal, symmetric density

curves that use an additional parameter, the degrees of freedom. The t distributions

are especially important because they describe the sampling distribution of many

statistics, and thus are useful for inference. For example, the sampling distribution of

the sample mean when it has been standardized by the observed standard deviation

follows a t-distribution on n-1 degrees of freedom. The t distribution with infinite

degrees of freedom is the normal distribution, and in practice any t distribution with

more than 100 degrees of freedom (or so) is not discernibly different from the

normal. t distributions commonly arise as the ratio of a variable that follows a Normal

distribution to the square root of an independent chi-square variable divided by its

degrees of freedom.
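
The standardized sample mean mentioned above can be sketched as follows. The sample values and the hypothesized mean are made up. The resulting statistic would be compared against a t distribution on n-1 degrees of freedom, which the Python standard library does not provide, so only the statistic is computed here.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, null_mean):
    """Sample mean standardized by the observed standard deviation, plus its degrees of freedom."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - null_mean) / standard_error
    return t, n - 1

# Hypothetical sample tested against a hypothesized mean of 5.0.
t, df = t_statistic([5.1, 4.9, 5.3, 5.0, 5.2], null_mean=5.0)
print(f"t = {t:.3f} on {df} degrees of freedom")
```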

Test Statistic: A standardized value (Z, t, F, etc.) calculated from sample data.

Used to calculate risk of decision error by comparison with a known distribution if Ho

is true.

* A hypothesis is an a priori theory relating to differences between variables.

A statistical test or hypothesis test is performed to assess whether the data support the theory.

* A Hypothesis test converts the practical problem into a statistical problem.

- Since relatively small sample sizes are used to estimate population

parameters there is always a chance of collecting a non-representative

sample.

- Inferential statistics allows us to estimate the probability of getting a non-

representative sample.

Time Series Chart: A graph that plots performance data over time for a process,

usually as a line chart.

Type I Error: The error of rejecting Ho when it is in fact true (e.g., saying there is a

difference when there really is not).

Type II Error: The error of failing to reject Ho when it is in fact false (e.g., saying

there is no difference when there really is).
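
The two error probabilities can be worked out for a simple case. The sketch below assumes an upper-tailed z-test with known standard deviation. The means, sigma, and sample size are hypothetical, and the Type II probability depends on the particular true mean assumed.

```python
from math import sqrt
from statistics import NormalDist

def error_rates(null_mean, true_mean, sigma, n, alpha=0.05):
    """Type I and Type II error probabilities for an upper-tailed z-test."""
    se = sigma / sqrt(n)
    # Reject Ho when the sample mean exceeds this cutoff.
    cutoff = NormalDist(null_mean, se).inv_cdf(1 - alpha)
    # Type I: rejecting Ho although the mean really is null_mean.
    type_i = 1 - NormalDist(null_mean, se).cdf(cutoff)
    # Type II: failing to reject Ho although the mean really is true_mean.
    type_ii = NormalDist(true_mean, se).cdf(cutoff)
    return type_i, type_ii

# Hypothetical scenario: Ho says the mean is 10.0, but it is actually 10.5.
type_i, type_ii = error_rates(null_mean=10.0, true_mean=10.5, sigma=1.0, n=25)
print(f"P(Type I) = {type_i:.3f}, P(Type II) = {type_ii:.3f}")
```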



-U-

U Chart: A control chart of the number of defects per unit. Subgroup sizes may be unequal.
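
A sketch of how limits are commonly computed for this chart, assuming the usual three-sigma limits around the overall defects-per-unit rate. The defect counts and subgroup sizes are invented. Because the limits depend on each subgroup's size, they differ from subgroup to subgroup.

```python
from math import sqrt

def u_chart_limits(defects, units):
    """Center line and per-subgroup three-sigma limits for defects per unit."""
    u_bar = sum(defects) / sum(units)  # overall defects per unit
    limits = []
    for n in units:
        half_width = 3 * sqrt(u_bar / n)
        # A defect rate cannot be negative, so the lower limit is floored at zero.
        limits.append((max(0.0, u_bar - half_width), u_bar + half_width))
    return u_bar, limits

# Hypothetical subgroups with unequal sizes.
defects = [4, 6, 5]
units = [10, 20, 20]
u_bar, limits = u_chart_limits(defects, units)
print(u_bar)
for (lcl, ucl), n in zip(limits, units):
    print(f"n = {n}: LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```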

Unimodal: A distribution shape is unimodal if it has a single mode. Distributions

with two modes are called bimodal. Because few statisticians can count above two,

distributions with three or more modes are called multimodal.

Uncontrolled or Special Cause Variation: Variation that occurs when an abnormal

action enters a process and produces unexpected and unpredictable results.

-V-

Variable: A variable is a collection of measurements or observations of the same

aspect about related individuals. The individuals are commonly called subjects or

cases. Variables may hold numbers, in which case they are called quantitative, or

they may hold names, in which case they are called categorical.

Variation: Data vary. Even attempts to reproduce an experiment or observation

exactly cannot reproduce the exact outcome. The observation-to-observation

differences are variation. Statistics studies ways to minimize, control, and

understand variation, and ways to draw intelligent conclusions from data despite

their variation.

Variable Gage R&R: A methodology to assess measurement error in continuous

measurement systems. Total observed variability includes both part and

measurement system variation. Measurement system analysis addresses error due

to accuracy and precision. Gage R&R focuses only on the precision component.

Variance: The variance is the square of the standard deviation. Although the

variance is a poor measure of spread (for example, because it is not in the same

units as the data), standard deviations are most often combined by squaring them

(to obtain variances) and then combining the variances.
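
For example, standard deviations from independent sources of variation are typically combined as below. Independence is an assumption of this sketch, and the numbers are illustrative.

```python
from math import sqrt

def combined_sd(*sds):
    """Combine standard deviations of independent sources by summing their variances."""
    total_variance = sum(sd ** 2 for sd in sds)
    return sqrt(total_variance)

# Two hypothetical independent sources with standard deviations 3 and 4.
print(combined_sd(3, 4))  # 5.0, not 7: the variances add, not the standard deviations
```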



-X-

Xi – MR Chart: Individuals and Moving Range Chart. Can be used for tracking both X

and Y.
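
A sketch of the limits usually drawn on this chart, assuming moving ranges of two consecutive points and the standard constants for that case (2.66 for the individuals chart, 3.267 for the moving range chart). The readings are hypothetical.

```python
def imr_limits(values):
    """Center lines and control limits for an individuals and moving range chart."""
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    x_bar = sum(values) / len(values)
    return {
        "x_center": x_bar,
        "x_lcl": x_bar - 2.66 * mr_bar,
        "x_ucl": x_bar + 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,
    }

# Hypothetical individual readings.
chart = imr_limits([10, 12, 11, 13, 12])
for name, value in chart.items():
    print(f"{name}: {value:.3f}")
```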

X-bar chart: An x-bar control chart plots the means of samples taken from a

process (usually at regular intervals), and compares them to a specified mean and

standard deviation.
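
The comparison can be sketched as follows, assuming three-sigma limits around the specified mean and a known process standard deviation. The subgroup data, target, and function name are hypothetical.

```python
from math import sqrt
from statistics import mean

def xbar_out_of_control(subgroups, target_mean, sigma):
    """Return indexes of subgroups whose mean falls outside the three-sigma limits."""
    flagged = []
    for index, subgroup in enumerate(subgroups):
        # The standard deviation of a subgroup mean shrinks with subgroup size.
        half_width = 3 * sigma / sqrt(len(subgroup))
        if abs(mean(subgroup) - target_mean) > half_width:
            flagged.append(index)
    return flagged

# Hypothetical subgroups of size 4; target mean 10, process sigma 1.
subgroups = [[10, 9, 11, 10], [12, 12, 13, 11], [10, 10, 9, 11]]
print(xbar_out_of_control(subgroups, target_mean=10, sigma=1))
```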

x-variable: "x-variable" is a colloquial term for the variable plotted on the x-axis in

a scatterplot or used as the predictor variable in a regression. It may also be called

the factor.

-Y-

y-variable: "y-variable" is a colloquial term for the variable plotted on the y-axis in

a scatterplot or used as the dependent variable in a regression. It may also be called

the response.

YX Diagram: A tool used to identify/ collate potential X’s and assess their relative

impact on multiple Y’s (all Y’s which are customer focused).

-Z-

Z-scores: Z-scores are values that have been standardized to have a mean of zero

and a standard deviation of one. They often appear with reference to the standard

normal distribution.
