
Statistics Digest
THE NEWSLETTER OF THE ASQ STATISTICS DIVISION
VOL. 36, NO. 3 | OCTOBER 2017 | asq.org/statistics

IN THIS ISSUE

Articles
MINI-PAPER
Basic Concepts of Statistical Process Control .......... 6

COLUMNS
Design of Experiments ................................. 17
Statistical Process Control ........................... 18
Stats 101 ............................................. 23
Hypothesis Testing .................................... 24
Testing and Evaluation ................................ 31
Standards InSide-Out .................................. 33

FEATURE
Not Significant, But Important? ....................... 35

Departments
Chair’s Message ........................................ 1
Editor’s Corner ........................................ 3
Upcoming Conference Calendar .......................... 38
Statistics Division Committee Roster 2017 ............. 39

Message From the Chair
Richard Herb McGrath

Well, this is my last note as chair of the Statistics Division. The Statistics Digest editor, Matt Barsalou, is happy because he will no longer have to remind me that my submission is late! I want to thank Matt for his patience and for the great job he does putting this publication together. I know you all enjoy the great content he assembles.

I also want to thank the leadership team. As past chair, Teri Utlaut has been very generous in helping me learn the ropes. Chair-Elect Steve Schuelka has leveraged his vast knowledge of ASQ and his contact list to guide us through this year. (Between Teri and Steve, there was not much left for the chair to do.) I am confident that Steve will do a great job leading the Division next year. Mindy Hotchkiss has kept the finances in order as Treasurer and has also served as our Fall Technical Conference representative. Jennifer Williams, our secretary, and our Vice Chairs Gary Gehring, Amy Ruiz, and Brian Sersion have all provided dedicated service to the Division. Many other volunteers help keep the Division running, and I want to thank all of them. We are still finalizing the leadership team for next year and should have that completed by our planning meeting this October.

The leadership team is here to help you make the most of your membership. One way ASQ headquarters judges divisions is on member value. The Statistics Digest is but one example of how we provide value to our membership. Adam Pintar has ensured we have a half dozen or so free webinars presented each year. We sponsor talks at the World Conference on Quality Improvement (WCQI). In conjunction with the Chemical Process Industry Division (CPID), we also hold a networking event at WCQI. Along with CPID and two sections of the American Statistical Association, we co-sponsor the Fall Technical Conference (FTC). The conference is in Philadelphia this year and, as always, we will be hosting the opening reception. We are also partial sponsors of the Audit Division’s conference and the Lean Six Sigma Conference. Next spring we will also be a sponsor of the Stu Hunter Conference. So if your work and travel schedules permit, please plan on attending a conference to expand your knowledge as well as your network.

Continued on page 3


The Statistics Division was formed in 1979 and today it consists of both statisticians and others who practice statistics as part of their profession. The division has a rich history, with many thought leaders in the field contributing their time to develop materials, serve as members of the leadership council, or both. Would you like to be a part of the Statistics Division’s continuing history? Feel free to contact chair@asqstatdiv.org for information or to see what opportunities are available. No statistical knowledge is required, but a passion for statistics is expected.

Vision
The ASQ Statistics Division promotes innovation and excellence in the application and evolution of statistics to improve quality and performance.

Mission
The ASQ Statistics Division supports members in fulfilling their professional needs and aspirations in the application of statistics and development of techniques to improve quality and performance.

Strategies
1. Address core educational needs of members

• Assess member needs
• Develop a “base-level knowledge of statistics” curriculum
• Promote statistical engineering
• Publish featured articles, special publications, and webinars

2. Build community and increase awareness by using diverse and effective communications

• Webinars
• Newsletters
• Body of Knowledge
• Web site
• Blog
• Social Media (LinkedIn)
• Conference presentations (Fall Technical Conference, WCQI, etc.)
• Short courses
• Mailings

3. Foster leadership opportunities throughout our membership and recognize leaders

• Advertise leadership opportunities/positions

• Invitations to participate in upcoming activities

• Student grants and scholarships

• Awards (e.g., Youden, Nelson, Hunter, and Bisgaard)

• Recruit, retain and advance members (e.g., Senior and Fellow status)

4. Establish and Leverage Alliances

• ASQ Sections and other Divisions

• Non-ASQ (e.g., ASA)
• CQE Certification
• Standards
• Outreach (professional and social)

Updated October 19, 2013


Disclaimer
The technical content of material published in the ASQ Statistics Division Newsletter may not have been refereed to the same extent as material published in Technometrics or J.Q.T. The objective of this newsletter is to be a forum for new ideas and to be open to differing points of view. The editor will strive to review all articles and to ask other statistics professionals to provide reviews of all content of this newsletter. We encourage readers with differing points of view to write to the editor; they will be given an opportunity to present their views via a letter to the editor. The views expressed in material published in this newsletter represent the views of the author of the material and may or may not represent the official views of the Statistics Division of ASQ.

Submission Guidelines
Mini-Paper
Interesting topics pertaining to the field of statistics; should be understandable by non-statisticians with some statistical knowledge. Length: 1,500–4,000 words.

Feature
Focus should be on a statistical concept; can either be of a practical nature or a topic that would be of interest to practitioners who apply statistics. Length: 1,000–3,000 words.

General Information
Authors should have a conceptual understanding of the topic and should be willing to answer questions relating to the article through the newsletter. Authors do not have to be members of the Statistics Division. Submissions may be made at any time to [email protected].

All articles will be reviewed. The editor reserves the discretionary right to determine which articles are published. Submissions should not be overly controversial. Confirmation of receipt will be provided within one week of receipt of the email. Authors will receive feedback within two months. Acceptance of an article does not imply any agreement that it will be published.


Message From the Chair Continued

The leadership team met in Newport, KY, just across the river from Cincinnati, this July for the first of our two planning meetings. The second will be held in Philadelphia immediately following the FTC. In that meeting we will finalize the business plan for next year, including possible new ways to deliver value to our membership. One new vehicle is the Technical Community I mentioned in this column previously. The Data Science and Analytics Technical Community will start this fall. If you are interested in joining the community or volunteering within the Division, please let me know!

Editor’s Corner
Matt Barsalou

Welcome to another issue of Statistics Digest! This issue’s Mini-Paper is Basic Concepts of Statistical Process Control by Joseph D. Conklin and Michael J. Mazu. The Mini-Paper is based on a webinar, which is available online at the ASQ Statistics Division YouTube page.

Douglas Montgomery and Bradley Jones discuss “Partial Replication of Small Factorial Designs” in their DoE column. The SPC column by Donald J. Wheeler informs us that “Outliers are Pure Gold,” and Jack B. ReVelle’s Stats 101 describes “Process Capability Indices (Cp and Cpk).” Hypothesis Testing by Jim Frost covers “Using Hypothesis Tests to Make Quality Decisions.” Laura Freeman’s Testing and Evaluation column lists the “Top Ten Guidelines for Testing Complex Systems.” In Standards InSide-Out, Mark Johnson tells us of the recent standards meeting in Cape Town.

We also have a letter to the editor from Necip Doganaksoy, Gerry Hahn, and Bill Meeker regarding Jim Frost’s column on hypothesis testing from the last issue, as well as Jim’s response.

This issue’s feature is “Not Significant, But Important?” by Julia E. Seaman and I. Elaine Allen.

With two letters, a feature, and a column related to hypothesis testing and p-values, I’d like to refer readers to the American Statistical Association’s statement on p-values in volume 70, number 2 of The American Statistician in 2016.

You may have noticed a new look with this issue. ASQ has decided that all Division content must be published in an approved ASQ newsletter template. If you would like to contribute to Statistics Digest or have an opinion on the new look, please feel free to contact me at [email protected].


Letters to the Editor

Comment on Jim Frost’s June ASQ Statistics Digest Article on Hypothesis Testing
We noted with much interest the column “Hypothesis Testing” by Jim Frost in the June 2017 ASQ Statistics Digest. This article is based on the premise “Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a sample.”

We do not share Jim’s enthusiasm for hypothesis testing (and significance testing) in real-world applications, in general, and in quality assurance, in particular. Indeed, in a Statistics Spotlight article, tentatively entitled “Fallacies of Statistical Significance” and scheduled to appear in the November 2017 issue of Quality Progress, we state that “Unfortunately, the practical implications of statistical significance often turn out to be very limited and are frequently misinterpreted and overstated, or even stated incorrectly.”

Our major justification for this claim is that

• If the sample size for the given data is large, a statistically significant result is likely to be obtained even when the actual magnitude of the effect is relatively small and of little or no practical importance.

• When the sample size for the given data is small, it is quite likely that “insufficient evidence” for a statistically significant result will be obtained—even though there is, indeed, an effect of practical importance.

To put it succinctly, practical significance depends on real-world considerations, dealing with such matters as customer satisfaction, management expectations, and cost. Statistical significance and p-values, on the other hand, are heavily impacted not only by the magnitude of the real-world effects (as they should be), but also by the size of the sample upon which the analysis is based—typically an extraneous factor in considering practical significance. Instead, we “urge practitioners to carry their analyses beyond significance tests—often by constructing appropriate confidence intervals.”

Respectfully yours,

Necip Doganaksoy, Gerry Hahn, and Bill Meeker

Jim Frost’s Response
I agree with many of the concerns that Doctors Doganaksoy, Hahn, and Meeker mention. Specifically, I agree that statistical significance is often misinterpreted and overstated. However, I respectfully disagree with their notion that these issues should limit the usage of hypothesis testing. Instead, I firmly believe in educating practitioners about the proper interpretations and how to avoid these problems. These tests are powerful tools that help you separate the signal of population-level effects from the noise in sample data. Like all powerful tools, they provide many benefits, but you need to know how to use them properly or you risk serious mistakes.

The column they refer to is simply the introduction to this new series. Many of the issues they mention are topics for future columns. However, elsewhere I have written at length about common misinterpretations of p-values, statistical significance, effect sizes, and practical versus statistical significance. While I have written widely about these issues, I recommend reading the following for a good overview: Five Tips to Avoid Being Fooled by False Positives and other Misleading Hypothesis Test Results.


My goal is always to educate in order to prevent problems. Knowledge is key!

There is one statistical point where I disagree. They refer to situations where small sample sizes produce p-values that are not statistically significant. Their suggestion is that while the test result is not significant, the effect exists in the population. On the contrary, I view the insignificant p-values for this condition to be an extremely valuable warning.

Small samples produce erratic and imprecise estimates. Consequently, sample size is a relevant factor in practical significance. While an observed effect may appear large in a small sample, the insignificant p-value indicates that you can’t trust the sample estimate of this effect. You do not want to base an important decision on an untrustworthy estimate!

The insignificant p-value warns you that you need a larger sample size to learn the truth about your process. Indeed, if you were to perform the study again, it would not be surprising to obtain results that are entirely different. If the effect does not exist in the population, you will not obtain the benefits that you expect based on the small sample.

While I am a strong believer in confidence intervals, they will not help you in this situation because the intervals will be wide (indicating imprecision) and will contain zero (no effect).

Respectfully,

Jim Frost

ASQ Statistics Division Travel Grants for Students and Early Careerists
Every year, the ASQ Statistics Division awards Travel Grants for Students and Early Careerists to attend the Fall Technical Conference (FTC). The FTC has a big focus on statistics and is held annually in early October. This year we awarded grants to eight recipients.

We had a stupendous collection of Student Travel Grant Recipients at the 2017 FTC.

Students:
Carly Metcalfe
Mona Khoddam
Katherine Allen
Habe Melnkov
David Cole

A great group of Early Career Travel Grant Recipients joined us:

Early Career:
Santhosh Subramanian
Hongyue Sun
Arlyn Nieto


Basic Concepts of Statistical Process Control
Joseph D. Conklin, Applied Statistician
Michael J. Mazu, Principal Consultant, MJM Associates

Introduction
The American Society for Quality (ASQ) offers professional certifications in nearly all aspects of the quality profession. Each certification has its own body of knowledge (BoK). The first ASQ certification was introduced in 1968—Certified Quality Engineer.

The body of knowledge for the CQE exam stipulates that a quality engineer should have an in-depth knowledge of management and leadership in quality engineering; quality systems development; product, process, and service design; product and process control; continuous improvement; quantitative methods and tools; and risk management. Under quantitative methods and tools there is a subcategory titled Statistical Process Control (SPC). The knowledge requirements for SPC include:

• objectives and benefits
• common and special causes
• selection of variable
• rational subgrouping
• control charts and analysis

Definition and Benefits
Statistical process control, or SPC for short, is a quality control and improvement strategy. SPC is

• prevention oriented,
• data driven,
• graphic centered,
• operator run, and
• used to signal the need for corrective action.

What separates SPC from other important and valuable quality-related data efforts is

• a greater emphasis on the role of process operators,
• more real-time, on-the-spot analysis, and
• a greater emphasis on preventing problems versus explaining why they occurred.


A helpful way to understand SPC is to look at the definitions of statistics, process, and control.

• Statistics is the art and science of collecting, analyzing, and interpreting data.
• A process is the combination of people, equipment, material, procedures, environmental conditions, and measurement that combine to produce an item or a service.
• Control is a method for achieving conformance to requirements, of reducing the difference between what is expected and what is obtained.

When SPC is implemented well, some of its benefits are

• reduced defects and nonconformances,
• a better ability to match production to the right process,
• an improved understanding of the implications of process changes, and
• an increased sense of employee involvement and ownership.

Challenge and Types of Variation
SPC exists because variation exists. Variation is the tendency of outputs to differ in their characteristics even if consecutive units are produced under the same or similar conditions. No two examples of an item or a service, even those produced back to back, are ever exactly alike in all respects. A key assumption of SPC is that variation is present in all processes.

Variation occurs in more than one form. Every process has a minimum amount of variation as a consequence of its design. This minimum amount is unavoidable and is the result of many small causes that combine in random ways. Identifying individual causes in this context is neither practical nor economical. This is inherent or natural variation.

Variation can also reflect causes that are not built into the process design. They show themselves as interruptions or disruptions. These causes tend to be few in number, large in effect, and economically worthwhile to identify and to remove or reduce. Call this special or assignable variation.

SPC looks for evidence of assignable variation so it can be removed or reduced. As a popular example of the distinction between inherent and assignable variation, consider driving a car. Imagine a daily commute to work or a trip to some store you visit regularly.

If you were to time consecutive trips, you would find they do not all take exactly the same length of time. The times vary slightly. What is going on is the combination of

• how well all the parts of the engine work together,
• weather conditions,
• the number of other vehicles on the road,
• road conditions, and
• many other causes you could name if you stopped to list them.


These many causes vary in their effect and how they combine each time, leading to different travel times—inherent variation in action.

It is easy to imagine a selective set of causes, such as an accident, a flat tire, or a dead battery, that increase your travel time well beyond normal or perhaps cancel a trip altogether—assignable variation in action.

We cannot drive a car without dealing with engines, weather, traffic, and road conditions—they are built into the process. Through strategies such as defensive driving and preventive maintenance, we can keep accidents and breakdowns from being a regular part of reaching our destination. We can reduce or remove the assignable variation.

So where can variation occur in a process? The answer is anywhere. An especially important place it can occur is in measurement systems. A successful SPC project requires a stable and precise measuring system. One reason why SPC projects sometimes fail is an inadequate measuring system. The techniques of gage R&R studies and measurement assurance, when implemented well, help ensure SPC projects have the measuring systems they require.

Patterns of Assignable Variation
How does assignable variation show up in a process? How can we tell it is there? There are three classic patterns we will discuss now. The easiest and fastest way to see assignable variation is to plot process data in time order.

One classic pattern is the sudden shift.

Figure 1: Assignable Variation—Sudden Shift

In this pattern, the process moves at one average level and then in a very short time moves to a different average level. The process appears to jump.

A second classic pattern is a gradual drift.


The process moves from one average level to another, but the change is a gradual drift over time.

The third classic pattern is cyclical. The process alternates from low to high to low in a periodic fashion over time.

Figure 2: Assignable Variation—Gradual Drift

Figure 3: Assignable Variation—Cyclic Pattern

Meet the Control Chart
Speaking of plots, here is one you want to know—a control chart. A control chart draws a picture of what is going on in a process. It shows the variation profile. On the horizontal axis, there are different time periods. Control charts are concerned with how a process runs in time order.

On the vertical axis are sample results that depend on the type of chart. The center line on a control chart shows the process average or expected value. The area bounded by the upper and lower control limits is where we find the zone of inherent variation.


Control charts have several purposes. Three big ones are

• signaling the need for corrective action,
• maintaining process stability, and
• establishing process specifications.

In and Out of Statistical Control
With a control chart, one of two basic conclusions applies at any given time. Either the process is in statistical control, or it is out of statistical control. In statistical control is the place we want to be. The evidence of statistical control is indicated on a control chart by random movement between the control limits, the majority of the points being near the center line, and a few points being near the control limits.

A control chart is designed so it is possible for a very small percentage, approximately 0.27%, of points to be outside the control limits when the process is in statistical control, but in practice, a useful rule is to say no points should be outside the limits.
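The 0.27% figure is the normal-theory tail probability beyond three-sigma limits: about 99.73% of a normal distribution falls within three standard deviations of its mean. A minimal sketch of that arithmetic in Python (scipy is my choice here; the article names no software):

from scipy.stats import norm

# Probability that a point from a normal, in-control process falls
# outside three-sigma control limits.
p_outside = 2 * norm.sf(3)  # sf(3) = P(Z > 3); doubled for both tails
print(f"{p_outside:.4%}")   # approximately 0.27%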

When the process is in a state of statistical control, the only way to improve performance is to redesign the process for the purpose of reducing the inherent variation. This is a management decision and not an operator decision, although operators should be involved in improving the process if that is necessary.

Intervening in the process when it is in statistical control, if these interventions are not part of a long term redesign, is likely to make matters worse. When a process is in statistical control, the outputs will be stable and predictable.

Figure 4: Example of Control Chart


This does not mean they will all be exactly alike in all respects. It means histograms of the output values will be similar from hour to hour, day to day, week to week, or over whatever time frame makes for sensible comparisons.

Figure 5: Example of Control Chart for Process in Statistical Control

Figure 6: Example of Control Chart for Process Out of Statistical Control

A process is out of statistical control when points plot outside the limits, the points hug closely to the center line, the points hug closely to one of the control limits, or there is some nonrandom pattern such as a sudden shift, gradual drift, or cycle.


There are rules you can use to tell if a process is out of statistical control. A run rule says if nine or more consecutive points are on the same side of the center line, then the process is out of statistical control. A trend rule says if six or more consecutive points move either gradually up or gradually down, then the process is out of statistical control.
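As a concrete illustration, here is a small sketch of those two rules in Python. The nine-point and six-point thresholds come from the text; the function names and the convention that six points in a monotone run constitute a trend are my own:

def run_rule(points, center, k=9):
    # Flag nine or more consecutive points on the same side of the center line.
    streak, last_side = 0, 0
    for x in points:
        side = 1 if x > center else -1 if x < center else 0
        streak = streak + 1 if side == last_side and side != 0 else 1
        last_side = side
        if streak >= k:
            return True
    return False

def trend_rule(points, k=6):
    # Flag six or more consecutive points moving steadily up or steadily down.
    up = down = 1
    for prev, curr in zip(points, points[1:]):
        up = up + 1 if curr > prev else 1
        down = down + 1 if curr < prev else 1
        if up >= k or down >= k:
            return True
    return False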

For simplicity, the immediately preceding discussion assumes the process has a fixed, constant target. Not all processes are like this. Some processes involve tools that wear gradually with time. These processes have a trend built into the basic design. Some pieces of measurement equipment have small but distinct cyclic variations in response to changes in temperature and humidity.

If some nonrandom pattern is part of the process design, the preceding discussion still applies, but the analysis changes. In these cases, we take the process values we observe and analyze their differences from the values we expect given current conditions. This is like the concept in statistical modeling of fitting an equation and analyzing the residuals.
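For instance, for a tool-wear process whose design includes a linear drift, one might chart the differences between the observed values and a fitted trend line. A sketch under that assumption (numpy’s polyfit stands in for whatever model actually fits the process):

import numpy as np

def trend_residuals(values):
    # Fit a straight line to the time-ordered values and return the
    # residuals; these, rather than the raw values, go on the chart
    # when a linear drift is part of the process design.
    x = np.asarray(values, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)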

Being in statistical control means the process is stable and predictable. It does not necessarily mean the output is good. It is possible for a process to stably and predictably produce bad output.

What being in statistical control allows is knowledge of what type of improvement makes sense—long term redesign, or a less radical, on the spot, corrective action by the operator. Also, if a process must be improved, starting with one that is stable and predictable makes it easier to tell whether a proposed change will make a difference.

Control Limits versus Specification Limits—the Difference
It is easy to confuse control limits with specification limits. It is worth some time to make sure we are clear on the difference. You can think of control limits as the voice of the process. They tell you what to expect from the process given the current design.

Figure 7: Four States of a Process

The specification limits are the voice of the customer. They are what the customer wants. The process does not know what the customer wants. Management has to create and support an environment where the right processes are matched to the right requirements. There are four possible states for a process.

The ideal state is for the process to be in statistical control and in specification. The customer is happy and we as the supplier are meeting requirements in an efficient and consistent manner. If our process is in statistical control but out of specification, the customer is not happy. As a supplier, we stand the best chance of making things better because we start with a stable process and can figure out most quickly what improvements are the best.


If our process is out of control and in specification, the customer is happy but as a supplier, we are worried. The process is inefficient and may require extensive babysitting and handholding. The atmosphere is one of frustration.

If our process is out of control and out of specification, nobody is happy. As a supplier, we are not sure what is happening next, we are not sure which changes are the best improvements, and the atmosphere is one of panic and chaos. Of the four states, this last one demands the most immediate attention. The moral here is we need to care about both internal and external customer requirements for the greatest chance of success.

Variables Control Charts
There are two large and useful categories of control charts—variables and attributes. They are named for the type of data plotted on the charts. Variables data means measurements—of time; of dimensions; or of physical, electrical, or chemical properties. Variables data may have fractional values.

Variables control charts are used in pairs. One chart of a pair tracks the average performance of the process. The other chart of a pair tracks the variation of the process. There are four common pairs of variables control charts: X-bar and range, X-bar and standard deviation, median and range, and individuals and moving range.

X-bar, median, and individuals charts track the process average. Range, standard deviation, and moving range charts track the process variation.

• X-bar charts plot the arithmetic average of a sample of units from the process.
• Median charts plot the median of a sample of units.
• Individuals charts plot the measurement of a single process item.

Individuals charts excel when it is not practical or economical to measure a sample of several items at one time.

For variation charts,

• range charts plot the range of sample units,
• standard deviation charts plot the standard deviation of sample units, and
• moving range charts plot the absolute difference between consecutive samples.

Figure 8: Four Customary Pairs of Variables Control Charts


Attributes Control Charts
When you hear attributes data, think counting. Attribute data in SPC applications is a count of the number of faults (defects) or a count of the number of units with at least one fault (defectives). A defective may have more than one defect. This type of data is counted, not measured. Only whole numbers make sense, not fractional values.

There are four common types of attributes control charts.

With attributes control charts, a single chart tracks both process average and variation. A percent defective, or p chart, plots the percent or fraction defectives in a sample.

If a sample contains 100 units and ten of them are defective, the p chart will plot 10% or 0.10. A number of defectives, or np chart, will plot 10. A number of defects, or c chart, plots the number of defects in the sample. If the sample consists of ten units, and each unit has exactly two defects, the c chart will show 20. A defects per unit, or u chart, will show 2.0. To ensure comparability between time periods, np and c charts require a constant sample size; p and u charts may have varying sample sizes across time periods.
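The plotted statistics for all four chart types reduce to simple arithmetic. A minimal sketch reproducing the worked example above (the function names are mine):

def p_point(defectives, sample_size):
    # p chart: fraction defective in the sample
    return defectives / sample_size

def np_point(defectives):
    # np chart: count of defectives (constant sample size required)
    return defectives

def c_point(defects):
    # c chart: count of defects (constant sample size required)
    return defects

def u_point(defects, sample_size):
    # u chart: defects per unit
    return defects / sample_size

print(p_point(10, 100))  # 0.10, as in the text
print(np_point(10))      # 10
print(c_point(20))       # 20 defects across ten units
print(u_point(20, 10))   # 2.0 defects per unit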

Comparing Variables and Attributes Control Charts

Figure 9: Four Customary Attributes Control Charts

Figure 10: Comparing Variables and Attributes Control Chart

To collect the data for attribute control charts, some defects have to have already occurred. This means that compared to variables control charts, attributes control charts are not as quick to react to assignable variation.

Another important distinction between variables and attributes control charts is that variables data contain more information per sample than attributes data. Therefore, sample sizes for variables data can be smaller by comparison.

Since it is usually easier to count something than to measure it, variables control charts tend to be more complex to set up and manage. Variables control charts are ideal for applications involving critical product characteristics demanding the fastest reaction to evidence of assignable variation.


Attributes control charts are ideal for applications involving several equally important characteristics that are most easily managed and summarized as a group. This tends to be the case for applications involving business processes such as accounting, inventory, or training.

Designing and Constructing Control Charts
Control charts require sound design and construction. Control charts track critical or important quality characteristics where the operators can take action to respond to unwanted variation.

Before implementing a control chart, the measurement system should be checked to ensure it provides precise data. Here is where the concepts of gage R&R studies play a vital role. The sampling method for process data has to match the type of chart, and samples should be taken according to a standardized procedure.

The sampling frequency depends on how long a period of time provides an opportunity for assignable variation to occur. The shorter this period, the more frequently samples should be taken. Each item in a sample should be collected over a short enough period of time that each one reflects the same or very similar process conditions. In SPC, this is referred to as the concept of a rational subgroup.

Most importantly, the operators need a reaction plan telling them what to do if the control chart indicates the process is out of statistical control. We will say more about reaction plans in a few moments.

To implement a control chart, we need an initial set of data from the process, over a time period where the important sources of variation have a chance to operate. We take our measurements or counts, calculate the values to be plotted on the chart, and plot them. We should also document any special or unusual events observed during data collection.

Once we have the data recorded, we calculate the center line and control limits. Then we evaluate the chart for statistical control. If we are implementing a variables control chart, the chart that tracks variation is evaluated first. After removing or reducing causes of assignable variation there, we next tackle the chart that tracks location. You will find formulas for calculating control limits in any basic SPC book.
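As one concrete example, the limits for an X-bar and range chart come from tabled constants that depend on the subgroup size. A sketch using the standard Shewhart constants for subgroups of size two through five (basic SPC books tabulate more sizes):

# Standard Shewhart constants, indexed by subgroup size n.
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}
D3 = {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0}
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114}

def xbar_r_limits(grand_mean, avg_range, n):
    # Three-sigma limits for the X-bar chart and the R chart.
    x_limits = (grand_mean - A2[n] * avg_range,
                grand_mean + A2[n] * avg_range)
    r_limits = (D3[n] * avg_range, D4[n] * avg_range)
    return x_limits, r_limits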

If the process shows statistical control after the initial data collection, it and a reaction plan can be turned over to the operators. If the process is out of statistical control, we need to find the causes of the assignable variation, remove or reduce them, and start over with a new data collection.

Control limits, once set, are not forever. When the process is redesigned or improved, new control limits are necessary. If the quality characteristic becomes more or less important, the type of chart may change, and this will change the control limits. If the customer requirements change, it may also be necessary to change the control limits. In all these cases, the design and construction phase starts over.

Control Chart Reaction Plans
Where do we find sources for reaction plans to go with our control charts? The short answer is almost anywhere: customer or vendor documentation, industry standards or best practices, benchmarking of other plants or divisions in the company, the lessons of cross-functional process improvement projects, process operators, process capability studies, or designed experiments are some possibilities.

Reaction plans may be simple—as in calling the supervisor, maintenance, or engineering—or more complex—as in checking equipment settings, verifying tooling, or recalibrating.


Figure 11: Sources of Information for Control Chart Reaction Plans

Figure 12: Sample Control Chart Reaction Plans

Summing Up
The 1996 edition of the Glossary and Tables for Statistical Quality Control, published by Quality Press, defines statistical thinking as follows:

Statistical thinking is the manner in which information is viewed, processed and converted into action steps. Statistical thinking is a philosophy of learning and action based on the following fundamental principles:

• All work occurs in a system of interconnected processes,
• Variation exists in all processes, and
• Understanding and reducing variation are keys to success.

How does SPC fit into the general picture of quality improvement? A useful distinction is between SPC principles and SPC tools. SPC principles—customer focus, fact based decisions, an emphasis on prevention, and the importance of employee ownership and involvement—are also important general principles in the field of quality and are always in season.

Control charts have great power, flexibility, and widespread applicability. They can be an important part of solving a quality problem, but they may not be the entire solution. When confronting a quality challenge, the right question to ask is: what tools should be on our team, and how can control charts work together with these other tools for an optimal solution?

About the Authors
Joseph D. Conklin is an applied statistician in Washington, D.C. He earned a master’s degree in statistics from Virginia Tech and is a Senior Member of ASQ. Conklin is also an ASQ certified quality manager, quality engineer, quality auditor, reliability engineer, and Six Sigma black belt.

Michael J. Mazu is the principal consultant at MJM Associates in Newburgh, IN. He earned a master’s degree in statistics at the University of Akron in Ohio. He is an ASQ fellow and a former ASQ regional director.

Reference
Glossary and Tables for Statistical Quality Control, 3rd edition, by ASQ Statistics Division (ASQ Quality Press, 1996).


Design of Experiments
Partial Replication of Small Factorial Designs

Bradley Jones, PhD, JMP Division of SAS
Douglas Montgomery, Arizona State University

The 2^k series of designs and their fractions are among the most widely used experimental designs. The 2^2, 2^3, and 2^4 designs with 4, 8, and 16 runs respectively are popular because they employ all possible combinations of the levels of the factors. This allows for fitting models having an intercept, all the main effects, and two-way up to k-way interactions, where k is the number of factors. Frequently, practitioners limit consideration to either the main effects model or the model containing main effects and two-factor interactions (2FIs). This allows for using higher order interaction effects as lack-of-fit degrees of freedom for estimating the error variance. There are two problems with this approach for small full-factorial designs. First, there are too few higher order effects to provide a precise estimate of the error variance. Second, if one or more of the higher order effects used to estimate the error variance is not negligible, then the resulting variance estimate will be too large.

Replication is typically used to produce a model-independent estimate of error (sometimes called pure error). In addition to providing an unbiased estimate of error, replication can improve the power of the statistical tests to detect active effects. However, fully replicating all of the runs in the basic design is often expensive, so strategies based on replicating only some runs are worth exploring.

For example, consider the 2^4 factorial design. If only the basic 16-run design is run and the experimenter is interested in all four main effects and the six two-factor interactions, the power to detect effect sizes of two standard deviations is 0.887, which is acceptable in most cases. Suppose that the experimenter wants to add some degrees of freedom to estimate pure error. If every run in the design is performed twice, the power for detecting effects of size two standard deviations is essentially 1. This provides 16 degrees of freedom for pure error, but it doubles the cost of the experiment. We now explore what can be done with fewer runs.
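Those power figures can be approximated with a noncentral-t calculation. This sketch rests on assumptions of my own: ±1 factor coding, so an effect of two standard deviations corresponds to a regression coefficient of one sigma; a near-orthogonal design, so the coefficient’s standard error is sigma divided by the square root of the run count; and a two-sided test at alpha = 0.05:

from scipy import stats

def two_sigma_effect_power(n_runs, n_params, alpha=0.05):
    # Power to detect a coefficient of 1*sigma (an effect of 2*sigma in
    # +/-1 coding) with n_runs - n_params error degrees of freedom.
    df_error = n_runs - n_params
    ncp = n_runs ** 0.5  # (1*sigma) / (sigma / sqrt(n_runs))
    t_crit = stats.t.ppf(1 - alpha / 2, df_error)
    return (1 - stats.nct.cdf(t_crit, df_error, ncp)
            + stats.nct.cdf(-t_crit, df_error, ncp))

# 16-run 2^4 design; intercept + 4 main effects + 6 2FIs = 11 parameters.
print(two_sigma_effect_power(16, 11))  # should land near the 0.887 quoted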

Suppose that we can afford to augment this design with four runs. Many experimenters would immediately think of adding four center runs to the basic design. This would produce three degrees of freedom for pure error, but it would not increase the precision with which factor effects are estimated. Furthermore, if all or many of the factors are categorical, adding center runs will either not work or be awkward to implement. Replicating four of the original factorial design points would produce four degrees of freedom for pure error and will improve the precision of effect estimates. This strategy can also be used regardless of whether the factors are continuous or categorical. The question is, which four runs should be replicated?

There are C(16,4) = 1820 ways to replicate four of the 16 runs in the full-factorial design. The D-efficiency of the resulting design ranges from 96.2 to 96.7. The A-efficiency ranges from 93.19 to 93.8. Sixteen of the designs have both the maximum D- and A-efficiency as well as the same relative standard errors for all the main effects and 2FIs (0.231). We think that having the same relative standard errors for all effects of the same type is important, as it does not show favoritism to any particular factor. One of these sixteen designs is shown in Table 1, where replicated pairs of rows are marked with asterisks.


Run   X1   X2   X3   X4
*1    –1   –1   –1   –1
*2    –1   –1   –1   –1
 3    –1   –1   –1    1
 4    –1   –1    1   –1
*5    –1   –1    1    1
*6    –1   –1    1    1
 7    –1    1   –1   –1
*8    –1    1   –1    1
*9    –1    1   –1    1
10    –1    1    1   –1
11    –1    1    1    1
12     1   –1   –1   –1
*13    1   –1   –1    1
*14    1   –1   –1    1
15     1   –1    1   –1
16     1   –1    1    1
17     1    1   –1   –1
18     1    1   –1    1
19     1    1    1   –1
20     1    1    1    1

Table 1. The 2^4 Design with Four Replicated Runs


Note that this design is not balanced; the first three columns (X1–X3) have 11 runs at the low level (–) and only 9 runs at the high level (+). The last column has 9 runs at the low level and 11 runs at the high level. Consequently, the design is not orthogonal. However, the absolute correlation for all pairs of main effect columns is only 0.0101. For a model consisting of only main effects the variance inflation factors are 1.0003, so the nonorthogonality of the main effects has no practical consequence. Every main effect has absolute correlations with all the two-factor interactions of 0.1005. The correlation between 12 of the 15 pairs of two-factor interactions is zero. The other three pairs have absolute correlations of 0.2.
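Those diagnostics are easy to verify. This sketch rebuilds the Table 1 design with numpy and computes a main-effect correlation and the variance inflation factors; the list of replicated runs follows the asterisked rows above:

import numpy as np

# 16-run 2^4 full factorial in +/-1 coding, plus the four replicated
# runs marked with asterisks in Table 1.
base = np.array([[x1, x2, x3, x4]
                 for x1 in (-1, 1) for x2 in (-1, 1)
                 for x3 in (-1, 1) for x4 in (-1, 1)])
replicates = np.array([[-1, -1, -1, -1],
                       [-1, -1,  1,  1],
                       [-1,  1, -1,  1],
                       [ 1, -1, -1,  1]])
X = np.vstack([base, replicates])  # 20 runs, 4 main-effect columns

# Absolute correlation between a pair of main-effect columns:
corr = np.corrcoef(X, rowvar=False)
print(abs(corr[0, 1]))  # approximately 0.0101

# Variance inflation factors for the main-effects model with intercept:
M = np.column_stack([np.ones(len(X)), X])
XtX_inv = np.linalg.inv(M.T @ M)
vif = np.diag(XtX_inv)[1:] * len(X) * X.var(axis=0)
print(vif)  # approximately 1.0003 for each main effect

Looping the same matrix computations over all C(16,4) = 1820 choices of replicated runs should reproduce the D- and A-efficiency ranges quoted above.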

The additional four runs have increased the power for detecting effects of size two standard deviations from 0.887 to 0.97. For detecting effect sizes of one standard deviation, the power is approximately 0.49.

For additional information about partial replication strategies for factorial experiments, see Jones and Montgomery (2017). These authors conclude that, so long as the basic factorial design has at least 12 runs, an augmentation of four runs should be adequate to provide a reasonable number of pure error degrees of freedom, improve the precision of effect estimates, and substantially increase the power of statistical tests.

ReferenceJones, B. and Montgomery, D.C. (2017), “Partial replication of small two-level factorial designs”, Quality Engineering, Vol. 29, No. 3, pp. 190–195.

Statistical Process Control
Outliers are Pure Gold

Donald J. Wheeler, Statistical Process Controls, Inc.

Much of modern statistics is concerned with creating models which contain parameters that need to be estimated. In many cases these estimates can be severely affected by unusual or extreme values in the data. For this reason students are often taught to polish up the data by removing the outliers. While this might improve your estimates, you could well be throwing away the most interesting part of your data. To illustrate this we shall look at the difference between estimating parameters and characterizing process behavior.

Estimation
To illustrate how polishing the data can improve our estimates we will use the data of Table 1. These values are 100 determinations of the weight of a ten-gram chrome steel standard known as NB10.

Table 1: NB10 Values for Weeks 1 to 100


These values were obtained once each week at the Bureau of Standards, by one of two individuals, using the same instrument each time. The weights were recorded to the nearest microgram. Since each value has the form 9,999,xxx micrograms, the four nines at the start of each value are not shown in the table—only the last three digits, in the xxx positions, are recorded. The values are in time order by column.

If we compute the usual descriptive statistics we find that the average of the tabled values is 595.4 micrograms and their standard deviation statistic is 6.47 micrograms. Using these two values to define a normal distribution we would end up with the curve shown superimposed upon the histogram in Figure 1. Both the area under the curve and the area of the histogram are the same. Yet the curve does not really match up with the histogram. It is too heavy in the regions around 585 and 605, and not high enough near 595.

Figure 1: Histogram and Normal Curve for NB10 Values

The outliers in the histogram create the mismatch between the fitted model and the data. Seven values look like outliers in Figure 1. If we delete the four values below 586 and the three values above 606, and recompute our descriptive statistics, we find the revised histogram has an average of 595.6 micrograms and a standard deviation statistic of 3.74 micrograms. Using these two values to define a normal distribution we end up with the curve shown in Figure 2. Now we have a much better fit between our model and the histogram.
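A minimal sketch of that before-and-after computation. The 100 NB10 values of Table 1 are not reproduced here, so the data file is a placeholder; the 586 and 606 cutoffs come from the text:

import numpy as np

nb10 = np.loadtxt("nb10.txt")  # the 100 Table 1 values (placeholder file)

def describe(values):
    # Average and standard deviation statistic (ddof=1 for sample data).
    return values.mean(), values.std(ddof=1)

print(describe(nb10))                       # about (595.4, 6.47)
kept = nb10[(nb10 >= 586) & (nb10 <= 606)]  # drop the seven outliers
print(describe(kept))                       # about (595.6, 3.74)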

The whole operation of deleting outliers to obtain a better fit between the model and the data is based upon computations which implicitly assume that the data are homogeneous. However, when you have outliers, this assumption becomes questionable. If the data are homogeneous, where did the outliers come from? Thus, whether the data are homogeneous or not must be the primary question for any type of analysis. While this is the one question we do not address in our statistics classes, it is precisely the question considered by the process behavior chart.

Figure 2: Histogram and Normal Curve for Revised NB10 Values


The Characterization of Process Behavior
What about the seven values we simply deleted in order to obtain the better fit between our assumed model and our revised data set? What were these values trying to tell us about this process? Here the question is not one of estimation, but rather one of using the data to characterize the underlying process represented by the data.

Figure 3 contains the XmR Chart for the 100 values of Figure 1. The limits are based upon the Median Moving Range of 4.0 micrograms. Here we have clear evidence of at least three upsets or changes in the process of weighing NB10. Five of the seven outliers that we deleted in order to fit the model in Figure 2 are signals that reveal that this set of values is not homogeneous. This lack of homogeneity undermines the model of Figure 2 and makes it inappropriate. If you want to use your data to gain insight into the underlying process that creates the data, then the outliers are the most important values in the data set! Yet students are routinely taught to delete those pesky outliers. After all, when you are looking for iron and tin, you should not let silver and gold get in the way.

Figure 3: XmR Chart for 100 Weighings of NB10

Don’t the Outliers Distort the Limits?
But don’t we need to remove the outliers to compute good estimates of location and dispersion? No, we don’t. To see why, it is helpful to consider the impact of outliers upon the limits of a process behavior chart.

We commonly base our limits on the average and an average range. The average may be affected by some very extreme values, but this effect is usually much smaller than people think it will be. In Table 1 some values are out of line with the bulk of the data by as much as 30 micrograms. However, the average value of approximately 595 micrograms was found by dividing 59,500 by 100. If the total of 59,500 is adjusted up or down by 30, 60, or even 90 units, it will have a very small effect upon the average.


In this example, deleting the outliers changed the average from 595.4 to 595.6. Thus, the average is a very robust measure of location, which is why we use it as our main statistic for location. Of course, whenever we have reason to think the average may have been affected by the presence of several extreme values that are all on the same side, we can always use the median instead. Hence, while our primary measure of location is robust, we have an alternative for those cases where one is needed.

Likewise, when we compute an average range, we are once again diluting the impact of any extreme values present in the set of ranges. In general, a few large ranges will not have an undue impact upon the average range. However, if they do appear to have inflated the average range, we can resort to using the median range. In Figure 3 the limits are based upon the Median Moving Range of 4.0 micrograms. This results in an estimated dispersion for the individual values of:

Sigma(X) = Median Moving Range / 0.954 = 4.0 / 0.954 = 4.19 micrograms

It is instructive to compare this with the two values for the standard deviation statistic computed from these data. Using all 100 values shown in Figure 1 we found s = 6.47 micrograms. Using only the 93 values shown in Figure 2 we found s = 3.74 micrograms. Thus, the Median Moving Range (based on all 100 values) gives an estimate of dispersion quite similar to the descriptive statistic computed after the outliers had been removed. This robustness, built into the computations for process behavior charts, removes the need to polish the data prior to computing the limits. The computations work even in the presence of outliers and signals of exceptional variation.
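A sketch of that dispersion estimate and the resulting XmR limits. The 0.954 divisor and the 3.865 multiplier are the usual scaling constants for moving ranges of two successive values; treat the exact constants as my assumption, since the column shows only the final result:

import numpy as np

def xmr_limits_from_median(values):
    # Estimate dispersion from the median moving range: Sigma(X) = mMR / 0.954.
    x = np.asarray(values, dtype=float)
    mr = np.abs(np.diff(x))
    m_mr = np.median(mr)
    sigma = m_mr / 0.954
    center = x.mean()
    return {"X chart": (center - 3 * sigma, center, center + 3 * sigma),
            "mR chart": (0.0, m_mr, 3.865 * m_mr)}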

Don’t We Need a Predictable Range Chart?
The fact that the computations work even in the presence of outliers is important in light of the advice given in some SPC texts. These texts warn the student to check the Range Chart before computing limits for the X Chart or the Average Chart. If the Range Chart displays evidence of unpredictable behavior, the student is advised to avoid computing limits for the Average Chart or the X Chart. The idea is that signals on the Range Chart will corrupt the average range and hence corrupt the limits on the other chart. This advice is motivated by a desire to avoid using anything less than the best estimates possible. However, the objective of a process behavior chart is not to estimate, but rather to characterize the process as being either predictable or unpredictable.

Given the conservative nature of three-sigma limits, we do not need high precision in our computations. Three-sigma limits are so conservative that any uncertainty about where our computed limits fall will not greatly affect their coverage. To characterize this effect, Figure 4 shows the coverages associated with limits ranging from 2.8 sigma to 3.2 sigma on either side of the mean. There we see that, regardless of the shape of the distribution, and regardless of the uncertainty in our computations, three-sigma limits are going to filter out virtually all of the routine variation. This is what allows us to compute limits and characterize process behavior without first deleting the outliers. The limits approximate what the process has the potential to deliver, while the running record shows the actual process performance. When the running record has points outside the limits, the process is not being operated up to its full potential. In making this broad comparison, approximate results are good enough. The computations are robust, and as a consequence, the technique is sensitive.
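
The point is easy to verify numerically. The sketch below, using scipy, computes the coverage of the mean plus or minus three standard deviations for three distributions of very different shapes; the particular distributions are my choice for illustration.

    from scipy import stats

    # Coverage of mean +/- 3 sigma for three differently shaped distributions
    for name, dist in [("normal", stats.norm()),
                       ("uniform", stats.uniform()),
                       ("exponential", stats.expon())]:
        mu, sigma = dist.mean(), dist.std()
        coverage = dist.cdf(mu + 3 * sigma) - dist.cdf(mu - 3 * sigma)
        print(f"{name:12s} {coverage:.4f}")   # 0.9973, 1.0000, 0.9817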

Thus, the advice to make sure the Range Chart is predictable before computing limits for the Average Chart or the X Chart is just another version of the delete-the-outliers argument. Both arguments are built on a misunderstanding of the objective of process behavior charts and a failure to appreciate that the computations are already robust.


Summary
So, should you delete outliers before you place your data on a process behavior chart? Only if you want to throw away the most interesting and valuable part of your data!

Deleting the outliers may improve your estimates of process parameters, but when a process is being operated unpredictably the notion of process parameters is not well-defined. This means that regardless of the quality of your computations, you are likely to have both your analysis and your conclusions undermined by the unpredictable nature of the process.

The outliers are the interesting part of your data. In the words of George Box, “The key to discovery is to get alignment between an interesting event and an interested observer.” Outliers tell you where to look in order to learn something about the underlying process that is generating your data. When you delete your outliers you are losing an opportunity for discovery.

Figure 4: Three-Sigma Limits Filter Out Virtually All of the Routine Variation Regardless of the Shape of the Histogram and Regardless of the Uncertainty in Our Estimates of the Limits


Stats 101
Process Capability Indices (Cp and Cpk)

Jack B. ReVelle, PhD, Consulting Statistician at ReVelle Solutions, LLC
Reprinted with permission from Quality Press © 2004 ASQ; www.asq.org. No further distribution allowed without permission.

Overview
Design engineers and process engineers speak two very different languages, which, until recently, inhibited their ability to communicate. But now, with the development of two process status indices, the ability to share important design and development as well as manufacturing and assembly information has been considerably enhanced.

Before either of the two indices can be determined, the subject process must be in statistical control. This means it is necessary first to establish the appropriate control chart and confirm that the process data do not violate any of the rules of control chart interpretation. Only when it has been confirmed that the process is in control can the indices be applied.

The process capability index, Cp, indicates whether a process is capable of consistently providing virtually defect-free product, whereas the mean-sensitive process capability index, Cpk, indicates the location of the natural range of the process relative to product specifications or requirements.

Process Capability Index (Cp)
The process capability index, Cp, is a ratio that relates the width of an engineering design specification (the numerator) to the natural range of a process (the denominator). Both quantities are expressed in terms of sigma, the process standard deviation.
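
In symbols, with USL and LSL denoting the upper and lower specification limits and sigma the process standard deviation:

    Cp = (USL − LSL) / (6 × sigma)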

When Cp is equal to 1.00, the upper and lower specification limits (the engineering nominal or target value plus or minus the acceptable limits of variation) coincide with the natural process limits (the overall process average plus or minus three times sigma). The defect rate associated with this condition is 2700 ppm (parts per million), also known as 2700 dpmo (defects per million opportunities).
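
For a centered, normally distributed process, that figure is simply the probability of falling outside the natural three-sigma range, where Φ is the standard normal cumulative distribution function:

    2 × (1 − Φ(3)) = 2 × 0.00135 = 0.0027, or 2700 ppm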

When Cp is greater than one, the upper and lower spec limits fall outside the natural process limits when the process is centered, i.e., when the engineering nominal value equals the overall process average. The greater the value of Cp, the lower the ppm.

When Cp is less than one, the upper and lower spec limits fall inside the natural process limits when the process is centered. The smaller the value of Cp, the greater the ppm.

Mean-Sensitive Process Capability Index (Cpk)
The mean-sensitive process capability index, Cpk, is a ratio that quantitatively describes the location of the natural range of a process with respect to the limits of an engineering design specification. The interpretation of Cpk values is the same as that of Cp values.
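
In symbols, with μ denoting the overall process average:

    Cpk = min(USL − μ, μ − LSL) / (3 × sigma)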

Cp vs. Cpk
The use of these indices is straightforward. First, determine if a process is capable at whatever level is desired, i.e., 1.00, 1.33, 1.50, 1.67, 2.00 or greater. Then, determine at what level the process is performing, using the same values. When Cpk is equal to Cp, the process is centered. This means that the lowest possible defect rate for a given Cp has been achieved. Note that Cpk is closely related to Cp; the difference between them represents the potential gain to be had from centering the process.
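
A minimal sketch in Python pulls the two definitions together; the spec limits and process values here are hypothetical, chosen only to illustrate the centering penalty.

    def cp_cpk(usl, lsl, mu, sigma):
        """Process capability indices for an in-control process."""
        cp = (usl - lsl) / (6 * sigma)
        cpk = min(usl - mu, mu - lsl) / (3 * sigma)
        return cp, cpk

    # Example: specs 10 +/- 0.6, process running at 10.1 with sigma = 0.1
    print(cp_cpk(10.6, 9.4, 10.1, 0.1))   # Cp = 2.0, Cpk ~= 1.67

The gap between 2.0 and 1.67 is the potential gain available from re-centering the process.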


Hypothesis Testing
Using Hypothesis Tests to Make Quality Decisions

Jim Frost

In my last column, I introduced the basic concepts of hypothesis testing and explained the need for performing these tests. In this column, I’ll build on that and highlight the connections between different types of data, the various hypothesis tests, some of the options, and the different conclusions that you can draw. Along the way, I’ll point out important planning considerations, related analyses, and interpretation pitfalls.

One thing I hope to convey in this column is that hypothesis testing is just one part of the statistical process. There are prerequisites, such as using control charts to ensure that your processes are stable. After all, if a process is not stable, the inferences you make today might not be valid tomorrow. And, there are other types of analyses that you might need to conduct afterward, such as a capability analysis which answers different types of questions.

A hypothesis test uses sample data to assess two mutually exclusive theories about the properties of a population. In the context of the quality improvement industry, a population can be defined as all future output from a production or transactional process. Hypothesis tests allow you to use a manageable-sized sample from the process to draw inferences about the entire process output and make sound quality improvement decisions.

I’ll cover hypothesis tests for three types of data that are common within the field of quality improvement—continuous, binary, and count data. Recognizing the different types of data is crucial because the type of data determines the hypothesis tests you can perform and, critically, the nature of the conclusions that you can draw. If you collect the wrong data, you might not be able to get the answers that you need.

Perform Power and Sample Size Analyses Before You Collect Data
Statistical power is the probability that a hypothesis test will detect an effect in sample data when it actually exists in the population. Due to the inherent uncertainty that samples introduce into an analysis, the probability of detecting a real effect is never 100%. Sometimes the effect exists in the population, but a combination of random sampling error and an insufficient amount of data can prevent a hypothesis test from detecting it. This probability varies based on the effect size, variability, and sample size.

Ideally, you do not pluck a sample size out of the air for a study. Instead, you should perform a power and sample size analysis, which allows you to use your process knowledge to estimate the factors that influence power. As I mentioned, using control charts to confirm that the processes are stable is a crucial prerequisite. Fortunately, this step is doubly helpful because it can provide the process properties that are required inputs for the sample size calculations.

A good sample size is large enough to detect an effect that is meaningful to your process but not so large as to be prohibitively expensive. A power and sample size analysis helps you strike this balance.

Suppose we are assessing the flexibility of a product, and the target value is 100 units. Using our understanding of the process, we know that a deviation as small as plus or minus 5 units causes problems. Consequently, we want to be sure that the sample size is large enough to detect differences of 5 units from the target value with high probability. A power and sample size analysis helps ensure that we can detect a difference this small.

While the details of this type of analysis are outside the scope of this article, it is a crucial step to perform before you collect any data.
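
To make the idea concrete, here is a minimal sketch using the statsmodels library. The target shift of 5 units comes from the example above; the standard deviation of 10 units is a hypothetical value standing in for what a control chart study would supply.

    from statsmodels.stats.power import TTestPower

    # One-sample t-test: how many observations are needed to detect a
    # 5-unit shift with 90% power, assuming (hypothetically) sigma = 10?
    n = TTestPower().solve_power(effect_size=5 / 10, alpha=0.05, power=0.9)
    print(round(n))   # about 44 observations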

Hypothesis Tests for Continuous Data
Continuous data can take on any numeric value, and they can be meaningfully divided into smaller increments, including fractional and decimal values. There are an infinite number of possible values between any two values. You often measure a continuous variable on a scale. For example, when you measure height, weight, and temperature, you have continuous data. With continuous variables, you can use hypothesis tests to assess the mean, median, and standard deviation.

When you collect continuous data, you usually get the most bang for your data buck compared to discrete data. The two key advantages of continuous data are that you can:

• Draw conclusions with a smaller sample size.
• Use a wider variety of analyses, which allows you to learn more.

I’ll cover two of the more common hypothesis tests that you can use with continuous data—t-tests to assess means and variance tests to evaluate dispersion around the mean. Both of these tests come in one-sample and two-sample versions. One-sample tests allow you to compare your sample estimate to a target value, such as a process requirement. The two-sample tests let you compare the samples to each other, which can help you determine if one is better than the other.

There is also a group of tests that assesses the median rather than the mean. These are known as nonparametric tests and practitioners use them less frequently. However, consider using a nonparametric test if your data are highly skewed and the median better represents the actual center of your data than the mean.

For this article, I will focus on using two-sample tests to help us make quality improvement decisions.

Graphing the Data
Suppose we have two production methods and our goal is to determine which one produces a stronger product. To evaluate the two methods, we draw a random sample of 30 products from each production line. Before we perform any analyses, it is always a good idea to graph the data, because graphs provide an excellent overview of the data (see Figure 1).

These histograms suggest that Method 2 has a higher mean and that Method 1 has lower variability. The higher mean strength is good for our product, but the greater variability might produce more defects.

Graphs provide a good picture, but they do not statistically test the data. The differences in the graphs might be caused only by random sample error rather than an actual difference between production methods. If the observed differences are due to random error, it would not be surprising if another sample showed different patterns. It can be a costly mistake to base decisions on “results” that vary with each sample. Hypothesis tests factor in random error to improve our chances of making correct decisions.

Keep these graphs in mind when we look at binary data because they illustrate how much more information continuous data convey.


Two-sample t-test to Compare Means
The first thing we want to determine is whether one of the methods produces stronger products. We’ll use a two-sample t-test to determine whether the population means are different. The hypotheses for our two-sample t-test are:

• Null hypothesis: The mean strengths for the two populations are equal.
• Alternative hypothesis: The mean strengths for the two populations are different.

A p-value less than the significance level indicates that you can reject the null hypothesis. In other words, the sample provides sufficient evidence to conclude that the population means are different. Below is the output for the analysis.

Figure 1: Histograms of Method 1 and Method 2

The p-value is less than 0.05. From the output, we can see that the difference between the mean of Method 2 (98.39) and the mean of Method 1 (95.39) is statistically significant. We can conclude that Method 2 produces a stronger product on average. The 95% CI for the difference indicates that we can be confident that the true difference between the population means lies between –6.44 and –0.27.
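
The article’s raw data are not shown, so the following sketch simulates two samples with roughly the reported means and then applies Welch’s two-sample t-test from scipy (a common variant that does not assume equal variances).

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    method1 = rng.normal(95.4, 4.0, 30)   # hypothetical strength data
    method2 = rng.normal(98.4, 6.0, 30)

    # Welch's t-test does not assume equal variances
    t_stat, p_value = stats.ttest_ind(method1, method2, equal_var=False)
    print(t_stat, p_value)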


That sounds great, and it appears that we should use Method 2 to manufacture a stronger product. However, there are other considerations. The t-test tells us that Method 2’s mean strength is greater than Method 1, but it says nothing about the variability of strength values. For that, we need to use another test.

2-Variances Test to Compare Variability
A production method that has excessive variability creates too many defects. Consequently, we will also assess the standard deviations of both methods. To determine whether either method produces greater variability in the product’s strength, we’ll use the 2 Variances test. The hypotheses for our 2 Variances test are:

• Null hypothesis: The standard deviations for the populations are equal.
• Alternative hypothesis: The standard deviations for the populations are different.

The 2-Variances procedure performs several tests and produces several p-values. If the p-values differ, you need to determine which test best applies to your data. A p-value less than the significance level indicates that you can reject the null hypothesis. In other words, the sample provides sufficient evidence for concluding that the population standard deviations are different. The 2-Variances output for our product is below.

Figure 2: Test and Confidence Interval for Method 1 and Method 2

Both of the p-values are less than 0.05. The output indicates that the variability of Method 1 is significantly less than that of Method 2. We can conclude that Method 1 produces a more consistent product.
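
Again using simulated stand-ins for the unpublished data, a robust version of this comparison can be run in scipy with Levene’s test, one of the several tests such output typically reports.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    method1 = rng.normal(95.4, 4.0, 30)   # hypothetical samples, as before
    method2 = rng.normal(98.4, 6.0, 30)

    # Levene's test is robust to departures from normality
    w_stat, p_value = stats.levene(method1, method2)
    print(w_stat, p_value)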

What We Know and Do Not Know
The hypothesis test results confirm the patterns in the graphs. Method 2 produces stronger products on average, while Method 1 produces a more consistent product. The statistically significant test results indicate that these results likely represent actual differences between the production methods rather than sampling error.


Keep in mind that no test can completely rule out an incorrect decision. In this case, the test results could be a Type I error, which occurs when you reject a null hypothesis that is actually true (i.e., a false discovery). We work with samples because it is usually impossible to measure the entire population. Unfortunately, samples introduce error into the findings, which can lead to incorrect decisions. However, basing decisions on hypothesis tests improves our chances of making the correct decision.

Our example also illustrates how you can assess different properties using continuous data, which can point towards different decisions. We might want the stronger products of Method 2 but the greater consistency of Method 1. To navigate this dilemma, we’ll need to use our process knowledge.

Finally, it’s crucial to note that the tests produce estimates of population parameters—the population means (μ) and the population standard deviations (σ). While these parameters can help us make decisions, they tell us little about where individual values are likely to fall. In quality improvement, knowing the proportion of values that fall within the spec limits is crucial.

To better understand the distribution of individual values, you can use the following analyses:

Tolerance intervals: A tolerance interval is a range that likely contains a specific proportion of a population. For our example, we might want to know the range where 99% of the population falls for each production method. We can compare the tolerance interval to our requirements to determine whether there is too much variability (a sketch of the computation follows this list).

Capability analysis: This type of analysis uses sample data to determine how effectively a process produces output with characteristics that fall within the spec limits. These tools incorporate both the mean and spread of your data to estimate the proportion of defects.
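
As a sketch of the first of these, the function below computes an approximate two-sided normal-theory tolerance factor using Howe’s (1969) formula; the coverage and confidence levels shown are illustrative choices.

    import numpy as np
    from scipy import stats

    def tolerance_k(n, coverage=0.99, confidence=0.95):
        """Approximate two-sided normal tolerance factor (Howe, 1969).
        The interval xbar +/- k*s should contain `coverage` of the
        population with the stated confidence."""
        z = stats.norm.ppf((1 + coverage) / 2)
        chi2 = stats.chi2.ppf(1 - confidence, n - 1)
        return z * np.sqrt((n - 1) * (1 + 1 / n) / chi2)

    print(tolerance_k(30))   # roughly 3.35 for n = 30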

Proportion Hypothesis Tests for Binary Data
Let’s switch gears and move away from continuous data. Suppose we take another random sample of our product from each of the production lines. However, instead of measuring a characteristic, inspectors evaluate each product and either accept or reject it.

Binary data can have only two values. If you can place an observation into only two categories, you have a binary variable. For example, pass/fail and accept/reject data are binary. Quality improvement practitioners often use binary data to record defective units.

Binary data are useful for calculating proportions or percentages, such as the proportion of defective products in a sample. You simply take the number of defective products and divide by the sample size. Hypothesis tests that assess proportions require binary data and allow you to use sample data to make inferences about the proportions of populations.

For our example, we will make a decision based on the proportions of defective parts. Our goal is to determine whether the two methods produce different proportions of defective parts.

To make this determination, we’ll use the 2 Proportions test. For this test, the hypotheses are as follows:

• Null hypothesis: The proportions of defective parts for the two populations are equal.
• Alternative hypothesis: The proportions of defective parts for the two populations are different.


A p-value less than the significance level indicates that you can reject the null hypothesis. In this case, the sample provides sufficient evidence for concluding that the population proportions are different. The 2 Proportions output for our product is below.

Both p-values are less than 0.05. The output indicates that the difference between the proportion of defective parts for Method 1 (~0.062) and Method 2 (~0.146) is statistically significant. We can conclude that Method 1 produces defective parts less frequently.
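
The exact counts behind those proportions are not given; the sketch below assumes counts consistent with them (8 and 19 defectives out of 130 each) and runs a two-proportion z-test with statsmodels.

    import numpy as np
    from statsmodels.stats.proportion import proportions_ztest

    defects = np.array([8, 19])        # 8/130 ~= 0.062, 19/130 ~= 0.146
    nobs = np.array([130, 130])

    z_stat, p_value = proportions_ztest(defects, nobs)
    print(z_stat, p_value)             # p ~= 0.025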

1 Proportion Example: Comparison to a Target
The 1 Proportion test is also handy because you can compare a sample to a target value. Suppose you receive parts from a supplier who guarantees that less than 3% of all parts they produce are defective. You can use the 1 Proportion test to assess this claim.

First, collect a random sample of parts and determine how many are defective. Then, use the 1 Proportion test to compare your sample estimate to the target proportion of 0.03. Because we are interested in detecting only whether the population proportion is greater than 0.03, we’ll use a one-sided test. One-sided tests have greater power to detect differences in one direction, but no ability to detect differences in the other direction. Our one-sided 1 Proportion test has the following hypotheses:

• Null hypothesis: The proportion of defective parts for the population equals 0.03 or less.
• Alternative hypothesis: The proportion of defective parts for the population is greater than 0.03.

For this test, a significant p-value indicates that the supplier is in trouble! The sample provides sufficient evidence to conclude that the proportion of defective parts from the supplier’s process is greater than 0.03 despite their assertions to the contrary.
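
A sketch of this one-sided test, with made-up inspection counts (18 defective parts in a sample of 300):

    from statsmodels.stats.proportion import proportions_ztest

    # One-sided test of H0: p <= 0.03 against H1: p > 0.03
    z_stat, p_value = proportions_ztest(count=18, nobs=300,
                                        value=0.03, alternative='larger')
    print(p_value)   # about 0.014 for these hypothetical counts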

Comparing Continuous Data to Binary Data
Think back to the graphs for the continuous data. At a glance, you can see both the central location and spread of the data. If we added spec limits, we could see how many data points are close to and far away from them. Is the process centered between the spec limits? Continuous data provide a lot of insight into our processes.

Now, compare that to the binary data that we used in the 2 Proportions test. All we learn from those data is the proportion of defects for Method 1 (0.062) and Method 2 (0.146). There is no distribution to analyze, no indication of how close the items are to the specs, and no indication of how they failed the inspection. We only know the two proportions. Additionally, the sample sizes are much larger for the binary data than for the continuous data (130 vs. 30). When the difference between proportions is smaller, the required sample sizes can become quite large. Again, use a power and sample size analysis to be sure that you are working with a good sample size. Had we used a sample size of 30 like before, we almost certainly would not have detected this difference.


In general, binary data provide less information than an equivalent amount of continuous data. If you can collect continuous data, it’s the better route to take!

Poisson Hypothesis Tests for Count Data
Now, let’s move to count data. For this scenario, we’ll assume that we receive shipments of parts from two different suppliers. Each supplier sends the parts in the same sized batch. We need to determine whether one supplier produces fewer defects than the other supplier. To perform this analysis, we’ll randomly sample batches of parts from both suppliers. The inspectors examine all parts in each batch and record the count of defective parts.

Count data can take only non-negative integer values (e.g., 0, 1, 2, etc.). In statistics, we often model count data using the Poisson distribution. Poisson data are a count of the presence of a characteristic, result, or activity over a constant amount of time, area, or other length of observation. For example, you can use count data to record the number of defects per item or defective units per batch. With Poisson data, you can assess a rate of occurrence.

For this scenario, we need to make a decision based on the rate of defects per a constant-sized batch. Our goal is to determine whether the two suppliers create defects at different rates. We’ll use the 2-Sample Poisson Rate test. For this test, the hypotheses are as follows:

• Null hypothesis: The rates of defective parts for the two populations are equal.
• Alternative hypothesis: The rates of defective parts for the two populations are different.

A p-value less than the significance level indicates that you can reject the null hypothesis because the sample provides sufficient evidence to conclude that the population rates are different. The 2-Sample Poisson Rate output for our product is below.

Both p-values are less than 0.05. The output indicates that the difference between the rate of defects per batch for Supplier 1 (3.56667) and Supplier 2 (5.36667) is statistically significant. We can conclude that Supplier 1 produces defects at a lower rate than Supplier 2.
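
The reported rates are consistent with, for example, 107 and 161 total defects over 30 batches per supplier (107/30 = 3.567, 161/30 = 5.367); the batch count is my assumption. Under equal rates, and conditioning on the total count, Supplier 1’s share is binomial with p = 0.5, which yields a simple exact version of this comparison in scipy (version 1.7 or later for binomtest). This conditional test is a classical alternative to a software package’s 2-Sample Poisson Rate procedure.

    from scipy.stats import binomtest

    # Conditional exact comparison of two Poisson rates with equal exposure:
    # given the total, count1 ~ Binomial(total, 0.5) under the null.
    c1, c2 = 107, 161                  # hypothetical totals over 30 batches each
    result = binomtest(c1, c1 + c2, p=0.5, alternative='two-sided')
    print(result.pvalue)               # well below 0.05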

Hypothesis tests are a great tool that you can use to help make quality-related decisions. These tests allow you to take relatively small samples and draw conclusions about entire populations. There is a selection of tests available, and different options within the tests, which makes them useful for a wide variety of situations. However, hypothesis tests are just one type of tool that provides a particular kind of answer. The chances are high that you will need to use them in conjunction with other statistical tools.


Testing and Evaluation
Top Ten Guidelines for Testing Complex Systems

Laura Freeman, PhD, Assistant Director of the Operational Evaluation Division and Test Science Task Leader at the Institute for Defense Analyses

In the past several years, nearly every major military system tested in the U.S. DoD has used statistical design of experiments to develop at least some aspect of its operational testing. Through this broad application of experimental design, we have identified several best practices for developing test programs for complex defense systems. It is critical to start early in the test planning process. Programs that use experimental design successfully establish test planning working groups that include all stakeholders: the program manager, the requirements representative, developmental and operational testers, and subject matter experts in experimental design and analysis. All of these stakeholders are necessary to identify the key elements of test planning, including goals, response variables, factors, analysis requirements, and resource constraints. In addition to having the right people, several specific analytical best practices have proven useful in ensuring effective testing while reducing the test resources required. These best practices include the following:

1. Do not limit test goals to verifying a single narrowly defined requirement in a static set of conditions. Rather, testing should aim to characterize performance of the unit equipped with the system across all feasible and operationally realistic conditions. For example, an unmanned intelligence aircraft might be required to detect 90 percent of pick-up trucks at a slant range of 10 miles. Adequate testing does not look only at one size of truck at 10 miles. The requirement should be considered in the wider context of all potential targets of operational interest (e.g., cars, trucks, tanks, etc.). Detection ranges should be characterized to fully understand the probability of detection as a function of range for a variety of potential targets.

2. Related to point 1, when sizing tests, avoid using a single hypothesis test to determine test adequacy. This approach is generally inadequate for ensuring that system performance is well characterized. When experimental designs are employed for test planning, the related factor-by-factor power calculations should be the primary focus for test sizing, rather than a single “roll-up” power estimate. Historically, test planners have used one-sample hypothesis calculations to compare directly to a single requirement, even in cases where multiple conditions are evaluated. This practice fails to capture whether the test design is truly adequate to characterize system capabilities across the operational environment.

3. Use multiple response variables. Complex systems often require multiple dimensions to capture a successful outcome. For example, the unmanned intelligence aircraft should provide timely intelligence information, but if that information is not accurate the system is not effective. Alternatively, accurate intelligence may be ineffective if not provided in time. It is essential to capture both timeliness and accuracy data to tell the full story.

4. Use continuous metrics as the primary measures of system performance, as opposed to pass/fail probability-based metrics. Continuous metrics reduce test resource requirements for the same level of information. Many DoD requirements, however, are specified in terms of probabilities. Probability-based metrics provide intuitive measures of success and can often be captured far more easily than continuous measures, but they require more observations to provide precise estimates and often do not facilitate more nuanced differentiations. For example, in a comparison of two new unmanned intelligence aircraft, both may be able to accomplish a mission successfully, but one could outperform the other in terms of providing higher accuracy. Analysts should think hard about what underlying continuous variable results in success/failure to ensure testing captures as much information as possible.

5. Design tests around the superset of response variables, prioritizing appropriately. In the unmanned intelligence aircraft example, different factors might impact the timeliness of the information and its accuracy. For example, timeliness may be affected by the presence of decoy targets, but the accuracy, once the target of interest is identified, would not be affected by the presence of decoys. Here, the presence or absence of decoys is a factor for timeliness, but not for accuracy. It is often more productive to develop the list of all factors that could impact both time and accuracy and use that list to design the experiment. This also results in experiments that tend to encompass the full operational environment instead of breaking the operational space into more narrowly defined experiments.

6. Use continuous factors when possible to cover the operational envelope. Identifying these continuous factors, or casting operational conditions in a continuous manner, enables the use of response surface design techniques specifically available for continuous factors. Using these techniques yields test efficiencies compared with categorically binned factor levels and provides information-rich test results by enabling interpolation between observed test points. For example, use the radar cross section of a target as a factor instead of a categorical binning (e.g., car, truck, and tank). That way, detection range can be estimated as a function of target size rather than reported only for the few targets used in the test.

7. Include all relevant factors in the test design. By selecting relevant test factors and purposefully controlling those factors, we can ensure that the test covers conditions the system will encounter once fielded. Leverage data from previous tests to narrow the list of relevant factors and mitigate the risk of excluding important ones. Avoid omitting factors known to be important from the test design in an effort to save test resources. Such practices result in holes in our knowledge of system capability.

8. Use sequential experimentation approaches to reduce the test resources required in each test phase while developing a comprehensive view of system capability. Sequential testing can reduce the overall size and resource requirements of the full campaign. This is possible by using sparse screening experiments early in the test process to identify the most important factors; follow-on phases then refine our knowledge about the reduced set of factors. Smaller, strategically linked tests not only can reduce the overall test program, but also reduce the time spent speculating and arguing about unknowns, and thus the resources spent both in planning and in unnecessarily “covering all our bases.”

9. Acknowledge that complex systems rarely operate themselves; operators and maintainers are part of the system. This knowledge should shape the test design. Good test designs control for order effects, include multiple representative users, account for differences between users, tie user data to system state and performance, and capture data about the user experience through validated surveys as well as behavioral metrics.

10. Finally, we often have to make hard decisions about cutting down the size of a test. If we must reduce the size of the test, we should understand and clearly acknowledge what information is lost. Is the loss limited to a reduction in statistical power, are we cutting a portion of the tested operational envelope, or both?

An important concept that is often overlooked when testing complex systems is that there is no one right test design; there are always multiple acceptable design approaches. The best approaches we have observed, however, follow these ten guidelines!


Standards InSide-Out
Cape Town Recap

Mark Johnson, PhD, University of Central Florida, Standards Representative for the Statistics Division

One of the pleasures of working in international statistical standards (ISO TC69) is attending the annual meeting hosted by a participating member country. The host country follows an irregular rotation that over the past ten years has included London, Dalian (China), Vienna (under the aegis of the UK), Milwaukee, Tokyo, Berlin, Paris, Kuala Lumpur, Beijing, and Copenhagen. The 2017 meeting took place last June in Cape Town, South Africa, hosted by the South African Bureau of Standards (SABS) under the auspices of Marius Cronjes in the SABS Cape Town offices.

Cape Town came highly recommended by our counterparts from the UK, several of whom have either lived in South Africa or visited it extensively. I would concur that Table Mountain overlooking this coastal city is impressive, as are the Botanical Gardens, the Harbor Front, and the port itself. The cuisine is also worth noting in that a mixed grill barbeque typically includes springbok, wild boar, ostrich, crocodile, kudu, sausage, and the ever-present side dish pap (i.e., maize polenta, a staple in SA). With the exchange rate now at 13 rand per dollar versus 6.6 rand per dollar the last time we were in SA (2007), the trip was rather more affordable, but still a considerable trek from the US. Although June is winter in South Africa, the climate in Cape Town is temperate, so there was no heat wave to deal with as in Europe this past June.

Rather than a blow-by-blow description of the numerous documents reviewed and progressed towards standardization, I am devoting this column to a few of the highlights of the technical aspects of the TC69 meeting and some accomplishments of late.

Subcommittee 4 (ISO TC69/SC4), officially “Applications of statistical methods in process management,” announced that its standards have become strongly endorsed and implemented by the automotive industry. This puts its standards at the front line of usage. The standard ISO 22514, “Statistical methods in process management,” particularly stands out, having several parts on general principles and concepts, machine performance studies for measured data on discrete parts, and process capability estimates. SC4 has several other popular standards involving statistical process control and control charts. Credit should be given to the US leadership of SC4 (John Kim and Steve Walfish) as well as to Edgar Dietrich (Hexagon, previously Q-Das) from Germany, who has hosted SC4 interim meetings and is sponsoring next year’s ISO TC69 meeting in Berlin.

Another key development within ISO TC69 is the establishment of a Working Group under the aegis of TC69 (not assigned to a particular subcommittee) to deal with Big Data standards and technical reports. Two initiatives were undertaken with New Work Item Proposals drafted. The “official” decision is embodied in the following resolution approved at the closing plenary on Friday, June 21:

Resolution 6/2017—WG4 Big Data Analytics
ISO/TC 69 resolves to create WG4 Big Data Analytics to initiate two projects – Data Science Life Cycle Model and Model Validation. The convenor for WG4 will be Radouane Oudrhiri (UK) and his term will last through 2020. It is requested that all members of former AHG7 Big Data become members of WG4 Big Data Analytics.

Note that this working group is the natural descendant of the original ad hoc group that launched this initiative. The ISO TC69 Chair, Michèle Boulanger (Rollins College), has led the effort and has attracted the participation of numerous experts both in the US and abroad. For further details on this work, Quality Progress has an article in the September 2017 issue (the cover story, in fact) that describes in detail the plans and progress to date, much of it joint with IEC through the joint technical committee JTC1. The Big Data effort in TC69 will initially consist of three New Work Item Proposals:

Model Validation, project leader Gilbert Saporta
Life Cycle Model, project leader Nancy Grady
Terms in Big Data, project leader Mark Johnson (under the auspices of SC1 on Vocabulary and Symbols)

For purposes of TC69, Big Data concerns situations involving the “Four V’s”: Volume (data set size), Variety (diverse data sets residing in multiple sites), Velocity (data being generated and/or transmitted quickly—possibly faster than it can be analyzed), and Variability (not in the statistical sense, but in the non-constancy of Volume, Variety, and Velocity).

Another subcommittee that is extremely active is SC8 (Application of statistical and related methodology for new technology and product development). It is working on an important standard, ISO 16355, “Application of statistical and related methods to new technology and product development process,” consisting of the following eight parts:

Part 1: General principles and perspectives of Quality Function Deployment
Part 2: Non-quantitative approaches for the acquisition of voice of customer and voice of stakeholder
Part 3: Quantitative approaches for the acquisition of voice of customer and voice of stakeholder
Part 4: Analysis of non-quantitative and quantitative voice of customer and voice of stakeholder
Part 5: Solution strategy
Part 6: QFD-related approaches to optimization
Part 7: Other approaches to optimization
Part 8: Guidelines for commercialization and life cycle

Hiroe Tsubaki of Japan is the Chair of SC8, while many of the parts of the standard are being ably produced by the project leader Glenn Mazur, who is also the US Lead Delegate for SC8. SC8 has an active and productive work program.

Finally, I would like to acknowledge the US delegation who participated in the Cape Town meeting and recognize their institutions for supporting both the work and the travel involved in these endeavors. The US delegation in Cape Town was as follows:

Jennifer Admussen (Secretary TC69 and SC4; ASQ)
Brenda Bishop (Vice Chair US Delegation, TC69/WG3 Lead)
Kelly Black (US Head of Delegation; Neptune and Company Inc.)
Michèle Boulanger (Chair, ISO TC69; Rollins College)
Nien-Fan Zhang (SC6 Lead Delegate; NIST)
Mark Johnson (Chair, SC1 and US Lead Delegate; Univ. Central Florida)
Glenn Mazur (SC8 Lead Delegate; QFD Institute)
Michael Morton (SC6; Altria Client Services)
Dan Tholen (ILAC liaison; Dan Tholen Statistical Consulting) (participating by WebEx)
Nancy Grady (Big Data Working Group; SAIC)
Tom Kubiak (SC7 Lead Delegate and Big Data Working Group; Performance Improvement Solutions)
John Vandenbemden (SC5 Lead Delegate) (participating by WebEx)

Dan Tholen announced his retirement from Standards work and his excellent contributions over the years are much appreciated.

US experts interested in contributing to international statistical standards work should contact Jennifer Admussen of ASQ at [email protected]. Non-US international experts should contact their country’s official standards organization. By applying in the next few months, experts can obtain delegate status in time to participate officially in next year’s meeting, to be held in Berlin June 25–29, 2018.


Not Significant, But Important?
Julia E. Seaman and I. Elaine Allen

Reprinted with permission from Quality Press © 2011 ASQ; www.asq.org. No further distribution allowed without permission.

Know the Pitfalls of p-values and Formal Hypothesis Tests
In March 2011, the Supreme Court ruled that even if a result from a controlled clinical trial was not statistically significant, it still might be important.

In the case, brought by investors in the biopharmaceutical firm Matrixx Initiatives, judges ruled that the company failed to disclose reports that its over-the-counter medicine, Zicam, sometimes caused a loss of the sense of smell.

Matrixx Initiatives argued it had no reason to disclose these adverse events because the results did not reach statistical significance. The court rejected that argument and let the case proceed in a lower court. An article in the Wall Street Journal trumpeted this result and quoted statisticians supporting the Supreme Court.1

What does this say about hypothesis testing and statistical evidence? Doesn’t this type of ruling fly in the face of the scientific method?

Since R.A. Fisher’s investigation of agricultural plantings in the 1920s, statisticians have used hypothesis testing and critical significance levels to make conclusions about their data. Surprisingly, when Fisher proposed his null and alternative hypotheses, they were not welcomed by all statisticians.

Jerzy Neyman and Egon Pearson attacked this notion of statistical significance and suggested it made more sense to test competing hypotheses against one another. Rather than “rejecting” or “not rejecting” a null hypothesis, these competing-hypothesis tests gauge the likelihood of a “false positive.” To Fisher, the exact p-value was what mattered for overall decision making, not a critical cutoff value such as a 5% probability.2

P-values
What is a p-value? What exactly are you testing? What does probability mean in real life?

The definition of a p-value is correctly explained by Tom Siegfried: “Experimental data yielding a p-value of 0.05 means that there is only a 5% chance of obtaining an observed (or more extreme) result if no real effect exists (that is, if the no-difference hypothesis is correct).”3

The definition shows the p-value gives information about the probability of obtaining evidence. It doesn’t quantify the strength of the evidence. So, if there is a significant difference and the null hypothesis is rejected, how do you know if the result is practically important? The short answer is you don’t.

A nonsignificant difference may or may not be an important difference, as shown in Table 1, which contains some results from a study by the Global Entrepreneurship Monitor (GEM). The study was written after the research consortium interviewed more than 200,000 individuals globally in more than 60 countries. Each country contributed a sample of 2,000 or more.4

For example, some results from the study compare male and female entrepreneurship rates globally and within Massachusetts. The rates are comparable, but the statistical results are conflicting. Are these results important? Why are the tests giving different results?


The Massachusetts sample is 1,000, of which only about 100 are entrepreneurs. The global sample is close to 200,000, of which about 20,000 are entrepreneurs. In both tests, the percentage of males starting a business is almost 50% larger than the percentage of females starting a business.

Given differing statistical conclusions for essentially the same evidence, it is important not to come to different overall conclusions. One way to ensure statistical significance is to increase the sample size. Conversely, if the sample size is not large enough, nothing will be significant because there is not enough power to discriminate between the two groups.
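
The effect of sample size is easy to demonstrate. The sketch below holds the male and female rates fixed (hypothetically 6% vs. 4%, roughly the 50% relative difference in the study) and runs the same two-proportion z-test at two very different sample sizes.

    import numpy as np
    from statsmodels.stats.proportion import proportions_ztest

    for n in (500, 100_000):                     # per-group sample sizes
        counts = np.array([int(0.06 * n), int(0.04 * n)])
        z_stat, p = proportions_ztest(counts, np.array([n, n]))
        print(f"n = {n}: p = {p:.3f}")

The same 6%-vs-4% difference is not significant at n = 500 per group, yet overwhelmingly significant at n = 100,000 per group.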

A difference may be significant in multiple tests but remain unimportant to the overall evaluation or project. With advances in manufacturing using computers to input the specifications, machines may have small but highly significant differences that are meaningless to the overall process because the tolerances are so small.

In Table 2, for example, statistical significance is driven by the small (or absent) variability within a machine, unlike the earlier GEM example, which was driven by sample size. A p-value indicates the likelihood that a relationship is not due to chance. A p-value does not indicate the strength of the relationship or whether the differences it examines are relevant.5

Table 2 shows how two cutters created the same widget part of length 5 in a factory. Three samples from each cutter were accurately measured and showed significant differences. Is this result important?

The machines consistently cut widget parts that were different lengths from one another. But both cutters created parts that were much less than 0.1% different from the true length, a difference that was likely to have little overall effect. In this case, the machines were so accurate that a significant difference was achieved, but it was a trivial significance.
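
A quick simulation makes the point. The measurements below are invented, but they mimic Table 2: tiny, consistent offsets between two highly repeatable cutters.

    import numpy as np
    from scipy import stats

    cutter1 = np.array([5.0001, 5.0002, 5.0001])   # hypothetical measurements
    cutter2 = np.array([5.0004, 5.0005, 5.0004])

    t_stat, p_value = stats.ttest_ind(cutter1, cutter2)
    print(p_value)   # ~0.003: significant, yet the difference is ~0.006% of length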

Conditional Probability
It is also critical to understand that a statistical hypothesis test is always a conditional test—conditional on the null hypothesis (usually of “no difference”) being true. The p-value is not, as usually stated, the probability that there is a real statistically significant difference between two sets of data, but rather the conditional probability of observing data as extreme as (or more extreme than) what was seen in the sample.

In other words, the significance test calculates the resulting p-value, assuming the null hypothesis is true. The p-value result describes the sample gathered for the test and tells the investigator how unusual the sample is.

A statistically significant test result is meaningless without the proper design and interpretation. Before any analysis, the investigator must be careful that the hypothesis in question is actually being tested.

Additionally, some thought should be given ahead of time to possible confounders, quality control and statistical corrections. If a hypothesis is not rejected, appropriate questions for the investigator are, “Was the sample size large enough? Was the outcome measured with sufficient precision as to detect a difference (or discriminate) between the null and alternative hypotheses if a true difference existed?”

When a hypothesis is rejected, the appropriate question is, “Was my sample size so large—or my measurement error so small—that I would have rejected any null hypothesis?” Table 3 converts these questions into a short list of how to report hypothesis-based test results.

Table 1: Important vs. Significant Difference Table

Table 2: Non-important Significance Example


Frequentist vs. Bayesian
Even following the rubric in Table 3, many statisticians still believe classic statistical tests are inferior because they rely on strict comparison to the null hypothesis. Is there a better way? What might be in the future for null hypotheses and statistical tests?

To avoid the pitfalls of p-values and formal hypothesis tests, many researchers are pointing to Bayesian methods instead of the classical frequentist methods examined here, which are based on Fisher’s work. Bayesian techniques rely on the data themselves to point in the direction of any conclusions.

Stated another way, the Bayesian approach calculates the probability of the hypothesis given the data, while the frequentist approach computes the probability of the data given the (null) hypothesis. More flexible in terms of including data and information, but more complicated in their calculation of prior and posterior probabilities, Bayesian methods of analysis will be the topic of a future discussion.
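
In symbols, Bayes’ theorem makes this reversal of conditioning explicit, with H denoting the hypothesis and D the observed data:

    P(H | D) = P(D | H) × P(H) / P(D)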

References
1. Carl Bialik, “Making a Stat Less Significant,” The Wall Street Journal, April 2, 2011.

2. Tobias Johansson, “Hail the Impossible: P-values, Evidence and Likelihood,” Scandinavian Journal of Psychology, Vol. 52, 2011, pp. 113–125.

3. Tom Siegfried, “Odds Are, It’s Wrong: Science Fails to Face the Shortcomings of Statistics,” Science News, Vol. 27, March 2010.

4. Global Entrepreneurship Monitor (GEM), U.S. GEM Report, 2009, www3.babson.edu/eship/research-publications/gem.cfm.

5. Michael Januszyk and G.C. Gurtner, “Statistics in Medicine,” Plastic and Reconstructive Surgery, Vol. 127, No. 1, 2011, pp. 437–444.

Bibliography
Gorard, Stephen, “All Evidence is Equal: The Flaw in Statistical Reasoning,” Oxford Review of Education, Vol. 36, No. 1, 2010, pp. 63–77.

Lee, J. Jack, “Demystify Statistical Significance—Time to Move on From the P Value to Bayesian Analysis,” Journal of the National Cancer Institute, Vol. 103, No. 1, 2010, pp. 2–3.

About the Authors
Julia E. Seaman is a doctoral student in pharmacogenomics at the University of California, San Francisco, and a statistical consultant for the Babson Survey Research Group at Babson College. She earned a bachelor’s degree in chemistry and mathematics from Pomona College in Claremont, CA.

Elaine Allen is research director of the Arthur M. Blank Center for Entrepreneurship, director of the Babson Survey Research Group, and professor of statistics and entrepreneurship at Babson College in Wellesley, MA. She earned a doctorate in statistics from Cornell University in Ithaca, NY. Allen is a member of ASQ.

Table 3: Reporting Statistical Results and p-values


Upcoming Conference Calendar

Conference on Statistical Practice
15–17 February 2018 | Portland, OR | http://www.amstat.org/

The 2018 American Statistical Association Conference on Statistical Practice aims to bring together hundreds of statistical practitioners and data scientists—including data analysts, researchers, and scientists—who engage in the application of statistics to solve real-world problems on a daily basis.

The goal of the conference is to provide participants with opportunities to learn new statistical methodologies and best practices in statistical analysis, design, consulting, and programming. The conference is designed to help applied statisticians improve their ability to consult with and aid customers and organizations in solving real-world problems.

2018 Lean and Six Sigma Conference
26–27 February 2018 | Phoenix, AZ | http://asq.org/conferences/

Tips and Tricks: Sustaining Results: This area focuses on implementation, getting results from that implementation, and exploring ways to ensure that the achieved results are sustained.

Lean and Six Sigma in the Age of Digital Transformation: This area of focus explores the opportunities that exist to leverage lean and Six Sigma when addressing the challenges and opportunities brought on by disruptive technologies.

Lessons Learned: Implementation of Lean and Six Sigma: In this focus area, we are looking for real-life examples from quality practitioners who have applied lean and Six Sigma tools, methodologies, and techniques.

Masters Series: This focus area explores lean and Six Sigma from the perspective of the seasoned professional and offers advanced context covering the more complex and intricate areas of lean and Six Sigma methodologies.

2018 World Conference on Quality and Improvement
30 April – 2 May 2018 | Seattle, WA | http://asq.org/wcqi/

There are more than 100 sessions and workshops for you to choose from. Each session will present real-life applications, solutions, and results based on quality principles, while the workshops allow you to dive deeper into quality theories with hands-on learning activities. The “After 5” sessions will even demonstrate how quality can be translated into social activities.


Upcoming Conference Calendar

Statistics Digest

38 VOL. 36, NO. 3 | 2017 asq.org/statistics


Statistics Division Committee Roster 2017

OFFICERS
CHAIR: Herb McGrath, [email protected]
CHAIR-ELECT: Steve Schuelka, [email protected], 219-689-3804
TREASURER: Mindy Hotchkiss, [email protected], 561-882-5331
SECRETARY: Jennifer Williams, [email protected], 209-877-7784
PAST CHAIR: Theresa Utlaut, [email protected], 503-613-7763

APPOINTED

Operations
OPERATIONS CHAIR: Joel Smith, [email protected], 814-753-3224
MEMBERSHIP CHAIR: Gary Gehring, [email protected], 306-787-8418
VOICE OF THE CUSTOMER CHAIR: Joel Smith, [email protected], 814-753-3224
CERTIFICATION CHAIR: Brian Sersion, [email protected], 513-363-0177
STANDARDS CHAIR: Mark Johnson, [email protected], 407-823-2695

Member Development
MEMBER DEVELOPMENT CHAIR: Gary Gehring, [email protected], 306-787-8418
OUTREACH/SPEAKER LIST CHAIR: Steve Schuelka, [email protected], 219-689-3804
EXAMINING CHAIR: Daksha Chokshi, [email protected], 561-796-8373

Content
CONTENT CHAIR: Amy Ste. Croix, [email protected], 850-324-0904
NEWSLETTER EDITOR: Matthew Barsalou, [email protected], +49-152-05421794
WEBINAR COORDINATOR: Adam Pintar, [email protected], 301-975-4554
SOCIAL MEDIA MANAGER: Brian Sersion & Joshua Briggs, [email protected], 513-363-0177
WEBSITE AND INTERNET LIAISON: Landon Jensen, [email protected], 801-767-3328
STATISTICS BLOG EDITOR: Gordon Clark, [email protected], 614-888-1746
STATISTICS DIGEST REVIEWER AND MEMBERSHIP COMMUNICATIONS COORDINATOR: Alex Gutman, [email protected], 513-622-1822

Awards
AWARDS CHAIR: Peter Parker, [email protected], 757-864-4709
OTT SCHOLARSHIP CHAIR: Lynne Hare, [email protected], 774-413-5268
FTC STUDENT/EARLY CAREER GRANTS: Jennifer Williams, [email protected], 209-877-7784
HUNTER AWARD CHAIR: Open
NELSON AWARD CHAIR: Open
BISGAARD AWARD CHAIR: Open
YOUDEN AWARD CHAIR: Theresa Utlaut, [email protected], 503-613-7763

Conferences
WCQI: Gary Gehring, [email protected], 306-787-8418
FTC STEERING COMMITTEE: Bill Myers, [email protected]
FTC PROGRAM REPRESENTATIVE: Mindy Hotchkiss, [email protected], 561-882-5331
FTC SHORT COURSE CHAIR: Yongtao Cao, [email protected], 724-357-4767


The ASQ Statistics Division Newsletter is published three times a year by the Statistics Division of the American Society for Quality.

All communications regarding this publication, EXCLUDING CHANGE OF ADDRESS, should be addressed to:

Matthew Barsalou Editor email: [email protected]

Other communications relating to the ASQ Statistics Division should be addressed to:

Richard Herb McGrath Division Chair email: [email protected]

Communications regarding change of address should be sent to ASQ at:

ASQ P.O. Box 3005 Milwaukee, WI 53201-3005

This will change the address for all publications you receive from ASQ. You can also make this change by phone at (414) 272-8575 or (800) 248-1946.

Upcoming Newsletter Deadlines for Submissions

Issue | Vol. | No. | Due Date
Month | 37 | 1 | December 16

VISIT THE STATISTICS DIVISION WEBSITE: www.asq.org/statistics

ASQ Periodicals with Applied Statistics content

Journal of Quality Technology http://www.asq.org/pub/jqt/

Quality Engineering http://www.asq.org/pub/qe/

Six Sigma Forum http://www.asq.org/pub/sixsigma/

STATISTICS DIVISION RESOURCES

LinkedIn Statistics Division Group

https://www.linkedin.com/groups/ASQ-Statistics-Division-2115190


Check out our YouTube channel at youtube.com/asqstatsdivision

"Original data should be presented in a way that will preserve the evidence in the original data for all the predictions assumed to be useful." – Walter A. Shewhart
