Statistical Analysis for the Military Decision Maker (Part I)faculty.nps.edu/rdfricke/NWC Class/Lecture 2-1.pdfInterceptor Body Armor Source: “DoD Testing Requirements for Body Armor”,


Statistical Analysis

for the Military Decision Maker (Part I)

Professor Ron Fricker

Naval Postgraduate School

Monterey, California

Goals for this Block

• The major ideas in statistics:
– Using samples to infer information about populations
– Understanding sources of error
– Quantifying uncertainty
• Confidence intervals and hypothesis testing:
– What are they and why are they important?
– What is a "margin of error"?
– What does "statistical significance" mean?
• Linear and other regression modeling:
– What does it mean to model?
– What are the assumptions?

Two Roles of Statistics

• Descriptive: Describing a sample or population
– Numerical: mean, median, standard deviation, mode
– Graphical: pie charts, histograms (bar charts), boxplots
• Inferential: Using a sample to infer facts about a population
– Estimating (e.g., estimating the average time deployed for an O-5 SWO)
– Testing theories (e.g., evaluating whether O-5 SWOs with more deployment are promoted faster)
– Building models (e.g., modeling the relationship between amount of deployment and promotion rate)


A Descriptive Statistics Question: What is the average time deployed for an O-5 SWO (either in the whole Navy or some sample)?


An Inferential Question: Given a sample of O-5 SWOs, what is the average time deployed for all O-5 SWOs in the Navy?


Inferential Statistics

• Point & Interval Estimation
– E.g., confidence intervals
• Hypothesis Testing
– Testing sample means and variances
• Model Building
– Linear regression
– Other types of models

It’s all about inferring from a sample to a population or trying to understand/quantify relationships

Goals for this Lecture

• The major ideas in statistics:
– Using samples to infer information about populations
– Understanding sources of error
– Quantifying uncertainty
• Confidence intervals and hypothesis testing
– What are they and why are they important?
– What is a "margin of error"?
– What does "statistical significance" mean?

Samples versus Populations

• A population consists of all possible observations
– Example: All military officers currently enrolled in a Naval War College class
– When data is collected on the whole population, it is called a census
• A sample is a subset of the population
– Example: Students in this class are a sample of all officers currently enrolled in a Naval War College class

Samples versus Populations

[Figure: Population (all military officers currently enrolled in a Naval War College class) → select somehow from population → Sample (some military officers currently enrolled in a Naval War College class)]

Types of Samples

• Convenience sample: Individuals in the population decide to join the sample
– 900 number and other call-in polls
– Internet and e-mail surveys (usually)
– Shopper and visitor surveys
• Random sample: Individuals or units are chosen randomly from the population
– Whether or not an individual is part of the sample is not the individual’s choice or decision

Types of Random Sampling

• Simple random sample (SRS): any two samples of the same size are equally likely to be selected
• Some other possible random sampling methods:
– Stratified sampling
• Divide population into nonoverlapping, homogeneous groups and then draw an SRS from each group
– Cluster sampling
• Data naturally occurs in clusters
• Use SRS to select clusters
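The three random sampling schemes above can be sketched in a few lines of Python. This is a minimal illustration only: the population of 300 officer IDs, the service strata, and the seminars used as clusters are all hypothetical, not taken from the lecture.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate SRS of the requested size from each (nonoverlapping) stratum."""
    return {name: rng.sample(units, n_per_stratum[name]) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Use an SRS to pick whole clusters, then keep every unit in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(2010)  # fixed seed so the example is reproducible
officers = [f"officer_{i:03d}" for i in range(300)]  # hypothetical population

srs = simple_random_sample(officers, 30, rng)

# Hypothetical strata: the population split by service
strata = {"Navy": officers[:150], "Army": officers[150:250], "Marine Corps": officers[250:]}
by_service = stratified_sample(strata, {"Navy": 15, "Army": 10, "Marine Corps": 5}, rng)

# Hypothetical clusters: 20 seminars of 15 officers each
seminars = {f"seminar_{j:02d}": officers[15 * j:15 * (j + 1)] for j in range(20)}
chosen_seminars = cluster_sample(seminars, 4, rng)

print(len(srs), {k: len(v) for k, v in by_service.items()}, sorted(chosen_seminars))
```

Note the contrast: stratified sampling guarantees every group is represented, while cluster sampling trades some precision for the convenience of surveying whole, naturally occurring groups.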

Why Sample?

• E.g., why conduct:
– Operational test and evaluation on a sample of prototypes
– Nielsen survey of the viewing habits of a sample of US television viewers
– Clinical trial of how a drug affects a sample of individuals in the trial
• Rather than:
– Assess how each and every piece of military equipment performs
– Evaluate the TV viewing preferences for every individual in the US
– Test how a drug affects each existing and future member of a population

Collecting data for whole populations can be expensive and/or impossible

Point and Interval Estimation

• We want to use a sample to infer something about a larger population
• Point estimates use a single number to infer the population quantity of interest
– It’s our “best guess” of the population quantity
• Interval estimates use an interval to infer the population quantity of interest
– “Confidence intervals” do this in such a way that:
• We can know how precise our estimates are, and
• We can state how often the procedure captures the true value (the confidence level, e.g., 95%)
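What “confidence” means can be checked by simulation: repeat the sampling many times and count how often the interval captures the true value. This sketch assumes a hypothetical true probability of 0.3, samples of 100 independent trials, and the simple interval $\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$; the numbers are illustrative, not from any test.

```python
import random
from math import sqrt

def wald_interval(h, n, mult=2.0):
    """Approximate 95% interval p_hat +/- mult * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = h / n
    half = mult * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def coverage(p_true, n, reps=2000, seed=42):
    """Fraction of simulated samples whose interval contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        h = sum(rng.random() < p_true for _ in range(n))  # one simulated sample of n trials
        low, high = wald_interval(h, n)
        hits += low <= p_true <= high
    return hits / reps

print(f"fraction of intervals containing the true p: {coverage(0.3, 100):.3f}")
```

The “95%” describes the long-run behavior of the procedure; any single interval either contains the true value or it does not.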

Illustrating Point Estimation

• Consider the problem of estimating the probability that a new type of body armor will be penetrated by a particular ballistic threat
• After testing, the logical point estimate is the number of armor plates penetrated divided by the total number tested
– Denote this point estimate as p̂
• Is there an issue with this approach?

Yes: there is no indication of the uncertainty in the point estimate. It is clearly not exactly right, so how far off from the real value is it?

One Solution: Confidence Intervals

• A confidence interval shows how variable the sample statistic is
– Narrow: the real (population) value is unlikely to be far away
– Wide: little information about the population value
• Two CIs for the hypothetical example:

[Figure: confidence interval #1 and confidence interval #2 drawn around the observed fraction of body armor penetrated (p̂), on an axis from 0 to 1]

What Does “Margin of Error” Mean?

• Margin of error is just half the width of a 95 percent confidence interval
– In this hypothetical example it’s $2\sqrt{\hat{p}(1-\hat{p})/n}$
• Common survey terminology
– A common convention is a 3% margin of error
– Means a 95% CI is the survey result +/- 3%
• To achieve a desired margin of error, must have the right sample size (n)
– Sample size calculations (called power calculations when designing hypothesis tests) are done by statisticians to figure out the required sample size to achieve a particular margin of error
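The slide's margin-of-error formula, and the sample-size calculation it implies, can be sketched as follows. This is a minimal Python illustration assuming the simple normal-approximation interval with the slide's multiplier of 2; the counts (6 penetrations in 60 plates) are hypothetical, not test data.

```python
from math import ceil, sqrt

def margin_of_error(h, n):
    """Approximate 95% margin of error for p_hat = h / n: 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = h / n
    return 2 * sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval(h, n):
    """Approximate 95% CI: p_hat +/- margin of error, clipped to [0, 1]."""
    p_hat = h / n
    moe = margin_of_error(h, n)
    return max(0.0, p_hat - moe), min(1.0, p_hat + moe)

def required_sample_size(moe, p_guess=0.5):
    """Smallest n with 2 * sqrt(p * (1 - p) / n) <= moe; p_guess = 0.5 is the worst case."""
    return ceil(4 * p_guess * (1 - p_guess) / moe**2)

h, n = 6, 60  # hypothetical: 6 of 60 plates penetrated
low, high = confidence_interval(h, n)
print(f"p_hat = {h / n:.3f}, margin of error = {margin_of_error(h, n):.3f}")
print(f"approximate 95% CI: ({low:.3f}, {high:.3f})")
print("sample size for a 3% margin of error:", required_sample_size(0.03))
```

With the more precise multiplier 1.96 instead of 2, the worst-case size for a 3% margin drops to about 1,068. This simple interval behaves poorly when p̂ is near 0 or 1 or n is small, which is exactly the body armor situation; exact or score intervals are usually preferred there.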

What is a Hypothesis Test?

• Rather than calculate a CI, we can do a hypothesis test to test whether p > 0.95 or not
– That is, we do a test:
• After n ballistic tests we get h penetrations, so $\hat{p} = h/n$
• Compare $\hat{p}$ to 0.95: close or far away?
– Hard part: determining what “close” and “far away” mean
• Far too much in this area to cover here
– Lots of terminology, mathematical detail, etc.

What is “Statistical Significance”?

• Statistical significance says the observed result was unlikely to have occurred by chance alone, so something besides chance likely accounts for the result
– It is a probability statement, which means there is also some (usually small) possibility it was only chance
• There are always assumptions that must be made that determine what “chance” means
• And always remember:
– With a big enough sample, almost anything can be made statistically significant
– Just because something is statistically significant does not mean it has any practical significance
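The warning that a big enough sample makes almost anything significant is easy to demonstrate. The sketch assumes a two-sided one-sample z-test for a proportion (normal approximation) and a hypothetical observed proportion of 0.51 against a null value of 0.50; the difference never changes, only n does.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_prop_z_test(p_hat, p0, n):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

for n in (100, 1_000, 10_000, 100_000):
    z, p = one_sample_prop_z_test(0.51, 0.50, n)
    print(f"n = {n:>7}: z = {z:5.2f}, p-value = {p:.4f}")
```

A one-point difference that is nowhere near significant at n = 100 becomes overwhelmingly “significant” at n = 100,000, yet its practical importance is the same in both cases.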

Case #1: 1970 Military Draft Lottery

• December 1, 1969 lottery held to determine the order of the military draft
– Idea: randomly choose birth days (1/1-12/31)
– Order chosen is order drafted
– E.g., September 14th chosen first, so all men born that day (between 1944 and 1950) were first to be inducted
• After the lottery was conducted, controversy arose about whether it was truly random

Results of the Lottery

Random?

Hmmm… How to Test This?

One Possible Hypothesis Test

• Split the year in half and do a two-sample test for the mean selection number:

Descriptive Statistics
           N     Mean     Std. Dev.   Std. Err.
JAN-JUN    182   206.32   106.144     7.868
JUL-DEC    184   160.92   100.757     7.428

t-Test Analysis
Mean Diff.   Std. Err.   t       df       p-value   lower 95%   upper 95%
45.40        10.817      4.197   364.00   0.000     24.13       66.67

• Results are statistically significant: birth days in the second half of the year were more likely to have lower draft numbers than birth days in the first half of the year
• However…
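The t-test output above can be reproduced from the summary statistics alone. This sketch assumes a pooled-variance (equal-variance) two-sample t-test, which matches the reported standard error and 364 degrees of freedom; the p-value uses a normal approximation, reasonable at this many degrees of freedom, because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist

def pooled_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample pooled-variance t statistic computed from group summaries."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, diff / se, df

# Summary statistics from the table above (JAN-JUN vs. JUL-DEC draft numbers)
diff, se, t, df = pooled_t_from_summary(182, 206.32, 106.144, 184, 160.92, 100.757)
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))  # normal approximation to the t(364) tail
half_width = 1.96 * se                          # approximate 95% CI half-width
print(f"diff = {diff:.2f}, SE = {se:.3f}, t = {t:.3f}, df = {df}, p ~ {p_approx:.1e}")
print(f"approx. 95% CI: ({diff - half_width:.2f}, {diff + half_width:.2f})")
```

The normal multiplier 1.96 gives a CI a few hundredths narrower than Stata's (24.13, 66.67), which uses the t quantile of about 1.97.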

Is this Appropriate?

• How to justify comparing the first six months against the second six months?
– That is, how to defend against an allegation of “fishing” through the data to find “significance”?
– After all, we may have chosen this comparison after looking at the side-by-side box plot first
• Remember, one should generate a hypothesis before looking at the data
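Why fishing matters can be shown by simulation. The sketch below assumes a perfectly fair lottery (every assignment of draft numbers 1 to 366 to birth dates equally likely) and an approximate z statistic for each split. It compares the false-alarm rate of one split chosen in advance (January to June versus July to December) with the rate when an analyst keeps whichever of the 11 month-boundary splits looks best. The simulation settings are illustrative assumptions.

```python
import random
from math import sqrt

# Days per month in the 1969 drawing, which included February 29 (366 birth dates)
MONTH_DAYS = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def mean_var(xs):
    """Sample mean and sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def split_z(numbers, cut):
    """Approximate z statistic comparing mean draft number before vs. after day `cut`."""
    m1, v1 = mean_var(numbers[:cut])
    m2, v2 = mean_var(numbers[cut:])
    return (m1 - m2) / sqrt(v1 / cut + v2 / (len(numbers) - cut))

def false_alarm_rate(cuts, n_sims=500, z_crit=1.96, seed=1):
    """Fraction of truly random lotteries in which at least one split looks 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        numbers = list(range(1, 367))
        rng.shuffle(numbers)  # numbers[i] is the draft number of day i in a fair lottery
        if any(abs(split_z(numbers, c)) > z_crit for c in cuts):
            hits += 1
    return hits / n_sims

# Candidate split points at every month boundary (after Jan, after Feb, ..., after Nov)
month_cuts = [sum(MONTH_DAYS[:i]) for i in range(1, 12)]
prespecified = false_alarm_rate([182])   # Jan-Jun vs. Jul-Dec, chosen in advance
fished = false_alarm_rate(month_cuts)    # best of 11 splits, chosen after looking
print(f"pre-specified split: {prespecified:.3f}; best of 11 splits: {fished:.3f}")
```

Because the same seed drives both runs, the “best of 11” rate can only be at least as large as the pre-specified one, and it comes out noticeably above the nominal 5% even though every simulated lottery is fair.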

More Advanced Analysis: ANOVA

• ANalysis Of VAriance is one possible analytical solution
• ANOVA is a hypothesis test of the equality of many means
– Months are a natural break, not researcher defined
– Are the hypotheses as desirable?
• Output from Stata:
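The F statistic behind a one-way ANOVA compares variation between group means with variation within groups. Here is a minimal sketch in plain Python; the toy data are hypothetical, and turning F into a p-value needs the F distribution (for example `scipy.stats.f_oneway`, which computes both), which the standard library does not provide. For the lottery, each group would be one month's draft numbers.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA of equal group means."""
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    means = [fmean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical toy data: two groups whose means differ by 3
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))
```

A large F means the group means differ by more than within-group noise would suggest; with 12 months there is a single test, so no split has to be chosen by the analyst.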

Outcomes

• The draft lottery was indeed not random
• Subsequent review of the methodology showed that the capsules were not properly mixed, allowing birth dates later in the year to be chosen more easily
• On January 4, 1970, the New York Times ran a long article, "Statisticians Charge Draft Lottery Was Not Random," with a bar chart of the monthly averages
• The method for randomizing later lotteries was subsequently modified to ensure randomization

[Photo: the first capsule being drawn by Congressman Alexander Pirnie (R-NY) of the House Armed Services Committee]

Take-Aways

• Bias is not just a problem in surveying
• It can (subtly) creep into data collection and analysis in many ways
• Must always be on guard:
– How could the data collection have gone wrong?
– Does the sample look unrepresentative in some way?
– Was a particular analysis undertaken with a preconceived result in mind?
– Do the results just look funny in some way and, if so, what is the explanation?
• Data fishing – beware!

Case #2: Testing Body Armor

• Armor manufactured from various materials has been used throughout recorded history
– Animal skins → fabrics → wood → metal → advanced materials
• US forces wear body armor for ballistic protection from:
– Penetration of projectiles
– Blunt force trauma of impact

Modern Body Armor

• Kevlar and ceramic materials used in modern armor systems
– Lighter than traditional metallic alloy-based armor
– Ceramics have superior hardness, low density, and high compressive strength
• Typical insert (“plate”)
– Consists of a layer of dense boron carbide or silicon carbide backed by a layer of metal or polymer composite
– Entire plate wrapped in tightly woven ballistic fabric
– Plate breaks up an incoming projectile and dissipates its kinetic energy

Interceptor Body Armor

Source: “DoD Testing Requirements for Body Armor”, Inspector General, United States Department of Defense, Report No. D-2009-047, January 29, 2009.

USSOCOM Body Armor

Source: “DoD Testing Requirements for Body Armor”, Inspector General, United States Department of Defense, Report No. D-2009-047, January 29, 2009.

Current Body Armor is Effective

• Program Executive Officer – Soldier: “…there have been no known soldier deaths due to small arms that were attributable to a failure of the issued ceramic body armor”
• Ceramic materials preferred because they are relatively light compared to traditional armor made of metallic alloys
– However, all effective body armor systems currently add a significant weight burden on the soldier
– Interceptor body armor (size medium) with all protective plates weighs ~33 lbs.

[Photo caption: “150 lbs. of lightweight gear”]

Sources: Phase I Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army, The National Academies, Dec. 30, 2009 & https://peosoldier.army.mil/factsheets/SEQ_SSV_IBA.pdf (accessed 7/26/10).

Body Armor Testing, In Brief

• Before awarding contracts to buy body armor, DoD conducts “first article testing” (FAT)
• Goal is to determine whether the product meets purchase specifications
• For body armor, it is a destructive ballistic test
– I.e., representative armor is shot at under various conditions

Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army, The National Academies, April 22, 2010.

Clay as Recording Medium

• Test consists of mounting a “shoot pack” on a clay backing
• Use of clay based on a Prather et al. (1977) study, which found clay measurements could be “correlated to tissue response for use in characterizing both the penetration and deformation effects of ballistic impacts on soft body armor materials.”
• Changes in clay formulation over time have required extensive effort to maintain test clay consistency

Test Metrics

• Penetration
– Resistance to projectiles fired at a constant velocity
– May be partial (plate, Kevlar) or complete (bullet or bullet fragments into clay backing)
• Back face deformation (BFD)
– BFD is the depth of the crater left in the clay after impact
– Surrogate measure for blunt force trauma

Original Army FAT Protocol

• Total of 27 plates tested:
– 1 plate against each of threats “A,” “B,” and “C” and 3 plates against threat “D” in ambient conditions
– 1 plate for each of nine environmental conditions
– Also, 12 plates for “V50” tests
• Passing standards
– For threats “A,” “B,” and “C,” no penetration allowed and BFD less than 48 mm
– For threat “D,” a point system was used to score shots based on penetration and BFD
• An accumulation of six or fewer points was passing

(Some) Body Armor Testing Issues

• Testing protocols differ across DoD
• Army protocol not statistically based
– DoD IG: “standardization of body armor testing and acceptance will ensure that Service members receive body armor that has been rigorously tested and will provide uniform protection in the battlefield”¹
• Clay-based testing:
– Clay formulation has changed over time, resulting in a formulation that is temperature sensitive
• How much of the variation in test results is attributable to variation in test conditions, and how much is due to plate variation, is unknown
– Scientific connection between clay test results and protection of human beings is tenuous

¹ “DoD Testing Requirements for Body Armor”, Inspector General, United States Department of Defense, Report No. D-2009-047, January 29, 2009.

NAS Committee

• Three-phase study
– Phase 1: Completed 30 December 2009
– Phase 2: Completed 22 April 2010
– Phase 3: Final report being completed now
• First two phases conducted as intense four-day meetings
– Days 1 and 2: briefings and site visits
– Days 3 and 4: draft committee letter report
• Chaired by a retired Army Major General, with 10-11 members (engineers and statisticians)

Phase I

• DOT&E tasked the committee to:
– “…comment on the validity of using laser profilometry/laser interferometry techniques to determine the contours of an indent made by a ballistic test in a non-transparent clay material at the level of precision established in the Army’s procedures for testing personal body armor.”
– “…provide interim observations regarding the column-drop performance test described by the Army for assessing the part-to-part consistency of a clay body used in testing body armor.”

Source: Phase I Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army, The National Academies, December 30, 2009.

Digital Caliper

• The digital caliper used to measure BFD has several shortcomings, including:
– If the deepest location in the clay indent is displaced from the aim point, the original clay surface at the impact point must be estimated
– Caliper subject to operator judgment because one must measure a soft, deformable surface by barely touching and yet not disturbing the clay
• Standard error for measuring an etched metal gage block is on the order of 0.1 mm; for BFD in a soft clay medium it is on the order of 1 mm



Laser Profilometry/Interferometry

• Laser used to take a three-dimensional measurement of the clay surface before and after the test
– Difference of the two surfaces used to measure BFD
• Benefits:
– Does not require contact with clay
– Measurements collected over the whole surface
• However, the system is more complicated and costly
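The “difference of two surfaces” idea can be sketched as follows. This assumes, hypothetically, that each laser scan is a grid of surface heights in millimeters sampled at the same points before and after the shot, with heights decreasing where the clay is indented; real profilometry software also has to align the scans and handle noise, which this sketch ignores. Because the whole surface is searched, the deepest point is found even when it is displaced from the aim point.

```python
def bfd_from_scans(before, after):
    """Return (depth_mm, (row, col)) of the deepest point of the before - after height grids."""
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("scans must be sampled on the same grid")
    depth, where = 0.0, None
    for r, (row_before, row_after) in enumerate(zip(before, after)):
        for c, (zb, za) in enumerate(zip(row_before, row_after)):
            if zb - za > depth:  # positive difference = clay pushed down
                depth, where = zb - za, (r, c)
    return depth, where

# Hypothetical 3 x 4 height grids (mm); the crater's deepest point is off-center
before = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
after = [[0.0, -2.1, -1.0, 0.0], [-1.5, -8.4, -31.7, -2.0], [0.0, -3.3, -6.0, 0.0]]
print(bfd_from_scans(before, after))
```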



Phase I Recommendations

• “The digital caliper is adequate for measurements of displacements created in clay by the column-drop performance test…”
• “Surface profilometry by a laser… is a valid approach for determining the contours of an indent in a nontransparent clay material at a level of precision adequate for the Army’s current ballistic testing of body armor.”



Phase II

• DOT&E tasking
– “In Phase II, the committee will consider in greater detail [than in Phase I] the validity of using the column drop performance test described by the Army for assessing the part-to-part consistency of a clay body within the level of precision that is identified by the Army test procedures.”
– “The final report will document the committee’s findings pertaining to…the appropriate use of statistical techniques (e.g., rounding numbers, choosing sample sizes, or test designs) in gathering the data.”



Proposed New FAT Specifications

• 60 plates tested, spread over a combination of plate sizes, environmental conditions, and shot order
• Passing standards:
– Penetration:
• One-sided 90 percent lower confidence bound for the probability of no complete system penetration must be greater than 0.9 (first shot) and greater than 0.8 (second shot)
– BFD:
• First shot: one-sided 90% upper tolerance limit for BFD must be less than 44.0 mm with 90 percent confidence
• Second shot: one-sided 80% upper tolerance limit for BFD must be less than 44.0 mm with 90 percent confidence
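The two kinds of passing standards can be sketched in Python. This is an illustration under stated assumptions, not the NAS or Army computation: the penetration criterion uses an exact one-sided (Clopper-Pearson) lower confidence bound found by bisection, and the BFD criterion uses a normal-theory one-sided tolerance limit with an approximate tolerance factor (the approximation given in the NIST/SEMATECH e-Handbook), which assumes BFD values are roughly normal. All example data are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist, fmean, stdev

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def lower_confidence_bound(successes, n, conf=0.90):
    """One-sided Clopper-Pearson lower bound on a success probability (bisection)."""
    if successes == 0:
        return 0.0
    alpha = 1 - conf
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        # P(X >= successes | p) rises with p; the bound is the p where it equals alpha
        if binom_upper_tail(successes, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def upper_tolerance_factor(n, coverage=0.90, conf=0.90):
    """Approximate k so mean + k*sd bounds `coverage` of a normal population with `conf` confidence."""
    z_p = NormalDist().inv_cdf(coverage)
    z_g = NormalDist().inv_cdf(conf)
    a = 1 - z_g**2 / (2 * (n - 1))
    b = z_p**2 - z_g**2 / n
    return (z_p + sqrt(z_p**2 - a * b)) / a

def bfd_upper_tolerance_limit(bfd_mm, coverage=0.90, conf=0.90):
    """Sample mean plus k standard deviations (normality assumed)."""
    k = upper_tolerance_factor(len(bfd_mm), coverage, conf)
    return fmean(bfd_mm) + k * stdev(bfd_mm)

# Hypothetical first-shot results: 60 shots, 59 with no complete penetration
lcb = lower_confidence_bound(59, 60)
print(f"90% lower bound on P(no complete penetration): {lcb:.3f} -> pass? {lcb > 0.90}")

# Hypothetical first-shot BFD measurements (mm)
bfd = [32.1, 35.4, 29.8, 38.0, 33.3, 36.7, 30.9, 34.2, 37.5, 31.6]
utl = bfd_upper_tolerance_limit(bfd)
print(f"90%/90% upper tolerance limit for BFD: {utl:.1f} mm -> pass? {utl < 44.0}")
```

For the second-shot criteria the same functions would be called with `conf=0.90` and a required bound of 0.8, or `coverage=0.80`, respectively. Note that the tolerance factor shrinks toward the plain normal quantile as n grows, so small test samples are penalized automatically.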

Statistical Protocol Allows Explicit Risk Trade-Offs To Be Made

Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.
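One way to make those risk trade-offs explicit is an operating characteristic (OC) curve: the probability that a design passes as a function of its true penetration probability. The sketch below is an illustration, not the committee's analysis. It assumes independent shots, a hypothetical 60 first shots, and the penetration criterion “one-sided 90% lower confidence bound on P(no complete penetration) > 0.90,” which for an exact (Clopper-Pearson) bound reduces to allowing at most a fixed number of penetrations.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def max_allowed_penetrations(n, p_req=0.90, conf=0.90):
    """Largest penetration count that still passes; -1 if even zero penetrations fails.

    The exact lower bound on P(no penetration) exceeds p_req exactly when
    P(X <= f) < 1 - conf for X ~ Binomial(n, 1 - p_req), where f = penetrations.
    """
    alpha, q_req = 1 - conf, 1 - p_req
    f = -1
    while f + 1 <= n and binom_cdf(f + 1, n, q_req) < alpha:
        f += 1
    return f

def prob_pass(n, q_true, p_req=0.90, conf=0.90):
    """Probability that a design with true penetration probability q_true passes."""
    f = max_allowed_penetrations(n, p_req, conf)
    return binom_cdf(f, n, q_true) if f >= 0 else 0.0

n = 60  # hypothetical number of first shots
print("penetrations allowed:", max_allowed_penetrations(n))
for q in (0.01, 0.02, 0.05, 0.10, 0.15):
    print(f"true penetration probability {q:.2f}: P(pass) = {prob_pass(n, q):.3f}")
```

A design that is truly at the 0.90 boundary passes only about 5% of the time, while a design with a 2% penetration probability passes most of the time; moving n, the required probability, or the confidence level shifts this curve. With 21 or fewer shots, even zero penetrations cannot demonstrate 0.90 at 90% confidence.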


Variation Introduced by Test Protocol Unknown

• “Column drop” test used to test clay for consistency prior to ballistic testing
– Clay heated until the indentation depth of a weight dropped into the clay meets the standard
– Indentations from 3 drops must all be within 25 mm ± 3 mm
• Yet clay performance may still vary substantially due to temperature and other factors
• How much variation this introduces into ballistic test results is unknown
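The column-drop acceptance rule as stated on the slide is a simple range check. A minimal sketch, assuming the ±3 mm limits are inclusive and exactly three drops are taken; the depths are hypothetical. Passing this check says nothing about how much clay variation remains in the ballistic results themselves.

```python
def column_drop_passes(depths_mm, target_mm=25.0, tol_mm=3.0, n_drops=3):
    """True if every drop indentation lies within target +/- tol (limits assumed inclusive)."""
    if len(depths_mm) != n_drops:
        raise ValueError(f"expected {n_drops} drop measurements, got {len(depths_mm)}")
    return all(abs(d - target_mm) <= tol_mm for d in depths_mm)

print(column_drop_passes([24.1, 26.0, 27.9]))  # hypothetical in-spec clay
print(column_drop_passes([24.0, 26.5, 28.4]))  # hypothetical: one drop too deep
```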

Effect(s) of New Protocol Standards on Manufacturers Unknown

Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.


Some Phase II Clay-Related Recommendations

• “…expedite the research necessary both to quantify the medical results of blunt force trauma on tissue and to use those results as the updated mathematical underpinnings of the back face deformation (BFD) body armor testing methodology.”
• “The Army should develop ballistic testing performance specifications and properties that will lead to a short-term, standard replacement for the current Roma Plastilina #1 oil-based modeling clay.”
• “Since oil-based modeling clay is time and temperature sensitive, a post-drop calibration test is needed to validate that the clay remains within specification at the end of a body armor test.”



Phase II Recommendations Related to Statistical Methodology

• “The committee unequivocally supports the concept of a statistically based test protocol…”
• “…the Army should quickly develop and experiment with a gas gun calibrator, or equivalent device…to estimate as accurately as possible the variation of back face deformation measurements both within a given box and between boxes, under realistic testing conditions using existing test protocols.”
• “…the results of the experiments and analyses proposed in this report, should be used as due diligence to carefully and completely assess the effects, large and small, of the proposed statistically based protocol before it is formally adopted across the body armor testing community.”
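Separating within-box from between-box variation is a classic variance-components problem. This sketch assumes a balanced one-way random-effects model (the same number of calibration shots in each clay box) and uses the method-of-moments (ANOVA) estimators; it illustrates the idea and is not the committee's proposed analysis. The readings are hypothetical.

```python
from statistics import fmean

def variance_components(boxes):
    """Method-of-moments estimates (within-box variance, between-box variance).

    `boxes` is a list of equal-length lists, one list of BFD readings (mm) per box.
    """
    k, m = len(boxes), len(boxes[0])
    if k < 2 or m < 2 or any(len(b) != m for b in boxes):
        raise ValueError("need >= 2 boxes, each with the same number (>= 2) of readings")
    box_means = [fmean(b) for b in boxes]
    grand_mean = fmean(box_means)
    ms_within = sum((x - bm) ** 2 for b, bm in zip(boxes, box_means) for x in b) / (k * (m - 1))
    ms_between = m * sum((bm - grand_mean) ** 2 for bm in box_means) / (k - 1)
    var_between = max(0.0, (ms_between - ms_within) / m)  # truncate negative estimates at 0
    return ms_within, var_between

# Hypothetical calibration shots: 3 clay boxes, 4 readings each (mm)
readings = [[20.1, 21.3, 19.8, 20.6], [23.0, 22.4, 23.9, 22.7], [18.9, 19.6, 20.2, 19.1]]
within, between = variance_components(readings)
print(f"within-box variance {within:.2f} mm^2, between-box variance {between:.2f} mm^2")
```

If the between-box component dominates, plate-to-plate comparisons made in different boxes are confounded with clay differences, which is exactly the concern raised in the testing-issues slide.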



Phase III

• DOT&E has tasked the committee to:
– Develop ideas for revising/replacing the Prather study methodology
– Provide a roadmap to reduce variability of clay processes and how to migrate from clay to future solutions
– Within the time and funding available, review and comment on methodologies and technical approaches to military helmet testing

Take-Aways

• Critical that testing methodology be statistically based
– Decision makers need to understand the uncertainty in the data
– Allows them to make explicit risk trade-offs
• Sometimes the statistics are driven by organizational and other issues
• Critical to base test criteria and procedures on solid science

Case #3: WMD Preparedness

• Survey fielded to solicit local responder opinions of Federal WMD preparedness programs
• Purpose was to provide:
– Wide local responder input into the Congressionally-mandated Panel deliberation process
– Independent evaluation/confirmation of the Panel’s recommendations
• Fielded from March to early September
– The few surveys received after 9/11 have been set aside

Complex Set Of 10 Surveys Designed, Fielded, And Analyzed In About One Year

[Figure: timeline spanning 1st Quarter FY2001 to 1st Quarter FY2002, with milestones labeled “Fielding preparations” and “Final Panel Report”]

• DRAFT
– Instrument construction
– Sample selection
– Pretesting
– Materials preparation
– Respondent pre-contact
• FIELD
– Surveys distributed
– Conducted intensive follow-up
– Evaluated response patterns
• ANALYZE
– Data coded and entered
– Non-response evaluated
– Survey weights constructed
– Analyses
• REPORT
– Reported results to Panel
– Conducted additional analyses as required
– Write final report

Survey Designed To Elicit Detailed Information About Local Responders

• Section 1: Organizational Information
– Relevant organizational demographics
• Section 2: Organizational Experience & Threat Perceptions
– Expectation of a terrorist WMD incident in the next 5 years
– Organizational experience with actual incidents and hoaxes
• Section 3: Emergency Response Planning Activities
– Organizational participation in emergency response planning
– Description of emergency response plans and exercises conducted
• Section 4: Responding to Specific WMD Terrorist Incidents
– Measures of preparedness for WMD incidents: (1) Conventional explosives; (2) Chemical; (3) Biological; (4) Radiological
• Section 5: Assessment of Federal Programs
– Application and/or receipt of Federal WMD support
– Assessment of various Federal WMD programs, offices, exercises

Surveyed Ten Specific Types Of State and Local Responder Organizations

• LOCAL (CITY/COUNTY)
– Law enforcement
– Fire departments
• Combination
• Paid
• Volunteer
– Hospitals
– EMS
– Public health depts
– OEMs
• REGIONAL
– EMS
• STATE*
– EMS
– Public health depts
– OEMs

* Also, Washington, D.C. was sent state-level surveys, and state OEM and public health surveys were sent to US territories: Puerto Rico, Guam, Virgin Islands, and Northern Mariana Islands

Sampling Strategy Designed To Obtain Statistically Valid National Sample

• Randomly selected 200 counties in the U.S.
– Greater chance of choosing counties with larger populations
– Also added in large organizations in “sensitized” counties
• Randomly sampled one of each type of local responder agency within each county
• Also took:
– All regional EMS organizations containing one of the 200 counties
– Census of state agencies
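Selecting counties with a greater chance for larger populations is probability-proportional-to-size (PPS) sampling. The sketch below uses systematic PPS, one common way to do it, and is not necessarily the method used in this study; it ignores the added “sensitized” counties and the agency-level stages, and the county populations are hypothetical. Units so large that their inclusion probability would exceed 1 are taken with certainty, and the returned inclusion probabilities are what base survey weights (1 / probability) are built from.

```python
import random

def pps_systematic_sample(sizes, n, seed=None):
    """Systematic PPS sample of n units; returns (selected indices, inclusion probabilities)."""
    rng = random.Random(seed)
    certain = set()
    while True:  # peel off units so large they must be included
        rest = [i for i in range(len(sizes)) if i not in certain]
        n_rest = n - len(certain)
        total = sum(sizes[i] for i in rest)
        if n_rest <= 0 or total <= 0:
            break
        big = {i for i in rest if n_rest * sizes[i] >= total}
        if not big:
            break
        certain |= big

    def inclusion(i):
        if i in certain:
            return 1.0
        return n_rest * sizes[i] / total if n_rest > 0 and total > 0 else 0.0

    probs = [inclusion(i) for i in range(len(sizes))]
    selected = sorted(certain)
    if n_rest > 0 and total > 0:
        step = total / n_rest            # sampling interval on the cumulative-size scale
        point = rng.uniform(0, step)     # single random start
        cumulative = 0.0
        for i in rest:
            cumulative += sizes[i]
            while point < cumulative and len(selected) < n:
                selected.append(i)       # each remaining unit is smaller than step: at most one hit
                point += step
    return selected, probs

# Hypothetical county populations (thousands); one very large county
populations = [9800, 420, 35, 610, 12, 88, 150, 3, 47, 260, 19, 730]
chosen, probs = pps_systematic_sample(populations, n=4, seed=7)
weights = {i: 1 / probs[i] for i in chosen}  # base design weights
print(chosen, {i: round(w, 2) for i, w in weights.items()})
```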

Selected Counties Representative Of The Entire United States

[Figure: U.S. map showing randomly sampled counties, “sensitized” counties that were randomly sampled, and remaining “sensitized” counties. Notes: (1) some counties too small to see; (2) Hawaii and Alaska not depicted]

A Detailed Survey Fielding Process Used To Maximize Response Rates

• Population identification and sample selection
• Advance letter and postcard
– Follow-up call to those not returning postcard
• Survey package
– Cover letter from Governor Gilmore
– Commemorative coin incentive
• Thank you/reminder
– Each respondent organization called
• Nonresponse follow-up
– Second survey mailing to nonrespondents
– Second round of follow-up telephone calls
– Continued individualized follow-up

Survey Achieved Excellent Response Rates

Organization            Total Sent   Response Rate (as of 9/11/01)
Law enforcement             208          71%
Fire department             443          68%
Hospital                    208          50%
EMS                         155          52%
Public health               199          74%
OEM                         202          71%
Regional EMS                111          48%
State EMS                    51          69%
State public health          51*         82%
State OEM                    51*         78%
Total/Overall             1,679          66%

* Does not count 4 surveys sent to US territories

Percent with WMD Plan Sufficient to Address the Given Scenario

[Figure: percent of local organizations (Fire, EMS, Law, Public Health, OEM, Hospital) with a WMD plan sufficient for the chemical and biological scenarios; y-axis: percent, 0 to 100]

Percent with WMD Plan Sufficient to Address the Given Scenario

[Figure: percent of state organizations (EMS, OEM, Public Health) with a WMD plan sufficient for the chemical and biological scenarios; y-axis: percent, 0 to 100]

Percent Exercising WMD Plan for the Particular Type of Incident

[Figure: percent of local response organizations (Fire, Law, EMS, Public Health, OEM, Hospital) exercising their WMD plan for chemical and biological incidents; y-axis: percent, 0 to 100]

Take-Aways

• Conducting a census is frequently infeasible or impossible, but even when it is possible, a good sample can provide as good or better information
• Appropriate sampling provides some assurance that the results are representative
• There is a rich literature about how to maximize survey response rates – use it
• Even with survey results in Excel graphs, it is possible to show the uncertainty in a point estimate
• Per yesterday’s lecture: original data collection takes time and effort
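For a survey percentage shown in a chart, an error bar can come from a confidence interval for a proportion. The sketch below uses the Wilson score interval, a common choice that behaves better than p̂ ± 2·SE near 0% or 100%. It assumes a simple random sample, so it ignores the weighting and design effects a real survey analysis would include, and the counts are hypothetical.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (approximately 95% for z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical: respondents in one organization type reporting a sufficient plan
for label, yes, total in [("chemical", 62, 140), ("biological", 31, 140)]:
    lo, hi = wilson_interval(yes, total)
    print(f"{label}: {yes / total:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

The lower and upper limits can be entered directly as custom error bars in a spreadsheet chart.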

Some Briefing Questions

• How did you get your sample of data?
– Is it representative of the population? Why or why not? How do you know?
– If not collected as a random sample: How can you even know whether the sample is representative?
• What sorts of biases may be present in the data due to the data collection methodology?
• If only presenting point estimates: What are the standard errors (or the margin of error) of your estimates?
• If making claims about results: Are your results statistically significant?

Some people hate the very name of statistics, but I find them full of beauty and interest. Whenever they are not brutalized, but delicately handled by the higher methods, and are warily interpreted, their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of man.

– Francis Galton
