Statistical Analysis
for the Military Decision Maker (Part I)
Professor Ron Fricker
Naval Postgraduate School
Monterey, California
Goals for this Block
• The major ideas in statistics:
– Using samples to infer information about populations
– Understanding sources of error
– Quantifying uncertainty
• Confidence intervals and hypothesis testing
– What are they and why are they important?
– What is a "margin of error"?
– What does "statistical significance" mean?
• Linear and other regression modeling
– What does it mean to model?
– What are the assumptions?
Two Roles of Statistics
• Descriptive: Describing a sample or population
– Numerical: mean, median, standard deviation, mode
– Graphical: pie charts, bar charts, histograms, boxplots
• Inferential: Using a sample to infer facts about
a population
– Estimating (e.g., estimating the average time deployed for an
O-5 SWO)
– Testing theories (e.g., evaluating whether O-5 SWOs with
more deployment are promoted faster)
– Building models (e.g., modeling the relationship between
amount of deployment and promotion rate)
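The numerical summaries under “Descriptive” map directly onto functions in Python’s standard `statistics` module. A minimal sketch, assuming a small made-up list of months deployed (not real Navy data):

```python
import statistics

# Hypothetical months deployed for a small sample of officers (illustrative data only)
months_deployed = [1, 2, 2, 3, 7]

summary = {
    "mean": statistics.mean(months_deployed),      # arithmetic average
    "median": statistics.median(months_deployed),  # middle value when sorted
    "stdev": statistics.stdev(months_deployed),    # sample standard deviation (n - 1 divisor)
    "mode": statistics.mode(months_deployed),      # most frequent value
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Note how the single large value (7) pulls the mean above the median; that is one reason to report more than one summary.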
A Descriptive Statistics Question: What is the average time deployed for an O-5 SWO (either in the whole Navy or some sample)?
An Inferential Question: Given a sample of O-5 SWOs, what is the average time deployed for all O-5 SWOs in the Navy?
Inferential Statistics
• Point & Interval Estimation
– E.g., confidence intervals
• Hypothesis Testing
– Testing sample means
and variances
• Model Building
– Linear regression
– Other types of models
It’s all about inferring from a sample to a population
or trying to understand/quantify relationships
Goals for this Lecture
• The major ideas in statistics:
– Using samples to infer information about populations
– Understanding sources of error
– Quantifying uncertainty
• Confidence intervals and hypothesis testing
– What are they and why are they important?
– What is a "margin of error"?
– What does "statistical significance" mean?
Samples versus Populations
• A population consists of all possible
observations
– Example: All military officers currently enrolled in a
Naval War College class
– When data are collected on the whole population, it is called a census
• A sample is a subset of the population
– Example: Students in this class are a sample of all
officers currently enrolled in a Naval War College
class
Samples versus Populations
[Diagram: Population (all military officers currently enrolled in a Naval War College class) → select somehow from population → Sample (some military officers currently enrolled in a Naval War College class)]
Types of Samples
• Convenience sample: Individuals in the
population decide to join the sample
– 900 number and other call-in polls
– Internet and e-mail surveys (usually)
– Shopper and visitor surveys
• Random sample: Individuals or units are
chosen randomly from the population
– Whether or not an individual is part of the sample is not the individual’s choice/decision
Types of Random Sampling
• Simple random sample (SRS): any two samples of the same size are equally likely to be selected
• Some other possible random sampling methods:
– Stratified sampling
• Divide population into nonoverlapping, homogeneous groups and then draw an SRS from each group
– Cluster sampling
• Data naturally occur in clusters
• Use SRS to select clusters
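The three random sampling schemes above can be sketched with Python’s `random` module. This is an illustrative sketch only: the population of 12 officers, the service strata, and the seminar clusters are all hypothetical labels invented for the example.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate SRS of size n_per_stratum from each non-overlapping stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Use an SRS to pick whole clusters, then keep every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(2024)  # fixed seed so the example is reproducible

# Hypothetical population: 12 officers (illustrative only)
officers = [f"officer_{i}" for i in range(12)]
# Hypothetical strata by service
strata = {"Navy": officers[:6], "Army": officers[6:9], "USMC": officers[9:]}
# Hypothetical clusters: seminar groups of 4 officers each
clusters = {"seminar_A": officers[:4], "seminar_B": officers[4:8], "seminar_C": officers[8:]}

print(simple_random_sample(officers, 4, rng))
print(stratified_sample(strata, 2, rng))
print(cluster_sample(clusters, 1, rng))
```

Stratification guarantees every group appears in the sample; cluster sampling trades some precision for the convenience of collecting whole groups at once.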
Why Sample?
• E.g., why conduct
– Operational test and evaluation on a sample of prototypes
– Nielsen survey of the viewing habits of a sample of US television viewers
– Clinical trial of how a drug affects a sample of individuals in the trial
• Rather than:
– Assess how each and every piece of military equipment performs
– Evaluate the TV viewing preferences for every individual in the US
– Test how a drug affects each existing and future member of a population
• Collecting data for whole populations can be expensive and/or impossible
Point and Interval Estimation
• Want to use a sample to infer something
about a larger population
• Point estimates use a single number to infer
the population quantity of interest
– It’s our “best guess” of the population quantity
• Interval estimates use an interval to infer the
population quantity of interest
– “Confidence intervals” do this in such a way that:
• We can know how precise our estimates are, and
• We can state how often intervals constructed this way capture the true value (the confidence level)
Illustrating Point Estimation
• Consider the problem of estimating the
probability a new type of body armor will be
penetrated by a particular ballistic threat
• After testing, the logical point estimate is the number of armor plates penetrated divided by the total number tested
– Denote this point estimate as p̂
• Is there an issue with this approach?
Yes, there is no indication of the uncertainty in the point estimate: clearly it’s not exactly right, so how far off from the real value is it?
One Solution: Confidence Intervals
• A confidence interval shows how variable the
sample statistic is
– Narrow: real (population) value unlikely to be far
away
– Wide: little information about the population value
• Two CIs for the hypothetical example:
[Figure: confidence interval #1 and confidence interval #2 drawn on an axis of the observed fraction of body armor penetrated (p̂), running from 0 to 1]
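The slide does not say how its two intervals were computed. One common choice, assumed here, is the normal-approximation (Wald) interval for a proportion, which makes the narrow-versus-wide contrast concrete: the same observed fraction gives a much wider interval from 10 tests than from 1,000. The test counts are hypothetical; for small samples an exact (Clopper-Pearson) interval is usually preferred.

```python
import math

def wald_interval(h, n, z=1.96):
    """Approximate 95% CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = h / n                                   # point estimate
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] because a probability cannot fall outside that range
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Same observed fraction (0.2), very different amounts of testing (hypothetical numbers)
print(wald_interval(2, 10))      # wide interval: little information about the population value
print(wald_interval(200, 1000))  # narrow interval: population value unlikely to be far away
```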
What Does “Margin of Error” Mean?
• Margin of error is just half the width of a 95 percent confidence interval
– In this hypothetical example it’s $2\sqrt{\hat{p}(1-\hat{p})/n}$
• Common survey terminology
– Convention is a 3% margin of error
– Means a 95% CI is the survey result +/- 3%
• To achieve a desired margin of error, must have the right sample size (n)
– Sample size (power) calculations are done by statisticians to figure out the required sample size to achieve a particular margin of error
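The margin-of-error formula and the sample-size question can be sketched together. This assumes the same normal approximation as the margin-of-error formula, using z = 1.96 where the slide rounds to 2, and the worst-case guess p = 0.5; `required_sample_size` is a hypothetical helper name, not a library function.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of an approximate 95% CI for a proportion (the slide rounds z to 2)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def required_sample_size(target_moe, p_guess=0.5, z=1.96):
    """Smallest n whose margin of error is at most target_moe.

    p_guess = 0.5 is the worst case because p * (1 - p) is largest there.
    """
    return math.ceil((z / target_moe) ** 2 * p_guess * (1 - p_guess))

n = required_sample_size(0.03)
print(n)                        # 1068 for the conventional 3% margin
print(margin_of_error(0.5, n))  # check: at or just under 0.03
```

With z = 2 as on the slide, the same calculation gives 1,112; either way, a 3% margin needs roughly a thousand respondents when nothing is known about p in advance.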
What is a Hypothesis Test?
• Rather than calculate a CI, we can do a
hypothesis test to test whether p>0.95 or not
– That is, we do a test:
• After n ballistic tests we get h penetrations, so $\hat{p} = h/n$
• Compare $\hat{p}$ to 0.95: close or far away?
– Hard part: determining what “close” and “far away” mean
• Far too much in this area to cover here
– Lots of terminology, mathematical detail, etc.
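One way to make “close or far away” precise (assumed here; the slide deliberately skips the details) is an exact one-sided binomial test of H0: p ≤ 0.95 against H1: p > 0.95, which turns h events in n tests into a p-value. The counts in the example are hypothetical.

```python
from math import comb

def upper_tail_pvalue(h, n, p0):
    """One-sided exact binomial test of H0: p <= p0 versus H1: p > p0.

    Returns P(X >= h) when X ~ Binomial(n, p0), i.e. the chance of seeing
    h or more events if the true probability were only p0.
    """
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(h, n + 1))

# Hypothetical: the event occurs in all 20 of 20 tests; is p > 0.95?
p_value = upper_tail_pvalue(20, 20, 0.95)
print(round(p_value, 3))  # 0.358: 20 for 20 is not "far" from 0.95
```

Even a perfect 20-for-20 record is quite likely (about 36%) when p is exactly 0.95, so it does not establish p > 0.95; that is the sense in which “close” and “far away” are hard.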
What is “Statistical Significance”?
• Statistical significance says the observed result was unlikely to have occurred by chance alone, so something besides chance likely accounts for the result
– It is a probability statement, which means there is also some (usually small) possibility it was only chance
• There are always assumptions that must be made that determine what “chance” means
• And always remember:
– With a big enough sample, almost anything can be made statistically significant
– Just because something is statistically significant does not mean it has any practical significance
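The warning that a big enough sample makes almost anything significant can be seen numerically. The sketch below assumes a two-sample z-test with a known, common standard deviation and a fixed, practically trivial true difference of 0.01 standard deviations; only the sample size changes.

```python
import math

def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sample z-test p-value for a difference in means with known, equal SDs."""
    std_err = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / std_err
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# A tiny, practically meaningless difference: 0.01 standard deviations
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p_value(0.01, 1.0, n))
```

The p-value falls from roughly 0.94 at n = 100 per group to the order of 10^-12 at n = 1,000,000, although the difference itself never changes.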
Case #1: 1970 Military Draft Lottery
• December 1, 1969 lottery held to determine
the order of the military draft
– Idea: randomly choose birth days (1/1-12/31)
– Order chosen is order drafted
– E.g., September 14th chosen first, so all men born
that day (between 1944 and 1950) were first to be
inducted
• After lottery conducted, controversy arose
about whether it was truly random
Results of the Lottery
Random?
Hmmm…How to Test This?
One Possible Hypothesis Test
• Split the year in half and do a two-sample test for the mean selection number:

Descriptive Statistics
Period    N     Mean     Std. Dev.   Std. Err.
JAN-JUN   182   206.32   106.144     7.868
JUL-DEC   184   160.92   100.757     7.428

t-Test Analysis
Mean Diff.   Std. Err.   t       df    p-value   Lower 95%   Upper 95%
45.40        10.817      4.197   364   < 0.001   24.13       66.67

• Results are statistically significant
– Birth days in the second half of the year were more likely to have lower draft numbers than birth days in the first half of the year
• However…
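The reported figures appear consistent with a pooled-variance two-sample t-test (df = 182 + 184 - 2 = 364). The sketch below recomputes the standard error and t statistic from the summary statistics alone; it uses a normal approximation for the p-value instead of the exact t distribution, which is a reasonable shortcut at 364 degrees of freedom.

```python
import math
from statistics import NormalDist

def pooled_t_test(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled-variance two-sample t statistic computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, std_err, diff / std_err, df

diff, se, t, df = pooled_t_test(182, 206.32, 106.144, 184, 160.92, 100.757)
# With df = 364 the t distribution is very close to the normal, so this
# normal-based two-sided p-value is a reasonable approximation.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"diff={diff:.2f} se={se:.3f} t={t:.3f} df={df} p~{p_approx:.1e}")
```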
Is this Appropriate?
• How to justify comparing the first six months against the second six months?
– That is, how to defend against an allegation of “fishing” through the data to find “significance”?
– After all, we may have chosen this comparison after looking at the side-by-side box plot first
• Remember, one should generate a hypothesis before looking at the data
More Advanced Analysis: ANOVA
• ANalysis Of VAriance is one possible analytical solution
• ANOVA is a hypothesis test of the equality of many means
– Months are a natural break, not researcher defined
– Are the hypotheses as desirable?
• Output from Stata:
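The Stata output did not survive extraction. As an illustration of what a one-way ANOVA computes, the sketch below calculates the F statistic from raw groups; for the lottery, the 12 months would be the groups (df = 11 and 366 - 12 = 354). The toy data are hypothetical and are not the lottery results.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for H0: all group means are equal (one-way ANOVA)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-group sum of squares: how far group means sit from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)

# Hypothetical toy data: three groups with clearly different means
f_stat, dfs = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, dfs)
```

A large F (between-group variation much bigger than within-group variation) is evidence that not all group means are equal; the p-value comes from the F distribution with the returned degrees of freedom.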
Outcomes
• The draft lottery was indeed not random
• Subsequent review of the methodology showed that the balls were not properly mixed, allowing birth dates later in the year to be more easily chosen
• On January 4, 1970, the New York Times ran a long article, "Statisticians Charge Draft Lottery Was Not Random" with a bar chart of the monthly averages
• The method for randomizing later lotteries was subsequently modified to ensure randomization
Photo of the first capsule being drawn by
Congressman Alexander Pirnie (R-NY) of
the House Armed Services Committee
Take-Aways
• Bias is not just a problem in surveying
• It can (subtly) creep into data collection and analysis
in many ways
• Must always be on guard:
– How could the data collection have gone wrong?
– Does the sample look unrepresentative in some way?
– Was a particular analysis undertaken with a preconceived
result in mind?
– Do the results just look funny in some way and, if so, what is the explanation?
• Data fishing – beware!
Case #2: Testing Body Armor
• Armor manufactured from various materials has been used throughout recorded history
– Animal skins → fabrics → wood → metal → advanced materials
• US forces wear body armor for ballistic protection from
– Penetration of projectiles
– Blunt force trauma of impact
Modern Body Armor
• Kevlar and ceramic materials used in modern armor systems
– Lighter than traditional metallic alloy-based armor
– Ceramics have superior hardness, low density, and high compressive strength
• Typical insert (“plate”)
– Consists of a layer of dense boron carbide or silicon carbide backed by a layer of metal or polymer composite
– Entire plate wrapped in tightly woven ballistic fabric
– Plate breaks up an incoming projectile and dissipates its kinetic energy
Interceptor Body Armor
Source: “DoD Testing Requirements for Body Armor”, Inspector General, United States Department
of Defense, Report No. D-2009-047, January 29, 2009.
USSOCOM Body Armor
Source: “DoD Testing Requirements for Body Armor”, Inspector General, United States Department
of Defense, Report No. D-2009-047, January 29, 2009.
Current Body Armor is Effective
• Program Executive Officer – Soldier: “…there have been no known soldier deaths due to small arms that were attributable to a failure of the issued ceramic body armor”
• Ceramic materials preferred because they are relatively light compared to traditional armor made of metallic alloys
– However, all effective body armor systems currently add a significant burden of weight on the soldier
– Interceptor body armor (size medium) w/ all protective plates ~ 33 lbs.
Sources: Phase I Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army, The National Academies, Dec. 30, 2009 & https://peosoldier.army.mil/factsheets/SEQ_SSV_IBA.pdf (accessed 7/26/10).
150 lbs. of lightweight gear
Body Armor Testing, In Brief
• Before awarding contracts to buy body armor, DoD conducts “first article testing” or FAT
• Goal is to determine whether product meets purchase specifications
• For body armor, it is a destructive ballistic test
– I.e., representative armor is shot at under various conditions
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army, The National Academies, April 22, 2010.
Clay as Recording Medium
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.
Army, The National Academies, April 22, 2010.
• Test consists of mounting
“shoot pack” on clay backing
• Use of clay based on Prather
et al. (1977) study which
found clay measurements
could be “correlated to tissue
response for use in
characterizing both the
penetration and deformation
effects of ballistic impacts on
soft body armor materials.”
• Changes in clay formulation
over time have resulted in
extensive effort to try to
maintain test clay consistency
Test Metrics
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.
Army, The National Academies, April 22, 2010.
• Penetration
– Resistance to projectiles fired
at a constant velocity
– May be partial (plate, Kevlar)
or complete (bullet or bullet
fragments into clay backing)
• Back face deformation (BFD)
– BFD is the depth of the crater
left in the clay after impact
– Surrogate measure for blunt
force trauma
Original Army FAT Protocol
• Total of 27 plates tested:
– 1 plate against threats “A,” “B,” and “C” and 3 plates against threat “D” in ambient conditions
– 1 plate for each of nine environmental conditions
– Also, 12 plates for “V50” tests
• Passing standards
– For threats “A,” “B,” and “C,” no penetration allowed and BFD less than 48 mm
– For threat “D,” a point system was used to score shots based on penetration and BFD
• An accumulation of six or fewer points was passing
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army, The National Academies, April 22, 2010.
(Some) Body Armor Testing Issues
• Testing protocols differ across DoD
• Army protocol not statistically based
– DoD IG: “standardization of body armor testing and acceptance will ensure that Service members receive body armor that has been rigorously tested and will provide uniform protection in the battlefield”1
• Clay-based testing:
– Clay formulation has changed over time, resulting in a formulation that is temperature sensitive
• How much of the variation in test results is attributable to variation in test conditions and how much is due to plate variation is unknown
– Scientific connection between clay test results and protection of human beings is tenuous
1 “DoD Testing Requirements for Body Armor”, Inspector General, United States Department of Defense, Report No. D-2009-047, January 29, 2009.
NAS Committee
• Three-phase study
– Phase 1: Completed 30 December 2009
– Phase 2: Completed 22 April 2010
– Phase 3: Final report being completed now
• First two phases conducted as intense four-day meetings
– Days 1 and 2, briefings and site visits
– Days 3 and 4, draft committee letter report
• Chaired by retired Army Major General with
10-11 members (engineers and statisticians)
Phase I
• DOT&E tasked the committee to:
– “…comment on the validity of using laser profilometry/laser interferometry techniques to determine the contours of an indent made by a ballistic test in a non-transparent clay material at the level of precision established in the Army’s procedures for testing personal body armor.”
– “…provide interim observations regarding the column-drop performance test described by the Army for assessing the part-to-part consistency of a clay body used in testing body armor.”
Source: Phase I Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army,
The National Academies, December 30, 2009.
Digital Caliper
• Digital caliper used to measure BFD has several shortcomings, including
– If deepest location in the clay indent is displaced from the aim point, must estimate original clay surface at the impact point
– Caliper subject to operator judgment because one must measure a soft, deformable surface by barely touching and yet not disturbing the clay
• Standard error for measuring an etched metal gage block is on the order of 0.1 mm; for BFD in a soft clay medium, it is on the order of 1 mm
Source: Phase I Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army,
The National Academies, December 30, 2009.
Laser Profilometry/Interferometry
• Laser used to take a three-dimensional measurement of the clay surface before and after test
– Difference between the two surfaces used to measure BFD
• Benefits:
– Does not require contact with clay
– Measurements collected over whole surface
• However, the system is more complicated and costly
Source: Phase I Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army,
The National Academies, December 30, 2009.
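The surface-difference idea behind laser profilometry can be sketched as subtracting an after-shot height map from a before-shot one and taking the deepest point. This is a simplified illustration, not the Army’s procedure: it assumes both scans are already aligned on the same grid, and the tiny 3 x 3 height maps are invented.

```python
def back_face_deformation(before, after):
    """Deepest indent, in mm, from two height maps of the same clay surface.

    before/after: equally sized 2-D grids of surface heights (mm) measured at
    the same (x, y) points before and after the shot. Depth at each point is
    how far the surface dropped; BFD is taken as the maximum depth, wherever
    it occurs (so an indent displaced from the aim point is still found).
    """
    if len(before) != len(after) or any(len(rb) != len(ra) for rb, ra in zip(before, after)):
        raise ValueError("height maps must be sampled on the same grid")
    return max(b - a for rb, ra in zip(before, after) for b, a in zip(rb, ra))

# Hypothetical 3 x 3 height maps in mm (real scans have many thousands of points)
before = [[50.0, 50.1, 50.0], [50.0, 50.0, 49.9], [50.1, 50.0, 50.0]]
after = [[49.0, 40.1, 48.0], [45.0, 18.0, 46.9], [49.1, 44.0, 49.5]]
print(back_face_deformation(before, after))  # 32.0 mm at the center point
```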
Phase I Recommendations
• “The digital caliper is adequate for measurements of displacements created in clay by the column-drop performance test…”
• “Surface profilometry by a laser… is a valid approach for determining the contours of an indent in a nontransparent clay material at a level of precision adequate for the Army’s current ballistic testing of body armor.”
Source: Phase I Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army,
The National Academies, December 30, 2009.
Phase II
• DOT&E tasking
– “In Phase II, the committee will consider in greater detail [than in Phase I] the validity of using the column drop performance test described by the Army for assessing the part-to-part consistency of a clay body within the level of precision that is identified by the Army test procedures.”
– “The final report will document the committee’s findings pertaining to…the appropriate use of statistical techniques (e.g., rounding numbers, choosing sample sizes, or test designs) in gathering the data.”
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.
Army, The National Academies, April 22, 2010.
Proposed New FAT Specifications
• 60 plates tested spread over a combination of plate
sizes, environmental conditions, and shot order
• Passing standards:
– Penetration:
• One-sided 90 percent lower confidence bound for the probability of no complete system penetration is greater than 0.9 (first shot) and greater than 0.8 (second shot)
– BFD:
• First shot: one-sided 90% upper tolerance limit for BFD
must be less than 44.0 mm with 90 percent confidence
• Second shot: one-sided 80% upper tolerance limit for
BFD must be less than 44.0 mm with 90 percent
confidence
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.
Army, The National Academies, April 22, 2010.
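A one-sided lower confidence bound like the one in the penetration criterion can be computed exactly (Clopper-Pearson) from binomial probabilities alone. The sketch treats each shot as a success when it meets the penetration requirement (no complete penetration) and assumes a hypothetical plan of 22 first shots; the actual protocol spreads 60 plates over sizes, conditions, and shot order, so these numbers only illustrate the mechanics.

```python
from math import comb

def prob_at_least(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))

def lower_confidence_bound(x, n, conf=0.90, tol=1e-12):
    """One-sided exact (Clopper-Pearson) lower confidence bound for p.

    Finds the p at which seeing x or more successes in n trials has
    probability 1 - conf; P(X >= x) increases with p, so bisection works.
    """
    if x == 0:
        return 0.0
    alpha = 1 - conf
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least(x, n, mid) < alpha:
            lo = mid  # bound lies above mid
        else:
            hi = mid  # bound lies at or below mid
    return (lo + hi) / 2

def min_successes_to_pass(n, threshold=0.90, conf=0.90):
    """Fewest successes out of n whose lower bound exceeds threshold (None if impossible)."""
    for x in range(n + 1):
        if lower_confidence_bound(x, n, conf) > threshold:
            return x
    return None

n = 22  # hypothetical number of first shots
c = min_successes_to_pass(n)
# Chance that a design with a true success probability of 0.95 would pass
print(c, prob_at_least(c, n, 0.95))
```

Under these assumptions all 22 shots must succeed (21 of 22 gives a lower bound of only about 0.83), and a design whose true success probability is 0.95 passes only about a third of the time (0.95^22 ≈ 0.32). Making that kind of producer and consumer risk visible is what a statistically based protocol allows.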
Statistical Protocol Allows Explicit
Risk Trade-Offs To Be Made
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.
Army, The National Academies, April 22, 2010.
Variation Introduced by
Test Protocol Unknown
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.
Army, The National Academies, April 22, 2010.
• “Column drop” test used to test clay for consistency prior
to ballistic testing
– Clay heated until indentation depth of weight dropped into clay
meets standard
– Indentations from 3 drops must all be within 25 mm ± 3 mm
• Yet clay performance may still vary substantially due to
temperature and
other factors
• How much variation
this introduces into
ballistic test results
unknown
Effect(s) of New Protocol Standards
on Manufacturers Unknown
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.
Army, The National Academies, April 22, 2010.
Some Phase II Clay-Related Recommendations
• “…expedite the research necessary both to quantify the medical results of blunt force trauma on tissue and to use those results as the updated mathematical underpinnings of the back face deformation (BFD) body armor testing methodology.”
• “The Army should develop ballistic testing performance specifications and properties that will lead to a short-term, standard replacement for the current Roma Plastilina #1 oil-based modeling clay.”
• “Since oil-based modeling clay is time and temperature sensitive, a post-drop calibration test is needed to validate that the clay remains within specification at the end of a body armor test.”
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.
Army, The National Academies, April 22, 2010.
Phase II Recommendations Related to Statistical Methodology
• “The committee unequivocally supports the concept of a statistically based test protocol…”
• “…the Army should quickly develop and experiment with a gas gun calibrator, or equivalent device…to estimate as accurately as possible the variation of back face deformation measurements both within a given box and between boxes, under realistic testing conditions using existing test protocols.”
• “…the results of the experiments and analyses proposed in this report, should be used as due diligence to carefully and completely assess the effects, large and small, of the proposed statistically based protocol before it is formally adopted across the body armor testing community.”
Source: Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S.
Army, The National Academies, April 22, 2010.
Phase III
• DOT&E has tasked the committee to:
– Develop ideas for revising/replacing the Prather study methodology
– Provide a roadmap for reducing the variability of clay processes and for migrating from clay to future solutions
– Within the time and funding available, review and comment on methodologies and technical approaches to military helmet testing
Take-Aways
• Critical that testing methodology be
statistically-based
– Decision makers need to understand the
uncertainty in the data
– Allows them to make explicit risk trade-offs
• Sometimes the statistics are driven by organizational and other issues
• Critical to base test criteria and procedures
on solid science
Case #3: WMD Preparedness
• Survey fielded to solicit local responder
opinions of Federal WMD preparedness
programs
• Purpose was to provide:
– Wide local responder input into Congressionally-
mandated Panel deliberation process
– Independent evaluation/confirmation of Panel’s
recommendations
• Fielded from March to early September
– Those few received after 9/11 have been set aside
Complex Set Of 10 Surveys Designed, Fielded, And Analyzed In About One Year
[Timeline figure spanning 1st Quarter FY2001 through 1st Quarter FY2002, with milestones for fielding preparations and the Final Panel Report]
• DRAFT
– Instrument construction
– Sample selection
– Pretesting
– Materials preparation
– Respondent pre-contact
• FIELD
– Surveys distributed
– Conducted intensive follow-up
– Evaluated response patterns
• ANALYZE
– Data coded and entered
– Non-response evaluated
– Survey weights constructed
– Analyses
• REPORT
– Reported results to Panel
– Conducted additional analyses as required
– Write final report
Survey Designed To Elicit Detailed
Information About Local Responders
Section 1: Organizational Information
– Relevant organizational demographics
Section 2: Organizational Experience & Threat Perceptions
– Expectation of a terrorist WMD incident in the next 5 years
– Organizational experience with actual incidents and hoaxes
Section 3: Emergency Response Planning Activities
– Organizational participation in emergency response planning
– Description of emergency response plans and exercises conducted
Section 4: Responding to Specific WMD Terrorist Incidents
– Measures of preparedness for WMD incidents: (1) Conventional explosives; (2) Chemical; (3) Biological; (4) Radiological
Section 5: Assessment of Federal Programs
– Application and/or receipt of Federal WMD support
– Assessment of various Federal WMD programs, offices, exercises
Surveyed Ten Specific Types Of
State and Local Responder Organizations
• LOCAL (CITY/COUNTY)
– Law enforcement
– Fire departments
• Combination
• Paid
• Volunteer
– Hospitals
– EMS
– Public health depts
– OEMs
• REGIONAL
– EMS
• STATE*
– EMS
– Public health depts
– OEMs
* Also, state-level surveys were sent to Washington, D.C., and state OEM and public health surveys were sent to US territories: Puerto Rico, Guam, Virgin Islands, and Northern Mariana Islands
Sampling Strategy Designed To Obtain
Statistically Valid National Sample
• Randomly selected 200 counties in U.S.
– Greater chance of choosing counties with larger populations
– Also added in large organizations in “sensitized” counties
• Randomly sampled one of each type of local responder agency within each county
• Also took:
– All regional EMS organizations containing one of the 200 counties
– Census of state agencies
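The slide says only that larger counties had a greater chance of selection. One standard way to do that, assumed here and not necessarily the method the study used, is systematic probability-proportional-to-size (PPS) sampling; the same inclusion probabilities then give the design weights used later when survey weights are constructed. County sizes below are hypothetical.

```python
import random
from itertools import accumulate

def systematic_pps(sizes, k, start=None, rng=random):
    """Systematic probability-proportional-to-size (PPS) sample of k units.

    Lays the units end to end on a line of length sum(sizes), picks a random
    start in [0, interval), then takes the unit under each of k equally
    spaced points. Each unit's inclusion probability is k * size / total,
    provided no unit is larger than the sampling interval (very large units
    are usually taken with certainty instead).
    """
    total = sum(sizes)
    interval = total / k
    if start is None:
        start = rng.uniform(0, interval)
    cumulative = list(accumulate(sizes))
    chosen, unit = [], 0
    for j in range(k):
        point = start + j * interval
        while cumulative[unit] <= point:  # walk forward to the unit covering this point
            unit += 1
        chosen.append(unit)
    inclusion = {u: k * sizes[u] / total for u in chosen}
    weights = {u: 1 / p for u, p in inclusion.items()}  # design weight = 1 / inclusion probability
    return chosen, weights

# Hypothetical county populations (in thousands); not the real sampling frame
county_sizes = [10, 20, 30, 40]
sample, weights = systematic_pps(county_sizes, k=2, start=5)
print(sample, weights)
```

Larger counties are more likely to be drawn but each one then “represents” fewer counties, which is why its weight is smaller.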
Selected Counties Representative
Of The Entire United States
[Map: randomly sampled counties, “sensitized” counties that were randomly sampled, and remaining “sensitized” counties]
Notes: (1) Some counties too small to see; (2) Hawaii and Alaska not depicted
A Detailed Survey Fielding Process
Used To Maximize Response Rates
• Population identification and sample selection
• Advance letter and postcard
– Follow-up call to those not returning postcard
• Survey package
– Cover letter from Governor Gilmore
– Commemorative coin incentive
• Thank you/reminder
– Each respondent organization called
• Nonresponse follow-up
– Second survey mailing to nonrespondents
– Second round of follow-up telephone calls
– Continued individualized follow-up
Survey Achieved Excellent Response Rates
Organization              Total Sent   Response Rate (as of 9/11/01)
Local
  Law enforcement            208          71%
  Fire department            443          68%
  EMS                        155          52%
  Hospital                   208          50%
  Public health              199          74%
  OEM                        202          71%
Regional
  Regional EMS               111          48%
State
  EMS                         51          69%
  Public health               51*         82%
  OEM                         51*         78%
Total/Overall rate         1,679          66%
* Does not count 4 surveys sent to US territories
Percent with WMD Plan Sufficient
to Address the Given Scenario
[Bar chart: percent of local organizations (fire, EMS, law enforcement, public health, OEM, hospital) with a WMD plan sufficient for the chemical and for the biological scenario; y-axis: percent, 0 to 100]
Percent with WMD Plan Sufficient
to Address the Given Scenario
[Bar chart: percent of state organizations (EMS, OEM, public health) with a WMD plan sufficient for the chemical and for the biological scenario; y-axis: percent, 0 to 100]
Percent Exercising WMD Plan for
the Particular Type of Incident
[Bar chart: percent of local response organizations (fire, law enforcement, EMS, public health, OEM, hospital) that exercised their WMD plan for chemical and for biological incidents; y-axis: percent, 0 to 100]
Take-Aways
• Conducting a census is frequently infeasible or impossible, but even when it is possible, a good sample can provide as good or better information
• Appropriate sampling provides some assurance that the results are representative
• There is a rich literature about how to maximize
survey response rates – use it
• Even with survey results in Excel graphs it is possible
to show the uncertainty in a point estimate
• Per yesterday’s lecture: original data collection takes
time and effort
Some Briefing Questions
• How did you get your sample of data?
– Is it representative of the population? Why or why not? How
do you know?
– If not collected as a random sample: How can you even
know whether the sample is representative?
• What sorts of biases may be present in the data due
to the data collection methodology?
• If only presenting point estimates: What are the
standard errors (or the margin of error) of your
estimates?
• If making claims about results: Are your results
statistically significant?
Some people hate the very name of statistics,
but I find them full of beauty and interest.
Whenever they are not brutalized, but delicately
handled by the higher methods, and are warily
interpreted, their power of dealing with
complicated phenomena is extraordinary. They
are the only tools by which an opening can be
cut through the formidable thicket of difficulties
that bars the path of those who pursue the
Science of man.
– Francis Galton