How to Lie With Statistics
Barbara Mehlman
LIS 514

“Lies, damned lies, and statistics”

Mark Twain (who attributed the line to Benjamin Disraeli)

Overview

- Misleading Averages
- Meaningless Rankings
- Distorted Data
- Silly Samples
- Cherry Picking

Misleading Averages

What is the average number of cats?
A. 146 cats
B. 3 cats
C. 2 cats
D. All of the above
E. None of the above

Person      Number of Cats
Jenn        3
Paige       2
Bob         1
Elizabeth   10
Katherine   5
Claudia     2
Mary        999

All three “averages” describe the data truthfully, so the answer is D!

This is because there is no single “statistical average”: an average can be the mean, the median, or the mode.

The mean is the sum of the values divided by how many values there are: (3 + 2 + 1 + 10 + 5 + 2 + 999) / 7 = 1,022 / 7 = 146 cats. Mary’s 999 cats alone pull the mean far above what anyone else owns.

The median is found by sorting the values in ascending order and taking the middle one: in 1, 2, 2, 3, 5, 10, 999 the middle value is 3 cats.

The mode is the most frequently occurring value in the distribution: 2 cats.
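The three averages can be checked with Python’s standard `statistics` module. This is a minimal sketch: the list simply re-enters the cat counts from the table, and the function name `three_averages` is illustrative.

```python
import statistics

# Cat counts from the table: Jenn, Paige, Bob, Elizabeth, Katherine, Claudia, Mary
cats = [3, 2, 1, 10, 5, 2, 999]

def three_averages(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),    # sum of values / number of values
        statistics.median(values),  # middle value after sorting
        statistics.mode(values),    # most frequent value
    )

mean, median, mode = three_averages(cats)
print(f"mean={mean:g}, median={median:g}, mode={mode:g}")  # mean=146, median=3, mode=2

# Dropping the single outlier (Mary) shows how much one value moves the mean.
print(statistics.mean(cats[:-1]), statistics.median(cats[:-1]))
```

The median and mode barely notice Mary; the mean is dominated by her.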

Meaningless Rankings

Guest     Dan’s Chili   Evelyn’s Chili   A.J.’s Chili
Barbara   2             1                3
Brian     1             2                3
Sam       2             1                3
Eli       1             2                3
Jake      2             1                3

The Great Chili Cook-off! Whose chili is the spiciest? Each guest ranked the three chilis from 1 (mildest) to 3 (hottest), and every guest ranked A.J.’s the hottest.

Mean rank of Dan’s chili: 1.6
Mean rank of Evelyn’s chili: 1.4
Mean rank of A.J.’s chili: 3.0
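The mean ranks above can be reproduced with a few lines of Python. This is a sketch that re-enters the cook-off table as a dictionary; the names `rankings` and `mean_ranks` are illustrative.

```python
# Rankings from the cook-off table (1 = mildest, 3 = hottest).
rankings = {
    "Barbara": {"Dan": 2, "Evelyn": 1, "A.J.": 3},
    "Brian":   {"Dan": 1, "Evelyn": 2, "A.J.": 3},
    "Sam":     {"Dan": 2, "Evelyn": 1, "A.J.": 3},
    "Eli":     {"Dan": 1, "Evelyn": 2, "A.J.": 3},
    "Jake":    {"Dan": 2, "Evelyn": 1, "A.J.": 3},
}

def mean_ranks(table):
    """Average each cook's rank across all guests."""
    cooks = next(iter(table.values())).keys()
    return {
        cook: sum(guest[cook] for guest in table.values()) / len(table)
        for cook in cooks
    }

for cook, rank in mean_ranks(rankings).items():
    print(f"{cook}: {rank:.1f}")  # Dan: 1.6, Evelyn: 1.4, A.J.: 3.0
```

Averaging ranks quietly assumes the gap between 1 and 2 equals the gap between 2 and 3, which the ranks themselves never tell us.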

How spicy was it?

A.J.’s chili is the spiciest, but by how much?

Did you have to drink a pitcher of beer to cool off, or was it just slightly hotter than Evelyn’s?

Rankings only give the order; they carry very little information about how big the differences are.
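To see how little a ranking carries, here is a sketch with two invented sets of heat scores (hypothetical numbers on a made-up scale, not measurements from the cook-off). One contest is nearly a tie; in the other A.J.’s chili is wildly hotter. Converted to ranks, they are indistinguishable. The helper `to_ranks` is illustrative and assumes no tied scores.

```python
# Hypothetical heat scores, invented for illustration only.
mild_contest = {"Evelyn": 1400, "Dan": 1500, "A.J.": 1600}
wild_contest = {"Evelyn": 1400, "Dan": 1500, "A.J.": 250000}

def to_ranks(scores):
    """Replace each score with its rank: 1 = lowest, n = highest (assumes no ties)."""
    ordered = sorted(scores, key=scores.get)
    return {name: position for position, name in enumerate(ordered, start=1)}

print(to_ranks(mild_contest))
print(to_ranks(wild_contest))
print(to_ranks(mild_contest) == to_ranks(wild_contest))  # True
```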

Misleading Data

“Women are worse drivers than men!”

My mother-in-law

Insurance companies usually claim the opposite: that women between the ages of 20 and 65 have fewer accidents than men in the same age range.

One widely reported insurance company study found that:
- 55 per cent of men and 30 per cent of women have drink-driven.
- 47 per cent of men and 38 per cent of women have made rude gestures at other drivers.
- 84 per cent of men and 77 per cent of women have crashed their vehicle.
- 51 per cent of men and 40 per cent of women have been distracted by billboards while driving.
- 46 per cent of men and 36 per cent of women admitted to verbally abusing another driver.
- 22 per cent of men and 15 per cent of women admitted to using their mobile phones while driving without a hands-free accessory.

It’s All About the Question Asked!

But is a safer driver a more skilled driver? Depending on what information we use, my mother-in-law’s sketchy argument could be defended after all:
- Men are three times more likely to die while driving, but they drive 60 to 65% more than women.
- Women statistically take shorter trips than men.
- Studies show that female drivers have more minor crashes than men.

Finally, what I can prove from the results of a phone survey of 2,000 drivers (the study on the previous slide) is limited, but don’t tell my mother-in-law!
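The first point is a question of exposure: a raw risk ratio means more once it is divided by how much more each group drives. Below is a minimal sketch of that arithmetic, assuming the slide’s 3x figure and its 60 to 65% mileage gap describe the same drivers and that fatality risk grows in proportion to miles driven (both simplifying assumptions). The function name `per_mile_ratio` is illustrative.

```python
def per_mile_ratio(raw_ratio, extra_miles_pct):
    """Divide a raw men-to-women risk ratio by the men-to-women mileage ratio.

    extra_miles_pct: how many percent more men drive (60 means 1.6x the miles).
    Assumes risk scales linearly with miles driven.
    """
    mileage_ratio = 1 + extra_miles_pct / 100
    return raw_ratio / mileage_ratio

for extra in (60, 65):
    print(f"{extra}% more miles -> {per_mile_ratio(3, extra):.2f}x per mile")
```

Under those assumptions the per-mile gap shrinks to a bit under 2x. Whether that settles the argument depends, as the slide says, on the question asked.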

Sneaky Sampling

A random sample is required: a subset of individuals selected at random from a population. The goal is to obtain a sample that is representative of the larger population.

Example: a study of television use among preschoolers in New York.

If the sample consists of children in nursery school, it excludes children whose families can’t afford nursery school, children in a nursery program at home or in day care, and children with chronic illnesses.
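The difference between a random sample and a convenience sample can be simulated. This sketch uses an invented population (assumed numbers, not data from any real study): 400 preschoolers in nursery school who, we assume, watch 2 hours of TV a day, and 600 who are not and, we assume, watch 4. A simple random sample lands near the true average; a nursery-only sample cannot.

```python
import random
import statistics

# Hypothetical population of 1,000 preschoolers (numbers invented for illustration).
population = [("nursery", 2.0)] * 400 + [("not_nursery", 4.0)] * 600

def mean_tv(sample):
    """Average daily TV hours in a sample of (group, hours) pairs."""
    return statistics.mean(hours for _, hours in sample)

rng = random.Random(514)  # fixed seed so the run is repeatable

# Simple random sample: every child has the same chance of being chosen.
random_sample = rng.sample(population, 200)

# Convenience sample: only children reachable through nursery schools.
nursery_only = [child for child in population if child[0] == "nursery"]
convenience_sample = rng.sample(nursery_only, 200)

print(f"population mean:          {mean_tv(population):.2f} h")
print(f"random sample mean:       {mean_tv(random_sample):.2f} h")
print(f"nursery-only sample mean: {mean_tv(convenience_sample):.2f} h")
```

A larger convenience sample would not help: it would just estimate the nursery-school average more precisely.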

Adequate Sample Required

Example: flip a coin.
Flip a coin 5 times and get heads 4 times: an eighty percent chance of heads?!
Flip the coin 10,000 times: would heads still come up eighty percent of the time? With a large enough sample, the proportion of heads for a fair coin should settle close to fifty percent, not eighty.
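A short sketch makes the point numerically, assuming a fair coin. The exact binomial calculation shows that 4 or more heads in 5 flips is not unusual, and a simulation shows the share of heads drifting toward one half as the number of flips grows. The helper `share_of_heads` is illustrative.

```python
import math
import random

# Exact binomial probability for a FAIR coin: at least 4 heads in 5 flips.
p_at_least_4 = sum(math.comb(5, k) for k in (4, 5)) / 2**5
print(f"P(4 or more heads in 5 fair flips) = {p_at_least_4}")  # 0.1875

def share_of_heads(flips, rng):
    """Simulate `flips` fair coin tosses and return the fraction that came up heads."""
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (5, 100, 10_000):
    print(f"{n:>6} flips: {share_of_heads(n, rng):.3f} heads")
```

Roughly one run in five of 5 flips gives 80% heads or more, so a 5-flip “study” proves very little.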

School Comparison
Pretend 89% of children passed the ELA at Murray Avenue School.
Pretend 90% of children passed the ELA at St. John and Paul (SJPS).

There are approximately 100 third graders at Murray Avenue School.

There are approximately 30 third graders at SJPS.
The sample sizes are not comparable: at SJPS, 27 children passed; at Murray Avenue, 89 children passed. At SJPS each child moves the pass rate by more than 3 percentage points, so one more failing student would drop it to about 87%, below Murray Avenue’s. Is SJPS the superior school based on test scores?
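One way to quantify how much a small class can wobble is a confidence interval for each pass rate. This sketch uses the Wilson score interval (a standard interval for a proportion, chosen here as an illustration; it is not part of the original slides) with the slide’s “pretend” counts. The 95% intervals overlap heavily, and the SJPS interval is much wider.

```python
import math

def wilson_interval(passed, total, z=1.96):
    """Approximate 95% Wilson score interval for a pass rate (z = 1.96)."""
    p = passed / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width

schools = {"Murray Avenue": (89, 100), "SJPS": (27, 30)}
for name, (passed, total) in schools.items():
    low, high = wilson_interval(passed, total)
    print(f"{name}: {passed / total:.0%} passed, 95% interval {low:.0%} to {high:.0%}")

# What one more failing SJPS student would do to its pass rate:
print(f"SJPS with 26 of 30 passing: {26 / 30:.1%}")
```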

Cherry Picking

Selecting statistics from a scope or time frame chosen to support what you wish to prove.

Example: “We are winning the war on drugs.”
“Heroin production has decreased 20% in Afghanistan.”
Even if true, this tells us nothing about global drug production; perhaps production is up 24% in Colombia.

“Street drug dealing fell by 15% in the last 6 months.”
Even if true, the time frame is too short to reflect a long-term trend.
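Choosing the time window can flip the story. This sketch uses an invented monthly series (hypothetical counts of street drug-dealing incidents, not real data): a steady two-year rise, a spike, then a partial decline. Measured over the six most recent monthly values, incidents “fell 15%”; measured over the full two years, they rose 70%. The function name `percent_change` is illustrative.

```python
# Hypothetical monthly counts of street drug-dealing incidents over two years
# (invented for illustration): a steady rise, a spike, then a partial decline.
monthly_incidents = [100, 104, 108, 112, 116, 120, 124, 128, 132, 136, 140, 144,
                     148, 152, 156, 160, 164, 168, 200, 190, 185, 180, 175, 170]

def percent_change(series):
    """Percent change from the first to the last value of a series."""
    start, end = series[0], series[-1]
    return 100 * (end - start) / start

last_six_months = monthly_incidents[-6:]  # the six most recent monthly values
print(f"last 6 months: {percent_change(last_six_months):+.0f}%")    # -15%
print(f"full 24 months: {percent_change(monthly_incidents):+.0f}%")  # +70%
```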

Conclusion

Always be a savvy consumer when it comes to statistics. Think about:
- Whose data it is: is there an ulterior motive?
- Numbers out of context: is this cherry picking?
- How the data were collected and displayed.