Hypothesis Testing: (working from incomplete information) •Jury deliberations •Binomial distribution •Poisson distribution and quantal release •Normal distribution: standard deviation •Stdev of samples of size N •Estimating population statistics from small samples •student’s t-test •Predicting the future •non-parametric statistics: Difference of Proportions •The Black Swan… •Correlation •Fuzzy Logic and fuzzy controllers…


Page 1:

Hypothesis Testing: (working from incomplete information)

• Jury deliberations
• Binomial distribution
• Poisson distribution and quantal release
• Normal distribution: standard deviation
• Stdev of samples of size N
• Estimating population statistics from small samples
• Student's t-test
• Predicting the future
• Non-parametric statistics: Difference of Proportions
• The Black Swan…
• Correlation
• Fuzzy Logic and fuzzy controllers…

Page 2:

handouts:

• Selected pages from chapters 7 and 10 of Loftus & Loftus, Essence of Statistics, 2nd Ed (Knopf, 1988)

• You may have seen some of this material from AM0650, or AM1650…

Page 3:

• http://www.stat.brown.edu/
• 12 biostatisticians (ScM level) on call
• “Our mission is to foster research and statistical education at Brown Medical School and the University at large. Center faculty and staff conduct methodologic research in Biostatistics and interdisciplinary research in a broad range of areas of Medicine and Public Health. The Center is home to the graduate program in Biostatistics and the undergraduate statistics concentration at Brown, and organizes the Brown Statistics Seminar Series.”

• my guy: Brad Snyder…

Page 4:

Hypothesis matrix for a jury

REALITY (past, future)

Null hypothesis true

Alternative hypothesis true

Judgment(prediction)

Accept null hypothesis

Accept alternative hypothesis

Page 5:

Null hypothesis is true (innocent verdict for innocent person)

REALITY

Null hypothesis true

Alternative hypothesis true

Judgment Accept null hypothesis

Correct (innocent) but it’s just the status quo

Accept alternative hypothesis

Page 6:

Alternative hypothesis is true (guilty verdict for guilty criminal)

REALITY

Null hypothesis true

Alternative hypothesis true

Judgment Accept null hypothesis

Correct (innocent) but it’s just the status quo

Accept alternative hypothesis

Correct (guilty) change in the future (jail)

Page 7:

Alternative hypothesis is true (innocent verdict for guilty criminal)

REALITY

Null hypothesis true

Alternative hypothesis true

Judgment Accept null hypothesis

Correct (innocent) but it’s just the status quo

Wrong: guilty goes free. missed opportunity

Accept alternative hypothesis

Correct (guilty) change in the future (jail)

Page 8:

Null hypothesis is true (guilty verdict for innocent citizen)

Judgment                      | REALITY: Null hypothesis true                    | REALITY: Alternative hypothesis true
Accept null hypothesis        | Correct (innocent), but it’s just the status quo | Wrong: guilty goes free; missed opportunity
Accept alternative hypothesis | Wrong: very bad result; pursue a dead end        | Correct (guilty); change in the future (jail)

Page 9:

Signal detection theory:

http://wise.cgu.edu/sdtmod/index.asp

http://teachline.ls.huji.ac.il/72633/SDT_intro.pdf

Page 10:

SDT: All about a “detector” making decisions

• Is the “detector” (who can be a human making decisions/judgments) prone to false positives or misses? (mistakes)

• Hits and rejections are correct answers…
• Misses: a “conservative” detector
• False positives: aggressive, optimistic, paranoid, indefatigable…
• Not just about finding significant differences between samples A and B…
• A Grand Jury can be classified as prone to “guilty” or prone to “innocent”…

Page 11:

from WISE cgu.edu SDT website:

• “SDT is a method of modeling the decision making process for someone who decides between different classes of items (e.g., friend or [foe]) and their bias to favor a particular type of response.”

• Jury selection (voir dire); jury consultants; hung juries, mistrials; Louisiana v. Morgan: “jury of peers”; civil rights…

Page 12:

from WISE SDT website, p. 3:

• “Note that Misses and Correct Rejections are redundant with Hits and False Alarms.

• The miss rate is 10/50 which is .20 or simply (1 - "hit rate") and the Correct Rejection rate is 45/50 or .90 or (1 - "false alarm rate").

• Therefore, you can perfectly describe all four measures of a person's performance in a signal detection experiment through their Hit and False Alarm rates.”

Page 13:

Detector sensitivity: d’

• “The most commonly used SDT measure of sensitivity is d' (d prime), which is the standardized difference between the means of the Signal Present and Signal Absent distributions. To calculate d', we need only know a person's hit and false alarm rates.

• The formula for d' is as follows: d' = z(FA) - z(H)
• where FA and H are the False Alarm and Hit rates, respectively, that correspond to right-tail probabilities on the normal distribution.”
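The quoted formula can be checked numerically. The sketch below is in Python rather than the EXCEL or MATLAB used elsewhere in these slides. It assumes a hit rate of 40/50 and a false alarm rate of 5/50, inferred from the miss and correct-rejection rates quoted on the previous slide; the function names are illustrative only.

```python
from statistics import NormalDist

def right_tail_z(p):
    """z score whose right-tail area under the standard normal equals p."""
    return NormalDist().inv_cdf(1.0 - p)

def d_prime(hit_rate, fa_rate):
    """d' = z(FA) - z(H), with z taken from right-tail probabilities."""
    return right_tail_z(fa_rate) - right_tail_z(hit_rate)

hit_rate = 40 / 50  # 1 - miss rate (10/50)
fa_rate = 5 / 50    # 1 - correct rejection rate (45/50)
sensitivity = d_prime(hit_rate, fa_rate)
print(f"d' = {sensitivity:.2f}")
```

With left-tail z scores (the more common convention) the same quantity is written z(H) - z(FA); the two forms agree.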

Page 14:

Criterion

• Criterion is a measure of the willingness of a respondent to say 'Signal Present' in an ambiguous situation.

• The choice of a criterion may depend on perceived consequences of outcomes.

• For example, if the consequences are costly for saying 'Signal Present' when the signal actually is absent, then a respondent may generally be less willing to say 'Signal Present.'

• On the other hand, if the consequences are more costly for failing to detect a signal when it is present, then a respondent may be more willing to say 'Signal Present.'

• Positive Criterion → more willing to say “yes, I saw it”
• The ROC is the locus of Criterion points…

Page 15:

SDT Summary

• “Signal Detection Theory (SDT) allows an analyst to separate sensitivity from response bias. Observers are assumed to make decisions based upon information derived from two distributions. The first (Signal Absent) is assumed to represent a background level of "noise." The second distribution (Signal Present) represents an increase to a background level of noise caused by the introduction of a stimulus. That is why the second distribution is sometimes referred to as the 'Signal + Noise' distribution.

• An observer's sensitivity, as indexed by d', is how well the observer can differentiate items coming from the Signal Absent and Signal Present distributions. Criterion (i.e., response bias) represents the minimum level of internal certainty needed for the observer to decide that a signal was present.

• ROCs represent the relationship between hits and false alarms, and can be used to describe performance in terms of d'. SDT has applications in fields such as medical diagnosis, bioinformatics, psychology, and engineering.”

Page 16:

Receiver Operating Characteristics (ROCs)

• “The receiver-operating characteristic (ROC) is a fundamental plot in signal detection theory. A ROC is essentially a scatterplot that shows the relationship between false alarm rates on the x-axis, and hit rates on the y-axis. ROCs describe the relationship between the underlying Signal Absent and Signal Present distributions.”

Page 17:

Null hypothesis, the scientific method, and troubleshooting

• Some independent variable (input) has been changed in the experiment.
• The output is the dependent variable.
• The null hypothesis: that the independent variable has no effect on the dependent variable.
• You want to design an experiment to test whether the null or alternative hypothesis is true.
• Something goes wrong with a circuit: test your hypothesis as to why. (Lab ADA example…)
• Horace Barlow: direction selectivity in rabbit retinal ganglion cells: 2 alt hypotheses, test between them, so as not to favor one…

Page 18:

ordered combinations and Pascal’s triangle

Pascal’s triangle shows C(N, r) with N as the row and r as the “column”

• An ordered combination deals with items that have individual labels, such as their place in a row…
• The number of ordered combinations of N things taken r at a time is C(N, r) = N! / (r! (N - r)!)

http://ptri1.tripod.com/

Page 19:

Binomial formula

The probability of exactly r successes out of N attempts is

P(r) = C(N, r) * p^r * q^(N - r)

where p is the probability of success and q = 1 - p is the probability of failure

Consider a random variable that can be in one of two states: “success” or “failure”

another use of the formula:

Page 20:

Binomial distribution in EXCEL

• The probability that 3 or fewer coin flips come up heads out of 10 tosses of a fair coin:

= BINOMDIST(3, 10, 0.5, 1) = 0.172 (the final argument, 1, is the cumulative flag)

• Also see =COMBIN(10, 2) for the EXCEL version of the number of combinations of 10 things taken 2 at a time…
• or try the MATLAB function nchoosek(N, k), say nchoosek(10, 2)...

Page 21:

Or solve using Pascal’s triangle:

• find the row with “10” as the second number

• where 1, 10, 45, 120 are the number of combinations of 10 things taken “0”, 1, 2, 3 at a time

• The number 0 represents that none of the coins came up heads; there’s only 1 way that can happen…
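The EXCEL result and the Pascal's-triangle route can be cross-checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper names are illustrative.

```python
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n attempts."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def binom_cdf(r, n, p):
    """Probability of r or fewer successes (cumulative flag = 1 in EXCEL)."""
    return sum(binom_pmf(k, n, p) for k in range(r + 1))

# First four entries of the row of Pascal's triangle that starts 1, 10, ...
row10 = [comb(10, k) for k in range(4)]

# Each of the 2**10 head/tail sequences is equally likely for a fair coin.
p_pascal = sum(row10) / 2**10
p_formula = binom_cdf(3, 10, 0.5)
print(row10, round(p_pascal, 3), round(p_formula, 3))
```

Both routes give 176/1024, which rounds to the 0.172 returned by BINOMDIST.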

Page 22:

Why roulette is my favorite form of gambling

• C:\MatlabR12\work\JDD\roulette13.m
• Pascal’s catastrophe…
• Thomas Bass, Newtonian Casino, Penguin (1991)

Page 23:

Poisson distribution

P(m) = (n*p)^m * exp(-n*p) / m!

where n*p is the average for n trials…

Or try EXCEL =POISSON(0, 4.7, 1) = 0.009095, looking at the probability that there would be no release for one stimulation where n*p = 4.7, and say n = 470 and p = 0.01

compare to =BINOMDIST(0, 470, 0.01, 1) = 0.008883 … close
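The closeness of the two EXCEL results can be reproduced directly. This is a small Python sketch under the slide's own numbers (n = 470, p = 0.01); the function names are illustrative.

```python
from math import comb, exp, factorial

def poisson_pmf(m, mean):
    """Poisson probability of exactly m events when the average is `mean`."""
    return mean**m * exp(-mean) / factorial(m)

def binom_pmf(r, n, p):
    """Binomial probability of exactly r successes in n attempts."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 470, 0.01
p_poisson = poisson_pmf(0, n * p)  # no release, Poisson model
p_binom = binom_pmf(0, n, p)       # no release, exact binomial
print(f"Poisson: {p_poisson:.6f}  binomial: {p_binom:.6f}")
```

The Poisson form is the large-n, small-p limit of the binomial, which is why the two agree to within a few percent here.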

Page 24:

Vesicle release from synapse: work of Bernard Katz (Nobel Prize, 1970)

Example: epsp = "excitatory post-synaptic potential". Of 198 stimulus impulses, 18 resulted in no epsp (failure of release).

(image from Tepper, at Rutgers...) presynaptic bouton on left.

78 spontaneous epsp’s were observed (next slide), average height 0.4 mV.

n*p = 2.33 (sum 3 bins)

m      mV     calc   obs
0      0      19     18
1      0.4    45     44
2      0.8    52     55
3      1.2    41     36
4      1.6    24     25
5      2      11     12
6      2.4    4      5
7      2.8    1      2
8      3.2    0      1
9      3.6    0      0
total         198    198
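The "calc" column can be regenerated from the Poisson formula. The sketch below assumes, as the table suggests, that each calc entry is 198 times the Poisson probability of m quanta with n*p = 2.33, rounded to the nearest whole number; the variable names are illustrative.

```python
from math import exp, factorial

mean_quanta = 2.33   # n*p from the slide
n_stimuli = 198      # stimulus impulses delivered
observed = [18, 44, 55, 36, 25, 12, 5, 2, 1, 0]  # "obs" column, m = 0..9

# Expected number of stimuli that release exactly m quanta.
expected = [
    n_stimuli * mean_quanta**m * exp(-mean_quanta) / factorial(m)
    for m in range(len(observed))
]

for m, (calc, obs) in enumerate(zip(expected, observed)):
    print(f"m={m}  {0.4 * m:.1f} mV  calc={calc:5.1f}  obs={obs}")
```

Because the expected counts are rounded individually, the rounded calc column need not sum to exactly 198.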

Page 25:

for the event count of m*0.4mV add up 3 neighboring bins, except for 0.

Spontaneous vs evoked epsp’s …

Page 26:

Fitting the data

• Katz’ data are very well fit by a Poisson distribution with n*p = 2.33, the only free parameter in the equation.

• What is n in n*p? Not 198, the number of shocks to the presynaptic axon.

• n is the number of vesicles in the presynaptic synapse. est: n = 800, so p = 2.3/800 = 0.002875

Page 27:

How many vesicles are there per pre-synaptic bouton?

• Anywhere from hundreds to thousands.
• One estimate says 987 vesicles per cubic micron.
• There are docking vesicles ready to be released, and reserve vesicles, recently reconstituted and away from the membrane that is facing the synaptic cleft.

• At any rate, p is the probability that one vesicle will be released (due to one pre-synaptic shock...)

Page 28:

Quiz example of quantal release question:

• Suppose there are 700 vesicles at a synapse and each has 0.002 probability of being released by one pre-synaptic shock. What is the expected number of shocks out of 200 that will result in no vesicles being released?

• n*p = 700*0.002 = 1.4, the mean released…
• =POISSON(0, 1.4, 1) = 0.25
• 0.25*200 = 50 shocks will result in no vesicle being released

• =BINOMDIST(0, 700, 0.002, 1) = 0.246252

Page 29:

A giant has swallowed 6 dwarfs numbered Di; you hit him on the back and he coughs up N D’s; how many he coughs up fits a binom dist: avg 3

Binomial example of 2 giants coughing…

% Binom_samp_sze2 11.4.14
% compare std of sample size 2 from binomdist of 6
% assume 50% probability of success
% possible to cough up 0…
pasc_7 = [1 6 15 20 15 6 1];     % total of 64…
to_6 = [0 1 2 3 4 5 6];          % avg = 21/7 = 3
dot_prod = sum(pasc_7 .* to_6);
avg1 = dot_prod/sum(pasc_7)

• OR Prob of two times of 5 or 6 = (7/64)^2 ≈ 1%

Page 30:

Normal (Gaussian, Bell-shaped) Distribution

Say the mean of the data is μ and the standard deviation is σ

Page 31:

cumulative normal probability density function: 0 mean, 1 stdev

from z= -1.96 to +1.96 is 95% of the area under the curve

Page 32:

The Black Swan*: How can you tell if your data are NOT normally distributed?

*The Black Swan: The Impact of the Highly Improbable, Nassim Nicholas Taleb, Random House (2007)

• mean ≠ median, or
• CPDF not sigmoid-shaped, or
• PDF has “barbell” distribution, or
• Fat-tailed asymmetric distribution, or
• Data “fails” Chi-squared test…

Page 33:

Binomial becomes Normal (SAT?)

• Consider a binomial distribution with p = 0.5
• p(x) vs x will be a symmetric up-down staircase curve
• As the number of “coin flips” N in the binomial data set increases, the curve will look smooth and “normal”
• standard deviation of binomial dist = sqrt(N*p*q)

connecting the dots of a 40-point binomial plot…

Page 34:

Are you smarter than a 10th grader?

• Sample of one:
• Suppose you score 600 on the SAT math test
• The average 10th grader scores 500
• Standard deviation of the SAT = 100
• What's the probability that you're smarter than a 10th grader?
• You did receive a higher score, but (in EXCEL)
• =NORMDIST(600, 500, 100, 1) = 0.84
• 1 - 0.84 = 16%
• 16% = one-tailed probability that someone will score 600 or more.
• You're in the 84th percentile.
• You’re not significantly smarter than a 10th grader…

Page 35:

Are you and your 9 left-handed friends smarter than a 10th grader?

• Sample of 10, with (average of the 10) = 600 on SAT math test
• The standard deviation of the means of samples of size 10 is sqrt(10000/10) = sqrt(1000) = 31.6
• =NORMDIST(600, 500, 31.6, 1) = 0.9992, wildly significant
• Example of testing a sample against a known population

Say the mean of a sample of size 10 is 600… The best estimate of the mean of the means of many samples of size 10 is the mean of the one sample, 600…

What is the best estimate of the variance of the means of many samples of 10 scores drawn from an SAT distribution? Note: it doesn’t matter what the particular standard deviation of the one sample is… if the population σ is known
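The two SAT examples differ only in the standard deviation used: σ for a single score, σ/sqrt(N) for the mean of N scores. A minimal Python sketch of both calculations, using the slide's numbers (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

pop_mean, pop_sd = 500, 100

# Sample of one: a single score of 600.
p_single = 1 - NormalDist(pop_mean, pop_sd).cdf(600)

# Sample of ten with mean 600: use the standard deviation of the mean.
n = 10
sd_of_mean = sqrt(pop_sd**2 / n)
p_group = 1 - NormalDist(pop_mean, sd_of_mean).cdf(600)

print(f"one score: p = {p_single:.3f}")
print(f"mean of {n}: sd = {sd_of_mean:.1f}, p = {p_group:.4f}")
```

The one-tailed probability drops from about 16% to under 0.1% because the same 100-point gap is now more than three standard deviations of the mean.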

Page 36:

Comparing two variants of a population

• What about comparing two experimental groups from a known population?

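• Form a normalized z term: z = (Ms1 - Ms2) / sqrt(σ^2/N1 + σ^2/N2), where Ms1 and Ms2 are the means of the two groups, N1 and N2 are their sizes, and σ is the known population standard deviation.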

• We are interested in the difference of the means here

Page 37:

Comparing two variants of a population(cont)

• Suppose it's known that the average area of maple leaves on the ground in October is 28 cm-sq, with a standard deviation of 5 cm-sq.

• A sample of 12 Japanese maple leaves has an average area of 34 cm-sq, std 4 cm-sq

• Someone else comes in and says that a sample of 18 “big leaf” maple leaves had an average area of 38 cm-sq, unknown standard deviation.

• Is it significant at the 5% level that big leaf maple leaves are larger than Japanese maple leaves? (When was the hypothesis conceived?)

• From the formula on the previous slide, the estimated std_dev of the difference is sqrt(25/12 + 25/18) = 1.86
• → z = 4/1.86 = 2.15
• and without having to actually calculate z, =NORMDIST(4, 0, 1.86, 1) = 0.984
• The significance is 1.6% < 5%. Answer: yes, the difference is significant.
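The maple-leaf numbers can be reproduced step by step. This Python sketch assumes, as the slide's 1.86 implies, that both groups use the known population standard deviation of 5 cm-sq rather than the sample's own std of 4; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pop_sd = 5.0                  # known population standard deviation, cm-sq
n_japanese, mean_japanese = 12, 34.0
n_bigleaf, mean_bigleaf = 18, 38.0

# Standard deviation of the difference between the two sample means.
sd_diff = sqrt(pop_sd**2 / n_japanese + pop_sd**2 / n_bigleaf)
z = (mean_bigleaf - mean_japanese) / sd_diff

# One-tailed significance: chance of a difference this large under the null.
p_one_tailed = 1 - NormalDist().cdf(z)
print(f"sd_diff = {sd_diff:.2f}, z = {z:.2f}, p = {p_one_tailed:.3f}")
```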

Page 38:

More Maple Leafs (evs?)

• work\fold23\MapleLeafSizeScript12
• MapleTST.xls
• Tools\Data Analysis\z-test for 2 means

Page 39:

The paradox of two tails

The area in yellow must be less than 5% of the total for the two-tailed test to be significant.

A two-tailed test is 2x more difficult to pass than a one-tailed test

Page 40:

Digression for November elections

• A qualifier seen in news articles about political polling: 3.1% margin of error…

• Suppose X voters out of N sampled will vote for your candidate. What is the number of voters N needed in a sample to ensure that 95% of the time the actual percentage of voters underlying your candidate's percentage of X/N will be within ±3.1 percent of X/N?

• This question, whose answer is N=1000 and whose derivation is here, is different from the question:

• What should be N such that you're confident at the 95% level that the range of poll percentages is ±3 percent of X/N if you repeated the poll many times?

Page 41:

Number of voters needed in a poll
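The derivation on this slide did not survive as text. The sketch below uses the standard normal-approximation formula N = z^2 * p(1 - p) / margin^2 with the worst case p = 0.5, which is an assumption here and not necessarily the slide's own route; the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def poll_size(margin, confidence=0.95, p=0.5):
    """Sample size so that the poll's confidence interval is +/- margin.

    Uses the normal approximation to the binomial; p = 0.5 gives the
    largest (most conservative) sample size.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # 1.96 for 95%
    return ceil(z**2 * p * (1 - p) / margin**2)

n_voters = poll_size(0.031)
print(n_voters)
```

Under these assumptions a 3.1% margin of error corresponds to about 1000 voters, matching the answer quoted on the previous slide.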

Page 42:

Men’s height (age 20-40) and percentage of 7-footers in NBA

• “guys who are just tall…”
• http://www.truthaboutit.net/2012/05/true-or-false-half-of-all-7-footers-are-in-the-nba.html
• CDC data: for age 20, mean = 69.8”, std_dev = 2.8”
• (84 - 69.8)/2.8 = 5.07 = z (number of std_dev out)
• 1 - NORMDIST(84, 69.8, 2.8, 1) = 2 × 10^-7
• 320M/2 = 160M; ¼ × 160M = 40M men age 20-40
• → 40 × 10^6 × 2 × 10^-7 = 8, too small a number:
• fat tail distribution…
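The tail calculation is easy to repeat. This Python sketch uses the slide's CDC figures and its rough estimate of 40 million men aged 20-40; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=69.8, sigma=2.8)  # inches, from the slide
seven_feet = 84.0

z = (seven_feet - heights.mean) / heights.stdev
tail = 1 - heights.cdf(seven_feet)        # fraction of men 7 ft or taller
men_20_to_40 = 40e6                       # slide's rough population estimate
expected_seven_footers = men_20_to_40 * tail

print(f"z = {z:.2f}, tail = {tail:.2e}, expected = {expected_seven_footers:.1f}")
```

A strictly normal model predicts only about eight 7-footers in the whole age group, which is the slide's argument that real heights have a fatter tail.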

Page 43:

Estimating unknown population variance

• Suppose the statistics of the underlying population are unknown…
• What is the best estimate of population variance?
• Remember from AM65?

Page 44:

t-distributions

• Once we enter a world of unknown population statistics, where we rely on the small sample data alone, we end up dealing with t-distributions--examples below for 3 and 6 deg of freedom…

• Contained within EXCEL are the “t-tables” for each degree of freedom N-1.

Page 45:

Two tails example: Comparing to a standard with TDIST

• Suppose I do an experiment to see how close people can come to guessing my weight. I ask ten people.

• I know my exact weight, but don't know the standard deviation of all guesses, only the stdev of a sample of 10 guesses.

• Next, I estimate the variance of the population from the sample: est. variance = Σ(x_i - M)^2 / (N - 1), where M is the mean of the 10 guesses.
• Then I divide the est. variance by N = 10 to find the est. variance of the means of sample size 10 = σ^2.
• Now I calculate t = (diff_mean - wt)/σ and have EXCEL compute
• =TDIST(ABS(t), 9, 1), where 9 = N - 1 degrees of freedom and the final 1 selects one tail
• But what if I don’t care if they’re high or low, just wrong in either direction? Time for 2-tails? See weight sheet…

•WeightEst12.xls on the screen

Page 46:

Two tailed test example (cont)

• Suppose all the guesses are too high.
• Can I do a one-tailed test concerning the hypothesis that people overestimate my weight?
• NO!
• The hypothesis was conceived after collecting the data.
• The two-tailed criterion must be used.
• Whatever the one-tailed (normal) significance, I must multiply it by 2. (considering significance to be a small number…)
• The hypothesis: It is significant that people are wrong about my weight, guessing either too high or too low.
• May lead to a Difference of Proportions test with threshold.
• Or what about using the absolute value of the “error” as the data?

Page 47:

Example: femurs in lemurs

Suppose a sample of 9 femurs from ring-tailed lemurs shows their mean length to be 20 cm, and that the variance of the length is 9 cm-sq. What is the probability that the ring-tailed lemur femurs are NOT from a population of mean length 24 cm? A one-tailed test?
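One way to work the lemur example is sketched below in Python. It assumes the stated variance of 9 is the estimate of the population variance (N - 1 in the denominator), and it evaluates the t-distribution tail by numerical integration because the standard library has no t CDF; the helper names are illustrative, and EXCEL's TDIST would serve the same purpose.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus Simpson's-rule area from 0 to t."""
    h = t / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

n, sample_mean, est_variance, pop_mean = 9, 20.0, 9.0, 24.0
sd_of_mean = sqrt(est_variance / n)          # = 1 cm
t_stat = (sample_mean - pop_mean) / sd_of_mean
p_one_tailed = t_upper_tail(abs(t_stat), n - 1)
print(f"t = {t_stat:.1f}, df = {n - 1}, one-tailed p = {p_one_tailed:.4f}")
```

If the direction of the difference was not predicted before the data were collected, the two-tailed value (twice this) is the one to report.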

Page 48:

Example 10-5 from Loftus & Loftus, then use =TTEST(A1, A2, tails, type)

Page 49:

How can you use the EXCEL tools if all you have is sample size, mean and stdev?

• Create your own sample with the same mean and standard deviation:

• The cpdf--cumulative probability density function--is the integral of the probability distribution.

• Sample at equal intervals of the cumulative probability y-axis; pick off the associated z values, then un-normalize the z’s: x = σ*z+μ

• See test code in folder fold23 function [pdfx, pdfy, samp, samp3, std1, std3] = pdf_tst12(52, 100, 30, 1);

• The result will be a sample with a slightly smaller variance than the underlying population.

• Tweak that data to get the exact stdev, and use EXCEL on the resulting synthetic sample.

• rev12 has rand(div, 1) generate from a UNIFORM distribution…
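The recipe above can be sketched in Python as follows. This is not the pdf_tst12 code referenced on the slide, whose arguments are not explained here; the sample size, mean and stdev below are hypothetical, and the final rescaling is one possible way to do the "tweak" to an exact stdev.

```python
from statistics import NormalDist, mean, stdev

def synthetic_sample(n, mu, sigma):
    """Build n values whose sample mean and stdev are exactly mu and sigma."""
    std_normal = NormalDist()
    # Sample the cumulative probability axis at the midpoints of n equal
    # intervals and pick off the associated z values.
    z = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    # Tweak: re-standardize so the sample stdev is exactly 1, then
    # un-normalize with x = sigma*z + mu.
    z_mean, z_sd = mean(z), stdev(z)
    return [mu + sigma * (zi - z_mean) / z_sd for zi in z]

sample = synthetic_sample(30, 100.0, 15.0)  # hypothetical n, mean, stdev
print(round(mean(sample), 6), round(stdev(sample), 6))
```

The resulting list can be pasted into a spreadsheet column and used with tools such as TTEST that require raw data.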

Page 50:

How long to find the T?

Page 51:

Nonparametric (not normal) data hypothesis testing: Difference of proportions.

• Example: A study of reaction times for discovering one T among many L’s.

• Some reaction times can be very long
• None can be very short or negative
• Result: fat-tailed distribution
• mean > median
• Transform data into binary format: reaction time > 500 msec?
• Now a “binomial” problem…
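Once each reaction time is reduced to "slower than 500 msec or not", two conditions can be compared with a pooled two-proportion z test. The Python sketch below shows that standard test; the counts are hypothetical and are not data from the study, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_proportions(x1, n1, x2, n2):
    """Pooled z test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # common proportion under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical counts: trials slower than 500 msec out of 50 per condition.
z, p_value = diff_of_proportions(30, 50, 18, 50)
print(f"z = {z:.2f}, two-tailed p = {p_value:.3f}")
```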

Page 52:

Courtesy of S Geman AM65 notes - Adapted from McCabe & Moore (6th ed.)

Page 53:

Other ways to “normalize” your data

• Throw outliers out of data base…

• Take logarithm of data (compression of large values…)

•How to justify either action?

Page 54:

Correlation

Assume we have a set of M NORMALIZED paired data { x, y }: r = Σ(x_i * y_i) / (M - 1)

• r will be a number between -1 and +1
• A LINEAR correlation
• correlation is not causation!
• In EXCEL use the PEARSON or CORREL operation
• = PEARSON(M5:M14, N5:N14)
• Example: dose and response should be positively correlated
• Time-shifting auto-correlation: pattern matching
• Why M-1?
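A minimal Python sketch of the correlation of normalized paired data follows. Here "normalized" is taken to mean that each variable has its mean subtracted and is divided by its sample standard deviation; the dose and response values are hypothetical.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation: average product of z scores, with M - 1."""
    m = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)   # sample stdev, M - 1 denominator
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (m - 1)

dose = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical doses
response = [2.1, 3.9, 6.2, 8.1, 9.8]      # hypothetical responses
r = pearson_r(dose, response)
print(f"r = {r:.3f}")
```

Dividing by M - 1 matches the M - 1 used inside the sample standard deviation, so a perfectly linear relationship gives exactly r = 1.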

Page 55:

Auto-correlation example from wikipedia

• time series with a hidden sine wave
• autocorrelation reveals hidden pattern...
• “barrel-shifting” the data
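The barrel-shifting idea can be sketched in Python: correlate the series with a circularly shifted copy of itself at each lag. The signal below (a sine of period 20 samples buried in Gaussian noise) is a made-up example, not the Wikipedia data.

```python
import math
import random

def circular_autocorr(signal, lag):
    """Correlation of a series with itself barrel-shifted by `lag` samples."""
    n = len(signal)
    mu = sum(signal) / n
    dev = [s - mu for s in signal]
    shifted = dev[lag:] + dev[:lag]   # barrel shift (wraps around)
    return sum(a * b for a, b in zip(dev, shifted)) / sum(d * d for d in dev)

random.seed(0)   # fixed seed so the example is repeatable
period = 20
signal = [math.sin(2 * math.pi * t / period) + random.gauss(0, 0.5)
          for t in range(200)]

autocorr = [circular_autocorr(signal, lag) for lag in range(2 * period + 1)]
for lag in (0, period // 2, period, 2 * period):
    print(f"lag {lag:2d}: r = {autocorr[lag]:+.2f}")
```

Peaks at multiples of the period and troughs at half-period lags expose the sine wave even though it is hard to see in the raw samples.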

Page 56:

Black Swan challenge: Power Law

from page 235 of The Black Swan

URL: http://www.engin.brown.edu/courses/en123/eqnSTAT/BlackSwan8020.jpg

For his 80/20 example, what is the underlying power in the power law?

Suppose 100 people own land in Italy, and you order and number them from least to most.

The following MATLAB script gives frac = 21% of the land owned by the first 80% of owners, for a power of 6.

(yes, the integral of the power fcn could be used...)

ara = zeros(1, 100);       % preallocate
for nn = 1:100
    ara(nn) = nn^6;        % the power is 6
end
tot = sum(ara);
tot80 = sum(ara(1:80));
frac = tot80/tot           % about 0.21

but

tot98 = sum(ara(1:98));

and

tot98/tot = about 87%, so the top 2% do not own 50% of the land...
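The challenge question can also be answered by searching for the power directly. The Python sketch below repeats the slide's 100-owner model and bisects on the power until the bottom 80% own exactly 20%; the function name and the bisection approach are illustrative choices.

```python
def share_of_bottom(fraction, power, n=100):
    """Share of land held by the lowest `fraction` of n ranked owners,
    when owner k holds an amount proportional to k**power."""
    holdings = [k**power for k in range(1, n + 1)]
    cutoff = round(fraction * n)
    return sum(holdings[:cutoff]) / sum(holdings)

share80 = share_of_bottom(0.80, 6)   # bottom 80% of owners, power 6
share98 = share_of_bottom(0.98, 6)   # bottom 98% of owners, power 6

# Bisection: the bottom-80% share falls as the power rises.
low, high = 0.0, 20.0
for _ in range(60):
    mid = (low + high) / 2
    if share_of_bottom(0.80, mid) > 0.20:
        low = mid
    else:
        high = mid
power_8020 = (low + high) / 2

print(f"power 6: bottom 80% own {share80:.1%}, bottom 98% own {share98:.1%}")
print(f"power for an exact 80/20 split: {power_8020:.2f}")
```

In this discrete model the exact 80/20 split needs a power a little above 6, consistent with the continuous estimate (0.8)^(k+1) = 0.2.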

Page 57:

Predicting the future: How many hurricanes will hit the USA next year?

• http://weather.unisys.com/hurricane/atlantic/index.html
• Hypothesis: As the years go by, anthropogenic release of CO2 into the atmosphere will warm the Atlantic ocean and cause more hurricanes to spawn off the coast of Africa...
• 2005: (4) Dennis, Katrina, Rita and Wilma struck the US.
• 2006: none
• 2007: (1½) Humberto and Noel (½)
• 2008: (3) Dolly (½), Gustav, Ike, Kyle (½)
• 2009: (1) Bill
• Explanation: Blame El Nino. (Thank El Nino?)
• 2010: (2) Earl, Igor (Blame volcanic ash from Iceland)
• 2011: (1) Irene
• 2012: (2) Isaac, Sandy
• 2013: none (Blame polar vortex)
• 2014: (1) Arthur (July)

Page 58:

Predicting Global Warming in year 2000:

Page 59:

Probability compared to Fuzzy Logic

• grading fuzzy concepts like “tall” or “close” or “warm” or “fast” or “guilty”
• Fuzzy set membership from 0 to 1
• fuzzy set membership functions
• Fuzzy logic functions: OR AND…
• fuzziness and existentialism: the refrigerator example
• example: the fuzzy ellipse
• “unlike fuzziness, probability dissipates with increasing information.” (Kosko p. 267)
• applications: inverted pendulum; backing up 18 wheeler truck…
• Use of fuzzy logic controllers in Japanese rail transport
• Lotfi Zadeh, UC Berkeley (father of fuzzy sets and logic)
• Bart Kosko (USC): “Fuzzy Engineering”

Page 60:
Page 61:

http://office.microsoft.com/en-us/excel/HP052042111033.aspx#Statistical%20functions

Page 62:

Rising sea levels:

• From 1900 to 2000 sea levels rose 9”

• Prediction from UN panel: from 2000-2100 levels will rise 36”!

• From 2000-2014 levels have risen 1.5”

• Linear UN prediction for 2014: 36” × 14/100 = 5.04”

• need 34.5” in 86 years…

Page 63:

Topics for Stat Quiz Question

• Poisson or binomial functions used to solve release of transmitter vesicle problems

• Significance of difference between two samples taken from known population

• Sig of diff between two samples without knowledge of overlying population

• Correlation of time-shifted signals (stimulus and response waveforms)