
First Hw Stat 102


Page 1: First Hw Stat 102

8/10/2019 First Hw Stat 102

http://slidepdf.com/reader/full/first-hw-stat-102 1/23

From:http://www.amstat.org/careers/whatisstatistics.cfm

What Is Statistics?
Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation

essential for controlling the course of scientific and societal advances (Davidian, M. and Louis, T. A., 10.1126/science.1218685).

Statisticians apply statistical thinking and methods to a wide variety of scientific, social, and business endeavors in such areas as astronomy, biology,

education, economics, engineering, genetics, marketing, medicine, psychology, public health, sports, among many others. "The best thing about being a

statistician is that you get to play in everyone else's backyard." (John Tukey, Bell Labs, Princeton University)

Many economic, social, political, and military decisions cannot be made without statistical techniques, such as the design of experiments to gain federal

approval of a newly manufactured drug.

Job Characteristics

  Use data to solve problems in a wide variety of fields

  Apply mathematical and statistical knowledge to social, economic, medical, political, and ecological problems

  Work individually and/or as part of an interdisciplinary team

  Travel to consult with other professionals or attend conferences, seminars, and continuing education activities

  Advance the frontiers of statistics, mathematics, and probability through education and research

If you enjoy any of these, a career in statistics may be right for you!  

Statisticians provide crucial guidance in determining what information is reliable and which predictions can be trusted. They often help search for clues to

the solution of a scientific mystery and sometimes keep investigators from being misled by false impressions.

From: https://lsc.cornell.edu/Sidebars/Stats%20Lab%20PDFs/Topic2.pdf

Topic #2: Why Study Statistics?

Hopefully, the discussion above has helped you to understand a little

better what the terms measurement and statistics mean. However, you

may still be wondering "Why do I need to learn statistics?" or "What

future benefit can I get from a statistics class?". Well, since you asked -

there are five major reasons to study statistics:

The first reason is to be able to effectively conduct research. Without

the use of statistics it would be very difficult to make decisions based on

the data collected from a research project. For example, in the study

cited in Chapter One, is the difference in recorded absenteeism

between psychiatric and obstetrics nurses large enough to conclude

that there is a meaningful difference in absenteeism between the two

units? There are two possibilities: The first possibility is that the

difference between the two groups is a result of chance factors. In


reality, the two jobs have approximately the same amount of

absenteeism. The second possibility is that there is a real difference

between the two units, with the psychiatric unit having more nurses

missing work. Without statistics we have no way of making an educated

decision between the two possibilities. Statistics, however, provides us

with a tool to make an educated decision. We will be able to decide

which of the two possibilities is more likely to be true. We will base this

decision on our knowledge of probability and inferential statistics.
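The decision procedure described above can be sketched in code. The absenteeism figures below are hypothetical (the study's actual numbers are not reproduced in this text), and a simple two-sample z test stands in for whatever analysis the original study used:

```python
import math
import statistics

# Hypothetical days absent per year; the study's real figures are not
# reproduced in this text.
psychiatric = [8, 12, 10, 14, 9, 11, 13, 10, 12, 15]
obstetrics = [6, 9, 7, 8, 10, 7, 9, 8, 6, 10]

def two_sample_z(a, b):
    """Approximate two-sample z test for a difference in means."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # Two-sided p-value from the standard normal distribution
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

z, p = two_sample_z(psychiatric, obstetrics)
print(f"z = {z:.2f}, p = {p:.4f}")
# A very small p makes "chance alone" the less likely possibility.
```

If the p-value is small, we conclude the second possibility (a real difference between the units) is the more likely one; if it is large, chance remains a plausible explanation.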

A second point about research should be made. It is extremely

important for a researcher to know what statistics they want to use

before they collect their data. Otherwise data might be collected that is

uninterpretable. Unfortunately, when this happens it results in a loss of

data, time, and money.

Now many a student may be saying to themselves: "But I never plan on

doing any research." While you may never plan to be involved in

research, it may find its way into your life. Certainly, if you decide to

continue your education and work on a master's or doctoral degree, involvement in research will result from that decision. Secondly, more

and more work places are conducting internal research or are becoming

part of broader research studies. Thus, you may find yourself assigned

to one of these studies. Finally, many classes on the undergraduate

level may require you to conduct research (for example, a research

methods or experimental psychology course). In each of these

instances, a knowledge of measurements and statistics will be

invaluable.

The second reason to study statistics is to be able to read journals.

Most technical journals you will read contain some form of statistics.

Usually, you will find them in something called the results section.

Without an understanding of statistics, the information contained in this

section will be meaningless. An understanding of basic statistics will

provide you with the fundamental skills necessary to read and evaluate

most results sections. The ability to extract meaning from journal articles

and the ability to critically evaluate research from a statistical


perspective are fundamental skills that will enhance your knowledge

and understanding in related coursework.

The third reason is to further develop critical and analytic thinking skills.

Most students completing high school and introductory undergraduate

coursework have at their disposal a variety of critical thinking and

analytic skills. The study of statistics will serve to enhance and further

develop these skills. To do well in statistics one must develop and use

formal logical thinking abilities that are both high level and creative.

The fourth reason to study statistics is to be an informed consumer. Like

any other tool, statistics can be used or misused. Yes, it is true that

some individuals do actively lie and mislead with statistics. More often,

however, well-meaning individuals unintentionally report erroneous

statistical conclusions. If you know some of the basic statistical

concepts, you will be in a better position to evaluate the information you

have been given.

The fifth reason to have a working knowledge of statistics is to know

when you need to hire a statistician. Most of us know enough about our

cars to know when to take them into the shop. Usually, we don't attempt the

repair ourselves because we don't want to cause any irreparable

damage. Also, we try to know enough to be able to carry on an

intelligible conversation with the mechanic (or we take someone with us

who can) to ensure that we don't get a whole new engine (big bucks)

when all we need is a new fuel filter (a few bucks). We should be the

same way about hiring a statistician. Conducting research is time

consuming and expensive. If you are in over your statistical head, it

does not make sense to risk an entire project by attempting to compute

the data analyses yourself. It is very easy to compute incomplete or

inappropriate statistical analyses of one's data. As with the mechanic

discussed above, it is also important to have enough statistical savvy to

be able to discuss your project and the data analyses you want

computed with the statistician you hire. In other words, you want to be

able to make sure that your statistician is on the right track.

To summarize, the five reasons to study statistics are to be able to


effectively conduct research, to be able to read and evaluate journal

articles, to further develop critical thinking and analytic skills, to act as an

informed consumer, and to know when you need to hire outside

statistical help.

---------------

From: http://www.bu.edu/stat/undergraduate-program-information/why-study-statistics/

Why Study Statistics

WHAT IS STATISTICS?
Statistics is the science (and, arguably, also the art!) of learning from data. As a discipline it is concerned with the collection, analysis, and interpretation of data, as

well as the effective communication and presentation of results relying on data. Statistics lies at the heart of the type of quantitative reasoning necessary for

making important advances in the sciences, such as medicine and genetics, and for making important decisions in business and public policy.

WHY STUDY STATISTICS?
From medical studies to research experiments, from satellites continuously orbiting the globe to ubiquitous social network sites like Facebook or MySpace, from

polling organizations to United Nations observers, data are being collected everywhere and all the time. Knowledge in statistics provides you with the necessary

tools and conceptual foundations in quantitative reasoning to extract information intelligently from this sea of data.

“The sexy job in the next ten years will be statisticians. Because now we really do have essentially free and ubiquitous data. So the

complimentary factor is the ability to understand that data and extract value from it.”  

Hal Varian, Chief Economist, Google

January, 2009

 At Boston University undergraduates can pursue studies in statistics, through degree programs in the Department of Mathematics and Statistics, in a number of ways.

  If your main interest is in another field, but you know that statistical skills will be key to making a real impact in that field, then a minor in mathematics, with specialization in

statistics, may make the most sense. This option is particularly popular with students in psychology, sociology, business, and related social sciences, and increasingly so with students

in biology.

  If it’s statistics itself that attracts you, then a major in mathematics, with specialization in statistics, is what you want.

  Finally, for those who get an early start and do well in their courses, the Department’s joint BA/MA degree deserves serious consideration, allowing students to graduate in as little

as four years with both bachelor's and master's degrees in mathematics, with specialization in statistics.

OPPORTUNITIES FOR STATISTICIANS
“The demand for statisticians is currently high and growing. According to the Occupational Outlook Handbook, published by the Bureau

of Labor Statistics, the number of nonacademic jobs for statisticians is expected to increase through 2016.”

 –  American Statistical Association

Statisticians are in demand in all sectors of society, ranging from government, to business and industry, to universities and research labs. As a statistician you can

be involved with the development of new lifesaving drugs at a pharmaceutical company, the shaping of public policy in government, the planning of market strategy in

business, or the management of investment portfolios in finance. Not only are there a wide variety of exciting opportunities for statisticians, but careers in statistics

generally can be quite lucrative, with statisticians of sufficient experience often able to earn six-figure salaries.

For more information on career opportunities, including several opportunities in the US Federal Agencies, see the website of the American Statistical Association,

at www.amstat.org/careers. Of particular interest to undergraduates is the new STATTRAK website.

TESTIMONIALS FROM RECENT STUDENTS

The skills I acquired through my coursework and class projects made me an attractive job candidate to a variety of organizations and

helped me secure a great position at a local biotech company weeks after graduation.

Hoxie Ackerman, BA-MA Statistics, 2009


Hoxie Ackerman: “Looking back on my time in the Boston University Mathematics & Statistics department, I was constantly impressed by the high caliber of professors and the

 breadth of courses offered. The skills I acquired through my coursework and class projects made me an attractive job candidate to a variety of organizations and helped me secure a

great position at a local biotech company weeks after graduation. Furthermore, the BA/MA program is an incredible opportunity; the additional coursework and degree have both

enhanced my ability to succeed at my job and prepared me for top PhD programs. Strong mathematicians and statisticians will play a prominent role in almost every field in the

twenty-first century, and a degree from the Boston University Mathematics & Statistics department will serve you well for many years to come.” 

There are so many companies out there that need Statisticians. Because of my BA/MA in math/stats, I have found a job which I really

love. Thank you BU Math/Stats department!

Mathilde Kaper, BA-MA Statistics, 2003

Mathilde Kaper: “I was unsure of what I wanted to major in, but this quickly changed once I started taking more math and stat classes at BU. Couldn’t have asked

for a better group of professors who clearly are there due to their love of their subjects and desire to inspire others. They made time to make sure that you

understood the material and they were there to see to it that you succeeded. One word of advice: make sure to ask lots of questions! And remember that there are

so many companies out there that need Statisticians. Because of my BA/MA in math/stats, I have found a job which I really love.”

From: http://www.sci.usq.edu.au/statsweb/whystats.html

Why Study Statistics?

Statistics is a very practical discipline, concerned with real problems in the real world. It has applications in:

  bioinformatics;
  biology (biostatistics or biometrics);
  climatology;
  computing or computer science (statistical computing is a highly sought-after skill);
  economics (econometrics);
  finance (financial statistics);
  psychology (psychometrics);
  physics (statistical physics is a modern discipline in physics);
  the health industry (medical statistics).

Studying statistics provides a wealth of job opportunities.

In most disciplines, it is almost never possible to examine or study everything

of interest (such as an item, person, business or animal). For example, suppose we are interested in the effect of a certain drug claimed to improve the circulation of blood. We cannot test the drug on everyone!

However, we can test the drug on a selection of people. This raises a number of questions:


  What group of people do we choose? Who should be in the group?
  What can information gathered from a small group of people tell us about the effect of the drug in the population in general?
  Won't the measured effect depend on which people are in the group? Won't it change from one group to the next? So how can any useful information be found?
  How many people should be in such a group to obtain useful information?

To answer these questions, we need statistics.
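One of the questions above, how many people should be in the group, has a classic back-of-the-envelope answer. The sketch below assumes a known standard deviation and a desired margin of error; both numbers are hypothetical, chosen only to illustrate the calculation:

```python
import math

def sample_size(sigma, margin, z=1.96):
    """Smallest n so a 95% CI for a mean is within +/- margin (sigma known)."""
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical numbers: circulation scores with sigma about 12,
# and we want the estimate accurate to within +/- 2 units.
print(sample_size(12, 2))  # 139 people in the group
```

Note how halving the margin of error roughly quadruples the required group size; precision is expensive.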

Statistics appears in almost all areas of science, technology, research, and wherever data is obtained for the purpose of finding information. Statistics has been described as the science of making conclusions in the presence of uncertainty.

Statistics can provide answers to all the questions listed above.

Despite the affinity of statistics with real situations, it has a strong mathematical foundation.

Why do we need to study business statistics?

Statistics ... the most important science in the whole world: for upon it depends the

practical application of every other science and of every art; the one science essential to all political and social administration, all education,

all organisation based upon experience, for it only gives the results of our experience. – Florence Nightingale

Learning statistics can help a businessperson forecast for planning and make decisions about the hypotheses they form. In statistics, we extract meaningful information from piles of raw data and make inferences about the nature of a population based on
observations of a sample taken from that population. Put simply, we estimate properties of the population from the sample. A bigger sample size will have lower variation and give a better understanding of the population. Statistics also helps us predict the
rates of occurrence of random events, which we call the probability of occurrence under a distribution. One aim of learning statistics is to understand and be able to interpret statistical calculations performed by others. Uses of statistics in business are
everywhere: market research, quality control, product planning, forecasting, yearly reports, personnel management, and so on. We encounter statistics every day, in things as simple as train and bus schedules and routes, football results, debt analysis,
house pricing, surveys of games played by gender and age, birth and death records, time series of population, students' results, and many more. The objective of learning statistics in business is to understand raw data and the types of data needed in a certain study.
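The claim that a bigger sample has lower variation can be checked with a short simulation; the population mean and standard deviation below are made up purely for illustration:

```python
import random
import statistics

random.seed(0)

def spread_of_sample_mean(n, trials=2000):
    """Std. dev. of the sample mean across many repeated samples of size n."""
    means = [statistics.mean([random.gauss(50, 10) for _ in range(n)])
             for _ in range(trials)]
    return statistics.stdev(means)

# Bigger samples: the estimated mean varies less from sample to sample
for n in (5, 50, 500):
    print(n, round(spread_of_sample_mean(n), 2))
```

The printed spread shrinks roughly by a factor of the square root of the sample-size ratio, which is why a 10x larger sample gives about a 3x more stable estimate.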

Diyana Abdul Mahad

Lecturer


School of Business and Law

TMC Academy

Types of Statistics

In these pages, we are concerned with two ways of representing descriptive statistics: numerical and pictorial.

Numerical statistics

Numerical statistics are numbers, but clearly, some numbers are more meaningful than others. For example, if you are

offered a purchase price of $1 for an automobile on the condition that you also buy a second automobile, the price of the

second automobile would be a major consideration (its price could be $1,000,000 or $1,000); thus, the average—

or mean—of the two prices would be the important statistic.
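The two-automobile example works out as a one-line calculation; a minimal sketch:

```python
from statistics import mean

# The advertised $1 car, paired with each hypothetical second price
prices_cheap = [1, 1_000]
prices_costly = [1, 1_000_000]

print(mean(prices_cheap))   # 500.5
print(mean(prices_costly))  # 500000.5
```

The same $1 headline price produces wildly different averages, which is exactly why the mean of the two prices, not the first price alone, is the meaningful statistic here.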

Pictorial statistics

Taking numerical data and presenting it in pictures or graphs is known as pictorial statistics. Showing data in the form of

a graphic can make complex and confusing information appear simpler and more straightforward. Different types of graphs

are used for quantitative and categorical variables.

From: http://en.wikipedia.org/wiki/Statistics

Statistics
From Wikipedia, the free encyclopedia

More probability density is found as one gets closer to the expected (mean) value in a normal distribution. Statistics used in standardized testing assessment are

shown. The scales include standard deviations, cumulative percentages, percentile equivalents, Z-scores, T-scores, standard nines, and percentages in standard

nines. 


Scatter plots are used in descriptive statistics to show the observed relationships between different variables.

Statistics is the study of the collection, analysis, interpretation, presentation and organization of data.[1] In applying statistics to e.g. a scientific, industrial, or societal problem, it is necessary to begin with a population or process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom composing a crystal". It deals with all aspects of data including the planning of data collection in terms of the design of surveys and experiments.[1] In case census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.

Two main statistical methodologies are used in data analysis: descriptive statistics, which summarizes data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draws conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).[2] Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. To be able to make an inference upon unknown quantities, one or more estimators are evaluated using the sample.
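A minimal illustration of these two descriptive properties, using Python's standard library on a small made-up sample:

```python
import statistics

# A small made-up sample
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency (location): the distribution's typical value
print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4.5

# Dispersion (variability): how far members depart from the center
print(statistics.pstdev(data))  # 2.0
```

Two datasets can share the same center yet differ hugely in spread, which is why both numbers are reported together.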


Standard statistical procedures involve the development of a null hypothesis, a general statement or default position that there is no relationship between two quantities. Rejecting or disproving the null hypothesis is a central task in the modern practice of science, and gives a precise sense in which a claim is capable of being proven false. What statisticians call an alternative hypothesis is simply a hypothesis which contradicts the null hypothesis. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (null hypothesis is falsely rejected, giving a "false positive")

and Type II errors (null hypothesis fails to be rejected and an actual difference between populations is missed, giving a "false negative"). A critical region is the set of values of the estimator which leads to refuting the null hypothesis. The probability of type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true (statistical significance), and the probability of type II error is the probability that the estimator doesn't belong to the critical region given that the alternative hypothesis is true. The statistical power of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false. Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.
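The Type I error rate can be seen directly by simulation: when the null hypothesis is actually true, a test run at the 0.05 significance level should falsely reject about 5% of the time. A sketch using an approximate one-sample z test on simulated data:

```python
import math
import random
import statistics

random.seed(1)

def z_test_p(sample, mu0=0.0):
    """Two-sided p-value from an approximate one-sample z test."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    z = (statistics.mean(sample) - mu0) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Draw 2000 samples from a population where the null (mean = 0) is TRUE;
# rejecting at the 0.05 level is then a Type I error by definition.
rejections = sum(
    z_test_p([random.gauss(0, 1) for _ in range(30)]) < 0.05
    for _ in range(2000)
)
print(rejections / 2000)  # should hover near 0.05
```

Estimating the Type II error rate works the same way, except the samples are drawn from a population where the alternative hypothesis is true and we count failures to reject.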

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important. The presence

of missing data and/or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Formally, a 95% confidence interval for a value is a range where, if the sampling and analysis were repeated under the same conditions (yielding a different dataset), the interval would include the true (population) value in 95% of all possible cases. Ways to avoid misuse of statistics include using proper diagrams and avoiding bias. In statistics, dependence is any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. If two variables are correlated, they may or may not be the cause of one another. The correlation phenomena could be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable.
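This repeated-sampling reading of a 95% confidence interval can itself be checked by simulation. The sketch below uses the normal approximation (1.96 standard errors) as a simplification; an exact method would use the t distribution:

```python
import math
import random
import statistics

random.seed(2)
TRUE_MEAN = 10.0

def ci95(sample):
    """Approximate 95% CI for the mean (normal approximation)."""
    half = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))
    m = statistics.mean(sample)
    return m - half, m + half

# Repeat the sampling many times; the interval should cover the true
# population mean in roughly 95% of repetitions.
covered = 0
for _ in range(1000):
    lo, hi = ci95([random.gauss(TRUE_MEAN, 3) for _ in range(40)])
    if lo <= TRUE_MEAN <= hi:
        covered += 1
print(covered / 1000)
```

Any single interval either contains the true value or it does not; the 95% refers to the long-run behavior of the procedure, which is exactly what the loop above measures.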

Statistics can be said to have begun in ancient civilization, going back at least to the 5th century BC, but it was not until the 18th century that it started to draw more heavily from calculus and probability theory. Statistics continues to be an area of active research, for example on the problem of how to analyze big data.


Scope

Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data,[3] or as a branch of mathematics.[4] Some consider statistics to be a distinct mathematical science rather than a branch of mathematics.[5][6]

Mathematical statistics
Main article: Mathematical statistics

Mathematical statistics is the application of mathematics to statistics, which was originally conceived as the science of the state — the collection and analysis of facts about a country: its economy, land, military, population, and so forth. Mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory.[7][8]

Overview

In applying statistics to e.g. a scientific, industrial, or societal problem, it is necessary to begin with a population or process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom composing a crystal".

Ideally, statisticians compile data about the entire population (an operation called census). This may be organized by governmental statistical institutes. Descriptive statistics can be used to summarize the population data. Numerical descriptors include mean and standard deviation for continuous data types (like income), while frequency and percentage are more useful in terms of describing categorical data (like race).

When a census is not feasible, a chosen subset of the population called a sample is studied. Once a sample that is representative of the population is determined, data is collected for the sample members in an observational or experimental setting. Again, descriptive statistics can be used to summarize the sample data. However, the drawing of the sample has been subject to an element of randomness, hence the established numerical descriptors from the sample are also due to uncertainty. In order to still draw meaningful conclusions about the entire population, inferential statistics is needed. It uses patterns in the sample data to draw inferences about the population represented, accounting for randomness. These inferences may take the form of: answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation) and modeling relationships within the data (for example, using regression analysis). Inference can extend to forecasting, prediction and estimation of unobserved values either in or associated with the population being studied; it can include extrapolation and interpolation of time series or spatial data, and can also include data mining.

Data collection


Sampling

In case census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Statistics itself also provides tools for prediction and forecasting through the use of data and statistical models. To use a sample as a guide to an entire population, it is important that it truly represent the overall population. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. A major problem lies in determining the extent that the sample chosen is actually representative. Statistics offers methods to estimate and correct for any random trending within the sample and data collection procedures. There are also methods of experimental design for experiments that can lessen these issues at the outset of a study, strengthening its capability to discern truths about the population.

Sampling theory is part of the mathematical discipline of probability theory. Probability is used in "mathematical statistics" (alternatively, "statistical theory") to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures. The use of any statistical method is valid when the system or population under consideration satisfies the assumptions of the method. The difference in point of view between classic probability theory and sampling theory is, roughly, that probability theory starts from the given parameters of a total population to deduce probabilities that pertain to samples. Statistical inference, however, moves in the opposite direction—inductively inferring from samples to the parameters of a larger or total population.
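The inductive direction described here, from sample back to population, can be illustrated with a simple random sample drawn from a synthetic population (all numbers below are invented for the sketch):

```python
import random
import statistics

random.seed(3)

# A synthetic finite population of 100,000 measurements
population = [random.gauss(170, 8) for _ in range(100_000)]

# Inference runs from the sample back toward the population parameter:
sample = random.sample(population, 500)
print(round(statistics.mean(population), 2))  # the parameter
print(round(statistics.mean(sample), 2))      # the sample's estimate of it
```

With a representative sample of 500, the estimate typically lands within a few tenths of a unit of the population mean, even though only 0.5% of the population was examined.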

Experimental and observational studies

A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on dependent variables or response. There are two major types of causal statistical studies: experimental studies and observational studies. In both types of studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable is observed. The difference between the two types lies in how the study is actually conducted. Each can be very effective. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and response are investigated. While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data – like natural experiments and observational studies[9] – for which a statistician would use a modified, more structured estimation method (e.g., difference-in-differences estimation and instrumental variables, among many others) that will produce consistent estimators.

Experiments

The basic steps of a statistical experiment are:

1. Planning the research, including finding the number of replicates of the study, using the following information: preliminary estimates regarding the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Consideration of the selection of experimental subjects and the ethics of research is necessary. Statisticians recommend that experiments compare (at least) one new treatment with a standard treatment or control, to allow an unbiased estimate of the difference in treatment effects.

2. Design of experiments, using blocking to reduce the influence of confounding variables, and randomized assignment of treatments to subjects to allow unbiased estimates of treatment effects and experimental error. At this stage, the experimenters and statisticians write the experimental protocol that shall guide the performance of the experiment and that specifies the primary analysis of the experimental data.

3. Performing the experiment following the experimental protocol and analyzing the data following the experimental protocol.

4. Further examining the data set in secondary analyses, to suggest new hypotheses for future study.

5. Documenting and presenting the results of the study.
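The blocking-plus-randomization idea in step 2 can be sketched in a few lines of Python. The subject ids, block labels, and treatment names below are all hypothetical; the point is simply that treatments are randomized separately within each block:

```python
import random

def randomize_within_blocks(subjects, treatments=("treatment", "control"), seed=1):
    """Randomly assign treatments within each block.

    `subjects` maps subject id -> block label. Blocking on a known
    confounder (e.g. age group) before randomizing reduces its influence;
    randomization itself allows unbiased estimates of treatment effects.
    """
    rng = random.Random(seed)
    by_block = {}
    for subj, block in subjects.items():
        by_block.setdefault(block, []).append(subj)
    assignment = {}
    for members in by_block.values():
        rng.shuffle(members)                       # random order within block
        for i, subj in enumerate(members):
            assignment[subj] = treatments[i % len(treatments)]  # balanced split
    return assignment

# Hypothetical study: 12 subjects blocked by age group.
subjects = {f"s{i}": ("young" if i < 6 else "old") for i in range(12)}
assignment = randomize_within_blocks(subjects)
```

Each block of six subjects ends up with exactly three in each arm, so the treatment comparison is balanced with respect to the blocking variable.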

Experiments on human behavior have special concerns. The famous Hawthorne study examined changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in determining whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked if the changes in illumination affected productivity. It turned out that productivity indeed improved (under the experimental conditions). However, the study is heavily criticized today for errors in experimental procedures, specifically for the lack of a control group and blindness. The Hawthorne effect refers to finding that an outcome (in this case, worker productivity) changed due to observation itself. Those in the Hawthorne study became more productive not because the lighting was changed but because they were being observed.[10]

Observational study

An example of an observational study is one that explores the correlation between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a case-control study, and then look for the number of cases of lung cancer in each group.
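The association in a case-control study like this is commonly summarized with an odds ratio. A minimal sketch with entirely made-up counts:

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio for a 2x2 case-control table.

    OR = (odds of exposure among cases) / (odds of exposure among controls);
    here computed equivalently as (a/c) / (b/d) for the table
    [[exposed cases a, exposed controls c], [unexposed cases b, unexposed controls d]].
    """
    return (exposed_cases / exposed_controls) / (unexposed_cases / unexposed_controls)

# Hypothetical counts: 80 smoking cases, 20 smoking controls,
# 30 non-smoking cases, 70 non-smoking controls.
or_estimate = odds_ratio(80, 20, 30, 70)
print(round(or_estimate, 2))  # → 9.33
```

An odds ratio well above 1 (as here) indicates that cases are far more likely to have been exposed than controls, though, as the later section on correlation warns, it does not by itself establish causation.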

Types of data
Main articles: Statistical data type and Levels of measurement

Various attempts have been made to produce a taxonomy of levels of measurement. The psychophysicist Stanley Smith Stevens defined nominal, ordinal, interval, and ratio scales. Nominal measurements do not have meaningful rank order among values, and permit any one-to-one transformation. Ordinal measurements have imprecise differences between consecutive values, but have a meaningful order to those values, and permit any order-preserving transformation. Interval measurements have meaningful distances between measurements defined, but the zero value is arbitrary (as in the case with longitude and temperature measurements in Celsius or Fahrenheit), and permit any linear transformation. Ratio measurements have both a meaningful zero value and the distances between different measurements defined, and permit any rescaling transformation.

Because variables conforming only to nominal or ordinal measurements cannot be reasonably measured numerically, sometimes they are grouped together as categorical variables, whereas ratio and interval measurements are grouped together as quantitative variables, which can be either discrete or continuous, due to their numerical nature. Such distinctions can often be loosely correlated with data type in computer science, in that dichotomous categorical variables may be represented with the Boolean data type, polytomous categorical variables with arbitrarily assigned integers in the integral data type, and continuous variables with the real data type involving floating point computation. But the mapping of computer science data types to statistical data types depends on which categorization of the latter is being implemented.
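As a rough illustration of that mapping, here is one way the four kinds of variables might be represented in Python; the variable names and codes are illustrative conventions, not requirements:

```python
from enum import IntEnum

# Dichotomous categorical -> Boolean data type
smoker: bool = True

# Polytomous categorical -> arbitrarily assigned integers
class BloodType(IntEnum):   # the integer codes carry no numeric meaning
    O = 0
    A = 1
    B = 2
    AB = 3

# Ordinal -> integers whose order (but not spacing) is meaningful
pain_level: int = 2         # 0=none, 1=mild, 2=moderate, 3=severe

# Interval and ratio (continuous) -> floating point
temperature_c: float = 21.5  # interval scale: zero is arbitrary
height_cm: float = 172.0     # ratio scale: zero is meaningful
```

Treating a categorical code like `BloodType` as a number to average would be meaningless, which is exactly the point of keeping the statistical level of measurement in mind when choosing computations.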

Other categorizations have been proposed. For example, Mosteller and Tukey (1977)[11] distinguished grades, ranks, counted fractions, counts, amounts, and balances. Nelder (1990)[12] described continuous counts, continuous ratios, count ratios, and categorical modes of data. See also Chrisman (1998),[13] van den Berg (1991).[14]

The issue of whether or not it is appropriate to apply different kinds of statistical methods to data obtained from different kinds of measurement procedures is complicated by issues concerning the transformation of variables and the precise interpretation of research questions. "The relationship between the data and what they describe merely reflects the fact that certain kinds of statistical statements may have truth values which are not invariant under some transformations. Whether or not a transformation is sensible to contemplate depends on the question one is trying to answer" (Hand, 2004, p. 82).[15]

Terminology and theory of inferential statistics
Statistics, estimators and pivotal quantities

Consider independent identically distributed (iid) random variables with a given probability distribution: standard statistical inference and estimation theory defines a random sample as the random vector given by the column vector of these iid variables.[16] The population being examined is described by a probability distribution which may have unknown parameters.

A statistic is a random variable which is a function of the random sample, but not a function of unknown parameters. The probability distribution of the statistic, though, may have unknown parameters.

Consider now a function of the unknown parameter: an estimator is a statistic used to estimate such a function. Commonly used estimators include the sample mean, unbiased sample variance and sample covariance.

A random variable which is a function of the random sample and of the unknown parameter, but whose probability distribution does not depend on the unknown parameter, is called a pivotal quantity or pivot. Widely used pivots include the z-score, the chi-square statistic and Student's t-value.
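The z-score is the simplest pivot to see concretely: it is built from the sample mean and the unknown population mean, yet (for a normal population with known σ) its distribution is standard normal no matter what μ is. A minimal sketch with illustrative numbers:

```python
import math

def z_score(sample_mean, mu, sigma, n):
    """Pivotal quantity Z = (x̄ − μ) / (σ / √n).

    Z depends on the sample (via x̄) and on the unknown mean μ,
    but its distribution is standard normal regardless of μ,
    which is what makes it a pivot (σ is assumed known here).
    """
    return (sample_mean - mu) / (sigma / math.sqrt(n))

# Illustrative: sample of n=25 with mean 103, hypothesized μ=100, σ=15.
z = z_score(sample_mean=103.0, mu=100.0, sigma=15.0, n=25)
print(z)  # → 1.0
```

Because the pivot's distribution is known, it can be inverted to build confidence intervals or carry out tests, as described in the sections below on interval estimation and significance.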

Between two estimators of a given parameter, the one with lower mean squared error is said to be more efficient. Furthermore, an estimator is said to be unbiased if its expected value is equal to the true value of the unknown parameter being estimated, and asymptotically unbiased if its expected value converges in the limit to the true value of that parameter.

Other desirable properties for estimators include: UMVUE estimators, which have the lowest variance for all possible values of the parameter to be estimated (this is usually an easier property to verify than efficiency), and consistent estimators, which converge in probability to the true value of that parameter.

This still leaves the question of how to obtain estimators in a given situation and carry out the computation; several methods have been proposed: the method of moments, the maximum likelihood method, the least squares method and the more recent method of estimating equations.

Null hypothesis and alternative hypothesis

Interpretation of statistical information can often involve the development of a null hypothesis, under which the assumption is that whatever is proposed as a cause has no effect on the variable being measured.

The best illustration for a novice is the predicament encountered by a jury trial. The null hypothesis, H0, asserts that the defendant is innocent, whereas the alternative hypothesis, H1, asserts that the defendant is guilty. The indictment comes because of suspicion of guilt. The H0 (status quo) stands in opposition to H1 and is maintained unless H1 is supported by evidence "beyond a reasonable doubt". However, "failure to reject H0" in this case does not imply innocence, but merely that the evidence was insufficient to convict. So the jury does not necessarily accept H0 but fails to reject H0. While one cannot "prove" a null hypothesis, one can test how close it is to being true with a power test, which tests for type II errors.

What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null hypothesis.

Error
Working from a null hypothesis, two basic forms of error are recognized:

- Type I errors, where the null hypothesis is falsely rejected, giving a "false positive".
- Type II errors, where the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a "false negative".

Standard deviation refers to the extent to which individual observations in a sample differ from a central value, such as the sample or population mean, while standard error refers to an estimate of the difference between the sample mean and the population mean.
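The distinction is easy to see numerically: the standard error shrinks with sample size while the standard deviation does not. A short sketch on an illustrative sample:

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative sample, mean = 5

sd = statistics.stdev(data)        # spread of the individual observations
se = sd / math.sqrt(len(data))     # estimated uncertainty of the sample mean

print(round(sd, 3), round(se, 3))  # → 2.138 0.756
```

Doubling the amount of data would leave the standard deviation roughly unchanged but cut the standard error by a factor of √2.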

A statistical error is the amount by which an observation differs from its expected value; a residual is the amount an observation differs from the value the estimator of the expected value assumes on a given sample (also called a prediction).

Mean squared error is used for obtaining efficient estimators, a widely used class of estimators. Root mean square error is simply the square root of mean squared error.

 A least squares fit: in red the points to be fitted, in blue the fitted line.

Many statistical methods seek to minimize the residual sum of squares, and these are called "methods of least squares" in contrast to least absolute deviations. The latter gives equal weight to small and big errors, while the former gives more weight to large errors. Residual sum of squares is also differentiable, which provides a handy property for doing regression. Least squares applied to linear regression is called the ordinary least squares method, and least squares applied to nonlinear regression is called non-linear least squares. Also, in a linear regression model the non-deterministic part of the model is called the error term, disturbance or, more simply, noise.
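For simple linear regression the least-squares solution has a closed form, which the following sketch implements on made-up data:

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Minimizes the residual sum of squares; the slope is the ratio of the
    x-y covariance to the x variance, and the line passes through the means.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.1, 6.9]          # roughly y = 1 + 2x with illustrative noise
a, b = ols_fit(xs, ys)
print(round(a, 2), round(b, 2))    # → 1.06 1.96
```

Squaring the residuals makes the objective differentiable, which is why this closed form exists; a least-absolute-deviations fit has no such formula and must be solved iteratively.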

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important. The presence of missing data and/or censoring may result in biased estimates, and specific techniques have been developed to address these problems.[17]

Interval estimation
Main article: Interval estimation


Confidence intervals: the red line is the true value for the mean in this example; the blue lines are random confidence intervals for 100 realizations.

Most studies only sample part of a population, so results don't fully represent the whole population. Any estimates obtained from the sample only approximate the population value. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Often they are expressed as 95% confidence intervals. Formally, a 95% confidence interval for a value is a range where, if the sampling and analysis were repeated under the same conditions (yielding a different dataset), the interval would include the true (population) value in 95% of all possible cases. This does not imply that the probability that the true value is in the confidence interval is 95%. From the frequentist perspective, such a claim does not even make sense, as the true value is not a random variable. Either the true value is or is not within the given interval. However, it is true that, before any data are sampled and given a plan for how to construct the confidence interval, the probability is 95% that the yet-to-be-calculated interval will cover the true value: at this point, the limits of the interval are yet-to-be-observed random variables. One approach that does yield an interval that can be interpreted as having a given probability of containing the true value is to use a credible interval from Bayesian statistics: this approach depends on a different way of interpreting what is meant by "probability", that is as a Bayesian probability.
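The frequentist reading of "95%" can be simulated directly: repeat the sampling many times and count how often the interval covers the true mean. A sketch with arbitrary parameters and a known σ:

```python
import math
import random

# Simulate coverage of the 95% CI for the mean of N(mu, sigma), sigma known.
rng = random.Random(7)
mu, sigma, n, reps = 50.0, 10.0, 30, 2000
z = 1.96                                 # standard normal critical value for 95%

covered = 0
for _ in range(reps):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    half = z * sigma / math.sqrt(n)      # half-width of the interval
    if xbar - half <= mu <= xbar + half:
        covered += 1

print(covered / reps)                    # ≈ 0.95 over many repetitions
```

Any single interval either contains μ or it does not; the 95% describes the long-run behavior of the procedure, which is exactly what the count above estimates.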

In principle confidence intervals can be symmetrical or asymmetrical. An interval can be asymmetrical because it works as a lower or upper bound for a parameter (left-sided interval or right-sided interval), but it can also be asymmetrical because the two-sided interval is built violating symmetry around the estimate. Sometimes the bounds for a confidence interval are reached asymptotically and these are used to approximate the true bounds.

Significance
Main article: Statistical significance

Statistics rarely give a simple Yes/No type answer to the question under analysis. Interpretation often comes down to the level of statistical significance applied to the numbers and often refers to the probability of a value accurately rejecting the null hypothesis (sometimes referred to as the p-value).


In this graph the black line is the probability distribution for the test statistic, the critical region is the set of values to the right of the observed data point (observed value of the test statistic), and the p-value is represented by the green area.

The standard approach[16] is to test a null hypothesis against an alternative hypothesis. A critical region is the set of values of the estimator which leads to refuting the null hypothesis. The probability of type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true (statistical significance), and the probability of type II error is the probability that the estimator doesn't belong to the critical region given that the alternative hypothesis is true. The statistical power of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false.

Referring to statistical significance does not necessarily mean that the overall result is significant in real-world terms. For example, in a large study of a drug it may be shown that the drug has a statistically significant but very small beneficial effect, such that the drug is unlikely to help the patient noticeably.

While in principle the acceptable level of statistical significance may be subject to debate, the p-value is the smallest significance level which allows the test to reject the null hypothesis. This is logically equivalent to saying that the p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the test statistic. Therefore, the smaller the p-value, the lower the probability of committing type I error.
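That definition ("the probability, under the null, of a result at least as extreme as the one observed") can be computed directly with a permutation test, which needs no distributional formula. A sketch on made-up measurements from two groups:

```python
import random

def permutation_p_value(group_a, group_b, reps=10000, seed=3):
    """Two-sided permutation test on the difference of means.

    The p-value is the fraction of random relabelings of the pooled data
    whose absolute mean difference is at least as extreme as the observed one
    (i.e. the probability of such a result if group labels were meaningless).
    """
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)              # random relabeling under the null
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            count += 1
    return count / reps

# Illustrative data: the two groups are clearly separated.
a = [12.1, 11.8, 12.4, 12.9, 12.3]
b = [11.2, 11.5, 11.0, 11.7, 11.3]
print(permutation_p_value(a, b))   # small p-value: the difference is unlikely under the null
```

With these (fabricated) numbers the p-value comes out well below 0.05, so the null hypothesis of no group difference would be rejected at the usual level.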

Some problems are usually associated with this framework (see criticism of hypothesis testing):

- A difference that is highly statistically significant can still be of no practical significance, but it is possible to properly formulate tests to account for this. One response involves going beyond reporting only the significance level to include the p-value when reporting whether a hypothesis is rejected or accepted. The p-value, however, does not indicate the size or importance of the observed effect and can also seem to exaggerate the importance of minor differences in large studies. A better and increasingly common approach is to report confidence intervals. Although these are produced from the same calculations as those of hypothesis tests or p-values, they describe both the size of the effect and the uncertainty surrounding it.

- Fallacy of the transposed conditional, aka prosecutor's fallacy: criticisms arise because the hypothesis testing approach forces one hypothesis (the null hypothesis) to be favored, since what is being evaluated is the probability of the observed result given the null hypothesis and not the probability of the null hypothesis given the observed result. An alternative to this approach is offered by Bayesian inference, although it requires establishing a prior probability.[18]

- Rejecting the null hypothesis does not automatically prove the alternative hypothesis.

- As with everything in inferential statistics, it relies on sample size; therefore, under fat tails, p-values may be seriously miscomputed.

Examples

Some well-known statistical tests and procedures are:

- Analysis of variance (ANOVA)
- Chi-squared test
- Correlation
- Factor analysis
- Mann–Whitney U
- Mean square weighted deviation (MSWD)
- Pearson product-moment correlation coefficient
- Regression analysis
- Spearman's rank correlation coefficient
- Student's t-test
- Time series analysis

Misuse of statistics

Main article: Misuse of statistics 

Misuse of statistics can produce subtle, but serious errors in description and interpretation—subtle in the sense that even experienced professionals make such errors, and serious in the sense that they can lead to devastating decision errors. For instance, social policy, medical practice, and the reliability of structures like bridges all rely on the proper use of statistics.

Even when statistical techniques are correctly applied, the results can be difficult to interpret for those lacking expertise. The statistical significance of a trend in the data—which measures the extent to which a trend could be caused by random variation in the sample—may or may not agree with an intuitive sense of its significance. The set of basic statistical skills (and skepticism) that people need to deal with information in their everyday lives properly is referred to as statistical literacy.

There is a general perception that statistical knowledge is all-too-frequently intentionally misused by finding ways to interpret only the data that are favorable to the presenter.[19] A mistrust and misunderstanding of statistics is associated with the quotation, "There are three kinds of lies: lies, damned lies, and statistics". Misuse of statistics can be both inadvertent and intentional, and the book How to Lie with Statistics[19] outlines a range of considerations. In an attempt to shed light on the use and misuse of statistics, reviews of statistical techniques used in particular fields are conducted (e.g. Warne, Lazo, Ramos, and Ritter (2012)).[20]


Ways to avoid misuse of statistics include using proper diagrams and avoiding bias.[21] Misuse can occur when conclusions are overgeneralized and claimed to be representative of more than they really are, often by either deliberately or unconsciously overlooking sampling bias.[22] Bar graphs are arguably the easiest diagrams to use and understand, and they can be made either by hand or with simple computer programs.[21] Unfortunately, most people do not look for bias or errors, so they are not noticed. Thus, people may often believe that something is true even if it is not well represented.[22] To make data gathered from statistics believable and accurate, the sample taken must be representative of the whole.[23] According to Huff, "The dependability of a sample can be destroyed by [bias]... allow yourself some degree of skepticism."[24]

To assist in the understanding of statistics, Huff proposed a series of questions to be asked in each case:[24]

- Who says so? (Does he/she have an axe to grind?)
- How does he/she know? (Does he/she have the resources to know the facts?)
- What's missing? (Does he/she give us a complete picture?)
- Did someone change the subject? (Does he/she offer us the right answer to the wrong problem?)
- Does it make sense? (Is his/her conclusion logical and consistent with what we already know?)

The confounding variable problem: X and Y may be correlated, not because there is a causal relationship between them, but because both depend on a third variable Z. Z is called a confounding factor.

Misinterpretation: correlation

The concept of correlation is particularly noteworthy for the potential confusion it can cause. Statistical analysis of a data set often reveals that two variables (properties) of the population under consideration tend to vary together, as if they were connected. For example, a study of annual income that also looks at age of death might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated; however, they may or may not be the cause of one another. The correlation phenomena could be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable. For this reason, there is no way to immediately infer the existence of a causal relationship between the two variables. (See Correlation does not imply causation.)

History of statistical science


Blaise Pascal, an early pioneer on the mathematics of probability.

Main articles: History of statistics and  Founders of statistics 

Statistical methods date back at least to the 5th century BC.

Some scholars pinpoint the origin of statistics to 1663, with the publication of Natural and Political Observations upon the Bills of Mortality by John Graunt.[25] Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data, hence its stat- etymology. The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general. Today, statistics is widely employed in government, business, and natural and social sciences.

Its mathematical foundations were laid in the 17th century with the development of probability theory by Blaise Pascal and Pierre de Fermat. Mathematical probability theory arose from the study of games of chance, although the concept of probability was already examined in medieval law and by philosophers such as Juan Caramuel.[26] The method of least squares was first described by Adrien-Marie Legendre in 1805.

Karl Pearson, the founder of mathematical statistics.

The modern field of statistics emerged in the late 19th and early 20th century in three stages.[27] The first wave, at the turn of the century, was led by the work of Sir Francis Galton and Karl Pearson, who transformed statistics into a rigorous mathematical discipline used for analysis, not just in science, but in industry and politics as well. Galton's contributions to the field included introducing the concepts of standard deviation, correlation, regression and the application of these methods to the study of the variety of human characteristics – height, weight, eyelash length among others.[28] Pearson developed the correlation coefficient, defined as a product-moment,[29] the method of moments for the fitting of distributions to samples and the Pearson system of continuous curves, among many other things.[30] Galton and Pearson founded Biometrika as the first journal of mathematical statistics and biometry, and the latter founded the world's first university statistics department at University College London.[31]

 

Ronald Fisher  coined the term "null hypothesis".

The second wave of the 1910s and 20s was initiated by William Gosset, and reached its culmination in the insights of Sir Ronald Fisher, who wrote the textbooks that were to define the academic discipline in universities around the world. Fisher's most important publications were his 1918 seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance and his classic 1925 work Statistical Methods for Research Workers. His paper was the first to use the statistical term, variance. He developed rigorous experimental models and also originated the concepts of sufficiency, ancillary statistics, Fisher's linear discriminator and Fisher information.[32]

The final wave, which mainly saw the refinement and expansion of earlier developments, emerged from the collaborative work between Egon Pearson and Jerzy Neyman in the 1930s. They introduced the concepts of "Type II" error, power of a test and confidence intervals. Jerzy Neyman in 1934 showed that stratified random sampling was in general a better method of estimation than purposive (quota) sampling.[33]

Today, statistical methods are applied in all fields that involve decision making, for making accurate inferences from a collated body of data and for making decisions in the face of uncertainty based on statistical methodology. The use of modern computers has expedited large-scale statistical computations, and has also made possible new methods that are impractical to perform manually. Statistics continues to be an area of active research, for example on the problem of how to analyze Big data.[34]

Trivia
Applied statistics, theoretical statistics and mathematical statistics

"Applied statistics" comprises descriptive statistics and the application of inferential statistics.[35][verification

needed ] Theoretical statisticsconcerns both the logical arguments underlying justification of approaches

Page 21: First Hw Stat 102

8/10/2019 First Hw Stat 102

http://slidepdf.com/reader/full/first-hw-stat-102 21/23

to statistical inference, as well encompassing mathematical statistics. Mathematical statisticsincludes not only the manipulation of  probability distributions necessary for deriving results related tomethods of estimation and inference, but also various aspects of  computational statistics andthe design of experiments. 

Machine learning and data mining

Statistics has many ties to machine learning and data mining. 

Statistics in society

Statistics is applicable to a wide variety of academic disciplines, including natural and social sciences, government, and business. Statistical consultants can help organizations and companies that don't have in-house expertise relevant to their particular questions.

Statistical computing

gretl, an example of an open source statistical package

Main article: Computational statistics 

The rapid and sustained increases in computing power starting from the second half of the 20th century have had a substantial impact on the practice of statistical science. Early statistical models were almost always from the class of linear models, but powerful computers, coupled with suitable numerical algorithms, caused an increased interest in nonlinear models (such as neural networks) as well as the creation of new types, such as generalized linear models and multilevel models.

Increased computing power has also led to the growing popularity of computationally intensive methods based on resampling, such as permutation tests and the bootstrap, while techniques such as Gibbs sampling have made use of Bayesian models more feasible. The computer revolution has implications for the future of statistics, with a new emphasis on "experimental" and "empirical" statistics.
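The bootstrap mentioned above is simple enough to sketch directly: resample the data with replacement many times, recompute the statistic each time, and read a confidence interval off the empirical quantiles. The data and seed below are illustrative:

```python
import random

def bootstrap_ci(data, stat=lambda xs: sum(xs) / len(xs),
                 reps=5000, alpha=0.05, seed=11):
    """Percentile bootstrap confidence interval for a statistic.

    Resamples the data with replacement `reps` times, recomputes the
    statistic on each resample, and returns the empirical alpha/2 and
    1 - alpha/2 quantiles of those recomputed values.
    """
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(stat([rng.choice(data) for _ in range(n)])
                   for _ in range(reps))
    lo = stats[int(reps * alpha / 2)]
    hi = stats[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4, 5.8, 4.6]  # sample mean = 4.96
lo, hi = bootstrap_ci(data)
print(round(lo, 2), round(hi, 2))   # an interval bracketing the sample mean
```

The appeal of the method is that it needs no formula for the sampling distribution: the same code works for the median, a correlation, or any other statistic passed in as `stat`.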

A large number of both general and special-purpose statistical software packages are now available.

Statistics applied to mathematics or the arts

Traditionally, statistics was concerned with drawing inferences using a semi-standardized methodology that was "required learning" in most sciences. This has changed with the use of statistics in non-inferential contexts. What was once considered a dry subject, taken in many fields as a degree requirement, is now viewed enthusiastically.[according to whom?] Initially derided by some mathematical purists, it is now considered essential methodology in certain areas.


- In number theory, scatter plots of data generated by a distribution function may be transformed with familiar tools used in statistics to reveal underlying patterns, which may then lead to hypotheses.
- Methods of statistics including predictive methods in forecasting are combined with chaos theory and fractal geometry to create video works that are considered to have great beauty.
- The process art of Jackson Pollock relied on artistic experiments whereby underlying distributions in nature were artistically revealed.[citation needed] With the advent of computers, statistical methods were applied to formalize such distribution-driven natural processes to make and analyze moving video art.[citation needed]
- Methods of statistics may be used predictively in performance art, as in a card trick based on a Markov process that only works some of the time, the occasion of which can be predicted using statistical methodology.
- Statistics can be used to predictively create art, as in the statistical or stochastic music invented by Iannis Xenakis, where the music is performance-specific. Though this type of artistry does not always come out as expected, it does behave in ways that are predictable and tunable using statistics.

Specialized disciplines
Main article: List of fields of application of statistics

Statistical techniques are used in a wide range of types of scientific and social research, including: biostatistics, computational biology, computational sociology, network biology, social science, sociology and social research. Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines include:

- Actuarial science (assesses risk in the insurance and finance industries)
- Applied information economics
- Astrostatistics (statistical evaluation of astronomical data)
- Biostatistics
- Business statistics
- Chemometrics (for analysis of data from chemistry)
- Data mining (applying statistics and pattern recognition to discover knowledge from data)
- Demography
- Econometrics (statistical analysis of economic data)
- Energy statistics
- Engineering statistics
- Epidemiology (statistical analysis of disease)
- Geography and Geographic Information Systems, specifically in Spatial analysis
- Image processing
- Medical statistics
- Psychological statistics
- Reliability engineering
- Social statistics

In addition, there are particular types of statistical analysis that have also developed their own specialised terminology and methodology:

- Bootstrap / Jackknife resampling
- Multivariate statistics