Fundamentals of Data Analysis
Lecture 3
Basics of statistics
Program for today
- Basic terms and definitions
- Discrete distributions
- Continuous distributions
- Normal distribution
Topics for discussion
- What are the applications of statistics in modern physics?
- How important is it to draw conclusions based on statistical analysis?
What is statistics?
Definition of statistics:
1. A collection of quantitative data pertaining to a subject or group, for example blood pressure statistics.
2. The science that deals with the collection, tabulation, analysis, interpretation, and presentation of quantitative data.
What is statistics?
Two phases of statistics:
- Descriptive statistics: describes the characteristics of a product or process using information collected on it.
- Inferential (inductive) statistics: draws conclusions about unknown process parameters based on information contained in a sample; it uses probability.
Probability
When we cannot rely on the assumption that all sample points are equally likely, we have to determine the probability of an event experimentally. We perform a large number N of experiments and count how often each sample point is obtained. The ratio of the number of occurrences of a given sample point to the total number of experiments is called the relative frequency.
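Counting occurrences and dividing by N can be sketched as follows. This is a minimal illustration, assuming a simulated six-sided die as the experiment; the function name `relative_frequencies` and the choice of N = 6000 are hypothetical.

```python
import random
from collections import Counter

def relative_frequencies(outcomes):
    """Return {sample point: n(x) / N} for a list of observed outcomes."""
    counts = Counter(outcomes)          # n(x) for every sample point x
    n_total = len(outcomes)             # N, the number of experiments
    return {point: count / n_total for point, count in counts.items()}

# Hypothetical experiment: N = 6000 rolls of a simulated six-sided die.
rng = random.Random(42)
rolls = [rng.randint(1, 6) for _ in range(6000)]
freqs = relative_frequencies(rolls)
for face in sorted(freqs):
    print(face, round(freqs[face], 3))
```

Each printed relative frequency should lie close to 1/6, because the simulated die is fair by construction.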
Probability
The probability is then assigned the value of the relative frequency of the sample point in a long series of repetitions of the experiment. This is justified by the "law of large numbers", which states that the relative frequency approaches the true (theoretical) probability of the outcome as the experiment is repeated over and over again.
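The convergence described by the law of large numbers can be watched in a short simulation. This sketch assumes a fair coin (success probability p = 0.5) simulated with Python's `random` module; `running_relative_frequency` is a hypothetical helper, not part of any library.

```python
import random

def running_relative_frequency(n_trials, p=0.5, seed=1):
    """Relative frequency of 'success' after each of n_trials Bernoulli(p) trials."""
    rng = random.Random(seed)
    successes = 0
    history = []
    for n in range(1, n_trials + 1):
        if rng.random() < p:            # one trial succeeds with probability p
            successes += 1
        history.append(successes / n)   # n(E) / N after n trials
    return history

history = running_relative_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, history[n - 1])
```

The early values can be far from 0.5, while the later ones settle close to it; the typical deviation shrinks roughly like 1/sqrt(N).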
Probability
$P(E) = \lim_{N \to \infty} \frac{n(E)}{N}$
where n(E) is the number of times the event E occurred in a total of N experiments. From this definition we can see that the probability is a number between 0 and 1. When the probability is 1, the outcome is certain.
Probability
For a discrete random variable, the definition of probability is intuitive:
$P(x) = \lim_{N \to \infty} \frac{n(x)}{N}$
where n(x) is the number of occurrences of the desired value of the random variable x (successes) in N samples (N → ∞).
Probability
For a continuous random variable, this definition requires identifying a small range of variation Δx (Δx → 0) for which the probability is determined:
$P(x_0 \le x < x_0 + \Delta x) = \lim_{N \to \infty} \frac{n(x_0 \le x < x_0 + \Delta x)}{N}$
For a continuous random variable it is preferable to use the probability density function:
$f(x_0) = \lim_{\Delta x \to 0} \frac{P(x_0 \le x < x_0 + \Delta x)}{\Delta x}$
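The density definition can be turned into a numerical estimate: count the samples in a small interval, divide by N to get a relative frequency, then divide by Δx. This is a sketch assuming samples from a standard normal distribution (so the result can be checked against the exact value 1/sqrt(2π)); `density_estimate` is a hypothetical name. In practice Δx cannot shrink indefinitely for a finite N, because the count in the interval becomes too small and noisy.

```python
import math
import random

def density_estimate(samples, x0, dx):
    """Estimate f(x0) as P(x0 <= x < x0 + dx) / dx, with P taken as a relative frequency."""
    n_in_range = sum(1 for x in samples if x0 <= x < x0 + dx)
    return n_in_range / (len(samples) * dx)

# Assumption for illustration: samples drawn from a standard normal distribution,
# so the estimate can be compared with the exact density at x0 = 0.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
estimate = density_estimate(samples, x0=0.0, dx=0.05)
exact = 1.0 / math.sqrt(2.0 * math.pi)
print(round(estimate, 3), round(exact, 3))
```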
Histogram
The histogram is the most important graphical tool for exploring the shape of data distributions and a good way to visualize trends in population data. The more often a particular value occurs, the taller the corresponding bar on the histogram.
Histogram
Constructing a histogram:
Step 1: Find the range of the distribution (largest value minus smallest value).
Step 2: Choose the number of classes, typically 5 to 20.
Step 3: Determine the class width, with one decimal place more than the data: class width = range / number of classes.
Step 4: Determine the class boundaries.
Step 5: Draw the frequency histogram.
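The five steps can be sketched in Python. This is a minimal version under stated assumptions: the data are hypothetical, the class width is kept unrounded (the rounding rule from Step 3 is skipped so that the classes exactly cover the range), the maximum value is placed in the last class, and a text bar of `#` characters stands in for a drawn histogram.

```python
def build_histogram(data, n_classes):
    """Return a list of (lower, upper, count) classes for the data."""
    lo, hi = min(data), max(data)
    data_range = hi - lo                          # Step 1: range = largest - smallest
    if data_range == 0:
        raise ValueError("all observations are equal; the range is zero")
    width = data_range / n_classes                # Step 3: class width (unrounded here)
    edges = [lo + k * width for k in range(n_classes + 1)]   # Step 4: class boundaries
    counts = [0] * n_classes
    for x in data:
        # Index of the class containing x; the maximum value goes into the last class.
        k = min(int((x - lo) / width), n_classes - 1)
        counts[k] += 1
    return [(edges[k], edges[k + 1], counts[k]) for k in range(n_classes)]

def print_histogram(classes):
    """Step 5: a crude text drawing with one '#' per observation."""
    for lower, upper, count in classes:
        print(f"{lower:6.2f}-{upper:6.2f} | {'#' * count}")

# Hypothetical observations.
data = [12.0, 13.0, 13.5, 15.0, 15.5, 17.0, 17.5, 19.0, 21.0, 22.0]
classes = build_histogram(data, n_classes=5)      # Step 2: chosen number of classes
print_histogram(classes)
```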
Histogram
Number of groups or cells:
- fewer than 100 observations: 5 to 9 cells
- between 100 and 500 observations: 8 to 17 cells
- more than 500 observations: 15 to 20 cells
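The rule of thumb above maps directly onto a small lookup. This sketch assumes that the boundaries 100 and 500 both belong to the middle band; `suggested_cells` is a hypothetical helper.

```python
def suggested_cells(n_observations):
    """Return the (min, max) number of histogram cells suggested by the rule of thumb."""
    if n_observations < 100:
        return (5, 9)
    if n_observations <= 500:       # 100 to 500 observations, inclusive
        return (8, 17)
    return (15, 20)                 # more than 500 observations

print(suggested_cells(320))         # (8, 17)
```

For example, the grouped data on the following slides contain 320 observations in 9 cells, which falls inside the suggested 8 to 17 range.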
Analysis of histogram
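Analysis of histogram
Calculating the average for ungrouped data:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
and for grouped data:
$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{n} = \frac{f_1 X_1 + f_2 X_2 + \dots + f_h X_h}{f_1 + f_2 + \dots + f_h}$
where h is the number of cells, X_i is the midpoint of cell i, f_i is its frequency, and n = f_1 + f_2 + ... + f_h.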
Analysis of histogram
Class boundaries  Midpoint X_i  Frequency f_i  f_i·X_i
23.6-26.5         25.0          4              100
26.6-29.5         28.0          36             1008
29.6-32.5         31.0          51             1581
32.6-35.5         34.0          63             2142
35.6-38.5         37.0          58             2146
38.6-41.5         40.0          52             2080
41.6-44.5         43.0          34             1462
44.6-47.5         46.0          16             736
47.6-50.5         49.0          6              294
Total                           320            11549
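Applying the grouped-data formula to the table is a matter of summing the f_i·X_i column and dividing by the total frequency. This sketch copies the midpoints and frequencies from the table; `grouped_mean` is a hypothetical helper name.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(f_i * X_i) / sum(f_i)."""
    total_weighted = sum(f * x for x, f in zip(midpoints, frequencies))
    return total_weighted / sum(frequencies)

# Midpoints X_i and frequencies f_i from the table above.
midpoints = [25.0, 28.0, 31.0, 34.0, 37.0, 40.0, 43.0, 46.0, 49.0]
frequencies = [4, 36, 51, 63, 58, 52, 34, 16, 6]

print(sum(f * x for x, f in zip(midpoints, frequencies)))  # 11549.0
print(sum(frequencies))                                    # 320
print(grouped_mean(midpoints, frequencies))                # 36.090625
```

So the average of the grouped data is 11549 / 320, roughly 36.09.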
Measures of dispersion
- Range
- Standard deviation
- Variance
Measures of dispersion
The range is the simplest measure of dispersion and the easiest to calculate:
$R = X_{\max} - X_{\min}$
Measures of dispersion
Standard deviation of the sample:
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
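The sample standard deviation (with the n - 1 denominator) and the range can be computed directly from their formulas. This sketch uses a hypothetical data set and cross-checks the result against `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample
print(max(data) - min(data))                      # range R = X_max - X_min: 7.0
print(sample_std(data))                           # sqrt(32 / 7), about 2.138
print(statistics.stdev(data))                     # same n - 1 formula
```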
Measures of dispersion
For a discrete random variable, the variance is defined as:
$V(x) = \sum_{i} (x_i - E(x))^2 P(x_i)$
and for a continuous random variable:
$V(x) = \int_{a}^{b} (x - E(x))^2 f(x)\,dx$
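The discrete variance formula can be checked on a simple case. This sketch assumes a fair six-sided die and uses `fractions.Fraction` so the result is exact; `discrete_variance` is a hypothetical helper.

```python
from fractions import Fraction

def discrete_variance(values, probabilities):
    """V(x) = sum_i (x_i - E(x))^2 P(x_i) for a discrete random variable."""
    mean = sum(x * p for x, p in zip(values, probabilities))   # E(x)
    return sum((x - mean) ** 2 * p for x, p in zip(values, probabilities))

faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6           # assumed example: fair die
print(discrete_variance(faces, probs))  # 35/12
```

Here E(x) = 7/2, and the weighted squared deviations sum to 35/12, about 2.92.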
Parameters of a distribution
A parameter is a characteristic of a population; in other words, it describes a population.
A statistic is a characteristic of a sample, used to make inferences about population parameters, which are typically unknown; a statistic used in this way is called an estimator.
Parameters of a distribution
Population - Set of all items that possess a characteristic of interest
Sample - Subset of a population
Parameters of a distribution
Expected value (EV) of a discrete random variable:
$E(x) = \sum_{i=1}^{k} x_i P(x_i)$
and of a continuous random variable:
$E(x) = \int_{a}^{b} x f(x)\,dx$
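Both expected-value formulas can be evaluated numerically. This sketch assumes a fair die for the discrete case and a uniform density f(x) = 1/(b - a) on [2, 6] for the continuous case; the integral is approximated with the midpoint rule (a swapped-in numerical method, not something the slides prescribe), and both function names are hypothetical.

```python
from fractions import Fraction

def expected_value_discrete(values, probabilities):
    """E(x) = sum_{i=1}^{k} x_i P(x_i)."""
    return sum(x * p for x, p in zip(values, probabilities))

def expected_value_continuous(pdf, a, b, n_steps=100_000):
    """E(x) = integral from a to b of x f(x) dx, approximated with the midpoint rule."""
    h = (b - a) / n_steps
    return sum((a + (i + 0.5) * h) * pdf(a + (i + 0.5) * h) for i in range(n_steps)) * h

print(expected_value_discrete([1, 2, 3, 4, 5, 6], [Fraction(1, 6)] * 6))  # 7/2

# Assumed example: uniform density on [2, 6], f(x) = 1/4; the exact E(x) is 4.
print(expected_value_continuous(lambda x: 1 / 4, 2.0, 6.0))
```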
Random numbers
1 2 3 4 5 6 7 8 9 10
1534 7106 2836 7873 5574 7545 7590 5574 1202 7712
6128 8993 4102 2551 0330 2358 6427 7067 9325 2454
6047 8566 8644 9343 9297 6751 3500 8754 2913 1258
0806 5201 5705 7355 1448 9562 7514 9205 0402 2427
9915 8274 4525 5695 5752 9630 7172 6988 0227 4264
2882 7158 4341 3463 1178 5789 1173 0670 0820 5067
9213 1223 4388 9760 6691 6861 8214 8813 0611 3131
8410 9836 3899 3883 1253 1683 6988 9978 8026 6751
9974 2362 2103 4326 3825 9079 6187 2721 1489 4216
3402 8162 8226 0782 3364 7871 4500 5598 9424 3816
8188 6569 1492 2139 8823 6878 0613 7161 0241 3834
3825 7020 1124 7483 9155 4919 3209 5959 2364 2555
9801 8788 6338 5899 3309 0807 0968 0539 4205 8257
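A table of four-digit random numbers like the one above can be regenerated with a pseudorandom number generator. This is only a sketch using Python's `random` module with an arbitrary seed, not the procedure used to produce the table above; `random_number_table` is a hypothetical helper.

```python
import random

def random_number_table(rows, cols, digits=4, seed=None):
    """Rows of zero-padded pseudorandom integers, each uniform on 0 .. 10**digits - 1."""
    rng = random.Random(seed)
    upper = 10 ** digits - 1
    return [[f"{rng.randint(0, upper):0{digits}d}" for _ in range(cols)] for _ in range(rows)]

table = random_number_table(rows=13, cols=10, seed=2024)
print("  ".join(f"{c:>4}" for c in range(1, 11)))   # column numbers 1..10
for row in table:
    print("  ".join(row))
```

Fixing the seed makes the table reproducible, which is convenient when the same random selection has to be repeated or checked.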
Normal distribution
Characteristics of the normal curve:
It is symmetrical: half of the cases lie on one side of the center and the other half on the other side.
The distribution is single-peaked (unimodal), not bimodal or multimodal.
Also known as the Gaussian distribution
Normal distribution
Probability density function of N(μ,σ):
$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
N(0,1), the standard normal distribution, is a normal distribution with a mean of 0 and a standard deviation of 1.
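The density formula can be coded directly and compared with `statistics.NormalDist` from the Python standard library. This sketch also checks the symmetry property from the previous slide; `normal_pdf` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma): exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

print(normal_pdf(0.0))                  # peak of N(0,1): 1/sqrt(2*pi), about 0.3989
print(NormalDist(0.0, 1.0).pdf(0.0))    # standard-library value for comparison
# Symmetry about the mean: f(mu - d) equals f(mu + d).
print(normal_pdf(8.0, mu=10.0, sigma=2.0), normal_pdf(12.0, mu=10.0, sigma=2.0))
```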
Normal distribution
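Exponential distribution
Probability density function:
$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (and $f(x) = 0$ for $x < 0$), where $\lambda > 0$ is the rate parameter
Cumulative distribution function:
$F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$
In general, the cumulative distribution function is given by $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$.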
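The exponential density and its cumulative distribution function can be checked against each other and against simulated data. This sketch assumes a rate λ = 0.5 (written `lam`, since `lambda` is a Python keyword), integrates the density with the midpoint rule starting at 0 because the density is zero below 0, and draws samples with `random.expovariate`; the helper names are hypothetical.

```python
import math
import random

def exponential_pdf(x, lam):
    """f(x) = lam * exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x, lam):
    """F(x) = P(X <= x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def cdf_by_integration(pdf, x, lower=0.0, n_steps=100_000):
    """F(x) as the integral of the density from `lower` to x (midpoint rule)."""
    h = (x - lower) / n_steps
    return sum(pdf(lower + (i + 0.5) * h) for i in range(n_steps)) * h

lam = 0.5                                                          # assumed rate parameter
print(exponential_cdf(2.0, lam))                                   # 1 - e^-1, about 0.632
print(cdf_by_integration(lambda t: exponential_pdf(t, lam), 2.0))  # should agree closely

# Empirical check: relative frequency of simulated values <= 2.0.
rng = random.Random(7)
samples = [rng.expovariate(lam) for _ in range(100_000)]
print(sum(s <= 2.0 for s in samples) / len(samples))
```

The last value is the relative-frequency estimate of P(X ≤ 2), which ties the cumulative distribution function back to the frequency definition of probability.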
Thank you for your attention!