Statistics: Terms and Definitions
Population: All data, continuous
Sample: A subset of data, discrete. Use sample for inferential statistics.
Every statistical problem contains five elements:
• Questions to be answered. Identification of the populations
• Design of experiment, sampling procedure
• Analysis of the sampled data (equations and distributions)
• Inference (based on confidence level)
• How good the inference is, measure of goodness
Measurements:
• Single point
• Multiple point
Uncertainty is the total error associated with a measurement at a specific level of confidence.
Errors:
• Bias or fixed error (systematic error)
• Precision or random error
Mean = $\mu = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ is the $i$-th sample and $n$ is the total number of samples.
Variance = $\sigma^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{x} - x_i)^2$
Average deviation from the mean = $\frac{1}{n}\sum_{i=1}^{n} \left|\bar{x} - x_i\right|$
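R.M.S. deviation from the mean = $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (\bar{x} - x_i)^2}$
Standard Deviation (SD) = $s = \sigma = \sqrt{s^2} = \sqrt{\sigma^2}$
Coefficient of Variation: the relative variation of the data, $\frac{s}{\bar{x}}$
Standard Error of the Mean = $s_{\bar{x}} = \frac{s}{\sqrt{n}}$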
Mode: The most frequent item in the measurements.
Median: The central item when the data are arranged in ascending or descending order.
Degrees of freedom: F or DF = n − k, where k is the number of constraints imposed on the data.
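The definitions above can be checked numerically. The following is a minimal Python sketch: the sample values are hypothetical and chosen only for illustration, and the average deviation is taken here as the mean absolute deviation, which is an assumption about the intended formula.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(sample)

mean = sum(sample) / n                                      # x-bar
variance = sum((mean - x) ** 2 for x in sample) / (n - 1)   # s^2 (n - 1 in the denominator)
std_dev = math.sqrt(variance)                               # s
avg_dev = sum(abs(mean - x) for x in sample) / n            # assumed: mean absolute deviation
rms_dev = math.sqrt(sum((mean - x) ** 2 for x in sample) / n)
coeff_var = std_dev / mean                                  # s / x-bar
std_err = std_dev / math.sqrt(n)                            # s / sqrt(n)
mode = statistics.mode(sample)                              # most frequent item
median = statistics.median(sample)                          # central item of the sorted data
dof = n - 1            # k = 1: the mean is the one constraint used when computing s

print(f"mean={mean}, s^2={variance:.3f}, s={std_dev:.3f}, CV={coeff_var:.3f}")
print(f"avg dev={avg_dev}, rms dev={rms_dev}, std err={std_err:.3f}")
print(f"mode={mode}, median={median}, DF={dof}")
```

Note that the R.M.S. deviation divides by n while the variance divides by n − 1, so the two differ noticeably for small samples.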
Probability Density Function (PDF)
Probability is a measure of how likely an event is to occur.
Probability of an event between a and b:
$P(a < x < b) = \int_{a}^{b} p(x)\,dx$
Total probability = $\int_{-\infty}^{\infty} p(x)\,dx = 1$
Gaussian Distribution
$p(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma_x^2}(x-\mu)^2}$
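As a rough numerical check of the probability integrals, the sketch below integrates the Gaussian density with the trapezoidal rule and compares the result with the closed-form value from Python's `statistics.NormalDist`. The values of `mu`, `sigma`, `a` and `b` are hypothetical, and the helper names are illustrative.

```python
import math
from statistics import NormalDist

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density p(x) with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, steps=10_000):
    """Trapezoidal-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

mu, sigma = 10.0, 2.0      # hypothetical distribution parameters
a, b = 8.0, 12.0           # hypothetical interval (mu - sigma to mu + sigma)

p_ab = integrate(lambda x: gaussian_pdf(x, mu, sigma), a, b)
# The infinite limits are approximated by +/- 10 sigma, where the density is negligible.
p_total = integrate(lambda x: gaussian_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
p_exact = NormalDist(mu, sigma).cdf(b) - NormalDist(mu, sigma).cdf(a)

print(f"P(a<x<b) numeric={p_ab:.4f}, closed form={p_exact:.4f}, total={p_total:.4f}")
```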
Standard Normal Distribution
If the data set is large and the variation is random, then the converted variable z defined below should follow a standard normal distribution.
$z = \frac{x - \mu}{\sigma_x}$
$p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$
Area under the curve is one.
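The conversion to z can be illustrated with a short Python sketch. The values of `mu`, `sigma` and `x` are hypothetical. The last line checks that the Gaussian density of x equals the standard normal density of z divided by sigma, which is why both curves enclose unit area.

```python
import math
from statistics import NormalDist

def to_z(x, mu, sigma):
    """Convert a value x to its standard normal variable z."""
    return (x - mu) / sigma

def standard_normal_pdf(z):
    """Density of the standard normal distribution (mean 0, standard deviation 1)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

mu, sigma = 50.0, 5.0     # hypothetical mean and standard deviation
x = 60.0                  # hypothetical measurement

z = to_z(x, mu, sigma)
print(f"z = {z}")
print(f"p(z) = {standard_normal_pdf(z):.5f}")
# p(x) = p(z) / sigma for the same point
print(f"p(x) = {NormalDist(mu, sigma).pdf(x):.5f}")
```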
Histogram
A histogram provides the probability of events within each increment. It can be used to check whether the data follow a standard distribution. The following steps can be used to draw a histogram:
– Choose a number of class intervals (usually between 5 and 20) that cover the data range. Select the class marks, which are the mid-points of the class intervals. If the data are arranged in ascending order, the first data point should fall in the first class interval.
– For each class interval, determine the number of data points that fall within that interval. If a data point falls exactly on a division point, it is placed in the lower interval.
– Construct rectangles with centers at the class marks and areas proportional to the class frequencies. If the widths of the rectangles are the same, then the heights of the rectangles represent the class frequencies.
Data: 25 data points.
3.0, 6.0, 7.5, 15.0, 12.0, 6.5, 8.0, 4.0, 5.5, 6.5, 5.5, 12.0, 1.0, 3.5, 3.0, 7.5, 5.0, 10.0, 8.0, 3.5, 9.0, 2.0, 6.5, 1.0, 5.0
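$\Delta x = \frac{x_{max} - x_{min}}{\text{number of class intervals}} = (15.0 - 1.0)/6 = 2.33$
Rounding $\Delta x$ up to 2.4 and starting at $x_1 = -0.2$, the class boundaries are:
$x_2 = x_1 + \Delta x = -0.2 + 2.4 = 2.2$
$x_3 = x_2 + \Delta x = 2.2 + 2.4 = 4.6$
$x_4 = x_3 + \Delta x = 4.6 + 2.4 = 7.0$
$x_5 = x_4 + \Delta x = 7.0 + 2.4 = 9.4$
$x_6 = x_5 + \Delta x = 9.4 + 2.4 = 11.8$
$x_7 = x_6 + \Delta x = 11.8 + 2.4 = 14.2$
$x_8 = x_7 + \Delta x = 14.2 + 2.4 = 16.6$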
Class   Subinterval start   Subinterval end   Class mark   Class frequency
1       -0.2                2.2               1.0          3
2       2.2                 4.6               3.4          5
3       4.6                 7.0               5.8          8
4       7.0                 9.4               8.2          5
5       9.4                 11.8              10.6         1
6       11.8                14.2              13.0         2
7       14.2                16.6              15.4         1
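The class frequencies in the table can be reproduced with a few lines of Python. This is a minimal sketch using the 25 data points and the class boundaries from the example. The function name `class_frequencies` is illustrative, and the text bars are only a stand-in for the rectangles of a real plot.

```python
# Data and class boundaries taken from the example above.
data = [3.0, 6.0, 7.5, 15.0, 12.0, 6.5, 8.0, 4.0, 5.5, 6.5, 5.5, 12.0, 1.0,
        3.5, 3.0, 7.5, 5.0, 10.0, 8.0, 3.5, 9.0, 2.0, 6.5, 1.0, 5.0]
edges = [-0.2, 2.2, 4.6, 7.0, 9.4, 11.8, 14.2, 16.6]

def class_frequencies(values, edges):
    """Count values per class; a value on a division point goes to the lower class."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            lower, upper = edges[k], edges[k + 1]
            # Intervals are (lower, upper]; the first class also includes its lower edge.
            if (lower < v <= upper) or (k == 0 and v == lower):
                counts[k] += 1
                break
    return counts

frequencies = class_frequencies(data, edges)
# Class marks are the mid-points of the class intervals.
marks = [round((lo + hi) / 2, 1) for lo, hi in zip(edges, edges[1:])]

for k, (mark, freq) in enumerate(zip(marks, frequencies), start=1):
    print(f"class {k}: mark {mark:5.1f}  frequency {freq}  " + "#" * freq)
```

Because all class widths are equal (2.4), the bar lengths are directly proportional to the class frequencies.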
Uncertainty Analysis
$u_R = \pm\, t\, S_R$ for a 95% confidence level
Uncertainty and Level of Confidence
The variation of the mean value is identified by the number of standard deviations (± σ or ± s) we select. This number is tied to the level of confidence with which we state that the data fall within the identified range. For normally distributed data, the relationships between the confidence level and the standard deviation are as follows:
• approximately 68% level of confidence: ± s
• approximately 95% level of confidence: ± 2s (this is what engineers use, unless stated otherwise)
• approximately 99.7% level of confidence: ± 3s
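The percentages quoted for ± s, ± 2s and ± 3s are rounded values of the area under the normal curve. As a quick check, assuming normally distributed data, the coverage of ± k standard deviations can be computed with Python's `statistics.NormalDist`; the helper name `coverage` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def coverage(k):
    """Probability that a normal variable lies within +/- k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"+/- {k} s covers {100 * coverage(k):.2f}% of the data")

# Number of standard deviations that gives exactly 95% two-sided coverage.
z_95 = std_normal.inv_cdf(0.975)
print(f"exact 95% coverage needs +/- {z_95:.3f} s")
```

The last value is the familiar 1.96, which is why ± 2s is used as a convenient approximation for 95% confidence.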
For a large sample:
$\bar{x} \pm t_{\alpha}\, s_{\bar{x}}$
Here α = 1 − level of confidence.
For a small sample:
$\bar{x} \pm t_{\alpha/2}\, \frac{s_x}{\sqrt{n}}$
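As a worked illustration, the sketch below applies the small-sample interval to the 25 data points of the histogram example. The t value of 2.064 (two-sided 95% confidence, 24 degrees of freedom) is taken from a standard Student-t table and is hard-coded here as an assumption, since the Python standard library has no t distribution.

```python
import math

# The 25 data points from the histogram example.
data = [3.0, 6.0, 7.5, 15.0, 12.0, 6.5, 8.0, 4.0, 5.5, 6.5, 5.5, 12.0, 1.0,
        3.5, 3.0, 7.5, 5.0, 10.0, 8.0, 3.5, 9.0, 2.0, 6.5, 1.0, 5.0]
n = len(data)

mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))   # sample standard deviation
std_err = s / math.sqrt(n)                                    # standard error of the mean

# Assumed table value: Student-t, two-sided 95% confidence, n - 1 = 24 degrees of freedom.
T_95_DOF_24 = 2.064

half_width = T_95_DOF_24 * std_err
interval = (mean - half_width, mean + half_width)

print(f"mean = {mean:.2f}, s = {s:.3f}, standard error = {std_err:.3f}")
print(f"95% interval for the mean: {interval[0]:.2f} to {interval[1]:.2f}")
```

For a different sample size or confidence level, the t value has to be looked up again for the corresponding degrees of freedom.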
Identification of Possible Bad Data Point
Z score: The z score is a measure of the relative standing of a data point.
$z = \frac{x - \bar{x}}{s}$
Data points whose absolute z value is higher than 1.96 (95% level of confidence) are discarded.
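The z-score screening can be sketched in Python as follows, using the 25 data points of the histogram example. The function name `z_score_outliers` is illustrative, and the test is applied to the absolute value of z so that unusually low points are caught as well as unusually high ones (an assumption about the intended rule).

```python
import math

# The 25 data points from the histogram example.
data = [3.0, 6.0, 7.5, 15.0, 12.0, 6.5, 8.0, 4.0, 5.5, 6.5, 5.5, 12.0, 1.0,
        3.5, 3.0, 7.5, 5.0, 10.0, 8.0, 3.5, 9.0, 2.0, 6.5, 1.0, 5.0]

def z_score_outliers(values, z_limit=1.96):
    """Return the values whose absolute z score exceeds z_limit."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [x for x in values if abs((x - mean) / s) > z_limit]

flagged = z_score_outliers(data)
print("possible bad data points:", flagged)
```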
Chauvenet’s Criterion:
• For a sample population, calculate $\bar{x}$ and $\sigma_x$.
• Using the sample size n, find the ratio $\sigma_{max}/\sigma_x$.
• Knowing $\sigma_x$, find $\sigma_{max}$ from the table below.
• Calculate $|x - \bar{x}|$. Here x is the sample that you are assessing. If the difference is larger than $\sigma_{max}$, the sample is discarded; otherwise it is retained.
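The steps above can be sketched in Python. The table of ratios is not reproduced in this text, so the sketch computes the ratio from the normal distribution instead. It assumes the usual form of the criterion, in which a point is rejected when the probability of a deviation at least that large is below 1/(2n). The function names are illustrative, and the data are the 25 points of the histogram example.

```python
import math
from statistics import NormalDist

# The 25 data points from the histogram example.
data = [3.0, 6.0, 7.5, 15.0, 12.0, 6.5, 8.0, 4.0, 5.5, 6.5, 5.5, 12.0, 1.0,
        3.5, 3.0, 7.5, 5.0, 10.0, 8.0, 3.5, 9.0, 2.0, 6.5, 1.0, 5.0]

def chauvenet_ratio(n):
    """Ratio sigma_max / sigma_x, assuming a two-sided tail probability of 1/(2n)."""
    return NormalDist().inv_cdf(1.0 - 1.0 / (4.0 * n))

def chauvenet_rejects(values):
    """Return the values whose deviation from the mean exceeds sigma_max."""
    n = len(values)
    mean = sum(values) / n
    sigma_x = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    sigma_max = chauvenet_ratio(n) * sigma_x
    return [x for x in values if abs(x - mean) > sigma_max]

ratio = chauvenet_ratio(len(data))
rejected = chauvenet_rejects(data)
print(f"sigma_max / sigma_x for n = {len(data)}: {ratio:.2f}")
print("discarded:", rejected)
```

The criterion is normally applied once to the original sample rather than repeated on the reduced data set.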
Linear Regression
Linear regression is used extensively for calibration. It is a relationship between input (x) and output (y). Calibration is used to eliminate bias error.
$y = a_0 + a_1 x$
Where:
The error associated with fitting the data with this equation is:
This is a mathematical error.
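The expressions for the coefficients and for the fitting error are not reproduced in this text. As a sketch, the Python code below uses the standard least-squares formulas for $a_0$ and $a_1$, and the standard error of the fit with n − 2 degrees of freedom. These formulas are assumed here and may differ in notation from the original slide. The calibration data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares coefficients (a0, a1) of y = a0 + a1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1

def standard_error_of_fit(xs, ys, a0, a1):
    """Scatter of the data about the fitted line, with n - 2 degrees of freedom."""
    n = len(xs)
    residual_ss = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(residual_ss / (n - 2))

# Hypothetical calibration data: known inputs x and instrument readings y.
x_in = [0.0, 1.0, 2.0, 3.0, 4.0]
y_out = [1.1, 2.9, 5.2, 7.1, 8.8]

a0, a1 = fit_line(x_in, y_out)
s_yx = standard_error_of_fit(x_in, y_out, a0, a1)
print(f"y = {a0:.3f} + {a1:.3f} x, standard error of fit = {s_yx:.3f}")
```

Two degrees of freedom are lost because two constants, $a_0$ and $a_1$, are estimated from the data.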
Correlation Coefficient
Correlation coefficient (r) is a measure of the strength of a linear relationship between two variables.
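A minimal Python sketch of the correlation coefficient is given below. It uses the standard Pearson product-moment form, which is assumed here because the slide's own expressions are not reproduced in this text. The data pairs are hypothetical.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r between two equally long data series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data pairs.
r_scattered = correlation_coefficient([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
r_perfect = correlation_coefficient([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(f"scattered data: r = {r_scattered:.2f}")
print(f"perfectly linear data: r = {r_perfect:.2f}")
```

Values of r close to +1 or −1 indicate a strong linear relationship, while values near 0 indicate a weak one.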
Or