L1 statistics

Statistics: Terms and Definitions

Population: All data, continuous

Sample: A subset of the data, discrete. Use a sample for inferential statistics.

Every statistical problem contains five elements:

• Questions to be answered. Identification of the populations

• Design of experiment, sampling procedure

• Analysis of the sampled data (equations and distributions)

• Inference (based on confidence level)

• How good the inference is, measure of goodness


Measurements: single point or multiple point.

Uncertainty is the total error associated with a measurement at a specific level of confidence.

Errors:

• Bias or fixed error (systematic error)

• Precision or random error

Mean = $\mu = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ is the $i$-th sample value and $n$ is the total number of samples.

Variance = $\sigma^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{x} - x_i)^2$

Average deviation from the mean = $\frac{1}{n}\sum_{i=1}^{n} \left|\bar{x} - x_i\right|$

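R.M.S. deviation from the mean = $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (\bar{x} - x_i)^2}$

Standard Deviation (SD) = $s = \sigma = \sqrt{s^2} = \sqrt{\sigma^2}$

Coefficient of Variation: the relative variation of the data, $s / \bar{x}$

Standard Error of the Mean = $s_{\bar{x}} = \frac{s}{\sqrt{n}}$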

Mode: The most frequent item in the measurements.

Median: The central item when the data are arranged in ascending or descending order.

Degrees of freedom: F or DF = n - k. Here k is the number of constraints imposed on the data.
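The definitions above can be checked numerically. The following is a minimal sketch using only the Python standard library; the sample values are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
import math
import statistics

# Hypothetical sample values, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(sample)

# Mean: sum of the samples divided by the number of samples.
mean = sum(sample) / n

# Sample variance: divide by n - 1 (the mean uses up one degree of freedom).
variance = sum((mean - x) ** 2 for x in sample) / (n - 1)
std_dev = math.sqrt(variance)

# R.M.S. deviation from the mean: same sum of squares, divided by n.
rms_dev = math.sqrt(sum((mean - x) ** 2 for x in sample) / n)

# Coefficient of variation and standard error of the mean.
coeff_var = std_dev / mean
std_err_mean = std_dev / math.sqrt(n)

# Median and mode from the standard library.
median = statistics.median(sample)
mode = statistics.mode(sample)

print(f"mean={mean:.3f} variance={variance:.3f} SD={std_dev:.3f}")
print(f"RMS dev={rms_dev:.3f} CV={coeff_var:.3f} SEM={std_err_mean:.3f}")
print(f"median={median} mode={mode}")
```

Note that the R.M.S. deviation is slightly smaller than the standard deviation because it divides by n rather than n - 1.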

Probability Density Function (PDF)

Probability is a measure of how likely an event is to occur.

Probability of an event between a and b:

$P(a < x < b) = \int_{a}^{b} p(x)\,dx$

Total probability = $\int_{-\infty}^{\infty} p(x)\,dx = 1$

Gaussian Distribution

$p(x) = \frac{1}{\sigma_x \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma_x}\right)^2}$

Standard Normal Distribution

If the data set is large and random, then with the following conversion it should follow a standard normal distribution:

$z = \frac{x - \mu}{\sigma_x}$

$p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$

Area under the curve is one.
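As a rough numerical check of the integrals above, the sketch below evaluates the Gaussian PDF and integrates it with the trapezoidal rule. The values of mu and sigma are hypothetical, and the helper names `gaussian_pdf` and `probability_between` are introduced here only for illustration.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density p(x) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def probability_between(a, b, mu, sigma, steps=10_000):
    """Trapezoidal estimate of the integral of p(x) from a to b."""
    h = (b - a) / steps
    total = 0.5 * (gaussian_pdf(a, mu, sigma) + gaussian_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += gaussian_pdf(a + i * h, mu, sigma)
    return total * h


# Hypothetical distribution parameters, for illustration only.
mu, sigma = 10.0, 2.0

# Probability of falling within +/- 1.96 standard deviations of the mean.
p_within_1_96 = probability_between(mu - 1.96 * sigma, mu + 1.96 * sigma, mu, sigma)

# Integrating far into both tails approximates the total probability.
total_area = probability_between(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)

print(f"P(|z| < 1.96) ~ {p_within_1_96:.4f}")
print(f"total area    ~ {total_area:.6f}")
```

Because the limits are expressed in units of sigma, the same probabilities result for any choice of mu and sigma, which is the point of the z conversion.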

Histogram

A histogram shows the frequency, and hence the estimated probability, of events within each increment. It can be used to check whether the data follow a standard distribution. The following steps can be used to draw a histogram:

– Choose a number of class intervals (usually between 5 and 20) that cover the data range. Select the class marks, which are the midpoints of the class intervals. When the data are arranged in ascending order, the first data point should fall in the first class interval.

– For each class interval, determine the number of data points that fall within that interval. If a data point falls exactly on a division point, it is placed in the lower interval.

– Construct rectangles with centers at the class marks and areas proportional to the class frequencies. If the widths of the rectangles are the same, then the heights of the rectangles represent the class frequencies.


Data: 25 data points.

3.0, 6.0, 7.5, 15.0, 12.0, 6.5, 8.0, 4.0, 5.5, 6.5, 5.5, 12.0, 1.0, 3.5, 3.0, 7.5, 5.0, 10.0, 8.0, 3.5, 9.0, 2.0, 6.5, 1.0, 5.0

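$\Delta x = \frac{x_{max} - x_{min}}{\text{number of class intervals}} = \frac{15.0 - 1.0}{6} = 2.33$

Taking $\Delta x = 2.4$ and $x_1 = -0.2$ (so that the first class mark is $x_{min} = 1.0$), the class boundaries are:

$x_2 = x_1 + \Delta x = -0.2 + 2.4 = 2.2$

$x_3 = x_2 + \Delta x = 2.2 + 2.4 = 4.6$

$x_4 = x_3 + \Delta x = 4.6 + 2.4 = 7.0$

$x_5 = x_4 + \Delta x = 7.0 + 2.4 = 9.4$

$x_6 = x_5 + \Delta x = 9.4 + 2.4 = 11.8$

$x_7 = x_6 + \Delta x = 11.8 + 2.4 = 14.2$

$x_8 = x_7 + \Delta x = 14.2 + 2.4 = 16.6$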

Class    Class subinterval     Class mark    Class frequency
         Start      End
1        -0.2       2.2        1.0           3
2         2.2       4.6        3.4           5
3         4.6       7.0        5.8           8
4         7.0       9.4        8.2           5
5         9.4      11.8       10.6           1
6        11.8      14.2       13.0           2
7        14.2      16.6       15.4           1
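The class frequencies in the table can be reproduced with a short script. This is a minimal sketch that takes the 25 data points, the first boundary (-0.2), and the class width (2.4) from the example above; the variable names are illustrative only.

```python
# The 25 data points from the example.
data = [3.0, 6.0, 7.5, 15.0, 12.0, 6.5, 8.0, 4.0, 5.5, 6.5, 5.5, 12.0, 1.0,
        3.5, 3.0, 7.5, 5.0, 10.0, 8.0, 3.5, 9.0, 2.0, 6.5, 1.0, 5.0]

first_edge = -0.2   # start of the first class interval
width = 2.4         # class width used in the example
n_classes = 7

# Class boundaries; rounding removes floating-point noise such as 2.1999999.
edges = [round(first_edge + k * width, 10) for k in range(n_classes + 1)]

# Class marks are the midpoints of the class intervals.
marks = [round((lo + hi) / 2.0, 10) for lo, hi in zip(edges, edges[1:])]

# Count the points in each interval. The test lo < x <= hi places a value
# that falls exactly on a division point into the lower interval.
frequencies = [sum(1 for x in data if lo < x <= hi)
               for lo, hi in zip(edges, edges[1:])]

for i, (lo, hi, mark, freq) in enumerate(zip(edges, edges[1:], marks, frequencies), 1):
    print(f"{i}  {lo:5.1f} {hi:5.1f} {mark:5.1f}  {freq}  {'#' * freq}")
```

The printed bar of `#` characters gives a quick text rendering of the histogram, since all classes have the same width.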

Uncertainty Analysis



u tSR R R ( ) for 95% confidence level

Uncertainty and Level of Confidence

The variation of the mean value is identified by the number of standard deviations (± σ or ± s) we select. This number is related to the level of confidence we choose to indicate that we are sure our data fall within the identified range of the standard deviation. The relationships between the confidence level and the standard deviation are as follows:

• 68% level of confidence: ± s

• 95% level of confidence: ± 2s (this is what engineers use, unless stated otherwise)

• 99.7% level of confidence: ± 3s

For a large sample:

$\bar{x} \pm t_{\alpha}\, s_{\bar{x}}$

Here $\alpha = 1 - \text{level of confidence}$.

For a small sample:

$\bar{x} \pm t_{\alpha/2}\, \frac{s_x}{\sqrt{n}}$
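The sketch below illustrates two points from this section: the normal-distribution coverage of ± 1, 2, and 3 standard deviations, and a small-sample interval for the mean. The sample values are hypothetical, and the t value is assumed to be read from a standard Student t table (2.571 for 95% two-sided confidence with 5 degrees of freedom), since the Python standard library has no t distribution.

```python
import math
from statistics import NormalDist, mean, stdev

# Fraction of a normal distribution lying within +/- k standard deviations.
standard_normal = NormalDist()
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"+/- {k} sigma covers {100 * p:.2f}% of the distribution")

# Hypothetical small sample, for illustration only.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(sample)
x_bar = mean(sample)
s = stdev(sample)          # sample standard deviation (n - 1 in the denominator)

# Assumed table value: t for alpha/2 = 0.025 and n - 1 = 5 degrees of freedom.
t_value = 2.571

# Small-sample interval: x_bar +/- t * s / sqrt(n).
half_width = t_value * s / math.sqrt(n)
interval = (x_bar - half_width, x_bar + half_width)

print(f"mean = {x_bar:.3f} +/- {half_width:.3f} (95% confidence)")
```

For a different sample size or confidence level, `t_value` has to be replaced with the matching table entry.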

Identification of Possible Bad Data Point

Z Score: The z score is a measure of the relative standing of a data point.

$z = \frac{x - \bar{x}}{s}$

Data points with $|z|$ greater than 1.96 (95% level of confidence) are discarded.
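A minimal sketch of the z score screen follows. The readings are hypothetical, and the function name `flag_by_z_score` is introduced here only for illustration.

```python
from statistics import mean, stdev


def flag_by_z_score(values, z_limit=1.96):
    """Return the values whose absolute z score exceeds z_limit."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation
    return [x for x in values if abs(x - x_bar) / s > z_limit]


# Hypothetical readings with one suspicious value.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1, 10.0, 12.5]

flagged = flag_by_z_score(readings)
print("flagged as possible bad data:", flagged)
```

The suspect point itself inflates both the mean and s, so in small samples a single bad value can partly mask itself. That is one reason to compare this screen with a sample-size-dependent test.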

Chauvenet's Criterion:

• For a sample population, calculate $\bar{x}$ and $\sigma_x$.

• Using the sample size $n$, find the ratio $\sigma_{max} / \sigma_x$.

• Knowing $\sigma_x$, find $\sigma_{max}$ from the table below.

• Calculate $|x - \bar{x}|$. Here $x$ is the sample that you are assessing. If the difference is larger than $\sigma_{max}$, the sample is discarded; otherwise it is retained.
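The steps above can be sketched in code. Because the lookup table is not reproduced in this text, the sketch assumes the usual statement of the criterion: a point is rejected when the probability of a deviation at least that large is below 1/(2n). Under that assumption the ratio follows from the inverse normal distribution. The function names and readings are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def chauvenet_ratio(n):
    """Ratio sigma_max / sigma_x for sample size n.

    Assumption: two-sided rejection probability of 1/(2n), so each tail
    holds 1/(4n) of a normal distribution.
    """
    return NormalDist().inv_cdf(1.0 - 1.0 / (4.0 * n))


def chauvenet_filter(values):
    """Split values into (retained, discarded) lists."""
    n = len(values)
    x_bar = mean(values)
    sigma_x = stdev(values)
    sigma_max = chauvenet_ratio(n) * sigma_x
    retained = [x for x in values if abs(x - x_bar) <= sigma_max]
    discarded = [x for x in values if abs(x - x_bar) > sigma_max]
    return retained, discarded


# Hypothetical readings with one suspicious value.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1, 10.0, 12.5]

ratio_10 = chauvenet_ratio(10)
retained, discarded = chauvenet_filter(readings)
print(f"sigma_max / sigma_x for n = 10: {ratio_10:.2f}")
print("discarded:", discarded)
```

Unlike a fixed z limit, the allowed deviation grows with the sample size, because large samples are expected to contain a few points far from the mean.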

Linear Regression

Linear regression is used extensively for calibration. It is a relationship between input (x) and output (y). Calibration is used to eliminate bias error.

$y = a_0 + a_1 x$

Where:

The error associated with fitting the data with this equation is:

This is a mathematical error.
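The expressions for the coefficients and the fit error are not reproduced in this text. As a sketch, assuming an ordinary least-squares fit and the standard error of the fit with n - 2 degrees of freedom (two constraints, $a_0$ and $a_1$), a calibration line could be computed as follows. The calibration data are hypothetical.

```python
import math

# Hypothetical calibration data: input x and measured output y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

# Least-squares slope and intercept for y = a0 + a1 * x.
a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a0 = (sum_y - a1 * sum_x) / n

# Standard error of the fit: residual sum of squares over n - 2
# degrees of freedom (assumed form, see the text above).
residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
s_yx = math.sqrt(sum(r * r for r in residuals) / (n - 2))

print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, standard error of fit = {s_yx:.3f}")
```

The residuals measure how well the straight line represents the data, not how accurate the instrument is.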

Correlation Coefficient

Correlation coefficient (r) is a measure of the strength of a linear relationship between two variables.

Or
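The formulas for r are not reproduced in this text. As a minimal sketch, assuming the usual Pearson definition, r can be computed from the same sums used in a least-squares fit. The data are hypothetical.

```python
import math

# Hypothetical paired data: input x and measured output y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

# Pearson correlation coefficient (assumed definition, see the text above).
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.4f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.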
