CHAPTER 1: WHY STUDY STATISTICS?

Why Study Statistics?

• A population is a large (or infinite) set of elements that are of interest for a research question. A parameter is a specific characteristic of a population

  - All the men living in Turkey can be a population. The average height of these men can be a population parameter

• A sample is a subset of the population that we use to draw conclusions or make predictions about the parameters of the population (for inferences to be valid, sampling should be random). A statistic is a characteristic of the sample

  - Instead of measuring the height of every man in Turkey, we can randomly select 5000 men from different locations of the country. This would be our sample. Then we can find the average height of these men to estimate the average height of the men in Turkey. This would be our sample statistic
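The idea of estimating a population parameter from a random sample can be sketched in Python. This is only an illustration: the heights are synthetic numbers generated by the script (the 175 cm mean, the 7 cm spread and the population size of 100,000 are assumptions, not real data), and the variable names are hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Synthetic "population": 100,000 heights in cm (assumed values, not real data)
population = [random.gauss(175.0, 7.0) for _ in range(100_000)]
parameter = statistics.mean(population)   # population parameter

# Simple random sample of 5000 elements, drawn without replacement
sample = random.sample(population, 5000)
statistic = statistics.mean(sample)       # sample statistic

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two printed values should be close to each other, which is what makes the sample statistic usable as an estimate of the parameter.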

Ozan Eksi, TOBB-ETU

Types of Statistics

• Inferential Statistics: This is what was explained above, i.e. using sample data for estimation and hypothesis testing (the tools that help us make statements and decisions under uncertainty and incomplete information)

• Descriptive Statistics: Graphical and numerical procedures that are used to present and summarize data. We can use descriptive statistics on either population or sample data

Chapter Summary

• Terms reviewed in this chapter:

  - Population (Populasyon)
  - Parameter (Parametre)
  - Sample (Örneklem)
  - Inferential Statistics (Çıkarımsal İstatistik)
  - Estimation (Tahmin)
  - Descriptive Statistics (Betimleyici İstatistik)

CHAPTER 2: USING GRAPHS TO DESCRIBE DATA

Data, Variable, and Constant

Data are usually a set of numbers representing the same kind of thing, such as body weight. That "thing" is called a variable (it is variable because the numbers vary from subject to subject). If the numbers are the same for every subject, the thing is called a constant

Classification of Variables

• Categorical (sometimes called Nominal) or Numerical

  - Categorical: (Yes or No), (Like, Dislike or Indifferent), ...

  - Numerical: (Discrete: outcome of a die, ...), (Continuous: height, time, ...)

• Qualitative or Quantitative

  - Qualitative: These variables are measured on a nominal or ordinal scale. On a nominal scale, numbers are assigned only to label the categories (Yes and No can be labelled as 1 and 0). On an ordinal scale, the numbers also indicate the rank order of the items (Like, Indifferent and Dislike can be labelled as 2, 1, 0). Data of this type may therefore be recorded either as category names or as numeric codes

  - Quantitative: These variables are measured on an interval or ratio scale. Hence, the numeric values themselves matter

• Independent or Dependent

  - Independent: A variable that stands alone and isn't changed by the other variables (e.g. someone's age)

  - Dependent: A variable that is explained by independent variables

Tables And Graphs to Describe Categorical Variables

• The Frequency Distribution Table reveals the number of occurrences (frequency) of each possible outcome

  - A relative frequency distribution is a frequency distribution with each frequency divided by the total number of observations; it can be read as an empirical probability distribution

• Bar Chart, Pie Chart and Pareto Diagram are graphics that present the same information as the Frequency Distribution Table

  - Example: Hospital Patients by Unit

Frequency Distribution Table

Hospital Unit      Number of Patients   Share in Pie Chart
Cardiac Care              1,052                12%
Emergency                 2,245                25%
Intensive Care              340                 4%
Maternity                   552                 6%
Surgery                   4,630                53%

[Figures: Bar Chart and Pie Chart, both titled "Hospital Patients by Unit"]
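The step from frequencies to relative frequencies (the shares shown in the pie chart) can be sketched as follows. The counts are the ones in the hospital example; the variable names are only illustrative.

```python
# Patient counts copied from the hospital example
patients = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}

total = sum(patients.values())

# Relative frequency = frequency / total number of observations
relative = {unit: count / total for unit, count in patients.items()}

for unit, share in relative.items():
    print(f"{unit:<15}{patients[unit]:>6}{share:>8.1%}")
```

Rounded to whole percentages, the shares agree with the labels on the pie chart.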

• Pareto Diagram: It is a special Bar Chart. Unlike Bar and Pie Charts, a Pareto diagram presents the categories in order of frequency (usually descending), and the cumulative total is represented by a line

  - Ex: 400 defective items are examined for cause of defect

Frequency Distribution Table

Source of Manufacturing Error   Number of defects
Bad Weld                               34
Poor Alignment                        223
Missing Part                           25
Paint Flaw                             78
Electrical Short                       19
Cracked case                           21
Total                                 400

Arranging Data

Source of Manufacturing Error   Number of defects   % of Total Defects
Poor Alignment                        223                 55.75
Paint Flaw                             78                 19.50
Bad Weld                               34                  8.50
Missing Part                           25                  6.25
Cracked case                           21                  5.25
Electrical Short                       19                  4.75
Total                                 400                100%

Pareto Diagram

[Figure: Pareto Diagram: Cause of Manufacturing Defect. Bars from left to right: Poor Alignment, Paint Flaw, Bad Weld, Missing Part, Cracked case, Electrical Short; a line graph shows the cumulative %]
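The arithmetic behind a Pareto diagram (sort by frequency, convert to percentages, accumulate) can be sketched in a few lines. The defect counts are those of the example; the variable names are illustrative, and the sketch only produces the numbers, not the chart.

```python
# Defect counts copied from the example (400 defective items)
defects = {
    "Bad Weld": 34,
    "Poor Alignment": 223,
    "Missing Part": 25,
    "Paint Flaw": 78,
    "Electrical Short": 19,
    "Cracked case": 21,
}

total = sum(defects.values())

# Step 1: arrange the categories in descending order of frequency
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)

# Step 2: percentage of total and running (cumulative) percentage
rows = []
cumulative = 0.0
for source, count in ordered:
    percent = 100 * count / total
    cumulative += percent
    rows.append((source, count, percent, cumulative))

for source, count, percent, cum in rows:
    print(f"{source:<17}{count:>4}{percent:>8.2f}{cum:>8.2f}")
```

The last column is what the line in the diagram plots: it starts at the share of the most frequent cause and ends at 100.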

Tables And Graphs to Describe Numerical Variables

• We have a frequency distribution just as in the case of categorical variables. However, since this time the data are not already categorized into groups, it is better to form artificial groups (intervals) instead of reporting the frequency of each data point

  - Ex: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature: 24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

The Ordered Data

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

The Frequency Distribution Table

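Interval               Frequency   Relative Frequency   Percentage
10 but less than 20        3              .15               15
20 but less than 30        6              .30               30
30 but less than 40        5              .25               25
40 but less than 50        4              .20               20
50 but less than 60        2              .10               10
Total                     20             1.00              100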

• Note: In this example we used intervals of width 10 to classify the data. However, there is no rule for that. The decision should be case specific

• Note: The best graph is always the one that displays the information in the clearest and most comprehensible way. There is no restriction on the type of graph that you can use. However, remember that it may be risky not to use standard graphs, as this may confuse readers
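Forming the intervals and counting the observations in each can be sketched as below, using the 20 temperatures from the example. The interval width of 10 is the choice made in the example; `width` is an illustrative name and can be changed to try other groupings.

```python
from collections import Counter

temperatures = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
                32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

width = 10  # interval width used in the example; other widths are possible

# Map each value to the lower bound of its interval, e.g. 27 -> 20
counts = Counter((t // width) * width for t in temperatures)

n = len(temperatures)
for lower in sorted(counts):
    freq = counts[lower]
    print(f"{lower} but less than {lower + width}: frequency={freq}, "
          f"relative={freq / n:.2f}, percent={100 * freq / n:.0f}")
```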

Histogram

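• It is a graph of the (numerical) data in a frequency distribution

[Figure: Histogram: Daily High Temperature. Horizontal axis: temperature intervals from 0 to 60 in steps of 10; vertical axis: frequency from 0 to 7. The bar for the interval 10 but less than 20 has frequency 3]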

The Cumulative Frequency Distribution & Ogive (graphing cumulative frequencies)

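Interval               Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20        3           15                 3                     15
20 but less than 30        6           30                 9                     45
30 but less than 40        5           25                14                     70
40 but less than 50        4           20                18                     90
50 but less than 60        2           10                20                    100
Total                     20          100

[Figure: Ogive: Daily High Temperature. Horizontal axis: interval endpoints 10, 20, 30, 40, 50, 60; vertical axis: cumulative percentage]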
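Cumulative frequencies are running totals of the interval frequencies, and the ogive plots them against the upper end of each interval. A minimal sketch with the temperature data, assuming the same intervals of width 10 (the names are illustrative):

```python
from collections import Counter
from itertools import accumulate

temperatures = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
                32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# Frequencies per interval of width 10, keyed by the interval's lower bound
counts = Counter((t // 10) * 10 for t in temperatures)
lower_bounds = sorted(counts)
frequencies = [counts[b] for b in lower_bounds]

n = sum(frequencies)
cumulative_freq = list(accumulate(frequencies))          # running totals
cumulative_pct = [100 * c / n for c in cumulative_freq]

# Points of the ogive: cumulative percentage reached at each interval's upper bound
ogive_points = [(b + 10, pct) for b, pct in zip(lower_bounds, cumulative_pct)]
print(ogive_points)
```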

A line chart (time-series plot)

• It is used to show the values of a variable over time (time series data)

  - Time is measured on the horizontal axis

  - The variable of interest is measured on the vertical axis

• An Example:

[Figure: line chart, Magazine Subscriptions by Year]

• Cross-Sectional Data: Data collected by observing many subjects at the same point in time. They are usually collected for the purpose of comparison

• Time-series cross-sectional Data (panel data): Data collected by observing many subjects at successive points in time

The shape of the distribution

• The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center

[Figure: Symmetric Distribution. Frequencies from 0 to 9 plotted against the values 1 to 9]

• The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center (negatively skewed: longer tail to the left; positively skewed: longer tail to the right)

[Figures: Negatively Skewed Distribution and Positively Skewed Distribution, each plotted against the values 1 to 9]

Tables and Graphs to Describe Relationship Between Variables

• Graphs illustrated so far have involved only a single variable

• When two variables exist, other techniques are used:

  - Categorical (Qualitative) Variables: Cross tables (or contingency tables)

  - Numerical (Quantitative) Variables: Scatter plots

Cross Tables

• If there are r categories for the first variable (rows) and c categories for the second variable (columns), the table is called an r x c cross table

• Ex: 4 x 3 Cross Table for Investment Choices by Investor

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                   46.5          55          27.5       129
Bonds                    32.0          44          19.0        95
CD                       15.5          20          13.5        49
Savings                  16.0          28           7.0        51
Total                   110.0         147          67.0       324
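The row and column totals of a cross table can be checked with a short sketch. The cell values are those of the investment example; the dictionary layout and names are just one convenient, assumed representation.

```python
# Cross table from the example: rows are investment categories,
# columns are Investor A, Investor B and Investor C
investors = ["Investor A", "Investor B", "Investor C"]
table = {
    "Stocks":  [46.5, 55.0, 27.5],
    "Bonds":   [32.0, 44.0, 19.0],
    "CD":      [15.5, 20.0, 13.5],
    "Savings": [16.0, 28.0, 7.0],
}

# Row totals: sum across investors for each category
row_totals = {category: sum(values) for category, values in table.items()}

# Column totals: sum across categories for each investor
column_totals = [sum(column) for column in zip(*table.values())]

grand_total = sum(row_totals.values())

for name, column_total in zip(investors, column_totals):
    print(f"{name}: {column_total}")
print("Grand total:", grand_total)
```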

Side by side bar chart

[Figure: Comparing Investors. Side-by-side bars for Investor A, Investor B and Investor C in each investment category, from Stocks to Savings, on a scale from 0 to 60]

Scatter Diagrams

They are used for paired observations taken from two numerical variables. One variable is measured on the vertical axis and the other variable is measured on the horizontal axis

Volume per day   Cost per day
      23              125
      26              140
      29              146
      33              160
      38              167
      42              170
      50              188
      55              195
      60              200

[Figure: Cost per Day vs. Production Volume. Horizontal axis: Volume per Day, from 0 to 70; vertical axis: Cost per Day]

Chapter Summary

• Data (veri) in raw form are usually not easy to use for decision making. Some type of organization in the form of tables or graphs is needed

• Variables (Değişkenler):
  - Categorical (Kategorik)
  - Numerical (Sayısal)
  - Qualitative (Niteliksel)
  - Quantitative (Niceliksel)
  - Independent (Bağımsız)
  - Dependent (Bağımlı)

• Ordinal scale (Sırasal Ölçek)
• Ratio scale (Oransal Ölçek)
• Interval scale (Aralıksal Ölçek)
• Nominal scale (Sınıflama Ölçeği)
• Line chart (Çizgisel grafik)
• Bar chart (Çubuk grafik)
• Pie chart (Dairesel Grafik)
• Pareto diagram (Pareto Diyagramı)
• Histogram (Histogram)
• Ogive (A cumulative line graph)
• The Cumulative Frequency distribution (Kümülatif Frekans Dağılımı)
• Time Series (Zaman Serisi)
• Skewed (Çarpık Dağılım)
• Scatter plot (Saçılım Grafiği)

CHAPTER 3: USING NUMERICAL MEASURES TO DESCRIBE DATA

Measures of Central Tendency

• Mean: Arithmetic average of values (sum of values divided by the number of them)

• Median: Midpoint of ranked values

• Mode: Most frequently observed value in the data

  - Ex: Suppose the following bicycle prices: 2,000, 100, 300, 100, 500

    - The mean is: (2,000 + 100 + 300 + 100 + 500)/5 = 600

    - The median can be found after ranking: 2,000, 500, 300, 100, 100; the middle value is 300

    - The mode is 100

  - Even though the mean is the most commonly used measure of central tendency, it is sensitive to outliers; that is, it is strongly affected by very high or very low values in the data even though these values may not be very informative

  - In such cases the median is often used, since the median is not sensitive to extreme values

• Note: the location of the median is the $(n+1)/2$ position in the ordered data

  - If the number of values is odd, the median is the middle number

  - If the number of values is even, the median is the average of the two middle numbers

• Formally, the mean (also called arithmetic mean) is

  - If calculated from a population of N values, the mean is denoted by $\mu$ and calculated as:

    $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$

  - If calculated from a sample of n values, the mean is denoted by $\bar{x}$ and calculated as:

    $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
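The three measures of central tendency, and the effect of an outlier on the mean, can be checked with Python's standard `statistics` module. The prices are those of the bicycle example; `without_outlier` is an illustrative variable added here to show what happens when the 2,000 price is dropped.

```python
import statistics

prices = [2000, 100, 300, 100, 500]  # bicycle prices from the example

mean = statistics.mean(prices)      # sum of values / number of values
median = statistics.median(prices)  # middle value of the ranked data
mode = statistics.mode(prices)      # most frequently observed value

# Dropping the outlier moves the mean far more than the median
without_outlier = [p for p in prices if p != 2000]
mean_wo = statistics.mean(without_outlier)
median_wo = statistics.median(without_outlier)  # average of the two middle values

print(mean, median, mode)
print(mean_wo, median_wo)
```

Removing the single extreme price lowers the mean from 600 to 250, while the median only moves from 300 to 200.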

Mean and Median Depending on Shape of a Distribution

Left Skewed: Mean < Median
Symmetric: Mean = Median
Right Skewed: Median < Mean

Measures of Variability

• Measures of variation give information on the spread or variability of the data values

  - Ex: Two data sets can have the same center but different variation

• There are different measures of variability. The ones we are going to discuss are:

  - Range: Difference between the largest and the smallest observations

  - Interquartile Range: Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

  - Variance: Average of squared deviations of values from the mean

  - Standard Deviation: Square root of the variance

  - Coefficient of Variation: Standard deviation divided by the mean (shows relative variation)

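• Range: Difference between the largest and the smallest observations

  - Ex: For data with smallest value 1 and largest value 14, Range = 14 - 1 = 13

  - However, it ignores the way in which the data are distributed and is sensitive to outliers

  - Ex: For data on the values 7, 8, 9, 10, 11, 12, Range = 12 - 7 = 5, however the observations are spread between 7 and 12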

• Interquartile Range: Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

  - The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger

  - Q2 is the same as the median (50% are smaller, 50% are larger)

  - Only 25% of the observations are greater than the third quartile, Q3

  - Ex: minimum = 12, Q1 = 30, median = 45, Q3 = 57, maximum = 70 (25% of the observations lie between each pair of neighbouring values)

    Interquartile range = Q3 - Q1 = 57 - 30 = 27
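Quartiles can be computed in several slightly different ways. The sketch below assumes the same positional convention as the median rule above, i.e. Q1, Q2 and Q3 sit at positions 0.25(n+1), 0.50(n+1) and 0.75(n+1) of the ordered data, which is what `statistics.quantiles` does with `method="exclusive"`. The data are the 20 daily high temperatures from Chapter 2, reused here only for illustration.

```python
import statistics

temperatures = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
                32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# method="exclusive" uses positions i * (n + 1) / 4 in the ordered data
# and interpolates linearly between neighbouring observations
q1, q2, q3 = statistics.quantiles(temperatures, n=4, method="exclusive")

data_range = max(temperatures) - min(temperatures)
iqr = q3 - q1  # range of the middle 50% of the data

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"range = {data_range}, interquartile range = {iqr}")
```

Other software may use a different interpolation rule and report slightly different quartiles for the same data.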

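• Variance: Average of squared deviations of values from the mean

  - Population mean and variance

    $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

  - Sample mean and variance

    $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$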

• Standard Deviation: It is the square root of the variance. $\sigma$ is the population standard deviation, and $s$ is the sample standard deviation

• Ex: Sample Data ($x_i$): 10, 12, 14, 15, 17, 18, 18, 24

  - The sample size is n = 8. The mean can be found by

    $\bar{x} = \frac{10 + 12 + 14 + 15 + 17 + 18 + 18 + 24}{8} = \frac{128}{8} = 16$

  - The standard deviation can be found by

    $s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}} = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$ (a measure of the average scatter around the mean)

• You don't have to rank the data to find the variance or the standard deviation

• Both measures are used in hypothesis testing for a single distribution, but, because they depend on the units and the level of the data, they cannot be used directly to compare the variability of different distributions
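The sample variance and standard deviation can be computed step by step from the definitions and compared with the standard library. The data are the eight sample values of the example; the variable names are illustrative.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]  # sample values from the example
n = len(data)
mean = sum(data) / n

# Squared deviation of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in data]

sample_variance = sum(squared_deviations) / (n - 1)  # divide by n - 1 for a sample
sample_std = math.sqrt(sample_variance)

# If the same eight values were a whole population, divide by N instead
population_variance = sum(squared_deviations) / n

print(f"mean = {mean}, sum of squared deviations = {sum(squared_deviations)}")
print(f"s^2 = {sample_variance:.4f}, s = {sample_std:.4f}")
print(f"library check: {statistics.variance(data):.4f}, {statistics.stdev(data):.4f}")
```

Note that the data list is used in its given order; no ranking is needed.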

• Coefficient of Variation: Shows variation relative to the mean, so that it measures relative variation and can be used to compare two or more sets of data measured in different units

    $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$

• Ex:

  - Stock A:

    - Average price last year = $50

    - Standard deviation = $5

    - CV = (5/50) · 100% = 10%

  - Stock B:

    - Average price last year = $100

    - Standard deviation = $5

    - CV = (5/100) · 100% = 5%

  - Both stocks have the same standard deviation, but stock B is less variable relative to its price
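The comparison of the two stocks can be written as a small function. The helper name `coefficient_of_variation` is hypothetical; the prices and standard deviations are those of the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B

print(f"Stock A: CV = {cv_a}%")
print(f"Stock B: CV = {cv_b}%")
```

Because the result is a percentage, it can also be compared across variables measured in different units.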

More About Standard Deviation of a Distribution

• Chebyshev's Theorem: For any distribution (not necessarily normal) with mean $\mu$ and standard deviation $\sigma$, and $k > 1$, the part of the observations that fall within the interval

    $\mu \pm k\sigma$

  (i.e. within k standard deviations of the mean) includes at least this much of the data

    $100\left[1 - \frac{1}{k^2}\right]\%$

  - Ex:

    At least                        Within
    (1 - 1/1.5^2) = 55.6%           k = 1.5   ($\mu \pm 1.5\sigma$)
    (1 - 1/2^2)   = 75%             k = 2     ($\mu \pm 2\sigma$)
    (1 - 1/3^2)   = 88.9%           k = 3     ($\mu \pm 3\sigma$)

• If the data distribution is bell-shaped (normally distributed), then the interval

  - $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample

  - $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample

  - $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample

[Figure: bell-shaped curve with the intervals $\mu \pm 1\sigma$, $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$ marked]
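The Chebyshev lower bound and the corresponding coverage of a normal distribution can be compared numerically. In this sketch the function names are hypothetical, and the normal coverage uses the standard identity P(|Z| < k) = erf(k / √2), which is an addition for illustration rather than something derived in these notes.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 100 * (1 - 1 / k**2)

def normal_coverage(k):
    """Percentage of a normal distribution within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1f}% for any distribution, "
          f"about {normal_coverage(k):.1f}% if the distribution is normal")
```

The Chebyshev figures are guaranteed minimums for any distribution, which is why they are well below the percentages for the bell-shaped case.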

Weighted Mean and Measures of Grouped Data

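• The weighted mean of a set of data is

    $\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum w_i}$

  where $w_i$ is the weight of the ith observation

• It can be used when data are already grouped into n classes, with $w_i$ values in the ith class

• Suppose a data set contains values $m_1, m_2, \ldots, m_K$, occurring with frequencies $f_1, f_2, \ldots, f_K$

  - Population mean and variance

    $\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$ where $N = \sum_{i=1}^{K} f_i$, and $\sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}$

  - Sample mean and variance

    $\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$ where $n = \sum_{i=1}^{K} f_i$, and $s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$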
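The grouped-data formulas can be tried on the temperature intervals from Chapter 2. As an assumption made only for this sketch, each interval is represented by its midpoint (15, 25, ..., 55); the frequencies are obtained by counting the 20 observations in each interval of width 10. The names are illustrative.

```python
# Interval midpoints (assumed representative values) and their frequencies
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)

# Grouped sample mean: frequency-weighted average of the midpoints
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Grouped sample variance: frequency-weighted squared deviations over n - 1
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)

print(f"n = {n}, grouped mean = {grouped_mean}, grouped variance = {grouped_variance:.2f}")
```

The grouped mean is a weighted mean with the frequencies as weights; it approximates, but generally does not equal, the mean of the raw observations.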

Measures of Relationships Between Variables

• The covariance measures the strength of the linear relationship between two variables

  - Population covariance:

    $Cov(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$

  - Sample covariance:

    $Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

  - Only concerned with the strength of the relationship

  - No causal effect is implied

• Interpreting Covariance:

  - $Cov(x, y) > 0 \Rightarrow$ x and y tend to move in the same direction

  - $Cov(x, y) < 0 \Rightarrow$ x and y tend to move in opposite directions

  - $Cov(x, y) = 0 \Rightarrow$ there is no linear relation between x and y

• Coefficient of Correlation measures the relative strength of the linear relationship between two variables. It is relative because, unlike covariance, this measure is not affected by the magnitude (units) of the data

  - Population correlation coefficient: $\rho = \frac{Cov(x, y)}{\sigma_x \sigma_y}$

  - Sample correlation coefficient: $r = \frac{Cov(x, y)}{s_x s_y}$

• It is unit free and ranges between -1 and 1. The closer to -1, the stronger the negative linear relationship; the closer to 1, the stronger the positive linear relationship. 0 indicates no linear relationship between the variables of interest

[Figure: example scatter plots for r = -1, r = -.6, r = 0, r = +.3 and r = +1]
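The sample covariance and correlation formulas can be sketched directly from their definitions. The nine (volume, cost) pairs are read from the scatter diagram example in Chapter 2; the function names and the rescaling factor of 100 are hypothetical, added to show that the correlation, unlike the covariance, does not depend on the units.

```python
import math

# (volume per day, cost per day) pairs from the Chapter 2 scatter diagram example
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # Cov(x, x) is the sample variance of x
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

cov_xy = sample_covariance(volume, cost)
r = sample_correlation(volume, cost)

# Rescaling one variable changes the covariance but not the correlation
cost_rescaled = [100 * c for c in cost]
cov_rescaled = sample_covariance(volume, cost_rescaled)
r_rescaled = sample_correlation(volume, cost_rescaled)

print(f"covariance = {cov_xy:.2f}, correlation = {r:.3f}")
print(f"after rescaling: covariance = {cov_rescaled:.2f}, correlation = {r_rescaled:.3f}")
```

A positive covariance and a correlation close to 1 agree with the upward pattern of the scatter diagram; neither implies that volume causes cost.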

Chapter Summary

• Mean (Ortalama)
• Median (Medyan, Ortanca Değer)
• Mode (Mod, Tepe Değeri)
• Measure (Ölçü)
• Range (Değişim Aralığı)
• Variance (Varyasyon)
• Interquartile Range (Yarı-çeyreklik Değişim Aralığı)
• Coefficient of Variation (Varyasyon Katsayısı)
• Standard Deviation (Standart Sapma)
• Weighted Mean (Ağırlıklı ortalama)
• Covariance (Covaryasyon)
• Correlation (Corelasyon)
