Stat01

Chapter One

Statistics: Mathematical and Nonmathematical Aspects

Contents

Statistics
    Descriptive Statistics
    Inferential Statistics
Few Basic Concepts
    Population and Sample
    Elementary Unit and Frame
    Survey Versus Experiments
    Census Taking or Sampling
Statistical Use in Business and Planning
Data
    Why Do We Need Data
    Different Data Focus
        Nature
        Source
        Units or Types of Measurements
        Levels of Measurements
Variables
    Numeric
        Continuous
        Discrete
    Categorical
    Rank

Most people associate the term statistics with numbers or, perhaps, tables and graphs that display them. This mental image is reinforced daily as people encounter an abundance of numerical information in newspapers, magazines, and on television screens regarding the prices of bonds and stocks, performances of businesses and sports teams, rates of unemployment and inflation, poverty and disease, accidents, crimes and the weather – the list goes on. As a matter of fact, this widely held view quite accurately depicts the original concern of the discipline as well as the continuing concern of one of its modern branches.

1.1 Statistics

Statistics deals with data. Data are classified facts about a particular class of objects. The facts can be quantitative or qualitative depending on the situation. Specifically, statistics deals with the collection, organization, presentation, summarization, description, and analysis of data. Statistics is also concerned with facilitating wise decision making in the face of uncertainty. The science of statistics can, therefore, be viewed as

the application of the scientific method to the analysis of numerical data for the purpose of making rational decisions in the face of uncertainty. It therefore develops and utilizes techniques for the careful collection, effective presentation, and proper analysis of numerical information.

Statistics is broadly divided into two distinct groups: Descriptive and Inferential.

Descriptive statistics, as the name suggests, deals with the description or summarization of various features of a data set of a given group using tables, diagrams and numerical measures. It mainly focuses on the distribution, central tendency and spread of the group. Presenting the data in a descriptive form is usually the first stage in any statistical analysis, as it helps us to spot any patterns in the data set. Descriptive statistics is sometimes called primary analysis or deductive statistics.
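The numerical measures mentioned above can be illustrated with a minimal Python sketch. The heights below are made-up values chosen only for illustration, and the choice of the mean, median, population standard deviation and range as the summary measures is an assumption, not something prescribed by the text.

```python
import statistics

# Hypothetical heights (cm) of eight students; illustrative values only.
heights_cm = [150, 155, 160, 160, 165, 170, 172, 180]

# Central tendency: where the "middle" of the group lies.
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)

# Spread: how far the values scatter around the middle.
# pstdev treats the eight students as the whole group of interest.
spread = statistics.pstdev(heights_cm)
value_range = max(heights_cm) - min(heights_cm)

print("mean:", mean_height)
print("median:", median_height)
print("standard deviation:", round(spread, 2))
print("range:", value_range)
```

Each printed figure summarizes one feature of the group, which is what descriptive statistics sets out to do before any inference is attempted.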

Inferential statistics goes further: it analyzes data, normally from a smaller group (called a sample), in order to draw conclusions about the characteristics of a larger group (called the population or universe). Probability is an integral part of inferential statistics. It is also called secondary analysis or inductive statistics.

1.2 A Few Basic Concepts in Statistics

Understanding a few basic terms is essential to becoming familiar with statistics. These terms include:

Population and Sample

The set of all possible observations about a specified characteristic of interest is called a population or universe. The word population can also refer to a collection of entities. If the heights of all the 65 students of a class are of interest, then the population consists of all these students.

A sample is a representative part of a population. If a sample of 45 households is taken from a community for study, it should truly reflect the whole community. Even with a very carefully selected representative sample, there is likely to be some discrepancy between the sample result and the true population value, known as sampling error.
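The idea of sampling error can be sketched in Python. The population below is simulated, not real: the 65 heights are drawn from an assumed normal distribution, and the sample size of 20 and the use of the mean as the characteristic of interest are also assumptions made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of all 65 students in a class.
population = [random.gauss(165, 8) for _ in range(65)]

# A simple random sample of 20 students, drawn without replacement.
sample = random.sample(population, 20)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# Sampling error: the gap between the sample result and the population value.
sampling_error = sample_mean - population_mean

print("population mean:", round(population_mean, 2))
print("sample mean:", round(sample_mean, 2))
print("sampling error:", round(sampling_error, 2))
```

Changing the seed or the sample size changes the sampling error, which is why it is described as a matter of chance rather than a mistake in procedure.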

Elementary Unit and Frame

The persons or objects possessing the characteristics that interest the statistician are referred to as elementary units. A complete listing of all elementary units relevant to a statistical investigation is called a frame. Thus, someone who wanted to learn about the racial composition of a firm’s labor force would quickly identify the individual employees of that firm as the elementary units, but someone concerned about the amount of credit extended by that firm might view the individual credit accounts issued by it as the elementary units to be investigated.

Survey Versus Experiments

New data can be generated either by conducting a survey or by performing an experiment. A survey or an observational study is the collection of data from elementary units without exercising any particular control over factors that may make these units different from one another and that may, therefore, affect the characteristics of interest being observed.

Example 1.1

A characteristic such as the annual salary of different workers is simply being observed and recorded without regard to factors, such as education, work experience, or length of service, that make workers different from one another and that may be responsible for differences in their salaries.

An experiment involves the collection of data from elementary units, while exercising control over some or all factors that may make these units different from one another and that may, therefore, affect the characteristics of interest being observed.

Example 1.2

A firm may divide its 40 new employees into two similar groups of equal size (with the help of some random device) and then administer an obligatory special training program to one of the groups only. If the 20 employees who went through the program exhibited superior productivity later on, the training program might justifiably be credited because other factors that could account for this result (such as group differences in age, motivation, or prior work experience) were eliminated by the random division of the original group of 40 newcomers.
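The random division in Example 1.2 can be sketched as follows. The employee labels are hypothetical placeholders, and shuffling a list is only one possible "random device".

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical labels for the 40 new employees.
employees = [f"employee_{i:02d}" for i in range(1, 41)]

# Shuffle a copy, then split it in half: chance alone decides who is trained.
shuffled = employees[:]
random.shuffle(shuffled)
training_group = shuffled[:20]
control_group = shuffled[20:]

print("training group:", training_group[:3], "...")
print("control group:", control_group[:3], "...")
```

Because membership is decided by chance, differences in age, motivation, or prior experience tend to balance out between the two groups.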

Experimental data tend to provide stronger evidence than survey data. Unfortunately, most data in business and planning, and in many other fields, are generated not by experiments but by surveys because it is often impossible, or extremely costly, to exercise proper experimental controls.

Census taking or Sampling

Two types of surveys exist: complete and partial ones. A census, or complete survey, involves observations about one or more characteristics of interest for every elementary unit that exists. When the number of elementary units is very large, complete success in observing all of them is likely to elude the census takers.

In a partial or sample survey, observations about one or more characteristics of interest are made for only a subset of all existing elementary units. There are good reasons why sample surveys are often undertaken in place of censuses.

Why Sampling?

Sampling costs less: the fewer the elementary units that have to be contacted, the lower the cost of collecting and processing data.

Sampling saves time. A census is senseless whenever it produces information that comes too late.

Study of the whole population is sometimes physically impossible, as when the number of elementary units is infinitely large or when some of them are totally inaccessible.

Sampling can provide more accurate data than a census. With fewer units to cover, non-sampling errors are easier to control, and the sampling error introduced is often smaller than the non-sampling error a census would incur.

When observation destroys the item, sampling is the obvious choice, e.g., testing of tires from a lot, respondents’ views about the taste of a drink, etc.

A census needs a large number of skilled and trained interviewers, and collecting information from a huge population can become monotonous and tedious, which invites bias.

1.2 Statistical Use in Business and Planning

As our world becomes more complex, both our need for information and the quantity of information available continue to expand rapidly. Managers or researchers in every field must plan carefully, so that the quantity and quality of information they obtain are adequate to meet their needs.

1.2.1 Statistics for Decision-making

In the business world, the concepts, techniques, and results of statistics are indispensable components of decision making. Statistics presents the decision-maker with relevant facts and, in many cases, provides an estimate of the probability and/or the monetary consequences of making a wrong decision. For example, before bringing a new product to market, a manufacturer wants to arrive at some assessment of the likely level of demand, and a market research survey may be undertaken.

1.2.2 Statistics for Forecasting

One of the major objectives of managers and researchers is to assemble information of sufficient quantity and quality to forecast. In the use of both management information systems and scientific methods, persons trained in statistics can make important contributions. Essentially, forecasts of future values are obtained through the discovery of regularities in past behavior.

1.2.3 Statistics for Validity Measurement

As already noted, a sample often has an overall advantage over a census in studying a population. When a sample is used to make inferences about a population, the question of the validity of the results arises. Statistics can help answer this question by expressing the results with a stated level of confidence.

1.3 Data

Because statistics deals with data, studying the science of statistics requires a thorough understanding of the term. Data are the facts, attributes, observations or characteristics of an object (e.g., income, occupation, food habits, etc.). Any single observation about a specified characteristic of interest is called a datum, the basic unit of the statistician’s raw material. Any collection of observations about one or more characteristics of interest, for one or more elementary units, is called a data set. Data can be thought of as the information needed to help us make a more informed decision in a particular situation.

1.3.1 Why do we need data?

To provide the necessary input to a research study.

To measure performance in an ongoing service or production process.

To assist in formulating alternative courses of action in a decision making process.

To satisfy our curiosity.

1.3.2 Different Focus of Data

Data can be looked at from different angles.

Nature

By nature, data can be simply divided into two groups:

(i) Qualitative Data

Example 1.3

Sex, taste, letter grades

(ii) Quantitative Data

Example 1.4

132 cm, 56 kilograms

Source

Depending on the source, data can be mainly classified as primary or secondary.

Primary data are not readily available and as such they are collected directly from the field or experiments. These are first hand data. Primary data can be collected through face-to-face conversation, observation, field survey (using questionnaires), etc.

Example 1.5

1) Airline ticketing offices and travel agents have up-to-the-minute information regarding space availability on flights and hotels.

2) ATMs enable banking transactions to occur instantly, with the information immediately recorded on account balances.

Secondary data are obtained from already available sources, e.g., reports, records and documents of different organizations or research studies. Such sources act as data compilers.

Example 1.6

Statistical Yearbooks, Journals, National trade data bank, World development report, etc.

Some authors use a further term, tertiary data. When the World Development Report cites data taken from another source, those data become tertiary.

Units or Types of Measurement

On the basis of units of measurement or types of measurement, data may be classified as Categorical, Ranked, and Metric.

Categorical data are those in which individual objects are simply placed in the proper category or group, and the number in each category is counted. No units are required to identify these, e.g., sex, religion, students’ category on the basis of program, employment, etc.

Example 1.7

In a graduate level business statistics class the students are categorized into male and female. The corresponding numbers are counted and are categorized as follows:

Male = 16
Female = 11
Total = 27
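The counting step in Example 1.7 can be sketched in Python. The list of labels is constructed here to match the counts given in the example; in practice it would come from the class roster.

```python
from collections import Counter

# One label per student, built to match the counts in Example 1.7.
students = ["Male"] * 16 + ["Female"] * 11

# Place each student in a category and count the number in each category.
counts = Counter(students)
total = sum(counts.values())

for category, number in counts.items():
    print(category, "=", number)
print("Total =", total)
```

Note that the only operation applied to the labels is counting; no unit of measurement is involved.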

Ranked data are also categorical in nature, but the categories have an inherent order or rank (hierarchy) among them.


Example 1.8

Student grades (e.g., A, B, C, D) have a distinct rank order among them. Members of a community can be ranked on the basis of their income (e.g., low, middle, and high). Students can be ranked by education level (e.g., primary, secondary, higher secondary). Household members can be ranked on the basis of their age (e.g., children, adolescent, young and old).

Metric data need units of measurement and have continuity, i.e., these data have values that are continuous over a certain range and are expressed with the help of standard units of measurement.

Example 1.9

Expenditures and income of a community can be examples of metric data. In many cases metric data can be converted to rank data, e.g., income expressed as low (up to Tk. 5,000), medium (between Tk. 5,000 and Tk. 10,000) and high (above Tk. 10,000); or height (cm) ranked in classes of 150-159, 160-169, 170-179, etc.; or weight (lb) ranked in classes of 105-115, 116-125, 126-135, etc.
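The conversion of metric data to rank data described in Example 1.9 can be sketched as a small Python function. The function name is hypothetical, and the treatment of the boundary values is an assumption: an income of exactly Tk. 5,000 is placed in "low" and exactly Tk. 10,000 in "medium", following the wording "up to" and "above".

```python
def income_class(income_tk):
    """Map an income in Tk. to the ranked classes of Example 1.9.

    Boundary handling is an assumption: exactly 5,000 counts as low
    and exactly 10,000 counts as medium.
    """
    if income_tk <= 5000:
        return "low"
    if income_tk <= 10000:
        return "medium"
    return "high"


# Hypothetical incomes, used only to exercise the function.
incomes = [3200, 5000, 7500, 10000, 18000]
classes = [income_class(value) for value in incomes]
print(list(zip(incomes, classes)))
```

The conversion loses information: once incomes are reduced to classes, the exact amounts cannot be recovered.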

Levels of Measurements

Measurement is the process of assigning a value to the data, and the rules defining the assignment of an appropriate value determine the level of measurement (sometimes termed the scale of measurement). The levels of measurement are distinguished on the basis of the ordering or distance properties inherent in the measurement rules. Knowledge of the rules and the levels of measurement is important because each statistical technique is appropriate for data measured at certain levels only. Traditionally, four levels of measurement are identified: Nominal, Ordinal, Interval, and Ratio.

Nominal

The nominal level is the lowest level of measurement. Each value is a distinct category, which serves merely as a label or name for the category. No ordering or distance properties among categories are assumed. The arithmetic operations on real numbers (i.e., addition, subtraction, multiplication, and division) do not apply to the nominal level of measurement. Categorical data fall under this level of measurement. The basic property of the nominal level is equivalence: objects in one category are treated as equal to each other with respect to the characteristic measured, and as different from objects in any other category. The logical properties of equivalence are (a) reflexivity (i.e., every object in one of the categories is equal to itself), (b) symmetry (i.e., if a=b, then b=a), and (c) transitivity (i.e., if a=b and b=c, then a=c). These three logical properties are operative among objects within the same category, but not necessarily between categories.

Example 1.10

Names of continents: Asia, Africa, Australia, Europe, North America, South America, Antarctica.

Classification of a population by religions: Muslim, Hindu, Buddhist, Christian, etc.

Labeling rooms on 1st, 2nd or 3rd floors by numbers in the 100s, 200s, or 300s, respectively.

Political party affiliation: Democrat, Republican, Independent, Others.


Ordinal

In the ordinal level of measurement it is possible to rank order all the categories according to a certain criterion exhibiting some kind of relation. Typical relations are “higher”, “greater”, “more desired”, “more difficult”, and so on. Although the ordering property is present in the ordinal level of measurement, the distance property is absent. Hence, arithmetic operations cannot be applied when dealing with the ordinal level of measurement. Ranked data fall under this level of measurement.

Example 1.11

Letter grades of students: A, B, C, D.

Student class designation: Freshman, Sophomore, Junior, Senior.

Product Satisfaction: very satisfied, satisfied, neutral, dissatisfied, very dissatisfied.

Faculty Rank: Professor, Associate Professor, Assistant Professor, Instructor.

Interval

As can be seen in the above examples, in the absence of the distance property it is difficult to quantify the gap between A and B, or B and C, and the like. In the interval level of measurement the distance property is added to the ordering property. Here the distances between the categories are defined in terms of fixed and equal units. It is important to note that in the interval level of measurement we study the difference between things and not their proportionate magnitude. In other words, an inherently determined zero point is not available in an interval scale. Metric data fall under this level of measurement.

Example 1.12

Temperature. The difference between 40ºC and 41ºC is the same as the difference between 80ºC and 81ºC. But 40ºC is not half as hot as 80ºC.
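The point of Example 1.12 can be checked with a short Python sketch. Re-expressing the temperatures in kelvin, a scale with a natural zero, is an illustrative device added here and is not part of the original example; the function name is likewise hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvin, a scale with a natural zero."""
    return celsius + 273.15


# Differences are meaningful on an interval scale: both gaps are one degree.
diff_low = 41 - 40
diff_high = 81 - 80

# Ratios are not: the apparent "twice as hot" depends on where zero sits.
ratio_celsius = 80 / 40
ratio_kelvin = celsius_to_kelvin(80) / celsius_to_kelvin(40)

print("differences:", diff_low, diff_high)
print("ratio in Celsius:", ratio_celsius)
print("ratio in kelvin:", round(ratio_kelvin, 3))
```

The two differences agree, while the two ratios do not, which is why ratio comparisons are reserved for scales with an inherently defined zero point.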

Ratio

The ratio level of measurement has all the properties of an interval level measurement in addition to a well defined zero point. The zero point is inherently defined by the measurement scheme. Consequently, distance comparisons as well as ratio comparisons can be made. All arithmetic operations on real numbers can be applied to the ratio level of measurement. Metric data fall under this level of measurement.

Example 1.13

Weight of students, Income of a household, Height (in inches), Age (in years or days), Salary (in Tk).

It is important to note that a true interval level of measurement is difficult to find, because if the distances between categories can be measured, a zero point can usually be established as well. Another point to note is that statistics developed for one level of measurement can always be used with higher-level variables, but not with variables measured at a lower level. There is another typology of measurement that we come across quite frequently: quantitative and qualitative. The interval and ratio levels fall under the quantitative level, whereas the nominal and ordinal levels fall under the qualitative level. The formal properties characterizing each level of measurement are summarized in Table 1.1.


Table 1.1: Levels of Measurements and Their Characteristic Properties

Level      Equivalence   Greater Than   Fixed Interval   Natural Zero
Nominal    Yes           No             No               No
Ordinal    Yes           Yes            No               No
Interval   Yes           Yes            Yes              No
Ratio      Yes           Yes            Yes              Yes
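Table 1.1, together with the rule that a statistic developed for one level can be used at that level or any higher one, can be encoded as a small Python lookup. The names LEVELS, PROPERTIES and can_apply are hypothetical and introduced only for this sketch.

```python
# Levels of measurement, ordered from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Table 1.1 as a dictionary: which formal properties each level has.
PROPERTIES = {
    "nominal":  {"equivalence": True, "greater_than": False,
                 "fixed_interval": False, "natural_zero": False},
    "ordinal":  {"equivalence": True, "greater_than": True,
                 "fixed_interval": False, "natural_zero": False},
    "interval": {"equivalence": True, "greater_than": True,
                 "fixed_interval": True, "natural_zero": False},
    "ratio":    {"equivalence": True, "greater_than": True,
                 "fixed_interval": True, "natural_zero": True},
}


def can_apply(statistic_level, variable_level):
    """Return True if a statistic developed for statistic_level may be
    used on a variable measured at variable_level (same level or higher)."""
    return LEVELS.index(variable_level) >= LEVELS.index(statistic_level)


print(can_apply("ordinal", "ratio"))     # ordinal statistic on ratio data
print(can_apply("interval", "ordinal"))  # interval statistic on ordinal data
```

The first call is permitted and the second is not, mirroring the statement that statistics carry upward through the levels but never downward.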

1.4 Variables

In simple terms, a variable is something that varies. Strictly speaking, the variable itself does not vary; rather, its values vary. In other words, a variable is a symbol (e.g., X, Y, H, x, P, etc.) that can assume any of a prescribed set of values, called the domain of the variable. If the variable can assume only one value, it is called a constant.

Any one elementary unit may possess one or more characteristics that interest the statistician. An investigator may, indeed, be interested only in the age of each employee, but it would be just as possible to observe, in addition, each employee’s sex or salary. The characteristics of elementary units are themselves called variables, presumably because observations about these characteristics are likely to vary from one elementary unit to the next.

Example 1.14

Age, represented by X, is a variable. It can assume any value within its domain. But the age of a particular person at a particular point in time is a constant.

Number of customers entering a departmental store each day (X) is an example of a variable.

Three types of variables can be distinguished in statistical applications: Numerical, Categorical, and Rank.

Numerical

A numerical variable is a variable whose possible values are numbers. A numerical variable that can theoretically assume any value between two given values is called a continuous variable; otherwise it is called a discrete variable. Observations about a discrete quantitative variable can assume values only at specific points on a scale of values, with gaps between them. Data that can be described by a discrete or by a continuous variable are called discrete data or continuous data, respectively. As a rule of thumb, enumeration or counting gives rise to discrete quantitative data, which differ from each other by clearly defined steps, while measurement gives rise to continuous quantitative data without any gaps in between.

Example 1.15

The household size N, which can assume any of the values 1, 2, 3, 4, …. , but cannot be 2.4 or 3.75, is a discrete variable.

The height H of an individual, which can be 169 cm, or 169.6 cm, or 169.567 cm, depending on the accuracy of measurement, is a continuous variable.

Lengths of 1000 bolts produced in a factory are an example of continuous data. The number of passengers in a bus can only take integer values; hence, it is a discrete variable. The number of books on a library shelf is an example of discrete data.
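The contrast drawn in Example 1.15 can be sketched as two domain checks in Python. Both function names are hypothetical, and the height limits of 50 cm and 250 cm are arbitrary assumptions used only to give the continuous variable a range.

```python
def is_possible_household_size(value):
    """Discrete variable: only the whole numbers 1, 2, 3, ... are possible."""
    return float(value).is_integer() and value >= 1


def is_possible_height_cm(value, low=50.0, high=250.0):
    """Continuous variable: any value inside the assumed range is possible."""
    return low <= value <= high


# Household size has gaps between its possible values.
for size in (3, 2.4, 3.75):
    print(size, is_possible_household_size(size))

# Height has no gaps: each finer reading is still a possible value.
for height in (169, 169.6, 169.567):
    print(height, is_possible_height_cm(height))
```

The discrete check rejects values that fall between the permitted steps, while the continuous check accepts every reading regardless of how many decimal places it carries.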

In business and planning, very often the concept of variable is linked to non-numerical (Qualitative) entities.


Example 1.16

Flavor of ice cream, Sex, color of a pencil, etc.

Categorical

A categorical variable is a variable whose values are expressed in words as categories rather than numbers. A categorical variable with only two values is called a dichotomous variable.

Example 1.17

Sex: Male and Female.

Reach an agreement: Yes and No.

A categorical variable with more than two values is called a polytomous variable.

Example 1.18

Religion: Muslim, Hindu, Christian, and Buddhist communities in Dhaka city.

Satisfaction: Highly satisfied, Satisfied, Indifferent, Dissatisfied, Highly dissatisfied.

A categorical variable can be naturally categorical, or it can be obtained by converting a numerical variable into categories.

Example 1.19

Naturally categorized: Satisfaction, Response (Yes, No, No answer).

Converted to a categorical variable: Examination scores (0 to 100) converted to letter grades (A, B, C, D).
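The conversion of examination scores to letter grades in Example 1.19 can be sketched in Python. The text does not give the grade boundaries, so the cut-offs of 80, 70 and 60 used below are assumptions chosen purely for illustration, as is the function name.

```python
def letter_grade(score):
    """Convert an examination score (0 to 100) to a letter grade.

    The cut-offs 80, 70 and 60 are assumed for illustration only.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    if score >= 80:
        return "A"
    if score >= 70:
        return "B"
    if score >= 60:
        return "C"
    return "D"


# Hypothetical scores, used only to exercise the function.
scores = [92, 75, 68, 41]
grades = [letter_grade(s) for s in scores]
print(list(zip(scores, grades)))
```

A different institution would use different cut-offs, but the principle is the same: a numerical variable is collapsed into a small set of named categories.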

Rank

Rank variables are variables whose values can be ranked. Rank variables can also come from two sources: naturally ranked variables and numerical variables converted to rank variables.

Example 1.20

Naturally ranked: Class ranking (first, second, third, fourth, …).

Converted to a rank variable: test scores of 15 subjects reduced to ranks from 1 to 15.
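Reducing the test scores of 15 subjects to ranks, as in Example 1.20, can be sketched in Python. The scores are made up, and they are deliberately all distinct so that no rule for tied ranks is needed; giving rank 1 to the highest score is also an assumption.

```python
# Hypothetical test scores of 15 subjects; all distinct, so there are no ties.
scores = [88, 72, 95, 61, 79, 84, 67, 91, 55, 76, 82, 69, 98, 74, 63]

# Sort from highest to lowest and number the positions 1 to 15.
ordered = sorted(scores, reverse=True)
ranks = {score: position for position, score in enumerate(ordered, start=1)}

# Ranks listed in the original order of the subjects.
rank_list = [ranks[score] for score in scores]

print(rank_list)
```

After the conversion only the order survives: the ranks show who scored higher, but not by how much.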

