Sampling Theory


4.1. SAMPLING

4.1.1. Introduction

A process used in statistical analysis in which a predetermined number of observations is taken from a larger population is called sampling.

Sampling is an essential technique of behavioural research; research work cannot be undertaken without the use of sampling. Studying the total population is usually impracticable and often impossible. Practical limitations (cost, time and other factors that usually operate in the situation) stand in the way of studying the total population. The concept of sampling has been introduced with a view to making research findings economical and accurate.

For example, a fruit merchant does not examine each and every apple or mango. He inspects only a few of them and decides whether or not to purchase. The most important aim of sampling is to obtain maximum information about the population under study with the least use of money, labour, and time.

According to Cochran, “In every branch of science we lack the resources to study more than a fragment of the phenomena that might advance our knowledge.” In this definition a ‘fragment’ is the sample and ‘phenomena’ is the ‘population’.

According to Davis S. Fox, “In the social sciences, it is not possible to collect data from every respondent relevant to our study but only from some fractional part of the respondents. The process of selecting the fractional part is called sampling.”

4.1.2. Basic Concepts of Sampling

Some basic terms are necessary for understanding the concept of sampling design:

4.1.2.1. Target Population

The population or universe represents the entire group of units which is the focus of the study. Thus, the population could consist of all the persons in the country, or those in a particular geographical location, or a special indigenous or economic group, depending on the purpose and coverage of the study. A population could also consist of non-human units such as farms, houses or business establishments. For example, if an investigation is to be conducted on the marks obtained in statistics by the students of a class, then all the students of that class in that subject will be the ‘Universe’. If that class consists of 50 students, the same 50 students will form the ‘Universe’.

An aggregate of objects (animate or inanimate) under study is called “population or universe”. It is thus a collection of individuals or of their attributes (qualities) or of results of operations which can be numerically specified.

1) Finite Universe: A universe having a finite number of entities or members is called a “finite universe”. For example, the universe of the weights of students in a particular class or the universe of smokers in Rohtak district.

2) Infinite Universe: A universe with an infinite number of members is known as an “infinite universe”. For example, the universe of pressures at various points in the atmosphere.

Figure: Basic Concepts (Target Population, Statistical Population, Sample, Census, Sample Frame, Sample Unit)

4.1.2.2. Statistical Population

A statistical population is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population. For example, if we were interested in generalisations about crows, then we would describe the set of crows that is of interest.

Notice that if we choose a population like all crows, we will be limited to observing crows that exist now or will exist in the future. Probably, geography will also constitute a limitation in that our resources for studying crows are also limited. A statistical population is an aggregate of measurable quantities or a set of numbers. In fact, when every element of such a set is characterised by only one character, say, income of individuals, we have a univariate population. It should be noted that a statistical population can be finite or infinite according as it contains a finite or an infinite number of elements. Again, any arbitrary set is not necessarily a statistical population. For example, the set of cows in a farm at a particular time, does not represent a statistical population.

4.1.2.3. Sample

A sample is a portion of the population which is examined with a view to estimating the characteristics of the population. It is a subset containing the characteristics of a larger population. Samples are used in statistical testing when population sizes are too large for the test to include all possible members or observations. A sample should represent the whole population and not reflect bias toward a specific attribute. A sample is a smaller, manageable version of a larger group.

For example,

1) To assess the quality of a bag of rice, we examine only a portion of it. The portion selected from the bag is called a sample, while the whole quantity of rice in the bag is the population.

2) To estimate the proportion of defective articles in a large consignment, only a portion (i.e., a few of them) is selected and examined. The selected portion is a sample.
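The consignment example can be sketched in code. This is only an illustration under stated assumptions: the consignment of 10,000 articles with 4% defective, the sample size of 500 and the function name are all hypothetical and are not taken from the text.

```python
import random

def estimate_defective_proportion(population, sample_size, seed=None):
    """Estimate the share of defective items from a simple random sample."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no article is examined twice.
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Hypothetical consignment: 1 marks a defective article, 0 a sound one.
consignment = [1] * 400 + [0] * 9600
estimate = estimate_defective_proportion(consignment, sample_size=500, seed=1)
true_share = sum(consignment) / len(consignment)
print(f"sample estimate: {estimate:.3f}, population value: {true_share:.3f}")
```

Only 500 of the 10,000 articles are examined, yet the estimate should normally fall close to the population value; how close depends on the sample size, as discussed under sampling errors.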

4.1.2.4. Census

If detailed information regarding every individual person or item of a given universe (or population) is collected, then the enquiry is a complete enumeration. Another common name for complete enumeration is census. For example, during the Census of Population (which is done every ten years in India), information in respect of each individual person residing in India is collected. This method gives information about each and every unit of the population with greater accuracy.

Difference between Census and Sample

The main differences between the census and sample methods are as follows:

Basis | Census | Sample
--- | --- | ---
Meaning | Census refers to periodic collection of information about the populace from the entire population. | Sampling is a method of collecting information from a sample that is representative of the entire population.
Reliability of Data | Data from a census is reliable and accurate. | There is a margin of error in data obtained from sampling.
Time Taken | Census is very time consuming. | Sampling is quick.
Sampling Variance | Virtually zero sampling variance, mainly because the data used is drawn from the whole population. | There is a possibility of sampling variance, since the data used is drawn from only a small section of the population.
Scope | All items relating to a universe are investigated. | Only a few items are required.
Field of Investigation | Used in investigations with a limited field. | Used for investigations with a large field.
Homogeneity | Useful where units of the population are heterogeneous. | Proves more useful where population units are homogeneous.

4.1.2.5. Sample Frame

A sampling frame is the actual set of units from which a sample will be drawn. It is a list that contains every member of the population from which a sample will be selected. For example, if we wish to study the underlying factors that cause patients in a given area to be admitted to hospital following an acute asthmatic attack (the population), then we would need to know the names of all the people in that area who have been admitted to hospital for this reason.

A good sampling frame should be:

1) Relevant: It should contain things directly linked to the research topic.

2) Complete: It should cover all relevant items.

3) Precise: It should exclude all the items that are not relevant.

4) Up-to-Date: It should incorporate recent additions and changes, and have redundant items cleansed from the list.

4.1.2.6. Sample Unit

A sampling element is an object on which measurements will be made. A sampling element may or may not be a sampling unit. When the sampling unit consists of several population units, it is called a cluster of units. If each population unit in a cluster will be measured, then the sampling elements are the population units within the sampled clusters. In this case, the sampling element is a subunit of the sampling unit.

The sampling unit must be clearly defined for constructing the sampling frame. By convention in statistics, a capital “N” is used to refer to the number of sampling units making up the universe, and a lowercase “n” for the number of sampling units in the sample itself.

For example, in a family budget enquiry, usually a family is considered as the sampling unit since it is found to be convenient for sampling and for ascertaining the required information. In a crop survey, a farm or a group of farms owned or operated by a household may be considered as the sampling unit.

4.1.3. Characteristics of a Good Sample

The following criteria are required for the determination of an appropriate sampling design:

1) True Representative: A good sample is a true representative of the population corresponding to its properties. The population is known as an aggregate of certain properties and the sample is called a sub-aggregate of the universe.

2) Free from Bias: A good sample is free from bias. It does not permit prejudices, pre-conceptions and imaginations to influence its choice.

3) Accurate: A good sample maintains accuracy. It yields accurate estimates or statistics and does not involve errors.

4) Comprehensive: A good sample is comprehensive in nature. This is closely related with true representativeness. A comprehensive sample is organised by specific purpose of the investigation. A sample may be comprehensive in traits but may not be a good representative of the population.

Figure: Characteristics of a Good Sample (True Representative, Free from Bias, Accurate, Comprehensive, Approachable, Good Size, Feasible, Goal Orientation, Practical, Economical)

5) Approachable: The subjects of good sample are easily approachable. The research tools can be easily administered on them and data can be easily collected.

6) Good Size: The size of good sample is such that it yields an accurate result. The probability of error can be estimated.

7) Feasible: A good sample makes the research work more feasible.

8) Goal Orientation: This suggests that a sample design “should be oriented to the research objectives, tailored to the survey design, and fitted to the survey conditions”. If this is done, it should influence the choice of the population, the measurement as also the procedure of choosing a sample.

9) Practical: This implies that the sample design can be followed properly in the survey. It is necessary that complete, correct, practical and clear instructions should be given to the interviewer so that no mistakes are made in the selection of sampling units and the final selection in the field is not different from the original sample design. Practicality also refers to simplicity of the design, i.e., it should be capable of being understood and followed in actual operation of the field work.

10) Economical: Economical implies that the objectives of the survey should be achieved with minimum cost and effort. Survey objectives are generally spelt out in terms of precision, i.e., the inverse of the variance of survey estimates. For a given degree of precision, the sample design should give the minimum cost. Alternatively, for a given per unit cost, the sample design should achieve maximum precision (minimum variance).


4.1.4. Errors in Sampling

A sample survey studies only a limited number of units of the total population; hence there is scope for inaccuracy or error in the process of collection, processing and analysis of the sample data. These errors can be broadly classified into two types:

1) Sampling errors, and
2) Non-sampling errors.

4.1.4.1. Sampling Errors

Sampling errors or variations among sample statistics are due to differences between each sample and the population and among several samples. Sampling errors originate at the time of collecting samples. The major cause for error lies in the fact that a researcher depends on a small sample drawn from a large universe to draw conclusions about its characteristics.

Types of Sampling Errors

These are of two types:

1) Biased Sampling Error: Biased error arises on account of any bias in selection, estimation etc.

2) Unbiased Sampling Error: In some cases a few restrictions may have to be imposed while choosing a random sample. In such cases one should ensure that such restrictions do not introduce bias in the results.

Causes for Sampling Errors

Sampling errors are primarily due to some of the following reasons:

1) Faulty Selection of the Sample: Some of the bias is introduced by the use of a defective sampling technique for the selection of a sample, in which the investigator deliberately selects a representative sample to obtain certain results.

2) Substitution: If a problem arises in enumerating a particular sampling unit included in the random sample, the investigators usually substitute a convenient member of the population, leading to sampling error.

3) Faulty Demarcation of Sampling Units: Bias due to defective demarcation of sampling unit is particularly significant in area surveys such as agricultural experiments. Thus faulty demarcation could cause sampling error.

Methods to Reduce the Sampling Errors

Sampling errors can be reduced by:

1) Increasing the Size of the Sample: The sampling error can be reduced by increasing the sample size. If the sample size n is equal to the population size N, then the sampling error is zero. Because the standard error is inversely proportional to the square root of the sample size, it is reduced by half if the sample size is quadrupled. For example, if samples of 100 produce a standard error of 5%, the sample size must be 400 for a standard error of 2.5%.

If the sample is unbiased, the sampling error decreases as the sample size increases. In many situations the decrease is inversely proportional to the square root of the sample size.

2) Stratification: When the population contains homogeneous units, a simple random sample is likely to be representative of the population. But if the population contains dissimilar units, a simple random sample may fail to be representative of all kinds of units in the population. To improve the result of the sample, the sample design is modified. The population is divided into different groups containing similar units. These groups are called strata. From each group (stratum), a sub-sample is selected in a random manner. Thus all the groups are represented in the sample and sampling error is reduced. It is called stratified random sampling. The size of the sub-sample from each stratum is frequently in proportion to the size of the stratum. Suppose a population consists of 1000 students out of which 600 are intelligent and 400 are non-intelligent. We are assuming here that we do have this much information about the population. A stratified sample of size n = 100 is to be selected. The sizes of the strata are denoted by N1 and N2 respectively, and the sizes of the samples from each stratum by n1 and n2. It is written as under.

Stratum No. | Size of Stratum | Size of Sample from each Stratum
--- | --- | ---
1 | N1 = 600 | n1 = (n × N1)/N = 60
2 | N2 = 400 | n2 = (n × N2)/N = 40
Total | N1 + N2 = N = 1000 | n1 + n2 = n = 100

The size of the sample from each stratum has been calculated according to the size of the stratum. This is called proportional allocation. In the above sample design, the sampling fraction in the population is $\frac{n}{N} = \frac{100}{1000} = \frac{1}{10}$, and the sampling fraction in both the strata is also $\frac{1}{10}$. Thus this design is also called fixed sampling fraction.

This modified sample design is frequently used in sample surveys. But this design requires some prior information about the units of the population. On the basis of this information, the population is divided into different strata. If the prior information is not available, then stratification is not applicable.
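The two methods above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the standard deviation of 0.5 is an assumed value chosen so that a sample of 100 gives the 5% standard error quoted in the example.

```python
import math

def proportional_allocation(stratum_sizes, n):
    """Split a total sample of size n across strata in proportion to their sizes."""
    N = sum(stratum_sizes.values())
    # In general, rounding can make the parts sum to slightly more or less than n.
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}

def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

strata = {"stratum 1": 600, "stratum 2": 400}  # N1 and N2 from the example
allocation = proportional_allocation(strata, n=100)
print(allocation)  # {'stratum 1': 60, 'stratum 2': 40}
for name, size in strata.items():
    # The sampling fraction is the same (1/10) in every stratum.
    print(name, "sampling fraction:", allocation[name] / size)

# sigma = 0.5 is an assumption, not a figure from the text.
for n in (100, 400):
    print("n =", n, "standard error =", standard_error(0.5, n))
```

Quadrupling the sample from 100 to 400 halves the standard error, and each stratum is sampled at the same fraction as the whole population.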

4.1.4.2. Non-Sampling Errors

Non-sampling errors occur at the time of observation, approximation and processing of data. These errors are common to both sample and census surveys. Non-sampling errors can arise at any stage of the planning or execution of a complete enumeration or sample survey. They may be due to a faulty sampling plan, lack of trained and qualified investigators, inaccuracy in responses collected due to bias of the respondent or the researcher, errors in the design of the survey and, finally, errors in compilation or publication.

Types of Non-Sampling Errors

Following are the various non-sampling errors:

Figure: Types of Non-Sampling Errors (Frame Error, Non-Response Error, Measurement Error, Data Processing Error, Data Analysis Error)

1) Frame Error: The sampling frame is the list of all units comprising the population from which a sample is to be drawn. If the sampling frame is incomplete or inaccurate, its use will give rise to this type of error. For example, if a survey is to be undertaken to collect information from different sections of the society, then the use of voters’ list as a sampling frame, will be inappropriate. This is because young people below 18 years of age will be left out from the survey.

2) Non-Response Error: It is almost impossible to obtain data from each and every respondent covered in the sample. There are always some respondents who refuse to give any information. Thus, non-response error occurs when respondents refuse to cooperate with the interviewer by not answering his questions. This error also occurs when respondents are away from home when the interviewer calls on them. In the case of mail surveys particularly, the extent of non-response is usually high.

3) Measurement Error: This is caused when the information gathered is different from the information sought. For example, respondents are asked to indicate whether they own a colour television set. Some of them may respond in the affirmative just to boost their image before an interviewer, even though they may not own a colour television set. Such responses will result in measurement error.

4) Data Processing Error: After the data have been collected, they are to be processed. This involves coding the responses, recording the codes, etc., so that the collected data can be transformed into suitable tables. Mistakes can occur during the processing stage.

5) Data Analysis Error: As in the case of data processing, errors can occur on account of wrong analysis of data. Apart from simple mistakes in summation, division, etc., more complex errors can occur. For example, the application of a wrong statistical technique can cause such errors.

Causes for Non-Sampling Errors

Some of the more important non-sampling errors arise from the following factors:

1) Errors due to Faulty Planning and Definitions: These errors arise due to improper data specification, error in location of units, measurement of characteristics and lack of trained investigators.

2) Response Errors: These errors occur as a result of the responses furnished by the respondents.

3) Non-Response Bias: Non-response errors occur when:
i) The respondent is not found after repeated calls.
ii) The respondent is unable to furnish the information on all questions.
iii) The respondent refuses to answer certain questions.

4) Errors in Coverage: These errors occur in the coverage of sampling units.

5) Compiling Errors: The errors arise due to compilation such as editing and coding of responses.

Methods to Reduce the Non-sampling Errors

Non-sampling error can be reduced by:
1) Careful selection of the time the survey is conducted,
2) Using an up-to-date and accurate sampling frame,
3) Planning for follow up of non-respondents,
4) Careful questionnaire design,
5) Providing thorough training for interviewers and processing staff, and
6) Being aware of all the factors affecting the topic under consideration.

4.1.5. Sample Size

The purpose of research is the main determinant of the level of accuracy required in the results, and this level of accuracy or exactness is the main determinant of the sample size. In general, the larger the sample size, the more accurate the estimates will be. In practice, the research budget also determines the sample size.

The sample size of a statistical sample is the number of observations that constitute it. It is typically denoted ‘n’, a positive integer (natural number).

The sample size ‘n’ is chosen so that the sampling error (the difference between the statistic and the parameter) and the standard error of the statistic are fixed at some pre-assigned level.

4.1.5.1. Determinants of Sample Size

In addition to the purpose of the study and population size, the following determinants usually need to be specified to determine the appropriate sample size:

1) Size of the Universe: The larger the size of the universe, the bigger should be the sample size.

2) Resources Available: If the resources available are vast a larger sample size could be taken. However, in most cases resources constitute a big constraint on sample size.

3) Degree of Accuracy or Precision Desired: The greater the degree of accuracy desired, the larger should be the sample size. However, it does not necessarily mean that bigger samples always ensure greater accuracy. If a sample is selected by experts by following scientific method, it may ensure better results even when it is small compared to a situation in which a large sample size is selected by inexperienced people.

4) Homogeneity or Heterogeneity of the Universe: If the universe consists of homogeneous units a small sample may serve the purpose, but if the universe consists of heterogeneous units a large sample may be inevitable.

5) Nature of Study: For an intensive and continuous study a small sample may be suitable. But for studies which are not likely to be repeated and are quite extensive in nature, it may be necessary to take a larger sample size.

6) Method of Sampling Adopted: The size of sample is also influenced by the type of sampling plan adopted. For example, if the sample is a simple random sample it may necessitate bigger sample size. However, in a properly drawn stratified sampling plan, even a small sample may give better results.

7) Nature of Respondents: Where it is expected that a large number of respondents will not cooperate or will not send back the questionnaire, a large sample should be selected.

8) The Level of Precision: The level of precision, sometimes called sampling error, is the range in which the true value of the population is estimated to be. This range is often expressed in percentage points (e.g., ±5 percent). Thus, if a researcher finds that 60% of farmers in the sample have adopted a recommended practice with a precision rate of ±5%, then he or she can conclude that between 55% and 65% of farmers in the population have adopted the practice.

9) The Confidence Level: The confidence or risk level is based on ideas encompassed under the Central Limit Theorem. The key idea encompassed in the Central Limit Theorem is that when a population is repeatedly sampled, the average value of the attribute obtained by those samples is equal to the true population value. Furthermore, the values obtained by these samples are distributed normally about the true value, with some samples having a higher value and some obtaining a lower score than the true population value. In a normal distribution, approximately 95% of the sample values are within two standard deviations of the true population value (e.g., mean).

In other words, this means that if a 95% confidence level is selected, 95 out of 100 samples will have the true population value within the range of precision specified earlier. There is always a chance that the sample you obtain does not represent the true population value.

Figure: Determinants of Sample Size (Size of the Universe, Resources Available, Degree of Accuracy or Precision Desired, Homogeneity or Heterogeneity of the Universe, Nature of Study, Method of Sampling Adopted, Nature of Respondents, The Level of Precision, The Confidence Level, Degree of Variability)

10) Degree of Variability: The degree of variability in the attributes being measured refers to the distribution of attributes in the population. The more heterogeneous a population, the larger the sample size required to obtain a given level of precision. The less variable (more homogeneous) a population, the smaller the sample size. A proportion of 50% indicates a greater level of variability than either 20% or 80%. This is because 20% and 80% indicate that a large majority do not or do, respectively, have the attribute of interest. Because a proportion of 0.5 indicates the maximum variability in a population, it is often used in determining a more conservative sample size, i.e., the sample size may be larger than if the true variability of the population attribute were used.
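The link between variability, confidence level and precision can be illustrated numerically. The sketch below assumes the usual large-sample formula for a proportion, in which the margin of error is z times the square root of p(1 - p)/n; the sample size of 400 and the function name are hypothetical choices made for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the large-sample confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

n = 400  # hypothetical sample size
for p in (0.2, 0.5, 0.8):
    variability = p * (1 - p)  # largest when p = 0.5
    print(f"p = {p:.1f}: p(1-p) = {variability:.2f}, "
          f"margin of error = {margin_of_error(p, n):.3f}")
```

For the same sample size, the margin of error is widest at p = 0.5 and identical at 0.2 and 0.8, which is why 0.5 is the conservative choice when the true proportion is unknown.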

4.1.5.2. Determination and Selection of Sample Member

The method of determining optimal sample size is discussed for two estimation problems:

1) Determination of Sample Size ‘n’ when Estimating the Population Mean.
2) Determination of Sample Size ‘n’ when Estimating the Population Proportion.

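Determination of Sample Size (n) when Estimating the Population Mean

As we know, for large samples the sample mean $\bar{x}$ is an unbiased estimator of the population mean $\mu$, the standard error of $\bar{x}$ being $\sigma/\sqrt{n}$. For estimating the sample size we need the following pre-assigned values:

1) The desired confidence level.
2) The permissible sampling error (E).
3) The population standard deviation ($\sigma$).

So,

$E = |\bar{x} - \mu|$ and standard error of $\bar{x} = \sigma/\sqrt{n}$.

For large samples,

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Or

$z = \dfrac{E}{\sigma/\sqrt{n}}$

Suppose the desired confidence level is 95%. Then the z values defining the 95% confidence level are $\pm 1.96$, thus

$1.96 = \dfrac{E}{\sigma/\sqrt{n}}$

or

$E = \dfrac{1.96\,\sigma}{\sqrt{n}}$

or

$\sqrt{n} = \dfrac{1.96\,\sigma}{E}$

or

$n = \left(\dfrac{1.96\,\sigma}{E}\right)^{2}$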

To explain this procedure, let us take an example. Say a researcher wants to find out the average income (in lakh) of a population with an accuracy of 0.5 lakh, i.e., the researcher can tolerate an error of half a lakh on either side of the true average income at the 95% confidence level. In other words, the researcher wants to be 95% confident about his findings.

The formula for confidence limits is:

$\mu = \bar{x} \pm z\,\sigma_{\bar{x}}$

Where,
$\mu$ = population mean
$\bar{x}$ = sample mean
z = value of z at 95% confidence level
$\sigma_{\bar{x}}$ = standard error of $\bar{x}$, equal to $\sigma/\sqrt{n}$
$\sigma$ = standard deviation
n = sample size

If the researcher decides to tolerate an error of 1/2 lakh, that is

$z\,\sigma_{\bar{x}} = 0.5$

so

$\sigma_{\bar{x}} = \dfrac{0.5}{z}$

The value of z at the 95% confidence level is 1.96,

so

$\sigma_{\bar{x}} = \dfrac{0.5}{1.96} = 0.255$

or

$\dfrac{\sigma}{\sqrt{n}} = 0.255$

The value of the standard deviation can be found by:
1) Assuming or guessing,
2) Consulting an expert,
3) Conducting a pilot study to get the value, and
4) Obtaining it from other comparable studies.

Say $\sigma$ = 2 lakh

Then,

$\sqrt{n} = \dfrac{2}{0.255} = 7.84$

n = 61.5 or 62

Hence 62 respondents are required to constitute a sample at the 95% confidence level. Similarly, it can be calculated for 99% or other levels.
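The worked example can be reproduced with a few lines of code. This is a minimal sketch assuming the large-sample relation n = (z × standard deviation / E) squared; the function name is hypothetical, and 2.576 is the standard normal value commonly used for the 99% confidence level.

```python
import math

def sample_size_for_mean(sigma, error, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the permissible error."""
    # Round up, because a fractional respondent cannot be interviewed.
    return math.ceil((z * sigma / error) ** 2)

# Standard deviation of 2 lakh and permissible error of 0.5 lakh, as in the example.
print(sample_size_for_mean(sigma=2, error=0.5))           # 62 at the 95% level
print(sample_size_for_mean(sigma=2, error=0.5, z=2.576))  # 107 at the 99% level
```

Raising the confidence level from 95% to 99% with the same permissible error increases the required sample from 62 to 107 respondents.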

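Determining Sample Size (n) When Estimating the Population Proportion

For a sample of size n, the sample proportion $\hat{p}$ is used for estimating the population proportion p.

The standard error of $\hat{p} = \sqrt{\dfrac{pq}{n}}$ (p being known).

For large samples,

$z = \dfrac{\hat{p} - p}{\sqrt{pq/n}} = \dfrac{E}{\sqrt{pq/n}}$;

Where $E = |\hat{p} - p|$ is the permissible sampling error and $q = 1 - p$.

Or

$n = \dfrac{z^{2}\,p\,q}{E^{2}}$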
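As a numerical illustration of the proportion formula, the sketch below computes n from z, p and E. The inputs (a permissible error of 5 percentage points and proportions of 0.5 and 0.2) are hypothetical values chosen for illustration, as is the function name.

```python
import math

def sample_size_for_proportion(p, error, z=1.96):
    """Sample size n = z^2 * p * q / E^2 for estimating a proportion, rounded up."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / error ** 2)

print(sample_size_for_proportion(p=0.5, error=0.05))  # 385
print(sample_size_for_proportion(p=0.2, error=0.05))  # 246
```

Using p = 0.5 gives the largest, most conservative sample size, which is a common choice when no prior estimate of p is available.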

4.1.6. Steps in Designing the Sample

A sampling plan is a detailed outline of which measurements will be taken at what times, on which material, in what manner, and by whom. Sampling plans should be designed in such a way that the resulting data will contain a representative sample of the parameters of interest and allow for all questions, as stated in the goals, to be answered. The steps involved in developing a sampling plan are:

1) Define the Universe: The universe can be confined to a particular type of product, some geographical limits or some other constraints.

The first problem in any sampling procedure is to define the universe. The target population or universe is the collection of elements or objects that possess the information sought by the researcher and about which inferences are to be made. The target population must be defined precisely. Imprecise definition of the target population will result in research that is ineffective at best and misleading at worst. Defining the target population involves translating the problem definition into a precise statement of who should and should not be included in the sample. The target population should be defined in terms of elements, sampling units, extent and time. An element is the object about which or from which the information is desired. In survey research, the element is usually the respondent.

For example, consider a marketing research project assessing consumer response to a new brand of men’s cologne. Who should be included in the target population? All men? Men who have used cologne during the last month? Men 17 or older? Should females be included, because some women buy colognes for their husbands? These and similar questions must be resolved before the target population can be appropriately defined.

2) Sample Frame: The frame is constructed either by the researcher for the purpose of his study or may consist of some existing list of the population.

After the population to be studied has been specified, the next step is to develop a frame of this population. A list containing all sampling units of a population is known as sampling frame. The frame is constructed either by the researcher for the purpose of his study or may consist of some existing list of the population. A frame does not always have to be a list of names; it can also involve a definite location, a boundary, an address, or a set of rules by which a sampling unit can be delineated.

A frame in some sense is a set of boundaries circumscribing the universe. It may be in the form of lists, indices, maps, directories, population records, electoral rolls, city tax rolls, students enrolled in a university etc. In marketing studies the frame is essential. A list of every element of the population appearing once and only once would constitute a sample frame. A good sampling frame should be accurate, free from duplication and conveniently available. A sample frame is essential for marketing research and better performance of sampling procedure.

A sampling frame is a representation of the elements of the target population. It consists of a list or set of directions for identifying the target population. Examples include the telephone book, an association directory listing the firms in an industry, a mailing list purchased from a commercial organisation, a city directory, or a map.

3) Specifying the Sampling Units: The decision on sampling unit often depends on the sampling frame. The sampling unit is the basic unit containing the elements of the population to be sampled, e.g. city blocks, households, a business organisation etc. The selection of the sampling unit partially depends on the overall design of the project also. The units which serve as the basis of initial sampling are known as primary sampling units. It can be composed of one or more units of the population depending on the objectives of the inquiry.

For example, suppose that Revlon wanted to assess consumer response to a new line of lipsticks and wanted to sample females over 18 years of age. It may be possible to sample females over 18 directly, in which case a sampling unit would be the same as an element. Alternatively, the sampling unit might be households. In the latter case, households would be sampled and all females over 18 in each selected household would be interviewed. Here, the sampling unit and the population element are different. Extent refers to the geographical boundaries and the time factor is the time period under consideration.

Figure 3.1: Steps in Designing the Sample (in order: Define the Universe, Sample Frame, Specifying Sampling Units, Selection of Sample Design, Determination of Sample Size, Select the Sample)

4) Selection of Sample Design: It is the procedure of selecting units in the sample. There are two basic methods of sampling namely, probability and non-probability methods which can be further divided into some specific methods of selection. It is the procedure of selecting units in the sample. A probability sample is one, where the selected units have some specific chance of being included in the sample. In a non-probability sample some arbitrary method of selection not depending on chance is adopted. This method mainly depends on the purpose of the inquiry, as well as on the attitude or convenience of the investigators.

The selection of the sample design really involves two decisions:
i) To use probability or non-probability method of selection, and
ii) Specific sample design to use in collecting the data.

The researcher’s choice will be affected by the following considerations:
i) If sampling error is to be evaluated, then probability sampling must be used.
ii) To ensure randomness in the selection of units, probability sample should be used.
iii) In the absence of proper sample frame, non-probability sampling should be used.
iv) If time and money considerations are vital, then non-probability sampling should be used.

Once the decision about probability and non-probability method of selection has been made, one should select the sample design that will best accomplish the objectives of the investigation. Regardless of the design finally chosen, the researcher may have to defend this design, when the study results are ultimately presented.

5) Determination of Sample Size: The size of the sample has a direct relationship with the degree of accuracy desired in the investigation. It also depends upon the nature of the population as well as the method of selection. In marketing research investigations, the ideal sample size depends upon the type of the series and the size of the population. As a common practice, the larger the population, the more units should be drawn in the sample, and the greater the degree of heterogeneity, the larger the sample should be for it to be representative.

6) Select the Sample: Selecting the sample means executing the actual sampling process, i.e., the actual selection of the sample elements. This requires a substantial amount of office and field work, particularly when personal interviews are involved. Execution of the sampling process requires a detailed specification of how the sampling design decisions with respect to the population, sampling frame, sampling unit, sampling technique, and sample size are to be implemented. If households are the sampling unit, an operational definition of a household is needed. Procedures should be specified for vacant housing units and for call backs in case no one is at home. Detailed information must be provided for all sampling design decisions.
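As an illustration of step 5 (determination of sample size), the sketch below uses one widely taught formula for estimating a proportion, n0 = z²·p·(1 − p)/e², followed by a finite population correction. This formula is not given in the text above; the function name, the default confidence level (z = 1.96) and the example figures are assumptions chosen only for illustration.

```python
import math


def sample_size_for_proportion(margin_of_error, population_size=None, p=0.5, z=1.96):
    """Hypothetical helper: sample size needed to estimate a proportion.

    margin_of_error: desired accuracy e (e.g. 0.05 for +/- 5 percentage points)
    population_size: N, if known; enables the finite population correction
    p: assumed population proportion (0.5 is the most heterogeneous case)
    z: normal quantile for the chosen confidence level (1.96 ~ 95%)
    """
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        # Finite population correction: small populations need fewer units.
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)


print(sample_size_for_proportion(0.05))                        # 385
print(sample_size_for_proportion(0.05, population_size=1000))  # 278
print(sample_size_for_proportion(0.05, p=0.1))                 # 139
```

Under these assumptions, a stricter margin of error or a more heterogeneous population (p closer to 0.5) raises the required sample size, which matches the qualitative statements in step 5.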

4.1.7. Types of Sampling Methods: Sample Design

A sample design is a definite plan for obtaining a sample from the sampling frame. Sampling design, in general, refers to the method or technique the researcher adopts in selecting the sampling units from the frame or population.

A sample design is the framework, or road map, that serves as the basis for the selection of a survey sample and affects many other important aspects of a survey as well. In a broad context, survey researchers are interested in obtaining some type of information through a survey for some population, or universe, of interest. One must define a sampling frame that represents the population of interest, from which a sample is to be drawn. The sampling frame may be identical to the population, or it may be only part of it and is therefore subject to some undercoverage, or it may have an indirect relationship to the population (e. g. the population is preschool children and the frame is a listing of preschools). The sample design provides the basic plan and methodology for selecting the sample. A sample design can be simple or complex.

There are different types of sample designs based on two factors viz., the representation basis and the element selection technique. On the representation basis, the sample may be probability sampling or it may be non-probability sampling. Probability sampling is based on the concept of random selection, whereas non-probability sampling is ‘non-random’ sampling.

On element selection basis, the sample may be either unrestricted or restricted. When each sample element is drawn individually from the population at large, then the sample so drawn is known as ‘unrestricted sample’, whereas all other forms of sampling are covered under the term ‘restricted sampling’.

Thus, sample designs are basically of two types, viz., probability sampling and non-probability sampling:

4.1.7.1. Random/Probability Sampling Techniques

Probability sampling is also known as ‘random sampling’ or ‘chance sampling’. Under this sampling design, every item of the universe has a known chance of inclusion in the sample (an equal chance in the simplest case). It is, so to say, a lottery method in which individual units are picked up from the whole group not deliberately but by some mechanical process. Here it is blind chance alone that determines whether one item or the other is selected. The results obtained from probability or random sampling can be assessed in terms of probability, i.e., we can measure the errors of estimation or the significance of results obtained from a random sample, and this fact brings out the superiority of random sampling design over the deliberate sampling design.

The various probability sampling methods are as follows:

1) Simple Random Sampling: This is the simplest and most popular technique of sampling. In it each unit of the population has an equal chance of being included in the sample. This method implies that if N is the size of the population and n units are to be drawn in the sample, then the sample should be taken in such a way that each of the $^{N}C_{n}$ possible samples has an equal chance of being selected.

Simple random sampling gives:
i) Each element in the population an equal chance of being included in the sample, with all choices independent of each other.
ii) Each possible sample combination an equal chance of being chosen.

The method of simple probability sampling eliminates the chance of bias or personal prejudices in the selection of units.
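A minimal sketch of simple random sampling, assuming the population is available as a Python list (here the 50 students of a class, numbered 1 to 50). The function name and the fixed seed are illustrative assumptions; `random.sample` draws without replacement, so every combination of n units is equally likely.

```python
import math
import random


def simple_random_sample(population, n, seed=None):
    """Hypothetical helper: draw n units without replacement, all equally likely."""
    rng = random.Random(seed)  # a seed is used only to make the example repeatable
    return rng.sample(population, n)


students = list(range(1, 51))                      # N = 50 units in the universe
sample = simple_random_sample(students, 5, seed=42)
print(sample)                                      # 5 distinct student numbers

# Number of possible samples of size n = 5 from N = 50, i.e. NCn.
print(math.comb(50, 5))                            # 2118760
```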

2) Systematic Sampling: In this sampling, one unit is selected at random from the universe and the other units are at a specified interval from the selected unit. This method can be used when the population is finite and the units of the Universe can be arranged on the basis of any system – like alphabetical arrangement, numerical arrangement or geographical arrangement etc.
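The following sketch illustrates systematic sampling on an ordered list of units. It assumes the common variant in which the random start falls within the first interval and the interval is k = N // n; the function name and this choice of interval are assumptions, not something the text prescribes.

```python
import random


def systematic_sample(units, n, seed=None):
    """Hypothetical helper: pick a random start, then every k-th unit."""
    k = len(units) // n            # sampling interval
    if n < 1 or k < 1:
        raise ValueError("n must be between 1 and the number of units")
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start within the first interval
    # If N is not a multiple of n, the last few units can never be selected.
    return [units[start + i * k] for i in range(n)]


houses = list(range(1, 101))       # units arranged in numerical order
picked = systematic_sample(houses, 10, seed=1)
print(picked)                      # 10 house numbers, each 10 apart
```

Only the first unit is chosen by chance; the rest follow from the ordering, which is why the method requires a finite, arranged universe.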

3) Stratified Random Sampling: Stratified random sample is one in which random selection is done not from the heterogeneous universe as a whole but from different homogeneous parts or strata of a universe. This sampling procedure may be summarised as follows:
i) The universe to be sampled is divided (or stratified) into groups that are mutually exclusive and include all items in the universe.
ii) A simple random sample is then chosen independently from each group or stratum.

The process of stratified random sampling differs from simple random sampling in that, with the latter, sample items are selected at random from the entire universe. In stratified random sampling, the sample is designed so that a separate random sample is selected from each stratum. In simple random sampling the distribution of the sample among strata is left entirely to chance.

[Figure: Random Sampling Techniques (Simple Random Sampling, Systematic Sampling, Stratified Random Sampling, Cluster Sampling, Multi-Stage Sampling, Area Sampling)]

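Formally, divide the population into non-overlapping groups (i.e., strata) $N_1, N_2, N_3, \dots, N_i$, such that $N_1 + N_2 + N_3 + \dots + N_i = N$. Then do a simple random sample of $f = n/N$ in each stratum, where $f$ is the sampling fraction.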
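A minimal sketch of stratified random sampling, assuming the same sampling fraction f is applied in every stratum (proportional allocation) and that each stratum is given as a list of units. The function name, the stratum labels and the use of `round` to turn f × N_i into a whole number are illustrative assumptions.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Hypothetical helper: independent simple random sample from each stratum.

    strata: dict mapping a stratum name to the list of its units
    fraction: sampling fraction f applied to every stratum
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n_i = round(fraction * len(units))      # units to draw from this stratum
        sample[name] = rng.sample(units, n_i)   # drawn without replacement
    return sample


# Two mutually exclusive strata that together cover the whole universe (N = 100).
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}
result = stratified_sample(strata, fraction=0.1, seed=7)
print({name: len(units) for name, units in result.items()})  # {'urban': 6, 'rural': 4}
```

Unlike simple random sampling, the split of the sample between the strata is fixed in advance here rather than left to chance.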

4) Cluster Sampling: In this method, the universe is divided into some recognisable sub-groups which are called clusters. After this a simple random sample of these clusters is drawn and then all the units belonging to the selected clusters constitute the sample.

For example, if we have to conduct an opinion poll in the city of Delhi, then the city may be divided into, say, 50 blocks and out of these 50 blocks 5 blocks can be picked up by random sampling and the inhabitants in these five blocks can be interviewed to give their opinion on a particular issue.

While using this method, it should be seen that the clusters are as small in size as possible and that the number of sample units in each cluster is more or less the same. This method is commonly used in collecting data about some common characteristics of the population.

Cluster sampling, no doubt, reduces cost by concentrating surveys in selected clusters. But certainly it is less precise than random sampling. There is also not as much information in ‘n’ observations within a cluster as there happens to be in ‘n’ randomly drawn observations. Cluster sampling is used only because of the economic advantage it possesses; estimates based on cluster samples are usually more reliable per unit cost.
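The Delhi example can be sketched as follows: the city is represented as 50 blocks, 5 blocks are drawn by simple random sampling, and every inhabitant of the selected blocks enters the sample. The data layout (a dict from block number to a list of inhabitants), the block size of 20 and the function name are assumptions made for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Hypothetical helper: randomly pick whole clusters and keep all their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # simple random sample of clusters
    units = [unit for c in chosen for unit in clusters[c]]
    return chosen, units


# 50 blocks of 20 inhabitants each (made-up identifiers).
blocks = {b: [f"block{b}-person{i}" for i in range(20)] for b in range(1, 51)}
chosen_blocks, respondents = cluster_sample(blocks, 5, seed=3)
print(chosen_blocks)          # 5 block numbers
print(len(respondents))       # 100, i.e. everyone in the 5 selected blocks
```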

5) Multi-Stage Sampling: This is a modified form of cluster sampling. While in cluster sampling all the units in a selected cluster constitute the sample, in multistage sampling the sample units are selected in two or three or four stages. In this system the universe is first divided into first-stage sample units, from which the sample is selected.

The selected first-stage samples are then sub-divided into second-stage units from which another sample is selected. Third-stage and fourth-stage sampling is done in the same manner if necessary. Thus, for an urban survey, a sample of towns may be taken first and then for each of the selected towns a sub-sample of households may be taken, and then, if need be, from each of the selected households a third-stage sample of individuals may be obtained.
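The urban survey described above can be sketched as a three-stage procedure: towns, then households within the selected towns, then one individual within each selected household. The nested-dict data layout, the choice of exactly one individual per household and all names are assumptions for illustration only.

```python
import random


def multistage_sample(towns, n_towns, n_households, seed=None):
    """Hypothetical helper: towns -> households -> one individual per household.

    towns: dict mapping town -> dict mapping household -> list of individuals
    """
    rng = random.Random(seed)
    result = {}
    for town in rng.sample(sorted(towns), n_towns):              # first stage
        households = towns[town]
        picked = rng.sample(sorted(households), n_households)    # second stage
        # Third stage: one randomly chosen individual from each selected household.
        result[town] = {h: rng.choice(households[h]) for h in picked}
    return result


# Made-up universe: 10 towns, 30 households per town, 4 individuals per household.
towns = {
    f"town{t}": {f"hh{h}": [f"t{t}-h{h}-p{p}" for p in range(4)] for h in range(30)}
    for t in range(10)
}
survey = multistage_sample(towns, n_towns=3, n_households=5, seed=11)
print({town: len(hh) for town, hh in survey.items()})   # 3 towns, 5 households each
```

In contrast to cluster sampling, only a sub-sample of each selected first-stage unit is observed.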

6) Area Sampling: Area sampling is a form of multi-stage sampling in which maps, rather than lists or registers, are used as the sampling frame. It is more frequently used in those countries which do not have a satisfactory sampling frame such as a population list.

If clusters happen to be some geographic subdivisions, cluster sampling is better known as area sampling. In other words, cluster designs, where the primary sampling unit represents a cluster of units based on geographic area, are distinguished as area sampling. The plus and minus points of cluster sampling are also applicable to area sampling. The overall area for sampling is divided into several smaller areas within which a random sample is selected. For example, the city map is used for area sampling. Various blocks provide the frame, and each of them is numbered and used for the sampling. For sampling blocks, stratification based on geographical considerations is employed. Thus, blocks need to be identified and then a stratified sample of dwellings can be selected. Finally, blocks are subdivided into segments of more or less equal size, and a sample of these segments may be taken.

Nature of Random Sampling

The nature of probability sampling can be described as follows:

1) Accurate Estimates of Population: For some research problems, highly accurate estimates of population characteristics are required. In these situations, the elimination of selection bias and the ability to calculate sampling error make probability sampling desirable. However, probability sampling will not always result in more accurate results. If non-sampling errors are likely to be an important factor, then non-probability sampling may be preferable, as the use of judgement may allow greater control over the sampling process.

2) Heterogeneous Population: Another consideration is the homogeneity of the population with respect to the variables of interest. A more heterogeneous population would favour probability sampling, because it would be more important to secure a representative sample. Probability sampling is preferable from a statistical viewpoint, because it is the basis of the most common statistical techniques.

3) Sophisticated: However, probability sampling is sophisticated and requires statistically trained researchers. It generally costs more and takes longer than non-probability sampling. In many marketing research projects, it is difficult to justify the additional time and expense and thus operational considerations favour the use of non-probability sampling. In practice, the objectives of the study often exert a dominant influence on which sampling method will be used.

4) Permits Generalisation: The major advantage of probability sampling is that it permits generalisation, the process of applying the findings from the sample to the population from which the sample was drawn. As for the broader population beyond the sampling frame, the researcher can only hypothesise about the applicability of the sample findings. This is one reason why replication in research is so important, to test the limits of findings as they apply to additional settings and variations in the population.

Advantages of Random Sampling

Advantages of probability sampling are as follows:

1) Unbiased Estimates: Random (probability) sampling is the only sampling method that provides essentially unbiased estimates having measurable precision. If the investigator requires this level of objectivity, then some variant of probability sampling is essential.

2) Relative Efficiency: Random Sampling permits the researcher to evaluate, in quantitative terms, the relative efficiency of alternate sampling techniques in a given situation. Usually this is not possible in non-probability sampling.

3) Less Universe Knowledge Required: This requires relatively little universe knowledge. Essentially, only two things need to be known:
i) A way of identifying each universe element uniquely, and
ii) The total number of universe elements.

4) Fair: Every item in the population has an equal chance of being selected and measured.

5) Easy: It allows easy data analysis and error calculation.

Disadvantages of Random Sampling

Following are the disadvantages of probability sampling:

1) Less Efficient: It is less statistically efficient than other sampling methods.

2) Non-Utilisation of Additional Knowledge: It does not make use of additional knowledge of how the population is structured.

3) Complex and Time Consuming: The method of selection in many cases can be complex and time consuming. Especially in the cases of marketing research, the constraints of budget and time may give preference to non-probability methods of sampling.

4) High Level Skills: Probability sampling requires a very high level of skill and experience for its use.

5) More Time Required: It requires a lot of time to plan and execute a probability sample.

6) High Costs: The costs involved in probability sampling are generally large as compared to non-probability sampling.

4.1.7.2. Non-Random/Non-Probability Sampling Techniques

Non-probability sampling is that sampling procedure which does not afford any basis for estimating the probability that each item in the population has been included in the sample. Non-probability sampling is also known by different names such as ‘deliberate sampling’, ‘purposive sampling’ and ‘judgment sampling’.

In this type of sampling, items for the sample are selected deliberately by the researcher; his choice concerning the items remains supreme. In other words, under non-probability sampling the organisers of the inquiry purposively choose the particular units of the universe for constituting a sample on the basis that the small mass that they select out of a huge one will be typical or representative of the whole.

The various non-probability sampling methods are:

1) Quota Sampling: One of the most commonly used non-probability sample designs is quota sampling, which enjoys its most widespread use in consumer surveys. This sampling method also uses the principle of stratification. As in stratified random sampling, the researcher begins by constructing strata. Bases for stratification in consumer surveys are commonly demographic, e.g., age, sex, income and so on. Often compound stratification is used – for example, age groups within sex.

This is one kind of purposive or judgement sampling. A quota sample is one in which the investigator is directed to collect information from an assigned number, or quota of individuals. The quota sampling technique is very popular in opinion surveys and market studies.

Next, sample sizes (called quotas) are established for each stratum. As with stratified random sampling, the sampling within strata may be proportional or disproportional. Field-workers are then instructed to conduct interviews with the designated quotas, with the identification of individual respondents being left to the field-workers.
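The quota procedure can be sketched as follows: quotas are fixed per stratum, and the field-worker accepts respondents in whatever order they happen to be met until every quota is filled. Note that no random selection is involved. The function name, the compound stratum (age group within sex) and the sample data are assumptions for illustration.

```python
def quota_sample(respondents, quotas, stratum_of):
    """Hypothetical helper: accept respondents in encounter order until quotas are met.

    respondents: iterable of people in the order the field-worker meets them
    quotas: dict mapping stratum -> number of interviews required
    stratum_of: function returning the stratum of a respondent
    """
    remaining = dict(quotas)
    selected = []
    for person in respondents:
        stratum = stratum_of(person)
        if remaining.get(stratum, 0) > 0:   # quota for this stratum still open
            selected.append(person)
            remaining[stratum] -= 1
        if not any(remaining.values()):     # all quotas filled, stop interviewing
            break
    return selected


passers_by = [
    {"name": "A", "sex": "F", "age": "18-35"},
    {"name": "B", "sex": "F", "age": "18-35"},
    {"name": "C", "sex": "M", "age": "36+"},
    {"name": "D", "sex": "F", "age": "18-35"},
    {"name": "E", "sex": "M", "age": "18-35"},
    {"name": "F", "sex": "M", "age": "36+"},
]
quotas = {("F", "18-35"): 2, ("M", "36+"): 1}
chosen = quota_sample(passers_by, quotas, lambda p: (p["sex"], p["age"]))
print([p["name"] for p in chosen])          # ['A', 'B', 'C']
```

Because the choice of individuals within each stratum is left to whoever is met first, the probability of selection of any given person is unknown.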

2) Convenience Sampling: In convenience sampling, the researcher chooses the sampling units on the basis of convenience or accessibility. Such samples are also called accidental samples because the sample units enter by accident.

This is also known as a sample of the man in the street, i.e., selection of units where they are. Sample units are selected because they are accessible. For example, in testing a potential new product, the sampling may be done by placing the new product in appropriate shops in the locality and observing its purchase and sale there.

3) Judgment Sampling: A second method of non-probability sampling that is sometimes advocated is the selection of universe items by means of expert judgment. Using this approach, specialists in the subject matter of the survey choose what they believe to be the best sample for that particular study.

This type of sample requires judgment or an ‘educated guess’ as to who should represent the population. It is expected that these samples would be better as the experts are supposed to know the population. For example, a group of sales managers might select a sample of grocery stores in a city that they regarded as “representative.” This approach has been found empirically to produce unsatisfactory results. And, of course, there is no objective way of evaluating the precision of sample results. Despite these limitations, this method may be useful when the total sample size is extremely small.

4) Panel Sampling: Here, the initial samples are drawn on random basis and information from these is collected on regular basis. It is a semi-permanent sample where members may be included repetitively for successive studies. Here there is a facility to select and quickly contact such well-balanced samples and to have relatively high response rate even by mail.

5) Snowball Sampling: It is also known as Referred sampling or Multiplicity sampling. It is a procedure in which initial respondents are selected randomly but where additional respondents are then obtained from referrals. It is a form of networking. As the name implies, the sample grows just as a snowball grows.

[Figure: Non-Random Sampling Techniques (Quota Sampling, Convenience Sampling, Judgment Sampling, Panel Sampling, Snowball Sampling)]

It is a special non-probability method used when the desired sample characteristic is rare. It may be extremely difficult or cost prohibitive to locate respondents in these situations. Snowball sampling relies on referrals from initial subjects to generate additional subjects. While this technique can dramatically lower search costs, it comes at the expense of introducing bias, because the technique itself reduces the likelihood that the sample will represent a good cross section of the population.

For example, suppose that a researcher wants to conduct a survey of NRIs who have been living in the USA for the last 5 years. Initial respondents may be selected from the list supplied by the USA Embassy in India. The referral procedure is then used to obtain a second group of qualified respondents, and so on.
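The referral mechanism can be sketched as a breadth-first walk over a referral network: the initial respondents are interviewed first, then the people they refer, and so on until the target sample size is reached. The referral map, the respondent labels and the function name are made-up assumptions for illustration.

```python
from collections import deque


def snowball_sample(seeds, referrals, target_size):
    """Hypothetical helper: grow a sample by following referrals wave by wave.

    seeds: initial respondents
    referrals: dict mapping a respondent to the people he or she refers
    target_size: stop once this many respondents have been collected
    """
    queue = deque(seeds)
    seen = set(seeds)
    sample = []
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:        # do not contact the same person twice
                seen.add(referred)
                queue.append(referred)
    return sample


referrals = {"a": ["c", "d"], "b": ["d", "e"], "c": ["f"]}
wave = snowball_sample(["a", "b"], referrals, target_size=5)
print(wave)   # ['a', 'b', 'c', 'd', 'e']
```

Everyone after the initial respondents is reachable only through someone already in the sample, which is the source of the bias noted above.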

Nature of Non-Random Sampling

The nature of non-probability sampling can be described as follows:

1) Unknown Probability of Selection: In the case of non-probability sampling, the probability of selection of each sampling unit is not known. It implies that non-probability samples cannot depend upon the rationale of probability theory and hence we cannot estimate population parameters from sample statistics. Further, in the case of non-probability samples, we do not have a rational way to prove or know whether the selected sample is representative of the population.

2) Applied in Social Research: In general, researchers prefer probabilistic sampling methods over non-probabilistic ones, but in applied social research due to constraints such as time and cost and objectives of the research study there are circumstances when it is not feasible to adopt a random process of selection and in those circumstances usually non-probabilistic sampling is adopted.

3) Subjective Judgement: A core characteristic of non-probability sampling techniques is that samples are selected based on the subjective judgement of the researcher, rather than random selection (i.e., probabilistic methods), which is the cornerstone of probability sampling techniques. Whilst some researchers may view non-probability sampling techniques as inferior to probability sampling techniques, there are strong theoretical and practical reasons for their use.

4) Easier, Quicker and Cheaper Method: Non-probability sampling is often used because the procedures used to select units for inclusion in a sample are much easier, quicker and cheaper when compared with probability sampling. This is especially the case for convenience sampling. For students doing dissertations at the undergraduate and master’s level, such practicalities often lead to the use of non-probability sampling techniques.

Advantages of Non-Random Sampling

1) True Universe Picture: Relevant sections of the universe may be selected in the proportions they appear in the universe.

2) Economical: Geographical concentration can be achieved thus reducing costs.

3) Quick: Useful and quick method in certain circumstances.

4) Specific Case Types: It might be the only method available, for example when sampling illegal drug users.

5) Specific Members of Population: It is suitable if researchers are truly interested in particular members of a population, not the entire population.

6) Pilot Study: It suits exploratory research that attempts to determine whether a problem exists or not, such as a pilot study.

Disadvantages of Non-Random Sampling

1) Details Needed: Detailed initial information of the universe is needed.

2) Errors: Errors in sample selection can easily occur.

3) Subjective Nature: The subjectivity of non-probability sampling prevents making inferences to the entire population.

4) Selection Bias: Validity and credibility are questionable due to selection bias.

5) Reliability: The reliability of the resulting estimates cannot be evaluated which results in the user not knowing how much confidence can be placed in any interpretations of the survey finding.

4.1.7.3. Random Versus Non-Random Sampling Techniques

| Basis | Random Sampling | Non-Random Sampling |
|---|---|---|
| 1) Control | Sampling error can be controlled. | Sampling error cannot be controlled. |
| 2) Chances of Selection Bias | The selection process depends on the specific technique and is, therefore, not influenced by the expertise of the researcher. | Selection bias can be very high. |
| 3) Economy | Time and costs involved may be high. | Usually a low-cost, quicker alternative. |
| 4) Reliability | It is possible to test the hypotheses through formal, rigorous tests of significance and, thus, obtain more reliable results. | Parametric tests of significance are not applicable; the reliability of results is, therefore, not very high. |
| 5) Suitability | More reliable and representative if the population is heterogeneous. | May be more useful in a homogeneous population. |
| 6) Usefulness | Preferable if complex, detailed estimates of parameters are required. | Reasonably useful if the parameters to be estimated are at broad, aggregated levels, such as market shares or total sales. |
| 7) Degree of Accuracy | Accuracy may be poor if the population is high. | Accuracy in such situations is quite scattered. |
| 8) Sampling Frame | Formal sampling frames required. | Can be effective even in the absence of an elaborate sampling frame. |
| 9) Convenience | May be very inconvenient if the geographical spread of the population is high. | More convenient, less time-consuming, cheaper and likely to have lower non-sampling errors. |

4.1.8. Need and Importance of Sampling

The importance of sampling is as follows:

1) Saves Time, Money and Effort: The researcher can save time, money and effort because the subjects involved are small in number, so that little time is needed to calculate, tabulate, present, analyse, and interpret.

2) More Effective: As the size of sample is less than that of population, fatigue in collecting the information is reduced and therefore more effective work is done by the investigators.

3) Faster and Cheaper: Since the sample is small, the collection, tabulation, presentation, analysis, and interpretation of data are rapid and less expense is involved.

4) More Accurate: Fewer errors are made because less data is involved in collection, tabulation, presentation, analysis and interpretation.

5) Gives More Comprehensive Information: A small sample results in a more thorough investigation of the study, thus, giving more comprehensive information because all the members of the population have been given an equal chance of being included in the sample.

4.2. EXERCISE

To be added later.
