NCERT Class 11 Statistics

CONTENTS

Foreword

Chapter 1 : Introduction

Chapter 2 : Collection of Data

Chapter 3 : Organisation of Data

Chapter 4 : Presentation of Data

Chapter 5 : Measures of Central Tendency

Chapter 6 : Measures of Dispersion

Chapter 7 : Correlation

Chapter 8 : Index Numbers

Chapter 9 : Use of Statistical Tools

APPENDIX A : GLOSSARY OF STATISTICAL TERMS

APPENDIX B : TABLE OF TWO-DIGIT RANDOM NUMBERS

STATISTICS FOR ECONOMICS

Chapter 1: Introduction

Studying this chapter should enable you to:
• know what the subject of economics is about;
• understand how economics is linked with the study of economic activities in consumption, production and distribution;
• understand why knowledge of statistics can help in describing consumption, production and distribution;
• learn about some uses of statistics in the understanding of economic activities.

1. WHY ECONOMICS?

You have, perhaps, already had Economics as a subject in your earlier classes at school. You might have been told that the subject is mainly about what Alfred Marshall (one of the founders of modern economics) called 'the study of man in the ordinary business of life'. Let us understand what that means.

When you buy goods (you may want to satisfy your own personal needs, or those of your family, or those of any other person to whom you want to make a gift), you are called a consumer.

When you sell goods to make a profit for yourself (you may be a shopkeeper), you are called a seller.

When you produce goods (you may be a farmer or a manufacturer), you are called a producer.

When you are in a job, working for some other person, and you get paid for it (you may be employed by somebody who pays you wages or a salary), you are called a service-holder.

When you provide some kind of service to others for a payment (you may be a lawyer, a doctor, a banker, a taxi driver or a transporter of goods), you are called a service-provider.

In all these cases you will be called gainfully employed in an economic activity. Economic activities are those that are undertaken for a monetary gain. This is what economists mean by the 'ordinary business of life'.

Activities

• List the different activities of the members of your family. Would you call them economic activities? Give reasons.
• Do you consider yourself a consumer? Why?

We cannot get something for nothing

If you have ever heard the story of Aladdin and his Magic Lamp, you would agree that Aladdin was a lucky guy. Whenever and whatever he wanted, he just had to rub his magic lamp and a genie appeared to fulfil his wish. When he wanted a palace to live in, the genie instantly made one for him. When he wanted expensive gifts to bring to the king when asking for his daughter's hand, he got them in the blink of an eye.

In real life we cannot be as lucky as Aladdin. Though, like him, we have unlimited wants, we do not have a magic lamp. Take, for example, the pocket money that you get to spend. If you had more of it, you could buy almost all the things you wanted. But since your pocket money is limited, you have to choose only those things that you want the most. This is a basic teaching of Economics.

Activities

• Can you think of some other examples where a person with a given income has to choose which things, and in what quantities, he or she can buy at the prices that are being charged (called the current prices)?
• What will happen if the current prices go up?

Scarcity is the root of all economic problems. Had there been no scarcity, there would have been no economic problem, and you would not have studied Economics either. In our daily life, we face various forms of scarcity. The long queues at railway booking counters, crowded buses and trains, shortage of essential commodities, the rush to get a ticket to watch a new film, etc., are all manifestations of scarcity. We face scarcity because the things that satisfy our wants are limited in availability. Can you think of some more instances of scarcity?

The resources which the producers have are limited and also have alternative uses. Take the case of the food that you eat every day. It satisfies your want of nourishment. Farmers employed in agriculture raise crops that produce your food. At any point of time, the resources in agriculture like land, labour, water, fertiliser, etc., are given. All these resources have alternative uses. The same resources can be used in the production of non-food crops such as rubber, cotton, jute etc. Thus, alternative uses of resources give rise to the problem of choice between the different commodities that can be produced by those resources.

Activities

• Identify your wants. How many of them can you fulfil? How many of them are unfulfilled? Why are you unable to fulfil them?
• What are the different kinds of scarcity that you face in your daily life? Identify their causes.

Consumption, Production and Distribution

If you have thought about it, you might have realised that Economics involves the study of man engaged in economic activities of various kinds. For this, you need to know reliable facts about all the diverse economic activities like production, consumption and distribution. Economics is often discussed in three parts: consumption, production and distribution.

We want to know how the consumer decides, given his income and many alternative goods to choose from, what to buy when he knows the prices. This is the study of Consumption.

We also want to know how the producer, similarly, chooses what to produce for the market when he knows the costs and prices. This is the study of Production.

Finally, we want to know how the national income, or the total income arising from what has been produced in the country (called the Gross Domestic Product or GDP), is distributed through wages (and salaries), profits and interest (we leave aside here income from international trade and investment). This is the study of Distribution.

Besides these three conventional divisions of the study of Economics, about which we want to know all the facts, modern economics has to include some of the basic problems facing the country for special studies.

For example, you might want to know why, or to what extent, some households in our society have the capacity to earn much more than others. You may want to know how many people in the country are really poor, how many are middle-class, how many are relatively rich and so on. You may want to know how many are illiterate and will not get jobs requiring education, how many are highly educated and will have the best job opportunities and so on. In other words, you may want to know more facts, in terms of numbers, that would answer questions about poverty and disparity in society. If you do not like the continuance of poverty and gross disparity and want to do something about the ills of society, you will need to know the facts about all these things before you can ask for appropriate actions by the government. If you know the facts, it may also be possible to plan your own life better. Similarly, you hear of (and some of you may even have experienced) disasters like the tsunami, earthquakes, the bird flu dangers threatening our country and so on, that affect man's ordinary business of life enormously. Economists can look at these things provided they know how to collect and put together, systematically and correctly, the facts about what these disasters cost. Think about it and ask yourselves: is it right that modern economics now includes learning the basic skills involved in making useful studies for measuring poverty, how incomes are distributed, how earning opportunities are related to your education, how environmental disasters affect our lives and so on?

Obviously, if you think along these lines, you will also appreciate why we needed Statistics (which is the study of numbers relating to selected facts in a systematic form) to be added to all modern courses of economics.

Would you now agree with the following definition of economics that many economists use?

Economics is the study of how people and society choose to employ scarce resources that could have alternative uses in order to produce various commodities that satisfy their wants and to distribute them for consumption among various persons and groups in society.

Activity

Would you say, in the light of the discussion above, that this definition, which used to be given, seems a little inadequate now? What does it miss out?

2. STATISTICS IN ECONOMICS

In the previous section you were told about certain special studies that concern the basic problems facing a country. These studies require that we know more about economic facts in terms of numbers. Such economic facts are also known as data.

The purpose of collecting data about these economic problems is to understand and explain these problems in terms of the various causes behind them. In other words, we try to analyse them. For example, when we analyse the hardships of poverty, we try to explain them in terms of various factors such as unemployment, low productivity of people, backward technology, etc.

But what purpose does the analysis of poverty serve unless we are able to find ways to mitigate it? We may, therefore, also try to find those measures that help solve an economic problem. In Economics, such measures are known as policies.

So, do you realise, then, that no analysis of a problem would be possible without the availability of data on the various factors underlying an economic problem? And that, in such a situation, no policies can be formulated to solve it? If yes, then you have, to a large extent, understood the basic relationship between Economics and Statistics.

3. WHAT IS STATISTICS?

At this stage you are probably ready to know more about Statistics. You might very well want to know what the subject Statistics is all about. What are its specific uses in Economics? Does it have any other meaning? Let us see how we can answer these questions to get closer to the subject.

In our daily language the word 'Statistics' is used in two distinct senses: singular and plural. In the plural sense, statistics means 'numerical facts systematically collected', as described by the Oxford Dictionary. Thus, the simple meaning of statistics in the plural sense is data.

Do you know that the term 'statistics' in the singular means 'the science of collecting, classifying and using statistics or a statistical fact'?

By data or statistics, we mean both quantitative and qualitative facts that are used in Economics. For example, a statement in Economics like 'the production of rice in India has increased from 39.58 million tonnes in 1974-75 to 58.64 million tonnes in 1984-85' is a quantitative fact. The numerical figures such as 39.58 million tonnes and 58.64 million tonnes are statistics of the production of rice in India for 1974-75 and 1984-85 respectively.
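Because the rice figures above are quantitative, ordinary arithmetic can be applied to them. As a minimal sketch in Python (our choice of language; the function name `percentage_change` is hypothetical, not from the text), the increase between 1974-75 and 1984-85 can be expressed both in million tonnes and as a percentage of the 1974-75 figure:

```python
def percentage_change(old: float, new: float) -> float:
    """Return the change from `old` to `new` as a percentage of `old`."""
    return (new - old) / old * 100


# Rice production in India (million tonnes), as quoted in the text
rice_1974_75 = 39.58
rice_1984_85 = 58.64

increase = rice_1984_85 - rice_1974_75               # absolute increase
pct = percentage_change(rice_1974_75, rice_1984_85)  # relative increase

print(f"Increase: {increase:.2f} million tonnes ({pct:.1f} per cent)")
```

Run as written, it prints `Increase: 19.06 million tonnes (48.2 per cent)`, i.e. production grew by almost half over the decade.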

In addition to quantitative data, Economics also uses qualitative data. The chief characteristic of such information is that it describes attributes of a single person or a group of persons that are important to record as accurately as possible even though they cannot be measured in quantitative terms. Take, for example, gender, which distinguishes a person as man/woman or boy/girl. It is often possible (and useful) to state the information about an attribute of a person in terms of degrees (like better/worse; sick/healthy/more healthy; unskilled/skilled/highly skilled etc.). Such qualitative information or statistics is often used in Economics and other social sciences and is collected and stored systematically, like quantitative information (on prices, incomes, taxes paid etc.), whether for a single person or a group of persons.

You will study in the subsequent chapters that statistics involves the collection and organisation of data. The next step is to present the data in tabular, diagrammatic and graphic forms. The data, then, are summarised by calculating various numerical indices such as mean, variance, standard deviation etc. that represent the broad characteristics of the collected set of information.

Activities

• Think of two examples of qualitative and quantitative data.
• Which of the following would give you qualitative data: beauty, intelligence, income earned, marks in a subject, ability to sing, learning skills?

4. WHAT DOES STATISTICS DO?

By now, you know that Statistics is an indispensable tool for an economist that helps him to understand an economic problem. Using its various methods, an effort is made to find the causes behind the problem with the help of its qualitative and quantitative facts. Once the causes of the problem are identified, it is easier to formulate certain policies to tackle it.

But there is more to Statistics. It enables an economist to present economic facts in a precise and definite form that helps in proper comprehension of what is stated. When economic facts are expressed in statistical terms, they become exact. Exact facts are more convincing than vague statements. For instance, saying, with a precise figure, that 310 people died in the recent earthquake in Kashmir is more factual and is, thus, statistical data, whereas saying that hundreds of people died is not.

Statistics also helps in condensing a mass of data into a few numerical measures (such as mean, variance etc., about which you will learn later). These numerical measures help summarise data. For example, it would be impossible for you to remember the incomes of all the people in a data set if the number of people is very large. Yet, one can easily remember a summary figure like the average income that is obtained statistically. In this way, Statistics summarises and presents meaningful overall information about a mass of data.
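The idea of condensing many numbers into a few can be sketched with Python's standard `statistics` module. The ten incomes below are invented purely for illustration; the sketch computes the mean together with the population variance and standard deviation (the module's `variance` and `stdev` functions would give the sample versions instead).

```python
import statistics

# Hypothetical monthly incomes (in rupees) of ten people; illustrative data only
incomes = [8000, 12000, 9500, 15000, 11000, 7000, 20000, 10500, 9000, 13000]

mean_income = statistics.mean(incomes)    # the 'average income'
variance = statistics.pvariance(incomes)  # mean squared deviation from the average
std_dev = statistics.pstdev(incomes)      # square root of the variance, in rupees

print(f"Average income: Rs {mean_income:,.0f}")
print(f"Variance: {variance:,.0f}")
print(f"Standard deviation: Rs {std_dev:,.0f}")
```

With these made-up figures, ten separate incomes reduce to an average of Rs 11,500 plus one number describing how widely they are spread around it.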

Quite often, Statistics is used in finding relationships between different economic factors. An economist may be interested in finding out what happens to the demand for a commodity when its price increases or decreases. Or, would the supply of a commodity be affected by changes in its own price? Or, would consumption expenditure increase when the average income increases? Or, what happens to the general price level when government expenditure increases? Such questions can be answered only if we know whether any relationship exists between the various economic factors stated above. Whether such relationships exist or not can be easily verified by applying statistical methods to their data. In some cases the economist might assume certain relationships between them and would like to test whether the assumption she/he made about the relationship is valid or not. The economist can do this only by using statistical techniques.
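One common statistical technique for checking whether two factors move together is the correlation coefficient (the contents list a separate chapter on Correlation). The Python sketch below computes Pearson's coefficient directly from its definition; the price and demand figures are hypothetical, made up only to show the calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a factor of n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical prices (Rs per kg) and quantities demanded (kg per week)
prices = [20, 25, 30, 35, 40]
quantities = [50, 44, 40, 33, 28]

r = pearson_r(prices, quantities)
print(f"r = {r:.3f}")
```

For these invented figures r comes out at about -0.998, close to -1: quantity demanded falls almost exactly in step with price. A coefficient like this measures only how closely two series move together; on its own it does not show that one causes the other.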

In another instance, the economist might be interested in predicting the changes in one economic factor due to the changes in another factor. For example, she/he might be interested in knowing the impact of today's investment on the national income in future. Such an exercise cannot be undertaken without the knowledge of Statistics.
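The text does not name a particular method, but one standard way to make such a prediction is to fit a straight line y = a + bx by ordinary least squares (simple linear regression) and read predictions off the line. The Python sketch below uses hypothetical figures for investment and later national income, chosen to lie exactly on a line so the result is easy to check; real economic data never fit this neatly.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx          # slope: change in y per unit change in x
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


# Hypothetical data (same arbitrary money units throughout):
# investment in a given year and national income some years later
investment = [10, 12, 15, 18, 20]
future_income = [110, 118, 130, 142, 150]

a, b = fit_line(investment, future_income)
predicted = a + b * 22  # predicted future income if investment were 22
print(f"Fitted line: y = {a:.1f} + {b:.1f}x; prediction for x = 22: {predicted:.1f}")
```

Here the fitted line is y = 70 + 4x, so an investment of 22 (in the same hypothetical units) predicts a future national income of 158.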

Sometimes, formulation of plans and policies requires the knowledge of future trends. For example, an economic planner has to decide in 2005 how much the economy should produce in 2010. In other words, one must know what the expected level of consumption in 2010 could be in order to decide the production plan of the economy for 2010. In this situation, one might make a subjective judgement based on a guess about consumption in 2010. Alternatively, one might use statistical tools to predict consumption in 2010. That could be based on the data of consumption of past years or of recent years obtained by surveys. Thus, statistical methods help formulate appropriate economic policies that solve economic problems.

5. CONCLUSION

Today, we increasingly use Statistics to analyse serious economic problems such as rising prices, growing population, unemployment, poverty etc., to find measures that can solve such problems. Further, it also helps evaluate the impact of such policies in solving the economic problems. For example, it can be ascertained easily using statistical techniques whether the policy of family planning is effective in checking the problem of ever-growing population.

In economic policies, Statistics plays a vital role in decision making. For example, in the present time of rising global oil prices, it might be necessary to decide how much oil India should import in 2010. The decision to import would depend on the expected domestic production of oil and the likely demand for oil in 2010. Without the use of Statistics, it cannot be determined what the expected domestic production of oil and the likely demand for oil would be. Thus, the decision to import oil cannot be made unless we know the actual requirement of oil. This vital information that helps make the decision to import oil can only be obtained statistically.

Statistical methods are no substitute for common sense!

There is an interesting story which is told to make fun of statistics. It is said that a family of four persons (husband, wife and two children) once set out to cross a river. The father knew the average depth of the river. So he calculated the average height of his family members. Since the average height of his family members was greater than the average depth of the river, he thought they could cross safely. Consequently, some members of the family (the children) drowned while crossing the river. Does the fault lie with the statistical method of calculating averages or with the misuse of the averages?

Recap

• Our wants are unlimited but the resources used in the production of goods that satisfy our wants are limited and scarce. Scarcity is the root of all economic problems.
• Resources have alternative uses.
• Purchase of goods by consumers to satisfy their various needs is Consumption.
• Manufacture of goods by producers for the market is Production.
• Division of the national income into wages, profits, rents and interests is Distribution.
• Statistics finds economic relationships using data and verifies them.
• Statistical tools are used in prediction of future trends.
• Statistical methods help analyse economic problems and formulate policies to solve them.

EXERCISES

1. Mark the following statements as true or false.
   (i) Statistics can only deal with quantitative data.
   (ii) Statistics solves economic problems.
   (iii) Statistics is of no use to Economics without data.

2. Make a list of activities that constitute the ordinary business of life. Are these economic activities?

3. The Government and policy makers use statistical data to formulate suitable policies of economic development. Illustrate with two examples.

4. You have unlimited wants and limited resources to satisfy them. Explain by giving two examples.

5. How will you choose the wants to be satisfied?

6. What are your reasons for studying Economics?

7. 'Statistical methods are no substitute for common sense.' Comment.

Chapter 2: Collection of Data

Studying this chapter should enable you to:
• understand the meaning and purpose of data collection;
• distinguish between primary and secondary sources;
• know the mode of collection of data;
• distinguish between Census and Sample Surveys;
• be familiar with the techniques of sampling;
• know about some important sources of secondary data.

1. INTRODUCTION

In the previous chapter, you read about what economics is. You also studied the role and importance of statistics in economics. In this chapter, you will study the sources of data and the mode of data collection. The purpose of collecting data is to collect evidence for reaching a sound and clear solution to a problem.

In economics, you often come across a statement like:

'After many fluctuations, the output of food grains rose to 176 million tonnes in 1990-91 and 199 million tonnes in 1996-97, but fell to 194 million tonnes in 1997-98. Production of food grains then rose continuously and touched 212 million tonnes in 2001-02.'

In this statement, you can observe that food grains production does not remain the same in different years. It varies from year to year and from crop to crop. As these values vary, they are called variables. Variables are generally represented by the letters X, Y or Z. The values of these variables are the observations. For example, suppose food grain production in India varies from 108 million tonnes in 1970-71 to 212 million tonnes in 2001-02, as shown in the following table. The years are represented by variable X and the production of food grain in India (in million tonnes) is represented by variable Y:

TABLE 2.1 Production of Food Grain in India

(Million Tonnes)

X          Y
1970-71    108
1978-79    132
1979-80    108
1990-91    176
1996-97    199
1997-98    194
2001-02    212

Here, these values of the variables X and Y are the data, from which we can obtain information about the trend of the production of food grains in India. To know the fluctuations in the output of food grains, we need the data on the production of food grains in India. Data is a tool that helps in understanding problems by providing information.
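The fluctuations described in the quoted statement can be read straight off Table 2.1 by comparing each observation of Y with the one before it. A minimal Python sketch using only the table's figures (the variable names are ours, chosen for illustration):

```python
# Food grain production in India (million tonnes), taken from Table 2.1:
# X is the year, Y is the production
table_2_1 = [
    ("1970-71", 108),
    ("1978-79", 132),
    ("1979-80", 108),
    ("1990-91", 176),
    ("1996-97", 199),
    ("1997-98", 194),
    ("2001-02", 212),
]

# Pair each observation with the next one and take the difference in Y
changes = [
    (prev_year, year, y - prev_y)
    for (prev_year, prev_y), (year, y) in zip(table_2_1, table_2_1[1:])
]

for start, end, diff in changes:
    if diff > 0:
        print(f"{start} to {end}: rose by {diff} million tonnes")
    elif diff < 0:
        print(f"{start} to {end}: fell by {-diff} million tonnes")
    else:
        print(f"{start} to {end}: no change")
```

It reports a rise of 24 million tonnes to 1978-79, a fall of 24 to 1979-80 and so on, including the fall of 5 million tonnes in 1997-98 mentioned in the quoted statement. The years in the table are not evenly spaced, so these are changes between listed observations, not annual changes.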

You must be wondering where data come from and how we collect them. In the following sections we will discuss the types of data, the methods and instruments of data collection and the sources of obtaining data.

2. WHAT ARE THE SOURCES OF DATA?

Statistical data can be obtained from two sources. The enumerator (the person who collects the data) may collect the data by conducting an enquiry or an investigation. Such data are called Primary Data, as they are based on first-hand information. Suppose you want to know about the popularity of a film star among school students. For this, you will have to enquire from a large number of school students, asking them questions to collect the desired information. The data you get are an example of primary data.

If the data have been collected and processed (scrutinised and tabulated) by some other agency, they are called Secondary Data. Generally, published data are secondary data. They can be obtained either from published sources or from any other source, for example, a web site. Thus, data are primary to the source that collects and processes them for the first time and secondary for all sources that later use such data. Use of secondary data saves time and cost. For example, after collecting the data on the popularity of the film star among students, you publish a report. If somebody uses the data collected by you for a similar study, they become secondary data.

3. HOW DO WE COLLECT THE DATA?

Do you know how a manufacturer decides about a product or how a political party decides about a candidate? They conduct a survey by asking a large group of people questions about a particular product or candidate. Surveys describe characteristics like price, quality and usefulness (in the case of the product) or popularity, honesty and loyalty (in the case of the candidate). The purpose of a survey is to collect data: a survey is a method of gathering information from individuals.

Preparation of Instrument

The most common type of instrument used in surveys is the questionnaire/interview schedule. The questionnaire is either self-administered by the respondent or administered by the researcher (enumerator) or a trained investigator. While preparing the questionnaire/interview schedule, you should keep in mind the following points:

• The questionnaire should not be too long. The number of questions should be as few as possible. Long questionnaires discourage people from completing them.

• The series of questions should move from general to specific. The questionnaire should start from general questions and proceed to more specific ones. This helps the respondents feel comfortable. For example:

  Poor Q: (i) Is increase in electricity charges justified?
          (ii) Is the electricity supply in your locality regular?

  Good Q: (i) Is the electricity supply in your locality regular?
          (ii) Is increase in electricity charges justified?

• The questions should be precise and clear. For example:

  Poor Q: What percentage of your income do you spend on clothing in order to look presentable?
  Good Q: What percentage of your income do you spend on clothing?

• The questions should not be ambiguous, so that the respondents can answer quickly, correctly and clearly. For example:

  Poor Q: Do you spend a lot of money on books in a month?
  Good Q: How much do you spend on books in a month?
          (i) Less than Rs 200
          (ii) Between Rs 200 and 300
          (iii) Between Rs 300 and 400
          (iv) More than Rs 400

• The question should not use double negatives. Questions starting with 'Wouldn't you' or 'Don't you' should be avoided, as they may lead to biased responses. For example:

  Poor Q: Don't you think smoking should be prohibited?
  Good Q: Do you think smoking should be prohibited?

• The question should not be a leading question, which gives a clue about how the respondent should answer. For example:

  Poor Q: How do you like the flavour of this high-quality tea?
  Good Q: How do you like the flavour of this tea?

• The question should not indicate alternatives to the answer. For example:

  Poor Q: Would you like to do a job after college or be a housewife?
  Good Q: Would you like to do a job, if possible?

The questionnaire may consist of closed-ended (or structured) questions or open-ended (or unstructured) questions.

Closed-ended or structured questions can be either two-way questions or multiple choice questions. When there are only two possible answers, 'yes' or 'no', it is called a two-way question.

When there is a possibility of more than two options of answers, multiple choice questions are more appropriate. For example,

Q. Why did you sell your land?
(i) To pay off the debts.
(ii) To finance children's education.
(iii) To invest in another property.
(iv) Any other (please specify).

Closed-ended questions are easy to use, score and code for analysis, because all the respondents respond from the given options. But they are difficult to write, as the alternatives should be clearly written to represent both sides of the issue. There is also a possibility that the individual's true response is not present among the options given. For this, the choice of 'Any Other' is provided, where the respondent can write a response which was not anticipated by the researcher. Moreover, another limitation of multiple-choice questions is that they tend to restrict the answers by providing alternatives, without which the respondents may have answered differently.
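Because every answer to a closed-ended question falls into one of the given options, the responses can be coded as numbers and tallied mechanically. The Python sketch below codes options (i) to (iv) of the land-sale question as 1 to 4 (an assumed coding scheme); the twelve responses are hypothetical, invented only to show the counting.

```python
from collections import Counter

# Assumed numeric codes for the options of "Why did you sell your land?"
OPTIONS = {
    1: "To pay off the debts",
    2: "To finance children's education",
    3: "To invest in another property",
    4: "Any other (please specify)",
}

# Hypothetical coded answers from 12 respondents (one code per respondent)
responses = [1, 2, 1, 3, 1, 4, 2, 1, 3, 2, 1, 4]

counts = Counter(responses)  # number of respondents choosing each code
total = len(responses)

for code, label in OPTIONS.items():
    share = counts[code] / total * 100
    print(f"({code}) {label}: {counts[code]} ({share:.0f}%)")
```

With these made-up responses, option (i) is chosen by 5 of the 12 respondents (about 42 per cent). Open-ended answers could not be counted this simply; they would first have to be read and grouped by hand.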

Open-ended questions allow for more individualised responses, but they are difficult to interpret and hard to score, since there are a lot of variations in the responses. For example,

Q. What is your view about globalisation?

Mode of Data Collection

Have you ever come across a television show in which reporters ask children, housewives or the general public questions about their examination performance, a brand of soap or a political party? The purpose of asking questions is to do a survey for the collection of data. There are three basic ways of collecting data: (i) Personal Interviews, (ii) Mailing (Questionnaire) Surveys, and (iii) Telephone Interviews.

Personal Interviews

This method is used when the researcher has access to all the members. The researcher (or investigator) conducts face-to-face interviews with the respondents.

Personal interviews are preferred for various reasons. Personal contact is made between the respondent and the interviewer. The interviewer has the opportunity of explaining the study and answering any query of the respondents. The interviewer can request the respondent to expand on answers that are particularly important. Misinterpretation and misunderstanding can be avoided. Watching the reactions of the respondents can provide supplementary information.

Personal interviews have some demerits too. They are expensive, as they require trained interviewers. They take a longer time to complete the survey. The presence of the researcher may inhibit respondents from saying what they really think.

Mailing Questionnaire

When the data in a survey are collected by mail, the questionnaire is sent to each individual by mail with a request to complete and return it by a given date. The advantages of this method are that it is less expensive. It allows the researcher to have access to people in remote areas too, who might be difficult to reach in person or by telephone. It does not allow the interviewer to influence the respondents. It also permits the respondents to take sufficient time to give thoughtful answers to the questions. These days online surveys, or surveys through the short messaging service (SMS), have become popular. Do you know how an online survey is conducted?

The disadvantages of a mail survey are that there is less opportunity to provide assistance in clarifying instructions, so there is a possibility of misinterpretation of questions. Mailing is also likely to produce low response rates due to factors such as returning the questionnaire without completing it, not returning the questionnaire at all, loss of the questionnaire in the mail itself, etc.
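A response rate of this kind is simply the share of questionnaires that come back usable. A minimal Python sketch, assuming hypothetical figures for one mail survey and counting only fully completed returns as usable (one reasonable convention; surveys define this in different ways):

```python
def response_rate(sent: int, usable: int) -> float:
    """Usable questionnaires returned, as a percentage of those mailed out."""
    if sent <= 0:
        raise ValueError("number of questionnaires sent must be positive")
    return 100 * usable / sent


# Hypothetical mail survey (all figures invented for illustration)
sent = 500
returned = 180             # questionnaires that came back at all
returned_incomplete = 30   # came back without being completed
usable = returned - returned_incomplete

rate = response_rate(sent, usable)
print(f"Usable response rate: {rate:.1f}%")
```

Here only 150 of the 500 questionnaires are usable, a response rate of 30 per cent, even though 180 came back.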

Telephone Interviews

In a telephone interview, the investigator asks questions over the telephone. The advantages of telephone interviews are that they are cheaper than personal interviews and can be conducted in a shorter time. They allow the researcher to assist the respondent by clarifying the questions. A telephone interview is better in cases where the respondents are reluctant to answer certain questions in personal interviews.

The disadvantage of this method is access to people, as many people may not own telephones. Telephone interviews also make it impossible to observe the respondents' visual reactions, which are helpful in obtaining information on sensitive issues.

Advantages and disadvantages of the three modes of data collection:

Personal Interviews
  Advantages: Highest response rate; allows use of all types of questions; better for using open-ended questions; allows clarification of ambiguous questions.
  Disadvantages: Most expensive; possibility of influencing respondents; more time-consuming.

Mailing Questionnaire
  Advantages: Least expensive; only method to reach remote areas; no influence on respondents; maintains anonymity of respondents; best for sensitive questions.
  Disadvantages: Cannot be used by illiterates; long response time; does not allow explanation of ambiguous questions; reactions cannot be watched.

Telephone Interviews
  Advantages: Relatively low cost; relatively less influence on respondents; relatively high response rate.
  Disadvantages: Limited use; reactions cannot be watched; possibility of influencing respondents.

Activities

• You have to collect information from a person who lives in a remote village of India. Which mode of data collection will be the most appropriate for collecting information from him?
• You have to interview parents about the quality of teaching in a school. If the principal of the school is present there, what types of problems can arise?

Pilot Survey

Once the questionnaire is ready, it is advisable to conduct a try-out with a small group, which is known as a Pilot Survey or Pre-Testing of the questionnaire. The pilot survey helps in providing a preliminary idea about the survey. It helps in pre-testing the questionnaire, so as to know the shortcomings and drawbacks of the questions. A pilot survey also helps in assessing the suitability of questions, clarity of instructions, performance of enumerators and the cost and time involved in the actual survey.

4. CENSUS AND SAMPLE SURVEYS

Census or Complete Enumeration

A survey which includes every element of the population is known as a Census or the Method of Complete Enumeration. If certain agencies are interested in studying the total population in India, they have to obtain information from all the households in rural and urban India.


The essential feature of this methodis that this covers every individual unitin the entire population. You cannotselect some and leave out others. Youmay be familiar with the Census ofIndia, which is carried out every tenyears. A house-to-house enquiry iscarried out, covering all householdsin India. Demographic data on birthand death rates, literacy, workforce,life expectancy, size and compositionof population, etc. are collected andpublished by the Registrar General ofIndia. The last Census of India washeld in February 2001.

According to the Census 2001,population of India is 102.70 crore. Itwas 23.83 crore according to Census1901. In a period of hundred years,the population of our countryincreased by 78.87 crore. Census

1981 indicated that the rate ofpopulation growth during 1960s and1970s remained almost same. 1991Census indicated that the annualgrowth rate of population during1980s was 2.14 per cent, which camedown to 1.93 per cent during 1990saccording to Census 2001.

At 00.00 hours of 1 March 2001, the population of India stood at 1,027,015,247, comprising 531,277,078 males and 495,738,169 females. Thus, India became the second country in the world after China to cross the one billion mark.

Source: Census of India, 2001.

Sample Survey

Population or the Universe in statistics means the totality of the items under study. Thus, the Population or the Universe is the group to which the results of the study are intended to apply. A population always consists of all the individuals/items who possess certain characteristics (or a set of characteristics), according to the purpose of the survey. The first task in selecting a sample is to identify the population. Once the population is identified, the researcher selects a Representative Sample, as it is difficult to study the entire population. A sample refers to a group or section of the population from which information is to be obtained. A good sample (representative sample) is generally smaller than the population and is capable of providing reasonably accurate information about the population at a much lower cost and in a shorter time.

Suppose you want to study the average income of people in a certain region. According to the Census method, you would be required to find out the income of every individual in the region, add them up and divide by the number of individuals to get the average income of people in the region. This method would require huge expenditure, as a large number of enumerators have to be employed. Alternatively, you select a representative sample of a few individuals from the region and find out their income. The average income of the selected group of individuals is used as an estimate of the average income of the individuals of the entire region.

Example

Research problem: To study the economic condition of agricultural labourers in Churachandpur district of Manipur.
Population: All agricultural labourers in Churachandpur district.
Sample: Ten per cent of the agricultural labourers in Churachandpur district.

Most surveys are sample surveys. These are preferred in statistics for a number of reasons. A sample can provide reasonably reliable and accurate information at a lower cost and in a shorter time. As samples are smaller than the population, more detailed information can be collected by conducting intensive enquiries. As we need a smaller team of enumerators, it is easier to train them and supervise their work more effectively.

Now the question is: how do you do the sampling? There are two main types of sampling, random and non-random. The following description will make the distinction between them clear.

Activities

In which years will the next Census be held in India and China?

If you have to study the opinion of students about the new economics textbook of class XI, what will be your population and sample?

If a researcher wants to estimate the average yield of wheat in Punjab, what will be her/his population and sample?

Random Sampling

As the name suggests, random sampling is one where the individual units of the sample are selected from the population at random. Suppose the government wants to determine the impact of the rise in petrol price on the household budget of a particular locality. For this, a representative (random) sample of 30 households has to be taken and studied. The names of all the 300 households of that area are written on pieces of paper and mixed well, then the 30 names to be interviewed are selected one by one.

In random sampling, every individual has an equal chance of being selected, and the individuals who are selected are just like the ones who are not selected. In the above example, each of the 300 sampling units in the list of the population (this list is called the sampling frame) had an equal chance of being included in the sample of 30 units, and hence the sample so drawn is a random sample. This is also called the lottery method. The same could also be done using a Random Number Table.
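As a rough illustration of the lottery method, and of how a sample average stands in for the census average described earlier, here is a minimal Python sketch. It is not from the original text: `random.Random.sample` plays the part of drawing well-mixed slips of paper one by one without putting them back, and the monthly petrol expenditures are hypothetical numbers generated only for this example.

```python
import random
from statistics import mean

def lottery_sample(frame, n, rng):
    """Draw n distinct units from the sampling frame, each with an equal chance,
    like picking n slips of paper one by one without putting them back."""
    return rng.sample(frame, n)

rng = random.Random(2001)  # fixed seed so the draw can be repeated

# Sampling frame: serial numbers 1..300 of the households in the locality.
frame = list(range(1, 301))

# Hypothetical data: monthly petrol expenditure (Rs) of each household.
expenditure = {h: rng.randint(500, 3000) for h in frame}

chosen = lottery_sample(frame, 30, rng)

census_average = mean(expenditure[h] for h in frame)    # needs all 300 households
sample_estimate = mean(expenditure[h] for h in chosen)  # needs only 30 households

print(sorted(chosen))
print(f"census average = {census_average:.2f}, sample estimate = {sample_estimate:.2f}")
```

Because every household in `frame` has the same chance of being drawn, the sample average neither overstates nor understates the census average on average; with a different seed, the 30 households, and hence the estimate, would change.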

How to use the Random Number Tables?

Do you know what Random Number Tables are? Random number tables have been generated to guarantee equal probability of selection of every individual unit (by its listed serial number in the sampling frame) in the population. They are available either in published form or can be generated using appropriate software packages (see Appendix B). You can start using the table from anywhere, i.e., from any page, column, row or point. In the above example, you need to select a sample of 30 households out of 300 total households. Here, the largest serial number is 300, a three-digit number, and therefore we consult three-digit random numbers in sequence. We skip 000 and any random number greater than 300, since there is no such household number, and we also skip any number that has already been selected, since a household cannot be included twice. Suppose the three-digit numbers read in sequence are: 149, 219, 111, 165, 230, 007, 089, 212, 051, 244, 300, 051, 244, 155, 300, 051, 152, 156, 205, 070, 015, 157, 040, 243, 479, 116, 122, 081, 160, 162. Skipping 479 and the repeats of 051, 244 and 300 leaves 25 selected households: 149, 219, 111, 165, 230, 007, 089, 212, 051, 244, 300, 155, 152, 156, 205, 070, 015, 157, 040, 243, 116, 122, 081, 160, 162. Further numbers are read from the table in the same way until 30 households have been selected.
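The skipping rule for reading a random number table can be sketched in Python. The helper `select_from_table` is hypothetical, not a standard function; the input list is the sequence of three-digit numbers quoted in the text, with leading zeros dropped (007 becomes 7) because Python does not allow leading zeros in integer literals.

```python
def select_from_table(numbers, population_size, sample_size):
    """Pick serial numbers in the order they are read from a random number table,
    skipping numbers outside 1..population_size and numbers already chosen."""
    selected = []
    for x in numbers:
        if len(selected) == sample_size:
            break                      # the sample is complete
        if not 1 <= x <= population_size:
            continue                   # no household has this serial number
        if x in selected:
            continue                   # a household cannot be chosen twice
        selected.append(x)
    return selected

# Three-digit numbers as quoted in the text, read in sequence from the table.
readings = [149, 219, 111, 165, 230, 7, 89, 212, 51, 244, 300, 51, 244, 155,
            300, 51, 152, 156, 205, 70, 15, 157, 40, 243, 479, 116, 122, 81,
            160, 162]

chosen = select_from_table(readings, population_size=300, sample_size=30)
print(len(chosen), chosen)
```

Applied to those 30 readings, the rule keeps only 25 distinct serial numbers, which is why more numbers must be read from the table to complete a sample of 30.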

Exit Polls

You must have seen that when an election takes place, the television networks provide election coverage. They also try to predict the results. This is done through exit polls, wherein a random sample of voters who exit the polling booths are asked whom they voted for. From the data of the sample of voters, the prediction is made.
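A toy simulation of an exit poll shows how the share found in the sample becomes the prediction. Everything here is a made-up assumption for illustration: 10,000 voters, 5,500 of whom voted for a party labelled "A", and a random sample of 500 voters interviewed as they leave the booths.

```python
import random

rng = random.Random(7)  # fixed seed for a repeatable illustration

# Hypothetical electorate: 10,000 voters, 5,500 of whom voted for party "A".
votes = ["A"] * 5500 + ["B"] * 4500

# Exit poll: interview a random sample of 500 voters leaving the booths.
poll = rng.sample(votes, 500)
share_a = poll.count("A") / len(poll)

print(f"Estimated share of party A: {share_a:.1%} (true share: 55.0%)")
```

The estimate will usually be close to, but not exactly equal to, the true share; the gap is a sampling error of the kind discussed in the next section.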

[Figure: A population of 20 kuchha and 20 pucca houses, with a representative sample and a non-representative sample drawn from it.]


Activity

You have to analyse the trend of foodgrains production in India for the last fifty years. As it is difficult to include all the years, you have to select a sample of production of ten years. Using the Random Number Tables, how will you select your sample?

Non-Random Sampling

There may be a situation where you have to select 10 out of 100 households in a locality. You have to decide which households to select and which to reject. You may select the households that are conveniently situated, or the households known to you or your friend. In this case, you are using your judgement (bias) in selecting the 10 households. This way of selecting 10 out of 100 households is not a random selection. In a non-random sampling method, all the units of the population do not have an equal chance of being selected, and the convenience or judgement of the investigator plays an important role in the selection of the sample. Such samples are mainly selected on the basis of judgement, purpose, convenience or quota, and are called non-random samples.

5. SAMPLING AND NON-SAMPLING ERRORS

Sampling Errors

The purpose of a sample is to obtain an estimate of a characteristic of the population. Sampling error refers to the difference between the sample estimate and the actual value of a characteristic of the population (for example, the average income). It is the error that occurs when you make an observation from a sample taken from the population. Thus, the difference between the actual value of a parameter of the population (which is not known) and its estimate (from the sample) is the sampling error. It is possible to reduce the magnitude of sampling error by taking a larger sample.

Example

Consider a case of incomes of 5 farmers of Manipur. The variable x (income of farmers) has measurements 500, 550, 600, 650, 700. We note that the population average is

$$\frac{500 + 550 + 600 + 650 + 700}{5} = \frac{3000}{5} = 600.$$

Now, suppose we select a sample of two individuals where x has measurements of 500 and 600. The sample average is

$$\frac{500 + 600}{2} = \frac{1100}{2} = 550.$$

Here, the sampling error of the estimate is $600 \text{ (true value)} - 550 \text{ (estimate)} = 50$.
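The sampling error in this example can be checked, and the claim that a larger sample reduces it can be illustrated, by listing every possible sample of each size from the five incomes. This is a small Python sketch; `mean_abs_sampling_error` is a hypothetical helper that averages the absolute sampling error over all samples of size n drawn without replacement.

```python
from itertools import combinations
from statistics import mean

incomes = [500, 550, 600, 650, 700]   # incomes of the 5 farmers
true_average = mean(incomes)          # population average: 600

# Sampling error of the particular sample in the text, (500, 600).
print("error of sample (500, 600):", true_average - mean([500, 600]))

def mean_abs_sampling_error(values, n):
    """Average of |population average - sample average| over every sample of size n."""
    population_average = mean(values)
    errors = [abs(population_average - mean(s)) for s in combinations(values, n)]
    return mean(errors)

for n in range(1, len(incomes) + 1):
    print(n, round(mean_abs_sampling_error(incomes, n), 2))
```

For these five incomes the average absolute error falls from 60 (n = 1) to 35 (n = 2), about 23.33 (n = 3), 15 (n = 4) and 0 when the sample is the whole population (n = 5).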

Non-Sampling Errors

Non-sampling errors are more serious than sampling errors because a sampling error can be minimised by taking a larger sample, whereas it is difficult to minimise non-sampling error, even by taking a large sample. Even a Census can contain non-sampling errors. Some of the non-sampling errors are:


Errors in Data Acquisition

This type of error arises from the recording of incorrect responses. Suppose the teacher asks the students to measure the length of the teacher's table in the classroom. The measurements by the students may differ. The differences may occur due to differences in measuring tapes, carelessness of the students, etc. Similarly, suppose we want to collect data on prices of oranges. We know that prices vary from shop to shop and from market to market. Prices also vary according to quality. Therefore, we can only consider the average prices. Recording mistakes can also take place, as the enumerators or the respondents may commit errors in recording or transcribing the data; for example, he/she may record 13 instead of 31.

Non-Response Errors

Non-response occurs if an interviewer is unable to contact a person listed in the sample, or a person from the sample refuses to respond. In this case, the sample observations may not be representative.

Sampling Bias

Sampling bias occurs when the sampling plan is such that some members of the target population could not possibly be included in the sample.
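A larger sample does not cure a non-sampling error such as sampling bias. The sketch below uses a made-up population in which half the households have no telephone and lower incomes, together with a biased sampling plan that selects only from a telephone directory. All figures are hypothetical; the point is that the estimate stays well above the true average whatever the sample size.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed for a repeatable illustration

# Hypothetical population of 1,000 households (monthly income in Rs).
without_phone = [rng.randint(1000, 3000) for _ in range(500)]
with_phone = [rng.randint(5000, 7000) for _ in range(500)]
population = without_phone + with_phone
true_average = mean(population)

# Biased sampling plan: the list used for selection is a telephone directory,
# so households without a telephone can never enter the sample.
frame = with_phone

estimates = {}
for n in (10, 100, 500):
    estimates[n] = mean(rng.sample(frame, n))
    print(f"n = {n:3d}: estimate = {estimates[n]:.0f}, true average = {true_average:.0f}")
```

Even when n = 500 covers the whole directory, the estimate describes only households with a telephone, so the error remains.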

6. CENSUS OF INDIA AND NSSO

There are some agencies, both at the national and state level, which collect, process and tabulate statistical data. Some of the major agencies at the national level are the Census of India, National Sample Survey Organisation (NSSO), Central Statistical Organisation (CSO), Registrar General of India (RGI), Directorate General of Commercial Intelligence and Statistics (DGCIS), Labour Bureau, etc.

The Census of India provides the most complete and continuous demographic record of the population. The Census has been conducted regularly every ten years since 1881. The first Census after Independence was held in 1951. The Census collects information on various aspects of population such as size, density, sex ratio, literacy, migration, rural-urban distribution, etc. Census in India is not merely a statistical operation; the data are interpreted and analysed in an interesting manner.

The NSSO was established by the Government of India to conduct nation-wide surveys on socio-economic issues. The NSSO carries out continuous surveys in successive rounds. The data collected by NSSO surveys on different socio-economic subjects are released through reports and its quarterly journal Sarvekshana. NSSO provides periodic estimates of literacy, school enrolment, utilisation of educational services, employment, unemployment, manufacturing and service sector enterprises, morbidity, maternity, child care, utilisation of the public distribution system, etc. The NSS 59th round survey (January-December 2003) was on land and livestock holdings, debt and investment. The NSS 60th round survey (January-June 2004) was on morbidity and health care. The NSSO also undertakes the fieldwork of the Annual Survey of Industries, conducts crop estimation surveys, and collects rural and urban retail prices for the compilation of consumer price index numbers.

7. CONCLUSION

Economic facts, expressed in terms of numbers, are called data. The purpose of data collection is to understand, explain and analyse a problem and the causes behind it. Primary data is obtained by conducting a survey. A survey includes various steps, which need to be planned carefully. There are various agencies which collect, process, tabulate and publish statistical data. These can be used as secondary data. However, the choice of source of data and mode of data collection depends on the objective of the study.

Recap

Data is a tool which helps in reaching a sound conclusion on any problem by providing information.
Primary data is based on first-hand information.
Survey can be done by personal interviews, mailing questionnaires and telephone interviews.
Census covers every individual/unit belonging to the population.
Sample is a smaller group selected from the population from which the relevant information would be sought.
In random sampling, every individual is given an equal chance of being selected for providing information.
Sampling error arises due to the difference between the actual population value and its estimate from the sample.
Non-sampling errors can arise in data acquisition, by non-response or by bias in selection.
Census of India and National Sample Survey Organisation are two important agencies at the national level, which collect, process and tabulate data.

EXERCISES

1. Frame at least four appropriate multiple-choice options for the following questions:
(i) Which of the following is the most important when you buy a new dress?


(ii) How often do you use computers?
(iii) Which of the newspapers do you read regularly?
(iv) Rise in the price of petrol is justified.
(v) What is the monthly income of your family?

2. Frame five two-way questions (with Yes or No).

3. (i) There are many sources of data (true/false).
(ii) Telephone survey is the most suitable method of collecting data, when the population is literate and spread over a large area (true/false).
(iii) Data collected by investigator is called the secondary data (true/false).
(iv) There is a certain bias involved in the non-random selection of samples (true/false).
(v) Non-sampling errors can be minimised by taking large samples (true/false).

4. What do you think about the following questions? Do you find any problem with these questions? If yes, how?
(i) How far do you live from the closest market?
(ii) If plastic bags are only 5 percent of our garbage, should it be banned?
(iii) Wouldn't you be opposed to increase in price of petrol?
(iv) (a) Do you agree with the use of chemical fertilizers?
(b) Do you use fertilizers in your fields?
(c) What is the yield per hectare in your field?

5. You want to research the popularity of Vegetable Atta Noodles among children. Design a suitable questionnaire for collecting this information.

6. In a village of 200 farms, a study was conducted to find the cropping pattern. Out of the 50 farms surveyed, 50% grew only wheat. Identify the population and the sample here.

7. Give two examples each of sample, population and variable.

8. Which of the following methods gives better results and why?
(a) Census (b) Sample

9. Which of the following errors is more serious and why?
(a) Sampling error (b) Non-Sampling error

10. Suppose there are 10 students in your class. You want to select three out of them. How many samples are possible?

11. Discuss how you would use the lottery method to select 3 students out of 10 in your class.

12. Does the lottery method always give you a random sample? Explain.

13. Explain the procedure of selecting a random sample of 3 students out of 10 in your class, by using random number tables.

14. Do samples provide better results than surveys? Give reasons for your answer.

CHAPTER 3

Organisation of Data

Studying this chapter should enable you to:
classify the data for further statistical analysis;
distinguish between quantitative and qualitative classification;
prepare a frequency distribution table;
know the technique of forming classes;
be familiar with the method of tally marking;
differentiate between univariate and bivariate frequency distributions.

1. INTRODUCTION

In the previous chapter you learnt how data are collected. You also came to know the difference between census and sampling. In this chapter, you will learn how the data that you have collected are classified. The purpose of classifying raw data is to bring order in them so that they can be subjected to further statistical analysis easily.

Have you ever observed your local junk dealer or kabadiwallah, to whom you sell old newspapers, broken household items, empty glass bottles, plastics, etc.? He purchases these things from you and sells them to those who recycle them. But with so much junk in his shop, it would be very difficult for him to manage his trade if he had not organised it properly. To ease his situation, he suitably groups or classifies the various junk. He puts old newspapers together and ties them with a rope. Then he collects all empty glass bottles in a sack. He heaps the articles of metal in one corner of his shop and sorts them into groups like iron, copper, aluminium, brass and so on. In this way he groups his junk into different classes (newspapers, plastics, glass, metals, etc.) and brings order to them. Once his junk is arranged and classified, it becomes easier for him to find a particular item that a buyer may demand.

Likewise, when you arrange your schoolbooks in a certain order, it becomes easier for you to handle them. You may classify them according to subjects, where each subject becomes a group or a class. So, when you need a particular book on history, for instance, all you need to do is to search for that book in the group History. Otherwise, you would have to search through your entire collection to find the particular book you are looking for.

While classification of objects or things saves our valuable time and effort, it is not done in an arbitrary manner. The kabadiwallah groups his junk in such a way that each group consists of similar items. For example, under the group Glass he would put empty bottles, broken mirrors, windowpanes, etc. Similarly, when you classify your history books under the group History, you would not put a book of a different subject in that group. Otherwise the entire purpose of grouping would be lost. Classification, therefore, is arranging or organising similar things into groups or classes.

Activity

Visit your local post-office to find out how letters are sorted. Do you know what the pin-code in a letter indicates? Ask your postman.

2. RAW DATA

Like the kabadiwallah's junk, unclassified data or raw data are highly disorganised. They are often very large and cumbersome to handle. Drawing meaningful conclusions from them is a tedious task because they do not yield to statistical methods easily. Therefore, proper organisation and presentation of such data is needed before any systematic statistical analysis is undertaken. Hence, after collecting data, the next step is to organise and present them in a classified form.

Suppose you want to know the performance of students in mathematics and you have collected data on marks in mathematics of 100 students of your school. If you present them as a table, they may appear something like Table 3.1.

TABLE 3.1
Marks in Mathematics Obtained by 100 Students in an Examination

47 45 10 60 51 56 66 100 49 40
60 59 56 55 62 48 59 55 51 41
42 69 64 66 50 59 57 65 62 50
64 30 37 75 17 56 20 14 55 90
62 51 55 14 25 34 90 49 56 54
70 47 49 82 40 82 60 85 65 66
49 44 64 69 70 48 12 28 55 65
49 40 25 41 71 80 0 56 14 22
66 53 46 70 43 61 59 12 30 35
45 44 57 76 82 39 32 14 90 25

Or you could have collected data on the monthly expenditure on food of 50 households in your neighbourhood to know their average expenditure on food. The data collected in that case, had you presented them as a table, would have resembled Table 3.2.

TABLE 3.2
Monthly Household Expenditure (in Rupees) on Food of 50 Households

1904 1559 3473 1735 2760
2041 1612 1753 1855 4439
5090 1085 1823 2346 1523
1211 1360 1110 2152 1183
1218 1315 1105 2628 2712
4248 1812 1264 1183 1171
1007 1180 1953 1137 2048
2025 1583 1324 2621 3676
1397 1832 1962 2177 2575
1293 1365 1146 3222 1396

Both Tables 3.1 and 3.2 are raw or unclassified data. In both tables you find that the numbers are not arranged in any order. Now if you are asked what the highest marks in mathematics are in Table 3.1, you first have to arrange the marks of 100 students either in ascending or in descending order. That is a tedious task. It becomes more tedious if, instead of 100, you have the marks of 1,000 students to handle. Similarly, in Table 3.2, you would note that it is difficult for you to ascertain the average monthly expenditure of the 50 households. And this difficulty will go up manifold if the number were larger, say, 5,000 households. Like our kabadiwallah, who would be distressed to find a particular item when his junk becomes large and disarranged, you would face a similar situation when you try to get any information from raw data that are large. In a word, therefore, it is a tedious task to pull information from large unclassified data.
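Finding the highest value or the average in raw data is exactly the kind of chore a short program removes. The Python sketch below enters Table 3.2 as a list, sorts it into ascending order and computes the arithmetic mean with `statistics.mean`; it only illustrates the arithmetic and is not a method prescribed by the text.

```python
from statistics import mean

# Table 3.2: monthly household expenditure (Rs) on food of 50 households.
expenditure = [
    1904, 1559, 3473, 1735, 2760,
    2041, 1612, 1753, 1855, 4439,
    5090, 1085, 1823, 2346, 1523,
    1211, 1360, 1110, 2152, 1183,
    1218, 1315, 1105, 2628, 2712,
    4248, 1812, 1264, 1183, 1171,
    1007, 1180, 1953, 1137, 2048,
    2025, 1583, 1324, 2621, 3676,
    1397, 1832, 1962, 2177, 2575,
    1293, 1365, 1146, 3222, 1396,
]

ordered = sorted(expenditure)   # arrange in ascending order
average = mean(expenditure)     # total expenditure / number of households

print("lowest:", ordered[0], "highest:", ordered[-1])
print("average:", average)
```

For Table 3.2 this gives a lowest expenditure of Rs 1007, a highest of Rs 5090 and an average of Rs 1951.78.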

The raw data are summarised, and made comprehensible, by classification. When facts of similar characteristics are placed in the same class, it enables one to locate them easily, make comparisons, and draw inferences without any difficulty. You have studied in Chapter 2 that the Government of India conducts a Census of population every ten years. The raw data of the census are so large and fragmented that it appears an almost impossible task to draw any meaningful conclusion from them. But when the data of the Census are classified according to gender, education, marital status, occupation, etc., the structure and nature of the population of India is then easily understood.

The raw data consist of observations on variables. Each unit of raw data is an observation. In Table 3.1, an observation shows a particular value of the variable "marks of a student in mathematics". The raw data contain 100 observations on marks of a student, since there are 100 students. In Table 3.2, an observation shows a particular value of the variable "monthly expenditure of a household on food". The raw data in it contain 50 observations on monthly expenditure on food of a household, because there are 50 households.

Activity

Collect data of the total weekly expenditure of your family for a year and arrange it in a table. See how many observations you have. Arrange the data monthly and find the number of observations.

3. CLASSIFICATION OF DATA

Classification into groups or classes can be done in various ways. Instead of classifying your books according to subjects (History, Geography, Mathematics, Science, etc.), you could have classified them author-wise in alphabetical order. Or, you could have also classified them according to the year of publication. The way you want to classify them would depend on your requirement.

Likewise, the raw data could be classified in various ways depending on the purpose in hand. They can be grouped according to time. Such a classification is known as a Chronological Classification. In such a classification, data are classified either in ascending or in descending order with reference to time such as years, quarters, months, weeks, etc. The following example shows the population of India classified in terms of years. The variable population is a Time Series as it depicts a series of values for different years.

Example 1

Population of India (in crores)

Year    Population (crores)
1951    35.7
1961    43.8
1971    54.6
1981    68.4
1991    81.8
2001    102.7

In Spatial Classification the data are classified with reference to geographical locations such as countries, states, cities, districts, etc. Example 2 shows the yield of wheat in different countries.


Example 2
Yield of Wheat for Different Countries

Country    Yield of wheat (kg/acre)
America    1925
Brazil     127
China      893
Denmark    225
France     439
India      862

Activities

In the time series of Example 1, in which year do you find the population of India to be the minimum? Find the year when it is the maximum.

In Example 2, find the country whose yield of wheat is slightly more than that of India. How much would that be in terms of percentage?

Arrange the countries of Example 2 in the ascending order of yield. Do the same exercise for the descending order of yield.

Sometimes you come across characteristics that cannot be expressed quantitatively. Such characteristics are called Qualities or Attributes. Examples are nationality, literacy, religion, gender, marital status, etc. They cannot be measured. Yet these attributes can be classified on the basis of either the presence or the absence of a qualitative characteristic. Such a classification of data on attributes is called a Qualitative Classification. In the following example, we find the population of a country grouped on the basis of the qualitative variable gender. An observation could be either a male or a female. These two characteristics could be further classified on the basis of marital status (a qualitative variable) as given below:

Example 3

Population
    Male
        Married
        Unmarried
    Female
        Married
        Unmarried
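The two-stage grouping of Example 3 can be sketched with Python's `collections.Counter`. The ten records below are hypothetical; the first count groups them by gender alone, and the second by the pair (gender, marital status), which corresponds to the second stage of the classification.

```python
from collections import Counter

# Hypothetical records: (gender, marital status) of ten persons.
people = [
    ("Male", "Married"), ("Female", "Unmarried"), ("Male", "Unmarried"),
    ("Female", "Married"), ("Female", "Married"), ("Male", "Married"),
    ("Female", "Unmarried"), ("Male", "Married"), ("Female", "Married"),
    ("Male", "Unmarried"),
]

# First stage: classify by gender (male or not male, i.e. female).
by_gender = Counter(gender for gender, _ in people)

# Second stage: within each gender, classify by marital status.
by_gender_and_status = Counter(people)

print(by_gender)
for (gender, status), count in sorted(by_gender_and_status.items()):
    print(gender, status, count)
```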

The classification at the first stage is based on the presence or absence of an attribute, i.e., male or not male (female). At the second stage, each class, male and female, is further subdivided on the basis of the presence or absence of another attribute, i.e., whether married or unmarried. On the other hand, characteristics like height, weight, age, income, marks of students, etc. are quantitative in nature. When the collected data of such characteristics are grouped into classes, the classification is a Quantitative Classification.

Activity

The objects around us can be grouped as either living or non-living. Is it a quantitative classification?

Example 4
Frequency Distribution of Marks in Mathematics of 100 Students

Marks     Frequency
0-10      1
10-20     8
20-30     6
30-40     7
40-50     21
50-60     23
60-70     19
70-80     6
80-90     5
90-100    4

Total     100

Example 4 shows a quantitative classification of the data on marks in mathematics of 100 students given in Table 3.1, in the form of a Frequency Distribution.

Activity

Express the values of frequency of Example 4 as a proportion or percentage of the total frequency. Note that frequency expressed in this way is known as relative frequency.

In Example 4, which class has the maximum concentration of data? Express it as a percentage of total observations. Which class has the minimum concentration of data?

4. VARIABLES: CONTINUOUS AND DISCRETE

A simple definition of variable, which you read in the last chapter, does not tell you how it varies. Different variables vary differently and, depending on the way they vary, they are broadly classified into two types:

(i) Continuous and
(ii) Discrete.

A continuous variable can take any numerical value within its range. It may take integral values (1, 2, 3, 4, ...), fractional values (1/2, 2/3, 3/4, ...), and values that are not exact fractions ($\sqrt{2} \approx 1.414$, $\sqrt{3} \approx 1.732$, ..., $\sqrt{7} \approx 2.646$). For example, the height of a student, as he/she grows from, say, 90 cm to 150 cm, would take all the values in between them. It can take values that are whole numbers like 90 cm, 100 cm, 108 cm, 150 cm. It can also take fractional values like 90.85 cm, 102.34 cm, 149.99 cm, etc. that are not whole numbers. Thus the variable height is capable of manifesting in every conceivable value, and its values can also be broken down into infinite gradations. Other examples of a continuous variable are weight, time, distance, etc.

Unlike a continuous variable, a discrete variable can take only certain values. Its value changes only by finite jumps. It jumps from one value to another but does not take any intermediate value between them. For example, a variable like the number of students in a class, for different classes, would assume values that are only whole numbers. It cannot take any fractional value like 0.5, because half of a student is absurd. Therefore it cannot take a value like 25.5 between 25 and 26. Instead, its value could have been either 25 or 26. What we observe is that as its value changes from 25 to 26, the values in between them (the fractions) are not taken by it. But do not have the impression that a discrete variable cannot take any fractional value. Suppose X is a variable that takes values like 1/8, 1/16, 1/32, 1/64, ... Is it a discrete variable? Yes, because though X takes fractional values, it cannot take any value between two adjacent fractional values. It changes or jumps from 1/8 to 1/16 and from 1/16 to 1/32, but cannot take a value in between 1/8 and 1/16 or between 1/16 and 1/32.

Activity

Distinguish the following variables as continuous and discrete: area, volume, temperature, number appearing on a dice, crop yield, population, rainfall, number of cars on road, age.

Earlier we mentioned that Example 4 is the frequency distribution of marks in mathematics of 100 students as shown in Table 3.1. It shows how the marks of 100 students are grouped into classes. You will be wondering how we got it from the raw data of Table 3.1. But, before we address this question, you must know what a frequency distribution is.

5. WHAT IS A FREQUENCY DISTRIBUTION?

A frequency distribution is a comprehensive way to classify raw data of a quantitative variable. It shows how the different values of a variable (here, the marks in mathematics scored by a student) are distributed in different classes along with their corresponding class frequencies. In this case we have ten classes of marks: 0-10, 10-20, ..., 90-100. The term Class Frequency means the number of values in a particular class. For example, in the class 30-40 we find 7 values of marks from the raw data in Table 3.1. They are 30, 37, 34, 30, 35, 39, 32. The frequency of the class 30-40 is thus 7. But you might be wondering why 40, which occurs three times in the raw data, is not included in the class 30-40. Had it been included, the class frequency of 30-40 would have been 10 instead of 7. The puzzle will be clear to you if you are patient enough to read this chapter carefully. So carry on. You will find the answer yourself.
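The counting behind Example 4 can be reproduced with a short Python sketch. It assumes the convention that Example 4's figures imply: each class includes its lower limit but not its upper limit (so 40 falls in 40-50, not 30-40), except that the last class, 90-100, also includes 100. The helper `frequency_distribution` is hypothetical, written for this illustration.

```python
# Table 3.1: marks in mathematics of 100 students.
marks = [
    47, 45, 10, 60, 51, 56, 66, 100, 49, 40,
    60, 59, 56, 55, 62, 48, 59, 55, 51, 41,
    42, 69, 64, 66, 50, 59, 57, 65, 62, 50,
    64, 30, 37, 75, 17, 56, 20, 14, 55, 90,
    62, 51, 55, 14, 25, 34, 90, 49, 56, 54,
    70, 47, 49, 82, 40, 82, 60, 85, 65, 66,
    49, 44, 64, 69, 70, 48, 12, 28, 55, 65,
    49, 40, 25, 41, 71, 80, 0, 56, 14, 22,
    66, 53, 46, 70, 43, 61, 59, 12, 30, 35,
    45, 44, 57, 76, 82, 39, 32, 14, 90, 25,
]

def frequency_distribution(values, lower, width, n_classes):
    """Count values in the classes lower to lower+width, lower+width to lower+2*width, ...
    A value on a boundary goes to the class that starts there; the top class
    also keeps its own upper limit so the maximum value is counted."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - lower) // width)
        if i == n_classes:       # v equals the upper limit of the last class
            i -= 1
        counts[i] += 1
    return counts

freq = frequency_distribution(marks, lower=0, width=10, n_classes=10)
for k, f in enumerate(freq):
    print(f"{k * 10}-{k * 10 + 10}: {f}")
print("Total:", sum(freq))
```

Run on the 100 marks of Table 3.1, it returns the frequencies 1, 8, 6, 7, 21, 23, 19, 6, 5, 4 of Example 4.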

Each class in a frequency distribution table is bounded by Class Limits. Class limits are the two ends of a class. The lowest value is called the Lower Class Limit and the highest value the Upper Class Limit. For example, the class limits for the class 60-70 are 60 and 70. Its lower class limit is 60 and its upper class limit is 70. Class Interval or Class Width is the difference between the upper class limit and the lower class limit. For the class 60-70, the class interval is 10 (upper class limit minus lower class limit).

The Class Mid-Point or Class Mark is the middle value of a class. It lies halfway between the lower class limit and the upper class limit of a class and can be ascertained in the following manner:

$$\text{Class Mid-Point or Class Mark} = \frac{\text{Upper Class Limit} + \text{Lower Class Limit}}{2} \qquad (1)$$
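Formula (1) is simple enough to apply in one line of code. This sketch, with a hypothetical helper name `class_mark`, computes the class marks of the ten classes used in Example 4; they should match the Class Mark column of Table 3.3.

```python
def class_mark(lower, upper):
    """Class mid-point (class mark) from formula (1): (upper + lower) / 2."""
    return (upper + lower) / 2

# The ten classes of Example 4: 0-10, 10-20, ..., 90-100.
classes = [(lower, lower + 10) for lower in range(0, 100, 10)]
mid_points = [class_mark(lower, upper) for lower, upper in classes]
print(mid_points)
```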

The class mark or mid-value of each class is used to represent the class. Once raw data are grouped into classes, individual observations are not used in further calculations. Instead, the class mark is used.

TABLE 3.3
The Lower Class Limits, the Upper Class Limits and the Class Mark

Class     Frequency   Lower Class Limit   Upper Class Limit   Class Mark
0-10      1           0                   10                  5
10-20     8           10                  20                  15
20-30     6           20                  30                  25
30-40     7           30                  40                  35
40-50     21          40                  50                  45
50-60     23          50                  60                  55
60-70     19          60                  70                  65
70-80     6           70                  80                  75
80-90     5           80                  90                  85
90-100    4           90                  100                 95

A Frequency Curve is a graphic representation of a frequency distribution. Fig. 3.1 shows the diagrammatic presentation of the frequency distribution of the data in our example above. To obtain the frequency curve we plot the class marks on the X-axis and frequency on the Y-axis.

Fig. 3.1: Diagrammatic Presentation of Frequency Distribution of Data.

How to prepare a Frequency Distribution?

While preparing a frequency distribution from the raw data of Table 3.1, the following four questions need to be addressed:
1. How many classes should we have?
2. What should be the size of each class?
3. How should we determine the class limits?
4. How should we get the frequency for each class?

How many classes should we have?

Before we determine the number of classes, we first find out to what extent the variable in hand changes in value. Such variations in the variable's value are captured by its range. The Range is the difference between the largest and the smallest values of the variable. A large range indicates that the values of the variable are widely spread. On the other hand, a small range indicates that the values of the variable are spread narrowly. In our example, the range of the variable "marks of a student" is 100, because the minimum marks are 0 and the maximum marks are 100. It indicates that the variable has a large variation.

After obtaining the value of range, it becomes easier to determine the number of classes once we decide the class interval. Note that range is the sum of all class intervals. If the class intervals are equal, then range is the product of the number of classes and the class interval of a single class.

$$\text{Range} = \text{Number of Classes} \times \text{Class Interval} \qquad (2)$$

Activities

Find the range of the following: population of India in Example 1, yield of wheat in Example 2.

Given the value of range, the number of classes would be large if we choose small class intervals. A frequency distribution with too many classes would look too large. Such a distribution is not easy to handle. So we want to have a reasonably compact set of data. On the other hand, given the value of range, if we choose a class interval that is too large, then the number of classes becomes too small. The data set then may be too compact and we may not like the loss of information about its diversity. For example, suppose the range is 100 and the class interval is 50. Then the number of classes would be just 2 (i.e., 100/50 = 2). Though there is no hard-and-fast rule to determine the number of classes, the rule of thumb often used is that the number of classes should be between 5 and 15. In our example we have chosen to have 10 classes. Since the range is 100 and the class interval is 10, the number of classes is 100/10 = 10.

What should be the size of each class?

The answer to this question depends on the answer to the previous question. The equality (2) shows that, given the range of the variable, we can determine the number of classes once we decide the class interval. Similarly, we can determine the class interval once we decide the number of classes. Thus these two decisions are interlinked: we cannot decide on one without deciding on the other.

In Example 4, we have the number of classes as 10. Given the value of the range as 100, the class interval is automatically 10 by the equality (2). Note that in the present context we have chosen class intervals that are equal in magnitude. However, we could have chosen class intervals that are not of equal magnitude. In that case, the classes would have been of unequal width.
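Equality (2) can be solved either for the number of classes or for the class interval. The Python sketch below is a minimal illustration under stated assumptions, not a procedure from the text: the function names are hypothetical, and rounding the number of classes up with `math.ceil` (for a range that is not an exact multiple of the interval) is an assumption of this sketch.

```python
import math

def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

def number_of_classes(rng, interval):
    """Equality (2) solved for the number of classes.
    Rounding up (an assumption of this sketch) keeps the whole range covered
    when the range is not an exact multiple of the class interval."""
    return math.ceil(rng / interval)

def class_interval(rng, n_classes):
    """Equality (2) solved for the class interval."""
    return rng / n_classes

marks = [47, 0, 59, 100, 71]          # a few marks taken from Table 3.6
rng = data_range(marks)
print(rng)                            # 100
print(number_of_classes(rng, 10))     # 10
print(number_of_classes(rng, 50))     # 2
print(class_interval(rng, 10))        # 10.0
```

Only the smallest and largest marks matter for the range, so any subset containing 0 and 100 gives the same result as the full data.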


How should we determine the class limits?

When we classify raw data of a continuous variable as a frequency distribution, we, in effect, group the individual observations into classes. The value of the upper class limit of a class is obtained by adding the class interval to the value of the lower class limit of that class. For example, the upper class limit of the class 20–30 is 20 + 10 = 30, where 20 is the lower class limit and 10 is the class interval. This method is repeated for the other classes as well.

But how do we decide the lower class limit of the first class? That is to say, why is 0 the lower class limit of the first class, 0–10? It is because we chose the minimum value of the variable as the lower limit of the first class. In fact, we could have chosen a value less than the minimum value of the variable as the lower limit of the first class. Similarly, for the upper class limit of the last class, we could have chosen a value greater than the maximum value of the variable. It is important to note that, when a frequency distribution is being constructed, the class limits should be so chosen that the mid-point or class mark of each class coincides, as far as possible, with a value around which the data tend to be concentrated.

In our example on marks of 100 students, we chose 0 as the lower limit of the first class, 0–10, because the minimum mark was 0. That is why we could not have chosen 1 as the lower class limit of that class: had we done that, we would have excluded the observation 0. The upper class limit of the first class, 0–10, is then obtained by adding the class interval to the lower class limit of the class. Thus the upper class limit of the first class becomes 0 + 10 = 10. This procedure is followed for the other classes as well.
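The procedure of starting from the lower limit of the first class and repeatedly adding the class interval can be sketched as a short loop. `class_limits` and `class_mark` are hypothetical helper names for this illustration; the class mark is taken as the average of the lower and upper limits of a class.

```python
def class_limits(first_lower, interval, n_classes):
    """Return (lower, upper) pairs of equal-width classes.
    Upper limit = lower limit + class interval, and the upper limit of one
    class becomes the lower limit of the next."""
    limits = []
    lower = first_lower
    for _ in range(n_classes):
        upper = lower + interval
        limits.append((lower, upper))
        lower = upper  # the next class starts where this one ends
    return limits

def class_mark(lower, upper):
    """Mid-point of a class: the average of its two limits."""
    return (lower + upper) / 2

classes = class_limits(0, 10, 10)
print(classes[0], classes[2], classes[-1])   # (0, 10) (20, 30) (90, 100)
print([class_mark(lo, hi) for lo, hi in classes])
```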

Have you noticed that the upper class limit of the first class is equal to the lower class limit of the second class? Both are equal to 10. This is observed for the other classes as well. Why? The reason is that we have used the Exclusive Method of classification of raw data. Under this method we form classes in such a way that the lower limit of a class coincides with the upper class limit of the previous class.

The problem we would face next is how to classify an observation that is not only equal to the upper class limit of a particular class but is also equal to the lower class limit of the next class. For example, the observation 30 is equal to the upper class limit of the class 20–30 and also equal to the lower class limit of the class 30–40. In which of the two classes, 20–30 or 30–40, should we put the observation 30? We could put it in either. It is a dilemma that one commonly faces while classifying data into overlapping classes. This problem is solved by the rule of classification in the Exclusive Method.


Exclusive Method

The classes, by this method, are formed in such a way that the upper class limit of one class equals the lower class limit of the next class. In this way the continuity of the data is maintained. That is why this method of classification is most suitable for data of a continuous variable. Under this method, the upper class limit of a class is excluded from the interval, but the lower class limit is included. Thus an observation that is exactly equal to the upper class limit would, according to the method, not be included in that class but would be included in the next class. On the other hand, if it were equal to the lower class limit, it would be included in that class. In our example on marks of students, the observation 40, which occurs thrice in the raw data of Table 3.1, is not included in the class 30–40. It is included in the next class, 40–50. That is why we find the frequency corresponding to the class 30–40 to be 7 instead of 10.
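The Exclusive Method rule (lower limit included, upper limit excluded) can be sketched as follows, assuming equal, contiguous classes listed in ascending order. `classify_exclusive` is a hypothetical name. Keeping a value equal to the upper limit of the last class (the maximum mark 100) in that class is an assumption made here so that the result agrees with Table 3.6; strictly applied, the rule would leave it out.

```python
from bisect import bisect_right
from collections import Counter

def classify_exclusive(x, limits):
    """Return the (lower, upper) class holding x under the Exclusive Method:
    lower <= x < upper. `limits` must be contiguous classes in ascending order.
    Assumption of this sketch: a value equal to the upper limit of the LAST
    class is kept in that class, so the maximum (100) is not lost."""
    lowers = [lower for lower, _ in limits]
    i = bisect_right(lowers, x) - 1   # last class whose lower limit is <= x
    if i < 0 or x > limits[-1][1]:
        raise ValueError(f"{x} lies outside all classes")
    return limits[i]

limits = [(lo, lo + 10) for lo in range(0, 100, 10)]   # 0-10, 10-20, ..., 90-100
print(classify_exclusive(30, limits))   # (30, 40)
print(classify_exclusive(40, limits))   # (40, 50)

# The seven marks of class 30-40 plus the three 40s, from Table 3.6
marks = [30, 37, 34, 39, 32, 30, 35, 40, 40, 40]
freq = Counter(classify_exclusive(m, limits) for m in marks)
print(freq[(30, 40)], freq[(40, 50)])   # 7 3
```

Because the classes are contiguous, finding the last lower limit that does not exceed x is enough: the next lower limit, which equals this class's upper limit, is already greater than x.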

There is another method of forming classes, known as the Inclusive Method of classification.

Inclusive Method

In comparison to the Exclusive Method, the Inclusive Method does not exclude the upper class limit from a class interval. It includes the upper class limit in the class. Thus both class limits are parts of the class interval.

For example, in the frequency distribution of Table 3.4 we include in the class 800–899 those employees whose income is either Rs 800, or between Rs 800 and Rs 899, or Rs 899. If the income of an employee is exactly Rs 900, then he is put in the next class, 900–999.

TABLE 3.4 Frequency Distribution of Incomes of 550 Employees of a Company

| Income (Rs) | Number of Employees |
|---|---|
| 800–899 | 50 |
| 900–999 | 100 |
| 1000–1099 | 200 |
| 1100–1199 | 150 |
| 1200–1299 | 40 |
| 1300–1399 | 10 |
| Total | 550 |
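Under the Inclusive Method a value belongs to a class when it lies between the two limits, both limits included. A minimal sketch with the class limits of Table 3.4; `classify_inclusive` is a hypothetical helper name.

```python
def classify_inclusive(x, limits):
    """Inclusive Method: both limits belong to the class, lower <= x <= upper."""
    for lower, upper in limits:
        if lower <= x <= upper:
            return (lower, upper)
    # A value falling in the gap between two classes (e.g. 899.5) has no class
    raise ValueError(f"{x} does not fall in any class")

# Class limits of Table 3.4, incomes in Rs
limits = [(800, 899), (900, 999), (1000, 1099),
          (1100, 1199), (1200, 1299), (1300, 1399)]
print(classify_inclusive(800, limits))   # (800, 899)
print(classify_inclusive(899, limits))   # (800, 899)
print(classify_inclusive(900, limits))   # (900, 999)
```

Calling `classify_inclusive(899.5, limits)` raises `ValueError`, because 899.5 lies in the gap between 899 and 900. This is the discontinuity that the adjustment in class interval removes.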

Adjustment in Class Interval

A close observation of the Inclusive Method in Table 3.4 would show that though the variable "income" is a continuous variable, no such continuity is maintained when the classes are made. We find a gap or discontinuity between the upper limit of a class and the lower limit of the next class. For example, between the upper limit of the first class, 899, and the lower limit of the second class, 900, we find a gap of 1. Then how do we ensure the continuity of the variable while classifying data? This is achieved by making an adjustment in the class interval. The adjustment is done in the following way:
1. Find the difference between the lower limit of the second class and the upper limit of the first class. For example, in Table 3.4 the lower limit of the second class is 900 and the upper limit of the first class is 899. The difference between them is 1, i.e. (900 - 899 = 1).
2. Divide the difference obtained in (1) by two, i.e. (1/2 = 0.5).
3. Subtract the value obtained in (2) from the lower limits of all classes (lower class limit - 0.5).
4. Add the value obtained in (2) to the upper limits of all classes (upper class limit + 0.5).

After the adjustment that restores continuity of data in the frequency distribution, Table 3.4 is modified into Table 3.5.

After the adjustments in class limits, the equality (1) that determines the value of the class mark would be modified as follows:

$$\text{Adjusted Class Mark} = \frac{\text{Adjusted Upper Class Limit} + \text{Adjusted Lower Class Limit}}{2}$$

TABLE 3.5 Frequency Distribution of Incomes of 550 Employees of a Company

| Income (Rs) | Number of Employees |
|---|---|
| 799.5–899.5 | 50 |
| 899.5–999.5 | 100 |
| 999.5–1099.5 | 200 |
| 1099.5–1199.5 | 150 |
| 1199.5–1299.5 | 40 |
| 1299.5–1399.5 | 10 |
| Total | 550 |
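The four adjustment steps and the adjusted class mark can be sketched in a few lines. The helper names are hypothetical, and the sketch assumes the same gap between every pair of consecutive classes, as in Table 3.4.

```python
def adjust_limits(limits):
    """Steps 1-4 of the adjustment, assuming the same gap between every
    pair of consecutive classes (as in Table 3.4)."""
    gap = limits[1][0] - limits[0][1]   # step 1: 900 - 899 = 1
    half = gap / 2                      # step 2: 1/2 = 0.5
    # steps 3 and 4: lower limit - 0.5, upper limit + 0.5
    return [(lower - half, upper + half) for lower, upper in limits]

def adjusted_class_mark(adj_lower, adj_upper):
    """(Adjusted Upper Class Limit + Adjusted Lower Class Limit) / 2"""
    return (adj_upper + adj_lower) / 2

limits = [(800, 899), (900, 999), (1000, 1099),
          (1100, 1199), (1200, 1299), (1300, 1399)]
adjusted = adjust_limits(limits)
print(adjusted[0], adjusted[1])             # (799.5, 899.5) (899.5, 999.5)
print(adjusted_class_mark(*adjusted[0]))    # 849.5
```

After the adjustment, the upper limit of each class equals the lower limit of the next, which is exactly the continuity shown in Table 3.5.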

How should we get the frequency for each class?

In simple terms, the frequency of an observation means how many times that observation occurs in the raw data. In Table 3.1, we observe that the value 40 occurs thrice; 0 and 10 occur only once; 49 occurs five times; and so on. Thus the frequency of 40 is 3, of 0 is 1, of 10 is 1, of 49 is 5, and so on. But when the data are grouped into classes as in Example 3, the Class Frequency refers to the number of values in a particular class. The counting of class frequency is done by tally marks against the particular class.

TABLE 3.6 Tally Marking of Marks of 100 Students in Mathematics

| Class | Observations | Tally Mark | Frequency | Class Mark |
|---|---|---|---|---|
| 0–10 | 0 | / | 1 | 5 |
| 10–20 | 10, 14, 17, 12, 14, 12, 14, 14 | ////\ /// | 8 | 15 |
| 20–30 | 25, 25, 20, 22, 25, 28 | ////\ / | 6 | 25 |
| 30–40 | 30, 37, 34, 39, 32, 30, 35 | ////\ // | 7 | 35 |
| 40–50 | 47, 42, 49, 49, 45, 45, 47, 44, 40, 44, 49, 46, 41, 40, 43, 48, 48, 49, 49, 40, 41 | ////\ ////\ ////\ ////\ / | 21 | 45 |
| 50–60 | 59, 51, 53, 56, 55, 57, 55, 51, 50, 56, 59, 56, 59, 57, 59, 55, 56, 51, 55, 56, 55, 50, 54 | ////\ ////\ ////\ ////\ /// | 23 | 55 |
| 60–70 | 60, 64, 62, 66, 69, 64, 64, 60, 66, 69, 62, 61, 66, 60, 65, 62, 65, 66, 65 | ////\ ////\ ////\ //// | 19 | 65 |
| 70–80 | 70, 75, 70, 76, 70, 71 | ////\ / | 6 | 75 |
| 80–90 | 82, 82, 82, 80, 85 | ////\ | 5 | 85 |
| 90–100 | 90, 100, 90, 90 | //// | 4 | 95 |
| Total | | | 100 | |
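The difference between the frequency of an individual observation and the frequency of a class can be sketched with Python's `collections.Counter`. The list below is the class 40–50 copied from Table 3.6; since 40 and 49 occur only in this class, their counts here equal their counts in the whole data.

```python
from collections import Counter

# Observations of the class 40-50, copied from Table 3.6
class_40_50 = [47, 42, 49, 49, 45, 45, 47, 44, 40, 44, 49,
               46, 41, 40, 43, 48, 48, 49, 49, 40, 41]

obs_freq = Counter(class_40_50)    # frequency of each individual observation
print(obs_freq[40], obs_freq[49])  # 3 5
print(len(class_40_50))            # 21 (the class frequency)
```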

Finding class frequency by tally marking

A tally (/) is put against a class for each student whose marks are included in that class. For example, if the marks obtained by a student are 57, we put a tally (/) against the class 50–60. If the marks are 71, a tally is put against the class 70–80. If someone obtains 40 marks, a tally is put against the class 40–50. Table 3.6 shows the tally marking of the marks of 100 students in mathematics from Table 3.1.

The counting of tallies is made easier when four of them are put as //// and the fifth tally is placed across them, forming a crossed group of five. Tallies are then counted as groups of five. So if there are 16 tallies in a class, we write them as three crossed groups of five followed by a single tally (/), for the sake of convenience. Thus the frequency of a class is equal to the number of tallies against that class.
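Tally marking can be mimicked in code: one tally is added to the class of each mark, and tallies are written in crossed groups of five. In this sketch the crossed group is written as `////\`, an ASCII stand-in chosen here; `class_of` and `tally` are hypothetical helpers, and keeping the mark 100 in the last class 90–100 is an assumption matching Table 3.6.

```python
from collections import defaultdict

def class_of(mark, interval=10, top=100):
    """Lower limit of the class (Exclusive Method) that holds `mark`.
    Assumption: the maximum `top` is kept in the last class, as in Table 3.6."""
    lower = (mark // interval) * interval
    return min(lower, top - interval)

def tally(n):
    """Write n tallies as crossed groups of five plus leftover single tallies."""
    groups, rest = divmod(n, 5)
    parts = ["////\\"] * groups + (["/" * rest] if rest else [])
    return " ".join(parts)

tallies = defaultdict(int)
for mark in [57, 71, 40]:            # the three marks used in the text
    tallies[class_of(mark)] += 1     # one tally against the class
print(dict(tallies))                 # {50: 1, 70: 1, 40: 1}
print(tally(16))                     # ////\ ////\ ////\ /
```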

Loss of Information

The classification of data as a frequency distribution has an inherent shortcoming. While it summarises the raw data, making it concise and comprehensible, it does not show the details that are found in the raw data. There is a loss of information in classifying raw data, though much is gained by summarising it as classified data. Once the data are grouped into classes, an individual observation has no significance in further statistical calculations. In Example 4, the class 20–30 contains 6 observations: 25, 25, 20, 22, 25 and 28. So when these data are grouped as the class 20–30 in the frequency distribution, the latter provides only the number of records in that class (i.e. frequency = 6) but not their actual values. All values in this class are assumed to be equal to the middle value of the class interval, or class mark (i.e. 25). Further statistical calculations are based only on the values of the class mark and not on the values of the observations in that class. This is true for other classes as well. Thus the use of the class mark instead of the actual values of the observations in statistical methods involves considerable loss of information.
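A small worked example makes the loss concrete. Using the six observations of the class 20–30 given above, the sketch compares their actual total and average with the values obtained when every observation is replaced by the class mark 25. Comparing totals and averages is a choice made here for illustration.

```python
obs = [25, 25, 20, 22, 25, 28]    # class 20-30, from the text
lower, upper = 20, 30
mark = (lower + upper) / 2        # class mark = 25.0

actual_total = sum(obs)           # uses the actual values
grouped_total = mark * len(obs)   # every value treated as 25
print(actual_total, grouped_total)                # 145 150.0
print(round(actual_total / len(obs), 2), mark)    # 24.17 25.0
```

Once the data are grouped, only the second pair of figures can be recovered; the actual values 20, 22 and 28 are lost.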

Frequency distribution with unequal classes

By now you are familiar with frequency distributions of equal class intervals. You know how they are constructed out of raw data. But in some cases frequency distributions with unequal class intervals are more appropriate. If you observe the frequency distribution of Example 4, as in Table 3.6, you will notice that most of the observations are concentrated in the classes 40–50, 50–60 and 60–70. Their respective frequencies are 21, 23 and 19. It means that out of 100 observations, 63 (21 + 23 + 19) observations are concentrated in these classes. These classes are densely populated with observations. Thus, 63 percent of the data lie between 40 and 70. The remaining 37 percent of the data are in the classes 0–10, 10–20, 20–30, 30–40, 70–80, 80–90 and 90–100. These classes are sparsely populated with observations. Further, you will also notice that the observations in these classes deviate more from their respective class marks than those in the other classes. But if classes are to be formed in such a way that class marks coincide, as far as possible, with a value around which the observations in a class tend to concentrate, then unequal class intervals are more appropriate.

Table 3.7 shows the same frequency distribution as Table 3.6 in terms of unequal classes. Each of the classes 40–50, 50–60 and 60–70 is split into two classes. The class 40–50 is divided into 40–45 and 45–50. The class 50–60 is divided into 50–55 and 55–60. And the class 60–70 is divided into 60–65 and 65–70. The new classes 40–45, 45–50, 50–55, 55–60, 60–65 and 65–70 have a class interval of 5. The other classes, 0–10, 10–20, 20–30, 30–40, 70–80, 80–90 and 90–100, retain their old class interval of 10. The last column of this table shows the new values of class marks for these classes. Compare them with the old values of class marks in Table 3.6. Notice that the observations in these classes deviate more from their old class mark values than from their new class mark values. Thus the new class mark values are more representative of the data in these classes than the old values.

TABLE 3.7 Frequency Distribution of Unequal Classes

| Class | Observations | Frequency | Class Mark |
|---|---|---|---|
| 0–10 | 0 | 1 | 5 |
| 10–20 | 10, 14, 17, 12, 14, 12, 14, 14 | 8 | 15 |
| 20–30 | 25, 25, 20, 22, 25, 28 | 6 | 25 |
| 30–40 | 30, 37, 34, 39, 32, 30, 35 | 7 | 35 |
| 40–45 | 42, 44, 40, 44, 41, 40, 43, 40, 41 | 9 | 42.5 |
| 45–50 | 47, 49, 49, 45, 45, 47, 49, 46, 48, 48, 49, 49 | 12 | 47.5 |
| 50–55 | 51, 53, 51, 50, 51, 50, 54 | 7 | 52.5 |
| 55–60 | 59, 56, 55, 57, 55, 56, 59, 56, 59, 57, 59, 55, 56, 55, 56, 55 | 16 | 57.5 |
| 60–65 | 60, 64, 62, 64, 64, 60, 62, 61, 60, 62 | 10 | 62.5 |
| 65–70 | 66, 69, 66, 69, 66, 65, 65, 66, 65 | 9 | 67.5 |
| 70–80 | 70, 75, 70, 76, 70, 71 | 6 | 75 |
| 80–90 | 82, 82, 82, 80, 85 | 5 | 85 |
| 90–100 | 90, 100, 90, 90 | 4 | 95 |
| Total | | 100 | |
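The claim that the new class marks are more representative can be checked for the old class 40–50. The sketch below measures representativeness as the sum of absolute distances of the observations from their class mark; that choice of measure is an assumption made here for illustration, not one made in the text.

```python
def total_abs_deviation(values, mark):
    """Sum of the distances of the observations from a class mark."""
    return sum(abs(v - mark) for v in values)

# Observations of the classes 40-45 and 45-50, from Table 3.7
obs_40_45 = [42, 44, 40, 44, 41, 40, 43, 40, 41]
obs_45_50 = [47, 49, 49, 45, 45, 47, 49, 46, 48, 48, 49, 49]

# Old grouping: one class 40-50 with class mark 45
old = total_abs_deviation(obs_40_45 + obs_45_50, 45)
# New grouping: two classes with class marks 42.5 and 47.5
new = total_abs_deviation(obs_40_45, 42.5) + total_abs_deviation(obs_45_50, 47.5)
print(old, new)   # 61 30.5
```

Splitting the class halves the total distance of these 21 observations from the value used to represent them.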


Figure 3.2 shows the frequency curve of the distribution in Table 3.7. The class marks of the table are plotted on the X-axis and the frequencies are plotted on the Y-axis.

Fig. 3.2: Frequency Curve

Activity

If you compare Figure 3.2 with Figure 3.1, what do you observe? Do you find any difference between them? Can you explain the difference?

Frequency array

So far we have discussed the classification of data for a continuous variable using the example of percentage marks of 100 students in mathematics. For a discrete variable, the classification of its data is known as a Frequency Array. Since a discrete variable takes only whole-number values and not intermediate fractional values between two integral values, we have frequencies that correspond to each of its integral values.

The example in Table 3.8 illustrates a Frequency Array.

TABLE 3.8 Frequency Array of the Size of Households

| Size of the Household | Number of Households |
|---|---|
| 1 | 5 |
| 2 | 15 |
| 3 | 25 |
| 4 | 35 |
| 5 | 10 |
| 6 | 5 |
| 7 | 3 |
| 8 | 2 |
| Total | 100 |

The variable "size of the household" is a discrete variable that only takes integral values, as shown in the table. Since it does not take any fractional value between two adjacent integral values, there are no classes in this frequency array. Since there are no classes in a frequency array, there are no class intervals. And as classes are absent in a discrete frequency distribution, there is no class mark either.
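Because a frequency array has no classes, building one only requires counting how often each distinct value occurs. A minimal sketch with `collections.Counter`; `frequency_array` is a hypothetical name, and the household sizes are made-up illustrative values, not data from the text.

```python
from collections import Counter

def frequency_array(values):
    """Frequency of each distinct value of a discrete variable, in ascending
    order. No classes, class intervals or class marks are involved."""
    return sorted(Counter(values).items())

# Hypothetical sizes of 10 households (illustrative only)
sizes = [4, 3, 4, 2, 5, 4, 3, 1, 4, 2]
for size, n in frequency_array(sizes):
    print(size, n)
```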

6. BIVARIATE FREQUENCY DISTRIBUTION

The frequency distribution of a single variable is called a Univariate Distribution. Example 3.3 shows the univariate distribution of the single variable "marks of a student". A Bivariate Frequency Distribution is the frequency distribution of two variables.
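A bivariate frequency distribution is a cross-tabulation: each observation is a pair of values, and each cell counts the pairs whose first value falls in one class and whose second value falls in another. The sketch below follows the class layout of Table 3.9 (sales classes of width 10 starting at Rs 115 lakh, advertisement classes of width 2 starting at Rs 62 thousand, Exclusive Method). The helper names and the five firms' figures are hypothetical; the raw data behind Table 3.9 are not given in the text.

```python
from collections import Counter

def class_lower(x, start, width):
    """Lower limit of the equal-width exclusive class (start + k*width) holding x."""
    return start + int((x - start) // width) * width

def bivariate_table(pairs, sales_start=115, sales_width=10, adv_start=62, adv_width=2):
    """Count firms in each (advertisement class, sales class) cell.
    Rows are advertisement classes, columns are sales classes."""
    return Counter(
        (class_lower(adv, adv_start, adv_width), class_lower(sales, sales_start, sales_width))
        for sales, adv in pairs
    )

def marginal_totals(table):
    """Row totals (advertisement classes) and column totals (sales classes)."""
    rows, cols = Counter(), Counter()
    for (row, col), n in table.items():
        rows[row] += n
        cols[col] += n
    return rows, cols

# Hypothetical (sales in Rs lakh, advertisement expenditure in Rs thousand) of 5 firms
firms = [(120, 63), (138, 65), (140, 65.5), (150, 67), (136, 64)]
table = bivariate_table(firms)
print(table[(64, 135)])   # 3
rows, cols = marginal_totals(table)
print(sum(rows.values()), sum(cols.values()))   # 5 5
```

The cell `(64, 135)` holds the firms with advertisement expenditure 64–66 and sales 135–145; the row and column totals play the role of the "Total" column and row of Table 3.9.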

Table 3.9 shows the frequency distribution of two variables, sales (in Rs lakh) and advertisement expenditure (in Rs thousand), of 20 companies. The values of sales are classed in different columns and the values of advertisement expenditure are classed in different rows. Each cell shows the frequency of the corresponding row and column values. For example, there are 3 firms whose sales are between Rs 135 and 145 lakh and whose advertisement expenditures are between Rs 64 and 66 thousand. The use of a bivariate distribution will be taken up in Chapter 7 on correlation.

7. CONCLUSION

The data collected from primary and secondary sources are raw or unclassified. Once the data are collected, the next step is to classify them for further statistical analysis. Classification brings order to the data. The chapter enables you to know how data can be classified through a frequency distribution in a comprehensive manner. Once you know the techniques of classification, it will be easy for you to construct a frequency distribution, both for continuous and discrete variables.

Recap

Classification brings order to raw data.

A Frequency Distribution shows how the different values of a variable are distributed in different classes, along with their corresponding class frequencies.

The upper class limit is excluded but the lower class limit is included in the Exclusive Method.

Both the upper and the lower class limits are included in the Inclusive Method.

In a Frequency Distribution, further statistical calculations are based only on the class mark values, instead of the values of the observations.

The classes should be formed in such a way that the class mark of each class comes as close as possible to a value around which the observations in a class tend to concentrate.

TABLE 3.9 Bivariate Frequency Distribution of Sales (in Lakh Rs) and Advertisement Expenditure (in Thousand Rs) of 20 Firms

115–125 125–135 135–145 145–155 155–165 165–175 Total
62–64 2 1 3
64–66 1 3 4
66–68 1 1 2 1 5
68–70 2 2 4
70–72 1 1 1 1 4
Total 4 5 6 3 1 1 20


EXERCISES

1. Which of the following alternatives is true?
(i) The class midpoint is equal to:
(a) The average of the upper class limit and the lower class limit.
(b) The product of upper class limit and the lower class limit.
(c) The ratio of the upper class limit and the lower class limit.
(d) None of the above.

(ii) The frequency distribution of two variables is known as
(a) Univariate Distribution
(b) Bivariate Distribution
(c) Multivariate Distribution
(d) None of the above

(iii) Statistical calculations in classified data are based on
(a) the actual values of observations
(b) the upper class limits
(c) the lower class limits
(d) the class midpoints

(iv) Under Exclusive method,
(a) the upper class limit of a class is excluded in the class interval
(b) the upper class limit of a class is included in the class interval
(c) the lower class limit of a class is excluded in the class interval
(d) the lower class limit of a class is included in the class interval

(v) Range is the
(a) difference between the largest and the smallest observations
(b) difference between the smallest and the largest observations
(c) average of the largest and the smallest observations
(d) ratio of the largest to the smallest observation

2. Can there be any advantage in classifying things? Explain with an example from your daily life.

3. What is a variable? Distinguish between a discrete and a continuous variable.

4. Explain the exclusive and inclusive methods used in classification of data.

5. Use the data in Table 3.2 that relate to monthly household expenditure (in Rs) on food of 50 households and
(i) obtain the range of monthly household expenditure on food.
(ii) divide the range into an appropriate number of class intervals and obtain the frequency distribution of expenditure.
(iii) find the number of households whose monthly expenditure on food is
(a) less than Rs 2000
(b) more than Rs 3000
(c) between Rs 1500 and Rs 2500

6. In a city 45 families were surveyed for the number of domestic appliances they used. Prepare a frequency array based on their replies as recorded below.

1 3 2 2 2 2 1 2 1 2 2 3 3 3 3
3 3 2 3 2 2 6 1 6 2 1 5 1 5 3
2 4 2 7 4 2 4 3 4 2 0 3 1 4 3

7. What is loss of information in classified data?

8. Do you agree that classified data is better than raw data?

9. Distinguish between univariate and bivariate frequency distribution.

10. Prepare a frequency distribution by the inclusive method, taking a class interval of 7, from the following data:

28 17 15 22 29 21 23 27 18 12
7 2 9 4 6 1 8 3 10 5
20 16 12 8 4 33 27 21 15 9
3 36 27 18 9 2 4 6 32 31
29 18 14 13 15 11 9 7 1 5
37 32 28 26 24 20 19 25 19 20

Suggested Activity

From your old mark-sheets, find the marks that you obtained in mathematics in the previous classes. Arrange them year-wise. Check whether the marks you have secured in the subject are a variable or not. Also see if, over the years, you have improved in mathematics.

Presentation of Data

1. INTRODUCTION

You have already learnt in previous chapters how data are collected and organised. As data are generally voluminous, they need to be put in a compact and presentable form. This chapter deals with the presentation of data precisely so that the voluminous data collected can be made readily usable and easily comprehended. There are generally three forms of presentation of data:

Textual or Descriptive presentation
Tabular presentation
Diagrammatic presentation.

2. TEXTUAL PRESENTATION OF DATA

In textual presentation, data are described within the text. When the quantity of data is not too large, this form of presentation is more suitable. Look at the following cases:

Case 1

In a bandh call given on 08 September 2005 protesting the hike in prices of petrol and diesel, 5 petrol pumps were found open and 17 were closed, whereas 2 schools were closed and the remaining 9 schools were found open in a town of Bihar.

Studying this chapter should enable you to:
present data using tables;
represent data using appropriate diagrams.


Case 2

Census of India 2001 reported that the Indian population had risen to 102 crore, of which only 49 crore were females against 53 crore males. 74 crore people resided in rural India and only 28 crore lived in towns or cities. While there were 62 crore non-workers against 40 crore workers in the entire country, the urban population had an even higher share of non-workers (19 crore) against workers (9 crore) as compared to the rural population, where there were 31 crore workers out of a 74 crore population....

In both the cases data have been presented only in the text. A serious drawback of this method of presentation is that one has to go through t
