By Sayanto Tripathy Gayathri E.A Khizar 2014 Indian rape Convict Statistic Analysis [INDIAN RAPE CONVICT STATISTIC REPORT]

Rape convicts analysis



Table of Contents

1. Introduction
2. Data Segmentation
3. Descriptive Analysis
   North States Description
   South States Description
   East States Description
   West States Description
4. Scatter Diagram
5. Correlation
6. Box plot statistics
7. Sampling Probability
8. Confidence Interval
9. Anova Test
10. Conclusion
11. References
12. Appendix

1. Introduction

Rape is one of the most common and brutal crimes against women in India. The number of rape cases in India doubled from 1990 to 2007. While rape cases increased from 16,075 in 2001 to 24,500 in 2011, the conviction rate decreased from 40.8% to 24%. The annual rape rate in India increased from 1.5 to 2.0 per 100,000 people over the 2010-2012 period, which is quite low compared to countries such as the USA, Morocco, Mexico and Bahrain. Even so, the question of women's security is raised day after day and needs to be addressed. The statistical study conducted here analyzes how Indian states vary in the number of rape convicts. As rape cases in India become more severe, we analyze how the geographical regions (North, South, East and West) differ from each other in rape convict statistics. The pattern of rape convicts should be studied region by region to understand where security for women is lacking and how it can be strengthened. For some states, such as Gujarat and Haryana, the number of rape convicts increases gradually, whereas for states such as Meghalaya, Manipur and the other north-eastern states the numbers are quite low. This pattern of low numbers in the North-East and high numbers in the North and Central parts of India needs to be analyzed statistically so that security and women's safety can be improved in the areas with high numbers of rape convicts. We also extend our study to understand how the rape convict numbers for each region differ from those of past years.

Interestingly, while studying the rape convict data region by region, we find that the all-India average is much lower than that of other countries, with some impressive records shown by the Southern states. The complete study of rape convicts helps us identify the region that needs the most stringent action to control rape cases.

2. Data Segmentation

India being a vast country, we segment the states geographically into North, South, East and West and see how each of these areas varies from the others. We include the following states in our segmentation as per geographical location:

North: Chhattisgarh, Haryana, Himachal Pradesh, Jammu & Kashmir, Madhya Pradesh, Punjab, Uttar Pradesh, Uttarakhand, Chandigarh, Delhi
South: Andhra Pradesh, Karnataka, Kerala, Tamil Nadu, Andaman & Nicobar Islands, Lakshadweep, Pondicherry
East: Arunachal Pradesh, Assam, Bihar, Jharkhand, Manipur, Meghalaya, Mizoram, Nagaland, Odisha, Sikkim, Tripura, West Bengal
West: Goa, Gujarat, Maharashtra, Rajasthan, Dadra & Nagar Haveli, Daman & Diu

Segmentation of the data helps us analyze how each region behaves, how the regions differ from each other, and which region needs the utmost attention.
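The segmentation can be written down as a simple lookup table. The sketch below is only an illustration: the names `REGIONS` and `region_of` are hypothetical, and Arunachal Pradesh is placed under East on the assumption that it belongs there, because the East table and the sample size n = 12 used later in the report both include it.

```python
# Region-to-state mapping following the segmentation in this section.
# Assumption: Arunachal Pradesh is counted under East, as in the East table.
REGIONS = {
    "North": ["Chhattisgarh", "Haryana", "Himachal Pradesh", "Jammu & Kashmir",
              "Madhya Pradesh", "Punjab", "Uttar Pradesh", "Uttarakhand",
              "Chandigarh", "Delhi"],
    "South": ["Andhra Pradesh", "Karnataka", "Kerala", "Tamil Nadu",
              "Andaman & Nicobar Islands", "Lakshadweep", "Pondicherry"],
    "East": ["Arunachal Pradesh", "Assam", "Bihar", "Jharkhand", "Manipur",
             "Meghalaya", "Mizoram", "Nagaland", "Odisha", "Sikkim",
             "Tripura", "West Bengal"],
    "West": ["Goa", "Gujarat", "Maharashtra", "Rajasthan",
             "Dadra & Nagar Haveli", "Daman & Diu"],
}


def region_of(state):
    """Return the region a state or Union Territory is assigned to."""
    for region, states in REGIONS.items():
        if state in states:
            return region
    raise KeyError(state)


# Number of states per region and the overall total.
counts = {region: len(states) for region, states in REGIONS.items()}
total_states = sum(counts.values())
print(counts, total_states)
```

The per-region counts produced here are the sample sizes n used in the Sampling Probability section, and their total is the population size N used there.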

3. Descriptive Analysis

From the data on rape convicts for each state for the period 2001-2010, we calculate the average and the standard deviation across all the states of India. We also look at the graph to understand how the number of rape convicts progressed from year to year.

Average rape convicts per year (2001-2010): 5021.1087 persons

Standard deviation: 816.814

Table for Average Rape Convicts Each Year in India

Year    Average (persons)
2001    114.73
2002    105.20
2003    143.11
2004    142.17
2005    156.79
2006    189.02
2007    151.79
2008    164.24
2009    156.02
2010    144.84

The yearly averages above give an alarming sign of how the number of rape convicts was increasing over the period 2001-2005. These alarming numbers signal that immediate action must be taken to enhance the protective measures for women in India.

From this point we segment the data by region and examine how North, South, East and West behave with respect to these rape convict numbers. We try to understand which region is relatively safe, and we also identify the states with a high number of rape convicts.

North States Description

Let us look at the individual averages and totals of rape convicts for the states in the northern region.

The average number of rape convicts in the North states per year over 2001-2010: 2422.609

The summary below shows how the individual northern states differ, with Madhya Pradesh having the highest average, followed by Haryana and Chhattisgarh.

[Figure: Average rape convicts per year in India, 2001-2010; vertical axis from 0 to 200]

Table for North State Average (2001-2010)

Groups           Sum        Average    Variance
Chhattisgarh     3475       347.5      3690.944
Haryana          3766       376.6      10404.93
Himachal         1133       113.3      2099.344
Jammu Kashmir    202        20.2       158.4
MP               7488       748.8      14259.51
Punjab           2248       224.8      10520.84
UP               2890.087   289.0087   100367.5
Uttarakhand      427        42.7       671.7889
Chandigarh       112        11.2       37.73333
Delhi            2485       248.5      4579.611
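The Sum, Average and Variance columns can be recomputed from the yearly counts in the appendix. The following is a minimal sketch for two of the northern states, assuming the Variance column is the sample variance (divisor n - 1); the helper name `describe` is hypothetical.

```python
from statistics import mean, variance

# Yearly convict counts for 2001-2010, copied from the appendix data.
north_counts = {
    "Chhattisgarh": [344, 339, 427, 337, 290, 307, 276, 357, 324, 474],
    "Haryana": [207, 197, 347, 400, 394, 412, 412, 438, 438, 521],
}


def describe(counts):
    """Return (sum, average, sample variance) for one state's yearly counts."""
    return sum(counts), mean(counts), variance(counts)


summary = {state: describe(counts) for state, counts in north_counts.items()}
for state, (total, avg, var) in summary.items():
    print(f"{state:<14}{total:>6}{avg:>9.1f}{var:>12.3f}")
```

With the sample-variance assumption, both rows agree with the corresponding rows of the table above.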

South States Description

The average number of rape convicts in South India per year over 2001-2010: 417.5

Table for South State Average (2001-2010)

Groups               Sum     Average   Variance
Andhra Pradesh       1728    172.8     658.8444444
Karnataka            577     57.7      371.5666667
Kerala               328     32.8      620.1777778
Tamil Nadu           1488    148.8     7087.511111
Andaman & Nicobar    34      3.4       2.044444444
Lakshadweep          2       0.2       0.4
Puducherry           18      1.8       5.733333333

In the South region, Andhra Pradesh is the most unsafe state for women, with the highest number of rape convicts. However, compared to the northern region, the South states are much safer for women, with a lower average than North India.

East States Description

The average number of rape convicts in East India per year over 2001-2010: 1303.7

Table for East State Average (2001-2010)

Groups       Sum     Average   Variance
Assam        2835    283.5     6521.389
Arunachal    3       1.5       0.5
Bihar        1555    155.5     647.1667
Jharkhand    3465    346.5     14090.5
WB           2039    203.9     8825.656
Odisha       1917    191.7     1346.456
Manipur      9       0.9       0.544444
Mizoram      224     22.4      118.0444
Sikkim       79      7.9       14.1
Tripura      781     78.1      565.6556
Nagaland     41      4.1       10.54444
Meghalaya    89      8.9       56.1

Jharkhand and Assam are the most unsafe places for women in the eastern region. Assam has also been in the media spotlight several times for its rapidly increasing rape cases over the past few years.

West State Description

The average number of rape convicts in the West states per year over 2001-2010: 877.3

Table for West State Average (2001-2010)

Groups         Sum     Average   Variance
Goa            140     14        8.444444
Gujarat        2881    288.1     31966.77
Maharashtra    2320    232       3208.889
Rajasthan      3430    343       2263.556
Daman          0       0         0
Dadra          2       0.2       0.177778

As the table above indicates, Rajasthan and Gujarat have the highest numbers of rape convicts in the West region. Although few states make up the western geographical area, Rajasthan, Gujarat and Maharashtra account for most of the rape convicts, which raises the average for western India.

Analysis: The data above gives a clear picture: the Northern region has the highest number of rape convicts and is the most unsafe region for women in India, followed by Eastern India. Southern India, with a very low average, appears comparatively safe for women. We also conclude that Madhya Pradesh, Haryana, Chhattisgarh and Jharkhand are the most unsafe states for women, and measures should be taken to improve conditions in these states by spreading education.

4. Scatter Diagram

The scatter diagram helps us find the trend of rape convicts in each region and understand whether there is any improvement in the safety of women in those regions. We analyze each geographical region to see whether conditions are getting worse or improving, and thereby how safe those regions are from a woman's point of view.

Scatter diagrams showing the number of rape convicts per year, 2001-2010

[Figure: North India, total number of convicts (0 to 4000) against time period (2000 to 2012)]

[Figure: South India, number of convicts (0 to 600) against time period (2000 to 2012)]

While the descriptive analysis of rape convicts in each region initially gave us a picture of the Northern states being the most unsafe for women, the scatter diagrams show that things look different in West India. While the rape convict numbers for the other regions fell after 2006, in West India the number of rape convicts rose sharply after 2006, which is a cause for concern and needs to be studied to find the exact reason. While things are generally getting better compared to previous years, South India shows a marked improvement in the safety of women, with the number of rape convicts decreasing as the years pass.
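The contrast between West India and the other regions after 2006 can be checked numerically. The sketch below is only an illustrative check, not part of the original analysis: it uses the regional yearly totals tabulated in the Correlation section and compares the 2010 total with the 2006 total for each region. The helper `change_between` is a hypothetical name.

```python
# Regional yearly totals for 2001-2010, copied from the regional totals table
# in the Correlation section.
YEARS = list(range(2001, 2011))
TOTALS = {
    "North": [1981, 1910, 2437, 2377, 2669, 3370,
              2282.198, 2414.426, 2351.722, 2433.71],
    "South": [465, 334, 375, 365, 537, 553, 402, 408, 366, 370],
    "East": [857, 707, 1184, 1412, 1302, 1698, 1487, 1592, 1524, 1274],
    "West": [600, 627, 870, 679, 823, 806, 990, 1170, 1219, 991],
}


def change_between(values, start_year, end_year):
    """Difference between the totals of two years (end minus start)."""
    by_year = dict(zip(YEARS, values))
    return by_year[end_year] - by_year[start_year]


# Change in each region's total from 2006 to 2010.
changes = {region: change_between(values, 2006, 2010)
           for region, values in TOTALS.items()}
for region, delta in changes.items():
    print(f"{region:<6}{delta:>10.2f}")
```

A positive value means the regional total was higher in 2010 than in 2006; only the West comes out positive, while North, South and East all come out negative.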

5. Correlation

This section helps us correlate the regions and understand how each region varies with the others. Here we observe whether an increase in the number of rape convicts in one region is accompanied by a similar increase in another region, and how strongly the two are related. A correlation of 1 states that there is a perfect positive relation between the variables: an increase in one region is matched by the same trend in the other region. As the coefficient decreases towards 0, the relationship between them weakens. A negative correlation states that whenever there is an increase in one region, we observe a decrease in the other region.

[Figure: East India, number of convicts (0 to 2000) against time period (2000 to 2012)]

[Figure: West India, number of convicts (0 to 1400) against time period (2000 to 2012)]

To proceed with the correlation, we first summarize our data as the total number of rape convicts in each region for each year of the period 2001-2010.

Year    North       South   East    West
2001    1981        465     857     600
2002    1910        334     707     627
2003    2437        375     1184    870
2004    2377        365     1412    679
2005    2669        537     1302    823
2006    3370        553     1698    806
2007    2282.198    402     1487    990
2008    2414.426    408     1592    1170
2009    2351.722    366     1524    1219
2010    2433.71     370     1274    991
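Each regional column is the sum of that region's state counts for the year. As a sketch, the South column can be rebuilt from the South state data in the appendix; the variable names are illustrative only.

```python
# Yearly convict counts for the South states, 2001-2010, from the appendix.
SOUTH = {
    "Andhra Pradesh": [166, 186, 191, 199, 160, 160, 216, 170, 153, 127],
    "Karnataka": [45, 29, 32, 46, 74, 66, 61, 67, 67, 90],
    "Kerala": [28, 15, 40, 0, 0, 49, 44, 81, 48, 23],
    "Tamil Nadu": [226, 101, 109, 115, 300, 273, 73, 84, 88, 119],
    "Andaman & Nicobar Island": [0, 3, 3, 4, 3, 4, 3, 5, 4, 5],
    "Lakshadweep": [0, 0, 0, 0, 0, 0, 2, 0, 0, 0],
    "Pondicherry": [0, 0, 0, 1, 0, 1, 3, 1, 6, 6],
}

# zip(*rows) walks the table column by column, i.e. year by year.
south_yearly_totals = [sum(year_counts) for year_counts in zip(*SOUTH.values())]
for year, total in zip(range(2001, 2011), south_yearly_totals):
    print(year, total)
```

The ten totals produced this way match the South column of the table of regional totals.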

For the table of regional totals above, we find the correlation between North, South, East and West:

         North       South       East        West
North    1
South    0.684049    1
East     0.723066    0.304197    1
West     0.177788    -0.17227    0.655837    1

The correlation table above shows how each region behaves relative to the others. We see a strong positive correlation between East and North (0.72), indicating that both experienced the same trend of changes over the period. The negative correlation between South and West (-0.17) indicates that the two regions moved in opposite directions. The correlation of the South region with the East and West regions is interesting, as it is weak, which helps us understand how things were getting better for the southern region in terms of rape cases. The correlations thus reinforce the data presented earlier and support the relative safety of women in Southern India.
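The coefficients in the correlation table are Pearson correlation coefficients of the regional yearly totals. A minimal sketch of the calculation, written out by hand so that it needs only the standard library (the function name `pearson` is illustrative):

```python
from math import sqrt

# Regional yearly totals for 2001-2010, copied from the table of regional totals.
north = [1981, 1910, 2437, 2377, 2669, 3370,
         2282.198, 2414.426, 2351.722, 2433.71]
south = [465, 334, 375, 365, 537, 553, 402, 408, 366, 370]
east = [857, 707, 1184, 1412, 1302, 1698, 1487, 1592, 1524, 1274]
west = [600, 627, 870, 679, 823, 806, 990, 1170, 1219, 991]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


r_north_south = pearson(north, south)
r_north_east = pearson(north, east)
r_south_west = pearson(south, west)
print(round(r_north_south, 4), round(r_north_east, 4), round(r_south_west, 4))
```

The sign and size of each coefficient is what the discussion relies on: values near 1 mean two regions rose and fell together, values near 0 mean little linear relationship, and negative values mean they moved in opposite directions.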

6. Box plot statistics

                     North      South     East       West
Upper whisker        2669.00    553.00    1698.00    1219.00
3rd quartile         2437.00    465.00    1524.00    991.00
Median               2395.71    388.50    1357.00    846.50
1st quartile         2282.20    366.00    1184.00    679.00
Lower whisker        2282.20    334.00    707.00     600.00
Nr. of data points   10.00      10.00     10.00      10.00
Mean                 2422.61    417.50    1303.70    877.50

Box plots let us see the variation in the given data easily.

Looking at each region:

IQR of each region:

North: IQR = Q3 - Q1 = 2437 - 2282.2 = 154.8
This is the spread of the middle 50% of the yearly convict totals in the northern region from 2001-2010.

South: IQR = Q3 - Q1 = 465 - 366 = 99
This is the spread of the middle 50% of the yearly convict totals in the southern region from 2001-2010.

East: IQR = Q3 - Q1 = 1524 - 1184 = 340
This is the spread of the middle 50% of the yearly convict totals in the eastern region from 2001-2010.

West: IQR = Q3 - Q1 = 991 - 679 = 312
This is the spread of the middle 50% of the yearly convict totals in the western region from 2001-2010.

Comparing the box plots across regions:

When comparing the convicts of North and South, the numbers for the North are far higher (third quartile of 2437) than for the South (third quartile of 465).

The East and West regions show a similar spread in their middle 50% (IQR of 340 and 312 respectively), although the East has the wider overall range (707 to 1698, against 600 to 1219 for the West).

In the North and South regions the yearly totals are tightly clustered: the middle 50% of the data (between Q1 and Q3) spans only 154.8 and 99 convicts respectively.

Looking at the four regions of India, we can conclude from the box plots that the number of convicts is highest in the Northern region and lowest in the Southern region.

Outliers:

Exceptions found in the box plots: an outlier is a value that is much larger or much smaller than the other values in the data set.

North: Three outliers are visible: 1910, 1981 and 3370, for the years 2002, 2001 and 2006 respectively. The reasons could be either a data entry error or an extreme period in which rape cases in the northern region went uncontrolled.
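The box plot figures for the North can be reproduced with a short calculation. The sketch below assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data (Tukey's hinges), which is the convention that reproduces the tabulated Q1 and Q3, and uses the usual 1.5 × IQR fences for outliers. The function name `box_stats` is hypothetical.

```python
from statistics import median

# North yearly totals for 2001-2010, from the table of regional totals.
north = [1981, 1910, 2437, 2377, 2669, 3370,
         2282.198, 2414.426, 2351.722, 2433.71]


def box_stats(values):
    """Quartiles (as hinges), IQR, whiskers and outliers of a data set."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])      # median of the lower half
    q3 = median(data[-half:])     # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median(data), "q3": q3, "iqr": iqr,
        "lower_whisker": min(inside), "upper_whisker": max(inside),
        "outliers": outliers,
    }


north_box = box_stats(north)
print(north_box)
```

Under these assumptions the whiskers end at the most extreme values still inside the fences, and the three values reported as outliers for the North are exactly the ones falling outside them.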

7. Sampling Probability

In this section we examine how far each region's sample mean lies from the population mean, and how likely such a result is given the data.

North States

Probability of the mean of the North States for 2001-2010 (242.2609) being more than the population mean (146.816)

Average North = 242.2609

N (total number of states and Union Territories) = 35

n (sample of 10 states in the north) = 10

Population standard deviation (σ) = 181.276

Since n/N = 10/35 = 0.28 > 0.05, we need to use the population correction factor:

$s = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = 49.15$

$z = \frac{242.26 - 146.81}{49.15} = 1.94$

Using the Z table for the normal distribution, we calculate the area of probability. The area between the population mean and z = 1.94 is 0.4738 (47.38%).

The North mean therefore lies 1.94 standard errors above the population mean, and the chance of a sample mean at least this high is only 0.5 - 0.4738 = 0.0262 (about 2.6%). This indicates that the high number of rape convicts in the Northern states is unlikely to be a chance result, which underlines the danger to the safety of women in this region.
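The calculation for the North can be sketched in code. The example below applies the population correction factor to the standard error, computes z, and then derives both the Z-table area between the population mean and z and the one-sided tail beyond z. The function names are hypothetical; `NormalDist` is from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

# Values quoted in this section.
POP_MEAN = 146.816   # population mean
SIGMA = 181.276      # population standard deviation
N_TOTAL = 35         # total number of states and Union Territories


def fpc_standard_error(sigma, n, N):
    """Standard error of the mean with the population correction factor."""
    return (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))


def z_score(sample_mean, pop_mean, sigma, n, N):
    """z value of a sample mean relative to the population mean."""
    return (sample_mean - pop_mean) / fpc_standard_error(sigma, n, N)


se_north = fpc_standard_error(SIGMA, 10, N_TOTAL)
z_north = z_score(242.2609, POP_MEAN, SIGMA, 10, N_TOTAL)
# Area between the population mean (z = 0) and z, as read from a Z table.
area_north = NormalDist().cdf(z_north) - 0.5
# One-sided probability of a sample mean at least this far above the mean.
tail_north = 1 - NormalDist().cdf(z_north)

z_south = z_score(59.64286, POP_MEAN, SIGMA, 7, N_TOTAL)
print(round(se_north, 2), round(z_north, 2), round(area_north, 4), round(z_south, 2))
```

The same two functions apply unchanged to the South, East and West by substituting each region's mean and sample size.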

South States

Probability of the South states mean (59.64) being less than the population mean (146.816)

Average South = 59.64286

N (total number of states and Union Territories) = 35

n (sample of states in the south) = 7

Population standard deviation (σ) = 181.276

Since n/N = 7/35 = 0.20 > 0.05, we need to use the population correction factor:

$s = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = 62.17$

$z = \frac{59.64 - 146.81}{62.17} = -1.40$

From the Z table, the area between the population mean and z = -1.40 is 0.4192 (41.92%).

The South mean lies 1.40 standard errors below the population mean, and the chance of a sample mean at least this low is 0.5 - 0.4192 = 0.0808 (about 8.1%). This strengthens the data that supports the relative safety of women in the Southern states.

East States

Probability of the East states mean (116.4018) being less than the population mean (146.816)

Average East = 116.4018

N (total number of states and Union Territories) = 35

n (sample of states in the east) = 12

Population standard deviation (σ) = 181.276

Since n/N = 12/35 = 0.34 > 0.05, we need to use the population correction factor:

$s = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = 43.03$

$z = \frac{116.4018 - 146.81}{43.03} = -0.70$

From the Z table, the area between the population mean and z = -0.70 is 0.2580 (25.80%).

The East mean lies 0.70 standard errors below the population mean, so the Eastern states are lower than the population mean but not markedly so.

West States

Probability of the West states mean (146.2167) being less than the population mean (146.816)

Average West = 146.2167

N (total number of states and Union Territories) = 35

n (sample of states in the west) = 6

Population standard deviation (σ) = 181.276

Since n/N = 6/35 = 0.17 > 0.05, we need to use the population correction factor:

$s = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = 68.34$

$z = \frac{146.2167 - 146.81}{68.34} = -0.0086$

The area between the population mean and z = -0.0086 is about 0.3%, so the West mean is practically equal to the population mean.

Conclusion

As seen above, the data supports the finding that the Northern states are the most unsafe: the North is the only region whose mean lies above the population mean, at z = 1.94 (an area of 47.38% between the two means). This strengthens the case for steps to enhance protective measures in these states.

8. Confidence Interval

In this section we determine an estimated range of values that is likely to include the unknown population parameter. Confidence intervals are usually calculated at the 90%, 95% or 99% level. In our paper, we evaluate the intervals at the 99% level to be more conservative. The width of the confidence interval tells us how uncertain we are about the unknown parameter.

North States:

BASIC STATISTICS

N     Mean       Standard deviation
10    2422.61    400.45

CONFIDENCE INTERVALS FOR THE North State MEAN (with normal distribution)

Mean = 2422.6056
Standard Deviation = 400.452857
N = 10
Z (0.005) = 2.575831

Left Interval = $2422.6056 - 2.5758 \times \frac{400.452}{\sqrt{10}}$ = 2422.6056 - 326.18863 = 2096.41697

Right Interval = $2422.6056 + 2.5758 \times \frac{400.452}{\sqrt{10}}$ = 2422.6056 + 326.18863 = 2748.79423

99% Confidence Interval: (2096.41697, 2748.79423)

The above confidence interval states that we are 99% confident that the true mean for the North States lies between 2096.42 and 2748.79; the sample mean is 2422.61.

South States

BASIC STATISTICS

N     Mean     Standard deviation
10    417.5    75.77

CONFIDENCE INTERVALS FOR THE South State MEAN (with normal distribution)

Mean = 417.5
Standard Deviation = 75.770487
N = 10
Z (0.005) = 2.575831

Left Interval = $417.5 - 2.5758 \times \frac{75.77048}{\sqrt{10}}$ = 417.5 - 61.718804 = 355.781196

Right Interval = $417.5 + 2.5758 \times \frac{75.77048}{\sqrt{10}}$ = 417.5 + 61.718804 = 479.218804

99% Confidence Interval: (355.781196, 479.218804)

The above confidence interval states that we are 99% confident that the true mean for the South States lies between 355.78 and 479.22; the sample mean is 417.5.

East States

BASIC STATISTICS

N     Mean      Standard deviation
10    1303.7    316.8

CONFIDENCE INTERVALS FOR THE East State MEAN (with normal distribution)

Mean = 1303.7
Standard Deviation = 316.798937
N = 10
Z (0.005) = 2.575831

Left Interval = $1303.7 - 2.5758 \times \frac{316.799}{\sqrt{10}}$ = 1303.7 - 258.04838 = 1045.65162

Right Interval = $1303.7 + 2.5758 \times \frac{316.799}{\sqrt{10}}$ = 1303.7 + 258.04838 = 1561.74838

99% Confidence Interval: (1045.65162, 1561.74838)

The above confidence interval states that we are 99% confident that the true mean for the East States lies between 1045.65 and 1561.75; the sample mean is 1303.7. This interval lies entirely below the North interval (2096.42 to 2748.79), which supports the earlier statistics for the Eastern region.

West States

BASIC STATISTICS

N     Mean     Standard deviation
10    877.5    214.74

CONFIDENCE INTERVALS FOR THE West State MEAN (with normal distribution)

Mean = 877.5
Standard Deviation = 214.73873
N = 10
Z (0.005) = 2.575831

Left Interval = $877.5 - 2.5758 \times \frac{214.73873}{\sqrt{10}}$ = 877.5 - 174.915301 = 702.584699

Right Interval = $877.5 + 2.5758 \times \frac{214.73873}{\sqrt{10}}$ = 877.5 + 174.915301 = 1052.415301

99% Confidence Interval: (702.584699, 1052.415301)

The above confidence interval states that we are 99% confident that the true mean for the West States lies between 702.58 and 1052.42; the sample mean is 877.5.
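The four intervals follow the same recipe, which can be sketched as one function. The example assumes, as the calculations above do, a normal (z) interval using the sample standard deviation of the ten yearly totals; the function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, level=0.99):
    """Two-sided normal confidence interval for a mean."""
    # Critical value, e.g. about 2.5758 for a 99% interval.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# Mean and standard deviation of the yearly totals, as quoted above.
REGION_STATS = {
    "North": (2422.6056, 400.452857),
    "South": (417.5, 75.770487),
    "East": (1303.7, 316.798937),
    "West": (877.5, 214.73873),
}

intervals = {region: z_confidence_interval(mean, sd, 10)
             for region, (mean, sd) in REGION_STATS.items()}
for region, (low, high) in intervals.items():
    print(f"{region:<6}({low:.2f}, {high:.2f})")
```

Passing `level=0.95` or `level=0.90` gives the narrower intervals for the other confidence levels mentioned at the start of this section.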

9. Anova Test

The Anova test helps us determine, region by region, whether the means of the states within each region differ from each other. If a difference is found, identifying which mean is different is beyond the scope of this project.

North States Anova Test

H0: The means of all the North states are equal.

H1: At least one mean among the North states is different.

ANOVA

Source of Variation    SS         df    MS          F           P-value        F critical
Between Groups         4473889    9     497098.8    33.86449    3.27993E-25    1.985595
Within Groups          1321115    90    14679.06
Total                  5795005    99

95% CONFIDENCE INTERVAL DIAGRAM

+------------------------+------------------------+
(-*--)                     Chhattisgarh
(---*--)                   Haryana
(-*)                       Himachal Pradesh
(*                         Jammu & Kashmir
(----*---)                 Madhya Pradesh
(---*---)                  Punjab
(----------*-----------)   Uttar Pradesh
(*)                        Uttarakhand
*)                         Chandigarh
(-*--)                     Delhi
+------------------------+------------------------+
-75.94                   420.51                   916.96

As the Anova table shows, F (33.86) exceeds the critical value Fc (1.99), so we reject the null hypothesis that the means of the North states are equal. At least one mean differs from the rest of the North state means.

South States Anova Test

H0: The means of all the South states are equal.

H1: At least one mean among the South states is different.

ANOVA

Source of Variation    SS          df    MS          F              P-value     F crit
Between Groups         315203.6    6     52533.93    42.04502868    3.31E-20    2.246408
Within Groups          78716.5     63    1249.468
Total                  393920.1    69

+------------------------+------------------------+
(---*--)                   Andhra Pradesh
(--*-)                     Karnataka
(---*--)                   Kerala
(-----------*-----------)  Tamil Nadu
*                          Andaman & Nicobar Isla
*                          Lakshadweep
(*                         Pondicherry
+------------------------+------------------------+
-21.18                   104.39                   229.95

As the Anova table shows, F (42.05) exceeds the critical value Fc (2.25), so we reject the null hypothesis that the means of the South states are equal. At least one mean differs from the rest of the South state means.
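The entries of a one-way Anova table can be computed directly from the yearly counts. The following is a minimal sketch for the South states using the appendix data; it computes the sums of squares, degrees of freedom, mean squares and F, and leaves out the P-value and F critical, which need the F distribution. The function name `one_way_anova` is hypothetical.

```python
# Yearly convict counts for the South states, 2001-2010, from the appendix.
south_groups = [
    [166, 186, 191, 199, 160, 160, 216, 170, 153, 127],  # Andhra Pradesh
    [45, 29, 32, 46, 74, 66, 61, 67, 67, 90],            # Karnataka
    [28, 15, 40, 0, 0, 49, 44, 81, 48, 23],              # Kerala
    [226, 101, 109, 115, 300, 273, 73, 84, 88, 119],     # Tamil Nadu
    [0, 3, 3, 4, 3, 4, 3, 5, 4, 5],                      # Andaman & Nicobar
    [0, 0, 0, 0, 0, 0, 2, 0, 0, 0],                      # Lakshadweep
    [0, 0, 0, 1, 0, 1, 3, 1, 6, 6],                      # Pondicherry
]


def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares and F statistic."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


south_anova = one_way_anova(south_groups)
print(south_anova)
```

The null hypothesis is rejected when the computed F is larger than the critical value for the given degrees of freedom, which is the comparison made in the text for each region.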

East States Anova Test

H0: The means of all the East states are equal.

H1: At least one mean among the East states is different.

ANOVA

Source of Variation    SS          df     MS          F           P-value     F crit
Between Groups         1579469     11     143588.1    49.55314    1.69E-35    1.885687
Within Groups          289765.9    100    2897.659
Total                  1869235     111

+------------------------+------------------------+
(-----*----)               Assam
(*                         Arunachal Pradesh
(*-)                       Bihar
(-------*-------)          Jharkhand
*                          Manipur
*)                         Meghalaya
(*)                        Mizoram
*                          Nagaland
(--*-)                     Orissa
*)                         Sikkim
(-*-)                      Tripura
(-----*------)             West Bengal
+------------------------+------------------------+
-48.48                   213.28                   475.04

As the Anova table shows, F (49.55) exceeds the critical value Fc (1.89), so we reject the null hypothesis that the means of the East states are equal. At least one mean differs from the rest of the East state means.

West States Anova Test

H0: The means of all the West states are equal.

H1: At least one mean among the West states is different.

ANOVA

Source of Variation    SS          df    MS          F           P-value    F crit
Between Groups         1263948     5     252789.5    40.50267    4.3E-17    2.38607
Within Groups          337030.5    54    6241.306
Total                  1600978     59

+------------------------+------------------------+
(*                         Goa
(------------*------------) Gujarat
(---*---)                  Maharashtra
(---*--)                   Rajasthan
*                          Dadra & Nagar Haveli
*                          Damman & Diu
+------------------------+------------------------+
-41.71                   207.95                   457.61

As the Anova table shows, F (40.50) exceeds the critical value Fc (2.39), so we reject the null hypothesis that the means of the West states are equal. At least one mean differs from the rest of the West state means.

10. Conclusion

The statistical figures above give us a clear picture of how each region differs in the number of rape convicts over the years 2001-2010. We went through a series of tests to understand the trend in each region and how safe those regions are for women. However, the number of rape convicts does not represent the number of rapes that take place in India; the number of rapes may be far higher than the number of rape convicts. The statistical figures include not only the Indian states but also the Union Territories under the Indian Government. The Anova test helps us conclude that, within each region, the states do not all share the same mean. The sampling distribution tests strengthen our statistical data by indicating how likely each region's sample mean is relative to the population. Looking at each test, we conclude that the Northern region is the most unsafe for the women of India, with the highest number of convicts per year. South India appears much safer in the comparison graphs. We also note the unusual graph of Western India, which rises after 2006 while the other regions showed some improvement. Keeping all this in mind, we conclude that while the Northern states are of greatest concern, the other regions must not be neglected either: there are signs of concern there as well, and the rape convict numbers need to improve everywhere.

11. References

1. http://news.harvard.edu/gazette/story/2013/09/understanding-indias-rape-crisis/
2. http://www.indialawjournal.com/volume2/issue_2/article_by_priyanka.html
3. http://citation.allacademic.com//meta/p_mla_apa_research_citation/2/4/2/6/4/pages242646/p242646-2.php
4. http://www.cjsonline.ca/pdf/racethsex.pdf
5. http://jcc.sagepub.com/content/12/3/284.short
6. http://onlinelibrary.wiley.com/doi/10.1111/j.1540-4560.1981.tb01068.x/abstract
7. http://ethics.journalism.wisc.edu/2013/03/19/covering-rape-the-changing-nature-of-society-and-indian-journalism/

12. Appendix

Formulas Used:

Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the mean and $N$ is the number of elements

Standard Deviation: $\sigma = \sqrt{\sigma^2}$

Z value for sampling probability: $z = \frac{x - \mu}{\sigma_x}$ with $\sigma_x = \frac{\sigma}{\sqrt{n}}$, where $x$ is the sample mean and $\mu$ is the population mean

Population Correction Factor, to be used in the Z formula above if n/N > .05: $\sigma_x = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}$, where N is the population number and n is the sample number

Confidence Interval: $\mu + z \times \frac{\sigma}{\sqrt{n}}$ and $\mu - z \times \frac{\sigma}{\sqrt{n}}$

The Anova table below shows the formula for calculating F (critical value). The table below covers both simple and regression means; in our paper we have used the 1st table for comparing the means.

Formulas used for Box plot.

Mean, standard deviation, quartile range and the interquartile range.

Quartile range: These tell us the position of the values for Q1, Q2 and Q3.

Q1 = $\left(\frac{n}{4}\right)^{th}$ value

Q3 = $\left(\frac{3n}{4}\right)^{th}$ value

Q2 = Median = $\left(\frac{n}{2}\right)^{th}$ value

Inter-quartile range: It tells us how much data falls between Q3 and Q1 (50% of the data).

IQR = Q3 - Q1

To find the outliers: These outliers will be beyond the whiskers of a box plot.

Low values = Q1 - 1.5(IQR) and High values = Q3 + 1.5(IQR)

The data used in our calculation –

SOUTH States                2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Andhra Pradesh               166   186   191   199   160   160   216   170   153   127
Karnataka                     45    29    32    46    74    66    61    67    67    90
Kerala                        28    15    40     0     0    49    44    81    48    23
Tamil Nadu                   226   101   109   115   300   273    73    84    88   119
Andaman & Nicobar Island       0     3     3     4     3     4     3     5     4     5
Lakshadweep                    0     0     0     0     0     0     2     0     0     0
Pondicherry                    0     0     0     1     0     1     3     1     6     6

NORTH States        2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Chhattisgarh         344   339   427   337   290   307   276   357   324   474
Haryana              207   197   347   400   394   412   412   438   438   521
Himachal Pradesh      36    65    96    90    92   122   187   147   156   142
Jammu & Kashmir       11     9    15    14    16    27    18    53    21    18
Madhya Pradesh       870   697   756   797   747   879   780   797   711   454
Punjab                70    95   169   190   168   365   252   277   313   349
Uttar Pradesh        249   277   371   371   711   905  1198  1426  1722  1741
Uttarakhand            8    17    42     4    38    62    74    47    62    73
Chandigarh             6     7     9     4     5    13    16    23    17    12
Delhi                180   207   205   170   208   278   266   274   308   309

EAST States          2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Arunachal Pradesh       2     1
Assam                 139   176   222   298   315   343   359   371   347   256
Bihar                 147   118   174   159   161   206   172   154   136   128
Jharkhand             241   146   403   435   392   571   310   392   316   259
Manipur                 0     2     1     1     1     1     1     2     0     0
Meghalaya               1     2     1     5     5    15    18     6    20    16
Mizoram                13    14    16    23    14    22    18    25    30    49
Nagaland                2     0     2     2     2     4     7     5     6    11
Odisha                161   123   151   188   198   227   233   230   209   197
Sikkim                 12     7     0    11    13     9     7     7     8     5
Tripura                40    50    57    69    89    87   118    93    95    83
West Bengal            99    68   157   221   112   213   244   307   357   261

WEST States             2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Goa                       14    11    11    16    18    16    11    11    14    18
Gujarat                   89   114   155   162   178   230   415   525   484   529
Maharashtra              214   184   344   207   224   216   240   249   299   143
Rajasthan                282   318   360   294   403   344   324   384   420   301
Dadra & Nagar Haveli       1     0     0     0     0     0     0     1     0     0
Daman                      0     0     0     0     0     0     0     0     0     0