Bribes EGovREPORT

Business Intelligence Using Data Mining Bribe Payments For Land Registrations

Submitted By: Hussain Boltwala 61210213

Karthik Vemparala 61210505

Naveen Kumar HS 61210144

Salman Siddiqui 61210626

Smita Chakravorty 61210558

BIDM – Bribing Behaviour for e-Governance Services

INTRODUCTION
PROBLEM STATEMENT
DATA PREPARATION AND VISUALIZATION
THE PREDICTION METHOD
    CLASSIFICATION TREES
    K-NEAREST NEIGHBOUR
    NAÏVE BAYES
CONCLUSION & FURTHER ANALYSIS

Introduction

The project is based on data collected over a period of time from customers who have used e-Governance services for the land registration process. The resulting framework will be useful for intermediaries, who can target customers based on demographic criteria. These intermediaries can charge a fee that is typically less than the bribe paid, and provide a convenient and fast service to the people most likely to pay bribes. This is similar to the freelance notaries outside court houses who charge customers a fee for guiding them through a legal process. The framework will also provide insights into customer behaviour and the effectiveness of e-Governance initiatives.

This project also analyses the relationship between customers who paid bribes and differentiating factors, such as age, level of education and place, that contribute significantly to the payment of bribes. Our analysis is based on land registration transactions carried out in Delhi, Haryana and Gujarat. Data was collected via a handwritten survey, with the people availing the service being interviewed. This resulted in a lot of misclassified data, and the group has endeavoured to clean and interpret as many data points as possible to ensure that a robust model is obtained.

Problem Statement

Predict whether a person availing the e-Governance service will pay a bribe of over INR 100.
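A minimal sketch of how this outcome could be encoded, assuming each survey record carries the reported bribe amount in rupees. The helper name `bribe_over_threshold`, the handling of missing amounts and the sample values are hypothetical and are not taken from the survey file.

```python
# Hypothetical helper: encode the outcome 'Bribe > 100' as 1 (yes) or 0 (no).
THRESHOLD_INR = 100  # threshold stated in the problem statement


def bribe_over_threshold(amount_inr, threshold=THRESHOLD_INR):
    """Return 1 if the reported bribe exceeds the threshold, else 0.

    A missing amount (None) is treated as "no bribe paid"; that is an
    assumption of this sketch, not a rule documented in the report.
    """
    if amount_inr is None:
        return 0
    return 1 if amount_inr > threshold else 0


# Made-up amounts, for illustration only.
sample_amounts = [0, 50, 100, 150, None, 2000]
labels = [bribe_over_threshold(a) for a in sample_amounts]
print(labels)  # [0, 0, 0, 1, 0, 1]
```

Note that a bribe of exactly Rs. 100 falls in class 0 under this reading of "over INR 100".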

Data Preparation and Visualization

In order to better understand the key predictors of susceptibility to bribing behaviour, different metrics were analysed against whether a bribe was paid (categorical) and the amount of bribe paid (numerical). Some insights are presented below:

1 Below Rs.500

2 Rs. 500-1000

3 Rs.1000-2999

4 Rs.3000-4999

5 Rs.5000-6999

6 Rs.7000-9999

7 More than Rs.10,000

The amount of bribe paid by people in the higher income brackets (Rs. 7000-9999 and more than Rs. 10,000) is higher in both Delhi and Haryana.

Gujarat seemed to have the weakest bribing culture, whereas Delhi and Haryana fared badly on most markers. This could possibly indicate that affluent people are generally targeted by officials. This is also depicted in the bar chart below, which shows the number of people who paid bribes (code 1, in pink) vs. those who did not (code 2, in blue). Gujarat has the largest number of non-bribe payers.

From the above plot, we see that most people in Delhi and Haryana have paid bribes between Rs 100-200.

In Delhi, the number of people who paid a bribe was higher among those who availed the services offered by the TCC less frequently. In Haryana, however, there is no information on the service availing frequency and bribing pattern. The total amount of bribe paid also increases if the services are availed less frequently, as seen from the bar graph below.

1 Once in 3 Months

2 Once in 6 Months

3 Once in a Year

4 Less than once a year

5 Others

In Delhi, more people paid a bribe on their first trip, but this number decreased as the number of trips to the TCC increased. Haryana does not follow any discernible pattern.

It may be that people who frequented the office at least once every 3 months and made more than one trip paid very little in bribes. This may indicate that people who have a high level of familiarity (and perhaps have built relationships with officials) do not pay much to get their work done. Alternatively, they may simply be unable or unwilling to pay a bribe and hence have to make more trips to avail the same services.

If we look at box plots of the age of individuals, split by whether they paid a bribe greater than Rs. 100, we do not see any discernible pattern.

But if we plot the bribe amount by age bracket, we find that elderly people mostly end up paying bribes of less than Rs. 200.

1 Illiterate

2 Literate without Education

3 Below Primary

4 Primary

5 Middle

6 Matric/Secondary

7 Higher Secondary/Intermediate

8 Non-Technical Diploma

9 Technical Diploma

10 Graduate & Above

11 Others

The median amount of bribe paid across education levels remains between Rs. 100 and Rs. 150, with the sole exception of individuals who are “literate without education”. The amount of bribe paid by this group is higher.

The above plot indicates that people in semi-urban areas generally paid much more in bribes than those in either rural or urban areas.

It came as no surprise that larger plots of land attracted relatively higher bribes.

Distance from the land registry office did not seem to play any significant role in the bribing patterns. Wage loss, however, did: the higher the loss of wages, the higher the bribe amount.

Whilst the total cost of availing the service was seen as an important aspect, it was ultimately ruled out, since it is the total amount paid by the user and includes the land registry charges.

Surprisingly, the amounts of bribes were closely tied to satisfaction levels, with Delhi and Haryana reporting the most data. This could indicate that bribing is considered a part of any government transaction and that it has no bearing on overall perceptions of satisfaction.

From a service provider’s perspective, most of the bribes paid were under Rs. 100. These payers are not considered the target market, and only people who would pay over Rs. 100 are considered in this study.

Also, most bribes were paid in order to expedite the process; it was therefore logical to look at predictors that would cause the individual to spend more time at the land registry office.

Total Bribes Paid

The Prediction Method

Classification Trees

Since there are a lot of variables, we decided to run a classification tree to find out which predictor variables are the most relevant. These were wage loss, service charges, wait time, total payment, age, level of education, occupation, mode of travel, number of trips made to the TCC, travel time, and reason for bribe payment (largely to expedite the process).

Certain predictors above are not relevant for a prediction model. For example, the reason for bribe payment will not apply, as it will not be available at the time of prediction. Also, a person who has already paid a bribe may not want to avail the services of an intermediary. However, a person who has tried to avail the services previously but had a long wait time might be more inclined to use the services of a broker.
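The idea of ranking predictors with a classification tree can be illustrated with a small impurity calculation. This is only a sketch: the report does not state which splitting criterion the tool used, so the Gini index here is an assumption, and the candidate split is hypothetical. Only the root class counts (168 payers of more than Rs. 100 and 932 others in the training partition) come from the report.

```python
def gini(counts):
    """Gini impurity of a node, given its class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_gain(parent, children):
    """Impurity reduction obtained by splitting `parent` into `children`."""
    total = sum(parent)
    weighted = sum(sum(child) / total * gini(child) for child in children)
    return gini(parent) - weighted


# Training partition: (paid > Rs. 100, did not), as reported in the output.
root = (168, 932)
print(round(gini(root), 4))  # 0.2588

# Hypothetical split on some predictor; NOT taken from the report's tree.
left, right = (120, 200), (48, 732)
gain = split_gain(root, [left, right])
print(gain > 0)  # True: this split makes the child nodes purer than the root
```

A tree-growing procedure of this kind would evaluate such a gain for every candidate split and keep the best one, which is why variables that appear near the top of the tree are read as the more relevant predictors.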

Figures: Full Tree and Pruned Tree (classification tree diagrams). The split variables legible in the diagrams are travel_mode, serv_charge, wait_time, total_paymen, expedite_pro, Occupation and wage_loss.

K-Nearest Neighbour

Running K-NN (with k = 1) on the above predictor variables, we get an error rate of 12% on the validation data and 11% on the test data.
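A minimal sketch of a nearest-neighbour classifier with k = 1 helps to explain why the training summary in this section reports no errors while the validation and test summaries do: when the training data is scored, every record is its own nearest neighbour. The two features, their rescaled values and the function name are hypothetical; the report does not document the distance measure or the scaling used by the tool.

```python
import math


def nearest_neighbour_label(query, records):
    """Classify `query` with the label of its closest record (k = 1).

    `records` is a list of (features, label) pairs; distance is Euclidean.
    """
    best_label, best_dist = None, math.inf
    for features, label in records:
        dist = math.dist(query, features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical, already rescaled records: (wait_time, wage_loss) -> Bribe > 100
training = [
    ((0.9, 0.8), 1),
    ((0.8, 0.6), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
    ((0.3, 0.2), 0),
]

# Scoring the training data with k = 1: each record matches itself.
training_errors = sum(
    nearest_neighbour_label(x, training) != y for x, y in training
)
print(training_errors)  # 0

# Errors can only show up on records the model has not seen.
print(nearest_neighbour_label((0.7, 0.7), training))  # 1
```

For this reason the validation and test error rates, not the training error, are the figures that indicate how the model would perform on new prospects.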

Variables
# Input variables: 11
Input variables: Age, Lev_Educatio, Occupation, travel_mode, no_of_trip, travel_time, wait_time, wage_loss, serv_charge, expedite_pro, total_paymen
Output variable: Bribe > 100

Training Data scoring - Summary Report (for k=1)

Classification Confusion Matrix

Actual Class    Predicted 1    Predicted 0
1               168            0
0               0              932

Error Report

Class      # Cases    # Errors    % Error
1          168        0           0.00
0          932        0           0.00
Overall    1100       0           0.00

Validation Data scoring - Summary Report (for k=1)

Classification Confusion Matrix

Actual Class    Predicted 1    Predicted 0
1               80             35
0               47             498

Error Report

Class      # Cases    # Errors    % Error
1          115        35          30.43
0          545        47          8.62
Overall    660        82          12.42

Test Data scoring - Summary Report (for k=1)

Classification Confusion Matrix

Actual Class    Predicted 1    Predicted 0
1               52             23
0               24             341

Error Report

Class      # Cases    # Errors    % Error
1          75         23          30.67
0          365        24          6.58
Overall    440        47          10.68

Naïve Bayes

The Naïve Bayes method resulted in a higher error rate than the K-NN method: 16.67% on the validation data and 15.00% on the test data.

Variables
# Input variables: 11
Input variables: Age, Lev_Educatio, Occupation, travel_mode, no_of_trip, travel_time, wait_time, wage_loss, serv_charge, expedite_pro, total_paymen
Output variable: Bribe > 100

Prior class probabilities (according to relative occurrences in training data)

Class    Probability
1        0.152727273    <-- Success Class
0        0.847272727

Training Data scoring - Summary Report

Classification Confusion Matrix

Actual Class    Predicted 1    Predicted 0
1               148            20
0               91             841

Error Report

Class      # Cases    # Errors    % Error
1          168        20          11.90
0          932        91          9.76
Overall    1100       111         10.09

Validation Data scoring - Summary Report

Classification Confusion Matrix

Actual Class    Predicted 1    Predicted 0
1               79             36
0               74             471

Error Report

Class      # Cases    # Errors    % Error
1          115        36          31.30
0          545        74          13.58
Overall    660        110         16.67

Test Data scoring - Summary Report

Classification Confusion Matrix

Actual Class    Predicted 1    Predicted 0
1               49             26
0               40             325

Error Report

Class      # Cases    # Errors    % Error
1          75         26          34.67
0          365        40          10.96
Overall    440        66          15.00
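The prior class probabilities in the Naïve Bayes output are simply the class shares in the training partition (168 of 1100 records in class 1). The sketch below reproduces that prior and then shows, on a tiny made-up data set, how a Naïve Bayes classifier combines the prior with per-predictor conditional probabilities. The toy records, the two predictors and the function names are hypothetical, and no smoothing is applied; the report does not describe how the tool handles unseen values.

```python
from collections import Counter, defaultdict

# Prior for the success class, from the training counts in the report.
print(round(168 / 1100, 6))  # 0.152727


def fit_naive_bayes(rows, labels):
    """Count classes and per-feature values for categorical predictors."""
    class_counts = Counter(labels)
    cond = defaultdict(Counter)  # cond[(feature_index, class)][value] -> count
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            cond[(i, y)][value] += 1
    return class_counts, cond


def posterior(row, class_counts, cond):
    """Normalised class probabilities for one record (no smoothing)."""
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / total  # prior P(class)
        for i, value in enumerate(row):
            score *= cond[(i, y)][value] / n_y  # P(value | class)
        scores[y] = score
    norm = sum(scores.values())
    return {y: s / norm for y, s in scores.items()} if norm else scores


# Made-up records: (travel_mode, first_trip) -> Bribe > 100
rows = [("bus", "yes"), ("bus", "no"), ("car", "yes"),
        ("car", "yes"), ("bus", "no"), ("car", "no")]
labels = [0, 0, 1, 1, 0, 0]

class_counts, cond = fit_naive_bayes(rows, labels)
post = posterior(("car", "yes"), class_counts, cond)
print(round(post[1], 4))  # 0.8889
```

The method treats the predictors as independent given the class, which is one possible reason it trails K-NN here if predictors such as wait time, number of trips and wage loss are correlated.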

Conclusion & Further Analysis

When we started with the raw data, a tremendous amount of clean-up and classification was needed to make the data usable. We also had to define our goals clearly, keeping in mind the practicality and usefulness of the model we were building.

Initially the idea was to estimate the amount of bribe a person would pay. However, a relatively small number of people had reportedly paid bribes, many of which were very small amounts. It was therefore more useful to classify records that paid over a certain threshold (in our case, Rs. 100) and to build the model around this end goal, i.e. a categorical ‘Y’ of ‘Bribe > 100’.

                                     Records       %
Initial Benchmark - # Paid Bribes    397 / 2200    18%
Initial Benchmark - # Paid > 100     358 / 2200    16.3%
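The benchmark shares can be checked with a few lines of arithmetic. The sketch also works out what a naive rule that labels every record as a non-payer would achieve; treating that rule as the benchmark is an assumption of this example, since the report only lists the shares themselves.

```python
total_records = 2200
paid_any_bribe = 397
paid_over_100 = 358

share_any = paid_any_bribe / total_records
share_over_100 = paid_over_100 / total_records

# Assumed naive rule: predict 'Bribe > 100' = 0 for every record.
naive_error = share_over_100      # every actual payer is misclassified
naive_accuracy = 1 - naive_error
naive_sensitivity = 0.0           # no payer is ever identified

print(f"{share_any:.1%} {share_over_100:.1%} {naive_accuracy:.1%}")
# 18.0% 16.3% 83.7%
```

Under this reading, a useful model has to be judged on how many payers it finds (sensitivity) as well as on overall accuracy, because the naive rule is already right about 84% of the time while finding no payers at all.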

The results of the prediction models on the validation data are as follows:

Method Used    Error Rate    Accuracy    Sensitivity    Specificity
K-NN           12%           88%         69.6%          91.4%
Naïve Bayes    17%           83%         68.7%          86.4%
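The figures in this comparison can be recomputed from the validation confusion matrices reported for each method. The sketch below does so; the function name `summarise` is hypothetical, while the four counts per method are taken from the validation summaries (class 1, 'Bribe > 100', is treated as the positive class).

```python
def summarise(tp, fn, fp, tn):
    """Error rate, accuracy, sensitivity and specificity from a 2x2 matrix."""
    total = tp + fn + fp + tn
    return {
        "error_rate": (fn + fp) / total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of actual payers found
        "specificity": tn / (tn + fp),   # share of non-payers left alone
    }


# Validation data: rows are actual classes, columns are predicted classes.
knn_validation = summarise(tp=80, fn=35, fp=47, tn=498)
nb_validation = summarise(tp=79, fn=36, fp=74, tn=471)

for name, result in (("K-NN", knn_validation), ("Naive Bayes", nb_validation)):
    print(name, {key: f"{value:.1%}" for key, value in result.items()})
```

Sensitivity is the figure of most interest to an intermediary, since it measures how many of the people who would pay more than Rs. 100 are actually flagged as prospects.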

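Taking the share of records that paid more than Rs. 100 (16.3%) as the benchmark, a naive rule that classifies everyone as a non-payer would be wrong on 16.3% of records and would identify no bribe payers at all. On the validation data, K-NN lowered the overall error to about 12% while correctly identifying 69.6% of the payers. Naïve Bayes identified a similar share of payers (68.7%), but its overall error of about 17% was no better than that of the naive rule. K-NN therefore yielded better results than Naïve Bayes.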

The predictors of interest are listed below. Each could be estimated or determined through direct and indirect probing by the service provider, or is already known to him (e.g. the official service charge). The idea of this model is that suitable prospects are approached by the service provider, who finds out the relevant information for each parameter, mostly through a conversational strategy.

Predictor                              Method of Determination
Age                                    Estimated / indirectly determined from conversation
Level of Education                     Estimated / indirectly determined from conversation
Occupation                             Directly queried from prospect
Mode of Travel                         Determined from conversation
No. of Trips                           Determined from conversation
Travel Time                            Determined from the ‘Mode of Travel’ query
Wait Time                              If first trip: communicated to prospect based on general wait times for the type of service required. If more than one trip: queried from the prospect directly
Wage Loss                              Determined from the ‘Occupation’ query
Official Service Charge                Known to service provider; communicated to prospect
Desired Expediency                     Directly queried from prospect
Total Payment (charge) for services    Known to service provider; communicated to prospect

Therefore, using the above probes, a service provider should be able to target prospective customers with a good degree of success.
